It is for this reason that most decoding/encoding tests are done on slower hardware (either single-core or simultaneous dual-core) in order to max out each processor. If the decoding speed is faster, it follows that it must be more efficient as well.
Exactly... The tests are completely artificial. It's like using the old Winbench suite, or modern games set to high resolution, to try to test modern processors. You just can't get anything usable out of them anymore (all games are GPU limited, not CPU limited, and productivity apps are limited much more by internal timers and user interaction than anything the CPU does). You are forced to artificially limit the test parameters in order to even glean any information out of the test, which makes its relevance to practical applications shady at best. Audio decode time differences, even on old and slow hardware, are measured in milliseconds. In some applications, an extra handful of milliseconds matters. In audio decoding, it does not.
On any modern hardware, audio decode performance is completely irrelevant. Even for transcodes, as Jim mentioned above, the decode time is almost completely a non-factor. You are going to be FAR more limited by disk throughput and RAM access latencies than by any microscopic difference in the CPU instruction set from a decode routine. Heck, we're getting to the point where encode times are starting to become fairly irrelevant.
Also, if you look at those efficiency differences you listed above (compression size), most of the differences are measured in 10-20 KB out of a total of a few hundred thousand KB. Take the "Carlo Siliotto - The Punisher Score" example: the difference between APE and TAK (better) is a 0.03% improvement. When you can buy 3.5" 1TB hard drives for $80-90, and even for your laptop you can get 2.5" 500GB drives for $90, I hardly think a 0.03% compression improvement is relevant. Heck, 0.03% is probably even within the margin of error for the testing procedure! How did they get those results? Are they repeatable? Did they do a standard complement of 5-10 tests and average the results, present the median, or something else? And does it even matter, since the block size of your hard drive will probably mask most of the differences in final file size?
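To put rough numbers on the block-size point, here is a quick Python sketch. Everything in it is an assumption for illustration: the 4096-byte cluster is just a common default allocation unit (your volume may differ), the file sizes are made up rather than taken from the comparison above, and `on_disk_size` and `relative_saving` are hypothetical helper names.

```python
CLUSTER_BYTES = 4096  # assumption: a common default allocation unit; real volumes vary


def on_disk_size(file_bytes, cluster_bytes=CLUSTER_BYTES):
    """Bytes allocated on disk: the file size rounded up to whole clusters."""
    clusters = (file_bytes + cluster_bytes - 1) // cluster_bytes  # integer ceiling
    return clusters * cluster_bytes


def relative_saving(baseline_bytes, candidate_bytes):
    """Saving of the candidate over the baseline, as a percentage of the baseline."""
    return 100.0 * (baseline_bytes - candidate_bytes) / baseline_bytes


# Hypothetical sizes for one album, NOT the figures from the comparison above.
ape_bytes = 300_000_000
tak_bytes = 299_910_000  # 90,000 bytes smaller

print("saving by file size: %.4f%%" % relative_saving(ape_bytes, tak_bytes))
print("saving on disk:      %.4f%%" % relative_saving(on_disk_size(ape_bytes),
                                                       on_disk_size(tak_bytes)))

# Two small files that differ by less than one cluster can occupy the same space.
print(on_disk_size(10_000) == on_disk_size(12_000))
```

Rounding can only hide a difference smaller than one cluster per file, so how much gets masked depends on how many files the savings are spread across and on the cluster size of the volume.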
In other words... P-L-A-C-E-B-O. Just like the reaction to Matt's license.