Let me offer an embarrassing personal anecdote to explain my point of view about listening tests and the fallibility of the ear:
Several years ago I built a pair of home-made bi-amped speakers. They're each the size of a large washing machine and they took me the better part of a year to build (more than a month of Sundays). Because they were entirely home-made and I was trying to do an active crossover from scratch, even after they were structurally complete, they still required quite a bit of tweaking to get the crossovers dialed in and the EQ set.
So I started by just dialing in the EQ that seemed to make sense based on the specifications of the drivers, and taking a couple of quick RTAs with pink noise. That sounded alright, and all of my friends dutifully told me how great it sounded. I kept getting headaches whenever listening to the speakers though, and the headaches would go away right after I turned them off. So I tried tweaking some frequencies, and I'd think I'd made some progress (it sounded better!), and everyone who heard it thought the new EQ sounded better. Eventually, I even started dutifully "blindly" A/Bing the new EQ with the original (I'd switch between them during playback without telling my guests what I was switching, which isn't blind at all), and my guests would invariably swear the new EQ sounded better. And I kept going down this "tuning by ear" method, often reversing previous decisions, backing and forthing and adding more and more convoluted filters.
The most embarrassing moment (and something of a turning point) was when I was A/Bing a filter, and a friend and I were convinced we were on to something excellent. After ten minutes of this, we realized that the filter bank as a whole (PEQ2) was disabled. I had been toggling the individual filter, but it wasn't actually even affecting playback. And we had been convinced we heard a difference. And the headaches never went away.
Eventually the headaches (and a growing skepticism) prompted me to stop screwing around and take some real log-sweep measurements (which were then a relatively new thing for me), and I realized that there was apparently a huge (10+ dB) semi-ultrasonic resonant peak at 18.5 kHz that I couldn't even actually hear. So I fixed it. And then my headaches went away.
And then I took an agonizing look at the rest of the measurement and noticed that my "tuning by ear," which my friends and I all felt was clearly superior, had turned the frequency response into a staggering sawtooth. So I systematically removed the EQ that was pushing things away from "flat," and kept the EQ that contributed to flatness. The result sounded so different, and so much more natural, that I was embarrassed to have wasted months screwing around trying to use my "golden ears" to tune my speakers. And my wife (who had been encouraging, but politely noncommittal about my EQ adventure) came home and asked unprompted if I had done something different with the speakers, and said they sounded much better. And she was right; they did. In a few afternoons, I had done more to move things forward than I had in months of paddling around.
The point of this anecdote is not to try and "prove" that my measurement-derived EQ "sounded better" than my ear-derived EQ or that a flat frequency response will sound best [as it happens, I ultimately preferred a frequency slope that isn't perfectly flat, but I couldn't even get that far by ear].
The point is that taking actual measurements had allowed me to:
1) Cure my ultrasonic frequency induced headaches;
2) Improve the fidelity of my system (in the literal sense of audio fidelity as "faithfulness to the source"); and
3) Ultimately find the EQ curve that I liked best.
My ears (and the inadvertently biased ears of my friends) did not allow me to do any of those things, and in fact led me far astray on 2). My ears couldn't even really get me to 3) because I kept reversing myself and getting tangled up in incremental changes. My ears were not even reliably capable of detecting no change if I thought there was a change to be heard.
Once I realized all this, it was still surprisingly hard to admit that I had been fooling myself and that I was so easily fooled! So I have sympathy for other people who don't want to believe that their own ears are equally unreliable, and I understand why folks get mad at any suggestion that their perception may be fallible. I've been accused by many indignant audiophiles of having a tin ear, and told that if I could only hear what they hear, I'd be immediately persuaded. But my problem is not that I am unpersuaded that there's a difference: it's that I'm too easily persuaded! I'll concede, of course, that it's possible that I do have tin ears and other people's ears are more reliable than mine, but the literature concerning the placebo effect, expectation bias, and confirmation bias in scientific studies suggests that I'm not so very alone.
I've seen the exact same phenomenon played out with other people (often very bright people with very good ears) enough times that I find it embarrassing to watch sighted listening tests of any kind, because they are so rarely conducted in a way designed to produce any meaningful information, and they so often lead into dark serpentines of false information and conclusions.
---------------------------------------------------------------
So to bring things back around: if some bit-perfect audio players have devised a way to improve their sound, they have presumably done so through careful testing, in which case they should be able to provide measurements (whether distortion measurements on the output, digital loopback measurements, measurements of the data stream going to the DAC, or something) that validate that claim. If they claim that their output "sounds better" but does not actually measure better using current standards of measurement, they should be able to at least articulate a hypothetical test that would show their superiority. If they claim that the advantage isn't measurable, or that you should "just trust your ears," then they are either fooling themselves or you.
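To make "digital loopback measurement" a bit more concrete, here is a minimal sketch of a sample-by-sample null test. It assumes you have already captured the output of two players as equal-length, time-aligned lists of integer PCM samples; the capture and alignment steps are not shown, and the function name null_test is just something I made up for illustration.

```python
def null_test(capture_a, capture_b):
    """Compare two time-aligned captures of integer PCM samples.

    Returns a tuple (identical, max_abs_diff, first_mismatch_index).
    first_mismatch_index is None when the captures match exactly.
    Assumes both captures are already aligned and the same length.
    """
    if len(capture_a) != len(capture_b):
        raise ValueError("captures must be the same length")

    max_abs_diff = 0
    first_mismatch_index = None
    for index, (a, b) in enumerate(zip(capture_a, capture_b)):
        diff = abs(a - b)
        if diff and first_mismatch_index is None:
            first_mismatch_index = index
        max_abs_diff = max(max_abs_diff, diff)

    return max_abs_diff == 0, max_abs_diff, first_mismatch_index
```

If the two captures null completely, both players delivered the same bits over that capture path, and any claimed improvement would have to come from somewhere other than the data itself.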
In a well-established field of engineering in which a great deal of research and development has been done, and in which there is a mature, thriving commercial market, one generally does not stumble blindly into mysterious gains in performance. Once upon a time you could discover penicillin by accident, or build an automobile engine at home. But you do not get to the moon, cure cancer, or improve a modern car's fuel efficiency by inexplicable accident. In an era where cheap-o motherboard DACs have better SNRs than the best studio equipment from 30 years ago, you don't improve audio performance by inexplicable accident either. If someone has engineered a "better than bit perfect" player they should be able to prove it, as they likely did their own testing as part of the design process. If they can't rigorously explain why (or haven't measured their own product), let them at least explain what they have done in a way that is susceptible of proof and repetition. Otherwise what they are selling is not penicillin, it's patent medicine.
Bottom line: if you and a group of other people hear a difference, there may be a difference, but there may not be. Measurements are the way to find out if there is really a difference. Once you've actually established that there is a real, measurable difference, only then does it make sense to do a properly conducted listening test to determine if that difference is audible. Otherwise you're just eating random mold to find out if it will help your cough.
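For what "properly conducted" might mean in numbers, here is a small sketch that scores a blind ABX-style test. It assumes each trial is an independent two-way forced choice, so a listener who hears nothing is right half the time; the function name abx_p_value is hypothetical, and this is only the scoring step, not a full test protocol.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of getting at least `correct` answers right out of
    `trials` by pure guessing (one-sided binomial test with p = 0.5)."""
    if trials < 0 or not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    # Count the ways to get `correct` or more right, out of 2**trials outcomes.
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials


# 12 right out of 16 would happen by chance a bit under 4% of the time.
print(round(abx_p_value(12, 16), 4))
```

A small value suggests the listener really is hearing something; a value near 0.5 or above means the results look like coin flips, however confident everyone felt in the room.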
If you want to get to the bottom of it, I'd ask your host if he'd be willing to let you take measurements of the different players with a calibrated microphone, an SPL meter, and some test clips of your own choosing. He'll probably be game, and you'll likely be able to get some really useful information about whether the players sound different and, if so, why.
Or you could just relax and enjoy the music.