There are two camps with regard to audio quality:
A) The original is perfect, and all processing is bad processing
B) Good sound is the goal, and sometimes processing can help
[edit -- after thinking more, there's also a C) Processing can be good, but I want to do it in my external hardware because, well, it was expensive]
I'm in B. I don't believe the original engineer was magic, and my speakers and ears certainly aren't the same.
For this camp, 64-bit processing is wonderful.
If you're in A, 64-bit processing is irrelevant (unless you want to use the player's volume control).
There are still lots of other important features to consider, like format support, gapless playback, user interface, audio output modes, stability, etc. I'll ignore these for now.
For people in camp A who still think audio quality can be improved, I see two main claims:
1) Timing matters
The best interfaces (ASIO, WASAPI Event Style) work like this: they periodically call the player and ask for data. The _only_ timing that can matter is how fast you can fulfill this request. I believe Media Center has the most efficient buffer fill that can be implemented on a computer (a lock-free circular buffer with no additional processing). In other words, there's no room for improvement here.
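To make the pull model concrete, here is a minimal sketch of a single-producer/single-consumer circular buffer. It is an illustration only, not Media Center's code: the names `SpscRingBuffer` and `device_callback` are hypothetical, and it assumes one decoder thread calls `write()` while the device's periodic request calls `read()`. Because each index is written by only one side, neither side takes a lock. This Python version only shows the index discipline; a real-time implementation would be in C or C++ with atomic indices.

```python
class SpscRingBuffer:
    """Hypothetical single-producer/single-consumer ring buffer of samples."""

    def __init__(self, capacity: int) -> None:
        # One slot is kept empty so "full" and "empty" can be told apart.
        self._size = capacity + 1
        self._buf = [0.0] * self._size
        self._read = 0   # only ever written by the consumer (device callback)
        self._write = 0  # only ever written by the producer (decoder)

    def available(self) -> int:
        """Number of samples ready to be read."""
        return (self._write - self._read) % self._size

    def free_space(self) -> int:
        """Number of samples that can be written without overwriting."""
        return self._size - 1 - self.available()

    def write(self, samples) -> int:
        """Producer side: copy as many samples as fit, return the count."""
        n = min(len(samples), self.free_space())
        w = self._write
        for i in range(n):
            self._buf[(w + i) % self._size] = samples[i]
        # Advance the write index only after the data is stored.
        self._write = (w + n) % self._size
        return n

    def read(self, count: int) -> list:
        """Consumer side: return `count` samples, padding with silence."""
        n = min(count, self.available())
        r = self._read
        out = [self._buf[(r + i) % self._size] for i in range(n)]
        self._read = (r + n) % self._size
        # On underrun, output silence instead of waiting for the producer.
        out.extend([0.0] * (count - n))
        return out


def device_callback(ring: SpscRingBuffer, frames: int) -> list:
    # Hypothetical stand-in for an ASIO / WASAPI event-style request:
    # the device asks for `frames` samples and the player just copies them out.
    return ring.read(frames)
```

For example, after `ring = SpscRingBuffer(8)` and `ring.write([0.1, 0.2, 0.3])`, two calls to `device_callback(ring, 2)` return `[0.1, 0.2]` and then `[0.3, 0.0]`. Padding with silence on underrun keeps the callback's run time bounded by the number of frames requested, which is the only timing property that matters in this model.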
2) Other processing (background threads in the player, OS background processes, etc.) hurts audio quality
This is a more slippery claim. I've heard machines where moving the mouse cursor was audible on the sound card, or where CPU or GPU usage caused playback hiccups (especially with USB interfaces). Hard drives can also cause audible noise on poorly shielded outputs.
However, with good hardware this simply doesn't happen. But since it does with bad hardware, it's hard to dismiss completely.
So how much of the CPU usage is from audio playback? On my machine, even doing 7.1 JRSS, Room Correction, Bass Management, and Parametric Equalization, audio playback never takes more than 1% of the CPU. With no processing for 2.0 output, it's at least twice as fast. Even if this were twice as fast again (JRiver has a well-optimized audio chain, so this isn't realistic), why would 0.25% be better than 0.5% CPU?
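Writing the arithmetic out shows how small the difference is. The 1% figure is the measurement above; converting percentages to milliseconds assumes they are a share of one core's time, which is an assumption for illustration only:

```latex
\begin{aligned}
\text{7.1 with JRSS, room correction, bass mgmt, PEQ:} &\quad \le 1\% \\
\text{2.0, no processing (at least } 2\times \text{ faster):} &\quad \le 0.5\% \\
\text{hypothetical further } 2\times \text{ speedup:} &\quad \le 0.25\% \\
\text{saving:} &\quad 0.5\% - 0.25\% = 0.25\% \approx 2.5\ \text{ms of CPU time per second of playback}
\end{aligned}
```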
It's possible we could push further down this road (a mode that blacks out the screen, hides the cursor, disables system services, etc.), but it's also possible it's a rabbit hole that's best avoided.