The description of the Replay Gain calculation given at
http://www.replaygain.org/ contains a number of errors and inconsistencies, as well as statements made without any scientific proof whatsoever. Let’s analyze some of them.
1. Calculation of dB values.
The formula given at
http://replaygain.hydrogenaudio.org/rms_energy.html does not convert an RMS value to dB. The given formula is:
10*log10(Vrms_all+10^-10) (1)
This is not how a dB value is defined, and therefore not how it should be calculated. The dB definition gives:
10*log10(Vrms_all/Vrms_ref) (2)
where Vrms_ref is the RMS of the reference signal used in the dB calculation; a dB value without a reference level is meaningless. Instead of using the correct formula, the author adds a small value (10^-10) to the RMS value “in order to prevent calculation of log(0) which would give an error”.
Formula (2) is used when computing a dB value against a known reference level, for example when sound pressure level is measured with a microphone and the sound pressure at the threshold of hearing is used as the reference. This normally gives positive dB values.
When sound is digitized, the reference level is instead the maximum RMS value that can be represented with the quantization used, which normally gives negative dB values for all other signal levels. The maximum is used because the “distance” from the lowest possible value varies with the number of quantization bits, so numbers measured from the bottom would not be comparable. In other words, the maximum possible value is 0 dB and all others are negative. In this case Vrms_ref is given by:
Amax/sqrt(2) (3)
where Amax is the maximum amplitude possible for a particular quantization, i.e. Vrms_ref is the RMS of a full-scale sine wave.
If MC uses formula (1), the calculated RG may be wrong.
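To make the point about the reference level concrete, here is a minimal Python sketch of a level measured down from full scale. It is my own illustration, not code from the Replay Gain site or from MC. It assumes 16-bit signed PCM (Amax = 32767), takes the reference to be a full-scale sine wave, and assumes the quantity inside the logarithm is a mean-square (power-like) value, which is the case where the factor 10 applies; for an RMS amplitude ratio the factor would be 20. The names level_dbfs and sine are made up for this example.

```python
import math

A_MAX = 32767.0  # assumption: full-scale amplitude of 16-bit signed PCM


def mean_square(samples):
    """Mean of the squared sample values (the square of the RMS)."""
    return sum(s * s for s in samples) / len(samples)


def level_dbfs(samples, a_max=A_MAX):
    """Level in dB relative to a full-scale sine wave (0 dB = maximum)."""
    ms = mean_square(samples)
    ms_ref = a_max * a_max / 2.0  # mean square of a sine with amplitude a_max
    if ms == 0.0:
        # Digital silence: report minus infinity instead of adding 10^-10.
        return float("-inf")
    return 10.0 * math.log10(ms / ms_ref)


def sine(amplitude, n=1000):
    """One full period of a sine wave with the given amplitude (test signal)."""
    return [amplitude * math.sin(2.0 * math.pi * k / n) for k in range(n)]


if __name__ == "__main__":
    print(level_dbfs(sine(A_MAX)))        # full scale, close to 0 dB
    print(level_dbfs(sine(A_MAX / 2.0)))  # half amplitude, about -6 dB
```

With a reference in place, silence is handled explicitly and every other level comes out as a negative number that is comparable across bit depths.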
2. Representative RMS value.
The next problem is the method specified for picking a representative RMS value, given at
http://replaygain.hydrogenaudio.org/statistical_process.html. The only information given is:
“How far down the sorted list should we look for a representative value? I tried values from 70% to 95%. For highly compressed pop music (e.g. the middle graph above, where there are many values near the top), the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level.”
The author tried a number of tracks, judged their perceived loudness subjectively by himself, and this is supposed to be representative for billions of people and tracks? This is not how science or engineering is done.
No other statistical measures, such as median or percentile values, an integral, or the power spectrum, are even mentioned.
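How sensitive the result is to the chosen position in the sorted list can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the block levels below are invented numbers, not measurements, the nearest-rank lookup is my choice rather than anything the site specifies, and the names block_rms and value_at are hypothetical.

```python
import math


def block_rms(samples, block_size):
    """RMS of each consecutive full block of `block_size` samples."""
    levels = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        levels.append(math.sqrt(sum(s * s for s in block) / block_size))
    return levels


def value_at(sorted_levels, fraction):
    """Value found `fraction` of the way up an ascending list (nearest rank)."""
    index = min(len(sorted_levels) - 1, int(fraction * len(sorted_levels)))
    return sorted_levels[index]


# Invented block levels: mostly quiet with a few loud blocks ("speech-like")
# versus nearly constant loud blocks ("compressed pop").
speech_like = sorted([0.01] * 80 + [0.5] * 20)
compressed = sorted([0.45] * 50 + [0.5] * 50)

if __name__ == "__main__":
    for name, levels in (("speech-like", speech_like), ("compressed", compressed)):
        picks = {f: value_at(levels, f) for f in (0.50, 0.70, 0.95)}
        print(name, picks)
```

For the invented speech-like list the 70% and 95% positions land on very different values, while for the compressed list they coincide, which is why the choice needs a better justification than one person's listening impression.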
3. Reference (target) value.
The author writes:
“Having calculated a representative RMS energy value for the audio file, we now need to reference this to a real world sound pressure level.”
Do we? Why?
Then the author chooses 83 dB as the target value because it is numerically equal to the value specified by the movie industry as an adequate listening level in a cinema.
First of all, as described above, in digitally stored audio you don’t measure dB up from the threshold of hearing but down from the maximum level possible at the quantization width used. So the question is how far below this maximum that 83 dB lies. The answer is that we do not know. The reason is that while 83 dB in a cinema represents a real and therefore absolute value, the values stored in digital audio are only relative numbers that relate one sample to another. The relation between a binary number fed into the D/A converter (in your soundcard) and the sound pressure produced by your loudspeakers (or headphones), connected either to the soundcard or to an amplifier connected to the card, is not known. It depends not only on the actual (analog) gain of the amplifier but also on the power of the amplifier; on the size, efficiency, and load of the loudspeakers (headphones); and on the environment in which these things, including you, are located when listening.
Therefore the 83 dB cinema recommendation is simply not relevant here, and the right approach is to target 100%, or 0 dB. If you do that, you will probably regain some of the -10 dB that we (Dr. C., me and others) are complaining about.
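Once the level is expressed relative to full scale, the gain needed to reach a target is just a difference in dB. A minimal Python sketch, assuming the measured level is already given in dB relative to full scale; the function names are mine and the default target of 0 dB is the one argued for above:

```python
def gain_to_target_db(measured_db, target_db=0.0):
    """Gain in dB that moves a measured level to the target level."""
    return target_db - measured_db


def db_to_amplitude_factor(gain_db):
    """Linear factor to multiply sample amplitudes by for a given dB gain."""
    return 10.0 ** (gain_db / 20.0)


if __name__ == "__main__":
    gain = gain_to_target_db(-12.0)  # a track measured 12 dB below full scale
    print(gain, db_to_amplitude_factor(gain))
```

No absolute sound pressure level enters this calculation anywhere; only the relative level of the stored samples does.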
4. The site contains a number of other statements given without any proof whatsoever that, in my opinion, reflect the author’s personal preferences rather than scientific and empirically verified truths. On other occasions the author describes statistical knowledge that has been known for centuries as his own discovery, for example that you don’t add RMS values directly but add the squares and then compute the resulting RMS.
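For anyone who has not seen it, the textbook rule for combining RMS values looks like this in Python. This is a small sketch with made-up numbers; the names rms and combine_rms are mine, and each part is weighted by its number of samples:

```python
import math


def rms(samples):
    """Root-mean-square of a sequence of sample values."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def combine_rms(parts):
    """Overall RMS from (rms_value, sample_count) pairs.

    The squares are summed, weighted by sample count, and the root is
    taken at the end; the RMS values themselves are never averaged.
    """
    total = sum(count for _, count in parts)
    return math.sqrt(sum(count * value * value for value, count in parts) / total)


if __name__ == "__main__":
    a = [3.0, 4.0]
    b = [0.0, 0.0, 5.0]
    print(rms(a + b))                                      # RMS of the joined signal
    print(combine_rms([(rms(a), len(a)), (rms(b), len(b))]))  # same value from the parts
    print((rms(a) + rms(b)) / 2.0)                         # naive average, a different number
```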
All this together doesn’t exactly increase my confidence in what the RG author proposes the rest of the world should do.
/Mikael