I don't think there's any standard for analyzing multichannel files. This is what the Replay Gain standard says about stereo files:
http://replaygain.hydrogenaudio.org/rms_energy.html
Stereo files
The only difficulty lies in what to do with stereo files. We could sum them to mono before calculating the RMS energy, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how we perceive them, so it's not a good solution.
The alternative is to calculate two RMS values (once for each channel) and then add them. Unfortunately a linear addition still doesn't give the same effect as our ears. To demonstrate this, consider a mono (single channel) audio track. We replay it over 1 loudspeaker, and remember how loud it sounds. If we now replay it over 2 loudspeakers, how large should the signal to each speaker be such that, overall, the sound is still as loud as before? You'd think the answer would be half as large (since we have two speakers - that's what a linear addition would suggest) but if you try it, you'll find that the answer is about 3/4.
We get the right answer if we add the means of the channel-signals before calculating the square root. In mixing pan-pot terms, we're using "equal power" rather than "equal voltage". If we also assume that any mono (single channel) signal will always be replayed over two loudspeakers, we can treat a mono signal as a pair of identical stereo signals. Hence a mono signal gives (a+a)/2 (i.e. a), while a stereo signal gives (a+b)/2, where a and b are the mean squared values for each channel. After this, we carry out the square root and conversion to dB.
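To make the quoted rule concrete, here is a minimal Python sketch of just the channel-combining step. The function names (mean_square, combined_level_db) and the small floor value are my own illustrative choices, not part of the Replay Gain text, and the sketch ignores the rest of the Replay Gain procedure (filtering, block-wise analysis and so on).

```python
import math

def mean_square(samples):
    """Mean of the squared sample values of one channel."""
    return sum(s * s for s in samples) / len(samples)

def combined_level_db(channels, floor=1e-10):
    """Average the per-channel mean squares, then take the square root and
    convert to dB. `floor` is an assumed guard against log10(0) on silence."""
    if len(channels) == 1:
        # A mono signal is treated as a pair of identical stereo channels.
        channels = [channels[0], channels[0]]
    ms = sum(mean_square(ch) for ch in channels) / len(channels)
    # 20*log10(sqrt(ms)) is the same as 10*log10(ms).
    return 10.0 * math.log10(ms + floor)

# Out-of-phase stereo: summing to mono would cancel to silence,
# but the equal-power combination gives the same level as the mono signal.
left = [0.5, -0.5] * 100
right = [-s for s in left]
print(combined_level_db([left]))         # mono, treated as (a+a)/2
print(combined_level_db([left, right]))  # stereo, (a+b)/2
```

Both calls print the same level, which is the point of adding the mean squares instead of summing the channels first.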
My suggestion would be:
Analyze only the three front channels (L, R, C) and calculate:
(a+b+c)/3, where a, b and c are the mean squared values for each channel. Perhaps if the center channel is silent or significantly quieter than the left and right channels it should be taken out of the equation. "Significantly" could be e.g. a difference of 12 dB or more.
If the file is "quadraphonic" and contains four main channels (4.0 or 4.1), analyze only the two front channels. Though I don't know if the file headers can actually indicate the correct channel mapping in this case.
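A rough Python sketch of this suggestion follows. Everything in it is an assumption for illustration: the function names, comparing the center against the average of the left and right mean squares, and how silence is handled. It also assumes the caller already knows which channels are L, R and C.

```python
import math

def mean_square(samples):
    """Mean of the squared sample values of one channel."""
    return sum(s * s for s in samples) / len(samples)

def center_is_negligible(c, lr, threshold_db):
    """True if the center mean square `c` is silent or at least
    `threshold_db` below the left/right average `lr` (assumed reference)."""
    if c <= 0.0:
        return True
    if lr <= 0.0:
        return False
    return 10.0 * math.log10(lr / c) >= threshold_db

def front_mean_square(left, right, center=None, threshold_db=12.0):
    """(a+b+c)/3 over the front channels, falling back to (a+b)/2 when
    there is no center channel or it is negligible."""
    a = mean_square(left)
    b = mean_square(right)
    values = [a, b]
    if center is not None:
        c = mean_square(center)
        if not center_is_negligible(c, (a + b) / 2.0, threshold_db):
            values.append(c)
    return sum(values) / len(values)

def to_db(ms):
    """Square root and dB conversion in one step; silence maps to -inf."""
    return 10.0 * math.log10(ms) if ms > 0.0 else float("-inf")

front = [0.5, -0.5] * 100
quiet_center = [0.05, -0.05] * 100   # about 20 dB below L/R, so it is dropped
print(to_db(front_mean_square(front, front, quiet_center)))
```

For a 4.0 or 4.1 file the same function would be called with only the two front channels and center left as None.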
EDIT
I am suggesting leaving the other channels out because the signal that comes from the side and back channels is normally very uneven and doesn't usually build the main sound field.
A small minority of audio mixes place the listener more or less in the center, but only a very odd mix would have the main audio source behind the listener. For instance, an exactly "centered" 5.1 mix would subjectively be a bit louder than what the Replay Gain value indicates, because in order to place the source in the center the two surround channels need to be slightly louder than the three front channels. That would probably not be a practical problem, though.