How would this behave with a scene setup like this:
Quiet dialog, reaching maybe 50% of the volume range, then a scene with massive action and explosions, all ranging up to nearly 100% of volume, then back to the quiet dialog.
The massive action will turn down the volume for the rest of the playback of that file. The majority of movies show their cards with regard to the volume levels they use reasonably early, but there will be exceptions. In these cases, the one-time volume adjustment is the trade-off. Many videos, and especially HDTV, will benefit from a volume boost and never be loud enough to turn the boost down.
Most volume normalizers I have tested would most likely increase the volume of the first dialog, then turn off the correction for the action scene, and keep the second dialog quiet (or, in an even worse case, slowly increase its volume again), causing a massive difference from the first dialog.
If the volume gets reduced (or more accurately the gain reduced), it will stay reduced for the rest of the video.
I do not want this to be a dynamic range compression feature, so growing the gain again is not desirable. The feature is instead designed to claim the huge amount of unused headroom in the signal of many videos.
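To make the one-way behavior concrete, here is a minimal sketch in Python. It is not the actual implementation: the class name `OneWayGain`, the starting boost of 2.0, the block-based processing, and samples as floats in [-1.0, 1.0] are all assumptions for illustration.

```python
class OneWayGain:
    """Starts with a boost and only ever lowers it (hypothetical sketch)."""

    def __init__(self, initial_gain=2.0, ceiling=1.0):
        self.gain = initial_gain  # assumed starting boost
        self.ceiling = ceiling    # full scale; samples assumed in [-1.0, 1.0]

    def process(self, block):
        # Find the loudest sample in this block.
        peak = max((abs(s) for s in block), default=0.0)
        # If the boosted peak would exceed full scale, lower the gain.
        # The gain is never raised again afterwards.
        if peak * self.gain > self.ceiling:
            self.gain = self.ceiling / peak
        return [s * self.gain for s in block]
```

With the scene from the question, dialog peaking at 0.5 is boosted to 1.0, the action scene peaking at 1.0 drops the gain to 1.0, and the later dialog then plays at its original 0.5.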
I don't think this can be properly dealt with in a 1-pass approach; I'm just wondering how you would deal with it.
2-pass removes the compromise of a possible volume decrease, but it won't work with live sources.
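For comparison, a 2-pass version could be sketched roughly like this. Again, this is only an illustration: the function name `two_pass_gain` and the `max_gain` cap are hypothetical, and it assumes the whole file is available as floats in [-1.0, 1.0] before playback starts, which is exactly what a live source cannot provide.

```python
def two_pass_gain(samples, ceiling=1.0, max_gain=4.0):
    """First pass: scan the whole file and return one fixed gain.

    max_gain is an assumed cap so near-silent files are not boosted
    without limit.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return max_gain
    return min(max_gain, ceiling / peak)


def apply_gain(samples, gain):
    """Second pass: apply the same gain to every sample."""
    return [s * gain for s in samples]
```

Because the gain is known before the first sample plays, it never has to be turned down mid-file.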
I considered storing the peak level encountered in a database so that it could be used on the next playback. However, that gets complicated by this design goal:
Adjustments should happen as late as possible in the DSP chain so that any headroom gained from using Room Correction, etc. can be utilized.
In other words, changing other DSP settings, like the level of a particular speaker, will change the signal the normalizer sees, since it sits late in the chain.