Topic: Audio analysis for video files (Read 5562 times)

Matt · « **on:** July 09, 2013, 04:41:12 pm »

As discussed here, we're opening up the Audio Analyzer for video files in MC19.

Currently it analyzes these values:
Volume Level (R128)
Dynamic Range (R128)
Peak Level
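For reference, peak level is conventionally the largest absolute sample value, expressed in dB relative to full scale (dBFS). The minimal Python sketch below assumes decoded samples are floats in [-1.0, 1.0]. It measures sample peak, not oversampled true peak, and it only illustrates the idea; it is not MC's analyzer code.

```python
import math


def peak_dbfs(samples):
    """Sample peak of one channel in dBFS (samples assumed in [-1.0, 1.0])."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return float("-inf")  # digital silence has no finite dB value
    return 20.0 * math.log10(peak)


def overall_peak_dbfs(channels):
    """The overall peak is the peak of the loudest channel."""
    return max(peak_dbfs(ch) for ch in channels)


print(round(peak_dbfs([0.25, -0.5, 0.1]), 2))  # -6.02 (half of full scale)
```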

My question is whether there are any other useful metrics people would like for audio analysis.

Thanks for any suggestions.

MrC · « **Reply #1 on:** July 09, 2013, 04:55:04 pm »

I suppose I've been waiting for this type of question so I could follow up with one regarding BPM. I'm wondering if something might not be right. Fleetwood Mac's Over & Over (from Tusk) is analyzed as 166 BPM. It is a very slow song (with Intensity 1), so I'm wondering what beats are being counted.

Lowest priority.

Matt · « **Reply #2 on:** July 09, 2013, 05:05:59 pm »

BPM algorithms are tricky.

It's not easy, sometimes even for a human, to distinguish between multiples like 50, 100, and 150 BPM.

I think we could improve our algorithm, and it's the type of thing I love getting lost in, but I also don't think it's very important compared to some of the other opportunities we have.
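To make the multiples problem concrete: a detector that locks onto the right pulse at the wrong metrical level reports a simple multiple or fraction of the tempo a listener would tap. One common mitigation, sketched here purely as a hypothetical heuristic (it is not how MC's BPM analysis works), is to fold the estimate into a preferred range by halving or doubling. The function names and the 70 to 140 BPM default range are assumptions.

```python
def tempo_candidates(bpm, ratios=(0.5, 2.0 / 3.0, 1.0, 1.5, 2.0)):
    """Tempos related to an estimate by simple ratios (all plausible readings)."""
    return sorted(round(bpm * r, 1) for r in ratios)


def fold_tempo(bpm, low=70.0, high=140.0):
    """Halve or double an estimate until it lies in [low, high).

    Hypothetical heuristic; requires high >= 2 * low so a result always exists.
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    if high < 2.0 * low:
        raise ValueError("range must span at least one octave")
    while bpm >= high:
        bpm /= 2.0
    while bpm < low:
        bpm *= 2.0
    return bpm


print(tempo_candidates(100))  # [50.0, 66.7, 100.0, 150.0, 200.0]
print(fold_tempo(166))        # 83.0
```

With the default range, MrC's 166 BPM reading folds to 83 BPM. Whether that is actually right for Over & Over is exactly what a fixed range cannot know, which is why this remains a heuristic.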

mojave · « **Reply #3 on:** July 09, 2013, 05:07:17 pm »

Average Level
Crest Factor (probably not really necessary since you have Dynamic Range)
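Crest factor is the ratio of peak level to RMS level, usually given in dB. Below is a short Python sketch for one channel, assuming float samples. A full-scale sine wave comes out at about 3.01 dB and a square wave at 0 dB.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB."""
    r = rms(samples)
    if r == 0.0:
        raise ValueError("crest factor is undefined for silence")
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak / r)


sine = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]
print(round(crest_factor_db(sine), 2))  # 3.01
```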

Will it provide data for each channel?

Matt · « **Reply #4 on:** July 09, 2013, 05:29:13 pm »

Quote

Average Level

Isn't that similar to the loudness (i.e., ReplayGain / R128 Loudness)?

Quote from: mojave on July 09, 2013, 05:07:17 pm

Will it provide data for each channel?

The R128 numbers are one overall number.

What do you have in mind?

mojave · « **Reply #5 on:** July 09, 2013, 06:00:46 pm »

Quote from: Matt on July 09, 2013, 05:29:13 pm

Isn't that similar to the loudness (i.e., ReplayGain / R128 Loudness)?

Yes, it is. Never mind.

Quote

The R128 numbers are one overall number.

What do you have in mind?

I was thinking that when using active crossovers or other system setups, it could be useful to know the peak level of each channel, so you can easily find the worst-case scenario with at least your current media. I like how the analyzer displays RMS levels of each channel, but I have often wanted it to also show the maximum RMS level during playback. Doing that during audio analysis makes more sense, since real-time analysis takes too much time.
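The "maximum RMS level" described above can be computed offline as the largest RMS over any fixed-length window. This is a minimal sketch, assuming float samples per channel and a caller-chosen window length in samples. A window of, say, 400 ms worth of samples would be a hypothetical choice, not something MC specifies. A running sum means each channel is scanned only once.

```python
import math


def max_windowed_rms_db(samples, window):
    """Largest RMS over any `window` consecutive samples, in dBFS."""
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    squares = [s * s for s in samples]
    acc = sum(squares[:window])
    best = acc
    for i in range(window, len(squares)):
        # Slide the window by one sample: add the newest square, drop the oldest.
        acc += squares[i] - squares[i - window]
        best = max(best, acc)
    mean_square = max(best, 0.0) / window  # clamp tiny negative float drift
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)  # 10*log10(ms) == 20*log10(rms)


def max_rms_per_channel(channels, window):
    """channels: dict of channel name -> list of samples (hypothetical layout)."""
    return {name: max_windowed_rms_db(s, window) for name, s in channels.items()}
```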

I'm also wondering if a per-channel peak level could help improve convolution normalization. I've seen reports on both the Audiolense and Acourate forums that JRiver's "normalize filter volume" causes clipping and that users are advised to leave it unchecked.

Matt · « **Reply #6 on:** July 09, 2013, 06:27:30 pm »

Quote from: mojave on July 09, 2013, 06:00:46 pm

I've seen reports on both the Audiolense and Acourate forums that JRiver's "normalize filter volume" causes clipping and that users are advised to leave it unchecked.

I know about this. Currently the normalize system pushes pink noise through the convolution engine and targets -6 dB of change.

However, with some filters this pushes them too hard and leads to clipping.

A more complicated normalization method would probably work better, but I'm not sure what it would be.

It's possible just changing the default to off and removing the word "recommended" would be enough for now.
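One possible direction for a "more complicated normalization method" follows. It is offered only as a hypothetical sketch, not as JRiver's implementation. Instead of measuring the average change with pink noise, it finds the filter's largest gain at any frequency and scales the taps so that gain sits at 0 dB minus optional headroom. A filter with a narrow boost can have a peak gain well above its average gain, which is one plausible way a pink-noise-based target ends up clipping. The function names are assumptions. The sketch evaluates the FIR frequency response directly on a frequency grid, which is slow but needs only the standard library.

```python
import cmath
import math


def max_gain_db(taps, n_bins=512):
    """Largest magnitude of the FIR frequency response, in dB.

    Evaluated at n_bins + 1 evenly spaced frequencies from 0 to Nyquist
    by direct summation, so it costs O(n_bins * len(taps)).
    """
    best = 0.0
    for k in range(n_bins + 1):
        w = math.pi * k / n_bins  # radians per sample
        h = sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps))
        best = max(best, abs(h))
    if best == 0.0:
        raise ValueError("filter has zero response")
    return 20.0 * math.log10(best)


def normalize_taps(taps, headroom_db=0.0):
    """Scale taps so the loudest frequency lands at -headroom_db (hypothetical)."""
    scale = 10.0 ** ((-headroom_db - max_gain_db(taps)) / 20.0)
    return [t * scale for t in taps]
```

A 0 dB peak in the frequency response still does not strictly guarantee that no output sample exceeds full scale. The strict worst-case bound is the sum of the absolute tap values, which is usually far more conservative.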

6233638 · « **Reply #7 on:** July 09, 2013, 10:37:08 pm »

I think it's very important to also calculate a "downmix loudness" value for when videos are being downmixed to stereo, rather than just at the "native" number of channels.
If anything, downmixing needs normalization more than outputting multichannel does.

Matt · « **Reply #8 on:** July 10, 2013, 12:12:40 pm »

We'll switch the 'Peak Level' field to be a string, use decibels, and show per-channel levels.

Here's an example from a 5.1 music video I just tested:
-0.1 dB (-0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR)

Packing it into a single string might make it a little harder to manipulate the data, but the alternative was to create a pile of peak level fields for each channel which seemed a little messy.

Matt · « **Reply #9 on:** July 10, 2013, 12:29:49 pm »

Quote from: 6233638 on July 09, 2013, 10:37:08 pm

I think it's very important to also calculate a "downmix loudness" value for when videos are being downmixed to stereo, rather than just at the "native" number of channels.
If anything, downmixing needs normalization more than outputting multichannel does.

I think this is just a math problem. Proper down-mixing doesn't change the energy balance, so I don't believe you need two analysis numbers.

The downmixer knows if it turns the volume down to prevent clipping, so it could provide that information to the Volume Leveling code for it to take into account.
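The suggestion above amounts to simple bookkeeping in decibels. In this hypothetical sketch, the function names and the way the downmixer reports its attenuation are assumptions. The downmixer reports how many dB it pulled the signal down to avoid clipping, and the leveler adds that back on top of the gain it derived from the analyzed loudness.

```python
import math


def clip_protection_db(downmixed_peak):
    """dB of attenuation needed so a downmixed linear peak stays <= 1.0."""
    if downmixed_peak <= 1.0:
        return 0.0
    return 20.0 * math.log10(downmixed_peak)


def leveling_gain_db(measured_loudness, target_loudness, downmix_attenuation_db=0.0):
    """Volume-leveling gain, compensating for attenuation the downmixer applied."""
    return (target_loudness - measured_loudness) + downmix_attenuation_db
```

For example, a source analyzed at -16 LUFS with a -23 LUFS target gets -7 dB. If the downmixer has already removed 3 dB, the leveler only needs to apply -4 dB. A real implementation would still need its own clip protection whenever the compensated gain is positive.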

Matt · « **Reply #10 on:** July 10, 2013, 12:44:55 pm »

Quote from: Matt on July 10, 2013, 12:12:40 pm

We'll switch the 'Peak Level' field to be a string, use decibels, and show per-channel levels.

Here's an example from a 5.1 music video I just tested:
-0.1 dB (-0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR)

Packing it into a single string might make it a little harder to manipulate the data, but the alternative was to create a pile of peak level fields for each channel which seemed a little messy.

Should mono videos show:
-5.7 dB (-5.7 Mono)

Or just:
-5.7 dB

MrC · « **Reply #11 on:** July 10, 2013, 01:08:17 pm »

Why bother with the parens at all. How about just making it a list field of the form:

value description[; ...]

Examples:

-0.1 dB; -0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR
-0.1 dB Total; -0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR
-5.7 dB Mono

This makes it very easy to parse, show the first value, or select any value (by position or channel pattern). Format display could be treated like Artist (first only for ,1, all for ,0).

Matt · « **Reply #12 on:** July 10, 2013, 01:23:03 pm »

Quote from: MrC on July 10, 2013, 01:08:17 pm

Why bother with the parens at all. How about just making it a list field of the form:

value description[; ...]

Examples:

-0.1 dB; -0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR
-0.1 dB Total; -0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR
-5.7 dB Mono

This makes it very easy to parse, show the first value, or select any value (by position or channel pattern). Format display could be treated like Artist (first only for ,1, all for ,0).

Good advice. This is why I ask.

We'll try this format:
-0.1 dB; -0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR
-5.7 dB; -5.7 Mono

I prefer denoting mono or else we'll get questions "why don't I have per-channel for this video." Using "-5.7 dB Mono" instead would work, but then mono works differently than all other channel counts which I don't like.
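The agreed format is easy to produce from per-channel peaks. This minimal Python sketch assumes the peaks are already in dB and that the caller supplies the channel labels. The 5.1 label list simply mirrors the example above, and the function name is hypothetical. The summary item is the maximum of the channel peaks.

```python
def format_peak_level(channel_peaks_db, channel_names):
    """Build '<max> dB; <peak> <name>; ...' with one decimal place."""
    if not channel_peaks_db or len(channel_peaks_db) != len(channel_names):
        raise ValueError("need one name per channel peak")
    items = [f"{max(channel_peaks_db):.1f} dB"]
    items += [f"{p:.1f} {n}" for p, n in zip(channel_peaks_db, channel_names)]
    return "; ".join(items)


NAMES_5_1 = ["Left", "Right", "Center", "Sub", "SL", "SR"]  # labels from the 5.1 example
print(format_peak_level([-0.1, -0.1, -1.5, -6.9, -12.3, -12.4], NAMES_5_1))
print(format_peak_level([-5.7], ["Mono"]))  # -5.7 dB; -5.7 Mono
```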

6233638 · « **Reply #13 on:** July 10, 2013, 01:29:46 pm »

Quote from: Matt on July 10, 2013, 12:29:49 pm

I think this is just a math problem. Proper down-mixing doesn't change the energy balance, so I don't believe you need two analysis numbers.
The downmixer knows if it turns the volume down to prevent clipping, so it could provide that information to the Volume Leveling code for it to take into account.

According to Tech 3343, downmixes need separate analysis - at least that's how I understood it.

Quote from: Tech 3343

It is also again pointed out that the surround channels are weighted with +1.5 dB⁵ during a loudness measurement according to ITU-R BS.1770. After an automatic downmix this weighting is not applied, as the result is only frontal 2-ch-stereo (Left and Right front). Programmes with a lot of surround content will consequently exhibit potentially larger variations of the loudness of the surround mix vs. the 2-ch-stereo downmix than programmes with more ‘conservative’ use of the surround channels.

Or is this something you are able to calculate after-the-fact? (I was under the impression it would not be)
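The following shows why the two numbers can differ. In ITU-R BS.1770, loudness is -0.691 + 10*log10 of the weighted sum of per-channel mean squares. The weight is 1.0 for L, R and C and 1.41 (about +1.5 dB) for the surrounds, and the LFE is left out. This simplified Python sketch assumes the per-channel mean squares were already computed from K-weighted audio. The K-weighting filter and the standard's gating stages are omitted, and the dictionary keys are assumed channel labels.

```python
import math

# BS.1770 channel weights: 1.0 for front channels, 1.41 (~+1.5 dB) for
# surrounds. Any channel not listed here (e.g. "LFE") is ignored.
WEIGHTS_5_1 = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}


def loudness(mean_squares, weights):
    """Ungated BS.1770-style loudness from K-weighted per-channel mean squares."""
    total = sum(weights[ch] * z for ch, z in mean_squares.items() if ch in weights)
    return -0.691 + 10.0 * math.log10(total)
```

The same energy measures about 1.5 dB louder in a surround channel than in a front channel. After a downmix to front stereo, that weighting no longer applies, which is the effect Tech 3343 describes.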

When the downmix metadata is missing, they also recommend the downmix coefficients below. Currently, LAV Filters doesn't read this metadata, and I don't know what JRSS uses for its standard downmix.
L, R front: 0 dB
C, LS, RS: -3 dB
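Those coefficients translate into a straightforward stereo downmix. This minimal sketch assumes per-sample lists for each channel and uses a hypothetical function name. It applies -3 dB literally as 10^(-3/20), about 0.708, although many implementations use 1/√2, about 0.707. The LFE is dropped because the quoted recommendation does not mention it, which is an assumption.

```python
G_MINUS_3DB = 10.0 ** (-3.0 / 20.0)  # about 0.708


def downmix_to_stereo(left, right, center, surround_left, surround_right):
    """Return (Lo, Ro): Lo = L + g*(C + Ls), Ro = R + g*(C + Rs), g = -3 dB.

    The LFE channel is intentionally not included (assumption).
    """
    lo = [l + G_MINUS_3DB * (c + ls) for l, c, ls in zip(left, center, surround_left)]
    ro = [r + G_MINUS_3DB * (c + rs) for r, c, rs in zip(right, center, surround_right)]
    return lo, ro
```

With every channel at full scale, Lo can reach 1 + 2 × 0.708 ≈ 2.42, about 7.7 dB over full scale. This is why a downmixer may need to turn the level down to avoid clipping.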

MrC · « **Reply #14 on:** July 10, 2013, 01:59:56 pm »

Quote from: Matt on July 10, 2013, 01:23:03 pm

I prefer denoting mono or else we'll get questions "why don't I have per-channel for this video." Using "-5.7 dB Mono" instead would work, but then mono works differently than all other channel counts which I don't like.

Agreed. So ListCount([Peak Level]) always returns the number of channels + 1 (a summary value, followed by the channel values), and ListItem([Peak Level], 0) always gives you a simple summary value in the form #.# dB. I wonder if [Peak Value,1] returns a formatted string #.# dB, or a decimal #.#. The latter makes Math() easier of course, but then makes your column display harder, but I think you're special casing this anyway. [No reply necessary]

Matt · « **Reply #15 on:** July 10, 2013, 02:11:54 pm »

Quote from: MrC on July 10, 2013, 01:59:56 pm

Agreed. So ListCount([Peak Level]) always returns the number of channels + 1 (a summary value, followed by the channel values), and ListItem([Peak Level], 0) always gives you a simple summary value in the form #.# dB.

Yes.

Quote

I wonder if [Peak Value,1] returns a formatted string #.# dB, or a decimal #.#. The latter makes Math() easier of course, but then makes your column display harder, but I think you're special casing this anyway.

For now, it's a regular string field. That means you'd get the string "#.# dB" or "#.# Channel". However, our string-to-number handler ignores anything after the number, so you could convert it to a number with an expression as needed.
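For scripting outside MC, the same conventions map onto ordinary string handling. The code below is a Python analog, not MC's expression language, and its function names are hypothetical. It mirrors ListCount/ListItem on the semicolon-separated field and adds a string-to-number conversion that, like the handler described above, ignores whatever follows the leading number.

```python
import re

_LEADING_NUMBER = re.compile(r"\s*([-+]?\d+(?:\.\d+)?)")


def list_items(field):
    """Split a '; '-separated list field into its items (item 0 = summary)."""
    return [item.strip() for item in field.split(";") if item.strip()]


def leading_number(text):
    """Parse the number at the start of text, ignoring anything after it."""
    match = _LEADING_NUMBER.match(text)
    return float(match.group(1)) if match else None


def peak_for_channel(field, channel):
    """Peak in dB for one named channel, or None if it is not present."""
    for item in list_items(field)[1:]:  # skip the summary value
        value, _, name = item.partition(" ")
        if name == channel:
            return leading_number(value)
    return None


peak = "-0.1 dB; -0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR"
print(len(list_items(peak)))                # 7: summary + 6 channels
print(leading_number(list_items(peak)[0]))  # -0.1
print(peak_for_channel(peak, "Center"))     # -1.5
```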

mojave · « **Reply #16 on:** February 25, 2015, 03:52:25 pm »

Quote from: Matt on July 09, 2013, 06:27:30 pm

I know about this. Currently the normalize system pushes pink noise through the convolution engine and targets -6 dB of change.

However, with some filters this pushes them too hard and leads to clipping.

A more complicated normalization method would probably work better, but I'm not sure what it would be.

It's possible just changing the default to off and removing the word "recommended" would be enough for now.

Nothing was ever done about "normalize filter volume" in Convolution. I thought I'd bring it up again.

INTERACT FORUM
