INTERACT FORUM

Author Topic: Audio analysis for video files  (Read 4686 times)

Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42373
  • Shoes gone again!
Audio analysis for video files
« on: July 09, 2013, 04:41:12 pm »

As discussed here, we're opening up the Audio Analyzer for video files in MC19.

Currently it analyzes these values:
Volume Level (R128)
Dynamic Range (R128)
Peak Level

Are there any other useful metrics people would like for audio analysis?

Thanks for any suggestions.
Logged
Matt Ashland, JRiver Media Center

MrC

  • Citizen of the Universe
  • *****
  • Posts: 10462
  • Your life is short. Give me your money.
Re: Audio analysis for video files
« Reply #1 on: July 09, 2013, 04:55:04 pm »

I suppose I've been waiting for this type of question so I could follow up with one about BPM.  I'm wondering if something might not be right.  Fleetwood Mac's Over & Over (from Tusk) is analyzed as 166 BPM.  It is a very slow song (with Intensity 1), so I'm wondering what beats are being counted.

Lowest priority.
Logged
The opinions I express represent my own folly.

Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42373
  • Shoes gone again!
Re: Audio analysis for video files
« Reply #2 on: July 09, 2013, 05:05:59 pm »

BPM algorithms are tricky.

It's not easy, sometimes even for a human, to tell between multiples like 50, 100, and 150 bpm.

I think we could improve our algorithm, and it's the type of thing I love getting lost in, but I also don't think it's very important compared to some of the other opportunities we have.
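
To illustrate the multiples problem, here is a minimal sketch that folds a tempo estimate into a single octave, so that a reading like 166 BPM lands on 83 BPM. The `fold_bpm` helper and its default range of 70 to 140 BPM are assumptions made for this example; this is not Media Center's BPM algorithm.

```python
def fold_bpm(bpm: float, low: float = 70.0) -> float:
    """Fold a tempo estimate into the one-octave range [low, 2 * low).

    Hypothetical helper, not Media Center's algorithm. It only resolves
    factor-of-two ambiguity (166 vs 83), not 3:1 or 3:2 relationships.
    """
    if bpm <= 0 or low <= 0:
        raise ValueError("bpm and low must be positive")
    bpm = float(bpm)
    high = 2.0 * low
    while bpm >= high:   # too fast: halve until inside the range
        bpm /= 2.0
    while bpm < low:     # too slow: double until inside the range
        bpm *= 2.0
    return bpm


print(fold_bpm(166))  # 83.0
print(fold_bpm(50))   # 100.0
```

Folding only picks one representative per octave. Whether 83 or 166 is the "right" answer for a given song still depends on which pulse a listener would tap along to.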
Logged
Matt Ashland, JRiver Media Center

mojave

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 3732
  • Requires "iTunes or better" so I installed JRiver
Re: Audio analysis for video files
« Reply #3 on: July 09, 2013, 05:07:17 pm »

Average Level
Crest Factor (probably not really necessary since you have Dynamic Range)

Will it provide data for each channel?
Logged

Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42373
  • Shoes gone again!
Re: Audio analysis for video files
« Reply #4 on: July 09, 2013, 05:29:13 pm »

Quote
Average Level

Isn't that similar to the loudness (i.e., ReplayGain / R128 Loudness)?

Quote
Will it provide data for each channel?

The R128 numbers are one overall number.

What do you have in mind?
Logged
Matt Ashland, JRiver Media Center

mojave

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 3732
  • Requires "iTunes or better" so I installed JRiver
Re: Audio analysis for video files
« Reply #5 on: July 09, 2013, 06:00:46 pm »

Quote
Isn't that similar to the loudness (i.e., ReplayGain / R128 Loudness)?

Yes, it is. Never mind.

Quote
The R128 numbers are one overall number.

What do you have in mind?

I was thinking that with active crossovers or other system setups, it could be useful to know the peak level of each channel so you can easily find the worst-case scenario, at least for your current media. I like how the analyzer displays RMS levels of each channel, but I have often wanted it to also show the maximum RMS level during playback. Doing that during audio analysis makes more sense, since real-time analysis takes too much time.

I'm also wondering if a channel peak level could better help with convolution normalization. I've seen reports on both the Audiolense and Acourate forums that JRiver's "normalize filter volume" causes clipping and is recommended to be unchecked.
Logged

Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42373
  • Shoes gone again!
Re: Audio analysis for video files
« Reply #6 on: July 09, 2013, 06:27:30 pm »

Quote
I've seen reports on both the Audiolense and Acourate forums that JRiver's "normalize filter volume" causes clipping and is recommended to be unchecked.

I know about this.  Currently the normalize system pushes pink noise through the convolution engine and targets -6 dB of change.

However, with some filters this pushes them too hard and leads to clipping.

A more complicated normalization method would probably work better, but I'm not sure what it would be.

It's possible just changing the default to off and removing the word "recommended" would be enough for now.
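
A rough sketch of why a noise-based level target can still clip. It makes several assumptions: white noise stands in for pink noise so the RMS change can be computed directly from the filter taps, the filter is a plain FIR, and all function names and the example filter are made up. It is not Media Center's normalization code.

```python
import math


def rms_gain_db(taps):
    """RMS level change (dB) that white noise sees through FIR `taps`."""
    return 10.0 * math.log10(sum(t * t for t in taps))


def normalize_to_target(taps, target_db=-6.0):
    """Scale taps so the white-noise RMS change equals target_db."""
    gain = 10.0 ** ((target_db - rms_gain_db(taps)) / 20.0)
    return [t * gain for t in taps]


def worst_case_peak(taps):
    """Largest output magnitude a full-scale (|x| <= 1) input can produce."""
    return sum(abs(t) for t in taps)


# Made-up filter for illustration: eight equal taps.
taps = normalize_to_target([0.5] * 8)
print(round(rms_gain_db(taps), 2))   # -6.0
print(worst_case_peak(taps) > 1.0)   # True: clipping is still possible
```

The normalized filter meets the -6 dB noise target, yet a worst-case full-scale input can still drive the output past full scale. A peak-aware method would have to look at something like `worst_case_peak` as well as the noise level.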
Logged
Matt Ashland, JRiver Media Center

6233638

  • Regular Member
  • Citizen of the Universe
  • *****
  • Posts: 5353
Re: Audio analysis for video files
« Reply #7 on: July 09, 2013, 10:37:08 pm »

I think it's very important to also calculate a "downmix loudness" value for when videos are being downmixed to stereo, rather than just at the "native" number of channels.
If anything, downmixing needs normalization more than outputting multichannel does.
Logged

Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42373
  • Shoes gone again!
Re: Audio analysis for video files
« Reply #8 on: July 10, 2013, 12:12:40 pm »

We'll switch the 'Peak Level' field to be a string, use decibels, and show per-channel levels.

Here's an example from a 5.1 music video I just tested:
-0.1 dB (-0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR)

Packing it into a single string might make it a little harder to manipulate the data, but the alternative was to create a pile of peak level fields for each channel which seemed a little messy.
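
For illustration, a minimal sketch of how per-channel peaks could be measured and packed into a string of that shape. It assumes floating-point samples with full scale at 1.0; `peak_db` and `format_peak_level` are hypothetical names, not Media Center functions.

```python
import math


def peak_db(samples):
    """Peak level in dB relative to full scale (1.0); -inf for silence."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def format_peak_level(channels):
    """channels: list of (label, samples) pairs in speaker order."""
    levels = [(label, peak_db(samples)) for label, samples in channels]
    overall = max(db for _, db in levels)  # loudest channel is the summary
    per_channel = "; ".join(f"{db:.1f} {label}" for label, db in levels)
    return f"{overall:.1f} dB ({per_channel})"


print(format_peak_level([
    ("Left", [0.5, -1.0, 0.25]),
    ("Right", [0.1, 0.5, -0.3]),
]))
# 0.0 dB (0.0 Left; -6.0 Right)
```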

Logged
Matt Ashland, JRiver Media Center

Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42373
  • Shoes gone again!
Re: Audio analysis for video files
« Reply #9 on: July 10, 2013, 12:29:49 pm »

Quote
I think it's very important to also calculate a "downmix loudness" value for when videos are being downmixed to stereo, rather than just at the "native" number of channels.
If anything, downmixing needs normalization more than outputting multichannel does.

I think this is just a math problem.  Proper down-mixing doesn't change the energy balance, so I don't believe you need two analyze numbers.

The downmixer knows if it turns the volume down to prevent clipping, so it could provide that information to the Volume Leveling code for it to take into account.
Logged
Matt Ashland, JRiver Media Center

Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42373
  • Shoes gone again!
Re: Audio analysis for video files
« Reply #10 on: July 10, 2013, 12:44:55 pm »

Quote
We'll switch the 'Peak Level' field to be a string, use decibels, and show per-channel levels.

Here's an example from a 5.1 music video I just tested:
-0.1 dB (-0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR)

Packing it into a single string might make it a little harder to manipulate the data, but the alternative was to create a pile of peak level fields for each channel which seemed a little messy.

Should mono videos show:
-5.7 dB (-5.7 Mono)

Or just:
-5.7 dB
Logged
Matt Ashland, JRiver Media Center

MrC

  • Citizen of the Universe
  • *****
  • Posts: 10462
  • Your life is short. Give me your money.
Re: Audio analysis for video files
« Reply #11 on: July 10, 2013, 01:08:17 pm »

Why bother with the parens at all?  How about just making it a list field of the form:

   value description[; ...]

Examples:

   -0.1 dB; -0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR
   -0.1 dB Total; -0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR
   -5.7 dB Mono

This makes it very easy to parse, show the first value, or select any value (by position or channel pattern).  Format display could be treated like Artist (first only for ,1, all for ,0).
Logged
The opinions I express represent my own folly.

Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42373
  • Shoes gone again!
Re: Audio analysis for video files
« Reply #12 on: July 10, 2013, 01:23:03 pm »

Quote
Why bother with the parens at all?  How about just making it a list field of the form:

   value description[; ...]

Examples:

   -0.1 dB; -0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR
   -0.1 dB Total; -0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR
   -5.7 dB Mono

This makes it very easy to parse, show the first value, or select any value (by position or channel pattern).  Format display could be treated like Artist (first only for ,1, all for ,0).

Good advice.  This is why I ask.

We'll try this format:
   -0.1 dB; -0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR
   -5.7 dB; -5.7 Mono

I prefer denoting mono or else we'll get questions "why don't I have per-channel for this video."  Using "-5.7 dB Mono" instead would work, but then mono works differently than all other channel counts which I don't like.
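
A minimal sketch of how an external script might read this format, assuming the field arrives as a single semicolon-delimited string exactly as shown above. `parse_peak_level` is a hypothetical helper, not part of Media Center.

```python
def parse_peak_level(field):
    """Split 'value label; value label; ...' into (float, label) pairs."""
    items = []
    for entry in field.split(";"):
        # Each entry is a number, one space, then a label such as "dB" or "Left".
        value, _, label = entry.strip().partition(" ")
        items.append((float(value), label))
    return items


items = parse_peak_level(
    "-0.1 dB; -0.1 Left; -0.1 Right; -1.5 Center; -6.9 Sub; -12.3 SL; -12.4 SR"
)
summary = items[0][0]                               # first entry: overall peak
by_channel = {label: value for value, label in items[1:]}
print(summary, by_channel["Sub"], len(items) - 1)   # -0.1 -6.9 6
```

Because mono uses the same shape ("-5.7 dB; -5.7 Mono"), the same code handles it without a special case.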
Logged
Matt Ashland, JRiver Media Center

6233638

  • Regular Member
  • Citizen of the Universe
  • *****
  • Posts: 5353
Re: Audio analysis for video files
« Reply #13 on: July 10, 2013, 01:29:46 pm »

Quote
I think this is just a math problem.  Proper down-mixing doesn't change the energy balance, so I don't believe you need two analyze numbers.
The downmixer knows if it turns the volume down to prevent clipping, so it could provide that information to the Volume Leveling code for it to take into account.

According to Tech 3343, downmixes need separate analysis, or at least that's how I understood it.

Quote from: Tech 3343
It is also again pointed out that the surround channels are weighted with +1.5 dB during a loudness measurement according to ITU-R BS.1770. After an automatic downmix this weighting is not applied, as the result is only frontal 2-ch-stereo (Left and Right front). Programmes with a lot of surround content will consequently exhibit potentially larger variations of the loudness of the surround mix vs. the 2-ch-stereo downmix than programmes with more ‘conservative’ use of the surround channels.

Or is this something you are able to calculate after-the-fact? (I was under the impression it would not be)

They also recommend this for the downmix, when the downmix metadata is missing. Currently, LAV filters doesn't read this information. I don't know what JRSS uses for its standard downmix.
L, R front: 0 dB
C, LS, RS: -3 dB
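
A rough sketch of the effect described in the quote, under simplifying assumptions: the per-channel mean-square values are already K-weighted, gating is ignored, the channels are uncorrelated (so powers add in the downmix), and the downmix applies no extra attenuation to avoid clipping. The function names and the power values are made up for illustration.

```python
import math

# BS.1770 channel weights: front channels 1.0, surrounds 1.41 (about +1.5 dB).
# The LFE channel is excluded from the measurement.
WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}
MINUS_3_DB = 10.0 ** (-3.0 / 10.0)  # -3 dB expressed as a power ratio


def loudness(weighted_power_sum):
    return -0.691 + 10.0 * math.log10(weighted_power_sum)


def surround_loudness(z):
    """z: mean-square power per channel (assumed already K-weighted)."""
    return loudness(sum(WEIGHTS[ch] * z[ch] for ch in WEIGHTS))


def downmix_loudness(z):
    """Stereo downmix with C, Ls, Rs at -3 dB; channels assumed uncorrelated."""
    left = z["L"] + MINUS_3_DB * (z["C"] + z["Ls"])
    right = z["R"] + MINUS_3_DB * (z["C"] + z["Rs"])
    return loudness(left + right)


def difference(z):
    return surround_loudness(z) - downmix_loudness(z)


front_heavy = {"L": 0.01, "R": 0.01, "C": 0.01, "Ls": 0.0001, "Rs": 0.0001}
surround_heavy = {"L": 0.01, "R": 0.01, "C": 0.01, "Ls": 0.01, "Rs": 0.01}
print(round(difference(front_heavy), 2), round(difference(surround_heavy), 2))
```

With these made-up numbers the front-heavy mix measures almost the same either way, while the surround-heavy mix differs by more than 1 dB. Under these assumptions the downmix figure can be derived from per-channel powers, but not from the single overall number alone.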
Logged

MrC

  • Citizen of the Universe
  • *****
  • Posts: 10462
  • Your life is short. Give me your money.
Re: Audio analysis for video files
« Reply #14 on: July 10, 2013, 01:59:56 pm »

Quote
I prefer denoting mono or else we'll get questions "why don't I have per-channel for this video."  Using "-5.7 dB Mono" instead would work, but then mono works differently than all other channel counts which I don't like.

Agreed.  So ListCount([Peak Level]) always returns the number of channels + 1 (a summary value, followed by the channel values), and ListItem([Peak Level], 0) always gives you a simple summary value in the form #.# dB.  I wonder if [Peak Level,1] returns a formatted string #.# dB, or a decimal #.#.  The latter makes Math() easier, of course, but makes your column display harder, though I think you're special-casing this anyway.  [No reply necessary]
Logged
The opinions I express represent my own folly.

Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42373
  • Shoes gone again!
Re: Audio analysis for video files
« Reply #15 on: July 10, 2013, 02:11:54 pm »

Quote
Agreed.  So ListCount([Peak Level]) always returns the number of channels + 1 (a summary value, followed by the channel values), and ListItem([Peak Level], 0) always gives you a simple summary value in the form #.# dB.

Yes.


Quote
I wonder if [Peak Level,1] returns a formatted string #.# dB, or a decimal #.#.  The latter makes Math() easier, of course, but makes your column display harder, though I think you're special-casing this anyway.

For now, it's a regular string field.  That means you'd get the string "#.# dB" or "#.# Channel".  However, our string to number handler will ignore stuff after the number so you could make it a number as necessary with an expression.
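
As an illustration of the "ignore stuff after the number" behavior, here is a small sketch that mimics it outside Media Center. The `leading_number` helper and its fallback of 0.0 for text without a number are assumptions for this example, not a description of MC's actual handler.

```python
import re

# Optional leading whitespace, optional sign, digits, optional decimal part.
_LEADING_NUMBER = re.compile(r"\s*([+-]?\d+(?:\.\d+)?)")


def leading_number(text, default=0.0):
    """Return the number at the start of text, ignoring whatever follows."""
    match = _LEADING_NUMBER.match(text)
    return float(match.group(1)) if match else default


print(leading_number("-6.9 Sub"))  # -6.9
print(leading_number("-0.1 dB"))   # -0.1
```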
Logged
Matt Ashland, JRiver Media Center

mojave

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 3732
  • Requires "iTunes or better" so I installed JRiver
Re: Audio analysis for video files
« Reply #16 on: February 25, 2015, 03:52:25 pm »

Quote
I know about this.  Currently the normalize system pushes pink noise through the convolution engine and targets -6 dB of change.

However, with some filters this pushes them too hard and leads to clipping.

A more complicated normalization method would probably work better, but I'm not sure what it would be.

It's possible just changing the default to off and removing the word "recommended" would be enough for now.

Nothing was ever done about "normalize filter volume" in Convolution. I thought I'd bring it up again.
Logged