Topic: How to Automatically Parse Composition and Movement info in Classical Music

wer · « **on:** March 12, 2021, 03:18:35 am »

Intro
I've been asked how I implement automatic parsing of information for Classical music, most recently about Movement and Movement Number information, so I'm going to try to explain a bit about that here. I thought it would make more sense if I present this in conjunction with the Composition field. I've talked about Composition before in a previous tutorial, and this will cover some of the same information so as to be more cohesive.

First, a little disclaimer: I consider parts of this a work in progress. It's been evolving over time, and although Composition is well nailed down, Movement information is more complicated and might continue to be improved. Also, Matt is working on some changes to the expression language I suggested that will make certain aspects of this sort of thing easier, so there will be opportunities to simplify some of the expressions when those updates are publicly available.

So for those who are interested, I'll describe the system I am using. I think it's a good system, and is definitely better than some alternatives, but it is far from the only system that would work. If you want to have a different system, you can adapt what I'm doing here to fit the way you want to work.

The most important thing to understand is that you have to have a structured naming system for your classical music, and you have to enforce it. If you want your information to be automatically parsed out, it has to be systematized. That is essential.

So why do we need this? It should be understood that while in popular/rock/jazz music most pieces of music are a single track, classical is different. A lot of classical music consists of multiple tracks that taken together form a single piece of music, a Composition. For example, Mozart's 1st Piano Concerto consists of 3 movements, each in a separate track, and you would listen to these three tracks together as a single Composition. JRiver doesn't natively have a concept of a Composition in this way, so I created one. Often, although not always, when the music is put onto CD, each Movement of the Composition is a track. So it's useful to adopt this form, and it can often be leveraged for pieces of music that don't perfectly fit the structure, if we're careful with our system.

So let's start by looking at Mozart's 1st Piano Concerto, which will serve as a typical piece. It has three tracks, and if we have them named as follows, we can see a pattern:

Concerto No.1 in F major, K.37: I. Allegro
Concerto No.1 in F major, K.37: II. Andante
Concerto No.1 in F major, K.37: III. Allegro

Each track name has a colon. Everything before the colon is common to all the tracks. Everything after the colon distinguishes that track from the others. This format is used frequently in online databases; it's quite common. If I acquire tracks that aren't named with this type of structure, I use the expression language to adjust them to conform with it. Enforcing that consistency pays dividends.

Composition
Everything before the colon is the Composition. Everything after the colon is movement information. So in this system, we have this definition:
Name=Composition:Movement
The Movement Number is in dotted notation at the start of Movement

For the example above, we get a [Composition] of
Concerto No.1 in F major, K.37

All of the three tracks share that same value.

This pattern allows us to automatically define Composition as a calculated field:

Code:

ListItem([Name],0,:)
The clever thing about this definition is that if you leave out the colon (you wouldn't have separate tracks that make up a rock song) then [Composition]=[Name]. In other words, if there's no colon, the name of the track is the name of the composition. So if Name="Stairway to Heaven" then Composition="Stairway to Heaven". Easy.
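
For readers who find it easier to follow the logic in a general-purpose language, here is a minimal Python sketch of what that expression does. It is only a model of the behavior described above, not MC's expression language. The function name `composition` is made up for illustration, and trimming surrounding whitespace is an assumption.

```python
def composition(name: str) -> str:
    """Model of ListItem([Name],0,:): the text before the first colon."""
    # str.split always returns at least one item, so a name without a colon
    # comes back unchanged, which mirrors [Composition]=[Name].
    # Stripping whitespace is an assumption made for this sketch.
    return name.split(":")[0].strip()


print(composition("Concerto No.1 in F major, K.37: II. Andante"))
# -> Concerto No.1 in F major, K.37
print(composition("Stairway to Heaven"))
# -> Stairway to Heaven
```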

So you create a new field called Composition, in Options->Library & Folders->Manage Library Fields, and the dialog looks like this:

Movement Name
Now, getting the movement information is a little more complicated.

First, MC has two built-in fields: [Movement] and [Movement Number]. Because they are built-in fields, we cannot change their type to Calculated Data.

So we make two new fields to use instead: [Movement Name] and [Movement #]
Use the same dialog box as you did before for [Composition].

[Movement Name] is also calculated data, defined as follows:

Code:

If(IsEqual([Composition],[Name]),,ListItem([Name],1,:))
Basically, if the Composition is different from the Name, it takes whatever's after the colon. Make sure you have no more than one colon.

For the three tracks I showed you before, it would give Movement Names of:
I. Allegro
II. Andante
III. Allegro
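
The same logic can be modeled in a few lines of Python. This is a sketch under assumptions, not MC's expression language: the function name `movement_name` is hypothetical, and the whitespace trimming is assumed. It also shows why a second colon is a problem, since only the second list item is returned.

```python
def movement_name(name: str) -> str:
    """Model of If(IsEqual([Composition],[Name]),,ListItem([Name],1,:))."""
    parts = name.split(":")
    if len(parts) < 2:
        # No colon: Composition equals Name, so there is no movement part.
        return ""
    # Only item 1 is taken, so anything after a second colon is dropped.
    return parts[1].strip()


print(movement_name("Concerto No.1 in F major, K.37: II. Andante"))
# -> II. Andante
print(repr(movement_name("Stairway to Heaven")))
# -> ''
```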

Movement #
Now we can also extract the Movement #. This is even more complicated, because this data varies a bit more. [Movement #] is a field of type calculated data, defined like this:

Code:

If(IsEqual([Movement Name],.,8),If(IsEqual([Movement Name],No.,8),regex([Movement Name],/#(.+\d+)#/,1,0),regex([Movement Name],/#(^([^.]+))#/,1,0)),)

My system for [Movement #] looks for two different patterns. First, it expects to see a period as a separator. If there is no period (a dot) then the [Movement #] field will be empty. This is appropriate for single track pieces that do not have movements or multiple parts.

The first pattern it looks for is the use of "No." as an abbreviation, which happens so often I made a special case for it. As in:
"9 Etudes Tableaux Op. 39: No. 1 C minor"

If it sees this, it will take the "No. 1" as a Movement #.

The second pattern it looks for is some other term set off by a dot, in which case it will take everything up to the dot. Some examples:
Concerto No.1 in F major, K.37: II. Andante [Movement #]=II
Bagatelles (11) for piano, Op. 119: VI. Andante (G major) [Movement #]=VI
Orchestral Suite No. 1 in C major, BWV 1066: 1. Ouverture [Movement #]=1

This approach is flexible enough to reasonably handle pieces of music that don't fit the standard 1-track-per-movement recording paradigm.

For example, Mahler's 3rd Symphony has more of an operatic structure, with 6 large movements split across 26 tracks. The first part of the 6th movement (track 21) looks like this:
Symphony No. 3: VI-1. Langsam. Ruhevoll. Empfunden [Movement #]=VI-1

It can also work for opera, which technically has acts and scenes rather than movements. Look at a track from La Traviata:
La Traviata: Act 2 Scene III. Alfredo solo [Movement #]=Act 2 Scene III

Because it is looking for the period, anything before that first dot it will use as a Movement #. If you don't want a Movement #, leave out the period.
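
To make the two-pattern logic easier to follow, here is a rough Python model of the [Movement #] expression. It is a sketch, not MC's expression language. The function name `movement_number` is hypothetical, the input is assumed to be an already-extracted Movement Name, and the "No." test is assumed to be a case-insensitive substring check (my reading of IsEqual mode 8).

```python
import re


def movement_number(movement_name: str) -> str:
    """Model of the [Movement #] expression applied to a Movement Name."""
    text = movement_name.strip()
    if "." not in text:
        # No dot at all: the field stays empty.
        return ""
    if "no." in text.lower():
        # Assumption: the substring test ignores case.
        # Greedy match from the start through the last digit in the string.
        match = re.search(r"(.+\d+)", text)
    else:
        # Everything from the start up to (not including) the first dot.
        match = re.search(r"^([^.]+)", text)
    return match.group(1) if match else ""


for example in (
    "No. 1 C minor",
    "II. Andante",
    "VI-1. Langsam. Ruhevoll. Empfunden",
    "Act 2 Scene III. Alfredo solo",
):
    print(example, "->", movement_number(example))
```

Because the first pattern runs through the last digit in the string, a "No." movement name with a later number in it would capture more than just the "No. 1" part, which is one reason to keep the names tidy.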

How it looks
You can see how all this plays out in a view here:

I don't have to key in any of the Composition, Movement Name or Movement # fields. I make sure my [Name] field is fixed when I rip or immediately after I import, and the rest is done automatically. That works well, because I don't like to key in more than I have to.

So that's how you can parse out the information. Once you have the Composition information, you can make views based on it, and also collect summary statistics like ratings and duration. Here's an example:

Hopefully JRiver will enhance MC to provide built in support for Composition in the future, so that they can be properly handled in playlists and smartlists, etc. If you'd like that to happen, voice your support in the feature request thread over here: https://yabb.jriver.com/interact/index.php/topic,128860.0.html

People have been using the [Composition] approach for a while, but since there has recently been additional interest in the Movement info, I thought I'd put this up.

Anyway, I hope people will find this useful, and you can adapt the technique to your needs...

Manfred · « **Reply #1 on:** March 12, 2021, 08:04:20 am »

Great post!

I have done something similar. I used the name Work instead of Composition, and I don't put the composer in the Artist field. I use Composer, Conductor, Orchestra. The Artist field typically holds something like "Pittsburgh Symphony Orchestra (orchestra);Manfred Honeck".

Is your approach meant for ripped CDs, for digital downloads, or both?

I personally don't like to change the imported fields for purchased albums from HighRes Audio, Native DSD, Reference Recordings, etc.

For me a bigger problem arises when you have a composition/work, e.g. Götterdämmerung by Richard Wagner, on several albums by different conductors, artists, etc., and the movement names after ripping differ (terrible work to correct manually), e.g.
Name: Prolog - Welch Licht Leuchtet Dort?
Name: Welch Licht Leuchtet Dort (1. Norn)

But with your method this is not automatically corrected?

HaWi · « **Reply #2 on:** March 12, 2021, 01:24:34 pm »

wer, this is so clear and informative. Thank you very much.

wer · « **Reply #3 on:** March 12, 2021, 03:37:54 pm »

Quote from: wer on March 12, 2021, 12:38:53 pm

The Composition can't be the source for interpreted data, because it is the destination.

Quote from: hoyt on March 12, 2021, 02:20:59 pm

Can you explain your workflow with this, or link to your prior discussion? I searched around a bit, but didn't find something that I thought was step-by-step enough to understand why you would rather go from Name -> Composition, Movement, Movement # vs the other way around. Or are you saying that when you rip the CD and use YADB, it is automatically formatting the name like you're stating? I'm going through the classical albums I have now and am finding it much easier to shift+click 6 tracks, note them as Composition = Cello Suite No. 1 in G Major, Catalog Reference = BWV 1007, then go through each one and enter the Movement and Movement #. Once I'm done with that, I enter the Name as =[Composition]: [Movement Number]. [Movement]. I find that easier to make the name consistent, but you've put more thought and work into this, so I'd like to better understand your reasoning. Plus my CDs are all already ripped and I don't recall how it named things.

Hoyt asked me this question in another thread, but I'm responding here to try and get that other thread back on topic...

So why am I saying Composition is destination, and not the source? Think about something simple, like Album. For MC to consider songs to be part of the same Album, there must be something IN the Album field. It's the contents of the Album field that identify the Album. How does something get in the Album field? You have to put it there. Maybe you typed it in, maybe you retrieved metadata from a database. But you put something in there.

But all these other fields are not part of the metadata databases. When you retrieve metadata for a song, you do not get a field called Composition. Or Work, or any other synonym you might think of. You get nothing like that. What do you get? You get Album, Name, Composer, Genre, Artist, Album Artist, Date, not much else. There is no Composition field in these databases.

What you definitely do not get is anything like Composition. The only thing you get that is even remotely like it is NAME. Name is the only stock field that contains information unique to that piece of music that distinguishes it from others (other than track number, which is not useful for our purposes).

Imagine you download metadata for an Album of Beethoven Piano Sonatas. What metadata will you get?

In most cases, Composer, Artist, Album, Date, Genre, etc, are shared by all the tracks on the album. Which makes them 100% useless for distinguishing one track from another. The only field that does that is the Name. If you downloaded metadata for an album, and it had nothing in the Name field but all the others were populated, you would consider that garbage metadata, because you would be nowhere. You would have no idea what piece of music, which sonata, track 4 was, and you would have to look it up somewhere and type it in.

Imagine if the list of Names you retrieved looked like this:
1. Allegro
2. Adagio
3. Menuetto
4. Prestissimo
1. Allegro
2. Allegretto
3. Presto
1. Allegro
2. Adagio
3. Menuetto
4. Prestissimo

You'd probably be a bit PO'd. What piece of music, what sonata, is track 6? Without the composition information, it's not complete. So it's usually there.

So that's why Name is the source. You have to get information out of Name, and into Composition. That makes Composition the destination.

You asked what my workflow is...

I'll start by saying I didn't pull my naming system (described above in this thread) out of a hat. I'm an old programmer. That means I'm lazy and don't want to type in more than I have to. So I chose the system with care, and part of the reason was that the conventions I am using are in very widespread use.

When I import new music, the first thing I do is review the metadata and clean up the [Name] field. As a concrete example, I just popped Thomas Murray's recording of Mendelssohn Organ Sonatas in and fired up DBPA. Here's what it showed me:

(I don't rip with MC because I find DBPA gives better results.) I've never ripped this disc before. And look at that Name field. It is almost exactly in the format I want. The only thing I need to do is do an S&R to remove that colon after "Op. 65:". I could add the 1. 2. 3. 4. for movement numbers if I wanted. But the Composition information is THERE. Not anywhere else. There.

I find that a very large proportion of tracks have Name fields already in this format, or close to it. And if they don't, when I import the tracks I immediately do a little massaging via the expression language to whip the format into shape. That's the discipline you have to have.

You say you key info into the Composition field. But where are you getting the info you key in? How do you know that track is part of "Cello Suite No. 1 in G Major"? You're getting it from the Name. Ancillary details like catalog reference you're getting from elsewhere, but the part that identifies what composition it is, you're deriving that from what the name field tells you.

I've heard other people say that they just accept whatever metadata is retrieved at rip time. That's fine for them, but they're going to have a lot of disorganized garbage in their metadata, because those databases are garbage in garbage out.

This is why [Composition] is essential, but the other fields like [Movement #] are frankly penny ante. If you don't have Composition info for a track (even if it's only embedded in the name) you literally don't know what piece of music you have. Maybe people get confused, because they actually have Composition info without realizing it, because the info itself is usually part of the Name. But because it's in the [Name] field, they get confused over understanding that Composition is just a stripped down version of what is in Name.

Whether the [Name] field provided by online databases will give you good movement information is pretty hit and miss. But they almost always give you the Composition info, because if they didn't, no one would consider that acceptable metadata in the first place.

So in the context of discussing what MC can automate, if it depends on info already in [Composition], which the user had to provide, then it hasn't automated it. Automation and Composition only make sense if you're talking about automatic parsing of data to put INTO Composition, and as described above, ultimately that would come from Name. Of course, since [Composition] is a user field, and thanks to all the chatter it probably never will be automated, you'll be free to use your own expressions to base Composition on whatever other fields you want, or none at all.

Another long post, but I hope this is clear. Did I answer your question?

hoyt · « **Reply #4 on:** March 12, 2021, 07:12:44 pm »

Quote from: wer on March 12, 2021, 03:37:54 pm

(I don't rip with MC because I find DBPA gives better results.)

...

Another long post, but I hope this is clear. Did I answer your question?

Yes! I hadn't seen this thread when I responded on the other one because I had looked for your process yesterday when you had mentioned to someone else to search, and hadn't gotten around to responding until this morning. This was a great read and very informative, thank you for taking the time.

I was assuming that when talking about MC "dealing with Classical music better" that the initial source of metadata would be in MC. That source would either be importing a download, or ripping a CD in MC. So I was envisioning a system where we would be deciphering the discrete data elements of Composition, Movement Name, and Movement # to create the Name. I also didn't realize you created the Movement Name and Movement # fields separately from the built-in fields. That makes sense; I was attempting to use the Movement Number and found the translation of Roman Numerals to numeric values a bit annoying. I also struggled to understand how it would easily apply to opera and ballet music like you had mentioned, because Giselle Act I - No.5a - End Of Hunting Scene didn't really break down without me adding new fields, which I did (Act and Scene). However, the method of adding a new generic field for Movement Name and Movement # makes sense to me.

The eye opener for me is that you're really setting that metadata in a 3rd party tool, I'm not, so I'm not really replicating your workflow to figure out how MC can make it easier. What you described makes sense when the Name is pure coming into MC, but if you already have different metadata, I think that's harder to get to. For example, this album (not sure how it got tagged). I can tell what all those pieces are, but in order to have MC parse them into your scheme, I would need to rename the 10 tracks:

Same thing with this one, the tagging is in bad shape, but I know what's what. Tracks 1 - 7 are Suite No 1, 8 - 14 are Suite No 2, etc.

In order to "clean those up", I would rather enter Composition 4 times, then set Movement =[Name], manually clean the 4 that have the Composition in them, then make Name=[Composition]: [Movement], versus entering the full string 24 times "Suite No. 1 in C Major: I. Overture", etc.

Or this one, where the Composition is a part of the Album name:

I certainly see more where you're coming from now. I use and rely on MC to help me clean up metadata, so that is where my thinking went. To be perfectly clear, I think your idea of using the Composition field as a relational field is wonderful. How I would go about getting information into that field is a different use case though. Looking at the classical albums in my dad's library, it seems less than 1/3 are tagged with the name like you have - a lot are [Composition] - [Movement]. I'm assuming he just popped the disc in and had YADB do the naming.

wer · « **Reply #5 on:** March 12, 2021, 07:51:42 pm »

Quote from: hoyt on March 12, 2021, 07:12:44 pm

The eye opener for me is that you're really setting that metadata in a 3rd party tool, I'm not, so I'm not really replicating your workflow to figure out how MC can make it easier. What you described makes sense when the Name is pure coming into MC, but if you already have different metadata, I think that's harder to get to.

No, I do not modify the metadata in DBPA, you just assumed that. I rip in DBPA, the metadata that you saw in the screenshot is what comes into MC, and then I do my cleanup in MC. The point is that you CAN replicate my MC workflow, because it does not matter at all whether DBPA does the initial metadata lookup, or MC does. Either way, you start with some retrieved metadata. It's either perfect, or needs to be massaged. The source doesn't matter - the process is the same.

Quote from: hoyt on March 12, 2021, 07:12:44 pm

I can tell what all those pieces are, but in order to have MC parse them into your scheme, I would need to rename the 10 tracks:

This example, of the Schumann, is really a non-issue isn't it? You talk about renaming the 10 tracks like it's some kind of burden. But it's done in 1 step: F&R Find " - " and replace with ": " It's as simple as that. Trivial, and done in 5 seconds.

Quote from: hoyt on March 12, 2021, 07:12:44 pm

Same thing with this one, the tagging is in bad shape, but I know what's what. Tracks 1 - 7 are Suite No 1, 8 - 14 are Suite No 2, etc.

So this is a good example of garbage data. But your "you would rather" preference is arbitrary, because it's the same amount of work. If I get a garbage retrieval like this, I have to do the same as you: manually enter it. But what I do is manually enter it into a temp field and then copy it over with [Name]=[temp]: [Name], or I could just directly edit Name as F2 =Suite No. 1 in C Major:[Name]

The bottom line is when you have garbage metadata like you show in the Bach suites, you have to clean up every track. But this is not at all difficult with a little substitution using the expression language as shown above. And it does not matter if you're changing Name or Composition instead, the amount of work is the same. Of course, some people don't want to put any work into their tagging at all. Those people can't really be helped.
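
As a rough picture of that substitution, here is a Python sketch of prefixing a selection of bare movement names with their composition. It only models the idea; it is not how MC applies the edit. The function name `prefix_composition` is hypothetical, and the movement names other than "I. Overture" are illustrative values rather than real tag data.

```python
def prefix_composition(composition: str, names: list[str]) -> list[str]:
    """Model of setting Name to '<composition>: [Name]' on selected tracks."""
    return [f"{composition}: {name}" for name in names]


# Illustrative bare names as they might arrive from a poor lookup.
bare_names = ["I. Overture", "II. Courante", "III. Gavotte"]
for fixed in prefix_composition("Suite No. 1 in C Major", bare_names):
    print(fixed)
# -> Suite No. 1 in C Major: I. Overture
# -> Suite No. 1 in C Major: II. Courante
# -> Suite No. 1 in C Major: III. Gavotte
```

The composition string is typed once per group of tracks, and every name in the group ends up in the Composition:Movement form that the calculated fields expect.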

You might wonder why does it even matter if the info ends up in [Name], instead of just leaving it only in Composition? The reason is compatibility.
Number of Cars that recognize the Composition field: Zero
Number of Phones that recognize the Composition field: Zero
Number of Streaming Music Players that recognize the Composition field: Zero
Number of TVs and Receivers that recognize the Composition field: Zero

These devices are going to show you the Name field though. They ALWAYS show you the Name field. So that's why it's good to have a well-structured Name field.

We derive Composition from the Name field because then, when you are using MC, you can actually do more with it.

Quote from: hoyt on March 12, 2021, 07:12:44 pm

...it seems less than 1/3 are tagged with the name like you have - a lot are [Composition] - [Movement]. I'm assuming he just popped the disc in and had YADB do the naming.

You're making my case for me, Hoyt. When the data has any kind of structure, like your "a lot are [Composition] - [Movement]" example, how can you be claiming that's an obstacle?

You can fix 20,000 of those in 10 seconds:
Find and Replace: Find " - " and replace with ": "
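
For illustration, here is a small Python sketch of that delimiter change. It is a variation, not a copy of the Find and Replace step: a plain find and replace would normally change every occurrence, whereas this sketch deliberately changes only the first one and skips names that already contain a colon, so each name ends up with a single colon. The function name `normalize_delimiter` and the second example name are made up for this sketch.

```python
def normalize_delimiter(name: str, old: str = " - ", new: str = ": ") -> str:
    """Turn the first 'Composition - Movement' separator into a colon."""
    if ":" in name:
        # Already in colon form; leave it untouched.
        return name
    # The count of 1 limits the change to the first occurrence only.
    return name.replace(old, new, 1)


print(normalize_delimiter("Concerto No.1 in F major, K.37 - II. Andante"))
# -> Concerto No.1 in F major, K.37: II. Andante
print(normalize_delimiter("Swan Lake - Act 1 - Scene"))
# -> Swan Lake: Act 1 - Scene
```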

It doesn't matter at all what system you adopt. If you want to use [Composition]: [Movement] that's fine. If you want to use [Composition] - [Movement] you can do that. Just change the expressions in my example.

The point is it's better to use ONE system for your tracks, not 15.
If you have a bunch of tracks as [Composition]: [Movement], and a bunch as [Composition] - [Movement], and another bunch as ([Composer]) [Composition] --- [Movement] and so on, then that's the problem. If you want your music well organized, make it consistent.

There's nothing I'm doing that you couldn't do; I do all my editing in MC. So forget about my ripping in DBPA, it doesn't matter. If you have a bunch of different naming "standards" because you just accepted whatever when you ripped, you saved yourself some effort on the front end, but you sacrificed organization and automation for it. Fixing it after is not really more effort than if you'd fixed them all to be consistent in the first place. I put in my effort in the beginning, and then everything is easier later.

You could well be right about how your father ripped; a lot of people are lazy about their tagging and just take whatever; maybe they don't care or maybe they don't know; and then they end up with a hodgepodge of inconsistent metadata. If that's what they want, fine. What do I care if my neighbor keeps his kitchen clean or messy?

But a lot of people want clean, consistent, organized metadata. It's people like that who ask about tips like this. People who don't care about their metadata don't ask about tips to organize it.

EnglishTiger · « **Reply #6 on:** March 12, 2021, 11:46:02 pm »

wer I've got a couple of questions for you.

Is MC already intelligent enough to sort Movement Numbers that use Roman Numerals into the correct order, i.e. IX after VIII and not between IV and V, or is that something that MC would have to work on?

My preferred way of handling things that use Acts and Scenes i.e. 'Act 1 - Spell No 5 Scene (Allegro vivo - Moderato)' from Pyotr Il'yich Tchaikovsky Swan Lake, is to put it all into the Movement tag. Am I correct in thinking that as long as the parsing expressions don't find a . in the "Movement part of the Name" it will put it all in the Movement Name tag?

wer · « **Reply #7 on:** March 13, 2021, 12:05:50 am »

There will be a Roman Numeral conversion function. But MC doesn't know how to sort Roman Numerals numerically, and I wouldn't count on that happening.
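
To show what the sorting problem looks like, here is a Python sketch of converting Roman numerals to integers and using that as a sort key. It is purely illustrative and says nothing about how the planned MC function will behave. The name `roman_to_int` is hypothetical, and the sketch assumes well-formed numerals (anything else raises a KeyError).

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a well-formed Roman numeral such as 'IX' to an integer."""
    total = 0
    largest_seen = 0
    # Walk right to left: a symbol smaller than one already seen to its
    # right is subtractive (the I in IX), otherwise it is added.
    for char in reversed(numeral.upper()):
        value = ROMAN_VALUES[char]
        if value < largest_seen:
            total -= value
        else:
            total += value
            largest_seen = value
    return total


numbers = ["IX", "IV", "VIII", "V"]
print(sorted(numbers))                    # text order: ['IV', 'IX', 'V', 'VIII']
print(sorted(numbers, key=roman_to_int))  # numeric order: ['IV', 'V', 'VIII', 'IX']
```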

In my functions, Movement Name is unaffected by dots. The Movement Name field includes everything after the colon, as described earlier. See examples above.

Only Movement # pays attention to dots. See the description of that above. There's even an example for opera.

MikeO · « **Reply #8 on:** March 16, 2021, 07:05:02 am »

Hi a bit late to the party. I agree consistency is key. The problem is classical metadata is far from consistent.

I have somewhere around 90-100k classical tracks, so believe me I’ve seen most variants. Please don’t take this as a downer; your idea is great and clearly works for you.

I started 7 years ago, when I retired, to clean up Composition. Later I started to add Movement, a much messier task.

I used a 3rd party tool, MusiCHI Tagger, which has a tool called MusiCLEAN; this has an online db of composers works etc. Applying MusiCLEAN corrects the Composer, Composition, Composition Date, Opus no, Tonality at one fell swoop. It also has a comprehensive Text Processing tool.

Movement is another kettle of fish, hence why I replied.

If you consider Beethoven, Piano Concerto No.1 in C major Op.15

Now think of the various permutations and the complexity of parsing the variants.

Composition tends to be delimited by : or - but often by space (3)
Movements are numbered I - IV or 1 - 4 and quite often nothing (3 more)
The I or 1 is normally followed by a period but not necessarily, commonly a space

You also get

Concerto for Piano and Orchestra No.1 in C major Op.15
Or just plain
Concerto No.1 in C Major Op.15

The inclusion of Key and Opus No is common but not necessarily universal

Add to that English, German and occasionally French.

This makes the extraction of even a consistent Composition a nightmare

Composer is often blank, more often put in the Artist tag. In many cases not even mentioned.

The Bach example above where the first track is prefixed with the Composition and the rest not is far too common

Parsing in code is possible using String.Split(), but applying a period separator to split off the movement would still miss those delimited by a space.

I could go on, indeed I probably will. The scheme proposed depends a lot on a consistent starting point. If that is an existing (large) library, getting that starting point right given the metadata provided by the record labels is a pretty daunting task. I suspect daunting enough to put off all but the devoted.

I would be delighted to pitch in and help; my background is 50 years of exposure to classical music, 10 years of classical metadata and C# development knowledge.

To be honest though, without some form of db lookup to “translate” all the equivalent versions, I do feel a parsing solution is only going to scratch the surface and probably introduce a level of frustration if the starting point is inconsistent, which it often is.

As an aside there are 3 common sources of standardised data

AllMusic
MusicBrainz
Discogs

If one is serious about standardisation of naming conventions, maybe a look at SongKong or MusicBrainz Picard would be a more productive start, where the tools do a lookup against a database. SongKong references MusicBrainz and Discogs.

Before I get too excited, the 3 dbs in question each return a different standard name! Marginally different, but different all the same.

The world of classical music is far from standard.

Sorry if I sound negative, but having been through this pain I have never found anything to yield any level of success other than painful and time consuming manual manipulation.

Mike

JimH · « **Reply #9 on:** March 16, 2021, 07:18:54 am »

One thing that should help is for people who do have well tagged music to submit it to YADB, our online metadata database. Some experimenting would need to be done. Submit a few tracks to YADB, remove the tags on the files, then lookup tracks from YADB.

This process builds a digital signature from the sound of the first few seconds, no matter how it's encoded. It uses this as a key for submission and lookup.

MikeO · « **Reply #10 on:** March 16, 2021, 07:52:21 am »

Before doing an upload it would be great to agree on a “standard” format so that what is uploaded is useful to all. I would gladly upload any of my curated metadata. It took me a while to standardise it, and I am happy for others to benefit.

Can you rescan an existing album against the db?

hoyt · « **Reply #11 on:** March 16, 2021, 11:35:59 am »

Quote from: MikeO on March 16, 2021, 07:05:02 am

Hi a bit late to the party. I agree consistency is key. The problem is classical metadata is far from consistent.

I think this is where I'm getting hung up. In order to make a calculated field, the input has to follow the same rules every time. This is hard enough to do with live rock music, let alone classical. I found a box of a few dozen classical CDs in my basement that I had gone through at the beginning of the pandemic to take to the library. Never got around to it, so I tried to rip them over the weekend to re-familiarize myself with this (I used XLD to rip these on a Mac, not MC). I agree that wer's process is good and works well, but I don't see how that can be replicated "for the masses." For example, I took 5 albums yesterday. Every single track was tagged (automatically) as [Composer]: etc.

I'm not sure that having the Composer in the name is the best method, so I could remove the Composer, but on a compilation CD, it might be helpful. For example, one of the CDs I ripped was like this:

If I played this album, and just saw Piano Concerto #1 and the Album name of Rachmaninov; Tchaikovsky: Piano Concertos, I wouldn't know which composer was being played. So in this case, it makes sense for there to be another delimiter - but that breaks the rule. That's fine, but I would (and in this case did), manually correct the Composition field, and left the name as you see it.

In the set of 5 CDs, one was a compilation of Opera arias. The Name field on these came into MC as [Composer]:[Opera] - [Aria]. This works to put the Opera as the Composition and the Aria as the Movement. Again, this broke the delimiter rule because the first part was the Composer, not the Composition, but that's fine because I manually corrected it. The lack of a consistent delimiter will be challenging to say the least. For the others, the delimiters between the Composition and Movement Number varied between <space><hyphen><space>, <nospace><colon><space>, or <nospace><hyphen><nospace>. This was just 5 CDs chosen at random and I ended up with 4 different schemes. Yes, I can go through and standardize them as wer has done, but without some tooling to help me get to that state, it's really hard to imagine most people doing that.

In another example (that I had already ripped and tagged), I have a handful of tracks where the Name and Composition are two distinct items:

I know the piece Let the Bright Seraphim as that, not as Samson: Act 3, X. Let the Bright Seraphim. But having the Opera name in the Composition field may be useful for searching later, so I want to have it tagged as such.

My opinion is that things like Composer, Composition, Movement, and Movement Number are discrete elements. They should be treated as such. You can derive a discrete element from a name field, but you should not keep a discrete element as a derived element.

MikeO · « **Reply #12 on:** March 16, 2021, 11:48:14 am »

Your small random sample shows exactly what I’m talking about

Scale that to 100,000 tracks and you can see the magnitude of the problem: there simply isn’t a standard, so writing a fixed Expression is going to lead to even more errors in splitting, with resultant frustration.

I am not trying to decry the effort, just outlining my experience to date. Scale your 5 albums to 1,000 and the spread is unlikely to change.

Classical metadata is a mess of note ...

Without some specific tools I can’t see how this can scale to a library the size of mine 😎

timwtheov · « **Reply #13 on:** March 16, 2021, 01:52:39 pm »

Or mine: I have somewhere in the neighborhood of 390,000 classical tracks (because I'm insane).

Wer's system is great and very doable with a) smaller libraries and b) libraries that are mostly restricted to certain composition types (symphonies, concertos, sonatas, etc.), which tend to be tracked the same on most recordings. With operas, suites from larger works, other vocal works like oratorios, masses, and the like, it becomes really hard to standardize across albums because recordings often track things very differently, even if there's a relative standard regarding a composition's parts (Richard Strauss's operas, for example, seem to be tracked fairly consistently across different recordings; not so Wagner, Mozart, Verdi, et al).

Most fields can be standardized via tagging through Allmusic or Musicbrainz or Musichi--composer, composition, etc.: I've done it with Allmusic; see image--but it's really hard to get standardized [Name] fields across albums for many types of compositions if you're using something like "composition: movement" scheme even with these tools. Mine's pretty close, but some tracks come in with [Name] as "1. Allegro con brio," or "I. Allegro con brio," or "Allegro con brio" or even "Movement 1: Allegro con brio." Even doing the easier types of compositions can be time-consuming if you have a large collection (especially when one was inconsistent with it at the beginning), but for operas, etc., it's a nightmare because little is standardized from the outset.
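The four [Name] variants listed above can be brought to one shape with a small amount of pattern matching. The following is a minimal Python sketch run outside MC, not an MC expression; the function name `split_movement` and the choice to return a (number, title) pair are assumptions made for illustration.

```python
import re

# Optional "Movement " prefix, then an Arabic or Roman number,
# then "." or ":" and the title. Hypothetical pattern for illustration.
PREFIX_RE = re.compile(r"^(?:Movement\s+)?(\d+|[IVX]+)[.:]\s+(.+)$")

def split_movement(text):
    """Return (number, title); number is '' when no numbering is present."""
    text = text.strip()
    m = PREFIX_RE.match(text)
    if m:
        return m.group(1), m.group(2)
    return "", text

for variant in ["1. Allegro con brio", "I. Allegro con brio",
                "Allegro con brio", "Movement 1: Allegro con brio"]:
    print(split_movement(variant))
```

Tracks that come back with an empty number are the ones that still need a manual decision; the rest can be rewritten into whichever template you have chosen.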

wer · « **Reply #14 on:** March 16, 2021, 03:03:20 pm »

Quote from: MikeO on March 16, 2021, 07:05:02 am

I could go on , indeed I probably will. The scheme proposed depends a lot on the consistent starting point . If that is an existing (large) library, getting that starting point right given the metadata provided by the record labels is a pretty daunting task . I suspect daunting enough to put off all but the devoted.

Mike, this seems to be your overall point, and it's one that I made in my post. You have to impose consistency on your data. Any system requires consistency.

Classical metadata in online databases is all over the place. That's the real world. No one is going to fix that for us.

When I started amassing my classical collection, I realized from the beginning consistency would be required. Every time I ripped an album, if the retrieved metadata was inconsistent, I made it consistent. Thus, I ate the elephant one bite at a time.

If instead, someone decided they would just accept whatever metadata they retrieved from the internet, and kept doing that, then at the end they would have a large classical collection, very inconsistently tagged. They would have a whole elephant sitting on their plate. Eating it would seem daunting.

Systems require consistency. If you don't want to make your data consistent, that's not a problem with the system.

There is not going to be any magic DB to "translate" all the different ways of naming classical music, because that would require AI natural language recognition: there are as many ways of writing track names as there are capricious individuals to submit them. Who's going to write such a thing and make it available for free? So we will all have to fix our own metadata, or not.

I probably did not spend more time overall than would be required for you to fix your data inconsistencies, I just did it a little bit at a time over years, every time I acquired music.

Good luck...

wer · « **Reply #15 on:** March 16, 2021, 04:25:23 pm »

Regarding Tim and Mike's posts:

Quote from: MikeO on March 16, 2021, 11:48:14 am

Your small random sample shows exactly what I’m talking about

Scale that to 100,000 tracks and you can see the magnitude of the problem: there simply isn’t a standard, so writing a fixed Expression is going to lead to even more errors in splitting, with resultant frustration.
...
Without some specific tools I can’t see how this can scale to a library the size of mine 😎

Quote from: timwtheov on March 16, 2021, 01:52:39 pm

Or mine: I have somewhere in the neighborhood of 390,000 classical tracks (because I'm insane).

Wer's system is great and very doable with a) smaller libraries and b) libraries that are mostly restricted to certain composition types (symphonies, concertos, sonatas, etc.), which tend to be tracked the same on most recordings. With operas, suites from larger works, other vocal works like oratorios, masses, and the like, it becomes really hard to standardize across albums because recordings often track things very differently,

Guys, I'm sorry, but this just isn't correct.

It reminds me of a bibliophile who has amassed a collection of 100,000 books, which he has spread throughout his house, according to whatever was expedient at the time. Years later, he learns about the Dewey Decimal System and LC System. And he says, Oh, those can never work for large libraries like mine.

Although I don't have a lot of opera, I do have some. I have a lot of masses, vespers, and other sacred and choral works. There is just no problem at all comporting those with my system. There just isn't.

Moreover, if you don't like my exact system, just design your own, to fit your own sensibilities. But you can't do that, in terms of it being automatically calculated, unless your data is consistent.

The problem isn't the size of the library, it's the librarian.

I don't mean that as a slight on you; I'm just saying that the owner of the library has to organize it and make it consistent. If you don't have consistency, you can't automate. And if you do have consistency, you can automate, regardless of the size of the library.

So it's not correct to say systems like this don't work for large libraries or libraries with lots of different types of classical music. What is correct is to say that systems like this don't work for libraries with disorganized or inconsistent metadata. Or perhaps more accurately, libraries with disorganized or inconsistent metadata don't allow for simple automated systems like this.

As I said before, I chose to normalize metadata every time I brought in new music, instead of putting it off until later. You saved time early in the process, whereas I chose to save my time later in the process.

I get what you're saying about you wish there were better tools available to help you "fix" large libraries. I have proposed some to be added to MC. But I didn't have such tools when I started my collection, which is why I made the choices I did.

You may find, if you approach it in an organized way, that the tools in MC (search & replace, expression language) are sufficient to make the process doable. Perhaps you can't eat the elephant all in one bite, but a leg at a time might work, with scraps to clean up at the end. It just depends on how much of a jumble you have. I sympathize.

wer · « **Reply #16 on:** March 16, 2021, 05:30:54 pm »

Quote from: hoyt on March 16, 2021, 11:35:59 am

I think this is where I'm getting hung up. In order to make a calculated field, the input has to follow the same rules every time. This is hard enough to do with live rock music, let alone classical.

I don't find it hard to do at all. I've tried to explain previously how to do it.

Quote from: hoyt on March 16, 2021, 11:35:59 am

I agree that wer's process is good and works well, but I don't see how that can be replicated "for the masses." ...

If I played this album, and just saw Piano Concerto #1 and the Album name of Rachmaninov; Tchaikovsky: Piano Concertos, I wouldn't know which composer was being played. So in this case, it makes sense for there to be another delimiter - but that breaks the rule. ..

In the set of 5 CDs, one was a compilation of Opera arias. The Name field on these came into MC as [Composer]:[Opera] - [Aria]. This works to put the Opera as the Composition and the Aria as the Movement. Again, this broke the delimiter rule because the first part was the Composer, not the Composition, but that's fine because I manually corrected it. The lack of a consistent delimiter will be challenging to say the least. For the others, the delimiters between the Composition and Movement Number varied between <space><hyphen><space>, <nospace><colon><space>, or <nospace><hyphen><nospace>.

This is what the Find & Replace (F&R) tool is for.

As an example, take your first screenshot, of the Rachmaninov & Tchaikovsky album. I could fix this album very easily in two quick steps, as follows (ignore the quotation marks; they're there for readability):

F&R: ":" -> ";"
F&R: " - " -> ": "

That leaves the composer there, puts a semicolon after the composer, and delineates the composition from the movement info with a colon. So in about 10 seconds, it's consistent with my expressions as they are.
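To make the effect of those two steps concrete, here is a minimal Python sketch that applies the same two replacements in the same order. It is an illustration outside MC, and the track name used is a hypothetical one in the "[Composer]: [Composition] - [Movement]" shape, since the screenshot itself is not reproduced here.

```python
def normalize_name(name):
    """Apply the two Find & Replace steps, in order."""
    # Step 1: the colon after the composer becomes a semicolon.
    name = name.replace(":", ";")
    # Step 2: the " - " between composition and movement becomes ": ".
    name = name.replace(" - ", ": ")
    return name

# Hypothetical example name, not taken from the actual album.
before = "Tchaikovsky: Piano Concerto #1 - I. Allegro non troppo"
print(normalize_name(before))
```

The order matters: if the second step ran first, the colon it introduces would be turned into a semicolon by the other step.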

If you wanted, you could use listitem to then break out the composer and put parentheses around it, so instead of "Rachmaninov;" it would be "(Rachmaninov)". And you could modify my Composition expression to ignore the part of the name up to the first semicolon, or to ignore the first part that was in parentheses.
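The same idea can be sketched in Python to show what the result would look like. This is not an MC expression; the function names are hypothetical, and the sketch assumes every name starts with a "Composer;" prefix (a name with a semicolon elsewhere but no composer prefix would be split in the wrong place).

```python
def split_composer(name):
    """Split 'Composer; rest' at the first semicolon; composer is '' if absent."""
    composer, sep, rest = name.partition(";")
    if not sep:
        return "", name.strip()
    return composer.strip(), rest.strip()

def parenthesize_composer(name):
    """Rewrite 'Composer; rest' as '(Composer) rest'."""
    composer, rest = split_composer(name)
    return f"({composer}) {rest}" if composer else rest

def composition(name):
    """Ignore everything up to the first semicolon, then take the text before the colon."""
    _, rest = split_composer(name)
    return rest.split(":", 1)[0].strip()

# Hypothetical example name.
name = "Rachmaninov; Piano Concerto #2: I. Moderato"
print(parenthesize_composer(name))
print(composition(name))
```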

You can do whatever you want. Or you can learn to use Regex() for even greater flexibility.

The goal of my post, as I think I tried to describe, was to show people that they could do this sort of thing, and a way to do it. There's no need to do exactly what I did; you can do whatever you want.

For example, if one were to refuse to accept that the same naming syntax can work for opera as it does for symphonic music, then one could still accommodate that. Since there would be a tag (subgenre or whatever) to differentiate, you could put the Composition or Movement info expressions inside big IfCase() or IfElse() statements so that they can use totally different syntax based on a tag or some other condition.
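As a rough picture of that branching, here is a Python sketch that picks a parser from a tag. The "Subgenre" tag name, the " - " opera convention, and the example names are all assumptions for illustration; in MC the same branching would live inside the calculated field's expression.

```python
def composition_symphonic(name):
    # Default convention: the Composition is everything before the first colon.
    return name.split(":", 1)[0].strip()

def composition_opera(name):
    # Hypothetical opera convention: "Opera - Act - Aria".
    return name.split(" - ", 1)[0].strip()

# One parser per data standard; anything not listed uses the default.
PARSERS = {"Opera": composition_opera}

def composition(track):
    parser = PARSERS.get(track.get("Subgenre", ""), composition_symphonic)
    return parser(track["Name"])

print(composition({"Name": "Die Zauberflöte - Act 2 - Der Hölle Rache", "Subgenre": "Opera"}))
print(composition({"Name": "Symphony No. 5 in C minor, Op. 67: I. Allegro con brio", "Subgenre": "Symphony"}))
```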

Automation Systems work only with normalized data. If you have one data standard, you can have one system. If you want 6 data standards, you can have 6 systems. If you have no data standard, you get no system.

Your comment about discrete vs derived elements doesn't really seem to be about them being discrete, it seems to be about you wanting them to be ad hoc. You want to be able to put in whatever you want. That's fine, but that's arbitrary, not a system. Another option, aside from abandoning automation, would be to have three fields: [Calculated Composition], [Manual Composition], and [Composition]. You could then define [Composition] as FirstNotEmpty([Manual Composition],[Calculated Composition]). Then the [Composition] field would always display your preferred results; either what you manually entered, or the automatic value if you entered nothing.
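The three-field fallback can be illustrated with a short Python sketch. It mimics what FirstNotEmpty() is described as doing above; the helper names are hypothetical, and the calculated value here is simply "everything before the first colon".

```python
def first_not_empty(*values):
    """Return the first value that is not empty or whitespace, else ''."""
    for value in values:
        if value and value.strip():
            return value
    return ""

def composition(track):
    calculated = track["Name"].split(":", 1)[0].strip()
    manual = track.get("Manual Composition", "")
    return first_not_empty(manual, calculated)

# A manual entry wins; without one, the calculated value is shown.
print(composition({"Name": "Let the Bright Seraphim", "Manual Composition": "Samson"}))
print(composition({"Name": "Symphony No. 5 in C minor, Op. 67: I. Allegro con brio"}))
```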

To tie this back in with Tim and Mike's comments:

I think everyone is free to do whatever they want, and whatever works best for them. To post and say "This system can't work for me" seems to be born from one of two intentions: to justify a decision not to use it, or to criticise it as deficient. The former is unnecessary, and the latter is inaccurate. If someone posted a video on woodworking on YouTube, you wouldn't post a comment saying "This tutorial can't work for a lot of people because they don't have a garage; so they'd get sawdust all over their living room."

So I think it's misleading to tell other users things like this system can't work for large libraries, or can't work unless you have only the simplest collection of classical music.

I'm perfectly happy to address implementation questions, or make improvements, or fix problems with how the expressions work, but "it doesn't work with my unstructured data" isn't one of them.

Ultimately, what I'm hearing seems to have more to do with the woeful inconsistency in online metadata. That online metadata for classical is woefully inconsistent is beyond dispute. And you will not get consistency in online metadata. Because you will not get a billion other people to agree on a standard. Any standard. So you can either fix the metadata yourself, as I did, or you can live with the effects of garbage in garbage out.

It is possible to standardize your data with effort, and it is possible to implement automatic systems with standardized data. I showed people one way of doing that, because some people want to learn and are willing to put in that effort. But everyone is free to develop a different system that better fits their needs, or to do nothing at all if that is their preference.

It's perfectly reasonable to make the decision for oneself that "oh, standardizing all this data is just too much work". I know from past conversations that Tim, for example, cares quite a bit about his classical music and data. But whether he thinks it's worth it to put in the effort to fix all the inconsistencies or not, I don't know. That's up to him. Maybe people who would want to, if it were just a bit easier, should start threads asking how they can standardize their data, just like people are always asking the best way to move their files.

The point of my thread was that, once you get some standardization, you can do a lot with it.

Good luck with whatever you decide to do...

timwtheov · « **Reply #17 on:** March 16, 2021, 06:16:01 pm »

Good rebuttal, Wer!

I think maybe I didn't emphasize what I wanted to emphasize in my write-up (and in retrospect, maybe it doesn't belong in this thread at all, as it's not really a critique of your tutorial, despite what I said at the beginning of my post above): it's not that I can't standardize a particular opera or have a particular way of doing operas in general, but since different recordings of the same opera are often tracked very differently, it becomes difficult if not impossible to make standardized meta-data across different recordings of the same work's movements/parts. With symphonies, concertos, etc., this is easy to do because they're almost always tracked the same. Operas, suites, et al are often not.

Example: two recordings of Wagner's Flying Dutchman. Karajan's on EMI has 20 total tracks, Bohm's on DG has 29. Already there's an issue with standardization. Track 1 on both is the Overture, so that seems fine; track 2 on Karajan is "Mit Gewitter und Sturm" and on Bohm it's "Hohoho! . . .," which according to the booklet on Karajan was part of track 1 on his recording. Crap: there goes the standardization. And it goes downhill from there, as every other set of tracks on the two recordings is different in terms of "movement name."

That's all I was getting at with [Name] being difficult, if not impossible to standardize, again, across recordings of the same work, excluding those types I mentioned above, which are usually tracked the same on most recordings. For those, sure, it's not too hard with the expression language and the other tools you mentioned to rectify inconsistent metadata in [Name] or any other field. But as I'm going through and trying to make consistent my own track names, this is the problem I'm running into. I can (and do) still use a basic template for [Name]: composition: movement #. movement name (well, truth be told, I was inconsistent with movement #; that's definitely on me).

So I wasn't really trying to critique your system, but when someone has differently tracked recordings again and again in his/her large library because of the way record companies divided up tracks on their discs, an inconsistency introduced at the source I obviously have no control over, the kind of standardization you're after becomes problematic. It doesn't make the system irrelevant or not worthwhile for others, and I hope I didn't give that impression, as the tutorial you wrote is excellent and one I likely would've used had I not gone down the AMG/MCUtils road.

wer · « **Reply #18 on:** March 16, 2021, 06:53:46 pm »

I understand what you mean. I have a few recordings that fall into that category. By the way, I like the verb "tracked". That's an excellent way to describe how they actually divide it on the disc. So I will adopt it.

What I have generally done, when I have encountered that situation, is to be a bit more flexible in my labeling. I mean after all, if the engineers are just going to lay it on the disc willy-nilly, then there's only so much you can do.

So for example, I have some tracks where I have listed the "movement number" as 1;2;3 or I-III, because they combined them in one track. Or if they split a movement across multiple tracks, then I might have three tracks as 2a, 2b, and 2c. I constructed the expression so it could handle things like that. I have movement #s of "Part I, 18" if that's what the piece calls for.

Perhaps I should explicitly state (although it is implied in the original post, which even gives an example from opera) that I am not trying to force all movement numbers down to an integer. That would be, to me, too difficult.

So I define the movement number as you see in the expression, and that allows great flexibility in the way those details are listed in the name.

It's also worth mentioning one of your points: If two different recordings of the same piece of music are "tracked" differently, then it will be impossible to make the metadata for the respective tracks match in every detail, and you shouldn't try. Fundamentally for MC the ultimate unit is the track, and that's that. The only thing you could do would be to start combining or splitting tracks to get around that. I don't do it.

So I think the problem lies in the belief that "if you impose a system, two different recordings of the same opera should have matching metadata for each track". But it's not the system that's the problem, it's the belief. Ultimately, the way things are tracked is arbitrary, by the engineer or whoever did it: you cannot guarantee a match. So the system works if you accept that it should present metadata reasonably for that recording. It's not possible to match across differently tracked recordings. But massage the data a bit, and you can arrive at something very reasonable.

So here's exactly what I did for Beethoven's Piano Sonata #12. Ashkenazy and Kovacevich tracked it differently:

Ashkenazy
Sonata No. 12 in A flat major 'Funeral March', Op. 26: I. Andante con variazioni
Sonata No. 12 in A flat major 'Funeral March', Op. 26: II. Scherzo & Trio; Allegro molto
Sonata No. 12 in A flat major 'Funeral March', Op. 26: III. Marcia funebre sulla morte d'un eroe
Sonata No. 12 in A flat major 'Funeral March', Op. 26: IV. Allegro

Kovacevich
Sonata No. 12 in A flat major 'Funeral March', Op. 26: I-1. Andante con variazioni
Sonata No. 12 in A flat major 'Funeral March', Op. 26: I-2. Variazione 1
Sonata No. 12 in A flat major 'Funeral March', Op. 26: I-3. Variazione 2
Sonata No. 12 in A flat major 'Funeral March', Op. 26: I-4. Variazione 3
Sonata No. 12 in A flat major 'Funeral March', Op. 26: I-5. Variazione 4
Sonata No. 12 in A flat major 'Funeral March', Op. 26: I-6. Variazione 5
Sonata No. 12 in A flat major 'Funeral March', Op. 26: II. Scherzo (Allegro molto)
Sonata No. 12 in A flat major 'Funeral March', Op. 26: III. Marcia funebre sulla morte d'un Eroe (Maestoso andante)
Sonata No. 12 in A flat major 'Funeral March', Op. 26: IV. Allegro

One kept all the variations in the first track, the other didn't. So that's how I tagged it, and it works fine within the system.

I just make sure that the information I put in the different delimited positions of the Name conveys the appropriate information.
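To show how both track lists above fall out of the same delimited positions, here is a minimal Python sketch. It is not the expression from the original post (which is not reproduced in this part of the thread); it simply assumes the Composition is the text before the first colon and the movement number is whatever precedes the first ". " in the remainder.

```python
def parse_name(name):
    """Return (composition, movement_number, movement_title)."""
    composition, _, movement = name.partition(":")
    movement = movement.strip()
    number, sep, title = movement.partition(". ")
    if not sep:
        # No ". " found: treat the whole remainder as the title.
        number, title = "", movement
    return composition.strip(), number, title

prefix = "Sonata No. 12 in A flat major 'Funeral March', Op. 26: "
for movement in ["I. Andante con variazioni", "I-3. Variazione 2",
                 "II. Scherzo & Trio; Allegro molto"]:
    print(parse_name(prefix + movement))
```

Because the number is taken as free text rather than forced into an integer, values such as "I-3", "2a" or "Part I, 18" pass through unchanged, and both recordings still resolve to the same Composition. A title that has no number but does contain ". " would be split incorrectly under this assumption.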

hoyt · « **Reply #19 on:** March 16, 2021, 07:07:07 pm »

Quote from: wer on March 16, 2021, 05:30:54 pm

There's no need to do exactly what I did; you can do whatever you want.

But you are proposing that MC do something with your method (#1 from here: https://yabb.jriver.com/interact/index.php/topic,128860.msg894607.html#msg894607). I think it would be unhelpful of MC to add this automatic parsing. Too often it would be wrong on the initial metadata, and I do not think most users would go and update their [Name] tags to reflect that. I understand that's what you're proposing, I just don't think people will do that. Unless MC adds the tooling to help users tag these items discretely upfront, they will not go writing regex to update the [Name].

I think there is nothing wrong with what you've done and in fact think quite the opposite. I am not critiquing your method. I have learned from what you did and it helped me set some of these tags in my library. It's led me to see the value in having [Composition] be a relational item. It makes sense to me that Compositions within a single Album operate as linked tracks. I think it would be great to see MC add more things into the "Fill Track Numbers from List Order..." function (like [Movement Number], or custom fields like [Movement #]). I think it would be really great to see an extra grouping in the main 'files' dialog by Composition (rough Photoshop copy/ paste, maybe you'll get the idea):

You're proposing changes to the way that MC operates based on how you've established these tags. In my mind, more users would update the individual tags manually instead of adopting a precise convention for entering the [Name] tag.

wer · « **Reply #20 on:** March 16, 2021, 07:42:08 pm »

As I explained in the other thread, the automatic parsing, to which you object, is one point out of seven (now six). Everyone wanted to complain about automatic parsing, and no one could look at the big picture.

And I would also remind you that the automatic parsing proposed in the original thread was for Composition only. I quote:

Quote

2. The ability to automatically recognize the Composition from the track names. I have described this formula before:
...
Movement info is extraneous and not required for Composition support.

Which meant a single colon, which is a widely used standard, not just mine. People who think a single colon can't be achieved in their names, those are people who can't be helped. I'm sorry you won't support a colon.

To be quite honest, I have completely lost interest in advocating for automatic parsing as part of MC. There has been too much complaining that "it doesn't work for me" instead of constructive contributions on a better standard. It's not worth my time to continue to fight for one small part of the enhancement, to the exclusion of everything else.

You see, I am perfectly capable of writing my own expressions to parse however I want. A lot of people are not. So I will not suffer in the slightest if there is no automatic parsing. Other people, less experienced users, newer users, users with less proficiency in the expression language, will suffer. I won't.

This thread exists to help people who want to do it, since they won't be getting any automation. People who don't want to do it, well this isn't the thread for them.

So as I said in the other thread, I am perfectly willing to move on and focus on the other points. And as far as the other points are concerned, the objections you have raised to standardization are irrelevant.

All the other benefits depend only on MC recognizing a Composition field and doing things accordingly. Re-read the other thread if you are unclear on that. So people can populate the Composition field however they want, and if they don't know how, they can just type it in manually or do nothing like they're doing now.

People don't seem to recognize that perfect is the enemy of good. Everyone wanted something perfect for themselves, so there couldn't be any agreement, so there won't be any automation at all. So this thread isn't here to talk about why there shouldn't be any automation. The nay-sayers have won that argument already; there won't be any.

This thread is here to help the people who want to learn to do that automation for themselves.

MikeO · « **Reply #21 on:** March 17, 2021, 02:19:23 am »

Sorry, I was not decrying your effort in any way. I agree fiddling in Expression Language and the like is not everybody's forte. I was simply trying to point out the mess and variability in classical metadata.

In my opinion, knowing what I do of such metadata, automatic parsing would likely create a mess that needs to be corrected in a large proportion of cases, and would not lead to the standardization you want.

If I could make one hopefully constructive comment.

Given that each new input to the library needs some manual intervention to standardize the [Name] so that it can be parsed and split across the colon to its component parts

Given that Classical Metadata is a MESS of note

Given that most Music Players MC included display the [Name] tag when playing

Would a reverse process have more value? Spend the manual time and effort ensuring that [Composer], [Composition], [Key], [Opus No.], [Movement #], [Movement] and [Nickname] are standardized and correct, and then from those Component Tags construct the [Name] tag to a standard of your choice.

eg [Name] tag = [Composer]: [Composition] in [Key] Op.[Opus No]: [Movement #]. [Movement]

That way what is displayed is standardized. Manual intervention would be the same either way
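A minimal Python sketch of that reverse process, building the display name from discrete tags in the order given above. The tag values are illustrative only, and skipping empty tags is an assumption about how one might want gaps handled.

```python
def build_name(tags):
    """Assemble [Name] from discrete tags, leaving out any that are empty."""
    work = tags["Composition"]
    if tags.get("Key"):
        work += f" in {tags['Key']}"
    if tags.get("Opus No."):
        work += f" Op.{tags['Opus No.']}"
    movement = tags.get("Movement", "")
    if tags.get("Movement #"):
        movement = f"{tags['Movement #']}. {movement}"
    name = f"{tags['Composer']}: {work}"
    if movement:
        name += f": {movement}"
    return name

# Illustrative tag values.
print(build_name({
    "Composer": "Schumann", "Composition": "String Quartet No. 1",
    "Key": "A minor", "Opus No.": "41", "Movement #": "1", "Movement": "Allegro",
}))
```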

Just a thought

PhDSM · « **Reply #22 on:** March 17, 2021, 01:21:28 pm »

Here is my contribution to this subject

Regarding tagging, I've created a set of regex formulas that I use to extract the opus, movement #, and work # directly from 'name'.
Here are some examples for those who are interested:

From a name like "String Quartet No1 in A minor Op.41, I allegro"

to extract opus
=replace(regex([name], /#^(.*\W|)(Op\.?\s?[0-9]{1,3})(\W.*){0,1}#/,2,0),/ ,)

to extract BWV / BMV
=replace(regex([name], /#^.*\W(B[WM]V\.?\s?[0-9]{1,4})(\W.*){0,1}#/,1,0),/ ,)

to extract K. or KV.
=replace(regex([name], /#^.*\W(KV?.?\s?[0-9]{1,3})(\W.*){0,1}#/,1,0),/ ,)

To extract composition #:
=regex([name], /#^.*(?:No|n°)\.{0,1}\s{0,1}(\d+).*#/,1,0)

to extract the movement number in Arabic form from the Roman numeral in the name:
=listfind(0;I;II;III;IV;V;VI;VII;VIII;IX;X;XI;XII;XIII;XIV;XV;XVI;XVII;XVIII;XIX;XX;XXI;XXII;XXIII;XXIV,regex([name],/#^(.*\W|)([IVX]{1,5})\W.*#/,2,0),,1,1)

to extract the movement name after the roman number :
=regex([name], /#^(.*\W|)[IVX]{1,4}\W(.*)#/,2,0)

to extract the tone note :
=regex([name], /#^.* in ([ABCDEFG])(\W.*){0,1}#/,1,0)

to extract the tone scale type (Min/Maj) :
=regex([name], /#^.*\W[ABCDEFG] (sh |sharp |fl |flat |)(Min|maj)(or){0,1}(,{0,1}\W.*){0,1}#/,2,0)

to extract the tone note alteration (Sharp/Flat)
=if(regex([name], /#^.* in [ABCDEFG]\W(sh)(arp){0,1}(\W.*){0,1}#/,0,0),sh,if(regex([name], /#^.* in [ABCDEFG]\W(fl)(at){0,1}(\W.*){0,1}#/,0,0),fl,))
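For anyone who wants to try the patterns outside MC before creating fields, here is a Python sketch of two of them (opus and movement number) using the standard `re` module. It approximates the expressions above rather than reproducing MC's Regex() exactly: whole-string matching is assumed, and the Roman numeral pattern is kept case-sensitive here, which is a deliberate choice in this sketch.

```python
import re

# Lookup list mirroring the listfind() list above; the index is the Arabic value.
ROMAN = ("0;I;II;III;IV;V;VI;VII;VIII;IX;X;XI;XII;XIII;XIV;XV;XVI;"
         "XVII;XVIII;XIX;XX;XXI;XXII;XXIII;XXIV").split(";")

OPUS_RE = re.compile(r"(.*\W|)(Op\.?\s?[0-9]{1,3})(\W.*)?", re.IGNORECASE)
ROMAN_RE = re.compile(r"(.*\W|)([IVX]{1,5})\W.*")

def opus(name):
    """Return the opus with spaces removed, e.g. 'Op.41', or '' if none."""
    m = OPUS_RE.fullmatch(name)
    return m.group(2).replace(" ", "") if m else ""

def movement_number(name):
    """Return the movement number in Arabic form as a string, or '' if none."""
    m = ROMAN_RE.fullmatch(name)
    if not m or m.group(2) not in ROMAN:
        return ""
    return str(ROMAN.index(m.group(2)))

name = "String Quartet No1 in A minor Op.41, I allegro"
print(opus(name), movement_number(name))
```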

Phil

wer · « **Reply #23 on:** March 17, 2021, 01:41:04 pm »

Thanks, Phil.

For anyone who thinks these expressions work for them, you can use them to create calculated data fields as described in my original post.

If there's a catalog number, personally I just leave it as part of the Composition. I like it to be searchable, but I don't have any need for it in a separate field.

The different catalog number expressions could also be combined to return the catalog number regardless of type.
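One way to picture that combination is to try each catalog pattern in turn and return the first hit. The Python sketch below does this with simplified patterns of my own (word boundaries instead of the anchored forms Phil posted), so treat it as an illustration of the idea, not a drop-in translation.

```python
import re

# Simplified, hypothetical patterns: opus, BWV, and Köchel (K. / KV).
CATALOG_PATTERNS = [
    r"\bOp\.?\s?\d{1,3}\b",
    r"\bBWV\.?\s?\d{1,4}\b",
    r"\bKV?\.?\s?\d{1,3}\b",
]

def catalog_number(name):
    """Return the first catalog number found, with spaces removed, or ''."""
    for pattern in CATALOG_PATTERNS:
        m = re.search(pattern, name, re.IGNORECASE)
        if m:
            return m.group(0).replace(" ", "")
    return ""

# Illustrative names.
for name in ["Symphony No. 5 in C minor, Op. 67: I. Allegro con brio",
             "Mass in B minor, BWV 232: Kyrie",
             "Piano Concerto No. 21 in C major, K. 467: II. Andante"]:
    print(catalog_number(name))
```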

glynor · « **Reply #24 on:** March 17, 2021, 03:41:15 pm »

You could also use them as a Tag on Import rule and they'll auto-parse and assign to the relevant fields at Import time. That means you'd need to normalize your [Name] tag before import, but if you already do that...
https://wiki.jriver.com/index.php/Tag_on_Import

oie · « **Reply #25 on:** March 26, 2021, 05:52:13 pm »

Many thanks wer, great post.
Very helpful for people like me who don't know how to create/use expressions.
Not perfect, nothing is, but great way of standardising tags in works like concertos, sonatas and symphonies.
Thanks again,
Oscar

EnglishTiger · « **Reply #26 on:** April 25, 2021, 09:18:42 am »

Unlike some people around here, my primary motive for moving my Classical Music Collection into MC was so that I could listen to it without having to get up every now and then to change the disc in the CD player. Like everybody else, I've ended up with metadata that is a total mess, mainly because there isn't a single scrapable website that provides metadata in a single consistent format, and there probably never will be. I've even got some tracks where the track name is in Japanese; even though it was a CD that I ripped, when I put those track names into a translator it tells me they came from a DVD. Oh, and the site that provided that metadata charges. I am also aware that sorting out that mess is going to take time and effort, something I'm prepared to do, as my grandchildren will probably be dead long before anybody comes up with a way of doing it automatically.

After reading through Wer's original posting, spending some time thinking about it, seeking some clarification, and deciding what tags I wanted/needed, I worked out that with some slight modifications Wer's methodology could be used on every different type of Classical Music. Yes, I know that things like Ballet, Opera and a few other Composition Types use Acts, Scenes, Parts, etc., but unless you are determined to add a Tag for every variant of what is technically a "Movement", does it really matter if they appear in MC alongside a different "Tag Name"?

The fields/tags I decided on, that could be Automatically Parsed from the Name tag/field, were Composition, Composer's Catalogue #, Movement Name, Movement # and Total Movements (a replacement for the Movement Count tag).

Just like Wer I decided that everything before the 1st : in the Track Name, including the opus/catalogue number was the Composition and everything after it was the Movement.
But since I also wanted to extract the opus/catalogue number info, if it was present, to a separate tag I placed a ; in front of it.

However I differ slightly from Wer when it comes to breaking down the "Movement".
When it comes to things that use Acts, Scenes, Parts, etc., I don't want it trying to extract any part of the "Movement Name" to form the "Movement #", so I remove any periods "." from the Movement part of the Name. Likewise, there is an inconsistency when it comes to using either Roman or Arabic Numerals for Movement Numbers, so I decided to only use Arabic Numerals followed by a period ".", with no prefixes like No. etc.

So in my Classical Music Collection you will find these 3 Tchaikovsky Tracks:-
The Nutcracker Suite; Op. 71a, TH 35, ČW 32: 4. Danse russe - Trépak
The Nutcracker; Op. 71, TH 14, ČW 14: Act 2 No 12 Divertissement Trepak - Russian Dance
Symphony No. 1 in G minor "Winter Dreams"; Op. 13, TH 24, ČW 21: 1. Allegro tranquillo

And yes the 1st 2 are the same piece of music but the 1st one is from the Concert Suite, and has a movement number whilst the 2nd one is from the Full Ballet Score and doesn't.

The other change I made was to modify Wer's Calculated Expressions to restrict MC from doing anything with tracks whose Genre is not "Classical"

For "Composition" I use

Code: [Select]

If(IsEqual([Genre],Classical,1),ListItem([Name],0,:),)
For Composer's Catalog #

Code: [Select]

If(IsEqual([Genre],Classical,1),ListItem(ListItem([Name],1,;),0,:),)
For Movement Name

Code: [Select]

If(IsEqual([Genre],Classical,1),ListItem([Name],1,:),)
I'm not sure if I've got the Expression wrong or there's an error in the way ListItem works but I've found that if the Movement Part of the Name contains a : only the part before that : ends up in the Movement Name
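One plausible explanation, assuming ListItem treats the name as a list split at every occurrence of the delimiter, is that item 1 is only the text between the first and second colon. The Python sketch below contrasts that behaviour with splitting at the first colon only; it is an analogy, not a statement about MC's internals, and the track name is hypothetical.

```python
def movement_by_list_item(name):
    """Item 1 of the name split at EVERY colon (the assumed ListItem behaviour)."""
    items = name.split(":")
    return items[1].strip() if len(items) > 1 else ""

def movement_after_first_colon(name):
    """Everything after the FIRST colon, later colons included."""
    _, sep, rest = name.partition(":")
    return rest.strip() if sep else ""

# Hypothetical name whose movement part contains its own colon.
name = "The Nutcracker; Op. 71: Act 2: Divertissement"
print(movement_by_list_item(name))
print(movement_after_first_colon(name))
```

If that is what is happening, the practical options are to avoid colons in the movement part or to use an expression that keeps everything after the first delimiter.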

For Movement #

Code: [Select]

If(IsEqual([Genre],Classical,1),If(IsEqual([Movement Name],.,8),ListItem([Movement Name],0,.),),)
Total Movements - since the Same Composition can appear on Different Albums and by Different Artists even on the Same Album I Use

Code: [Select]

If(IsEqual([Genre],Classical,1),ItemCount(/[Album/]/[Artist/]/[Composition/]),)
I use Find & Replace Edits for everything else, like converting Movement Numbers from Roman to Arabic Numerals or removing non-required punctuation marks.
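For readers who want to check what those expressions would produce before creating the fields, here is a Python sketch that mirrors them on a track represented as a dict of tags. It makes a few assumptions that should be verified in MC itself: that the Genre test is case-insensitive, that surrounding spaces are trimmed, and that the grouping for Total Movements is a plain count per Album, Artist and Composition.

```python
from collections import Counter

def parse(track):
    """Mirror the four calculated fields for one track."""
    fields = {"Composition": "", "Catalogue #": "", "Movement Name": "", "Movement #": ""}
    # Assumption: the Genre comparison ignores case.
    if track.get("Genre", "").lower() != "classical":
        return fields
    name = track["Name"]
    by_colon = name.split(":")
    by_semicolon = name.split(";")
    fields["Composition"] = by_colon[0].strip()
    if len(by_semicolon) > 1:
        fields["Catalogue #"] = by_semicolon[1].split(":")[0].strip()
    if len(by_colon) > 1:
        fields["Movement Name"] = by_colon[1].strip()
    # A movement number is only extracted when the movement part has a period.
    if "." in fields["Movement Name"]:
        fields["Movement #"] = fields["Movement Name"].split(".")[0].strip()
    return fields

def total_movements(tracks):
    """Count classical tracks per (Album, Artist, Composition) group."""
    return Counter(
        (t["Album"], t["Artist"], parse(t)["Composition"])
        for t in tracks
        if parse(t)["Composition"]
    )

track = {"Genre": "Classical",
         "Name": "The Nutcracker Suite; Op. 71a, TH 35, ČW 32: 4. Danse russe - Trépak"}
print(parse(track))
```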

Oh - this morning I used the above methods to get the Names of every movement from Franz Joseph Haydn's 107 Symphonies (a total of 414 individual tracks/files) into the format I want them to use.

So yes getting the mess sorted out is going to take time and manual effort but so does making a cup of coffee, or switching on my PC and Speakers so that I can listen to my music.
