INTERACT FORUM


Author Topic: Library field and how to trim a field -> get a text from a multi value field

henning65

  • Junior Woodchuck
  • **
  • Posts: 65

Problem solved --> see last post
------------------

Hello

I would like to ask for advice on how to set up an expression to calculate a library field.

Step 1
I set up a costume library field PERFORMER to get the performer information from the audio metadata into the MediaCenter Library.
I designed the new library field PERFORMER to pick up the multi-value data from the audio files (this works: I can see the data in the newly created library field PERFORMER). The data are multi-value, with the individual values separated by ";". The orchestra performer is marked by the suffix "(orchestra)".

Step 2
I am trying to extract the names of the performing orchestras from the newly created library field PERFORMER. I have trouble trimming the data and deleting everything not related to the orchestra. To set up the calculation, the only information I have is the word "orchestra" and the separator ";".

I need help setting up the expression.
I would be glad if someone could help.

With regards

wer

  • Citizen of the Universe
  • *****
  • Posts: 2640

Show the expressions you have written so far. Also show the input, and the exact output you desire.

Also, you've been using the wrong word. It's not a "costume" field, it's a "custom" field.  The two words have totally different meanings in English.

henning65

  • Junior Woodchuck
  • **
  • Posts: 65

@wer: thanks for helping!

I started working on the expression but failed, so there is not really anything to show yet.

I'll try to make what I am trying to achieve as clear as possible:

It's about classical music and the multi-value file metadata field "performer" (FLAC). Please find attached a picture of one track, showing the different performers. The picture is taken from a MusicBrainz Picard window: MB Picard is software that tags files based on the MusicBrainz database.

I was able to define a simple JR MediaCenter library field "PERFORMER" that is populated with the values from the file metadata field "performer". Please see picture two, which shows the tags as they appear in JR MediaCenter.

The library field "PERFORMER" now has the relevant information (the fact that an orchestra is performing AND the name of the orchestra), but besides that it also has other content that is not relevant for now.

For the custom library field ORCHESTRA:
IF the track has no orchestra, the field should be empty.
IF the track has an orchestra listed in the "PERFORMER" library field, the field should contain the name of the orchestra.

EXAMPLE:
If the track has the PERFORMER field information:
Alison Browner (alto vocals);Chorus Musicus Köln (choir vocals);Wilfried Jochens (tenor vocals);Markus Schäfer (tenor vocals);Angela Kazimierczuk (soprano vocals);Peter Lika (bass vocals);Franz-Josef Selig (bass vocals);Das Neue Orchester (orchestra)

then the ORCHESTRA field should contain: "Das Neue Orchester"

The problem: the number of values in the multi-value field performer varies from track to track. The position of the orchestra in the list can also differ (it is not always at the end). And the name of the orchestra differs from album to album...

As posted in my first post: to set up the calculation, the only information I have is the word "orchestra" and the separator ";".
I guess the calculation must somehow work like this: search for "orchestra", start two characters to the left of it (skipping the space and the "("), and take everything back to the preceding ";".
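The search idea above can be sketched outside MC. The snippet below is Python, not MC's expression language, and the function name `orchestras` is hypothetical; it simply splits the PERFORMER string on ";" and keeps the text in front of every " (orchestra)" suffix, wherever that entry sits in the list.

```python
# Sketch only: Python, not an MC expression.
PERFORMER = (
    "Alison Browner (alto vocals);Chorus Musicus Köln (choir vocals);"
    "Wilfried Jochens (tenor vocals);Markus Schäfer (tenor vocals);"
    "Angela Kazimierczuk (soprano vocals);Peter Lika (bass vocals);"
    "Franz-Josef Selig (bass vocals);Das Neue Orchester (orchestra)"
)

SUFFIX = " (orchestra)"  # the space, "(" and role that follow the name


def orchestras(performer: str) -> list[str]:
    """Return the name part of every entry that ends with ' (orchestra)'."""
    names = []
    for entry in performer.split(";"):  # one value per ';'-separated entry
        entry = entry.strip()
        if entry.lower().endswith(SUFFIX):  # works at any position in the list
            names.append(entry[: -len(SUFFIX)])  # cut the suffix off
    return names


print(orchestras(PERFORMER))  # ['Das Neue Orchester']
print(orchestras("Peter Lika (bass vocals)"))  # []
```

An empty result list corresponds to the empty ORCHESTRA field asked for above.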

Thanks for your help!

Yours


henning65

  • Junior Woodchuck
  • **
  • Posts: 65

Actually, if I learn how to extract the orchestra information from the multi-value PERFORMER field, I could do interesting things with other information, like:
-- special instruments (classical music)
-- other performers like soloists (classical music)
-- search for special vocal performers for audio drama (sorting like for conductors & other artists)

So ... your help is very welcome!

wer

  • Citizen of the Universe
  • *****
  • Posts: 2640

Quote from henning65:
...
For the custom library field ORCHESTRA:
IF the track has no orchestra, the field should be empty.
IF the track has an orchestra listed in the "PERFORMER" library field, the field should contain the name of the orchestra.

EXAMPLE:
If the track has the PERFORMER field information:
Alison Browner (alto vocals);Chorus Musicus Köln (choir vocals);Wilfried Jochens (tenor vocals);Markus Schäfer (tenor vocals);Angela Kazimierczuk (soprano vocals);Peter Lika (bass vocals);Franz-Josef Selig (bass vocals);Das Neue Orchester (orchestra)

then the ORCHESTRA field should contain: "Das Neue Orchester"
...

Since the number of entries in the string is arbitrary, and the position of the target within the string is also arbitrary, the best way to do this would be to use the newly added Regex mode -2.

This expression will return the value of the orchestra for the input example you gave:
Regex([Performer],/#([\w\s]+)(?= \([Oo]rchestra\))#/,-2)
The [Oo] in Orchestra will match whether the word is capitalized or not.

This code is a good regular expression and it works. I do not at the moment have access to a copy of MC to test, so I can't verify if MC's idiosyncratic implementation of regular expressions can digest this.  If it can't, you can either work it out for yourself or perhaps Zybex will happen along and give you any necessary correction.

You can learn about using Regex with MC here:
https://wiki.jriver.com/index.php/String_Manipulation_Functions#Regex
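To see why accented names can break this pattern, here is the same regular expression run through Python's `re` module. Python's engine is not MC's, and the `ascii_only` switch (which uses `re.ASCII` so that `\w` covers only `[A-Za-z0-9_]`) is an assumption about how MC's `\w` behaves, used only to reproduce the kind of failure reported for accented names. The accented orchestra name is a hypothetical test value.

```python
import re

# wer's pattern without MC's /# ... #/ wrapper.
PATTERN = r"([\w\s]+)(?= \([Oo]rchestra\))"


def find_orchestras(performer: str, ascii_only: bool = False) -> list[str]:
    # Return every capture of PATTERN (roughly what a "return all matches" mode does).
    # ascii_only=True makes \w match only [A-Za-z0-9_]; this is an assumption
    # used to imitate an engine whose \w skips accented letters.
    flags = re.ASCII if ascii_only else 0
    return re.findall(PATTERN, performer, flags)


example = "Peter Lika (bass vocals);Das Neue Orchester (orchestra)"
accented = "Peter Lika (bass vocals);Kölner Kammerorchester (orchestra)"  # hypothetical

print(find_orchestras(example))                    # ['Das Neue Orchester']
print(find_orchestras(accented))                   # ['Kölner Kammerorchester']
print(find_orchestras(accented, ascii_only=True))  # ['lner Kammerorchester']
```

With an ASCII-only `\w`, the match cannot include "ö", so it silently starts after it and returns a truncated name instead of failing outright.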

henning65

  • Junior Woodchuck
  • **
  • Posts: 65

@wer: Thanks for your help. The function works, with some important exceptions:
a) Orchestras whose names contain special letters like éèêëâäãîïí cause malfunctions. I guess these special letters are simply not covered by "\w"?
Do you have any idea how to include these letters?

b) Another malfunction results from a wrong assumption I made: I assumed that only one orchestra would be present on a single album. That's wrong, or let's say the metadata list for some tracks shows more than one orchestra. As a result, two (in one case even three) orchestras populate the library field.

All in all, I believe the data type "list (semicolon delimited)" would suit my goals best. I learned that other MC library fields that might have more than one value (like "conductor", with "conductor A" and "conductor B") simply list the track under both values. This is most likely wrong, but to my judgement this "behavior" of MC is more user friendly.
For the custom MC library field ORCHESTRA: if two orchestras are found in the library field "PERFORMER", MC generates a view where the two orchestras are listed together as one entry, separated by ";". This is not very user friendly, as the idea was to be able to see which albums from a certain orchestra I have (to listen in an orchestra-focused way, you might say); now that's all mixed up again...

I tried to change the MC library field data type from string to "list (semicolon delimited)":
If I change only the setting for "ORCHESTRA", it is not possible: the setting "jumps" back to string.
If I change the setting for both "ORCHESTRA" and "PERFORMER", the calculation no longer works (the field "ORCHESTRA" is empty).


The goals would be:
a) include orchestras with special letters in their names
b) find a way to preserve the data type "list (semicolon delimited)" (only then are the different orchestras separated and listed individually, as they should be)
I hope the Regex function can be used to calculate a list?

I would be happy for any advice.

Have a nice evening.

wer

  • Citizen of the Universe
  • *****
  • Posts: 2640

You'd better start learning regular expressions. :)  Here's one tutorial; there are dozens.
https://regexone.com/

It's good that you partly understood what \w is for. But \w isn't what matches the string Orchestra; that's the later part. Matching accented characters is a common question.  You should google "regex accented characters" and you will immediately see many posts that give you values to substitute for that part in the match string.  Try plugging some of them in until you get what you want. Hint: [A-Z] will match all uppercase non-accented letters.

Report back here with your successful attempt so others can learn along with you, if you want to contribute to the community.

Regarding the list data type, yes, that can be done. It is called type casting. I just helped someone else with this recently. Adding &datatype=[list] to any string (like the output of a regex expression) will cause MC to treat the string as a list.

Read this post for an example:
https://yabb.jriver.com/interact/index.php/topic,127840.msg887089.html#msg887089
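The effect of treating a string as a list can be illustrated with a small grouping sketch in Python. This is not how MC works internally; the album names and the second orchestra are hypothetical, and the sketch only contrasts grouping on the raw string "X;Y" with grouping on its ";"-separated entries, which is the difference henning65 noticed in his ORCHESTRA view.

```python
from collections import Counter

# Hypothetical ORCHESTRA values for three albums; album C lists two orchestras.
orchestra_by_album = {
    "Album A": "Das Neue Orchester",
    "Album B": "Das Neue Orchester",
    "Album C": "Das Neue Orchester;Another Orchestra",
}

# Treated as a plain string, "X;Y" is one opaque value and becomes its own group.
as_string = Counter(orchestra_by_album.values())

# Treated as a ';'-delimited list, each entry is counted separately, which is
# the behaviour &datatype=[list] is described as enabling in MC.
as_list = Counter(
    name for value in orchestra_by_album.values() for name in value.split(";")
)

print(as_string)  # Counter({'Das Neue Orchester': 2, 'Das Neue Orchester;Another Orchestra': 1})
print(as_list)    # Counter({'Das Neue Orchester': 3, 'Another Orchestra': 1})
```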

henning65

  • Junior Woodchuck
  • **
  • Posts: 65

@wer

Thanks!

I ended up adding all the accented (non-ASCII) letters one by one, as I couldn't find an expression capturing all of them. I also had to include "\-".
This is how the expression looks now:

Regex([PERFORMER],/#([\w\s'üýäàáâæãåāéèêëėîïíīìöôòóõœøōßśšÿçćčñńÁÛØ∏ÅÍÏÌÓÙÇflŒÆÄÀÁÂÆÃÅĀÉÈÊËĖÜÛÙÚŪÎÏÍĪÌÖÔÒÓÕŒØŌŚSŠŸÇĆČÑŃ\-]+)(?= \([Oo]rchestra\))#/,-2)&datatype=
I wouldn't have been able to get anywhere near this without your help - thanks for this.

And I will take up your suggestion and write a little "how to".

If you have some advice on how to make the expression a little more elegant, you are welcome!

Thanks!

Henning

wer

  • Citizen of the Universe
  • *****
  • Posts: 2640

You mean something like [À-ÿ] ?

Remember my hint. You can use a range of characters.

Also, look at your most recent post. When you post an expression with type casting here on the forum, you have to put it in a code quote (use the # button in the post editor). You'll notice the forum stripped the trailing list cast from your regex. Others reading it will be misled.

But congratulations on getting your first one working!
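The range hint can be tried out in Python's `re` (a different engine from MC's, so results may not carry over). The class below replaces `\w` with explicit ranges; `À-ÿ` spans U+00C0 to U+00FF, which covers common Western European accented letters but also includes × and ÷, and misses letters such as Ł or ź. The French- and Polish-style orchestra names are hypothetical test values.

```python
import re

# Explicit ranges instead of \w. À-ÿ is U+00C0..U+00FF; it also contains
# × (U+00D7) and ÷ (U+00F7), and it does not contain Ł, Š, ź, etc.
PATTERN = re.compile(r"([A-Za-z0-9À-ÿ\s'\-]+)(?= \([Oo]rchestra\))")


def find_orchestras(performer: str) -> list[str]:
    return PATTERN.findall(performer)


print(find_orchestras("Peter Lika (bass vocals);Orchestre de la Société (orchestra)"))
# ['Orchestre de la Société']
print(find_orchestras("Kölner Kammerorchester (Orchestra)"))  # ['Kölner Kammerorchester']
print(find_orchestras("Orkiestra Łódź (orchestra)"))  # [] because ź is outside À-ÿ
```

A range is shorter than listing every letter, but it is only as complete as the block of characters it spans.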

zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2619

The issue with non-standard chars can be avoided with ListGrep:
listgrep([Performer], /(orchestra)        -> returns "Das Neue Orchester (orchestra)"   (not case sensitive, also matches Orchestra)

Removing the "(orchestra)" part then requires some simpler regex - this should work for you:
regex(listgrep([Performer], /(orchestra), /#(.+) \(#/, 1)
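This two-step approach can be mimicked in Python to show what each step does. `list_grep` and `strip_role` are hypothetical stand-ins, not MC functions; `list_grep` assumes the case-insensitive substring match described here, and `strip_role` applies the greedy `(.+) \(` pattern to one entry at a time.

```python
import re


def list_grep(values: str, needle: str) -> list[str]:
    # Hypothetical stand-in for ListGrep: keep the ';'-separated entries that
    # contain `needle`, ignoring case (as described in this thread).
    return [v for v in values.split(";") if needle.lower() in v.lower()]


def strip_role(entry: str) -> str:
    # Mimics the second step: greedy (.+) followed by " (" keeps everything
    # up to the LAST " (" in the text it is given.
    m = re.search(r"(.+) \(", entry)
    return m.group(1) if m else ""


performer = "Peter Lika (bass vocals);Das Neue Orchester (Orchestra)"
hits = list_grep(performer, "(orchestra")
print(hits)                           # ['Das Neue Orchester (Orchestra)']
print([strip_role(h) for h in hits])  # ['Das Neue Orchester']

# Applied to several entries joined together, the greedy match runs across ';'.
print(strip_role("A (orchestra);B (orchestra)"))  # A (orchestra);B
```

Because `(.+)` is greedy, running it over several joined entries instead of one entry at a time swallows everything up to the last " (", which is why a pattern that cannot cross ";" behaves better with multiple orchestras.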

PS: While testing a non-regex method, I found an undocumented "feature"; the replace() function will remove trailing "()" from the result:
replace(listgrep([Performer], /(orchestra), orchestra, )     ->  returns "Das Neue Orchester", where I would have expected "Das Neue Orchester ()"

(Note that this last one is case sensitive, so it will fail with "(Orchestra)")

EDIT: fixed some escape chars, I had used \( instead of /(

zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2619
« Reply #10 on: December 22, 2020, 01:49:03 pm »

Here's another solution with regex: capture anything except ";" (skips non-matching list entries) that is followed by " (orchestra)". This one works better if there's more than one Orchestra listed in the field:
Regex([Performer],/#([^;]+)(?= \([Oo]rchestra\))#/,-2)
or
Regex(listgrep([Performer], /(orchestra), /#([^;]+) \(#/, -2)
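The `[^;]+` version can be checked the same way in Python. `re.IGNORECASE` stands in for the case-insensitivity reported for mode -2, and joining with ";" mirrors how multiple matches appear in this thread; both are assumptions about MC rather than documented facts. The two-orchestra string is hypothetical.

```python
import re

# zybex's pattern; IGNORECASE imitates the reported case-insensitivity of mode -2.
PATTERN = re.compile(r"([^;]+)(?= \(orchestra\))", re.IGNORECASE)


def orchestra_list(performer: str) -> str:
    # Join every match with ';', the way multiple results appear in this thread.
    return ";".join(PATTERN.findall(performer))


performer = "A Orchestra (orchestra);Some Soloist (piano);B Orchester (Orchestra)"  # hypothetical
print(orchestra_list(performer))  # A Orchestra;B Orchester
print(orchestra_list("Kölner Kammerorchester (orchestra)"))  # Kölner Kammerorchester
print(repr(orchestra_list("Peter Lika (bass vocals)")))  # ''
```

Since `[^;]` accepts any character except the separator, accented letters need no special handling, and a match can never spill into the neighbouring entry.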

wer

  • Citizen of the Universe
  • *****
  • Posts: 2640
« Reply #11 on: December 22, 2020, 02:03:39 pm »

I'd forgotten ListGrep!

You don't need the regex to remove the (orchestra) part; just split it on the ( with listitem and the even numbered items will be the values (starting with zero).

zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2619
« Reply #12 on: December 22, 2020, 02:14:22 pm »

Quote from wer:
You don't need the regex to remove the (orchestra) part; just split it on the ( with listitem and the even numbered items will be the values (starting with zero).

True, but might fail on entries with extra parens like "Some guy (and his mother) (orchestra)"
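The trade-off between these two replies can be shown with a short Python sketch. `name_before_first_paren` is one reading of the split-on-"(" idea applied to a single entry; `name_before_last_paren` is an alternative that is not from the thread, splitting only at the last " (" so that parentheses inside a name survive. Both function names are hypothetical.

```python
def name_before_first_paren(entry: str) -> str:
    # One reading of the split-on-"(" idea: split on "(" and keep item 0.
    return entry.split("(")[0].strip()


def name_before_last_paren(entry: str) -> str:
    # Alternative (not from the thread): split once, at the LAST " (".
    return entry.rsplit(" (", 1)[0].strip()


tricky = "Some guy (and his mother) (orchestra)"
print(name_before_first_paren(tricky))  # Some guy
print(name_before_last_paren(tricky))   # Some guy (and his mother)
```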

henning65

  • Junior Woodchuck
  • **
  • Posts: 65
« Reply #13 on: December 22, 2020, 02:56:29 pm »

@zybex: that is very helpful - Thank you
@wer: your advice was very helpful - learned a lot!


To extract the Orchestra I took:
Regex([PERFORMER],/#([^;]+)(?= \([Oo]rchestra\))#/,-2)&datatype="list"
I added datatype="list"
I created another custom field for choir:
Regex([PERFORMER],/#([^;]+)(?= \([Cc]hoir vocals\))#/,-2)&datatype="list"
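Since the ORCHESTRA and CHOIR expressions differ only in the role text, the pattern generalizes to any role, including the instrument and soloist roles mentioned earlier. Here is a hypothetical Python helper, `extract_role`, built on the `[^;]+` pattern; `re.escape` keeps the role text literal, and `re.IGNORECASE` is an assumption standing in for MC's reported case-insensitivity.

```python
import re

PERFORMER = (
    "Alison Browner (alto vocals);Chorus Musicus Köln (choir vocals);"
    "Wilfried Jochens (tenor vocals);Markus Schäfer (tenor vocals);"
    "Angela Kazimierczuk (soprano vocals);Peter Lika (bass vocals);"
    "Franz-Josef Selig (bass vocals);Das Neue Orchester (orchestra)"
)


def extract_role(performer: str, role: str) -> str:
    # Hypothetical helper: ';'-joined names whose role in parentheses is `role`.
    pattern = rf"([^;]+)(?= \({re.escape(role)}\))"
    return ";".join(re.findall(pattern, performer, re.IGNORECASE))


print(extract_role(PERFORMER, "orchestra"))     # Das Neue Orchester
print(extract_role(PERFORMER, "choir vocals"))  # Chorus Musicus Köln
print(extract_role(PERFORMER, "tenor vocals"))  # Wilfried Jochens;Markus Schäfer
```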

Now, in the last step, I am trying to get the soloists from the list PERFORMER (that's still not perfect, but close enough). To do so, I am trying to calculate a custom MC library field SOLIST, with SOLIST = PERFORMER - CHOIR - ORCHESTRA - composer - conductor (these four "artists" have their own categories now, which is why I'd like to eliminate them here).
I tried to work with "?!" and with "^", but still can't get the function working. I would be happy, if you could give me an idea of how to formulate the expression correctly.

Thanks for helping!

zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2619
« Reply #14 on: December 22, 2020, 03:05:34 pm »

Using the negative lookahead "?!", capture everything except ";" that is NOT followed by orchestra/choir/conductor/composer:
Regex([Performer],/#([^;]+) \((?!orchestra|choir|conductor|composer)#/,-2)

Edit: looks like Regex mode -2 is not case sensitive - no need for [Oo]
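The negative-lookahead pattern can be sanity-checked in Python with the example PERFORMER value from this thread. Python's `re` supports `(?!...)`, but MC's engine may not treat lookarounds the same way, so this only shows what the pattern means, not what MC will return.

```python
import re

PERFORMER = (
    "Alison Browner (alto vocals);Chorus Musicus Köln (choir vocals);"
    "Wilfried Jochens (tenor vocals);Markus Schäfer (tenor vocals);"
    "Angela Kazimierczuk (soprano vocals);Peter Lika (bass vocals);"
    "Franz-Josef Selig (bass vocals);Das Neue Orchester (orchestra)"
)

# A name followed by " (" whose role does NOT start with one of the excluded
# words (negative lookahead); IGNORECASE mirrors the reported mode -2 behaviour.
PATTERN = re.compile(r"([^;]+) \((?!orchestra|choir|conductor|composer)", re.IGNORECASE)

soloists = ";".join(PATTERN.findall(PERFORMER))
print(soloists)
# Alison Browner;Wilfried Jochens;Markus Schäfer;Angela Kazimierczuk;Peter Lika;Franz-Josef Selig
```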

zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2619
« Reply #15 on: December 22, 2020, 03:24:55 pm »

If you want to keep the roles for the remaining entries, you can also do this:
listRemove(listremove(listremove(listremove([Performer],/(orchestra,2),/(choir,2),/(conductor,2),/(composer,2)

This returns:
Alison Browner (alto vocals);Wilfried Jochens (tenor vocals);Markus Schäfer (tenor vocals);Angela Kazimierczuk (soprano vocals);Peter Lika (bass vocals);Franz-Josef Selig (bass vocals)

Where the previous Regex one will return just the names:
Alison Browner;Wilfried Jochens;Markus Schäfer;Angela Kazimierczuk;Peter Lika;Franz-Josef Selig

(just noticed that the ListRemove() documentation is wrong. The working syntax is ListRemove(List, String|Index, Mode), with just 3 args for all modes)
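The chained ListRemove() calls can be mimicked in Python as repeated filtering of the ";"-separated entries. `list_remove_containing` is a hypothetical stand-in that assumes, from the example above, that mode 2 removes every entry containing the given text; it matches case-sensitively, which may differ from MC.

```python
PERFORMER = (
    "Alison Browner (alto vocals);Chorus Musicus Köln (choir vocals);"
    "Wilfried Jochens (tenor vocals);Markus Schäfer (tenor vocals);"
    "Angela Kazimierczuk (soprano vocals);Peter Lika (bass vocals);"
    "Franz-Josef Selig (bass vocals);Das Neue Orchester (orchestra)"
)


def list_remove_containing(values: str, needle: str) -> str:
    # Hypothetical stand-in for ListRemove(list, text, 2): drop every
    # ';'-separated entry that contains `needle` (case-sensitive here).
    return ";".join(v for v in values.split(";") if needle not in v)


remaining = PERFORMER
for role in ("(orchestra", "(choir", "(conductor", "(composer"):
    remaining = list_remove_containing(remaining, role)

print(remaining)  # the six soloist entries, roles kept
```

Unlike the Regex version, this keeps each entry's role text, so the soloists' voice types stay visible.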

wer

  • Citizen of the Universe
  • *****
  • Posts: 2640
« Reply #16 on: December 22, 2020, 03:28:27 pm »

I don't have access to MC atm so I was just doing this from memory; I can't test any of it. If -2 is not case sensitive that's helpful. I just gave something I thought likely to work.

I have seen the MC regex implementation do some squirrelly things with negative lookaheads and lookbehinds in the past, so I wouldn't want to quote those to people too much without testing in MC. Not sure that support is robust.  I do remember it does not support some of the common flags, like \K to drop a match.

Henning, all of this goes back to something that I told you earlier: it is not difficult to formulate a perfectly good regex that works properly in all the online evaluation engines, but does not work in MC. So you could well find "cut and paste" examples elsewhere on the web that are in fact correct, but still don't work in MC.  This is a reality we just have to work around.

zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2619
« Reply #17 on: December 22, 2020, 03:36:42 pm »

Quote from wer:
I have seen the MC regex implementation do some squirrelly things with negative lookaheads and lookbehinds in the past, so I wouldn't want to quote those to people too much without testing in MC

Indeed. I'll add to this that lookaheads/lookbehind can be a bit dangerous because they can cause the Regex evaluation to take an exponentially long execution time with some input strings. If MC doesn't run them with a timeout it can cause it to hang or become sluggish if you're unlucky and have some weird field content. Other Regex constructs such as chained masks "(.+?).*" are also dangerous and should be avoided.

zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2619
« Reply #18 on: December 22, 2020, 03:46:21 pm »

Shameless plug to my Expression Editor tool:
https://yabb.jriver.com/interact/index.php/topic,125975.msg872893.html

@henning65, you might find that useful as a playground to develop expressions :)

EDIT: ... for Windows.