INTERACT FORUM


Author Topic: Expression Question  (Read 734 times)

timwtheov

  • Galactic Citizen
  • ****
  • Posts: 354
Expression Question
« on: December 13, 2020, 03:32:13 pm »

I have the following expression, mostly copied from the last Regex example on the Wiki:

Code: [Select]
Regex([Vocalist(s)],/#([\w\s\.]+(?=\s\())#/,-2)
I adapted it to get rid of role parentheses (e.g., Jerome Hines (baritone), Herold Kraus (tenor)) in [Vocalist(s)]. The one problem is that it doesn't account for hyphenated last names: Peter Roth-Ehrang (bass), for example, returns Ehrang.

I was trying to parse the expression using Google, since a lot of this language isn't on the wiki, to fix it myself to account for these. But I'm kind of at a standstill here.

I see that \w searches for alpha-numeric (?) characters in a string and \s searches for other characters like underscores and the like. Not sure what that . does after the \s in the brackets, or what the + does. ?= matches any string followed by a specific string n, though I'm not sure what that means: something to do with the parentheses? That's what I changed from the wiki to get it to remove the roles in parentheses instead of brackets. And speaking of brackets, why are those included in the first half of the parentheses after the /#, and why parentheses in the second half, i.e., [\w\s.] but (?=\s\()?

So: how would I add in whatever to account for hyphenated names and if someone wouldn't mind taking the time, maybe go through the syntax and explain what it's doing? If not the latter, then at least the former, again, if someone wouldn't mind.

Thanks much!
Logged

wer

  • Citizen of the Universe
  • *****
  • Posts: 2640
Re: Expression Question
« Reply #1 on: December 13, 2020, 03:57:23 pm »

I don't have time to dig into the regex today, but I'll give you this tip. If you're trying to learn to parse regular expressions, try these two tester sites. They're pretty good at explaining what each clause does.

https://regexr.com/
https://regex101.com/
Logged

timwtheov

  • Galactic Citizen
  • ****
  • Posts: 354
Re: Expression Question
« Reply #2 on: December 13, 2020, 04:26:09 pm »

Will do. Thanks!
Logged

zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2618
Re: Expression Question
« Reply #3 on: December 13, 2020, 04:38:03 pm »

Careful what you wish for...  ;D

Here's a couple of good resources for learning the basics:
https://www.shortcutfoo.com/app/dojos/regex/cheatsheet
https://cheatography.com/davechild/cheat-sheets/regular-expressions/

To fix that particular case, you can just add the dash inside the square brackets:
Code: [Select]
Regex([Vocalist(s)],/#([\w\s\.\-]+(?=\s\())#/, -2)
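To see what adding the dash changes, here is a rough sketch in Python. Python's re module is only a stand-in: Media Center's regex engine and its -2 mode may behave differently, and the names() helper and the sample string are made up for illustration.

```python
import re

# Python's re is a stand-in; Media Center's engine may differ in details.
ORIGINAL = r"([\w\s\.]+(?=\s\())"     # no dash allowed inside the set
WITH_DASH = r"([\w\s\.\-]+(?=\s\())"  # dash added to the set


def names(pattern: str, text: str) -> list[str]:
    """Collect every match and trim surrounding whitespace (hypothetical helper)."""
    return [m.strip() for m in re.findall(pattern, text)]


sample = "Jerome Hines (baritone), Peter Roth-Ehrang (bass)"

print(names(ORIGINAL, sample))   # ['Jerome Hines', 'Ehrang']
print(names(WITH_DASH, sample))  # ['Jerome Hines', 'Peter Roth-Ehrang']
```

Without the dash in the set, the match cannot cross the "-" in "Roth-Ehrang", so only the part after it, which is the part followed by " (", is returned.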
A more flexible Regex would be this one (capture anything until it finds a "(" or reaches the end of the text):
Code: [Select]
regex([Vocalist(s)],/#(.+?)\s*(?:\(|$)#/, 1)
Here's a breakdown of the first regex:
\w = any word char: A to Z, a to z, 0 to 9, underscore
\s = whitespace
\. = the dot character (backslash is the escape character; just the dot "." means "any character", but "\." means the dot character literally)
\- = the dash character; it's safest to escape it inside a set [ ... ], where an unescaped dash between two characters defines a range like a-z
[\w\s\.\-]    = a set (character class) - matches one char that is any of the 4 above
[\w\s\.\-]+  = match any of those chars 1 or more times (plus)
(?= ... )    = "followed by"; this is called a positive lookahead; it means that the next text needs to match this, but it won't be included in the output
\(  = open parenthesis symbol, escaped. Same as the dot above, the "(" is special so needs to be escaped
(?=\s\()    = "followed by" a space and an open-parenthesis

So the full expression is:
([\w\s\.\-]+(?=\s\())   = letters, digits, underscores, spaces, dots or dashes, one or more times, which are followed by a space and an open parenthesis.

Note that this won't work if the [Vocalist(s)] has no open-parenthesis, since the lookahead won't find anything to match.


The other one is:
.+?  = any character, one or more times, but as few as possible ("lazy"), so the capture stops at the first place where the rest of the pattern can match
(.+?)  = these parentheses make this a "capture" group. The last arg in the Regex() function ("1") specifies which capture group we want for the output.
\s*   = whitespace characters, 0 or more times. Meaning, optional spaces
(?: ... | ... )  = a non-capturing group with two alternatives; it groups things without creating another capture
$ = end of string position (not a literal char). It has to sit outside square brackets: inside [ ... ] a "$" is just a literal dollar sign
(?:\(|$)  = matches an open-parenthesis, OR the end of the string (meaning, the parenthesis is optional too!)

To sum it up:
(.+?)\s*(?:\(|$)  = the shortest run of characters that is followed by optional spaces and then either an open parenthesis or the end of the text.
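A small Python sketch of the "capture until ( or end of text" idea. Again, Python's re is only a stand-in for Media Center's engine, and first_capture() is a hypothetical helper. It also compares two ways of writing the ending: in Python, a "$" inside square brackets is a literal dollar sign, so only the alternation form can match the end of the string.

```python
import re

# Python's re is a stand-in; Media Center's engine may differ in details.
BRACKET_FORM = r"(.+?)\s*[\($]"    # "$" inside [...] is a literal dollar sign here
ALTERNATION = r"(.+?)\s*(?:\(|$)"  # "(" OR the end-of-string position


def first_capture(pattern: str, text: str):
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


print(first_capture(BRACKET_FORM, "Jerome Hines (baritone)"))  # Jerome Hines
print(first_capture(BRACKET_FORM, "Jerome Hines"))             # None
print(first_capture(ALTERNATION, "Jerome Hines"))              # Jerome Hines
print(first_capture(ALTERNATION, "Jerome Hines (baritone), Herold Kraus (tenor)"))  # Jerome Hines
```

The last line also shows the limitation of asking for a single capture group: only the first name in a multi-entry list comes back.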
 
Logged

timwtheov

  • Galactic Citizen
  • ****
  • Posts: 354
Re: Expression Question
« Reply #4 on: December 13, 2020, 06:19:52 pm »

Thanks so much, Zybex! Really helpful, as is, again, Zelda!
Logged

wer

  • Citizen of the Universe
  • *****
  • Posts: 2640
Re: Expression Question
« Reply #5 on: December 13, 2020, 09:32:41 pm »

I came back to follow up on this, but I see Zybex has sorted you out.  And to my chagrin I also see that I should perhaps have made time earlier, as Tim is using regex -2, which was my fault after all. Sorry.  ;)

The first answer Zybex gave is the better one.
Code: [Select]
Regex([Vocalist(s)],/#([\w\s\.\-]+(?=\s\())#/, -2)
The second one, the "more flexible Regex", will only return the first match, and I'm expecting your Vocalist field may have more than one entry, but you won't know how many entries it may have.  That's exactly the issue that the -2 option was designed to solve, so stick with the first one. 

However, you might want to consider this:
Code: [Select]
Regex([test],/#([\w\s\.\-]+(?=\())#/, -2)
This version will work regardless of whether there is a space preceding the open-paren or not. The first version will not return a name if the paren follows without a space (e.g., "Jerome Hines(baritone)"). Maybe your list is well and consistently formatted, and maybe it's not.
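The difference between the two lookaheads can be sketched in Python (a stand-in only; Media Center's engine and its -2 output format may differ, and all_matches() and the sample string are made up for illustration):

```python
import re

# Python's re is a stand-in; Media Center's engine may differ in details.
NEEDS_SPACE = r"([\w\s\.\-]+(?=\s\())"  # lookahead requires whitespace, then "("
SPACE_OPTIONAL = r"([\w\s\.\-]+(?=\())"  # lookahead requires only "("


def all_matches(pattern: str, text: str) -> list[str]:
    """Return the raw captures, untrimmed, so stray spaces stay visible."""
    return re.findall(pattern, text)


# The first entry has no space before its parenthesis.
sample = "Jerome Hines(baritone), Herold Kraus (tenor)"

print(all_matches(NEEDS_SPACE, sample))     # [' Herold Kraus']
print(all_matches(SPACE_OPTIONAL, sample))  # ['Jerome Hines', ' Herold Kraus ']
```

In this sketch the space-optional form picks up both names, but because \s is part of the set, the space before the parenthesis ends up inside the capture, so the results may need trimming.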

It's also worth mentioning that in MC, you can sometimes get regex results that will diverge from what you get in online testers or more standard engines...

To quote myself as was inserted into the wiki:
Quote
Important Notes
Proficient users of Regex should be aware of the following points regarding Media Center's implementation:

        1: The Regex() function will ONLY return results for things that are explicitly placed in a capture group, for example:
        [\w\s]+ is a recognised regular expression, however, in order for this to work in Media Center, it must be modified, like so: ([\w\s]+)
        Putting parens around it creates a capture group, and that is the only way MC will return a result.

        2: All Regex() modes except -2 will return exactly one result for each defined capture group.

        3: Care must be taken when working with strings of indeterminate length, or lists with an indeterminate number of elements, as if you have more capture groups in your expression than there exist matches in the string, Regex() will return nothing at all.

Regex mode -2 was devised specifically to get around the above issues, without which, it would be almost impossible to deal with lists with a variable number of elements.
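As a loose analogy for point 3, here is a Python sketch. It does not model Media Center's Regex() modes; it only shows why a pattern with a fixed number of capture groups is fragile on lists of unknown length, while collecting all matches of a single group is not. The NAME and THREE_NAMES patterns and the sample string are hypothetical.

```python
import re

# Python's re is a stand-in; this is an analogy, not Media Center's implementation.
NAME = r"([\w\s\.\-]+?)\s*\([^)]*\)"  # one name followed by "(role)"
THREE_NAMES = NAME + r",\s*" + NAME + r",\s*" + NAME  # hard-codes three entries

two_entries = "Jerome Hines (baritone), Herold Kraus (tenor)"

# Three capture groups, but only two entries: the whole pattern fails to match.
print(re.search(THREE_NAMES, two_entries))  # None

# One capture group applied repeatedly copes with any number of entries.
found = [m.strip() for m in re.findall(NAME, two_entries)]
print(found)  # ['Jerome Hines', 'Herold Kraus']
```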

But as with any regex, there's more than one way to skin the cat.

Have fun...
Logged

timwtheov

  • Galactic Citizen
  • ****
  • Posts: 354
Re: Expression Question
« Reply #6 on: December 13, 2020, 10:14:00 pm »

Quote
The second one, the "more flexible Regex", will only return the first match, and I'm expecting your Vocalist field may have more than one entry, but you won't know how many entries it may have.  That's exactly the issue that the -2 option was designed to solve, so stick with the first one.

Yes, you're right Wer, that's why I originally was going to use -2, even before I saw the sample on the Wiki that was pretty close to what I wanted to do.

And yes, most of the lists I copy from various sources are decently formatted, but since you never know, thanks for that!
Logged

wer

  • Citizen of the Universe
  • *****
  • Posts: 2640
Re: Expression Question
« Reply #7 on: December 13, 2020, 10:26:26 pm »

If I remember correctly, the example you're referring to in the wiki was derived from a similar situation quite some time ago where I was helping someone parse their [Actors] field.  If you look in that "Tooltips" thread I think you'll find it.  They were wanting to do almost exactly the same thing as you were doing.

I'll also mention that this sort of thing can now also be accomplished without using Regex, but by using some of the new (or newly enhanced) list manipulation functions if you use them cleverly; some of those were Zybex's pet ideas. He loves those list functions!
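To illustrate the general idea of doing this with list handling instead of a regex, here is a Python sketch. It does not use or imitate any specific Media Center list function; strip_roles() is a hypothetical helper, and the semicolon delimiter is an assumption about how the list is stored.

```python
def strip_roles(vocalists: str, delimiter: str = ";") -> str:
    """Split the list, cut each entry at its first "(", and rejoin the names.

    The delimiter default is an assumption; pass "," for comma-separated text.
    """
    cleaned = []
    for entry in vocalists.split(delimiter):
        name = entry.split("(", 1)[0].strip()  # keep only the text before "("
        if name:
            cleaned.append(name)
    return (delimiter + " ").join(cleaned)


print(strip_roles("Jerome Hines (baritone); Peter Roth-Ehrang (bass); Herold Kraus"))
# Jerome Hines; Peter Roth-Ehrang; Herold Kraus
```

Because each entry is simply cut at the first "(", hyphens, missing spaces, and entries with no role at all need no special handling.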
Logged

timwtheov

  • Galactic Citizen
  • ****
  • Posts: 354
Re: Expression Question
« Reply #8 on: December 13, 2020, 10:33:10 pm »

Quote
I'll also mention that this sort of thing can now also be accomplished without using Regex, but by using some of the new (or newly enhanced) list manipulation functions if you use them cleverly; some of those were Zybex's pet ideas. He loves those list functions!

That's what I was originally going to use to try to get rid of the role parentheses, I think with a Listremove(). But then I saw the regex() example on the Wiki, and that's exactly what I was looking for. It's also good to force myself to learn some of the ins and outs of regex, which I've largely avoided after some bad (frustrating) initial attempts with it.
Logged

zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2618
Re: Expression Question
« Reply #9 on: December 14, 2020, 03:06:00 am »

Right, thanks Wer - for lists, mode -2 is better.
Also for this case, using ListMix() and such would only complicate it further.
Logged

wer

  • Citizen of the Universe
  • *****
  • Posts: 2640
Re: Expression Question
« Reply #10 on: December 14, 2020, 03:32:07 am »

Quote
It's also good to force myself to learn some of the ins and outs of regex, which I've largely avoided after some bad (frustrating) initial attempts with it.

Tim, if you're feeling confused, abused, and perhaps a little bit violated after your first encounter with regular expressions, I would say everything is normal and the grammar is having its intended effect on you. 

Just keep going, and eventually you won't feel a thing anymore.
Logged

zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2618
Re: Expression Question
« Reply #11 on: December 14, 2020, 04:12:31 am »

Quote
Just keep going, and eventually you won't feel a thing anymore.
Yes, it's so bad it eventually makes you lose all sensibility, you'll be completely numb ;D

Keep in mind that Regex is not adequate at all for many tasks. For instance, here's the full Regex needed to completely validate an email address according to the RFC spec:
http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

Everything is a nail when you only have a hammer...
Logged

timwtheov

  • Galactic Citizen
  • ****
  • Posts: 354
Re: Expression Question
« Reply #12 on: December 14, 2020, 09:03:09 am »

Good to know. Looking forward to feeling comfortably numb in a few days/weeks/months/years.  :)
Logged