INTERACT FORUM


Topic: Calling regex/string manipulation gurus  (Read 2107 times)

mark_h

  • MC Beta Team
  • Citizen of the Universe
  • Posts: 1854
Calling regex/string manipulation gurus
« on: May 21, 2019, 05:45:34 am »

Have strings in the following format:

Jon Anderson (lead vocals); Steve Howe (guitar); Billy Sherwood (guitar); Chris Squire (bass guitar); Alan White (percussion drums (drum set)); Igor Khoroshev (keyboard)

And want to remove all the bracketed text from the strings, to leave:

Jon Anderson; Steve Howe; Billy Sherwood; Chris Squire; Alan White; Igor Khoroshev

Been looking at regex()/list/clean stuff all morning, but so far no joy...

Anybody??

mattkhan

  • MC Beta Team
  • Citizen of the Universe
  • Posts: 4226

mark_h

  • MC Beta Team
  • Citizen of the Universe
  • Posts: 1854
Re: Calling regex/string manipulation gurus
« Reply #2 on: May 21, 2019, 07:21:15 am »

Stumbled upon that page myself while scratching my head, but couldn't get any of the solutions to work.  Will take another stab...

blgentry

  • Regular Member
  • Citizen of the Universe
  • Posts: 8014
Re: Calling regex/string manipulation gurus
« Reply #3 on: May 21, 2019, 09:53:56 am »

The problem with these kinds of tasks in MC is that MC has no concept of substituting characters or patterns in the Regex() function.  For example, in Perl or a similar language you could do this:

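s/\s*\([^()]*\)//g

That finds each parenthesized group (plus any whitespace in front of it), including the parentheses themselves, and replaces it with something else.  The "something else" goes between the last two / characters.  Here that is nothing, so the groups are simply deleted.  The "g" says "do this on the whole string", as opposed to stopping after the first match it finds.  The [^()]* part matters: a greedy \(.+\) would match from the first "(" all the way to the last ")" and delete nearly the whole string.  Nested groups such as "(percussion drums (drum set))" also need the substitution repeated until nothing changes, e.g. 1 while s/\s*\([^()]*\)//g; in Perl.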
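For anyone without Perl to hand, here is a rough Python equivalent of that s///g substitution, shown outside MC purely as an illustration. It deletes each innermost parenthesized group with `re.subn` and repeats until no replacements are made, so nested groups like "(percussion drums (drum set))" are handled too. The pattern `\s*\([^()]*\)` is our choice; a greedy `\(.+\)` would run from the first "(" to the last ")".

```python
import re

# An innermost parenthesised group (no "(" or ")" inside it),
# plus any whitespace in front of it.
INNERMOST_GROUP = re.compile(r"\s*\([^()]*\)")


def remove_bracketed(text: str) -> str:
    """Delete every parenthesised group, including nested ones."""
    while True:
        # subn is the s///g equivalent: replace all matches, report how many.
        text, count = INNERMOST_GROUP.subn("", text)
        if count == 0:  # nothing left to remove
            return text


performers = ("Jon Anderson (lead vocals); Steve Howe (guitar); "
              "Billy Sherwood (guitar); Chris Squire (bass guitar); "
              "Alan White (percussion drums (drum set)); Igor Khoroshev (keyboard)")
print(remove_bracketed(performers))
# Jon Anderson; Steve Howe; Billy Sherwood; Chris Squire; Alan White; Igor Khoroshev
```

The first pass removes every simple group plus the inner "(drum set)"; the second pass removes the now-innermost "(percussion drums)"; the third finds nothing and stops.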

MC's Regex() lets you match and lets you capture references.  So you could capture the stuff before the first parenthesis and print it out.  But then what do you do with the next set (after the semicolon)?  ...and the next, and the next, and the next?  You can write big, messy expressions to try to deal with N repeats, but in this case the strings contain ";", so it gets harder.

I'm not aware of a clean way of doing this in MC.  But someone might be able to do something with Global Variables or something.

It would be nice if MC could include functionality like I described above, but it probably requires a lot of thought to make sure that the right stuff is included and that it's not too dangerous. With the wrong expressions like the one above, you could wipe out data *quick*.

Good luck Mark.

Brian.

Moe

  • MC Beta Team
  • Citizen of the Universe
  • Posts: 718
  • Hi
Re: Calling regex/string manipulation gurus
« Reply #4 on: May 21, 2019, 11:08:17 am »

It sure would be nice if the find and replace tool supported regular expressions.  That would take care of your problem really easily.

mark_h

  • MC Beta Team
  • Citizen of the Universe
  • Posts: 1854
Re: Calling regex/string manipulation gurus
« Reply #5 on: May 21, 2019, 11:11:39 am »

Thanks, Brian.  My conclusion too.  I'm pretty decent with regular expressions elsewhere but couldn't figure it out in MC due to the lack of support for global searches/captures.  I did end up creating a big rule that sort of worked, but ran out of [Rn] save slots in which to capture the results I was after.

Would be nice if MC could add a way to populate the [Rn] with a /g global regex rule modifier.

Was wondering whether there was some list magic that might also work, but failed to come up with anything useful there either.

I'll probably end up offloading the processing to a script, or just give up on what I'm trying to achieve.



blgentry

  • Regular Member
  • Citizen of the Universe
  • Posts: 8014
Re: Calling regex/string manipulation gurus
« Reply #6 on: May 21, 2019, 12:04:00 pm »

This kind of task is probably best suited to an import/export cycle with external processing, or to something like MrC's MCUtils, which lets you use all the power of Perl to manipulate MC fields.

Again, good luck!

Brian.

mark_h

  • MC Beta Team
  • Citizen of the Universe
  • Posts: 1854
Re: Calling regex/string manipulation gurus
« Reply #7 on: May 22, 2019, 05:13:17 am »

Well, this monstrosity fits my needs:

listcombine(if(isequal(regex(if(isequal(regex(if(isequal(regex(if(isequal(regex(if(isequal(regex(if(isequal(regex(if(isequal(regex(if(isequal(regex(if(isequal(regex(if(isequal(regex(if(isequal(regex(if(isequal(regex(if(isequal(regex([performer (no guests)],/#^(([^\(]+)\([^\)]+\)+(.*)|(.*))#/,-1)[R1][R2][R3],(,8),[R2][R3],[R1]),/#^(([^\(]+)\([^\)]+\)+(.*)|(.*))#/,-1)[R1][R2][R3],(,8),[R2][R3],[R1]),/#^(([^\(]+)\([^\)]+\)+(.*)|(.*))#/,-1)[R1][R2][R3],(,8),[R2][R3],[R1]),/#^(([^\(]+)\([^\)]+\)+(.*)|(.*))#/,-1)[R1][R2][R3],(,8),[R2][R3],[R1]),/#^(([^\(]+)\([^\)]+\)+(.*)|(.*))#/,-1)[R1][R2][R3],(,8),[R2][R3],[R1]),/#^(([^\(]+)\([^\)]+\)+(.*)|(.*))#/,-1)[R1][R2][R3],(,8),[R2][R3],[R1]),/#^(([^\(]+)\([^\)]+\)+(.*)|(.*))#/,-1)[R1][R2][R3],(,8),[R2][R3],[R1]),/#^(([^\(]+)\([^\)]+\)+(.*)|(.*))#/,-1)[R1][R2][R3],(,8),[R2][R3],[R1]),/#^(([^\(]+)\([^\)]+\)+(.*)|(.*))#/,-1)[R1][R2][R3],(,8),[R2][R3],[R1]),/#^(([^\(]+)\([^\)]+\)+(.*)|(.*))#/,-1)[R1][R2][R3],(,8),[R2][R3],[R1]),/#^(([^\(]+)\([^\)]+\)+(.*)|(.*))#/,-1)[R1][R2][R3],(,8),[R2][R3],[R1]),/#^(([^\(]+)\([^\)]+\)+(.*)|(.*))#/,-1)[R1][R2][R3],(,8),[R2][R3],[R1]),/#^(([^\(]+)\([^\)]+\)+(.*)|(.*))#/,-1)[R1][R2][R3],(,8),[R2][R3],[R1]),)

Matt, any chance of updating the Regex engine to support global lookups that auto-populate the [Rn] saves?
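The nested expression applies one regex over and over. At each level, [R2] is the text before the first "(" and [R3] is everything after the run of ")" that closes it; the if() appears to keep [R2][R3] when a group was found and [R1] otherwise. Because MC expressions have no loops, the 13-fold nesting caps how many groups can be removed. A Python sketch of the same idea, written as a loop, might look like this. The final split/strip step is our stand-in for however listcombine() tidies the ";" list, which is an assumption.

```python
import re

# The pattern from the MC expression. Group 2 is the text before the first "(",
# group 3 is whatever follows the run of ")" that closes it ("\)+" also absorbs
# the "))" at the end of a nested group such as "(drum set))").
# If no group is found, the (.*) alternative matches and group 2 is None.
FIRST_GROUP = re.compile(r"^(([^\(]+)\([^\)]+\)+(.*)|(.*))")


def strip_groups(text: str) -> str:
    """Remove parenthesised groups one at a time, like each nesting level in MC."""
    while True:
        m = FIRST_GROUP.match(text)
        if m.group(2) is None:  # no "(...)" left: the [R1] branch
            break
        text = m.group(2) + m.group(3)  # the [R2][R3] branch
    # Each removal leaves "Name ;" behind; rebuild a tidy ";"-separated list.
    return "; ".join(item.strip() for item in text.split(";"))


print(strip_groups("Alan White (percussion drums (drum set)); Igor Khoroshev (keyboard)"))
# Alan White; Igor Khoroshev
```

Like the original, this assumes a nested group closes together with its parent, as "(drum set))" does; input such as "(a (b) c)" would leave " c)" behind.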



Hendrik

  • Administrator
  • Citizen of the Universe
  • Posts: 10933
Re: Calling regex/string manipulation gurus
« Reply #8 on: May 22, 2019, 05:23:32 am »

I'm not sure that's what you really want. What we have is a "regex_match" function, which extracts information from a given string. What you want is "regex_replace", which is what those s// strings typically indicate; this has nothing to do with global lookups or anything like that.
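The match/replace distinction maps neatly onto Python's `re` module, used here only as an illustration: `re.search` extracts capture groups from the first match (roughly what MC's Regex() does with [R1], [R2], ...), while `re.sub` rewrites every match in the string, which is what an s///g expression asks for.

```python
import re

text = "Jon Anderson (lead vocals); Steve Howe (guitar)"

# "regex_match": extract captures, and only from the first match.
m = re.search(r"([^;(]+?)\s*\(([^)]*)\)", text)
print(m.group(1), "|", m.group(2))  # Jon Anderson | lead vocals

# "regex_replace": rewrite every match in the string (the s///g behaviour).
print(re.sub(r"\s*\([^()]*\)", "", text))  # Jon Anderson; Steve Howe
```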
~ nevcairiel
~ Author of LAV Filters

mark_h

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 1854
Re: Calling regex/string manipulation gurus
« Reply #9 on: May 22, 2019, 05:29:02 am »

I'd like to be able to do something like:

regex([string],/#(\d\d)#/,global flag)

and have MC populate [R1],[R2],[R3],[R4],[R5],[R6],[R7],[R8],[R9] with as many results as are found.

Right now it would need to be

regex([string],/#(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)#/,-1)

i.e., an explicit instruction set without the ability to react to dynamic strings.

This is something all regular expression engines support except for MC at the moment...
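For comparison, this is roughly what a global match looks like in Python: `re.findall` returns the capture from every match. The `regex_global` helper below is hypothetical (not an MC function); it pads or truncates the results to nine slots to mimic [R1] to [R9]. The two-digit pattern `(\d\d)` is assumed to be what the examples above intend.

```python
import re


def regex_global(text: str, pattern: str, slots: int = 9) -> list[str]:
    """Hypothetical global regex(): every match's first capture, padded to `slots`."""
    found = re.findall(pattern, text)[:slots]  # one string per match (single group)
    return found + [""] * (slots - len(found))  # unused [Rn] stay empty


r = regex_global("2325", r"(\d\d)")
print(r[:3])  # ['23', '25', '']
```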

Hendrik

  • Administrator
  • Citizen of the Universe
  • Posts: 10933
Re: Calling regex/string manipulation gurus
« Reply #10 on: May 22, 2019, 06:49:23 am »

Imagine it did that: how would you then act on an arbitrary number of matched results? It's not like the expression language has loops or anything like that. I might not be seeing how it would be used successfully, then.
~ nevcairiel
~ Author of LAV Filters

mark_h

  • MC Beta Team
  • Citizen of the Universe
  • Posts: 1854
Re: Calling regex/string manipulation gurus
« Reply #11 on: May 22, 2019, 08:24:51 am »

Easy, you could construct a new user field, for example:

Field="[R1][R2][R3][R4][R5][R6][R7][R8][R9]"

And the result would be filled with whatever data is there; it might be

Field="2325"

or

Field="12345678"

or whatever.

Or you could stick it all in a list

Field="[R1]; [R2]; [R3]; ..."&datatype=list and do list work; e.g., you could actually count the number of results.  Or maybe regex() could even return the results count in a new variable, [R0]?
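Continuing in Python purely as an illustration, the three uses described above (concatenating the slots, treating them as a ";" list, and a hypothetical [R0] count) could be sketched like this. `slots` stands in for the values MC would have put in [R1] to [R9]; nothing here is an existing MC feature.

```python
def use_slots(slots: list[str]) -> tuple[str, str, int]:
    """Combine hypothetical [R1]..[R9] values the three ways described above."""
    filled = [s for s in slots if s]  # ignore empty slots
    concatenated = "".join(filled)    # Field="[R1][R2]...[R9]"
    as_list = "; ".join(filled)       # Field="[R1]; [R2]; ..." as a list
    count = len(filled)               # the proposed [R0]
    return concatenated, as_list, count


print(use_slots(["23", "25"] + [""] * 7))  # ('2325', '23; 25', 2)
```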

Etc.

This is bread and butter stuff in regular expressions work...

blgentry

  • Regular Member
  • Citizen of the Universe
  • Posts: 8014
Re: Calling regex/string manipulation gurus
« Reply #12 on: May 22, 2019, 03:16:56 pm »

It seems to me that a regex-based replace would fix a good number of these kinds of cases, à la "s/find/replace/g" from Perl, sed, and friends.  Maybe something like:

Code:
RegexReplace(String, Search Regex, Replace Regex, mode)

example:

Code:
RegexReplace([Name], /#\s*\([^()]*\)#/,,0)

This would replace every parenthesized group in the [Name] field (along with the space before it) with nothing, deleting those substrings.  Groups nested inside other groups would need a second pass.
Modes might include:
0: operate globally on the entire string (like the g modifier in Perl and sed)
1: operate only on the first match (like not using a modifier in Perl and sed)
2: delete the matched substring (sed's d command is the nearest analogue, though it deletes the whole line)
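A sketch of how the proposed modes might behave, written in Python because RegexReplace() does not exist in MC. The function name, argument order and mode numbers simply follow the proposal above; treating mode 2 as "delete every match" is our interpretation.

```python
import re


def regex_replace(text: str, search: str, replacement: str, mode: int) -> str:
    """Hypothetical RegexReplace(String, Search, Replace, mode)."""
    if mode == 0:  # global, like s///g
        return re.sub(search, replacement, text)
    if mode == 1:  # first match only, like s/// without g
        return re.sub(search, replacement, text, count=1)
    if mode == 2:  # delete every matching substring, ignoring `replacement`
        return re.sub(search, "", text)
    raise ValueError(f"unknown mode {mode}")


name = "Chris Squire (bass guitar); Alan White (drums)"
print(regex_replace(name, r"\s*\([^()]*\)", "", 0))  # Chris Squire; Alan White
print(regex_replace(name, r"\s*\([^()]*\)", "", 1))  # Chris Squire; Alan White (drums)
```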

There are probably some other commonly desired things that could be done with modes, but that's all I've got for now.

Or, more generally, a way to write procedural code against this kind of stuff.  But that's probably a MUCH bigger discussion.

Brian.

mark_h

  • MC Beta Team
  • Citizen of the Universe
  • Posts: 1854
Re: Calling regex/string manipulation gurus
« Reply #13 on: May 23, 2019, 01:09:49 am »

+1 for Brian's suggestion.