INTERACT FORUM

Author Topic: Removing all characters up to the first Capitalized word  (Read 710 times)

mvandyke

  • World Citizen
  • ***
  • Posts: 157
Removing all characters up to the first Capitalized word
« on: April 01, 2021, 01:19:14 pm »

Trying to figure out a way to remove all characters up to the first capitalized word.  In all instances, the characters to be removed would be numbers, periods, and spaces.

Here is an example:
1    National Education Week   
1. Moonshadow -
1. No One
10    I Have Dreams   
10. Try Sleeping With A Broken Heart
11    What's The Matter Here?   
11. Un-thinkable (I'm Ready)

I've looked at expressions for removeleft and removecharacters but can't seem to find the correct syntax.  Any help would be appreciated.

Thanks
Matt


hoyt

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 863
Re: Removing all characters up to the first Capitalized word
« Reply #1 on: April 01, 2021, 01:50:07 pm »

I'm by no means a regex expert, but I think looking into that will solve a situation like this.  Here's what I did quickly that worked on the few examples you provided:

Regex([field],/#(([A-Z]).*?.+(.\w|[[:punct:]]))#/,1,1)

I added that |[[:punct:]] because I assumed "11    What's The Matter Here?   " should be: "What's The Matter Here?"

Someone else may see a more efficient way, but that at least seemed to address the question.

mvandyke

Re: Removing all characters up to the first Capitalized word
« Reply #2 on: April 01, 2021, 03:03:08 pm »

;D ;D ;D  Perfect - that worked and thanks so much for your help!


zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2376
Re: Removing all characters up to the first Capitalized word
« Reply #3 on: April 01, 2021, 04:11:03 pm »

There's no need to define what comes after the first capital letter:
regex([field], /#([A-Z].*)#/,1,1)
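For anyone who wants to check this pattern outside MC, here is a minimal sketch in Python. It assumes Python's re module treats this simple pattern the same way MC's regex engine does, which is not confirmed in this thread. The helper name from_first_capital is made up for illustration, and returning an empty string when nothing matches is a choice made here, not documented MC behaviour.

```python
import re

# Sample values copied from the original question (trailing spaces kept).
SAMPLES = [
    "1    National Education Week   ",
    "1. Moonshadow -",
    "1. No One",
    "10    I Have Dreams   ",
    "10. Try Sleeping With A Broken Heart",
    "11    What's The Matter Here?   ",
    "11. Un-thinkable (I'm Ready)",
]

# Same pattern as in the expression above: the first capital letter,
# followed by everything after it.
PATTERN = re.compile(r"([A-Z].*)")


def from_first_capital(text):
    """Return text from its first capital letter onward.

    Hypothetical helper: returns "" when there is no capital letter,
    which is an assumption rather than MC's behaviour.
    """
    match = PATTERN.search(text)
    return match.group(1) if match else ""


if __name__ == "__main__":
    for sample in SAMPLES:
        # repr() makes any remaining trailing whitespace visible.
        print(repr(from_first_capital(sample)))
```

Note that the pattern keeps everything after the first capital, so trailing spaces and the " -" in the Moonshadow example survive.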


Wheaten

  • Guest
Re: Removing all characters up to the first Capitalized word
« Reply #4 on: April 01, 2021, 04:12:40 pm »

The only change I would suggest is to expand the character class [A-Z] to [A-Za-z], in case your first word isn't capitalized.

Regex([field],/#(([A-Za-z]).*?.+(.\w|[[:punct:]]))#/,1,1)

If you want to play with and test regex, I suggest downloading Expresso.

zybex

Re: Removing all characters up to the first Capitalized word
« Reply #5 on: April 01, 2021, 04:16:14 pm »

You can just remove the last arg to make it case-insensitive:
regex([field], /#([A-Z].*)#/,1)
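The effect of dropping the case-sensitivity argument can be sketched in Python, with re.IGNORECASE standing in for MC's case-insensitive mode. That equivalence is an assumption. The function name from_first_letter and the lowercase sample title are hypothetical and not taken from the thread.

```python
import re


def from_first_letter(text, case_sensitive=False):
    """Return text from the first letter matched by [A-Z] onward.

    With case_sensitive=False the class [A-Z] also matches a-z, which is
    how ignoring case widens the original pattern. Returns "" when there
    is no match (a choice made for this sketch).
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    match = re.search(r"([A-Z].*)", text, flags)
    return match.group(1) if match else ""


if __name__ == "__main__":
    title = "12. the End"  # hypothetical input with a lowercase first word
    print(from_first_letter(title))                       # case-insensitive
    print(from_first_letter(title, case_sensitive=True))  # case-sensitive
```

In this sketch the case-insensitive call starts at "the", while the case-sensitive call skips ahead to "End".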

hoyt

Re: Removing all characters up to the first Capitalized word
« Reply #6 on: April 01, 2021, 04:20:58 pm »

There's no need to define what comes after the first capital letter:
regex([field], /#([A-Z].*)#/,1,1)

You're right.  I originally had the second part to remove the " -" from this one: 1. Moonshadow - .  Then I added the punct and that re-includes the hyphen.  May be better off specifically calling out the punctuation to keep?

(([A-Z]).*?.+(.\w|\?|\!|\.))

And when I tried it without the last element, it wasn't matching the case correctly in MC.  It was dropping that first letter from my match group for some reason.

zybex

Re: Removing all characters up to the first Capitalized word
« Reply #7 on: April 01, 2021, 04:30:16 pm »

Adding a Clean() sorts that one out:
Clean(regex([field], /#([A-Z].*)#/,1,1))

Your long expression has some other issues:
- It only captures names of 4 chars or more. "10. Sun" won't be captured.
- The ".*?.+" part is saying "capture something of any length followed by something of at least 1 char". This is ambiguous and causes the engine to try to match multiple combinations, slowing it immensely for some input strings. Just use ".+" or ".*", not both.
- After this you have another ".\w", which means "capture any character followed by a letter", which is, again, a bit redundant.
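The length issue in the first point can be reproduced with a small Python sketch. It uses the explicit-punctuation variant of the long expression posted earlier in the thread, since Python's re module does not support the [[:punct:]] class, and it assumes Python's engine is close enough to MC's for these patterns. The helper name capture is hypothetical, and "10. Suns" is an extra input added here for contrast.

```python
import re

# Long expression from the thread (variant with explicit punctuation).
LONG = re.compile(r"(([A-Z]).*?.+(.\w|\?|\!|\.))")
# Short expression suggested as the replacement.
SHORT = re.compile(r"([A-Z].*)")


def capture(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    for text in ["10. Sun", "10. Suns", "1. Moonshadow -"]:
        print(repr(text), "long:", capture(LONG, text), "short:", capture(SHORT, text))
```

The long pattern needs a capital, at least one more character for ".+", and then either two characters for ".\w" or one of the listed punctuation marks, so a three-letter title with no punctuation cannot match, while the short pattern captures it.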


hoyt

Re: Removing all characters up to the first Capitalized word
« Reply #8 on: April 01, 2021, 04:37:14 pm »


Good catch!  This is why I have to use regex101.com whenever trying to make something work with regex.