Indeed. There was another problem: if "xxxanything" came at the end of the string, there was no trailing semicolon to key from, so those cases failed. I tried a few things, messing around with pipes '|' for "or" and double captures, but couldn't quite get there.
Alternation, looking for either a semicolon or the end of the line, would solve this problem:

xxx[^;]+(;|$)

I now have:
if(regex([keywords],/#(!Places[^;]+)#/,0),
replace([R1],!Places\,),
Failure\No Places)&datatype=[list]
Let's see... This works because I'm saying, "Give me everything that comes after !Places that's not a semicolon", right? So it stops when it hits a semicolon, or when it reaches the end of the string. How neat is that?
Right, it matches !Places followed by one or more non-semicolon characters, as far as it can go...
Neat.
This does the same job as a ridiculously long expression, in a tiny fraction of the time taken by that long expression. Amazing stuff.
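For anyone who wants to poke at the same idea outside of MC, here's a rough Python equivalent. It's only a sketch, and the sample keywords value is made up for illustration:

import re

keywords = "Animals\\Dog;!Places\\France;Events\\Birthday"

# [^;]+ is greedy, so the match runs to the next semicolon or to the end of the string
m = re.search(r"!Places[^;]+", keywords)
if m:
    # strip the "!Places\" prefix, like the Replace([R1], !Places\, ) step above
    print(m.group(0).replace("!Places\\", ""))   # -> France
else:
    print("Failure\\No Places")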
That's exactly why I pushed so hard for them. :-) The existing left, right, removeright, etc. functions, while trivial to use, are too specific *when the job is generalized string matching and extraction*. They're nice to have around, but too limiting to be useful across a broader range of problems.
Now, about that "global" switch you mentioned a few days back...
Is it worth asking for this ability? I personally really, really need it to be able to do the same as the above and pull all the people out of my nested keywords field. I'm not certain what I'm asking for though...
Would a 'g' switch also capture (up to nine) instances automatically, for example, or do we just need that 'g' switch plus a touch of MrC magic?
Matt probably spent a lot of time implementing RE support, and I certainly don't want to push here. I understand they have their priorities and respect that.
I personally think it would complete the package, and would be tremendously useful.
But let me clarify what global is and is not, and what is required for it to be useful. Say you have a string (e.g. your keywords):
abc;def;gh;ijkl;
and you want to capture the stuff in between the semicolons (we'll take the semicolons too, for an easier RE). We can always repeatedly match such patterns with grouping, and a quantifier:
^([^;]+;)+$
or match exactly, say, 4 of them (the count in our sample string):
^([^;]+;){4}$
But with capturing comes a wrinkle. You can capture the first and last one easily, and can even get the Nth one (a bit clumsy, but fine), but there is no way in REs by themselves to get all of them as independent captures.
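Here's a quick Python sketch of that wrinkle, using the standard re module:

import re

s = "abc;def;gh;ijkl;"

# the quantified group matches four times, but only the last repetition is kept
m = re.match(r"^([^;]+;)+$", s)
print(m.group(1))                 # -> ijkl;

# getting all of them needs help from outside the RE itself
print(re.findall(r"[^;]+;", s))   # -> ['abc;', 'def;', 'gh;', 'ijkl;']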
Instead, to implement global capture, the developers using the RE engine write a loop, which progressively runs the RE over spans of the string. The RE engine has the ability to remember where it last stopped, so the developer can re-call the RE with the sub-string for subsequent matches. So it would look like:
while not at end of string {
    match current start of string against RE   <-- captures happen here
    set start of string to last position used by RE
}
Of particular note, there is an implicit last-capture idea here. The captures occur during matching, so only the last captured item would be available. So this would not be any more powerful than just writing an expression that captures the last match (and we know we can do this).
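In Python terms, that loop might look something like this (a sketch; helpers like re.findall do essentially this for you):

import re

s = "abc;def;gh;ijkl;"
pattern = re.compile(r"([^;]+;)")

pos = 0
while pos < len(s):
    m = pattern.search(s, pos)    # match from where we last stopped
    if not m:
        break
    print(m.group(1))             # the capture happens here
    pos = m.end()                 # remember the last position used by the RE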
The power comes when replacement (aka substitution) is implemented, and this is where the global concept applies. With substitution comes the ability to replace matched patterns with specified strings (which can even include previous captures!). We could, for example, append an X to a keyword (the stuff inside the last pair of / / chars is the replacement text):
keywords: abc;def;gh;ijkl;
subst [keywords], /([^;]+);/, /\1 X;/
keywords now: abc X;def;gh;ijkl;
or we could even remove the keyword:
subst [keywords], /([^;]+;)/, //
keywords now: def;gh;ijkl;
But what if we wanted to do that everywhere in the string? This is where global comes in:
Gsubst [keywords], /([^;]+);/, /\1 X;/
keywords now: abc X;def X;gh X;ijkl X;
or we could, like above, replace the keyword with nothing at all:
Gsubst [keywords], /([^;]+;)/, //
keywords now: <empty>

Again, this is implemented in the loop construct mentioned above: at each iteration, the captures are set, their positions in the string are remembered by the RE engine, and the implementer does the necessary mechanics to replace the sub-string with the specified replacement text. So, global only has meaning with substitution.
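Python's re.sub shows the same single-versus-global distinction through its count argument (a sketch mirroring the examples above):

import re

keywords = "abc;def;gh;ijkl;"

# single substitution (count=1), like subst above
print(re.sub(r"([^;]+);", r"\1 X;", keywords, count=1))   # -> abc X;def;gh;ijkl;

# global substitution (the default), like Gsubst
print(re.sub(r"([^;]+);", r"\1 X;", keywords))            # -> abc X;def X;gh X;ijkl X;

# globally remove every keyword
print(repr(re.sub(r"[^;]+;", "", keywords)))              # -> ''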
When would this be useful?
1. when you want to globally replace matches (including with nothing, or amending the captures)
2. when you have sub-strings in text you want removed, so that you can capture what remains
Let's say I have some file names, such as:
foo_bar-some__thing__ (v22).pdf
and want to get rid of the parens, underscores, and dashes (along with any spaces next to them), replacing each run with a single space:

Gsubst [filename (name)], /[()_ -]+/, / /
filename now: foo bar some thing v22 .pdf

(The leftover space before .pdf is from the closing paren being replaced by a space like everything else.)
Without global, you can generally only do this to one of the occurrences.
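Here's the same cleanup sketched in Python, for anyone who wants to verify it:

import re

name = "foo_bar-some__thing__ (v22).pdf"

# collapse each run of parens, underscores, dashes (and adjacent spaces) into one space
print(re.sub(r"[()_ -]+", " ", name))   # -> foo bar some thing v22 .pdf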
Or, say I have a list of key/value pairs in some text field, such as performers, where the pairs are name: instrument, as follows:
sue: trombone; sally: vocals; sam: drums;
With
Gsubst [performers], /([^:]+): ([^;]+);/, /\1;/
we'd obtain the list of performers sans instruments:
sue; sally; sam;
and
Gsubst [performers], /([^:]+): ([^;]+);/, /\2;/
generates the instruments (the spaces before each name were part of \1, so they drop out here):
trombone;vocals;drums;
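And the performers example, sketched in Python:

import re

performers = "sue: trombone; sally: vocals; sam: drums;"

# keep the names (\1), drop the instruments
print(re.sub(r"([^:]+): ([^;]+);", r"\1;", performers))   # -> sue; sally; sam;

# keep the instruments (\2); the leading spaces were inside \1, so they drop out too
print(re.sub(r"([^:]+): ([^;]+);", r"\2;", performers))   # -> trombone;vocals;drums;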
So, I think it is tremendously useful, and completes the generalization of today's specific functions (this supersedes and generalizes Replace() and RemoveCharacters(), which are too specific, and supplements ListBuild()).
Some would argue that the examples are contrived. Well, they are a little, but only for the sake of brevity and explaining a more general concept - making the examples more complex doesn't serve much purpose other than making them harder to read. On the other hand, we've already seen requests (from you, me, and others) for things where this would be quite useful. And then there are those who won't consider MC because they want this ability and other tools such as mp3tag support it (so they go there).