In another thread someone asked for a technique to remove some characters from a file name:
http://yabb.jriver.com/interact/index.php?topic=97989.0
I gave a regular expression to do this. I'm using this post to explain the regular expression and hopefully serve as a short introduction to how regex works.
The assignment was to take a string that has a dash and a space ("- ") in front of it and return only the part after the dash and the space. For example:
"- Speak To Me" would become "Speak To Me". In addition, not every string we are going to process will have the "- ". Some will just be normal, and our expression needs to deal with those too.
Regular Expressions are a pattern matching and grouping tool. They let you do things like slice strings up into different pieces, reorder them, manipulate them, etc. So regex is very well suited to this task. Fundamentally what we are trying to do is take a string like this one:
- SomeCharactersHere
..and return only the second part or "SomeCharactersHere". Let's take that string and slowly build it up into a proper Regular Expression. Regular Expressions have their own set of special characters that mean certain things and they are very useful. The first one we are going to use is the "match any character" special character. This is simply a period or a dot. Now our expression becomes:
- .
That's not very useful though because just one dot only matches one character. We want to match *any* number of characters after the "- " sequence. Luckily there's a modifier we can use that says "take the last character and allow it to repeat one or more times." That is the plus or "+". Now we have:
- .+
..and we are getting somewhere! Let's go further. We want to be able to separate out the last part. That is, the part that doesn't have the "- " in it. Regex lets us group things with parentheses. Let's put them in here:
- (.+)
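If you want to poke at these first three steps outside of MC, here is a minimal sketch using Python's re module. Python is only a stand-in for experimenting and is not the engine MC uses; the variable names are made up for illustration.

```python
import re

title = "- Speak To Me"

# Step 1: "- ." matches the dash, the space, and exactly ONE more character.
one_char = re.match(r"- .", title).group(0)

# Step 2: "- .+" lets the dot repeat, so the match runs to the end of the string.
many_chars = re.match(r"- .+", title).group(0)

# Step 3: "- (.+)" groups the part after the "- " so it can be pulled out alone.
grouped = re.match(r"- (.+)", title).group(1)

print(repr(one_char))    # '- S'
print(repr(many_chars))  # '- Speak To Me'
print(repr(grouped))     # 'Speak To Me'
```

The only thing that changes between the steps is how much of the string the pattern covers and whether the interesting part is separated out.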
Regex will let us refer to a group later. Anything that's inside of ( ) is captured as a group, and what it captured is called a Reference or a Back Reference. Groups are numbered from left to right, and we can have as many as we need. But let's group off that sequence of "- " too. Now our expression is:
(- )(.+)
So now we can refer to the first part as Reference #1 and the second part as Reference #2. The second Reference is really what we want to print out of this when we are done, so we can transform that field to remove those pesky leading characters. Speaking of, remember when I said that the leading "- " was optional? We need to set up the regex to deal with that. As it is, the regex will only match if it sees the "- " followed by 1 or more other characters.
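To see the two References and the problem with the missing "- ", here is a small sketch in Python (again only a stand-in for MC's engine, with illustrative variable names). Python numbers groups the same way, left to right starting at 1.

```python
import re

two_groups = re.compile(r"(- )(.+)")

# A title WITH the leading "- " matches, and both groups are filled in.
with_prefix = two_groups.fullmatch("- Speak To Me")
reference_1 = with_prefix.group(1)
reference_2 = with_prefix.group(2)

# A title WITHOUT the "- " does not match at all, so there is nothing to print.
without_prefix = two_groups.fullmatch("Speak To Me")

print(repr(reference_1))  # '- '
print(repr(reference_2))  # 'Speak To Me'
print(without_prefix)     # None
```

That None for the plain title is exactly the gap the next step closes.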
Remember the modifier we used earlier, the plus? That means "repeat the thing before this one or more times". There are other modifiers that are similar. We are going to use the one that means "Repeat the thing that comes right before this ZERO or more times." That sounds kind of like "optional", right? That's exactly what it is. That modifier is the star or *. So our expression now turns into:
(- )*(.+)
That * means to repeat the stuff inside that first set of parentheses zero or more times, so the "- " can be missing, appear once, or even repeat. We have almost a complete regular expression at this point. However, the regex engine needs to be told where the actual regex characters begin and end. In JRiver MC they have chosen /# and #/ as the start and end character sequences. So now our expression becomes:
/#(- )*(.+)#/
Getting really close now. So let's review it, left to right.
/# is the start sequence. Then (- ) means to match a sequence of a dash and then a space and to group those two characters into a reference. Then the * tells that the group of (- ) can be seen zero or more times. Now (.+) means to match ANY character ONE or more times and to group all of those characters into a reference for us. Incidentally, that's the second reference in the expression. Finally the #/ sequence tells the regex engine that the regular expression is done.
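The review above can be checked with a short Python sketch. This assumes Python's re behaves like MC's engine for this simple pattern, which I have not verified; the helper name second_reference is made up. Python has no /# and #/ sequences, so the pattern is just written as a raw string.

```python
import re

# Same pattern as above, minus MC's /# and #/ start and end sequences.
pattern = re.compile(r"(- )*(.+)")

def second_reference(text):
    """Return what the second group captured, or None if nothing matched."""
    match = pattern.fullmatch(text)
    return match.group(2) if match else None

once = second_reference("- Speak To Me")   # "- " seen one time
never = second_reference("Breathe")        # "- " seen zero times
twice = second_reference("- - Eclipse")    # "- " seen two times
empty = second_reference("")               # .+ needs at least one character

print(once, never, twice, empty)  # Speak To Me Breathe Eclipse None
```

Note that because * allows repeats, a doubled "- - " gets stripped as well. For this cleanup job that is harmless.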
That's the meat of the expression, but we need to put it inside of MC's regex() function. Regex() takes 4 arguments. The first one is the string we want to process. In this case we are going to use the [Name] field. The second argument is easy, it's the Regular Expression itself. The thing we just spent all this time building up. Just take the last two arguments on faith for a moment. Here's the regex call we are going to make:
regex([Name],/#(- )*(.+)#/, -1, 0)
At this point (and any time you are working on a regex) it would be helpful to make an Expression Column in one of your views and cut and paste this regex into it. You should also edit one of your song names to have a dash and then a space in front of it, so we have something to test with. When you do, you're going to see that the new expression column is totally blank! That's because the regex function itself doesn't usually return anything, so nothing gets printed out in the column. What we REALLY want is that stuff in the second set of parentheses... the second Back Reference, remember? Regex references are accessed as [R1], [R2], [R3], etc. Let's tell it to print the second Reference or [R2]:
regex([Name],/#(- )*(.+)#/, -1, 0)[R2]
You should now see, in your expression column, all of your original song names intact, and the one you added with the "- " in front of it should now have the "- " sequence removed!
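For comparison, here is a rough Python equivalent of what the expression column is doing for each row. It is a sketch, not MC's implementation: the function name strip_dash_prefix and the sample track list are made up, and falling back to the original name when nothing matches is my own choice.

```python
import re

PREFIX_PATTERN = re.compile(r"(- )*(.+)")

def strip_dash_prefix(name):
    """Return the name without any leading "- ", like printing [R2]."""
    match = PREFIX_PATTERN.fullmatch(name)
    # Assumption: if nothing matches (e.g. an empty name), keep the name as-is.
    return match.group(2) if match else name

# Hypothetical track names standing in for the [Name] field of each file.
names = ["- Speak To Me", "Breathe", "- On The Run"]
cleaned = [strip_dash_prefix(name) for name in names]

print(cleaned)  # ['Speak To Me', 'Breathe', 'On The Run']
```

Just like the expression column, this leaves the original list untouched and only produces a cleaned-up copy.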
If you want to use this to transform several of your [Name] fields to remove the "- ", all you have to do is highlight them, go to the Tag Editor and paste the expression into the [Name] field with an "=" in front of it like:
=regex([Name],/#(- )*(.+)#/, -1, 0)[R2]
You can use this to experiment with your own Regular Expressions. Using an expression column is totally non-destructive, so you can play around as much as you want and not break anything. Hopefully this will get you started towards writing and using your own Regular Expressions.
Good luck!
Brian.