The thing I keep hearing is that "XBMC just does this automatically." Maybe I'm not understanding, because setting up regular expressions is the opposite of something that just automatically works.
First of all, I hope it's understood that XBMC and probably most other programs that perform the same sort of function (for whatever purpose) are using regex. Furthermore, whether you expose the regex to the user or not, it's still the most mathematically elegant way to recognize patterns. But how you code the solution is your business, so I assume your question really pertains to properly defining the user need...
I'm not familiar with it, but I'm sure "XBMC just does this automatically" means that once configured correctly (and, yes, that necessarily includes providing regex suitable to the circumstances), it just works. In other words, with a set of regex that will match any form of filename I might use for video, I can then just include my video folder(s) in Auto-Import and all my files will be properly recognized and tagged. Equally important, if I encounter one exception, I know the fix can be to modify one regex or add another, and I'll never have to worry about that exception again.
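To make that concrete, here is a rough Python sketch of what I mean by a set of regex that gets tried in order. The two patterns, the `RULES` list, and the `parse` function are all my own illustrative assumptions, not anything taken from XBMC or MC.

```python
import re

# Hypothetical rule list, tried top to bottom; the first match wins.
RULES = [
    # e.g. "Some.Show.S03E21.Pilot.mkv"
    re.compile(r"^(?P<series>.+?)[. _-]+S(?P<season>\d{1,2})E(?P<episode>\d{1,3})",
               re.IGNORECASE),
    # e.g. "Some Show 3x21.avi"
    re.compile(r"^(?P<series>.+?)[. _-]+(?P<season>\d{1,2})x(?P<episode>\d{1,3})",
               re.IGNORECASE),
]

def parse(filename, rules=RULES):
    """Return series/season/episode from the first matching rule, else None."""
    for rule in rules:
        m = rule.search(filename)
        if m:
            return {
                # Dots and underscores in the series part become spaces.
                "series": re.sub(r"[._]+", " ", m.group("series")).strip(),
                "season": int(m.group("season")),
                "episode": int(m.group("episode")),
            }
    return None

# One exception, one new rule, and that naming scheme is handled from then on.
RULES.append(re.compile(
    r"^(?P<series>.+?)\s*-\s*Season\s+(?P<season>\d+)\s+Episode\s+(?P<episode>\d+)",
    re.IGNORECASE))
```

With only the first two rules, "Some Show - Season 3 Episode 21.avi" is not recognized; after the one-line append it is, and nothing else had to change.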
I hesitate to question your ability to create an algorithm that does it all, but I suspect you must be making the assumption users will magically follow one of some finite set of file naming schemes you believe is sensible. No, I don't need to go that far. I don't believe you can create one fixed algorithm that will handle even just all of the different perfectly sensible ways to organize files in a file system. And even if you could, why would you want to maintain such a thing?
Some things are obvious like "S3E21".
Yes, there are probably no more than a dozen commonly used ways to denote the series and episode. But that's just one component of several for just one media type. Even if you could create an algorithm that would handle most of the commonly used schemes, you would then be faced with educating users about what is recognized and what is not. And then you'll find yourself trying to advise users to rename their files while they're asking you to modify your algorithm (and, take my word for it, they'll all tell you they have at least 1,000 files matching the pattern you left out).
Adding a simple list of programs would make the algorithm even better.
Incorporating a list of titles recognized by a metadata provider being used is not a bad idea. For searches of the metadata database to work, you would have to use the correct name anyway. But the user-friendly way to implement that would be to do so internally and not attempt to enforce it on the filename (i.e., your system might ask once, "do you mean this series?"). This particular list, however, doesn't appear to be a good choice. From the number of omissions I can readily spot, I would wonder too about the accuracy of the items on the list.
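For what it's worth, the "do you mean this series?" step could be as simple as a fuzzy lookup against the provider's titles. This is only a sketch: the three titles in `KNOWN_TITLES` are stand-ins I made up for whatever the metadata provider actually returns, `suggest_series` is a hypothetical name, and the 0.6 cutoff is just the default used by Python's `difflib`.

```python
import difflib

# Hypothetical titles, standing in for the metadata provider's list.
KNOWN_TITLES = ["Battlestar Galactica", "Doctor Who", "The Office"]

def suggest_series(parsed_name, known_titles=KNOWN_TITLES, cutoff=0.6):
    """Return (exact_title, suggestions) for a name parsed from a filename."""
    lookup = {title.casefold(): title for title in known_titles}
    key = parsed_name.casefold().strip()
    if key in lookup:
        # Exact match ignoring case: no need to bother the user.
        return lookup[key], []
    # Otherwise offer the closest titles so the system can ask once.
    close = difflib.get_close_matches(key, list(lookup), n=3, cutoff=cutoff)
    return None, [lookup[c] for c in close]
```

The filename stays whatever the user wants it to be; the answer to the question gets remembered internally.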
What is it about the system I outlined in my previous post that concerns you? It seems to me you could go as far as you please in making such a thing user-friendly. First of all, simply being populated with a list of expressions that will match all the schemes most users are likely to use will result in an it-just-works-out-of-the-box experience for most users. Most of the remaining users could deal with whatever doesn't work by picking something from a list that appears to be a match, modifying it if necessary, and placing that expression in the right position in the list of expressions. As I mentioned, that pick list would be of file pathname descriptions, not the regex themselves. Only the 0.1% of users who are actually comfortable with regex would actually choose to add a regex directly.
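Here is roughly how a pick list of pathname descriptions could sit on top of the regex without the user ever seeing one. Everything in it is an assumption of mine for illustration: the `{series}`-style placeholder syntax, the `FIELDS` table, and the `compile_template` and `match_path` names.

```python
import re

# Hypothetical placeholder vocabulary; each name maps to a regex fragment.
FIELDS = {
    "series": r"(?P<series>.+?)",
    "season": r"(?P<season>\d{1,2})",
    "episode": r"(?P<episode>\d{1,3})",
    "title": r"(?P<title>.+?)",
}

def compile_template(template):
    """Turn a readable pathname description into an anchored regex."""
    # Splitting on a capturing group alternates literal text and field names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        FIELDS[part] if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    # Every description is assumed to end with a file extension.
    return re.compile("^" + pattern + r"\.\w+$", re.IGNORECASE)

def match_path(path, rules):
    """Try the compiled rules in list order and return the first match's fields."""
    for rule in rules:
        m = rule.match(path)
        if m:
            return m.groupdict()
    return None

# What the user sees and reorders is this list of descriptions, not the regex.
PICK_LIST = [
    "{series}/Season {season}/{episode} - {title}",
    "{series} - S{season}E{episode}",
]
rules = [compile_template(t) for t in PICK_LIST]
```

Because the first match wins, where the user drops a picked description in the list matters just as much as which one is picked, which is why I said "in the right position."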
Like a number of similar matters in MC, the issue is not about providing something that "just automatically works." That's because nothing "just automatically works" all the time. To be truly user-friendly, your solution needs to do that and provide the means to easily configure the thing so it does work when it fails to do so automatically. Even if users find the unavoidable failures annoying, what they'll want to hear is, "Don't worry: it can be configured to do anything," not, "You're SOL: change your filenames."