INTERACT FORUM

More => Old Versions => JRiver Media Center 30 for Windows => Topic started by: CltrAltDel on March 27, 2023, 09:45:53 am

Title: Expression "isrange" handling of accents / umlauts / ligatures
Post by: CltrAltDel on March 27, 2023, 09:45:53 am: For a long time I've been using a pane view with (among others) a category/column "A-Z" for movies.
This uses the following expression for grouping:

if(isrange([name], a-z), formatrange([name],1,0), #)

which results in a column with "#ABC...Z" for filtering, so something like

#      -> 2 Guns | 2 Days in the Valley | 12 Monkeys | 2001: A Space Odyssey | ...
A      -> A Beautiful Mind | À La Carte! | Almost Famous | Ärger Im Gepäck | Atomic Blonde | ...
B      -> Babylon | Bad Santa | Barbarella | Barton Fink | ...
C      -> Car Wash | Casablanca | Casino | Chinatown | ...
...
Z      -> Zabriskie Point | Zero Effect | Zodiac | Zulu | ...

After the update to MC30 the column looks like this:

#        -> 2 Guns | 2 Days in the Valley | 12 Monkeys | 2001: A Space Odyssey | ...
(Others)   -> À La Carte! | Ärger Im Gepäck
A        -> A Beautiful Mind | Almost Famous | Atomic Blonde | ...
B        -> Babylon | Bad Santa | Barbarella | Barton Fink | ...
C        -> Car Wash | Casablanca | Casino | Chinatown | ...
...
Z        -> Zabriskie Point | Zero Effect | Zodiac | Zulu | ...

So umlauts and letters with accents are not handled like letters any more.
Interestingly, the ligature "Æ" is handled like a number, so the movie "Æon Flux" is sorted to "#" (should also go to "A"...).

Is this a bug or is it by design?
If it's by design, how can I work around it?
I made some attempts with adding lines like

isequal([name], À), A,
isequal([name], Ä), A,

to the expression, but that does not seem to work.
As I'm not that savvy with the expression language, some help would be greatly appreciated :)
Title: Re: Expression "isrange" handling of accents / umlauts / ligatures
Post by: zybex on March 27, 2023, 11:01:09 am: This is not easy to do without a new function to remove diacritics from text (unicode normalization). Perhaps Matt can add a Normalize() function for that, or a new mode to the existing FixCase() function.

Then you would use something like this (if you have no names starting with symbols/punctuation):
Letter(normalize([name]),1,2)
Title: Re: Expression "isrange" handling of accents / umlauts / ligatures
Post by: Matt on March 27, 2023, 11:10:20 am: Look at Clean(...) in mode 9.
Title: Re: Expression "isrange" handling of accents / umlauts / ligatures
Post by: zybex on March 27, 2023, 11:13:55 am: Perfect :) That mode is not documented, apparently: https://wiki.jriver.com/index.php/String_Manipulation_Functions#Clean

Then it's just:
Letter(Clean([Name],9),1,2)
Title: Re: Expression "isrange" handling of accents / umlauts / ligatures
Post by: Matt on March 27, 2023, 11:18:19 am: Quote from: zybex on March 27, 2023, 11:13:55 am
Perfect :) That mode is not documented, apparently: https://wiki.jriver.com/index.php/String_Manipulation_Functions#Clean

Then it's just:
Letter(Clean([Name],9),1,2)

Just updated.
Title: Re: Expression "isrange" handling of accents / umlauts / ligatures
Post by: zybex on March 27, 2023, 11:26:27 am: Thank you.
Apparently Æ and other ligatures are not handled. NFKD is likely the best for this: https://unicode.org/reports/tr15/#Norm_Forms, but maybe not worth the effort unless you're using some library like Boost that already does it.
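
Editorial aside on the NFKD suggestion above: a minimal Python sketch, not MC's implementation, of removing diacritics through Unicode normalization. NFKD splits letters such as À and Ä into a base letter plus combining marks, which can then be dropped. Æ, Œ, Ø and Þ have no Unicode decomposition, so NFKD alone leaves them unchanged and a small lookup table is still needed. The names `FALLBACK` and `strip_diacritics` are hypothetical, and the targets chosen for Œ, Ø and Þ are assumptions (only Æ to A is confirmed in this thread).

```python
import unicodedata

# Hypothetical fallback table for letters without a Unicode decomposition.
# It only covers letters raised in this thread; it is NOT MC's actual table.
FALLBACK = {
    "Æ": "A", "æ": "a",
    "Œ": "O", "œ": "o",
    "Ø": "O", "ø": "o",
    "Þ": "T", "þ": "t",
}

def strip_diacritics(text: str) -> str:
    """Decompose with NFKD, drop combining marks, then apply FALLBACK."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return "".join(FALLBACK.get(ch, ch) for ch in without_marks)

print(strip_diacritics("À La Carte!"))             # A La Carte!
print(strip_diacritics("Ärger Im Gepäck"))         # Arger Im Gepack
print(unicodedata.normalize("NFKD", "Æon Flux"))   # Æon Flux (unchanged by NFKD)
print(strip_diacritics("Æon Flux"))                # Aon Flux
```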
Title: Re: Expression "isrange" handling of accents / umlauts / ligatures
Post by: Matt on March 27, 2023, 11:33:57 am: Quote from: zybex on March 27, 2023, 11:26:27 am
Apparently Æ and other ligatures are not handled.

I just added Æ and will translate it to A. If there are others just let me know.

Thanks.
Title: Re: Expression "isrange" handling of accents / umlauts / ligatures
Post by: CltrAltDel on March 28, 2023, 09:00:44 am: Thanks for looking into that - the clean function did the trick :)
I changed the expression to

if(isrange(Clean([Name],9), a-z), formatrange(Clean([Name],9),1,0), #)

and that worked like a charm!

If the Æ is added to the clean function, would adding Œ be reasonable?

And maybe the documentation could be expanded to mention that not only accents but also umlauts, circumflexes, cedillas, tildes, rings, slashes (Ø), carons (and more?) and (possibly) ligatures are "cleaned"... or in short just "Removes diacritics." ;D
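
Editorial aside: the intent of the revised grouping expression in the post above can be approximated outside MC with the Python sketch below. It is an illustration under assumptions, not a description of how MC evaluates `isrange`, `formatrange` or `Clean`. The helper `clean` is a hypothetical stand-in for `Clean([Name],9)` that only strips combining marks, and `az_bucket` and `group_titles` are made-up names.

```python
import string
import unicodedata
from collections import defaultdict

def clean(name: str) -> str:
    """Hypothetical stand-in for Clean([Name],9): strip combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def az_bucket(name: str) -> str:
    """Return the A-Z bucket of the cleaned name, or '#' for anything else."""
    first = clean(name)[:1].upper()
    # The length check rejects empty names and multi-character upper-casings.
    if len(first) == 1 and first in string.ascii_uppercase:
        return first
    return "#"

def group_titles(titles):
    """Group titles by bucket, keeping the input order inside each bucket."""
    groups = defaultdict(list)
    for title in titles:
        groups[az_bucket(title)].append(title)
    return dict(groups)

titles = ["2 Guns", "A Beautiful Mind", "À La Carte!",
          "Ärger Im Gepäck", "Babylon", "Zulu"]
for bucket, names in sorted(group_titles(titles).items()):
    print(bucket, "->", " | ".join(names))
```

With these sample titles, the accented names land in the "A" bucket next to "A Beautiful Mind", which matches the column layout from before the MC30 update.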
Title: Re: Expression "isrange" handling of accents / umlauts / ligatures
Post by: CltrAltDel on March 28, 2023, 01:13:30 pm: While reviewing expressions for actor names I just stumbled across another relatively common letter, the Icelandic thorn "Þ" for "Th"... would it be reasonable to translate it to "T"?
Title: Re: Expression "isrange" handling of accents / umlauts / ligatures
Post by: blgentry on March 29, 2023, 02:20:23 pm: I helped out with the Clean(9) function to translate diacriticals. I used this table as the source of the translation:

https://docs.oracle.com/cd/E29584_01/webhelp/mdex_basicDev/src/rbdv_chars_mapping.html

It has all the characters mentioned in this thread as far as I am aware. It's possible I missed some of them, but I think they are all in what I gave to Matt.

Brian.