INTERACT FORUM

Author Topic: Expression "isrange" handling of accents / umlauts / ligatures  (Read 650 times)

CltrAltDel

  • Recent member
  • *
  • Posts: 13

For a long time I've been using a pane view with (among others) a category/column "A-Z" for movies.
This uses the following expression for grouping:

if(isrange([name], a-z), formatrange([name],1,0), #)

which results in a column with "#ABC...Z" for filtering, something like:

#      -> 2 Guns | 2 Days in the Valley | 12 Monkeys | 2001: A Space Odyssey | ...
A      -> A Beautiful Mind | À La Carte! | Almost Famous | Ärger Im Gepäck | Atomic Blonde | ...
B      -> Babylon | Bad Santa | Barbarella | Barton Fink | ...
C      -> Car Wash | Casablanca | Casino | Chinatown | ...
...
Z      -> Zabriskie Point | Zero Effect | Zodiac | Zulu | ...


After the update to MC30 the column looks like this:

#          -> 2 Guns | 2 Days in the Valley | 12 Monkeys | 2001: A Space Odyssey | ...
(Others)   -> À La Carte! | Ärger Im Gepäck
A          -> A Beautiful Mind | Almost Famous | Atomic Blonde | ...
B          -> Babylon | Bad Santa | Barbarella | Barton Fink | ...
C          -> Car Wash | Casablanca | Casino | Chinatown | ...
...
Z          -> Zabriskie Point | Zero Effect | Zodiac | Zulu | ...


So umlauts and accented letters are no longer treated as letters.
Interestingly, the ligature "Æ" is handled like a number, so the movie "Æon Flux" is sorted to "#" (should also go to "A"...).

Is this a bug or is it by design?
If it's by design, how can I work around it?
I made some attempts with adding lines like

isequal([name], À), A,
isequal([name], Ä), A,


to the expression, but that does not seem to work.
As I'm not that savvy with the expression language, some help would be greatly appreciated  :)

zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2619
Re: Expression "isrange" handling of accents / umlauts / ligatures
« Reply #1 on: March 27, 2023, 11:01:09 am »

This is not easy to do without a new function that removes diacritics from text (Unicode normalization). Perhaps Matt can add a Normalize() function for that, or a new mode to the existing FixCase() function.

Then you would use something like this (if you have no names starting with symbols/punctuation):
Letter(normalize([name]),1,2)

Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42387
  • Shoes gone again!
Re: Expression "isrange" handling of accents / umlauts / ligatures
« Reply #2 on: March 27, 2023, 11:10:20 am »

Look at Clean(...) in mode 9.
Matt Ashland, JRiver Media Center

zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2619
Re: Expression "isrange" handling of accents / umlauts / ligatures
« Reply #3 on: March 27, 2023, 11:13:55 am »

Perfect :) That mode is not documented, apparently: https://wiki.jriver.com/index.php/String_Manipulation_Functions#Clean

Then it's just:
Letter(Clean([Name],9),1,2)

Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42387
  • Shoes gone again!
Re: Expression "isrange" handling of accents / umlauts / ligatures
« Reply #4 on: March 27, 2023, 11:18:19 am »

Quote from: zybex on March 27, 2023, 11:13:55 am
Perfect :) That mode is not documented, apparently: https://wiki.jriver.com/index.php/String_Manipulation_Functions#Clean

Then it's just:
Letter(Clean([Name],9),1,2)

Just updated.
Matt Ashland, JRiver Media Center

zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2619
Re: Expression "isrange" handling of accents / umlauts / ligatures
« Reply #5 on: March 27, 2023, 11:26:27 am »

Thank you.
Apparently Æ and other ligatures are not handled. NFKD is likely the best for this: https://unicode.org/reports/tr15/#Norm_Forms, but maybe not worth the effort unless you're using some library like Boost that already does it.
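
For illustration only, here is a minimal Python sketch of what NFKD-based diacritic stripping does. It is not how MC implements Clean(...), and the helper name `strip_diacritics` is made up for this example. It decomposes each character and drops the combining marks. Note that Æ has no Unicode decomposition, so normalization alone leaves it unchanged and an explicit mapping is still needed for it.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose with NFKD, then drop combining marks (accents, umlauts, ...)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Titles taken from the thread.
for title in ["À La Carte!", "Ärger Im Gepäck", "Æon Flux"]:
    print(title, "->", strip_diacritics(title))
# À La Carte! -> A La Carte!
# Ärger Im Gepäck -> Arger Im Gepack
# Æon Flux -> Æon Flux   (Æ is untouched by NFKD)
```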

Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42387
  • Shoes gone again!
Re: Expression "isrange" handling of accents / umlauts / ligatures
« Reply #6 on: March 27, 2023, 11:33:57 am »

Quote from: zybex on March 27, 2023, 11:26:27 am
Apparently Æ and other ligatures are not handled.

I just added Æ; it will be translated to A.  If there are others, just let me know.

Thanks.
Matt Ashland, JRiver Media Center

CltrAltDel

  • Recent member
  • *
  • Posts: 13
Re: Expression "isrange" handling of accents / umlauts / ligatures
« Reply #7 on: March 28, 2023, 09:00:44 am »

Thanks for looking into that - the Clean function did the trick :)
I changed the expression to

if(isrange(Clean([Name],9), a-z), formatrange(Clean([Name],9),1,0), #)

and that worked like a charm!
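
For anyone who wants to check the grouping outside MC, the same logic can be roughly emulated in Python. This is only a sketch under assumptions: `clean` and `group_key` are hypothetical helpers, not MC functions, and the `EXTRA` table contains only the Æ -> A translation mentioned in this thread. What Clean(...,9) maps beyond that is not assumed here.

```python
import unicodedata

# Letters without a Unicode decomposition need an explicit mapping.
# Only Æ -> A comes from this thread; extend the table as needed.
EXTRA = {"Æ": "A", "æ": "a"}

def clean(text: str) -> str:
    """Rough stand-in for diacritic removal: explicit map, then NFKD strip."""
    mapped = "".join(EXTRA.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", mapped)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def group_key(name: str) -> str:
    """Return 'A'..'Z' for names starting with a letter, otherwise '#'."""
    first = clean(name).upper()[:1]
    return first if "A" <= first <= "Z" else "#"

for title in ["12 Monkeys", "À La Carte!", "Ärger Im Gepäck", "Æon Flux", "Zulu"]:
    print(group_key(title), "->", title)
```

With the cleaned name, the accented titles and "Æon Flux" all land under "A", while titles starting with digits stay under "#".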

If Æ is added to the Clean function, would it be reasonable to add Œ as well?

And maybe the documentation could be expanded to mention that not only accents but also umlauts, circumflexes, cedillas, tildes, rings, slashes (Ø), carons (and more?) and (possibly) ligatures are "cleaned"... or in short just "Removes diacritics." ;D

CltrAltDel

  • Recent member
  • *
  • Posts: 13
Re: Expression "isrange" handling of accents / umlauts / ligatures
« Reply #8 on: March 28, 2023, 01:13:30 pm »

While reviewing expressions for actor names I just stumbled across another relatively common letter, the Icelandic thorn "Þ" for "Th"... would it be reasonable to translate it to "T"?

blgentry

  • Regular Member
  • Citizen of the Universe
  • *****
  • Posts: 8014
Re: Expression "isrange" handling of accents / umlauts / ligatures
« Reply #9 on: March 29, 2023, 02:20:23 pm »

I helped out with the Clean(9) function to translate diacriticals.  I used this table as the source of the translation:

https://docs.oracle.com/cd/E29584_01/webhelp/mdex_basicDev/src/rbdv_chars_mapping.html

It has all the characters mentioned in this thread as far as I am aware.  It's possible I missed some of them, but I think they are all in what I gave to Matt.

Brian.