More > JRiver Media Center 30 for Windows

Expression "isrange" handling of accents / umlauts / ligatures

zybex:
Thank you.
Apparently Æ and other ligatures are not handled. NFKD normalization is probably the best fit for this (see https://unicode.org/reports/tr15/#Norm_Forms), but it may not be worth the effort unless you already use a library such as Boost that provides it.
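
A quick way to see what NFKD does and does not cover is the sketch below. It is plain Python using the standard `unicodedata` module, not JRiver's code, and `strip_marks` is a hypothetical helper name. It decomposes each character and drops the combining marks. Accents, umlauts, cedillas, tildes, rings and carons fall away, and the typographic ligature ﬁ expands to "fi", but Æ, Œ, Ø and Þ have no Unicode decomposition and come back unchanged, so they would still need an explicit mapping.

```python
import unicodedata

def strip_marks(text):
    """NFKD-decompose the text, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Print each sample character next to its stripped form.
for ch in "éüçñÅšﬁÆŒØÞ":
    print(repr(ch), "->", repr(strip_marks(ch)))
```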

Matt:

--- Quote from: zybex on March 27, 2023, 11:26:27 am ---Apparently Æ and other ligatures are not handled.

--- End quote ---

I just added Æ and will translate it to A.  If there are others just let me know.

Thanks.

CltrAltDel:
Thanks for looking into that - the clean function did the trick :)
I changed the expression to

if(isrange(Clean([Name],9), a-z), formatrange(Clean([Name],9),1,0), #)

and that worked like a charm!

If Æ is added to the clean function, would adding Œ be reasonable as well?

And maybe the documentation could be expanded to mention that not only accents but also umlauts, circumflexes, cedillas, tildes, rings, slashes (Ø), carons (and more?) and possibly ligatures are "cleaned"... or in short just "Removes diacritics." ;D

CltrAltDel:
While reviewing expressions for actor names I just stumbled across another relatively common letter, the Icelandic thorn "Þ" for "Th"... would it be reasonable to translate it to "T"?

blgentry:
I helped out with the Clean(9) function to translate diacriticals.  I used this table as the source of the translation:

https://docs.oracle.com/cd/E29584_01/webhelp/mdex_basicDev/src/rbdv_chars_mapping.html

It has all the characters mentioned in this thread as far as I am aware.  It's possible I missed some of them, but I think they are all in what I gave to Matt.

Brian.
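
For anyone who wants to try the table-driven idea outside Media Center, here is a minimal Python sketch. It is not the Clean(9) implementation and does not reproduce the Oracle table. The `EXTRA` map is a hypothetical stand-in: only Æ to A comes from Matt's post, Þ to T is the suggestion made in this thread, and the Œ and Ø entries are assumptions added for illustration. The function names `clean_name` and `first_letter_bucket` are also made up. The bucket function approximates the apparent intent of the isrange/formatrange expression above (first letter if it falls in a-z, otherwise #); the exact behavior of those JRiver functions is not modeled.

```python
import unicodedata

# Hypothetical fallback table for letters that have no Unicode decomposition.
# Only the Æ entry is confirmed in this thread; the rest are assumptions.
EXTRA = str.maketrans({
    "Æ": "A", "æ": "a",
    "Œ": "O", "œ": "o",
    "Ø": "O", "ø": "o",
    "Þ": "T", "þ": "t",
})

def clean_name(text):
    """Apply the explicit table first, then strip combining marks via NFKD."""
    text = text.translate(EXTRA)
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def first_letter_bucket(name):
    """Return the cleaned first letter (upper-cased) or '#' if it is not a-z."""
    first = clean_name(name)[:1].lower()
    return first.upper() if "a" <= first <= "z" else "#"

for name in ["Ævar", "Þórunn", "Øystein", "Élodie", "2Pac"]:
    print(name, "->", first_letter_bucket(name))
```

Applying the table before normalization keeps the two steps independent, so entries can be added or changed without touching the diacritic stripping.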
