INTERACT FORUM

More => Old Versions => JRiver Media Center 19 for Windows => Topic started by: chrisjj on January 16, 2014, 07:43:33 pm

Title: Expression to remove diacritics?
Post by: chrisjj on January 16, 2014, 07:43:33 pm
Is there some way in an expression to remove diacritics? I can't find an explicit function for it. I need a version of Name without diacritics, so that Remove Duplicates treats values that differ only in diacritics as equal.

Thanks.
Title: Re: Expression to remove diacritics?
Post by: MrC on January 16, 2014, 08:43:41 pm
Do you really mean "diacritics" (which are just additional glyph marks on letters), or do you mean all letters that are not typical ASCII a-z, A-Z, 0-9, punctuation, etc.?

For example, should the following be accepted or rejected?

    ø  ß  œ Ɣ

To do these types of comparisons, you can normalize the Unicode to a decomposed standard form such as NFD (which splits each accented letter into a base letter plus combining characters), strip the combining characters, and force-convert some additional characters, such as ß to "ss". But more technically correct is a level 1 (primary-strength) Unicode Collation Algorithm comparison. Both are well beyond MC's capabilities.
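For reference, the normalize-and-strip approach can be sketched in Python using only the standard library's `unicodedata` module. This is a minimal sketch, not MrC's code and not something MC expressions can run; the function name `strip_diacritics` is hypothetical. It deliberately skips the extra force-conversions (such as ß), so letters with no canonical decomposition (ø, ß, œ, Ɣ) pass through unchanged.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: drop combining marks after NFD decomposition.

    Letters without a canonical decomposition, such as o-slash, sharp s,
    oe and Gamma, are left exactly as they are.
    """
    # Split precomposed letters, e.g. "é" -> "e" + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only code points whose canonical combining class is 0 (base characters).
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever remains into the usual NFC form.
    return unicodedata.normalize("NFC", base_only)


if __name__ == "__main__":
    for name in ["Beyoncé", "Ångström", "Sigur Rós", "ø ß œ Ɣ"]:
        print(name, "->", strip_diacritics(name))
```

Testing `unicodedata.combining()` (a nonzero canonical combining class) rather than the general category `Mn` is a design choice: it removes Latin accent marks while leaving alone some non-Latin vowel signs that are category `Mn` but have combining class 0.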

Somewhere I have some code that does this. If you want it, and I can find it, it can be turned into a pscriptor scriptlet so that you could perform this conversion and save the results to an MC field or two, or have it save a comparison result.
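One way to produce a comparison result like that is to compute a diacritic-free key for each value and group the values whose keys collide. The sketch below runs outside MC and is illustrative only: `diacritic_key` and `group_by_key` are hypothetical names, and how the result would be written back to an MC field (via pscriptor or otherwise) is not covered.

```python
import unicodedata
from collections import defaultdict


def diacritic_key(name: str) -> str:
    """Hypothetical comparison key: NFD, drop combining marks, recompose (NFC)."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def group_by_key(names):
    """Return {key: [names]} for keys shared by two or more distinct spellings.

    Such groups are values that differ only in diacritics.
    """
    groups = defaultdict(list)
    for name in names:
        groups[diacritic_key(name)].append(name)
    # Identical spellings alone are not "differing only in diacritics".
    return {key: members for key, members in groups.items() if len(set(members)) > 1}


if __name__ == "__main__":
    print(group_by_key(["Beyoncé", "Beyonce", "Björk", "Bjork", "Sigur Rós", "Ø"]))
```

With that input the result groups "Beyoncé" with "Beyonce" and "Björk" with "Bjork", while "Ø" stays distinct from "O" because ø has no canonical decomposition.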
Title: Re: Expression to remove diacritics?
Post by: chrisjj on January 17, 2014, 05:17:52 am
> Do you really mean "diacritics" (which are just additional glyph marks on letters),

I do mean diacritics.

> or do you mean all letters that are not typical ASCII a-z, A-Z, 0-9, punctuation, etc.?

No - I don't want any letters removed.

> For example, should the following be accepted or rejected?
>
>     ø  ß  œ Ɣ

Accepted unchanged, since they don't have diacritics.

> To do these types of comparisons, you can normalize the Unicode to a decomposed standard form such as NFD (which splits each accented letter into a base letter plus combining characters), strip the combining characters, and force-convert some additional characters, such as ß to "ss". But more technically correct is a level 1 (primary-strength) Unicode Collation Algorithm comparison. Both are well beyond MC's capabilities.

Thanks.

> Somewhere I have some code that does this. If you want it, and I can find it, it can be turned into a pscriptor scriptlet so that you could perform this conversion and save the results to an MC field or two, or have it save a comparison result.

I would very much like that, thanks. I'd missed the announcement of pscriptor (http://yabb.jriver.com/interact/index.php?topic=85990.0). This feature looks awesome. I'll try it now. Even more awesome would be the ability to call it from expressions, but I don't currently see any expression-language call-out function that would allow this.
Title: Re: Expression to remove diacritics?
Post by: MrC on January 20, 2014, 12:58:43 pm
Bump.  Did you get started with pscriptor, and if so, have you thought about how you'd like to use it for this problem here?
Title: Re: Expression to remove diacritics?
Post by: chrisjj on January 21, 2014, 08:41:31 am
> Bump.  Did you get started with pscriptor,

Not yet. I saw that the install procedure is something I'll need to set aside a bit of time for.