INTERACT FORUM

Topic: Should diacritics be ignored when grouping?  (Read 6034 times)

6233638

  • Regular Member
  • Citizen of the Universe
  • Posts: 5353
Should diacritics be ignored when grouping?
« on: June 06, 2013, 10:44:31 am »

I have been trying out Theater View and noticed that in the Director view I had a few duplicates, because sometimes the names were filled out with diacritics, and sometimes they were missing:

[Screenshot: the Director view listing Krzysztof Kieślowski and Krzysztof Kieslowski as separate entries]

Perhaps they should be ignored when grouping items together?

Matt

  • Administrator
  • Citizen of the Universe
  • Posts: 42372
  • Shoes gone again!
Re: Should accents be ignored when grouping?
« Reply #1 on: June 06, 2013, 10:53:53 am »

We ignore accents on search, but not on grouping.

I don't feel strongly on the grouping part. 

However, I'm a little afraid of this topic:
http://yabb.jriver.com/interact/index.php?topic=80105.0
Matt Ashland, JRiver Media Center

6233638

  • Regular Member
  • Citizen of the Universe
  • Posts: 5353
Re: Should accents be ignored when grouping?
« Reply #2 on: June 06, 2013, 11:03:40 am »

As a native English speaker, my concern was also that there may be situations where accents should not be ignored.
I prefer to have things entered correctly, but as illustrated, when using external data sources there may be discrepancies.

For my use, it seems like ignoring accents everywhere would be preferable. Perhaps this could be a global option?

  • Ignore accents
  • Ignore accents when searching
  • Treat accents as unique characters

yannis

  • World Citizen
  • Posts: 229
Re: Should accents be ignored when grouping?
« Reply #3 on: June 07, 2013, 03:01:45 am »

I do hope that this would be optional. Fuzzy searching is one thing; fuzziness by its nature allows for a mixed bag of results. But sorting is totally different; it should be absolute. Imposing a logic where sonar=soñar is really the same as someone barging into your culture and declaring that stay=stey.

On the other hand, an optional "ignore diacritics" might indeed be useful in smartlists, where tracks are omitted because of a one-letter difference. In fact, applying fuzziness here might be a blessing: "theater" could equal "theatre", "La Monte Young" would appear near "LaMonte Young", etc.
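As a rough illustration of that optional smartlist fuzziness, the Python sketch below compares two strings after removing diacritics, case, and spaces, then scores them with the standard library's `difflib.SequenceMatcher`. The function names and the 0.8 threshold are hypothetical choices for this example, not anything Media Center provides.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_for_fuzzy(text: str) -> str:
    """Drop diacritics, case, and whitespace (a hypothetical normalization)."""
    # NFD splits letters like 'ñ' into 'n' + a combining mark (category 'Mn').
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # casefold() ignores case; split() + join() removes all whitespace.
    return "".join(base.casefold().split())


def fuzzy_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Return True when the normalized strings are at least `threshold` similar."""
    ratio = SequenceMatcher(None, normalize_for_fuzzy(a), normalize_for_fuzzy(b)).ratio()
    return ratio >= threshold


print(fuzzy_match("theater", "theatre"))               # True (ratio is about 0.86)
print(fuzzy_match("La Monte Young", "LaMonte Young"))  # True (identical once normalized)
print(fuzzy_match("Belanova", "Sonar"))                # False
```

A loose threshold like this may be acceptable for a smartlist the user reviews, but it is exactly the kind of fuzziness the post argues should stay out of sorting.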

6233638

  • Regular Member
  • Citizen of the Universe
  • Posts: 5353
Re: Should accents be ignored when grouping?
« Reply #4 on: June 07, 2013, 11:01:45 am »

Quote from: yannis
"I do hope that this would be optional. Fuzzy searching is one thing; fuzziness by its nature allows for a mixed bag of results. But sorting is totally different; it should be absolute. Imposing a logic where sonar=soñar is really the same as someone barging into your culture and declaring that stay=stey."

The issue, as illustrated above, is that metadata sources are often inconsistent in how they handle diacritics, which results in a lot of duplicate groups. (P.S. Thanks for the correction; I meant "diacritics" when I said "accents".)

When I am filling in data myself, I am very careful to get this right, and I'll usually go over any albums I import to correct mistakes in the metadata.
With films though, I don't care enough to be going through and checking everything manually, which is how errors like the one shown above are in my library.

I'm also not meaning that Media Center should strip the files of that data, just that it would treat:

èéêëēĕėęěȅȇȩḕḗḙḛḝẹẻẽếềểễệe as e for the purposes of grouping, sorting, searching etc.

I think smartlists and search already do ignore diacritics.
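One common way to "treat èéêë... as e" is Unicode canonical decomposition (NFD) followed by dropping the combining marks. The Python sketch below assumes that approach; `fold_diacritics` is a hypothetical helper, not Media Center's implementation.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Remove combining diacritical marks, e.g. 'Kieślowski' -> 'Kieslowski'."""
    # NFD turns a precomposed letter such as 'ś' into 's' + U+0301 (combining acute).
    decomposed = unicodedata.normalize("NFD", text)
    # Combining accents have Unicode category 'Mn' (nonspacing mark); drop them.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose anything left over (e.g. Hangul syllables) into canonical form.
    return unicodedata.normalize("NFC", stripped)


print(fold_diacritics("Krzysztof Kieślowski"))  # Krzysztof Kieslowski
print(fold_diacritics("Antonín Dvořák"))        # Antonin Dvorak
print(fold_diacritics("ł ø"))                   # ł ø (no decomposition, so unchanged)
```

Note that the same folding also merges genuinely different words, such as Spanish "soñar" and "sonar", or Greek "πότε" (when) and "ποτέ" (never), which is the concern raised about making this behavior mandatory.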

yannis

  • World Citizen
  • Posts: 229
Re: Should diacritics be ignored when grouping?
« Reply #5 on: June 07, 2013, 03:07:28 pm »

Well, maybe what I said wasn't clear enough, so I'll reiterate. For native speakers of languages other than English, these elements of written language are not decorative; they are crucial when, for example, these people look something up in a dictionary, and therefore they are crucial when sorting in MC.

If you're proposing an option, I'd say why not; but if it's supposed to be obligatory, then I strongly disagree. If anything, MC should be made aware of these differences, to accommodate more precise smartlists.

BTW, now that I've said "accommodate": some languages don't have double letters that carry no phonetic value. How would it feel, then, if you had to sort or look it up as "acomodate"? ;)

InflatableMouse

  • MC Beta Team
  • Citizen of the Universe
  • Posts: 3978
Re: Should diacritics be ignored when grouping?
« Reply #6 on: June 07, 2013, 03:15:17 pm »

Quote from: yannis
"Well, maybe what I said wasn't clear enough, so I'll reiterate. For native speakers of languages other than English, these elements of written language are not decorative; they are crucial when, for example, these people look something up in a dictionary, and therefore they are crucial when sorting in MC."

"If you're proposing an option, I'd say why not; but if it's supposed to be obligatory, then I strongly disagree. If anything, MC should be made aware of these differences, to accommodate more precise smartlists."

"BTW, now that I've said 'accommodate': some languages don't have double letters that carry no phonetic value. How would it feel, then, if you had to sort or look it up as 'acomodate'? ;)"

Can you give examples where two artist or band names differ only by diacritics?

What I mean is: when is Antonín Dvořák actually someone other than Antonin Dvorak?

6233638

  • Regular Member
  • Citizen of the Universe
  • Posts: 5353
Re: Should diacritics be ignored when grouping?
« Reply #7 on: June 07, 2013, 04:01:16 pm »

Quote from: yannis
"Well, maybe what I said wasn't clear enough, so I'll reiterate. For native speakers of languages other than English, these elements of written language are not decorative; they are crucial when, for example, these people look something up in a dictionary, and therefore they are crucial when sorting in MC."

"If you're proposing an option, I'd say why not; but if it's supposed to be obligatory, then I strongly disagree. If anything, MC should be made aware of these differences, to accommodate more precise smartlists."

Well, as I said in this post, a global preference would probably be the best option.

Quote from: InflatableMouse
"What I mean is: when is Antonín Dvořák actually someone other than Antonin Dvorak?"

Exactly. I haven't come across that yet, and I think it's probably quite rare for that to happen. But it is relatively common for metadata sources to be inconsistent in their handling of diacritics.

mwillems

  • MC Beta Team
  • Citizen of the Universe
  • Posts: 5234
  • "Linux Merit Badge" Recipient
Re: Should diacritics be ignored when grouping?
« Reply #8 on: June 07, 2013, 04:08:11 pm »

He actually provided an illustration in his post: there's a band called "sonar" and another band called "soñar". Not sure if that was intentional (and I haven't heard of either band), but there we are.

I expect that kind of overlap is much more common in diacritic-rich languages, where the diacritic makes the difference between two entirely different common words that might work their way into band names. Which is why we English speakers are kind of deaf to it.

I would support a configuration option.

6233638

  • Regular Member
  • Citizen of the Universe
  • Posts: 5353
Re: Should diacritics be ignored when grouping?
« Reply #9 on: June 07, 2013, 04:17:39 pm »

Quote from: mwillems
"He actually provided an illustration in his post: there's a band called 'sonar' and another band called 'soñar'. Not sure if that was intentional (and I haven't heard of either band), but there we are."

I assumed that was a hypothetical example. The closest thing I can find is:

Sonar may also refer to:
Sonar (band), a Belgian musical group
Soñar is a radio single of the Mexican Electro/Pop band Belanova

Quote from: mwillems
"I expect that kind of overlap is much more common in diacritic-rich languages, where the diacritic makes the difference between two entirely different common words that might work their way into band names. Which is why we English speakers are kind of deaf to it."

Perhaps, but that's why I proposed making it an option, rather than forcing it on everyone.

And it's not that I think diacritics are pointless - I know their meaning. It's that many sources have often stripped out the diacritics, and files are not getting grouped together properly as a result.
Ignoring them when grouping would largely solve this. (and I would also prefer to display the name with diacritics as the name of the grouping)
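A sketch of that behavior: group on the diacritic-free form, but label each group with a spelling that keeps its diacritics. It reuses the NFD-based folding idea; the tie-breaking rule (most frequent spelling, then alphabetical) and all function names are assumptions for illustration only.

```python
import unicodedata
from collections import Counter, defaultdict


def fold_diacritics(text: str) -> str:
    """Strip combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def has_diacritics(text: str) -> bool:
    """True if the text contains at least one combining mark once decomposed."""
    return any(unicodedata.category(ch) == "Mn" for ch in unicodedata.normalize("NFD", text))


def group_by_folded(values):
    """Map a display name to every original value that folds to the same key."""
    groups = defaultdict(list)
    for value in values:
        groups[fold_diacritics(value)].append(value)

    result = {}
    for members in groups.values():
        counts = Counter(members)
        # Prefer spellings with diacritics, then the most frequent, then alphabetical.
        display = min(counts, key=lambda v: (not has_diacritics(v), -counts[v], v))
        result[display] = members
    return result


# Three of four films lack the diacritic, as in the original example.
directors = ["Krzysztof Kieslowski", "Krzysztof Kieślowski",
             "Krzysztof Kieslowski", "Krzysztof Kieslowski"]
grouped = group_by_folded(directors)
print(list(grouped))  # ['Krzysztof Kieślowski']
```

The underlying tags are left untouched; only the way the view is assembled changes.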

mwillems

  • MC Beta Team
  • Citizen of the Universe
  • Posts: 5234
  • "Linux Merit Badge" Recipient
Re: Should diacritics be ignored when grouping?
« Reply #10 on: June 07, 2013, 04:21:19 pm »

Quote from: 6233638
"I assumed that was a hypothetical example. The closest thing I can find is:

Sonar may also refer to:
Sonar (band), a Belgian musical group
Soñar is a radio single of the Mexican Electro/Pop band Belanova"

There's a Tucson-based Latin funk band called soñar, apparently: http://www.myspace.com/thisissonar

I wasn't quibbling with the idea of more search/sort options, I greatly support it :-) 

yannis

  • World Citizen
  • Posts: 229
Re: Should diacritics be ignored when grouping?
« Reply #11 on: June 08, 2013, 01:24:01 am »

Sorting is not limited to bands/artists; it could just as well apply to track titles, album titles, etc. Sorting every instance of an (MC-stripped) "sonar" together is like sorting stay/stey as the same entity. Or, for a funnier example, say you want a smartlist of songs about dreaming: you would also get all the songs about ringing a bell.

InflatableMouse

  • MC Beta Team
  • Citizen of the Universe
  • Posts: 3978
Re: Should diacritics be ignored when grouping?
« Reply #12 on: June 08, 2013, 02:35:40 am »

Of course the real solution is fixing your tags, which is often as simple as selecting the tracks and choosing the right version from the list in the artist field.

For grouping, if such a thing were to be introduced in MC, I would personally prefer it grouped by default, with the option to right-click and choose 'ungroup' (or something similar) to exclude a selection from being grouped.
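The "group by default, let the user ungroup" idea might look something like this minimal sketch, where spellings the user has explicitly ungrouped keep an exact-match group of their own. The `ungrouped` set and the function names are hypothetical.

```python
import unicodedata
from collections import defaultdict


def fold_diacritics(text: str) -> str:
    """Strip combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def group_with_exclusions(values, ungrouped):
    """Merge values that differ only by diacritics, except those the user
    has 'ungrouped', which are grouped by their exact text instead."""
    groups = defaultdict(list)
    for value in values:
        # Excluded spellings are keyed by exact text, so they never merge.
        if value in ungrouped:
            key = ("exact", value)
        else:
            key = ("folded", fold_diacritics(value))
        groups[key].append(value)
    return list(groups.values())


tags = ["Sonar", "Soñar", "Sonar"]
print(group_with_exclusions(tags, set()))      # [['Sonar', 'Soñar', 'Sonar']]
print(group_with_exclusions(tags, {"Soñar"}))  # [['Sonar', 'Sonar'], ['Soñar']]
```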

Sorting is another story entirely. Is there even a standard for sorting the different versions of the same letter (excuse my lack of knowledge if they are not considered the same letter)? There probably is, Excel seems to sort it consistently with 62's row of e's as: eéèėêëěĕēẽęȩếềḗḕễḝẻȅȇểẹḙḛệ.

yannis

  • World Citizen
  • Posts: 229
Re: Should diacritics be ignored when grouping?
« Reply #13 on: June 08, 2013, 05:27:19 am »

Quote from: InflatableMouse
"Is there even a standard for sorting the different versions of the same letter (excuse my lack of knowledge if they are not considered the same letter)? There probably is, Excel seems to sort it consistently with 62's row of e's as: eéèėêëěĕēẽęȩếềḗḕễḝẻȅȇểẹḙḛệ."

If you're referring to the small world of computer geeks, I'm not sure, and I'm too busy to search right now. But in the greater world, there are indeed standards, and they are evident if you peruse a dictionary or an encyclopaedia.

Now, this being a theoretical discussion, allow me to share a personal story. My native tongue is Greek, and when PCs were originally introduced, the 256-character DOS code page could not fit the complete local alphabet. But the press was eager to jump on the new bandwagon, so they agreed to drop the accented capital letters. Now, this is critical, because an accent can completely change the meaning of a word; many jokes were produced by headlines and titles that had an unintentional double entendre. Years later, Unicode came along to save the day. But only a few newspapers went back to the rules. They still make fools of themselves from time to time, while their stupidity also has deep cultural repercussions. Which goes to show that these decisions may be far more important than they seem.

BTW, in Spain back then, on the other hand, I remember reading that they enforced a policy by which no keyboard could be imported unless it had a dedicated key for "ñ". Some cultures are more resistant than others.

6233638

  • Regular Member
  • Citizen of the Universe
  • Posts: 5353
Re: Should diacritics be ignored when grouping?
« Reply #14 on: June 08, 2013, 11:16:25 am »

Again, I am not suggesting that diacritics be removed - this change is to try and fix the problem of metadata sources having already stripped the diacritics from names.

And after some more thought, I would even expand those options to:
  1. Ignore diacritics
  2. Ignore diacritics in names and searches
  3. Ignore diacritics when searching
  4. Treat diacritics as unique characters
Media Center's current behavior is that of #3, and we have had people request #4.

For #2, "names" would be fields such as Artist, Director, Actors etc.
In fact, taking it a step further, perhaps that could simply be a field that accepts a list of tags in which diacritics would be ignored for the purposes of grouping, sorting etc.

This is not because I think diacritics have no meaning and should be ignored, but because metadata sources are very bad about this.
Using my original example, three out of four films from Krzysztof Kieślowski do not have any diacritics and have the director filled out as Krzysztof Kieslowski.

These options would group them all under Krzysztof Kieślowski so that they are all displayed with the proper diacritics, rather than two separate groups.


If diacritics are that important to you, what I have proposed would surely benefit you as well, with the addition of #4 treating diacritics as unique characters at all times.
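The four proposed options, plus the per-field list from option #2, could be modeled roughly as below. This is only a sketch: the enum names, the `NAME_FIELDS` set, and the "search"/"group" purposes are assumptions, with "group" standing in for both grouping and sorting.

```python
import unicodedata
from enum import Enum


class DiacriticMode(Enum):
    """Hypothetical names for the four proposed options."""
    IGNORE_ALL = 1
    IGNORE_IN_NAMES_AND_SEARCH = 2
    IGNORE_IN_SEARCH_ONLY = 3  # described in the thread as the current behavior
    UNIQUE_CHARACTERS = 4


# Hypothetical user-editable list of "name" fields used by option 2.
NAME_FIELDS = {"Artist", "Director", "Actors"}


def fold_diacritics(text: str) -> str:
    """Strip combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def comparison_key(value: str, field: str, purpose: str, mode: DiacriticMode) -> str:
    """Return the string to compare for `purpose` ("search" or "group")."""
    if purpose not in ("search", "group"):
        raise ValueError(f"unknown purpose: {purpose!r}")
    if mode is DiacriticMode.IGNORE_ALL:
        fold = True
    elif mode is DiacriticMode.IGNORE_IN_NAMES_AND_SEARCH:
        fold = purpose == "search" or field in NAME_FIELDS
    elif mode is DiacriticMode.IGNORE_IN_SEARCH_ONLY:
        fold = purpose == "search"
    else:  # UNIQUE_CHARACTERS: diacritics always count
        fold = False
    return fold_diacritics(value) if fold else value


mode = DiacriticMode.IGNORE_IN_NAMES_AND_SEARCH
print(comparison_key("Kieślowski", "Director", "group", mode))  # Kieslowski
print(comparison_key("Soñar", "Name", "group", mode))           # Soñar
```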

Quote from: InflatableMouse
"Of course the real solution is fixing your tags, which is often as simple as selecting the tracks and choosing the right version from the list in the artist field."

Films have far too many entries for that to be a realistic option. I'm happy to fix it with CDs because it's typically one artist per disc, artists with diacritics in their names are somewhat uncommon in my library, and I use dBpoweramp with its "PerfectMeta" feature, which avoids most of those issues in the first place.

MrC

  • Citizen of the Universe
  • Posts: 10462
  • Your life is short. Give me your money.
Re: Should diacritics be ignored when grouping?
« Reply #15 on: June 19, 2013, 11:18:13 pm »

I want a function that maps diacritics to gringo (I added such functionality to a renaming script I wrote for the Directory Opus crowd - it maps Unicode to ASCII).  With this, we can detect inconsistencies, rename files to be ASCII-safe, and search/group in any fashion.

I'd suggested it for some issues I was having with a DLNA device, but bob satisfied me with one of those nifty DLNA options tucked away from prying eyes. Maybe I'm not so satisfied anymore, and am seeking additional fulfillment.
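A function in the spirit of that Unicode-to-ASCII mapping could be sketched as follows. It is not the Directory Opus script mentioned above; it assumes NFKD decomposition plus a small, deliberately incomplete fallback table (hypothetical) for letters such as "ł" and "ø" that Unicode does not decompose, and it replaces anything still non-ASCII with a placeholder.

```python
import unicodedata

# Hypothetical, incomplete fallback table: these letters have no Unicode
# decomposition, so NFKD alone would not reduce them to ASCII.
FALLBACK = str.maketrans({
    "ł": "l", "Ł": "L", "ø": "o", "Ø": "O", "ß": "ss",
    "æ": "ae", "Æ": "AE", "đ": "d", "Đ": "D",
})


def to_ascii(text: str, placeholder: str = "_") -> str:
    """Map text to ASCII: strip accents, apply the fallback table, mask the rest."""
    text = text.translate(FALLBACK)
    # NFKD also folds compatibility forms, e.g. the 'ﬁ' ligature into 'fi'.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.category(ch) == "Mn":
            continue  # drop combining accents
        out.append(ch if ch.isascii() else placeholder)
    return "".join(out)


def likely_same(a: str, b: str) -> bool:
    """Flag two different tag values that become identical in ASCII."""
    return a != b and to_ascii(a) == to_ascii(b)


print(to_ascii("Antonín Dvořák"))                       # Antonin Dvorak
print(to_ascii("Łódź"))                                 # Lodz
print(likely_same("Antonín Dvořák", "Antonin Dvorak"))  # True
```

`likely_same` is one way such a mapping could help detect inconsistencies: it only flags candidates, leaving the decision to the user.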
The opinions I express represent my own folly.

glynor

  • MC Beta Team
  • Citizen of the Universe
  • Posts: 19608
Re: Should diacritics be ignored when grouping?
« Reply #16 on: June 20, 2013, 06:28:46 am »

Quote from: 6233638
"and I use dBpoweramp with its 'PerfectMeta' feature, which avoids most of those issues in the first place."

What does that do?
"Some cultures are defined by their relationship to cheese."

Visit me on the Interweb Thingie: http://glynor.com/

6233638

  • Regular Member
  • Citizen of the Universe
  • Posts: 5353
Re: Should diacritics be ignored when grouping?
« Reply #17 on: June 20, 2013, 02:53:56 pm »

Quote from: glynor
"What does that do?"

PerfectMeta uses five providers (AMG, GD3, SonataDB, MusicBrainz, freedb) to find metadata for your CDs and compares the results across all of them. If metadata matches across multiple sources, that will be used for the disc.

They also let you easily pick from any of the sources, or edit it manually, if a track is not listed as you expect.
More details here: http://www.dbpoweramp.com/cd-ripper.htm

I find it greatly speeds up finding/entering metadata for discs.
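The description above amounts to a cross-source vote. dBpoweramp's actual matching rules are not described here, so the following is only a simple majority-vote sketch; the `consensus` function, its `min_agree` parameter, and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def consensus(candidates: Dict[str, str], min_agree: int = 2) -> Optional[str]:
    """Return the value reported by the most providers if at least `min_agree`
    of them agree exactly; otherwise None (leave the choice to the user)."""
    # Ignore providers that returned nothing or only whitespace.
    counts = Counter(v.strip() for v in candidates.values() if v and v.strip())
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if votes >= min_agree else None


lookups = {  # hypothetical results for one track's artist field
    "AMG": "Antonín Dvořák",
    "SonataDB": "Antonín Dvořák",
    "freedb": "Antonin Dvorak",
}
print(consensus(lookups))  # Antonín Dvořák
```

Returning None rather than guessing mirrors the option of picking a source or editing manually when the providers disagree.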

Denti

  • Citizen of the Universe
  • Posts: 593
Re: Should diacritics be ignored when grouping?
« Reply #18 on: July 10, 2014, 08:11:03 am »

Resurrecting this thread to make two comments and ask about future plans:

1. Diacritics should absolutely be ignored (or ignorable) for searches, since not everyone has a keyboard that makes typing letters with diacritics easy.  (That is: the way it is now is good)

2. Diacritics should also be ignored (or ignorable) for sorting. I don't like that Müller, for example, gets bumped to the end of the Mu- groupings, instead of being sorted right after "Mule" or "Muller" (no diacritics).
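A minimal sketch of the requested sort order: compare on a diacritic-free, case-folded primary key and break ties on the original text, so "Müller" lands right after "Muller". This is a simplification, not the Unicode Collation Algorithm (UTS #10). It also ignores language-specific conventions such as German phonebook order (ü sorted as ue) or Spanish treating ñ as a separate letter after n, for which a collation library such as ICU would be more appropriate. `sort_key` is a hypothetical name.

```python
import unicodedata


def sort_key(text: str) -> tuple:
    """Primary: text without diacritics, case-folded. Secondary: the exact text."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return (base.casefold(), text)


names = ["Muse", "Müller", "Mule", "Muller", "Mozart"]
print(sorted(names))                # ['Mozart', 'Mule', 'Muller', 'Muse', 'Müller']
print(sorted(names, key=sort_key))  # ['Mozart', 'Mule', 'Muller', 'Müller', 'Muse']
```

The first line shows the plain code-point order that pushes "Müller" past every other "Mu-" name; the second keeps it next to "Muller".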