INTERACT FORUM

Author Topic: How does JRiver identify duplicate files?  (Read 12157 times)

Werner

  • Junior Woodchuck
  • **
  • Posts: 72
How does JRiver identify duplicate files?
« on: May 25, 2016, 12:14:43 pm »

I just transferred files to my handheld device and now notice that many of them are missing. I believe this is because JRiver identified them as duplicates. Some of them have the same filename (for example, »4. Allegro« is quite normal for classical music and appears often, with the consequence that many symphonies on my handheld have only three movements), but they are not the same size and differ in every other respect. Can I change the algorithm JRiver uses, or deactivate this behaviour?
Logged

blgentry

  • Regular Member
  • Citizen of the Universe
  • *****
  • Posts: 8009
Re: How does JRiver identify duplicate files?
« Reply #1 on: May 25, 2016, 12:36:52 pm »

If your handheld sync definition is storing the files in their own directories, per album, you should not have any filename conflicts.  Does your handheld sync definition use directories?  Or is it putting all files in the same directory?

Brian.
Logged

Werner

  • Junior Woodchuck
  • **
  • Posts: 72
Re: How does JRiver identify duplicate files?
« Reply #2 on: May 25, 2016, 01:17:25 pm »

It uses different directories. The problem is not a filename problem. The problem is that JRiver doesn’t send files to the handheld if it identifies them as duplicates, and it identifies a file named »01-01 Allegro.flac« and a file named »04-01 Allegro.flac« as duplicates. The first one is 3 MB, the other 12 MB. It’s easy to see that they can’t be duplicates, but only one of them arrives on my handheld. I guess JRiver doesn’t consider file name or size. That, I believe, is the problem.
Logged

blgentry

  • Regular Member
  • Citizen of the Universe
  • *****
  • Posts: 8009
Re: How does JRiver identify duplicate files?
« Reply #3 on: May 25, 2016, 01:49:09 pm »

Ok, you originally said filename was the issue, but I see that what you meant is song name or simply the [Name] field.

I just created a playlist to sync that intentionally has several pairs of songs that are named identically.  I have something like 10 pairs of songs that have the same name.  So 2 songs named "Why Can't I Have You", 2 songs named "Magic", etc.

Almost all of these end up being copied to the handheld, with one exception:  I have one pair of songs, from the same album, that have the same name.  Those songs are marked as duplicates by the handheld sync system, and therefore only one of the two songs gets copied!

So it seems that the algorithm is [Name] and [Album] (and probably [Artist]) must all be the same to be considered a duplicate.  I'm guessing this is what's happening to you too?  Only songs from the same album with the exact same name?

This seems like it should probably be changed.  I don't know of any way of configuring MC to not do this.  I *think* it would be something the developers would have to do.

Assuming what I said above about your conflicts is correct, you could perhaps make separate album tags for separate works that appear on the same album?  Or maybe the developers will be motivated to change this behavior in some way.

Brian.
Logged

Werner

  • Junior Woodchuck
  • **
  • Posts: 72
Re: How does JRiver identify duplicate files?
« Reply #4 on: May 25, 2016, 03:20:42 pm »

Thanks for your help, but this is not a solution. Let’s assume an album with the 10 symphonies by one composer. It has 10 discs, each containing two symphonies. It is almost certain that two of the discs begin with a track named »Allegro« or »1. Allegro«. The file names would be different (»01-01 1. Allegro« and »04-01 1. Allegro«), but JRiver does not seem to look at the file name. The file sizes are different, but JRiver does not seem to look at the file size either. So it is almost certain that it identifies quite different files as duplicates. And I can’t change the album, because it is one album, not 10... It seems I have to copy the files manually.
Logged

blgentry

  • Regular Member
  • Citizen of the Universe
  • *****
  • Posts: 8009
Re: How does JRiver identify duplicate files?
« Reply #5 on: May 25, 2016, 03:33:46 pm »

I was confirming what you said:  Duplicate [Name] fields for songs on the same album are apparently considered duplicates by MC when doing handheld sync.

I was offering a suggestion of changing the [Album] field because that will bypass MC thinking that those songs are duplicates.  I don't have lots of classical music.  But I know a little bit about it.  ...and in my estimation, [Album] doesn't really mean a lot for Classical works.   The Composer, the name of the Work, and the Movement usually are more descriptive than an Album name.

So you could do something like changing the Album name to OldAlbumName - SymphonyName.  That way the different symphonies, which are on the same physical CD would get different album names.  Or you could choose some other way of doing this.  It's just a suggestion.  If it doesn't work for your organizational system, I get that.  I'm just offering you some ideas for how a workaround might be done.

I would also guess that this is not what the developers intended when they made this feature.  So perhaps they will fix this.  I can't speak for them.

Good luck to you.

Brian.
Logged

Werner

  • Junior Woodchuck
  • **
  • Posts: 72
Re: How does JRiver identify duplicate files?
« Reply #6 on: May 25, 2016, 03:51:33 pm »

Well... I think this is not a solution. I have, for example, an album »Beethoven: Symphonies (Gardiner)«. The symphonies have their titles in »Grouping«, and the tracks are named after the movements. That’s fine and works quite well. It’s not a good idea to have an album »Beethoven: Symphonie #1 (Gardiner)« with Grouping »Symphonie #1«. At least I don’t like this. I think it’s better not to use the sync feature...
(Anyway I don’t understand why it is so. If I want to have duplicates on my handheld, that’s my choice. So it should be possible to deactivate this option even if it works properly.)
Logged

blgentry

  • Regular Member
  • Citizen of the Universe
  • *****
  • Posts: 8009
Re: How does JRiver identify duplicate files?
« Reply #7 on: May 25, 2016, 03:56:20 pm »

We must have a language barrier.  I'm not trying to say that my ideas for a workaround are a "proper solution".  They are just ideas.

Maybe someone from JRiver will have a comment on this.

Brian.
Logged

Werner

  • Junior Woodchuck
  • **
  • Posts: 72
Re: How does JRiver identify duplicate files?
« Reply #8 on: May 25, 2016, 04:10:42 pm »

Maybe it’s the language barrier, or more exactly my poor knowledge of the English language... I understand quite well what you write but can’t express myself clearly enough in English. Let me try it this way: I understand that you wanted to show a workaround. But it doesn't help me. That’s not your fault... ;)
Logged

blgentry

  • Regular Member
  • Citizen of the Universe
  • *****
  • Posts: 8009
Re: How does JRiver identify duplicate files?
« Reply #9 on: May 25, 2016, 04:20:19 pm »

Ok, thanks for responding.

Maybe someone else has some better ideas.  :)

Brian.
Logged

Werner

  • Junior Woodchuck
  • **
  • Posts: 72
Re: How does JRiver identify duplicate files?
« Reply #10 on: May 26, 2016, 01:25:16 am »

The problem has been known for at least six years. There is a thread about it:

http://yabb.jriver.com/interact/index.php?topic=56600.msg525936#msg525936

This means, I guess, that JRiver will never address this problem. But I believe I have a workaround, inspired by an idea from that thread. I haven’t tried it yet because I’m not sure whether it has unwanted side effects, but I think I’ll try it today. If I give every track in my library a unique track number, the problem should disappear. This must be possible with the library tools. First, I think, I should save the real track numbers to an unused tag, so I can restore them if better days dawn. It’s a strange workaround, but maybe it works.  ::)
Logged

ferday

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 1732
Re: How does JRiver identify duplicate files?
« Reply #11 on: May 26, 2016, 02:30:42 pm »

Is the disc # tag set?  That may just fix it...
Logged

Magic_Randy

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2350
  • I used to be indecisive, but now I'm not so sure..
Re: How does JRiver identify duplicate files?
« Reply #12 on: May 26, 2016, 02:55:01 pm »

I have a similar issue that I reported under the 21.0.83 thread.

http://yabb.jriver.com/interact/index.php?topic=104977.0

If the track name and track # are the same, you will get a duplicate. For me I'm getting false duplicates on multi-disc albums where the track name and track # are the same but the disc # is different.

Randy
Logged

Werner

  • Junior Woodchuck
  • **
  • Posts: 72
Re: How does JRiver identify duplicate files?
« Reply #13 on: May 26, 2016, 04:30:29 pm »

Is the disc # tag set?  That may just fix it...

No. It doesn't fix it, because it can happen that the disc #, the track # and the title are all identical. For example: an album with masses by one composer. 6 CDs, the same artist, the same composer, the same album. Every CD has »1. Kyrie«, »2. Gloria« etc. So all are duplicates, and you can’t avoid this. The solution is very simple: check the file size too. Anyway: I don’t understand why the sync routine looks for duplicates; I don’t understand why such a foolish algorithm is used; I don’t understand why I can’t disable this mechanism; and I don’t understand why this problem, which was reported in 2010, has still not been addressed. It is really strange to be forced to find a way to outwit the program.
Logged

DJLegba

  • Citizen of the Universe
  • *****
  • Posts: 992
Re: How does JRiver identify duplicate files?
« Reply #14 on: May 26, 2016, 05:38:47 pm »

I've seen this problem with Sync Handheld for a long time now. I noticed it first when trying to transfer an audiobook to my Android phone. The workaround is to convert to mp3 (or whatever) to a folder and then copy that folder (with its subfolders) to the phone.
Logged

blgentry

  • Regular Member
  • Citizen of the Universe
  • *****
  • Posts: 8009
Re: How does JRiver identify duplicate files?
« Reply #15 on: May 26, 2016, 06:31:11 pm »

I have to agree with one thing Werner has said:  It doesn't seem useful to have Handheld sync try to eliminate duplicates.  If the logic of this operation were REALLY well implemented, it would be nice to have it available as a tool to "find and eliminate duplicates".  But not as part of handheld sync.  Just my opinion.

Brian.
Logged

Magic_Randy

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2350
  • I used to be indecisive, but now I'm not so sure..
Re: How does JRiver identify duplicate files?
« Reply #16 on: May 26, 2016, 07:22:14 pm »

I have to agree with one thing Werner has said:  It doesn't seem useful to have Handheld sync try to eliminate duplicates.  If the logic of this operation were REALLY well implemented, it would be nice to have it available as a tool to "find and eliminate duplicates".  But not as part of handheld sync.  Just my opinion.

Brian.

I agree...
Logged

ferday

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 1732
Re: How does JRiver identify duplicate files?
« Reply #17 on: May 26, 2016, 10:52:42 pm »

You still didn't answer the disc # question.  A 6 CD set has 6 disc numbers; they are not the same.

Either way, I think the real long term solution is to allow for us to use the expression of choice in locating duplicates! 
Logged

Werner

  • Junior Woodchuck
  • **
  • Posts: 72
Re: How does JRiver identify duplicate files?
« Reply #18 on: May 26, 2016, 11:36:22 pm »

OK. Maybe it would help to add the disc # to the expression. I think the best idea would be to check file name, file size and file date, but it’s not my task… It’s not my task to find the solution, and the discussion is quite sterile because JRiver has no intention of changing this.
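
As an illustration of the file-based check suggested here, the following is a minimal Python sketch. It is not how MC works; the function names `file_identity` and `looks_like_duplicate` are made up for this example, and it compares only file name and size (a modification date could be added to the tuple in the same way).

```python
import os
import tempfile

def file_identity(path):
    """Return (file name, size in bytes) for a path."""
    return (os.path.basename(path), os.path.getsize(path))

def looks_like_duplicate(path_a, path_b):
    """Hypothetical check: same file name and same size."""
    return file_identity(path_a) == file_identity(path_b)

# Two dummy files with the same name on different "discs" but different sizes.
with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, "disc1", "01 Allegro.flac")
    second = os.path.join(root, "disc4", "01 Allegro.flac")
    for path, size in ((first, 3_000), (second, 12_000)):
        os.makedirs(os.path.dirname(path))
        with open(path, "wb") as handle:
            handle.write(b"\0" * size)
    same = looks_like_duplicate(first, second)

print(same)  # False
```

Under this assumption the two Allegro files would not be merged, because their sizes differ even though their names match.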
Logged

DJLegba

  • Citizen of the Universe
  • *****
  • Posts: 992
Re: How does JRiver identify duplicate files?
« Reply #19 on: May 27, 2016, 05:05:28 am »

Setting the disc # makes no difference. This is absolutely a bug with Sync Handheld.
Logged

Magic_Randy

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2350
  • I used to be indecisive, but now I'm not so sure..
Re: How does JRiver identify duplicate files?
« Reply #20 on: May 27, 2016, 08:41:02 am »

Setting the disc # makes no difference. This is absolutely a bug with Sync Handheld.

MC is not considering Disc # in their MatchKeyExpression which detects duplicates.  Until they change that using Disc # will not help.
Logged

DJLegba

  • Citizen of the Universe
  • *****
  • Posts: 992
Re: How does JRiver identify duplicate files?
« Reply #21 on: May 27, 2016, 08:43:31 am »

^ On my phone at least, Sync Handheld tries to put everything into a single folder, so disc # is not going to help. The sync process needs to create subfolders on the phone.
Logged

blgentry

  • Regular Member
  • Citizen of the Universe
  • *****
  • Posts: 8009
Re: How does JRiver identify duplicate files?
« Reply #22 on: May 27, 2016, 09:14:26 am »

^ I think that this detection of "false duplicates" happens before Handheld Sync transfers the files to the handheld sync area, so creating disc folders won't help with this particular issue.

But, if you want to create those folders anyway, because you like the organization, or some other reason, you can!

(click on handheld sync options) > Files, Paths, & More > Audio Path > (paste in expression)

This expression, or something like it, will work to create disc directories:

Code:
[Album Artist (auto)]/[Album]/if(isempty([Disc #]),,Disc[Disc #])/
If there is no disc number, it won't try to create the directory.  If there is a disc number, it creates a directory like "Disc2".
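
For readers less familiar with MC expressions, the logic can be mirrored in a short Python sketch. This is only an approximation: the function name `audio_path` is hypothetical, and the sketch simply leaves out the empty segment, whereas how MC itself treats the empty result of the `if(...)` is not covered in this thread.

```python
def audio_path(album_artist, album, disc=""):
    """Mimic the sync path expression: add a DiscN folder only when a disc number exists."""
    parts = [album_artist, album]
    if str(disc).strip():            # corresponds to isempty([Disc #]) being false
        parts.append(f"Disc{disc}")
    return "/".join(parts) + "/"

with_disc = audio_path("Some Orchestra", "Symphonies", 2)
without_disc = audio_path("Some Orchestra", "Symphonies")

print(with_disc)     # Some Orchestra/Symphonies/Disc2/
print(without_disc)  # Some Orchestra/Symphonies/
```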

Brian.
Logged

DJLegba

  • Citizen of the Universe
  • *****
  • Posts: 992
Re: How does JRiver identify duplicate files?
« Reply #23 on: May 27, 2016, 10:21:39 am »

^ Thanks Brian, you are of course correct. In any event, Sync Handheld takes a long time to read the phone's directory, so it's much faster to convert to a folder on the computer and then copy that folder to the phone via the operating system - and you don't run into the duplicate bug.
Logged

blgentry

  • Regular Member
  • Citizen of the Universe
  • *****
  • Posts: 8009
Re: How does JRiver identify duplicate files?
« Reply #24 on: May 27, 2016, 11:57:23 am »

^ Thanks Brian, you are of course correct. In any event, Sync Handheld takes a long time to read the phone's directory, so it's much faster to convert to a folder on the computer and then copy that folder to the phone via the operating system - and you don't run into the duplicate bug.

The test I set up for this thread, to see if Sync triggered false duplicates, goes to a folder on disk.  ....and it finds one false duplicate, as reported above.

If you've found a way around it with your music, that's cool.  But it does seem to still exist, no matter where you sync to.  For the record it doesn't really affect me.  But I'd like to see it fixed since it seems to be unintended behavior.

Brian.
Logged

Werner

  • Junior Woodchuck
  • **
  • Posts: 72
Re: How does JRiver identify duplicate files?
« Reply #25 on: May 27, 2016, 01:23:42 pm »

My solution is to destroy the field »track #«: I copied its contents for all tracks to a newly created field »Original Track« and filled the track field with consecutive numbers for all files in my database. I wrote a script which in future will change new files so that the real track number is stored in the field »Original Track« and the track field contains a unique number (the number of seconds since midnight today plus a random number). So the mechanism which detects duplicates is blocked. And if JRiver ever fixes this feature (I don’t believe it will), I can easily copy the real track number back to its proper place.
This is a workaround. Stupid, but not as stupid as the duplicate detection. And it works. Or rather: it works for FLAC files. I found no command line tool for MP3 that I can use in my script, but that’s not so important. I use MP3 exclusively for audio books and can manage them by hand.
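
The script itself is not shown in the thread. A minimal sketch of the idea in Python, working on plain dictionaries instead of real tag data, could look like this. The field names follow the post; the functions `make_tracks_unique` and `restore_tracks` are hypothetical, and the sketch uses the consecutive-number variant.

```python
def make_tracks_unique(tracks, start=1):
    """Back up each track number, then assign library-wide consecutive numbers."""
    for offset, tags in enumerate(tracks):
        # setdefault keeps the first backup if the function is run twice
        tags.setdefault("Original Track", tags["Track #"])
        tags["Track #"] = start + offset
    return tracks

def restore_tracks(tracks):
    """Copy the saved numbers back and drop the backup field."""
    for tags in tracks:
        if "Original Track" in tags:
            tags["Track #"] = tags.pop("Original Track")
    return tracks

library = [
    {"Name": "1. Kyrie", "Track #": 1},
    {"Name": "1. Kyrie", "Track #": 1},
]

make_tracks_unique(library)
numbered = [tags["Track #"] for tags in library]
print(numbered)   # [1, 2]

restore_tracks(library)
restored = [tags["Track #"] for tags in library]
print(restored)   # [1, 1]
```

Writing the values into actual FLAC or MP3 tags would need a tagging tool or library, which this sketch deliberately leaves out.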
Logged

Magic_Randy

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2350
  • I used to be indecisive, but now I'm not so sure..
Re: How does JRiver identify duplicate files?
« Reply #26 on: May 27, 2016, 02:41:00 pm »

My solution is to destroy the field »track #«: I copied its contents for all tracks to a newly created field »Original Track« and filled the track field with consecutive numbers for all files in my database. I wrote a script which in future will change new files so that the real track number is stored in the field »Original Track« and the track field contains a unique number (the number of seconds since midnight today plus a random number). So the mechanism which detects duplicates is blocked. And if JRiver ever fixes this feature (I don’t believe it will), I can easily copy the real track number back to its proper place.
This is a workaround. Stupid, but not as stupid as the duplicate detection. And it works. Or rather: it works for FLAC files. I found no command line tool for MP3 that I can use in my script, but that’s not so important. I use MP3 exclusively for audio books and can manage them by hand.

My understanding is that duplicate detection is based on creation of a 'match key' for each file.  By default, the match key is built on (basically) this expression: [Name]-Clean([Artist], 1)-[Album]-[Genre]-FormatNumber([Track #], 0)-[Media Type]

This suggests that the root cause of the false duplicates is that it is not considering Disc # as part of the duplicate detection. I've confirmed this with my own tests.
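
To make the effect concrete, here is a rough Python sketch of key-based duplicate detection. It only loosely follows the expression quoted above: the field handling is simplified, the stand-ins for Clean() and FormatNumber() are approximations, and the names `match_key` and `find_collisions` are invented for this example.

```python
from collections import defaultdict

def match_key(track, use_disc=False):
    """Build a string key from tag fields, loosely following the quoted expression."""
    parts = [
        track["name"],
        track["artist"].strip().lower(),   # crude stand-in for Clean([Artist], 1)
        track["album"],
        track["genre"],
        str(int(track["track"])),          # stand-in for FormatNumber([Track #], 0)
        track["media_type"],
    ]
    if use_disc:                           # hypothetical extension: include Disc #
        parts.append(str(track.get("disc", "")))
    return "-".join(parts)

def find_collisions(tracks, use_disc=False):
    """Group file paths by key and return the groups with more than one file."""
    groups = defaultdict(list)
    for track in tracks:
        groups[match_key(track, use_disc)].append(track["path"])
    return [paths for paths in groups.values() if len(paths) > 1]

base = {"name": "1. Kyrie", "artist": "Some Choir", "album": "Masses",
        "genre": "Classical", "track": 1, "media_type": "Audio"}
tracks = [
    dict(base, disc=1, path="Masses/01-01 Kyrie.flac"),
    dict(base, disc=2, path="Masses/02-01 Kyrie.flac"),
]

print(find_collisions(tracks))                  # both paths share one key
print(find_collisions(tracks, use_disc=True))   # []
```

With the disc number left out of the key, the two different files collapse into one entry; adding it separates them in this example.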

That suggests that the workaround you suggested will work to block duplicate detection, but it will likely have other side effects, as Track # within an album is usually something you want to sort by. A variation on your approach would be to set the Track # sequentially across all tracks in an album. Example:

Album 1 Disc 1 Track 1
Album 1 Disc 1 Track 2
Album 1 Disc 2 Track 3
Album 2 Disc 1 Track 1
Album 2 Disc 1 Track 2
Album 2 Disc 2 Track 3
....
Logged

blgentry

  • Regular Member
  • Citizen of the Universe
  • *****
  • Posts: 8009
Re: How does JRiver identify duplicate files?
« Reply #27 on: May 27, 2016, 05:48:13 pm »

MC can do all of this manipulation for you if you'd like.  You don't need to use an external tag editor.

Make a new library field like original_track_no .  Then use the tagging window in MC to copy the track # field to the new field.  Now, you can build a quickie expression to make all tracks within an album unique.  Just select all the tracks in the album and paste this expression into the [Track #] field:

=[disc #][track #]

Now all of the track numbers will look like: 11, 12, 13, 14, 21, 22, 23, 24, etc.

You can do this for many albums at once if you would like.  Because it's an expression, it will work on as many files as you want.
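
What the expression produces can be sketched in Python as plain string concatenation. The function names are hypothetical, and the padded variant is an assumption added for illustration, not something proposed in this thread.

```python
def concat_track(disc, track):
    """Plain concatenation, like =[disc #][track #]."""
    return f"{disc}{track}"

def padded_track(disc, track, width=2):
    """Variation (an assumption, not from the thread): zero-pad the track part."""
    return f"{disc}{int(track):0{width}d}"

plain = [concat_track(d, t) for d in (1, 2) for t in (1, 2, 3, 4)]
print(plain)   # ['11', '12', '13', '14', '21', '22', '23', '24']

# With ten or more tracks or discs, plain concatenation can produce the same value twice.
print(concat_track(1, 11) == concat_track(11, 1))   # True
print(padded_track(1, 11) == padded_track(11, 1))   # False
```

For albums with fewer than ten tracks per disc and fewer than ten discs, the plain form is unambiguous.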

Brian.

Logged

Magic_Randy

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2350
  • I used to be indecisive, but now I'm not so sure..
Re: How does JRiver identify duplicate files?
« Reply #28 on: May 27, 2016, 07:39:33 pm »

MC can do all of this manipulation for you if you'd like.  You don't need to use an external tag editor.

Make a new library field like original_track_no .  Then use the tagging window in MC to copy the track # field to the new field.  Now, you can build a quickie expression to make all tracks within an album unique.  Just select all the tracks in the album and paste this expression into the [Track #] field:

=[disc #][track #]

Now all of the track numbers will look like: 11, 12, 13, 14, 21, 22, 23, 24, etc.

You can do this for many albums at once if you would like.  Because it's an expression, it will work on as many files as you want.

Brian.



Good idea Brian.

Werner, I suggest you try this on one of your trouble albums to see if it solves your problem.

I'll hold off awhile to see if JRiver fixes the issue. I'm only having a problem on multi-disc albums that repeat the same song many times. I can easily identify these and apply a workaround to them if need be.

Randy
Logged

Werner

  • Junior Woodchuck
  • **
  • Posts: 72
Re: How does JRiver identify duplicate files?
« Reply #29 on: May 28, 2016, 01:44:53 am »

It doesn't help. There are albums with the same name which contain tracks with the same title. (Beethoven’s symphonies are Beethoven’s symphonies; the difference is just the conductor. Maybe not even that, but only the year... And it’s almost certain that the CDs’ structure is identical, and so are the titles.) The only thing that helps is to make the track number unique. And it’s not so difficult. (It’s stupid but not difficult...) The original track number is saved, so I can restore it. And the listing in JRiver uses the original track number. My handheld (COWON Plenue D) doesn’t display the track number because I use the directory structure, so all is well...
Anyway: I did it, and it didn’t take too much time.

The best solution would be to kill this feature. Or to let the user decide whether to use it. Or to let the user decide how duplicates are detected. Best of all would be to let me decide whether I want duplicates on my handheld or not. It may happen that I do. I think JRiver should not decide for me. (I’m over 18 years old and JRiver is not Apple. And I hate Apple’s paternalistic attitude as well.)
Logged

Magic_Randy

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2350
  • I used to be indecisive, but now I'm not so sure..
Re: How does JRiver identify duplicate files?
« Reply #30 on: May 28, 2016, 06:47:49 am »

It doesn't help. There are albums with the same name which contain tracks with the same title. (Beethoven’s symphonies are Beethoven’s symphonies; the difference is just the conductor. Maybe not even that, but only the year... And it’s almost certain that the CDs’ structure is identical, and so are the titles.) The only thing that helps is to make the track number unique. And it’s not so difficult. (It’s stupid but not difficult...) The original track number is saved, so I can restore it. And the listing in JRiver uses the original track number. My handheld (COWON Plenue D) doesn’t display the track number because I use the directory structure, so all is well...
Anyway: I did it, and it didn’t take too much time.

The best solution would be to kill this feature. Or to let the user decide whether to use it. Or to let the user decide how duplicates are detected. Best of all would be to let me decide whether I want duplicates on my handheld or not. It may happen that I do. I think JRiver should not decide for me. (I’m over 18 years old and JRiver is not Apple. And I hate Apple’s paternalistic attitude as well.)

I understand. My problem is more limited in scope and if JRiver considered Disc # my false duplicates would go away. But I agree that the best solution (unless I'm missing something) is to not check for duplicates at all.
Logged