INTERACT FORUM


Author Topic: Removing duplicates  (Read 2673 times)

Dave T

  • Regular Member
  • World Citizen
  • ***
  • Posts: 171
Removing duplicates
« on: February 26, 2012, 10:54:40 am »

I realize this is a FAQ, but I've searched the forum and didn't see a post with the answer to my question.

I am recently getting back into playing music in MC after a period of using Pandora for most of my music listening.  I am just now realizing that I have literally thousands of duplicate tracks in my music library.  I'm not sure how I did this, but it looks like at some point I renamed my mp3s with a different file naming scheme, and ended up with two copies of my files - one with the old and one with the new naming scheme.

Is there an automated way to remove the duplicates?  I'd like to have something identify duplicates in the file system based on duplicate album/song name/track number in the mp3 tags - and then have some criteria for deciding which duplicate to delete.  I am a programmer and can and will write some code to do this fairly easily if I need to - but surely someone else has already done this? 

Or, is there a way to do this within MC?  It seems like you could use MC to rename and relocate tracks based on tags, and thereby move all duplicates to the same file in the same place - similar to how I got into this mess.  I didn't see anyone talking about using this technique in the forum, though...

Here's hoping!

- Dave

wig

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 750
Re: Removing duplicates
« Reply #1 on: February 26, 2012, 12:32:07 pm »

    Quote
    Is there an automated way to remove the duplicates?  I'd like to have something identify duplicates in the file system based on duplicate album/song name/track number in the mp3 tags - and then have some criteria for deciding which duplicate to delete.  I am a programmer and can and will write some code to do this fairly easily if I need to - but surely someone else has already done this?

    Or, is there a way to do this within MC?  It seems like you could use MC to rename and relocate tracks based on tags, and thereby move all duplicates to the same file in the same place - similar to how I got into this mess.  I didn't see anyone talking about using this technique in the forum, though...


    Dave, finding duplicates is easy, but there is no automated way to remove duplicates that I'm aware of.

    To find ONLY duplicates (and keep one copy of each file that has duplicates):

    • Create a new Smartlist by pressing F9.
    • Under Modify Results, choose Remove Duplicates Of and check the fields #, Name, and Album.
    • Name the Smartlist Originals, and click OK.

    This will create a list of all the 'originals' of your tracks. Now you can create a second smartlist that shows all the tracks that aren't in that list.

    • Create a new Smartlist by pressing F9.
    • Under Rules, choose Playlist is not any Originals.
    • Create a second Rule, Media Type is Audio (this is in case you have non-audio content in your library).
    • Name the Smartlist Copies, and click OK.

    You can use the Copies list to delete the duplicate tracks.
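
For readers who would rather script it, the Originals/Copies idea above can be sketched outside MC roughly like this. It is only an illustration of the logic, not how Media Center implements its smartlists. The `Track` record, its field names, and the sample paths are assumptions; the tag values would have to come from whatever tag reader you use.

```python
from collections import namedtuple

# Hypothetical record for one library entry; field names are illustrative only.
Track = namedtuple("Track", ["path", "track_no", "name", "album"])


def split_originals_and_copies(tracks):
    """Split tracks into (originals, copies) using the key (#, Name, Album).

    The first track seen for each key goes to 'originals'; every later
    track with the same key goes to 'copies'.
    """
    seen = set()
    originals, copies = [], []
    for track in tracks:
        key = (track.track_no, track.name, track.album)
        if key in seen:
            copies.append(track)
        else:
            seen.add(key)
            originals.append(track)
    return originals, copies


# Made-up sample data for the sketch.
library = [
    Track("Music/01 - Intro.mp3", 1, "Intro", "Demo Album"),
    Track("Music/01 Intro.mp3", 1, "Intro", "Demo Album"),
    Track("Music/02 - Outro.mp3", 2, "Outro", "Demo Album"),
]
originals, copies = split_originals_and_copies(library)
print([t.path for t in copies])  # ['Music/01 Intro.mp3']
```

Which copy counts as the "original" here is simply whichever comes first in the input, so the order of the list matters.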

    Dave T

    • Regular Member
    • World Citizen
    • ***
    • Posts: 171
    Re: Removing duplicates
    « Reply #2 on: February 26, 2012, 12:51:05 pm »

    Thanks.  I understand the gist of what you're saying, but I'm not familiar enough with SmartLists to fully understand it yet.

    So can I use this technique to move or delete all but one of each duplicated file - leaving one copy in the original place?  And can I do them all at once?  Again - I literally have thousands of duplicates, so any technique where I'll have to manually deal with each individual duplicate won't work for me.

    wig

    • MC Beta Team
    • Citizen of the Universe
    • *****
    • Posts: 750
    Re: Removing duplicates
    « Reply #3 on: February 26, 2012, 01:04:29 pm »

    Quote
    Thanks.  I understand the gist of what you're saying, but I'm not familiar enough with SmartLists to fully understand it yet.

    So can I use this technique to move or delete all but one of each duplicated file - leaving one copy in the original place?  And can I do them all at once?  Again - I literally have thousands of duplicates, so any technique where I'll have to manually deal with each individual duplicate won't work for me.

    Following my instructions, you'll create a dynamic list that will show the current duplicates each time you click on it. It will show ALL copies of the track, so you couldn't just delete all the tracks that were returned. However, it could still be a huge time-saver.

    For example, you could sort the smartlist by filename, and then delete large sections of the duplicates in that manner.

    My best advice is to get familiar with smartlists, feel comfortable with the various options, and only then start your deletion spree  ;D

    Dave T

    • Regular Member
    • World Citizen
    • ***
    • Posts: 171
    Re: Removing duplicates
    « Reply #4 on: February 26, 2012, 01:09:54 pm »

    It still sounds like I'd be reading through the list and manually deleting duplicates.  That would take days.  Hopefully there's an easier way, otherwise I'm going to code something.

    wig

    • MC Beta Team
    • Citizen of the Universe
    • *****
    • Posts: 750
    Re: Removing duplicates
    « Reply #5 on: February 26, 2012, 01:18:55 pm »

    Quote
    It still sounds like I'd be reading through the list and manually deleting duplicates.  That would take days.  Hopefully there's an easier way, otherwise I'm going to code something.

    I doubt it would take long at all, but I've never been opposed to a good piece of code being brought into the world.  ;D

    MrC or one of those other folks might have a more elegant solution.

    JRiver does have an expression language, btw.

    http://wiki.jriver.com/index.php/Media_Center_expression_language
    http://wiki.jriver.com/index.php/Smartlist_and_Search_-_Rules_and_Modifiers

    wig

    • MC Beta Team
    • Citizen of the Universe
    • *****
    • Posts: 750
    Re: Removing duplicates
    « Reply #6 on: February 26, 2012, 01:24:40 pm »

    Upon further inspection, there is actually a modifier that will keep one copy.

    ~nodup

    In fact, there is a Remove Duplicates Of option in the Smartlist window now. Not sure how long it has been there.

    I've never used this option, but it seems to be exactly what you're looking for.

    I've updated my instructions above. It involves two smartlists instead of one, but will only take a minute or two to create.

    MrC

    • Citizen of the Universe
    • *****
    • Posts: 10462
    • Your life is short. Give me your money.
    Re: Removing duplicates
    « Reply #7 on: February 26, 2012, 01:31:12 pm »

    A caveat with the ~nodup modifier.  It really wasn't designed for duplicate removal from the library; rather, it is more suited for playlist duplicate removal.  The problem is that there is no control over which "copy" is retained.  A simple matching of a couple of key tags is used to identify duplicates, but it may not be the set you want.

    If there is some unique and consistent property to your duplicates (such as directory location, date imported, file type, etc.), these properties can be used more reliably to ensure you're selecting the correct copy to be removed.

    Can you identify anything unique and consistent about the tracks?
    The opinions I express represent my own folly.

    Dave T

    • Regular Member
    • World Citizen
    • ***
    • Posts: 171
    Re: Removing duplicates
    « Reply #8 on: February 26, 2012, 02:03:30 pm »

    No, I don't think there's anything I can use to identify which dup should be removed.  They're mostly, but not all, in the same directory.  A very common problem is that the file names are the same except one has a dash between the track number and the song name, and one has a space.  But there are other types of dups too.

    So if I can create a list of all the "extra" duplicates, that still leaves one and only one copy of each song - then I could delete everything in that list.  Is that doable?
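
The dash-versus-space difference described above is the kind of thing a small script can normalize before comparing file names. The following is a minimal sketch that handles only that one pattern (a leading track number followed by a dash or spaces). The function name is made up, and ignoring letter case is an assumption of the sketch; the other kinds of dups mentioned are not covered.

```python
import os
import re

# Leading track number, followed by optional spaces, an optional dash,
# and optional spaces (covers "01 - Song", "01-Song" and "01 Song").
_TRACK_PREFIX = re.compile(r"^(\d+)\s*-?\s*")


def normalized_name(path):
    """Return a comparison key for a file name, ignoring dash-vs-space style.

    Assumption of this sketch: letter case is ignored as well.
    """
    base = os.path.basename(path)
    base = _TRACK_PREFIX.sub(lambda m: m.group(1) + " ", base, count=1)
    return base.casefold()


# Both naming schemes map to the same key.
print(normalized_name("Album/01 - Song.mp3") == normalized_name("Album/01 Song.mp3"))  # True
```

Files whose keys match are only candidates; the tags or the audio itself would still need checking before anything is deleted.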

    MrC

    • Citizen of the Universe
    • *****
    • Posts: 10462
    • Your life is short. Give me your money.
    Re: Removing duplicates
    « Reply #9 on: February 27, 2012, 01:29:50 pm »

    In this case, there will not be a single automated way to accomplish this.

    So, let's take it one case at a time.  Let's start with duplicate removal based on file tags/MC properties.  See:

       http://wiki.jriver.com/index.php/Duplicate_Files

    After this, we can work on duplicate removal based upon the physical filename.

    marko

    • MC Beta Team
    • Citizen of the Universe
    • *****
    • Posts: 9143
    Re: Removing duplicates
    « Reply #10 on: February 27, 2012, 02:14:30 pm »

    Quote
    The problem is that there is no control over which "copy" is retained.

    This was my belief too, for a long time, until it was pointed out that if you apply a sort to the list, then remove the dupes, it keeps the first listed and removes the rest...

    Example:
    All duplicated audio, sorted by Artist, Name, File size:
    [Media Type]=[Audio] ~dup=[Artist],[Name] ~sort=[Artist],[Name],[File Size]-d

    Now, remove duplicates by adding a final "~nodup" modifier:
    [Media Type]=[Audio] ~dup=[Artist],[Name] ~sort=[Artist],[Name],[File Size]-d ~nodup=[Artist],[Name]

    And see how it retains the largest file in each group of duplicates.
    Depending upon what you want, it is possible to exercise a little control over what is kept and what is removed, but you need to put a bit of thought into how to go about it.

    This means that if, say, you wanted to keep the largest files and delete the rest, you would make a smartlist using the second example; let's call that "Smartlist A".
    You would then create a second smartlist that includes all the dupes but excludes the files in Smartlist A:
    [Media Type]=[Audio] ~dup=[Artist],[Name] ~sort=[Artist],[Name],[File Size]-d
    Then use the wizard to add the rule: Playlist | Is Not | Smartlist A

    Select the results and send to recycle bin.
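
The "sort first, then keep the first of each group" behaviour described above can be sketched in Python like this. It is an illustration of the idea only, not MC's implementation. The `Track` record, its field names, and the sample data are assumptions. When two copies have the same size, the one kept is simply whichever came first in the input.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical record; field names are illustrative only.
Track = namedtuple("Track", ["path", "artist", "name", "size"])


def copies_to_remove(tracks):
    """Return every track except the largest one in each (artist, name) group."""
    # Sort by artist, name, then file size descending (largest first).
    ordered = sorted(tracks, key=lambda t: (t.artist, t.name, -t.size))
    removable = []
    for _, group in groupby(ordered, key=lambda t: (t.artist, t.name)):
        members = list(group)
        # The first member (largest) is kept; the rest are removal candidates.
        removable.extend(members[1:])
    return removable


# Made-up sample data for the sketch.
tracks = [
    Track("a/song.mp3", "Artist", "Song", 4000),
    Track("b/song.mp3", "Artist", "Song", 5000),
    Track("a/other.mp3", "Artist", "Other", 3000),
]
print([t.path for t in copies_to_remove(tracks)])  # ['a/song.mp3']
```

Tracks that have no duplicate form a group of one, so they never show up in the result.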

    -marko

    MrC

    • Citizen of the Universe
    • *****
    • Posts: 10462
    • Your life is short. Give me your money.
    Re: Removing duplicates
    « Reply #11 on: February 27, 2012, 03:26:16 pm »

    Right, of course, and the clarification is a useful one.  The following sentence was also relevant to the point:

       A simple matching of a couple of key tags is used to identify duplicates, but it may not be the set you want.

    I've found it not uncommon to have a couple of dup files which differ only by a few seconds of silence or fadeout, so this confounds using file size as part of the dup detector.

    And unless tagging is consistent / accurate across the dups, properties such as [Name] and [Artist] aren't much use.  Hence my comments about dup removal from playlists vs. the file system (i.e. removal of the same track listed multiple times in a playlist vs. removal of one or more physical files which essentially match an unknown primary file).

    struct

    • Galactic Citizen
    • ****
    • Posts: 380
    Re: Removing duplicates
    « Reply #12 on: February 27, 2012, 03:52:04 pm »


    Quote
    Select the results and send to recycle bin.

    -marko

    Or if you are a little bit scared that you might accidentally remove an extended live edition that has a larger file size (or bitrate if you sort by that instead) because you are as bad with tagging as I am, you can use Library Tools -> Rename to move them to a directory that is not imported by MC, then delete them from the library.  You can then go and get them back if you find a mistake.

    Craig
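
If the cleanup is scripted instead of done with MC's rename tool, the same "move aside rather than delete" safety net might look like the sketch below. This is not the MC feature itself; the function name and the quarantine directory are assumptions, and the directory should be one that MC does not import.

```python
import os
import shutil


def quarantine(paths, quarantine_dir):
    """Move files into quarantine_dir instead of deleting them.

    Returns a list of (original_path, new_path) pairs so that a mistake
    can be undone by moving the file back.
    """
    os.makedirs(quarantine_dir, exist_ok=True)
    moved = []
    for src in paths:
        base = os.path.basename(src)
        stem, ext = os.path.splitext(base)
        dest = os.path.join(quarantine_dir, base)
        counter = 1
        # Avoid overwriting a file that is already in quarantine.
        while os.path.exists(dest):
            dest = os.path.join(quarantine_dir, f"{stem} ({counter}){ext}")
            counter += 1
        shutil.move(src, dest)
        moved.append((src, dest))
    return moved
```

Keeping the returned pairs (for example, written to a text file) makes it straightforward to restore a track that turns out not to be a duplicate.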

    wig

    • MC Beta Team
    • Citizen of the Universe
    • *****
    • Posts: 750
    Re: Removing duplicates
    « Reply #13 on: February 27, 2012, 05:00:56 pm »

    Quote
    Now, remove duplicates by adding a final "~nodup" modifier:
    [Media Type]=[Audio] ~dup=[Artist],[Name] ~sort=[Artist],[Name],[File Size]-d ~nodup=[Artist],[Name]

    And see how it retains the largest sized of each group of duplicates.

    This means that, say you wanted to keep the largest files and delete the rest, you would make a smart list using the second example, let's call that "smartlist A"
    You would then create a second smartlist that included all the dupes, but excluded files in smartlist A
    [Media Type]=[Audio] ~dup=[Artist],[Name] ~sort=[Artist],[Name],[File Size]-d
    use the wizard to add the rule: Playlist | Is Not | Smartlist A

    Using ~dup and ~nodup in the same expression is interesting; I knew I'd learn something from this thread!

    I'd strongly recommend adding Track Number to the list modifiers in your example; I have several 'Legacy Edition' albums that include multiple versions of a song on the same album. Only one version would be kept unless you mandate discrete track #s.
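
A tiny worked example of why the track number matters, using made-up tag values (the tuple layout is an assumption for illustration): two takes of the same song on one album look like duplicates under (Artist, Name), but not once the track number is part of the key.

```python
from collections import Counter

# Hypothetical tags: (artist, name, album, track_no)
tags = [
    ("Artist", "Song", "Legacy Edition", 3),
    ("Artist", "Song", "Legacy Edition", 14),  # alternate take, same album
]

# Count how many tracks share each key.
without_track = Counter((artist, name) for artist, name, album, track_no in tags)
with_track = Counter((artist, name, track_no) for artist, name, album, track_no in tags)

print(max(without_track.values()))  # 2: both takes collapse into one "duplicate" group
print(max(with_track.values()))     # 1: each take is treated as its own track
```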

    Dave T

    • Regular Member
    • World Citizen
    • ***
    • Posts: 171
    Re: Removing duplicates
    « Reply #14 on: February 27, 2012, 05:10:55 pm »

    Thanks for the replies!  Unfortunately, when I hadn't heard back for a while I got about halfway through writing a program to remove dups, but now that it sounds like there is a solution within MC after all, I want to try that.

    So I'm trying what Marko suggested, and it isn't working for me.  First of all, how do you edit smartlists using the syntax like you're describing?  In MC16, I only see how to create/edit smartlists using the wizard.

    I went ahead and created the two smartlists like you describe, using the wizard, and it kinda sorta works - but the smartlist of the files to delete is only showing a small fraction of what I need to remove.  I'm not sure why...  One question - would this still work if the files were identical (same file size)?  Most of my dups seem to differ only in the file name.  So what I want to do is to remove all but one of the dups, and if there is a biggest one - keep that.  If they are all the same, then I don't care which one gets removed.

    MrC

    • Citizen of the Universe
    • *****
    • Posts: 10462
    • Your life is short. Give me your money.
    Re: Removing duplicates
    « Reply #15 on: February 27, 2012, 05:15:52 pm »

    Quote
    First of all, how do you edit smartlists using the syntax like you're describing?  In MC16, I only see how to create/edit smartlists using the wizard.

    Use the Import/Export button.  Paste your expression.  You'll see the updated UI after you OK out of the Import/Export edit area.

    marko

    • MC Beta Team
    • Citizen of the Universe
    • *****
    • Posts: 9143
    Re: Removing duplicates
    « Reply #16 on: February 27, 2012, 06:09:54 pm »

    Bear in mind that I only offered those strings as examples of what was possible. When doing this for real, carefully consider the fields you use to track the duplicates. Adding the track number field too is good advice.