INTERACT FORUM

Author Topic: Idea: ~Similar=[artist],[name] (Music)  (Read 11947 times)

chriswale

  • World Citizen
  • ***
  • Posts: 173
Idea: ~Similar=[artist],[name] (Music)
« on: August 15, 2009, 06:24:01 am »

Hi,

Recently I have been eliminating duplicate songs from my collection.
I use the search filter ~dup=[artist],[name].  However, many duplicate songs don't have the same identical artist and name.

I suggest a SIMILAR filter to help me find those duplicate songs more easily!
~similar=[artist],[name]

Below are some examples of duplicates that aren't detected with the ~dup filter:

Example 1:
  • Mariah Carey - Obsession
    Mariah Carey - Obsesion

Example 2:
  • Barcelona - Let Go
    Barcelona - Let Go (Original Mix)

Example 3:
  • David Guetta feat. Kelly Rowland - When Love Takes Over
    David Guetta - When Love Takes Over (feat. Kelly Rowland)
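
To make the three examples above concrete, here is a rough Python sketch of what a "similar" comparison might do. It is only an illustration, not anything Media Center does: the `clean` and `similar` helpers, the regular expressions, and the 0.85 threshold are all assumptions picked for these examples.

```python
import re
from difflib import SequenceMatcher


def clean(field: str) -> str:
    """Lower-case a tag and drop bracketed suffixes and 'feat.' credits."""
    field = field.lower()
    field = re.sub(r"\([^)]*\)", " ", field)    # "(Original Mix)", "(feat. X)"
    field = re.sub(r"\bfeat\..*$", " ", field)  # trailing unbracketed "feat. X"
    return " ".join(field.split())


def similar(a, b, threshold=0.85) -> bool:
    """a and b are (artist, name) pairs; both fields must score above threshold."""
    artist_score = SequenceMatcher(None, clean(a[0]), clean(b[0])).ratio()
    name_score = SequenceMatcher(None, clean(a[1]), clean(b[1])).ratio()
    return artist_score >= threshold and name_score >= threshold


pairs = [
    (("Mariah Carey", "Obsession"), ("Mariah Carey", "Obsesion")),
    (("Barcelona", "Let Go"), ("Barcelona", "Let Go (Original Mix)")),
    (("David Guetta feat. Kelly Rowland", "When Love Takes Over"),
     ("David Guetta", "When Love Takes Over (feat. Kelly Rowland)")),
]
for a, b in pairs:
    print(a, b, similar(a, b))
```

Dropping everything in brackets also makes remixes look similar to the original, so the result is a list to review rather than a list to delete.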

I just copied 24 of my lounge CD's to the computer, it took ages to find all the duplicates manually.
A similar filter would help loads!

Who else thinks this is a good idea for Media Center 14?
Or is there already a way to do this and I am just missing something?

Thanks!
Logged

Listener

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 1084
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #1 on: August 15, 2009, 11:40:04 am »

I use panes views to see any extra or near-duplicate values for tags.  A tag value appears only once in the pane list, and the values are in alphabetical order.

Sorting on the right set of fields can also group near-duplicates close together so that both the original and the variant can be seen on the screen at the same time.  Sometimes sorting on the file name helps find lost files or files with mis-spelled tags.

Searching can be useful in looking for variant spellings too.

Bill
Logged

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #2 on: August 16, 2009, 07:31:59 am »

I have a few thousand songs. When adding new music to my collection I can't possibly go through a list of hundreds of artists / and thousands of song names each time.

A similar filter would help so much!
In fact, I think that a similar filter would be more useful than a duplicate filter.

All you would need to do is add new music to your collection, filter for similar songs and then choose which to keep and which to delete.
Show similar songs based on [artist] and [song name]. You have to agree that it would be a useful feature!
Logged

Listener

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 1084
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #3 on: August 16, 2009, 11:21:09 am »

I'm not arguing against your proposed feature.  I'm pointing out ways to attack the problem with MC as it stands.

When I rip some CDs or import a group of music files, I try to verify the tags as part of the process.  The last step is to look at the new files in the view I normally use.  I click on tag values in panes to be sure that the new music is present and that the tag is spelled the same way as for other music files.  It takes a minute or two for 6-10 CDs worth of music.

> I have a few thousand songs

That isn't a terribly large collection.

> I just copied 24 of my lounge CD's to the computer, it took ages to find all the duplicates manually.

 
It would take me 10 minutes or less to verify the tag values in 24 CDs I purchased.  I don't know what you mean by lounge CDs. If that means collections with many different artists, then it will take longer than for CDs with one or a few artists per CD.

Choose a view with panes for each tag you want to check.  (Make a new view with panes for all the tags you need to check for near duplicates if you don't normally use panes.)

For example, Genre, Artist and Name (Song).  For each artist, scroll down in the Artist Pane and look for nearby near duplicates.  If you find near duplicate values, click on all variants of the spelling in the Artist pane.  Make sure the Tag window is shown.  In the Tag window, use the drop-down list for artist to select the single wording you want.  (The tag values present for the selected files are shown at the top of the list before the values for all files in your collection.)  Now look at the list of files and be sure that both new files and older ones with the same Artist tag appear as expected.

You can do this for the Name (Song title) field as well.

I understand that you are campaigning for a new feature. That's fine.  However, anybody else who reads this thread might benefit from techniques that use existing MC features.  MC has a rich set of features that can often be used to solve tag editing problems.  I've been using MC for over 3 years.  I'm describing what I actually do when I rip CDs or import files.

Bill




 
Logged

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #4 on: August 17, 2009, 02:57:01 am »

When adding artist albums, I agree, it would not take too much time to do what you suggest.  It's so quick I wouldn't even use the pane view. For albums containing various artists, checking the new songs against a current collection is not as easy.

My (find similar) feature suggestion would be useful to clean up an existing library of music.

Say one has a collection of 20000 songs and has removed all the identical duplicates, how then would they find the duplicates that have slight tag differences [artist] or [name]?  You would have to manually scan the entire list. This would take a considerable amount of time and mistakes could be made.  A find similar songs would solve the problem.

I scanned my list of songs for similar tracks this weekend. It took me the entire weekend, and I found that roughly 5% of my collection consisted of undetected duplicates!  A mammoth task.

A friend of mine has a collection of music which is not nearly as organized as mine is. Without a (find similar) feature, it will take him years.
Logged

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #5 on: August 17, 2009, 10:05:43 am »

I did suggest fuzzy search a few yrs back for this very purpose but nothing came of it :(

How to tell where similar ends ?
Logged

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #6 on: August 17, 2009, 10:56:18 am »

Agreed...

Another consideration would be how to avoid reviewing the same similar tracks each time...

For example, you might have a similar result like:
  • David Guetta - When Love Takes Over (Original) [3:33] [128kbps]
    David Guetta - When Love Takes Over (feat. Kelly Rowland) [3:33] [192kbps]
    David Guetta - When Love Takes Over (Tiesto Remix) [6:40] [320kbps]
    David Guetta (feat. Kelly Rowland) - When Love Takes Over [3:50] [320kbps]

3 of the above are duplicates [3:33] [3:33] [3:50], and one remix [6:40].
After reviewing the songs I choose to delete the two [3:33] songs.

ending up with...
  • David Guetta (feat. Kelly Rowland) - When Love Takes Over [3:50] [320kbps]
    David Guetta - When Love Takes Over (Tiesto Remix) [6:40] [320kbps]

now the question is, how would one avoid having to compare the songs each and every time they search for similar tracks?
And what if another version of the same song is added later on and need to see the similar tracks for review again?

other considerations would be:
duration
bitrate: 128, 192, 320
file type: mp3, flac, etc...
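
As a rough sketch of how duration and bitrate could help with the review step, the snippet below splits the four example tracks above by length and suggests the highest-bitrate copy in each group. The `Track` class, the 30-second tolerance, and the "keep the highest bitrate" rule are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Track:
    title: str
    seconds: int
    kbps: int


def split_by_duration(tracks, tolerance=30):
    """Group tracks whose length is within `tolerance` seconds of the group's shortest track."""
    groups = []
    for track in sorted(tracks, key=lambda t: t.seconds):
        if groups and track.seconds - groups[-1][0].seconds <= tolerance:
            groups[-1].append(track)
        else:
            groups.append([track])
    return groups


def keeper(group):
    """Suggest the copy with the highest bitrate; the user still decides."""
    return max(group, key=lambda t: t.kbps)


tracks = [
    Track("David Guetta - When Love Takes Over (Original)", 213, 128),
    Track("David Guetta - When Love Takes Over (feat. Kelly Rowland)", 213, 192),
    Track("David Guetta - When Love Takes Over (Tiesto Remix)", 400, 320),
    Track("David Guetta (feat. Kelly Rowland) - When Love Takes Over", 230, 320),
]
for group in split_by_duration(tracks):
    print(keeper(group).title)
```

With these numbers the remix ends up in its own group, and the 3:50 copy at 320kbps is the suggested keeper for the other three.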


I'll see if I can find a music sorting tool that already includes this feature. I'll need it when adding new music to my collection.
I'll post if I find anything. Otherwise I'll have to manually check each new song against my collection when adding new music.
Logged

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #7 on: August 17, 2009, 12:31:12 pm »

> Another consideration would be how to avoid reviewing the same similar tracks each time...

> how would one avoid having to compare the songs each and every time they search for similar tracks?

My (partial) solution for this is to do with tagging..

David Guetta - When Love Takes Over (feat. Kelly Rowland) [3:33] [192kbps]
David Guetta (feat. Kelly Rowland) - When Love Takes Over [3:50] [320kbps]

Always put the feat. in the [Name] field, because otherwise the two tracks are exactly the same. You could create a custom field called [Featured Artist] but that just makes things unwieldy in the long run.

> And what if another version of the same song is added later on and need to see the similar tracks for review again?

There is no solution here, cept to do it manually :(

As i listen to a VA, i do a Locate-> artist to see what i have, and find misspellings this way, followed by an Alt+<- to get back to PN.

A future search for an artist gets all tracks by Guetta regardless of what the mix or the featured artist is.

Or a n=[when love" finds all tracks that begin with "when love"

Over time you will fix 80% of the tags and opt to forget the remaining 20% :)
Logged

Listener

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 1084
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #8 on: August 17, 2009, 12:55:39 pm »


> David Guetta - When Love Takes Over (feat. Kelly Rowland) [3:33] [192kbps]
> David Guetta (feat. Kelly Rowland) - When Love Takes Over [3:50] [320kbps]

> Always put the feat. in the [Name] field, because otherwise the two tracks are exactly the same.
> You could create a custom field called [Featured Artist] but that just makes things unwieldy in the long run.

Some of the classical music recordings I really like have been remastered several times.  I use a custom field named version to differentiate such differences.  Most of my classical music views have a column for the version field.  That works fine as long as I have the screen real estate for 5 panes.

> There is no solution here, cept to do it manually :(

I agree that you have to do some manual work to get your tags right.  Before you start, you should think through what you want from the tags and the views you use them in.  The better your thinking, the faster the manual work will go and the more useful the results.

> Over time you will fix 80% of the tags and opt to forget the remaining 20% :)

Amen.  A tagging error sometimes comes to my attention when I am browsing for music.  Some I fix in a second or two and some I just leave as is.
Logged

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #9 on: August 17, 2009, 01:15:04 pm »

Another problem that i haven't got my head around is aliases or pseudonyms.

Many times an artist will opt to use different names for a different sound.

It's nice to be able to see all the work an artist has created

How to deal with this  ?

- You could use nested fields with the parent being the original name of the artist but this forces you to use a list type field like [Artists]

- You could create a separate field called [Original Artist] and put the name there.

Course each custom field you create requires an extra column or pane in a viewscheme or filelist. It takes a little longer to display in a viewscheme that displays the whole library.

I decided to leave it as is and not bother :)
Logged

eba

  • Galactic Citizen
  • ****
  • Posts: 351
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #10 on: August 17, 2009, 04:12:39 pm »

> Another problem that i haven't got my head around is aliases or pseudonyms.

> Many times an artist will opt to use different names for a different sound.

> It's nice to be able to see all the work an artist has created

> How to deal with this  ?

The way I sometimes deal with this one is to use a single artist in the album artist field.

For example, I like to see Mark Knopfler's solo stuff together with Dire Straits, as there is little difference in sound, more it is just an evolution of Knopfler's work as he was always the driving force for Dire Straits anyway...so I put Dire Straits into the album artist for Knopfler albums (this way rather than the other as Dire Straits is the better known name that first comes to mind)

Likewise, for artists/bands where sometime they use their own names and sometimes they use a band name, where the band name is more a brand name than really meaning anything about the creative input.
e.g.
The Waterboys/Mike Scott
Cockney Rebel/Steve Harley & Cockney Rebel/Steve Harley

The only trouble I've had with this is it stops the cover art from internet feature working...

steveklein

  • Galactic Citizen
  • ****
  • Posts: 478
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #11 on: August 18, 2009, 11:00:40 am »

I'm glad I'm not the only person that likes that David Guetta/Kelly Rowland song  8)
Logged

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #12 on: August 18, 2009, 12:52:29 pm »

We could do everything manually... but then we wouldn't need useful software like, um... J River Media Center!
While I'm at it... I might just burn all my music back to CD and label with a permanent marker!
NO, just kidding!

While there are many methods to find duplicates manually, my goal, with my original post, was to discuss a new feature request to help automate the process. 
No one has expressed any interest in the feature, just suggested alternatives for doing it manually. While there is merit in these suggestions, thank you, I would now like to know if anyone else votes in favor of the [similar search] feature to find all those hidden duplicates?

So... are you for or against?


Logged

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #13 on: August 18, 2009, 01:54:20 pm »

Definitely for, but how it's to be done escapes me  ?

Did you have any luck locating a similar feature with other software ?
Logged

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #14 on: August 19, 2009, 04:44:41 am »

I came across the following with a google search:

Similarity - http://www.music-similarity.com
Abee MP3 Duplicate - http://abeetech.com/mp3-duplicates-finder
Phelix - http://www.pekarna.si
EF Duplicate - http://www.efsoftware.com/d3/e.htm
Ashisoft Duplicate MP3 Finder - http://www.duplicate-finder-pro.com/duplicate-mp3-finder.htm
Easy Duplicate Finder - http://www.easyduplicatefinder.com/

I'll test them later and let you know if there is anything worthwhile for safely finding duplicates based on similarity.
Has anyone else got any suggestions as to an alternative tool for this job?

Logged

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #15 on: August 20, 2009, 06:12:33 pm »

Okay, so I tested the programs above. Fortunately I have a virtual machine to do this.

Below are my recommendations for finding similar or duplicate music in your collection. This of course, until J River Media Center includes a similar feature for us.
A note to the J River dev team... I have some ideas for this, so please contact me via e-mail if you would like me to put these ideas forward.  

Similarity - http://www.music-similarity.com
Works well. Scans collection and then presents results with two columns, % data similarity and % tags similarity. You are able to sort by either.
The one main benefit of this program was that it detected identical songs based on data, even when the tags and file names were completely different.

EF Duplicate - http://www.efsoftware.com/d3/e.htm
Very simple, works well. Allows you to specify the similarity to scan for. i.e. Artist [70%] Name [50%].
Nice layout of results.

Easy Duplicate Finder - http://www.easyduplicatefinder.com/
Very simple, works well. Allows you to specify the similarity to scan for. i.e. Artist [70%] Name [50%].
Nice layout of results.

I did not like any of the other programs. But I am not going to get into reasons here.
Although the three programs above work well, they are by no means as good as they should be.
ONE feature that is missing is the ability to specify: [don't show these duplicates for future similarity scans]... so that you don't have to review the same list over and over again.
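
A minimal sketch of how such a "don't show again" list could work is below. None of the tested programs is known to work this way; the `ReviewedPairs` class, the JSON file, and the file names are hypothetical.

```python
import json
from pathlib import Path


def pair_key(path_a, path_b):
    """Order-independent key for a pair of files."""
    return tuple(sorted((path_a, path_b)))


class ReviewedPairs:
    """Remembers pairs the user has already reviewed, stored in a JSON file."""

    def __init__(self, store):
        self.store = Path(store)
        if self.store.exists():
            self.keys = {tuple(k) for k in json.loads(self.store.read_text())}
        else:
            self.keys = set()

    def mark(self, path_a, path_b):
        self.keys.add(pair_key(path_a, path_b))
        # Tuples are written as JSON lists and turned back into tuples on load.
        self.store.write_text(json.dumps(sorted(self.keys)))

    def unseen(self, pairs):
        """Filter out pairs that were reviewed in an earlier scan."""
        return [(a, b) for a, b in pairs if pair_key(a, b) not in self.keys]
```

Because the list stores pairs rather than single files, a newly added version of a song still shows up against the copies that were kept, since that pair has never been reviewed.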

Hopefully J River Media Center will incorporate a 'similar feature' one day soon.
It will be especially useful for when initially sorting a music collection or when adding a whole bunch of new music to an existing collection, etc...


In the meantime, give Similarity a go... it is fun.
Logged

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #16 on: August 21, 2009, 04:19:07 am »

I would like to also suggest that if a similar feature is incorporated into Media Center it should be able to find similar songs based on the similarity of:
  • Tags
    File Name
    Data (byte by byte)
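
For the simplest case of the third item, files that are byte-for-byte identical, a sketch using only the Python standard library is shown below. It is an assumption about one way to do it, and it is much cruder than what Similarity appears to do: two copies of the same recording with different embedded tags or encoders hash differently and will not be grouped.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identical_files(paths):
    """Return groups of paths whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[file_digest(path)].append(str(path))
    return [group for group in by_digest.values() if len(group) > 1]
```

This only needs one pass over each file, so it scales linearly with the size of the library.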

Similarity (above) does this, I cannot believe the number of duplicate songs I had. Even after spending days getting rid of them manually using all sorts of tricks.

Logged

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #17 on: August 21, 2009, 04:27:22 pm »

similarity looks nice :)

But....what did it say for the below ?

> David Guetta - When Love Takes Over (feat. Kelly Rowland) [3:33] [192kbps]
> David Guetta (feat. Kelly Rowland) - When Love Takes Over [3:50] [320kbps]

Imma assume the filenames are different

There is one potential issue here and that is scalability.

MC handles 100s of thousands of files. But i bet those programs prolly top out with just thousands.

How long would it take if run on a library of say 10k, 50k & 100k files, doubting it will be linear :)
Logged

pbair

  • Recent member
  • *
  • Posts: 33
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #18 on: August 22, 2009, 01:45:40 am »

> How long would it take if run on a library of say 10k, 50k & 100k files, doubting it will be linear :)

I tried Similarity yesterday and it took 1 hour to scan almost 5,000 mp3's.

A nice little tool for scanning small batches IMO. For large batches, it would be more useful to me if it had the extensive sorting/filtering capability that MC has.

...just noticed that Similarity advertises the capability to "export results to a playlist".  I haven't tried it, but importing the playlist into MC may be more useful for my needs.
Logged

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #19 on: August 25, 2009, 05:51:00 pm »

Similarity is awesome! I have been testing it... you cannot believe what it finds for you! I don't know how it does it!!! Awesome.
Very pleased with this one.
Logged

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #20 on: August 26, 2009, 06:11:19 am »

Any inputs on how long it takes to analyse a given amount of files ?

Given pbair's comment above it's unlikely that a smartlist would provide the desired results in as short a time as we have come to expect with MC.
Logged

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #21 on: August 26, 2009, 07:41:53 am »

It doesn't take too long to analyze the files with similarity, it takes far longer to check and choose which of the duplicate files to keep.
Last night I spent hours clearing duplicates from my library! Similarity found identical songs that had completely different file names and tags! Awesome!

I suggest running the search over night and reviewing the next morning.  I have an exceptionally well organized library... and it took me a couple of hours to clear out the duplicates (I like to check each myself).  Feel great now that I got rid of the clunk (*clutter junk).

It would be awesome if MC could include a feature like this! Especially for when importing new files. It could be an optional feature.
I'm just glad I found similarity for now!
Logged

Chico

  • Regular Member
  • World Citizen
  • ***
  • Posts: 136
  • Life is good!
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #22 on: August 26, 2009, 11:39:32 am »

I basically don't worry about duplicates as any dup that I have is because they are different albums.  Since there is no way (that I am aware of) to assign multiple albums to a file tag, the album in question would be incomplete if I removed all my duplicate songs.  If you like incomplete albums, continue to remove all duplicates.  My collection is based on Album Artist, then album, so let's say I want to remove all of the duplicates for Steppenwolf - Born to Be Wild.  There must be 20 albums out there with that song on it, ranging from multiple Steppenwolf albums to compilations (Greatest Hits of the 70's, etc.) and soundtracks (Easy Rider, etc.).  If I remove all my duplicates and decide I want to hear the Easy Rider album, then guess what?  No "Born to Be Wild" in the soundtrack!
Removing Dupes are fine if you don't care about having incomplete albums.
I prefer to keep them all as disk space is not an issue anymore and complete albums are.
Logged
JRiver Media Center... If you don't have it, I don't want to hear it!

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #23 on: August 26, 2009, 11:47:02 am »

> I basically don't worry about duplicates as any dup that I have is because they are different albums.  Since there is no way (that I am aware of) to assign multiple albums to a file tag, the album in question would be incomplete if I removed all my duplicate songs.

Absolutely and i don't want to remove duplicate tracks either. The utility is more to regularise tagging than anything.

I did initially want to remove stuff thinking why waste the space but i later realised it was better to keep the albums intact. If i don't like many tracks i dump the album entirely.

But this is because we come from an album centric viewpoint.
Logged

Chico

  • Regular Member
  • World Citizen
  • ***
  • Posts: 136
  • Life is good!
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #24 on: August 26, 2009, 11:59:31 am »

> Absolutely and i don't want to remove duplicate tracks either. The utility is more to regularise tagging than anything.

> I did initially want to remove stuff thinking why waste the space but i later realised it was better to keep the albums intact. If i don't like many tracks i dump the album entirely.

> But this is because we come from an album centric viewpoint.

Point well taken!
Logged
JRiver Media Center... If you don't have it, I don't want to hear it!

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #25 on: August 26, 2009, 01:23:45 pm »

There are 'album people' and there are 'song people', I am 'song people'.  The tag that I care about the least is [album].
I see albums as a grouping system for marketing groups of songs. That's it. But that's me. Each to their own.

I prefer using tags such as [year] [rating] [genre] [genre intensity] [mood] etc... vs. listening to groups of songs in the same order (albums).  

Regardless of our filing and listening preferences, a duplicate/similar song finder is a necessary feature... and it doesn't force you to delete one or the other, it simply makes you aware of what you have repeated. You can choose what to do (ignore, remove A, remove B). Simple.
Logged

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #26 on: August 26, 2009, 01:26:52 pm »

Apart from a duplicate/similar song feature, it looks like there is a need for a great feature!
You should be able to label a song as part of multiple albums (multiple albums / multiple track numbers).  And it should be easy!

I'll put that across as a feature request!  If implemented, you won't need all those duplicates. But you will need a similar/duplicate song finder! :)

Here is the link to the feature request for multiple albums to each song:
http://yabb.jriver.com/interact/index.php?topic=53563.0
Logged

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #27 on: August 26, 2009, 01:39:55 pm »

> However, regardless of our filing and listening preferences, a duplicate song finder doesn't force you to delete one or the other, it simply makes you aware of what you have repeated. You can choose what to do (ignore, remove A, remove B). Simple.

Exactly. The problem with this is i can't use it on my data in MC. Many are mixed albums or cue albums that don't really exist as separate files.

The more i think of it, if Similarity can accept a list (in a given format) i might just use that instead.

> You should be able to label a song as part of multiple albums (multiple albums / multiple track numbers).  And it should be easy!

> I'll put that across as a feature request!

That request has been pending for 5+ years now i think. If it were easy i think they would have done it by now. I've since weaned myself away from this, maybe it indirectly contributed to me seeing albums as one unit instead of a mere bunch of tracks.

Can you point to any software that has this ability ?

The only thing that comes to mind is links (shortcuts) in the filesystem.

The only similar functionality in MC is the cue system, once done allows several links in MC to the same file.

But it's a manual process, as in you need to make the cue files, which quickly becomes tedious and impractical.
Logged

Listener

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 1084
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #28 on: August 26, 2009, 03:39:36 pm »

Spurred by this thread, I went through  about 7700 pop and jazz files with about 4900 different song titles.  I was not removing files but just fixing differences in names.  I used a pane with the (track) name in JRiver MC and just scanned downward looking for name variations.  When I found near duplicates, I selected both (or all) variants in the pane and examined the file list produced.  If I was convinced they were all the same song, I copied one version of the name into the Tag window name field.

After I finished, there were about 4700 distinct song titles.  It took about 5 hours of work with some distractions along the way.

Scanning the artist pane took about 15 minutes.

Bill


 
Logged

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #29 on: August 26, 2009, 05:48:52 pm »

Bill. Try [Similarity] software. You will be surprised... I guarantee it.

It works very well if you have two screens... otherwise it's going to be a bit painful. Unless you print the results.
Try sorting by data % / tag %.
Logged

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #30 on: August 26, 2009, 06:06:50 pm »

Remember, duplicate/similarity search functionality will be especially useful for when adding a bulk amount of music to an existing library.
Not just for eliminating current duplicates.
Logged

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #31 on: August 27, 2009, 01:48:45 am »

> I tried Similarity yesterday and it took 1 hour to scan almost 5,000 mp3's.

It occurred to me that Similarity also does in-file checking. In the sense if the tags are not there it tries to inspect the actual content of the file possibly to make a signature.

Is it possible to turn-off this in-file checking in similarity ?

Idea being to get it to use just tags & filenames to do comparison.

How fast is it now ?

> Spurred by this thread, I went through  about 7700 pop and jazz files with about 4900 different song titles.  I was not removing files but just fixing differences in names.  I used a pane with the (track) name in JRiver MC and just scanned downward looking for name variations.  When I found near duplicates, I selected both (or all) variants in the pane and examined the file list produced.  If I was convinced they were all the same song, I copied one version of the name into the Tag window name field.

> After I finished, there were about 4700 distinct song titles.  It took about 5 hours of work with some distractions along the way.

> Scanning the artist pane took about 15 minutes.

It's curious you used the [Name] field in the Pane first instead of [Artist]. There are fewer [Artists] than track [Name]. Also it's not uncommon to have the same [Name] but different [Artist] as in covers of a track.

However what throws me is tags of the type "Artist A & Artist B" whose names sometimes get interchanged or one is missing so they don't appear together or even in proximity in the pane. I guess using [Name] in the pane would catch this but it's gonna be one very looooooong list.
Logged

Listener

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 1084
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #32 on: August 27, 2009, 11:19:56 am »

> It's curious you used the [Name] field in the Pane first instead of [Artist].

The view I used has [Album], [Artist] and [Name] panes.  I was looking for variants in the Name field.

Sometimes, there would be several recordings of a song by the same artist.  However, it was more common to have recordings of a song by different Artists.  My objective was to get uniform names so that a song appeared once in the [Name] pane.


> There are fewer [Artists] than track [Name]. Also it's not uncommon to have the same [Name] but different [Artist] as in covers of a track.

If I thought that 2 artists had recorded the same song, I might select those 2 artists in the artist pane and see whether any song appeared twice (with near-duplicate names.)  However, I would have no clue about most of the near duplicates.  Selecting a couple of artists to see if any near-duplicates occurred would be inefficient and probably not effective.

> However what throws me is tags of the type "Artist A & Artist B" whose names sometimes get interchanged or one is missing so they
> don't appear together or even in proximity in the pane. I guess using [Name] in the pane would catch this but it's gonna be one very
> looooooong list.

In my Pop (songs) & Jazz view, the Artist pane has 540 entries.  I ran through that list in 15 minutes, making a few corrections.  I looked for near-duplicate spellings but I also just looked for anything out of place.  I'm sure that I don't catch everything in this sort of scan but  I'm trying to improve the situation in practical ways.

Bill
Logged

Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42445
  • Shoes gone again!
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #33 on: August 27, 2009, 04:49:09 pm »

I like this idea, but the algorithm for fuzzy matching is orders of magnitude slower than the current no-case match system that ~dup and ~nodup use.

Since, as an example, you would consider these three strings a match:
abc
ab
bc

I can't see a way other than a brute-force search (as opposed to hashing / mapping) to find all the similar items.
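
To illustrate why exact matching can use hashing while fuzzy matching cannot, here is a small sketch. It is a guess at the general shape of a no-case duplicate check, not MC's actual code; `exact_duplicates` and the sample tags are made up for the example.

```python
from collections import defaultdict


def exact_duplicates(tracks):
    """Group (artist, name) pairs that are equal ignoring case, in one pass."""
    groups = defaultdict(list)
    for artist, name in tracks:
        # Each track maps to exactly one key, so a single dictionary lookup
        # finds all of its duplicates.
        groups[(artist.casefold(), name.casefold())].append((artist, name))
    return [group for group in groups.values() if len(group) > 1]


print(exact_duplicates([
    ("Mariah Carey", "Obsession"),
    ("mariah carey", "OBSESSION"),
    ("Mariah Carey", "Obsesion"),   # one letter off, so it gets a different key
]))
```

"abc", "ab" and "bc" all produce different keys, so no single lookup can bring them together. That is the point where pairwise comparison comes in.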

As an example, if we use the Levenshtein distance in the brute-force comparison, it takes about a minute to handle a library of 8000 audio files when looking at name and artist.
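
For readers who have not met it, a textbook Levenshtein distance and the brute-force pairing look roughly like the sketch below. This is the standard dynamic-programming algorithm, not MC's implementation; the `similar_pairs` helper and its `max_distance` cut-off of 2 are assumptions.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similar_pairs(titles, max_distance=2):
    """Brute force: compare every title with every other title."""
    return [(x, y) for x, y in combinations(titles, 2)
            if levenshtein(x.casefold(), y.casefold()) <= max_distance]


print(levenshtein("Obsession", "Obsesion"))  # 1
print(similar_pairs(["abc", "ab", "bc"]))
```

The number of pairs grows with the square of the library size: 8000 files means 8000 x 7999 / 2 = 31,996,000 comparisons, and each comparison is itself proportional to the product of the two string lengths.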

To further complicate things, aren't strings like this mathematically similar (but really shouldn't be):
Etude No. 2
Etude No. 3
Etude No. 4

Suggestions welcome.
Logged
Matt Ashland, JRiver Media Center

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #34 on: August 27, 2009, 06:40:21 pm »

Thanks Matt.
'Similarity' (alternative software) searches songs and then produces results based on the similarity of the (actual sound data) and (tags).
The user is able to sort by highest-lowest % match. The user is then able to scroll the list of matches and compare songs and decide what to keep.

When I tested this software, I found that duplicate songs were in these ranges:

  • 100% - 75% data similarity
  • 100% - 90% tag similarity
  • 74% - 50% data similarity with 89% - 60% tag similarity

I originally suggested this feature as an aid to the user to find duplicates. As an alternative to manually scanning all the songs.
Logged

leezer3

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 1589
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #35 on: August 27, 2009, 07:34:28 pm »

Perhaps this shouldn't be implemented as a base expression function?
A new Wizard instead, with appropriate warnings about the lengths of time taken?
This would also allow the implementing of match levels (IE. Match to % of tags etc. etc.) as nicely selectable presets.

TBQH, duration +/- 10 secs should also be included in any comparison too, as this is vital for determining whether something is identical; Live versions and remixes are things I want to keep, which would probably be caught in the first instance by a Name/ Artist comparison.
I'd personally search on Name/ Duration first, with Artist as a secondary parameter, as I'd say this is much more likely to catch compilations, greatest hits and typos :)
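
A rough sketch of the Name/Duration idea: sort by duration, then only compare names of tracks that fall within the +/- 10 second window. The `candidates` function, the use of Python's `difflib` for the name score, and the 0.8 ratio are assumptions for illustration.

```python
from difflib import SequenceMatcher


def candidates(tracks, window=10, min_ratio=0.8):
    """tracks is a list of (name, seconds); return name pairs within `window` seconds."""
    ordered = sorted(tracks, key=lambda t: t[1])
    pairs = []
    for i, (name_a, secs_a) in enumerate(ordered):
        for name_b, secs_b in ordered[i + 1:]:
            if secs_b - secs_a > window:
                break  # sorted by duration, so nothing later can be in the window
            ratio = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
            if ratio >= min_ratio:
                pairs.append((name_a, name_b))
    return pairs


library = [
    ("When Love Takes Over", 213),
    ("When Love Takes Ovr", 215),    # typo, nearly the same length
    ("When Love Takes Over", 400),   # live or remix length, left alone
]
print(candidates(library))
```

Besides keeping live versions and remixes out of the results, the duration window also cuts down the number of string comparisons, since most pairs are skipped without looking at the names at all.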

-Leezer-
Logged

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #36 on: August 28, 2009, 01:07:23 am »

> Perhaps this shouldn't be implemented as a base expression function?
> A new Wizard instead, with appropriate warnings about the lengths of time taken?
> This would also allow the implementing of match levels (IE. Match to % of tags etc. etc.) as nicely selectable presets.

Yep, an expression as the thread title suggests is out, so an extra item in Library Tools with its own dialog like Analyse Audio.

Now as new files are added to the library...would MC still require a re-analysis of the whole library ?

Maybe if some sort of persistent index was built up during string analysis which could be used to speed up later searches.
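
One possible shape for such an index is a trigram index, sketched below: each title is broken into three-character pieces, and a new title is only compared against titles that share pieces with it. This is an assumption about how it could be done, not a description of MC; `TrigramIndex` and the 0.6 overlap threshold are hypothetical.

```python
from collections import defaultdict


def trigrams(text):
    """Set of three-character pieces of a padded, case-folded title."""
    padded = f"  {text.casefold()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> titles containing it
        self.grams = {}                   # title -> its trigram set

    def add(self, title):
        grams = trigrams(title)
        self.grams[title] = grams
        for gram in grams:
            self.postings[gram].add(title)

    def lookup(self, title, min_overlap=0.6):
        """Indexed titles whose trigram sets overlap the query's (Jaccard score)."""
        query = trigrams(title)
        seen = set()
        for gram in query:
            seen |= self.postings.get(gram, set())
        return sorted(t for t in seen
                      if len(query & self.grams[t]) / len(query | self.grams[t]) >= min_overlap)


index = TrigramIndex()
for title in ["Obsession", "Let Go", "When Love Takes Over"]:
    index.add(title)
print(index.lookup("Obsesion"))
```

When new files are imported, only the new titles need a `lookup` followed by an `add`, so the existing library would not have to be compared against itself again.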

> To further complicate things, aren't strings like this mathematically similar (but really shouldn't be):
> Etude No. 2
> Etude No. 3
> Etude No. 4
>
> Suggestions welcome.

Right, because there is no difference between them and

Etude No. 2
Etudes No. 2

Edit distance is 1 in both cases. But the latter is a typo whilst your example is not.
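
The sketch below checks both cases and shows one cheap guard: refuse to call two titles a typo of each other when the numbers in them differ. The `likely_typo` rule is an assumption for illustration and would not help with titles that differ in other meaningful ways.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Standard edit distance (insert, delete, substitute each cost 1)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def numbers(text):
    """All runs of digits in a title, in order."""
    return re.findall(r"\d+", text)


def likely_typo(a, b, max_distance=1):
    if numbers(a) != numbers(b):
        return False  # "No. 2" vs "No. 3" are different works, not typos
    return levenshtein(a, b) <= max_distance


print(levenshtein("Etude No. 2", "Etude No. 3"))    # 1
print(levenshtein("Etude No. 2", "Etudes No. 2"))   # 1
print(likely_typo("Etude No. 2", "Etude No. 3"))    # False
print(likely_typo("Etude No. 2", "Etudes No. 2"))   # True
```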

So an alternative method is required. I'm tempted to wonder why not compare previously computed AA data for these files. But a more robust method would prolly involve acoustic fingerprints of some sort like similarity already does. Would have to be smart enough to detect the same track encoded via different encoders as well as if bits were missing from the start or end.

For those interested in more do an advanced search for "finger print" and set message age to 9999 days.

Course now, next thing ppl will want is this, but that's for another thread and another time ;)
Logged

chriswale

  • World Citizen
  • ***
  • Posts: 173
Re: Idea: ~Similar=[artist],[name] (Music)
« Reply #37 on: August 28, 2009, 05:34:40 am »

> Perhaps this shouldn't be implemented as a base expression function?

After using 'Similarity' I agree completely!
Logged