INTERACT FORUM

Author Topic: Duplicates Finder PlugIn  (Read 23773 times)

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #50 on: January 21, 2004, 03:49:52 pm »

JLee

One other thing I was thinking about, and had about an hour to work on today, is to take the artist name and song name, strip them down, and create an MD5 hash of the result.

Take a look at the updated pictures.

Basically, all non-A-Z chars are stripped.

Song names like "867-5309" would be stripped down to nothing; in this case the string reverts to the original string.
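
A minimal sketch of the idea above, in Python rather than the plug-in's own code (which is not shown in this thread). The function names are made up for illustration, and the lowercasing step is an assumption borrowed from a later reply in this thread:

```python
import hashlib
import re

def strip_non_letters(text: str) -> str:
    """Drop everything except A-Z/a-z; keep the original if nothing is left."""
    stripped = re.sub(r"[^A-Za-z]", "", text)
    return stripped or text

def text_hash(artist: str, title: str) -> str:
    """MD5 of the stripped, lowercased artist + title (hypothetical helper)."""
    key = (strip_non_letters(artist) + strip_non_letters(title)).lower()
    return hashlib.md5(key.encode("utf-8")).hexdigest()

print(strip_non_letters("867-5309"))        # nothing survives stripping, so the original is kept
print(strip_non_letters("867-5309/jenny"))  # only the letters survive
print(text_hash("Tommy Tutone", "867-5309/jenny"))
```

Two tracks whose names differ only in punctuation, spacing or case end up with the same hash, which is what a duplicate search on that field would then pick up.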

"867-5309/jenny" Charted At 02 In 1982

Listening to: '867-5309/jenny' from 'Sounds Of The Eighties (1982)' by 'Tommy Tutone' on Media Center 10
Logged
Retired Military, Airborne, Air Assault, And Flight Wings.
Model Trains, Internet, Ham Radio, Music
https://MyAAGrapevines.com
https://centercitybbs.com
Fayetteville, NC, USA

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #51 on: January 22, 2004, 02:52:14 pm »

Quote
Concatenate the first 5 characters (or less) of the artist name with the first 5 characters (or less) of the track title.

e.g. Fleetwood Mac - You make lovin' fun becomes FleetYouma and
Fleetwood Mac - You make loving fun (Radio edit) becomes FleetYouma

This sounds good, but how about we make this user-selectable:

0 = all of artist name and song name
5 = up to 5 chars from each
6 = up to 6 chars from each
etc....
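
As a rough illustration of the user-selectable length, here is a small Python sketch. `match_key` is a hypothetical name, and removing spaces and punctuation before cutting is an assumption inferred from the "FleetYouma" example in the quote:

```python
import re

def match_key(artist: str, title: str, length: int = 5) -> str:
    """Build a comparison key; length 0 means 'use the whole field'."""
    def squeeze(text: str) -> str:
        # Assumption: spaces and punctuation are removed before truncating.
        return re.sub(r"[^A-Za-z0-9]", "", text)

    a, t = squeeze(artist), squeeze(title)
    if length > 0:
        a, t = a[:length], t[:length]
    return a + t

print(match_key("Fleetwood Mac", "You make lovin' fun"))
print(match_key("Fleetwood Mac", "You make loving fun (Radio edit)"))
print(match_key("Fleetwood Mac", "You make lovin' fun", length=0))
```

With the default of 5 both Fleetwood Mac titles give the same key; with 0 the full strings are compared and they no longer match.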
Logged

jleerigby

  • Guest
Re:Duplicates Finder PlugIn
« Reply #52 on: January 22, 2004, 04:45:16 pm »

Quote
Concatenate the first 5 characters (or less) of the artist name with the first 5 characters (or less) of the track title.

e.g. Fleetwood Mac - You make lovin' fun becomes FleetYouma and
Fleetwood Mac - You make loving fun (Radio edit) becomes FleetYouma

this sounds good, but how about we make this user selectable

0 = all of artist name and Song Name
5 = upto 5 chrs from each
6 = upto 6 chrs from each
etc....

Great minds think alike, hey King! I made an Excel macro that does a really good job of finding dups. It uses 5 chars of each, but I was just thinking about adding options like:
- High accuracy / fewer possible matches (this would use 7 chars)
- Balanced (5 chars)
- Low accuracy / more possible matches (3 chars).

Because I don't have your expertise I'm not able to actually update the tags from my program so I tile it with MC and use a specially constructed view scheme in tagging mode to manually work through and correct the dups.

My previous effort with Dups found about 1.5K out of 23K.  This macro instantly found a possible 3.7K duplicates!

It has a BIG drawback in that I cannot review and tag directly from the program.  I tried to use the program with a batch file and mjextman.exe commands to add the appropriate tracks to playing now where I could tag them (clumsy I know!).  This wouldn't work for me as any attempt to paste filename data into excel crashed MC due to the number of bytes being copied (my filenames are very long!).

However, it does prove that the algorithm works - for me anyway and the number of potential duplicates found was staggering.

I'll paste a link to it here in a few mins.
Logged

jleerigby

  • Guest
Re:Duplicates Finder PlugIn
« Reply #53 on: January 22, 2004, 05:38:39 pm »

Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #54 on: January 22, 2004, 06:06:16 pm »

Ok Picture Is Again Updated.

Min Is 5 Chrs, Max 256 Chrs Per Field (Artist name And Song Name)

I think This Is About It For Now, I May Play With It For A Day Or So And Then Put Out A Build For Everyone To Play With.
Logged

jleerigby

  • Guest
Re:Duplicates Finder PlugIn
« Reply #55 on: January 23, 2004, 02:31:23 am »

Quote
Ok Picture Is Again Updated.

Min Is 5 Chrs, Max 256 Chrs Per Field (Artist name And Song Name)

I think This Is About It For Now, I May Play With It For A Day Or So And Then Put Out A Build For Everyone To Play With.


King, are you planning to do the 4-pass thing I mentioned earlier? Through running my macro I've found a lot of instances where this has been useful. As a reminder, it concatenates:
- First 5 of artist with last 5 of track name
- Last 5 with first 5
- First 5 with first 5
- Last 5 with last 5

This helps as there are often extra characters added to the beginning or end of artist / song names, e.g.

George Michael & Aretha Franklin - I knew you were waiting
George Michael - I knew you were waiting for me
Aretha Franklin & George Michael - I knew you were waiting for me
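
The four passes could be sketched roughly like this in Python. This is not the Excel macro itself; the function names are hypothetical, and lowercasing plus stripping to a-z / 0-9 before cutting is an assumption:

```python
import re

def squash(text: str) -> str:
    # Assumption: compare on lowercase letters and digits only.
    return re.sub(r"[^a-z0-9]", "", text.lower())

def four_pass_keys(artist: str, title: str, n: int = 5) -> dict:
    """One key per pass, built from the first/last n characters of each field."""
    a, t = squash(artist), squash(title)
    return {
        "first+last": a[:n] + t[-n:],
        "last+first": a[-n:] + t[:n],
        "first+first": a[:n] + t[:n],
        "last+last": a[-n:] + t[-n:],
    }

def shared_passes(track_a, track_b, n: int = 5) -> list:
    """Names of the passes on which two (artist, title) pairs produce the same key."""
    ka = four_pass_keys(*track_a, n=n)
    kb = four_pass_keys(*track_b, n=n)
    return [name for name in ka if ka[name] == kb[name]]

t1 = ("George Michael & Aretha Franklin", "I knew you were waiting")
t2 = ("George Michael", "I knew you were waiting for me")
t3 = ("Aretha Franklin & George Michael", "I knew you were waiting for me")

print(shared_passes(t1, t2))
print(shared_passes(t2, t3))
print(shared_passes(t1, t3))
```

In this sketch the first and second entries meet on the first+first pass, and the second and third meet on the two passes that use the end of the artist name. The first and third share no pass directly and are only linked through the second.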

Logged

Zarius

  • Regular Member
  • World Citizen
  • ***
  • Posts: 178
  • Addicted to smilies.
Re:Duplicates Finder PlugIn
« Reply #56 on: January 23, 2004, 08:07:09 am »

Quote
I have found a few files that came up with the same MD5 and are not the same file. This can be overcome by including duration in your ~Duplicate search in MC10. I was also thinking that I could add an option (normally on) that would include the duration when computing the checksum; this should give fewer false matches (maybe).

Hmm... upon doing some reading on MD5, it seems incredibly unlikely to have any files with the same MD5 hash (e.g. this url, or searching Google for MD5, hash, clash and/or duplicate). Just wondering if I'm misunderstanding how, or on what, you are doing the MD5. (I'm assuming you did a 128-bit MD5 on the whole mp3 file minus the tag header.)
Logged

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re:Duplicates Finder PlugIn
« Reply #57 on: January 23, 2004, 08:48:12 am »

Quote
One other thing i was thinking about, and had about an hour today to work on it is to take the artist name and song name strip it down and create a MD5 Hash for that.

What difference will making an MD5 hash of the name make, vs just using the filename/song name?

Seems like an unnecessary extra step to me.
Logged

Zarius

  • Regular Member
  • World Citizen
  • ***
  • Posts: 178
  • Addicted to smilies.
Re:Duplicates Finder PlugIn
« Reply #58 on: January 23, 2004, 09:06:37 am »

Quote
What diff will making a MD5 Hash on the name do ? vs just using the filename/song name? Seems like an unnecessary extra step to me.

Doing a MD5 on the name will allow you to find files with the same name, but the data may be different... then it's up to the user to determine whether they are the same.

This differs from MC's duplicate name checker in that KingSparta's MD5'ing strips non A-Z chars from the name before making the MD5... that's as much as I know at the moment.
Logged

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re:Duplicates Finder PlugIn
« Reply #59 on: January 23, 2004, 12:56:01 pm »

Quote
Doing a MD5 on the name will allow you to find files with the same name, but the data may be different... then it's up to the user to determine whether they are the same.

Still not convinced... why not just compare the file names or whatever instead of creating a hash and then doing the compare of the hash ...

King...care to clue us in here ?
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #60 on: January 23, 2004, 01:10:52 pm »

Quote
This differs from MC's duplicate name checker in that KingSparta's MD5'ing strips non A-Z chars from the name before making the MD5... that's as much as I know at the moment.

And also you can limit the search field length.

So if you had

Car Wash ('98 (Remix)' by Rose Royce
and
Car Wash (2) by Rose Royce

and limited the string to, let's say, 10 chars per field, you would have

Rose Royce Car Wash (

Since we then strip spaces and non-alpha chars and convert all to lowercase, you get

roseroycecarwash

and also convert all "&" chars to "and"

the chances of matching may be better.
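
For anyone who wants to try the example by hand, here is a small Python sketch of those steps. It is not the plug-in's code; `limited_key` is a made-up name, and doing the "&" replacement after the length limit is an assumption, since the order is not stated:

```python
import re

def limited_key(artist: str, title: str, limit: int = 10) -> str:
    """Cut each field to `limit` chars first, then clean up what is left."""
    text = artist[:limit] + title[:limit]
    text = text.replace("&", "and").lower()
    # Keep letters only: spaces, digits and punctuation are dropped.
    return re.sub(r"[^a-z]", "", text)

a = limited_key("Rose Royce", "Car Wash ('98 (Remix)")
b = limited_key("Rose Royce", "Car Wash (2)")
print(a, b, a == b)
```

Because the cut happens before the clean-up, the space and bracket in "Car Wash (" count toward the 10 characters, so fewer letters survive than the limit suggests.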

The only reason for this to be an MD5 string is "because I can"; it makes no difference whether it is hashed or left in text form.

Quote
Hmm... upon doing some reading on MD5 it seems incredibly unlikely to have any files with the same MD5 hash...

Well, it is unlikely but it happens. I have looked into this and there are files that come up with the same MD5 hash. As a matter of fact, when I was talking to someone from J River, this was an issue when they were making their fingerprinting system, so they also use some other elements, from what I could understand.

as a sample from MusicBrainz

TRM Id: aa141094-b06b-4c2a-8925-3fbe55866974

is a song from Alan Jackson - Drive For Daddy, And Also A Song From Incubus

sure this could be made into a 64bit or 128 bit hash but that may be going a bit overboard.


Logged

jleerigby

  • Guest
Re:Duplicates Finder PlugIn
« Reply #61 on: January 23, 2004, 01:57:28 pm »

Quote
So if you had

Car Wash ('98 (Remix)' by Rose Royce
and
Car Wash (2) by Rose Royce

and limited the string to, let's say, 10 chars per field, you would have

Rose Royce Car Wash (

Since we then strip spaces and non-alpha chars and convert all to lowercase, you get

roseroycecarwash

and also convert all "&" chars to "and"

the chances of matching may be better.

I think my algorithm is a bit more aggressive, King, so it will find more matches. I do the stripping out first, in this order:

1. Replace x with y (from a configurable list)
2. Get rid of anything inside brackets
3. Strip anything that is not a-z / 0-9 (I think 0-9 needs to stay as it's relevant in many artist names)

Only after this is done do I cut down to 5 characters.
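
The order of operations above can be sketched in Python as follows. This is not the Excel macro; the function names and the contents of the replacement list are hypothetical, and lowercasing first is an assumption:

```python
import re

# Hypothetical replacement list: (text to find, text to put in its place).
REPLACEMENTS = (("featuring", ""), ("feat", ""), ("ft", ""))

def clean(text: str, replacements=REPLACEMENTS) -> str:
    text = text.lower()
    # Step 1: configurable replacements, matched as whole words only.
    for old, new in replacements:
        pattern = r"(?<![a-z0-9])" + re.escape(old) + r"(?![a-z0-9])"
        text = re.sub(pattern, new, text)
    # Step 2: drop anything inside round or square brackets.
    text = re.sub(r"\([^)]*\)|\[[^\]]*\]", "", text)
    # Step 3: keep only a-z and 0-9.
    return re.sub(r"[^a-z0-9]", "", text)

def match_key(artist: str, title: str, n: int = 5) -> str:
    # Cut only after cleaning, so junk characters do not use up the n characters.
    return clean(artist)[:n] + clean(title)[:n]

print(match_key("Fleetwood Mac", "You make lovin' fun"))
print(match_key("Fleetwood Mac", "You make loving fun (Radio edit)"))
```

Cleaning before cutting is the main difference from a truncate-first approach: brackets, spaces and punctuation never count toward the 5 characters.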

Is there any way you can accommodate the first 5 / last 5 thing I mentioned earlier, using the George Michael & Aretha example? I can see how this would make things more complicated, but I really think that this is what's made the difference when I've tested it with my macro.

Any views?

[Edit - just read PM - but I don't understand what this hash thing is? Aren't we just talking about adding a tag to a field in MC that we can filter in panes and view schemes?

This needs a bit of thought when I'm sober, but I'm thinking something like... We could have a separate field for pass1, pass2, pass3, pass4 etc. You can get MC to check for dups on each field in turn. So if you don't catch it on the review of pass1 matches you will get it on the review of pass2 etc.]
Logged

midknyte

  • Regular Member
  • Recent member
  • *
  • Posts: 5
  • Yo!
Re:Duplicates Finder PlugIn
« Reply #62 on: January 23, 2004, 02:47:45 pm »

I have updated d'peg! (as a beta - download instructions below) to calculate the CRC and MD5 tags w/o the IDTags.

Also, it already does some amount of special character and numeral ignoring in filenames by way of match modes called Basename and Basename (SubString).   Matching against IDTags is available too.

You can play the files from within the matching interface for comparisons.  Once registered, it allows you to do scans against offline files (on CDs) without reloading them.

Get the skinny here.

http://www.GotDupes.com

Download the full install right now here

http://www.somewareonthe.net/anonftp/installdpeg610a.exe

Note - if the above link does not work, it means that I have posted a new version and the filename has changed.  Go to site download page instead.

Once you have it installed, here is an exe with the changes to the CRC and MD5 calculations.  I am waiting to post them into the next version until after I have a chance to look around some more for the ability to do some waveform analysis.

http://www.somewareonthe.net/anonftp/beta/dpeg.zip


Logged

midknyte

  • Regular Member
  • Recent member
  • *
  • Posts: 5
  • Yo!
Re:Duplicates Finder PlugIn
« Reply #63 on: January 23, 2004, 04:38:39 pm »

Good news.  I am on the cusp of an audio fingerprinting solution.  But this raises a question.

The program, as it scans your music files, would likely have to load and play a few seconds of each file in order to sample it and generate the fingerprint.  This is obviously time consuming (more so than loading a picture and generating a fingerprint, which it already does), though it is a task that you would leave the machine to do by itself while you go and do other things with your time.

My question is - is this acceptable?
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #64 on: January 23, 2004, 05:58:36 pm »

>> I have updated d'peg!
Way Cool

>> My question is - is this acceptable?
I think that depends on how long it takes, I guess, but I would think the answer is yes.

BTW: Both are outstanding news for your program!

Mark
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #65 on: January 23, 2004, 09:00:05 pm »

I just Posted a beta build of my plug-in on my FTP Server

At IP Address: 66.57.193.58
User: anonymous
Logged

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re:Duplicates Finder PlugIn
« Reply #66 on: January 24, 2004, 05:16:09 am »

Quote
the need for this to be a MD5 string is nothing more than I Can, it matters not then if it was left in text form or not.

My point precisely !!!

I would strongly recommend following JLee's ideas, King.

It would be interesting to see what your reaction to them is when you try it out on your library. I'm betting it will reveal lots of new dupes.

The fingerprint method requires more research to be effective. Fingerprints based on hashes are not very useful to 99% of people who would need dupe checking.

Fuzzy fingerprinting, based on the audio properties, looks to be more promising. It's an interesting topic, something I hope to learn more about.
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #67 on: January 24, 2004, 05:27:57 am »

Quote
Fingerprint method requires more research to be effective.

I don't agree.

Only when you fingerprint the file name would I agree.

If you use the file name hash with the duration it will find all the dups and is 100% accurate (so far as I have found going through my 50,000+ files).

Quote
Fuzzy fingerprinting looks to be more promising, based on the audio properties.
Yes, it would be nice, but it also comes with its own problems.

Quote
i would strongly recommend following JLee's ideas.
I don't see it happening just yet. The program makes hashes; that's all it is meant to do, and they can be used in Media Center to find possible dups.
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #68 on: January 24, 2004, 05:30:24 am »

If anyone has installed it, did it work?

I may need to change something in the install package if not.
Logged

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re:Duplicates Finder PlugIn
« Reply #69 on: January 24, 2004, 06:44:46 am »

Quote
Fingerprint method requires more research to be effective.

I don't agree

Only when you fingerprint the file name would i agree

When I say "effective" I refer you to my simple test: encode 2 files with different bit rates. If the program can tell they are the same, which they are, then it is working. By "program" I am referring to any program, not yours specifically, King.

Your 100% method will work for a very small % of files, or very well in your case, which is quite unique. You say these files are downloaded off the net; in my experience such files are often incomplete, not accurately tagged or, more commonly, encoded using different encoders. Maybe the OTR world is more standardised.

The problem is I have lots of dupes that are not 100% exact (which I suspect is a common occurrence). I need a way to be able to tell that files are similar.


Quote
Fuzzy fingerprinting looks to be more promising, based on the audio properties.   
Yes it would be nice, but also comes with it's own problems.

Sure, it's quite challenging. I saw a couple of papers the other day that went into the gory details. The theory itself is quite complex.

But if JLee's method works for the majority of files, the incentive to develop a "real" fingerprint checker is moot. Maybe I should not use the term fingerprint, as it means unique. I am referring more to a program that can detect similarities by sound.

Question is, how effective is JLee's method? I'm guessing pretty good. It seems to address the shortcomings of the dupe checker built into MC. What are they again, JLee?

I just wish JRiver would sort this out themselves.
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #70 on: January 24, 2004, 08:50:35 am »

Quote
Question is how effective is JLee's method ?

Maybe good if you ripped the files yourself.

Not too good if they are downloaded and could have any tags.

============================================

I am still trying to figure out a way to do an audio fingerprint that could be used to compare files compressed at different bit rates.

I have sent e-mails to a few companies and will see what I get back. If I get something I will add it to the program.
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #71 on: January 24, 2004, 10:14:31 am »

I just installed it on my wife's computer, found a few mistakes in the install program and fixed them.

I have created a web page for it now and you can try it at this link. It has pictures of how to set up Media Center and use the Duplicate function, along with some directions about the program.

http://www.spartasoft.com
Logged

KeystoneCop

  • Regular Member
  • Galactic Citizen
  • ****
  • Posts: 354
  • I hate computers..
Re:Duplicates Finder PlugIn
« Reply #72 on: January 24, 2004, 10:50:50 am »

Thanks.. The key was deleting the folder. LOOKS FINE NOW.
Logged
There is a way to compare tags

[=isequal([band],[album])]=1

thanks marko

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #73 on: January 24, 2004, 11:03:31 am »

glad it's working now.

let me know how the matches turn out etc....
Logged

jleerigby

  • Guest
Re:Duplicates Finder PlugIn
« Reply #74 on: January 24, 2004, 01:09:03 pm »

Quote
Question is how effective is JLee's method ? im guessing pretty good. It seems to address the shortcomings of the dupe checker built into MC. What are they again...JLee ?

If you have Excel installed, try it out for yourself from the link above.  You've got nothing to lose as it doesn't touch your files.  You just cut and paste your library list from MC into the Excel sheet and hit the various buttons.  You'll see whether the algorithm works.

For me it finds a lot of dups but it also finds a lot that are not dups.  That's not an issue as I just click one button and it skips to the next.

Try it out first on a smallish number of files.  When I run the macro that initially analyses the filenames on my 2.2 GHz machine against 27000 files, it takes about 30 mins and uses 100% CPU.  I just go off and do something else while it's doing this bit.  When it's done you can review each dup individually.  I resize Excel and tile it with MC so I can see both.
Logged

jleerigby

  • Guest
Re:Duplicates Finder PlugIn
« Reply #75 on: January 24, 2004, 06:44:21 pm »

Quote
glad it's working now.

let me know how the matches turn out etc....

Just installed it and did a random test of 500 and the results looked really promising.  The text hash found a possible 65 duplicates whilst the file hash found none.

My algorithm found 81 duplicates on the same files.  The ones that the plugin didn't find, which is as expected given our previous discussions, were those where the differences were at the start of the artist / name rather than the end.

I'm now waiting for it to run through my whole library (just the text search).  This will be a real timesaver King.  Thank you.
Logged

c1c9k72

  • Regular Member
  • Galactic Citizen
  • ****
  • Posts: 332
  • So many worlds, so much to do, so little done...
Re:Duplicates Finder PlugIn
« Reply #76 on: January 24, 2004, 07:54:22 pm »

Just downloaded King's plug-in, and it's great.  If this is just the first version, I can't wait to see what other features he'll be adding in later.

One thing which I'm curious to see implemented is using the MD5 hash to error check existing songs.  Would it be possible to have it scan a song with an existing MD5 hash to see if it's been damaged?  I'm not sure how they are created, but would a single-bit error cause a shift in the hash?

Anyway, a great new addition to an already great program.  Thanks.
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #77 on: January 24, 2004, 08:07:49 pm »

Quote
Would it be possible to have it scan a song with an existing MD5 hash to see if it's been damaged?
Yes it could be done.

not sure how you would want to be notified that the hash has changed. I would hate for the batch to stop just to tell you the hash has changed. Maybe a Verify Field with "OK" if it passed the check.

Quote
I'm not sure how they are created, but would a single-bit error cause a shift in the Hash?
yes, it would change the hash
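
A quick way to see this for yourself, sketched in Python with made-up data standing in for a song file (the helper names are hypothetical):

```python
import hashlib

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)

original = b"pretend this is the audio data of a song" * 100
damaged = flip_bit(original, 12345)

print(md5_hex(original))
print(md5_hex(damaged))
print("OK" if md5_hex(original) == md5_hex(damaged) else "Changed")
```

The two files have the same length and differ in a single bit, yet the two hashes look unrelated, so comparing a stored hash against a freshly computed one is enough to flag the damage.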
Logged

hit_ny

  • Citizen of the Universe
  • *****
  • Posts: 3310
  • nothing more to say...
Re:Duplicates Finder PlugIn
« Reply #78 on: January 25, 2004, 01:51:04 am »

Quote
One thing which I'm curious to see implemented is using the MD5 hash to error check existing songs.  Would it be possible to have it scan a song with an existing MD5 hash to see if it's been damaged?  I'm not sure how they are created, but would a single-bit error cause a shift in the hash?

Do you mean sfv instead of MD5 ?

If you make any modifications to the file tagging etc, then sfv won't match.

If you really mean MD5, I don't know if there are any programs out there that will create an MD5 hash of just the audio content of the file (ignoring the tagging part), which could then be tested for errors?

Up to now I have been using a program called mp3bookhelper that creates an sfv of the audio portion; the author of mp3bookhelper calls it sv.
Logged

Zarius

  • Regular Member
  • World Citizen
  • ***
  • Posts: 178
  • Addicted to smilies.
Re:Duplicates Finder PlugIn
« Reply #79 on: January 25, 2004, 04:41:34 am »

Quote
If you really mean MD5, I don't know if there are any programs out there that will create a MD5 hash of just the audio content of the program ( ignoring the tagging part). Which could then be tested for errors ?

Er......... (from this same thread)

Quote
My program now copies the file to a temp folder, then removes the ID3v1 and ID3v2 tags, and the MPEG data is evaluated (this should never change). In the tests I just made it works well.

This will work with mp3, however, but will not work on other file types.

As we talked about, this will not match files across different bit rates, encoders that generated the mp3, etc.

Quote
I have updated d'peg! (as a beta - download instructions below) to calculate the CRC and MD5 tags w/o the IDTags.
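
For reference, hashing only the audio part of an mp3 can be sketched like this in Python. This is neither KingSparta's nor d'peg's code; the function names are hypothetical. It relies on the standard tag layouts: an ID3v1 tag is the last 128 bytes of the file and starts with "TAG", and an ID3v2 tag sits at the start, begins with "ID3", and stores its size in four 7-bit ("synchsafe") bytes. Other tag types (APE, Lyrics3) are ignored here:

```python
import hashlib

def strip_id3(data: bytes) -> bytes:
    """Return the bytes between a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)
    if len(data) >= 10 and data[:3] == b"ID3":
        # Tag size excludes the 10-byte header; each size byte carries 7 bits.
        size = ((data[6] & 0x7F) << 21) | ((data[7] & 0x7F) << 14) \
             | ((data[8] & 0x7F) << 7) | (data[9] & 0x7F)
        start = 10 + size
        if data[5] & 0x10:      # footer-present flag (ID3v2.4)
            start += 10
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]

def audio_hash(data: bytes) -> str:
    """MD5 of the file contents with ID3v1/ID3v2 tags left out."""
    return hashlib.md5(strip_id3(data)).hexdigest()

# Made-up bytes standing in for MPEG audio, plus two differently sized tags.
audio = b"\xff\xfb\x90\x00" + bytes(range(256)) * 8
tag_v2 = b"ID3\x03\x00\x00" + bytes([0, 0, 0, 20]) + b"\x00" * 20
tag_v1 = b"TAG" + b" " * 125

print(audio_hash(audio) == audio_hash(tag_v2 + audio + tag_v1))
```

Retagging a file changes the whole-file hash but not this one, which is why the tag-free hash is the more useful field for spotting the same rip under different names.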
Logged

jleerigby

  • Guest
Re:Duplicates Finder PlugIn
« Reply #80 on: January 25, 2004, 05:02:34 am »

King, Some initial suggestions having reviewed part of my library:

- Please strip out the unwanted characters first and then compare what's left with however many characters we choose (5 for me).  This way unwanted characters are not using up our 5.
- Please strip out 'The'
- Don't replace '&' with 'AND'. Just strip both out of the matching process as they add little value.
- Or better still... Offer an option where we can specify that we want to replace x with y (where y could also be blank if we choose)  I would use this to replace 'Featuring', 'Feat' and 'Ft'.
- Offer an option to remove text inside brackets.

Let me know what you think as I'd rather wait for a new build if there'll be one before running through the whole library as the human review part is time consuming.
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #81 on: January 25, 2004, 08:55:32 am »

Quote
Please strip out the unwanted characters first and then compare what's left with however many characters we choose (5 for me).  This way unwanted characters are not using up our 5.
Ok I Can See This, Working On It Now

Quote
Please strip out 'The'
OK, Working On It

Quote
Or better still... Offer an option where we can specify that we want to replace x with y (where y could also be blank if we choose)  I would use this to replace 'Featuring', 'Feat' and 'Ft'.
- Offer an option to remove text inside brackets.
Ok

Quote
- Don't replace '&' with 'AND'. Just strip both out of the matching process as they add little value.

I don't see what difference this makes; removing both is the same as having both.
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #82 on: January 25, 2004, 10:05:50 am »

How about this? This should do everything you asked.

Two things to remember. What you type in will:

1. Not change any tags.

2. Be matched anywhere in the text. If you type in Remove "The", it will remove "The" from all words that have "The" in them. To overcome this, " The " (with a space on each side) will remove only the word "The".

Sample for Delete " The ", using "The Beatles At The Tower":

After: "Beatles At Tower"

Sample for Replace "&" with "And", using "The Beatles & Jerry Springer":

After: "The Beatles And Jerry Springer"
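
The two samples can be reproduced with a small Python sketch. This only illustrates the behaviour described, not the plug-in's code; `apply_rules` is a hypothetical name, and padding the text with a space at each end (so that " The " also matches at the very start) is an assumption based on the first sample:

```python
def apply_rules(text, deletions=(), replacements=()):
    """Apply plain-text replace and delete rules to a copy of the tag text."""
    # Assumption: pad with spaces so a rule like " The " can match at either end.
    work = " " + text + " "
    for old, new in replacements:
        work = work.replace(old, new)
    for phrase in deletions:
        # A space-padded phrase leaves one space behind; a bare phrase leaves nothing.
        filler = " " if phrase != phrase.strip() else ""
        work = work.replace(phrase, filler)
    # Collapse any doubled spaces and trim the padding again.
    return " ".join(work.split())

print(apply_rules("The Beatles At The Tower", deletions=[" The "]))
print(apply_rules("The Beatles & Jerry Springer", replacements=[("&", "And")]))
print(apply_rules("The Therapy", deletions=["The"]))    # bare "The" also eats part of "Therapy"
print(apply_rules("The Therapy", deletions=[" The "]))  # padded form only removes the word
```

The last two lines show why the spaces matter: the bare rule damages any word that merely contains "The".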
Logged

KeystoneCop

  • Regular Member
  • Galactic Citizen
  • ****
  • Posts: 354
  • I hate computers..
Re:Duplicates Finder PlugIn
« Reply #83 on: January 25, 2004, 10:17:45 am »

KING..  THIS IS FANTASTIC..  I am only playing with the File Hash right now, but it is working 100% so far for me.  Boy, did I find mislabeled songs.  Only small thing is, if you do duplicates on MD5FileHash and no duplicates on name, artist, you miss some of the bad ones. Is there any easy way to only show the files where MD5FileHash is the same but name and artist are not the same? (I don't want to use MD5TextHash, as I am trying to get the names to be the same.)

Not a big deal, Cause THIS IS GREAT STUFF..
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #84 on: January 25, 2004, 10:23:59 am »

Quote
any easy way to only show the files where md5filehash is the same

No clue if you can do this by adding the "Duplicates" option, "No Duplicates", or a combo of both.

You might try adding the modifier "Duplicates" with the hash field, and adding the modifier "No Duplicates" using Title or something.
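
Outside Media Center, the same question can be answered on an exported track list. A rough Python sketch, assuming each track is a dict with hypothetical keys `file_hash`, `artist` and `name`, and using made-up hash values:

```python
from collections import defaultdict

def mislabeled_groups(tracks):
    """Groups of tracks that share a file hash but not the same artist/name."""
    by_hash = defaultdict(list)
    for track in tracks:
        by_hash[track["file_hash"]].append(track)
    result = []
    for group in by_hash.values():
        labels = {(t["artist"].lower(), t["name"].lower()) for t in group}
        if len(labels) > 1:     # same audio, different labels
            result.append(group)
    return result

tracks = [
    {"file_hash": "aaa", "artist": "Rose Royce", "name": "Car Wash"},
    {"file_hash": "aaa", "artist": "Rose Royce", "name": "Car Wash (2)"},
    {"file_hash": "bbb", "artist": "Tommy Tutone", "name": "867-5309/Jenny"},
    {"file_hash": "bbb", "artist": "Tommy Tutone", "name": "867-5309/jenny"},
]
for group in mislabeled_groups(tracks):
    print([t["name"] for t in group])
```

Only the first pair is reported: the second pair shares a hash too, but its labels differ only in case, so it counts as already consistent here.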
Logged

jleerigby

  • Guest
Re:Duplicates Finder PlugIn
« Reply #85 on: January 25, 2004, 11:07:52 am »

King - You are awesome!  King of all Plug Ins.
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #86 on: January 25, 2004, 11:26:01 am »

You Can Download It Now. Version 0.0.2

It However does not have verify in it yet.

Version 0.0.3 Will Save To A Verify Field Like

FileHash=OK TextHash=Changed Verified On: 1/25/2004 1:23:39 PM

FileHash=OK TextHash=OK Verified On: 1/25/2004 1:23:39 PM
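
A string in that shape could be built along these lines. This is a Python sketch only; `verify_field` is a hypothetical name, and the date is formatted by hand to match the unpadded style shown above:

```python
from datetime import datetime

def verify_field(file_ok: bool, text_ok: bool, when: datetime) -> str:
    """Build a verify string in the same shape as the samples above."""
    def status(ok: bool) -> str:
        return "OK" if ok else "Changed"

    hour12 = when.hour % 12 or 12
    ampm = "AM" if when.hour < 12 else "PM"
    stamp = (f"{when.month}/{when.day}/{when.year} "
             f"{hour12}:{when.minute:02d}:{when.second:02d} {ampm}")
    return (f"FileHash={status(file_ok)} TextHash={status(text_ok)} "
            f"Verified On: {stamp}")

print(verify_field(True, False, datetime(2004, 1, 25, 13, 23, 39)))
```

Here `file_ok` and `text_ok` would come from comparing the stored hash fields with freshly computed ones.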



Logged

KeystoneCop

  • Regular Member
  • Galactic Citizen
  • ****
  • Posts: 354
  • I hate computers..
Re:Duplicates Finder PlugIn
« Reply #87 on: January 25, 2004, 05:56:36 pm »

This has been GREAT.  I got rid of over 500 duplicates that had different names using the FILEHASH.  I never found a match that was not the same file.

Now playing with the TEXTHASH.  Having cleaned up a lot with the FILEHASH, this was not as straightforward. I noticed that if I had a BLANK artist, I did not get a code (no, I don't want one when it is missing).  What I think would be nice would be a hash code for name only, or name + artist.  I am sure you are busy getting back lots of disk space, so this is not a major thing.



Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #88 on: January 25, 2004, 06:58:00 pm »

Quote
name + Artist
it does this

Name only: not sure that's wise.

===============================

New option coming...

Binary Read Hash

A user can set the number of bytes to read from the file, then the program will do a hash on that data.

Since File Hash works on the whole file, what happens if the dup file is not the same size because it was cut short by a second? Well, the hash will change.

What this will do is show you possible dups, and allow you to verify them and select the longer file of the two.

I think I may need a few days with this to read all my files again and do some more testing, but I have a working copy.
Logged

KeystoneCop

  • Regular Member
  • Galactic Citizen
  • ****
  • Posts: 354
  • I hate computers..
Re:Duplicates Finder PlugIn
« Reply #89 on: January 25, 2004, 10:43:59 pm »

Quote
Name only: not sure that's wise.
Not one of my best qualities, but I listen to the music before deleting unless I am sure, much like with the first dup program.

Anyway, this is GREAT.

Rather than a number of bytes, is there any way to deal with duration, + or -? (I got this idea from some old KINGSPARTA requests.)
Logged

c1c9k72

  • Regular Member
  • Galactic Citizen
  • ****
  • Posts: 332
  • So many worlds, so much to do, so little done...
Re:Duplicates Finder PlugIn
« Reply #90 on: January 27, 2004, 10:28:09 am »

King,

Still loving this plug-in, but I'm having occasional application errors from Media Jukebox when I run it. It doesn't always happen, but when it does, the error refers to an instruction referencing memory at different locations, then forces the program to shut down. Is anyone else having this trouble, or could it be just me?
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #91 on: January 27, 2004, 10:38:44 am »

By the way, I updated the plug-in.

It will now do a binary read of the beginning of the file and create a hash.

So in case one file is a dupe that has been cut off, it may still match where the whole-file hash would not.
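
As a sketch of how such partial hashes could be used to flag possible dups and suggest the longer file, assuming each file's partial hash and size are already known (Python; the function name and the record layout are hypothetical, not the plug-in's):

```python
from collections import defaultdict


def pick_longest(records):
    """Group files by partial hash and suggest keeping the largest one.

    records: iterable of (path, bin_hash, size_in_bytes) tuples.
    Returns {bin_hash: (path_to_keep, [other_paths])} for every hash
    shared by two or more files. These are only candidates to verify.
    """
    groups = defaultdict(list)
    for path, bin_hash, size in records:
        groups[bin_hash].append((size, path))

    picks = {}
    for bin_hash, members in groups.items():
        if len(members) < 2:
            continue  # unique partial hash, not a possible dup
        members.sort(reverse=True)  # largest size first
        keep = members[0][1]
        others = [path for _, path in members[1:]]
        picks[bin_hash] = (keep, others)
    return picks
```

A match on the partial hash only says the beginnings are identical, so the result is a list of candidates to check by ear or by size before deleting anything.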

===============================================

About the crash: I sometimes get that when MC is just sitting there, for no reason.

Not sure why. It may or may not have something to do with the plug-in, but I don't think it does, other than the fact that the plug-in may be telling MC to save the tags and MC craps out on it. But like I said, I have had times when MC starts to save the database and crashes like that when no plug-ins are running.
Logged

c1c9k72

  • Regular Member
  • Galactic Citizen
  • ****
  • Posts: 332
  • So many worlds, so much to do, so little done...
Re:Duplicates Finder PlugIn
« Reply #92 on: January 27, 2004, 10:56:41 am »

King,

Thanks for the update.  I've found that it's certain songs that trigger the error, though I can't figure out what they have in common.  I'm planning on reencoding them and seeing if the new copies cause the same trouble.

A request, if it's not too much trouble: in the MD5Verify attribute, would it be possible to print the results in a way that avoids '='? I'm trying to create smartlists for altered FileHashes and now BinHashes, and while I'm not completely sure, I don't think smartlists deal well with equals signs in the equations.

Thanks again.
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #93 on: January 27, 2004, 11:12:19 am »

Quote
way as not to have '='

sure, what would be good?
Logged

c1c9k72

  • Regular Member
  • Galactic Citizen
  • ****
  • Posts: 332
  • So many worlds, so much to do, so little done...
Re:Duplicates Finder PlugIn
« Reply #94 on: January 27, 2004, 11:23:23 am »

Just off the top of my head, maybe shift

BinHash=OK FileHash=Changed TextHash=OK Verified On: 1/27/2004 12:10:44 PM

to

BinHash0 FileHash1 TextHash0 1/27/2004 12:10:44 PM
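
The shift between the two formats above could be sketched like this in Python. This is only an illustration of the proposed encoding, not something the plug-in does; `compact_verify` is a made-up name, and treating every state other than OK as 1 is an assumption (only OK and Changed appear in this thread).

```python
import re


def compact_verify(status):
    """Rewrite 'Name=OK' as 'Name0' and any other 'Name=State' as 'Name1',
    then drop the 'Verified On: ' label so no '=' or ':' label remains."""

    def flag(match):
        name, state = match.group(1), match.group(2)
        return name + ("0" if state == "OK" else "1")

    compact = re.sub(r"(\w+)=(\w+)", flag, status)
    return compact.replace("Verified On: ", "")


old = "BinHash=OK FileHash=Changed TextHash=OK Verified On: 1/27/2004 12:10:44 PM"
print(compact_verify(old))
```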

Oh, and just for your information, after re-encoding those songs at the same bitrate, I haven't had the error again.  So, it's not the plug-in, but some aspect of the song.
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #95 on: January 27, 2004, 11:33:18 am »

Seems kind of encoded.

How about I just change "=" to ":"?

About the other problem:

If the plug-in crashed, you would see diagonal lines across the program with a message box telling you what the error was.

When Media Center crashes, it is a totally different error message.
Logged

c1c9k72

  • Regular Member
  • Galactic Citizen
  • ****
  • Posts: 332
  • So many worlds, so much to do, so little done...
Re:Duplicates Finder PlugIn
« Reply #96 on: January 27, 2004, 11:57:08 am »

I remember seeing plug-in errors before, and getting that diagonal effect.  I've also gotten a smartlist to work with the present system, so unless you really like another method, there's no reason to change it.  At least, none I can think of.
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #97 on: January 27, 2004, 12:15:27 pm »

ok
Logged

KeystoneCop

  • Regular Member
  • Galactic Citizen
  • ****
  • Posts: 354
  • I hate computers..
Re:Duplicates Finder PlugIn
« Reply #98 on: January 27, 2004, 05:46:06 pm »

King, each version gets better and better. The ability to knock out special characters in the text hash really made it find a lot of matches for me. THANKS

Now, you knew this was coming: I still think it would be good to have separate hash fields for name and artist. Then I could find all duplicate artist hashes and clean them up, and do the same for name.
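
To show what a separate artist field could make possible, here is a small Python sketch that groups artist spellings by their stripped form. It uses the stripped string itself as the key instead of an MD5 of it, to keep the output readable; the function names and the lower-casing are assumptions, not the plug-in's behavior.

```python
import re
from collections import defaultdict


def artist_key(artist):
    """Strip everything except a-z (after lower-casing); keep the original
    string if nothing is left, following the rule used for the text hash."""
    stripped = re.sub(r"[^a-z]", "", artist.lower())
    return stripped or artist


def artist_variants(artists):
    """Return {key: [spellings]} for every key with more than one spelling."""
    groups = defaultdict(set)
    for artist in artists:
        groups[artist_key(artist)].add(artist)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


tags = ["Fleetwood Mac", "fleetwood mac", "Fleetwood-Mac", "Tommy Tutone"]
print(artist_variants(tags))
```

Each group lists spellings that collapse to the same key, which are the ones to clean up by hand.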

 :D :D :D BUT I STILL THINK THIS IS FANTASTIC :D :D :D



I have not played much with the partial filehash; it is too slow to run in the daytime, so maybe I will try it tonight. What are you finding to be a good setting?
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20063
Re:Duplicates Finder PlugIn
« Reply #99 on: January 27, 2004, 06:04:24 pm »

Quote
what are you finding to be a good setting ?

More than 11,000 bytes.

I think that may become the minimum soon.
Logged