INTERACT FORUM

Author Topic: Dup Checking Images problem  (Read 2118 times)

drosoph

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 661
  • TiVo-aholic
Dup Checking Images problem
« on: January 19, 2007, 04:06:26 pm »

How do I dupe check for images in my archives ... when there are SLIGHT variations in the filesize of the images (due to MC adding/removing tag fields over the years).  The dimensions, name, filetype, and date are all the same, but the filesize is different.

Problem is ... I have many files that are NOT dupes that have the same Dimensions, Name, Type and Date ... Dimensions are standard for certain cameras, names aren't unique depending on how the person imported the files (some are just 01.jpg, 02.jpg, etc), all are JPGs, and some have the same DATE because the date was assigned by the import utility, or there is no date on the file so the 1/1/1980 date is used.  The only discriminator left is FileSize, and it has worked wonders, but there are too many tags now and the SLIGHT variations in file size due to new tags cause the dup-check to not find these.

Is there a way to "round" the filesize using a calculated field and then use that to dupe check on ?

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20049
Re: Dup Checking Images problem
« Reply #1 on: January 19, 2007, 05:29:43 pm »

Retired Military, Airborne, Air Assault, And Flight Wings.
Model Trains, Internet, Ham Radio
https://MyAAGrapevines.com
https://centercitybbs.com
Fayetteville, NC, USA

marko

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 8967
Re: Dup Checking Images problem
« Reply #2 on: January 20, 2007, 06:54:59 am »

Another option, Visipics, is a freeware app that does a surprisingly good job too.

Quote


    - VisiPics is a Freeware ! If you like and want to support it, please donate.
    - Extremly fast compared to most commercial software
    - Uses Hyperthreading and Multi-processors systems
    - Highly efficient results with adjustable similarity levels
    - Easy to use Interface, preview your duplicates easily and pick the ones to delete with a simple click
    - Starts to display the results while scanning, you don't have to wait to delete your duplicates
    - Smart Auto-Select mode, to save time while deleting pictures
    - Tested on 100.000 pictures, and more than 15Gb archives without a crash, full results in 3 hours

-marko.

drosoph

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 661
  • TiVo-aholic
Re: Dup Checking Images problem
« Reply #3 on: January 23, 2007, 08:55:58 pm »

http://www.gotdupes.com/index.htm
OK, what's the trick?  I can't get this to scan a single file!  0 files scanned ...

I just want to dupe check for images ..

Um, I've been working on this for about 4 hours now ... any tricks?

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20049
Re: Dup Checking Images problem
« Reply #4 on: January 25, 2007, 07:12:59 pm »

Maybe Another Program Would Be...

Duplicate Image Finder from

http://www.rmsft.com/

I think it actually Works Better

I Think I Lost My Registration Code
Retired Military, Airborne, Air Assault, And Flight Wings.
Model Trains, Internet, Ham Radio
https://MyAAGrapevines.com
https://centercitybbs.com
Fayetteville, NC, USA

drosoph

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 661
  • TiVo-aholic
Re: Dup Checking Images problem
« Reply #5 on: January 29, 2007, 04:08:00 pm »

Do any of these like a DB with 600,000 photos or more ?

Alex B

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 10121
  • The Cosmic Bird
Re: Dup Checking Images problem
« Reply #6 on: January 29, 2007, 04:33:43 pm »

Quote
Do any of these like a DB with 600,000 photos or more ?

So you would like to compare each image with all other images. That would make 600,000 x 600,000 comparisons (= 360,000,000,000).

If the programs do these comparisons individually and do not try to keep all results in RAM that should not be a problem. Probably it would just take some time. For example, if each comparison takes one second the job would be done after 11,415 years and about six months.

;)
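For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes 365-day years and one second per comparison, as in the example above. The `unique_pairs` figure is an addition, not part of the original estimate: it counts each pair of images only once and skips comparing an image with itself.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (leap years ignored)

n = 600_000
all_pairs = n * n                # the figure above: every image against every image
unique_pairs = n * (n - 1) // 2  # each unordered pair once, no self-comparisons


def years(seconds):
    """Convert a number of seconds to 365-day years."""
    return seconds / SECONDS_PER_YEAR


print(f"{all_pairs:,} comparisons at 1 s each: {years(all_pairs):,.1f} years")
print(f"{unique_pairs:,} comparisons at 1 s each: {years(unique_pairs):,.1f} years")
```

Either way the point stands: comparing every image against every other image does not scale to a library of this size.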
The Cosmic Bird - a triple merger of galaxies: http://eso.org/public/news/eso0755

drosoph

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 661
  • TiVo-aholic
Re: Dup Checking Images problem
« Reply #7 on: January 29, 2007, 04:48:05 pm »

Quote
So you would like to compare each image with all other images. That would make 600,000 x 600,000 comparisons (= 360,000,000,000).

If the programs do these comparisons individually and do not try to keep all results in RAM that should not be a problem. Probably it would just take some time. For example, if each comparison takes one second the job would be done after 11,415 years and about six months.

PERFECT!  I've got time ....

Unfortunately, it's taking over 6 seconds to analyze each photo ... with a 4GB server running dual core ... Unfortunately, the app DIF can't take advantage of both cores ...

So, alas, is there a way to do this?   I'm serious ... this isn't just a futile task .. I really need to clean my library ... it actually consists of over 240,000 pics of my family (from all sources in my family) and there are LOTS of dupes ... different names and different tagging systems, so the filesize isn't exact!   Argh ...

If someone can write a Custom Field to Round FileSize to the nearest 10,000 bytes, I think that would clear up 95% of my issues!   Any way to do this??? PLEASE!?!?!?
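Outside of MC, the rounding idea might look something like the sketch below. This is Python, not an MC expression, and the function names and sample file list are made up for illustration. It rounds each size to the nearest 10,000 bytes and groups files that land on the same value.

```python
from collections import defaultdict


def rounded_size(size_bytes, step=10_000):
    """Round a byte count to the nearest multiple of step (halves round up)."""
    return (size_bytes + step // 2) // step * step


def group_by_rounded_size(files, step=10_000):
    """files: iterable of (name, size_bytes) pairs.

    Returns only the buckets that contain two or more files.
    """
    buckets = defaultdict(list)
    for name, size in files:
        buckets[rounded_size(size, step)].append(name)
    return {key: names for key, names in buckets.items() if len(names) > 1}


# Hypothetical sample data, not taken from a real library.
sample = [
    ("a/01.jpg", 4_857_456),
    ("b/01.jpg", 4_859_102),
    ("c/01.jpg", 3_120_004),
]
print(group_by_rounded_size(sample))
```

One caveat with any rounding scheme: two sizes that sit on either side of a bucket boundary (say 4,854,999 and 4,855,001) round to different values even though they differ by only 2 bytes, so some near-matches will still be missed.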

drosoph

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 661
  • TiVo-aholic
Re: Dup Checking Images problem
« Reply #8 on: January 29, 2007, 05:20:25 pm »

Having a hard time working with a Calculated Field based on Filesize as the [File Size] field is actually displaying a truncated value, but ~dup=[File Size] uses the whole value ... but using a Mid([File Size],0,4) returns the truncated value of the field ...

eg:  Filesize is 4,857,456
[File Size]=4.8 MB
Mid([File Size],0,4)= 4.8

I want to get 4,857 out of it ..

Anyone ?
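To illustrate what is being asked for (in Python, not MC's expression language): given the raw byte count, integer division by 1,000 gives the leading digits, whereas slicing the formatted display string can never recover digits that were already dropped. The variable names are just for this example.

```python
size_bytes = 4_857_456  # raw byte count from the example above

# Integer division by 1,000 drops the last three digits.
thousands = size_bytes // 1000
print(thousands)         # 4857
print(f"{thousands:,}")  # 4,857

# Taking the first four characters of the display string only returns
# what was already shown; the missing digits are not in the string.
display = "4.8 MB"
print(repr(display[0:4]))
```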

Alex B

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 10121
  • The Cosmic Bird
Re: Dup Checking Images problem
« Reply #9 on: January 29, 2007, 05:35:52 pm »

Quote
... The only discriminator left is FileSize, and it has worked wonders, but there are too many tags now and the SLIGHT variations in file size due to new tags cause the dup-check to not find these. ...

The only thing I can think of would be to temporarily remove all physical tags from the files. You could first try to isolate all files that are not supposed to have duplicates and then remove tags from the rest of the files. After deleting the duplicates you can reapply tags from the library. (I assume you have backups of everything, otherwise I would not recommend mass changing thousands of files)

Besides MC's tag based smartlists I have used Rashid Hoda's Duplicate File Finder v. 1.1.0.3 utility for finding my possible duplicates: http://www.download.com/3000-2248_4-10506816.html. It can do filename, file size and content (binary data) based comparisons, but it does not have any fuzzy logic or image detection features.
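A possible variation on the two ideas above, as a rough Python sketch rather than anything MC or the utility above is known to do: do a content-based comparison, but skip the parts of each JPEG where tags usually live, so the files themselves are never modified. It assumes the tags sit in APPn (EXIF, IPTC, XMP) or COM segments ahead of the image data, and that the tagging software did not recompress the picture. The function names are hypothetical, and unusual files (padding bytes between markers, for example) are simply rejected.

```python
import hashlib
import struct


def image_data_digest(data: bytes) -> str:
    """SHA-256 of a JPEG's bytes with APPn and COM segments skipped."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    h = hashlib.sha256()
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("unexpected byte where a marker should be")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: hash the scan header and everything after it.
            h.update(data[pos:])
            return h.hexdigest()
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            h.update(data[pos:pos + 2 + length])
        pos += 2 + length
    raise ValueError("no SOS marker found")


def find_duplicates(paths):
    """Group file paths whose image data hashes to the same digest."""
    groups = {}
    for path in paths:
        with open(path, "rb") as f:
            digest = image_data_digest(f.read())
        groups.setdefault(digest, []).append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Each file is read and hashed once, so the work grows with the number of files instead of the number of pairs. It will not catch resized or re-saved copies; that still needs an image-similarity tool.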
The Cosmic Bird - a triple merger of galaxies: http://eso.org/public/news/eso0755

marko

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 8967
Re: Dup Checking Images problem
« Reply #10 on: January 30, 2007, 02:52:35 pm »

visipics makes the claim:
Quote
- Tested on 100.000 pictures, and more than 15Gb archives without a crash, full results in 3 hours

unsubstantiated, naturally ;)