INTERACT FORUM

Author Topic: Find files with similar, not equal filenames  (Read 1226 times)

andrewberg

  • Galactic Citizen
  • ****
  • Posts: 418
Find files with similar, not equal filenames
« on: October 12, 2018, 03:08:52 pm »

Is there a way in MC to search for similar file names instead of duplicates?

I would like to compare the contents of my library against a backup drive, and see what file names have been changed, or files replaced (e.g. after re-encoding to other formats)...

For example, it should find both "Movie Title (USA 2000), HD.mkv" and "Movie Title (USA 2000).avi".

Any suggestions?
"To be is to do" (Socrates) - "To do is to be" (Sartre) - "Do be do be do" (Sinatra)

swiv3d

  • Guest
Re: Find files with similar, not equal filenames
« Reply #1 on: October 12, 2018, 04:48:55 pm »

As far as I am aware MC can only search for duplicates within the library which is open - it won't even know what is on your backup drive.

andrewberg

  • Galactic Citizen
  • ****
  • Posts: 418
Re: Find files with similar, not equal filenames
« Reply #2 on: October 12, 2018, 04:58:32 pm »

Quote from swiv3d: "As far as I am aware MC can only search for duplicates within the library which is open - it won't even know what is on your backup drive."

Of course, the backup drive is also imported... ;-) Actually, what I need even more is a way to sync the main library with the backup drive, possibly using specific criteria to compare by file names (equal or similar), sizes, ratings, etc etc... MC has some useful filters, but comparisons are rather hard to do... Anything else?



swiv3d

  • Guest
Re: Find files with similar, not equal filenames
« Reply #3 on: October 12, 2018, 05:27:27 pm »

You could build a smartlist based on the find duplicates one, but altered for media = video, and then set a series of selection criteria to give a set of possible duplicates. How well that would work is very dependent on your tagging system, though, and it would be a lot of work to sort through if you have a lot of files. Frankly, I don't think MC was built as a system for backing up files. You could just copy the main library folders to the backup drive using Windows Explorer.

swiv3d

  • Guest
Re: Find files with similar, not equal filenames
« Reply #4 on: October 12, 2018, 05:35:30 pm »

As a postscript, I don't see the point of having the files on the backup disc imported into your main library?

andrewberg

  • Galactic Citizen
  • ****
  • Posts: 418
Re: Find files with similar, not equal filenames
« Reply #5 on: October 12, 2018, 06:33:17 pm »

Well, the backups are imported but not tagged (save for auto tags like year etc)... I'm keeping them in the main library as a quick reference to which files I've backed up, but the drive is usually not connected as I'm not using it otherwise...

Does that make more sense? Either way, I guess you're right in that MC is not designed for this particular purpose... ;-) Would be great though, e.g. by some extended searches, where similar names is just one option...

I am now making do by searching identical features like runtimes, file size, year etc. just to narrow down the results for easier comparison... Should have thought of that in the first place... ;-)

swiv3d

  • Guest
Re: Find files with similar, not equal filenames
« Reply #6 on: October 12, 2018, 07:12:39 pm »

I still think that having your main library distinct from your backup drive would be the preferred option. I have 2 usb3 drives attached to my computer and I make changes to the main library files and back up those to one drive while the files on the other drive are left in their original state. If I find something didn't work out well then I can delete the dud and pull a copy of the original back to the main library folder structure.

andrewberg

  • Galactic Citizen
  • ****
  • Posts: 418
Re: Find files with similar, not equal filenames
« Reply #7 on: October 12, 2018, 07:31:46 pm »

Quote from swiv3d: "I still think that having your main library distinct from your backup drive would be the preferred option."

Adding files from different drives to the same library is no problem, as long as you manage them in separate views... I'm never mixing my backup copies together with the original files I watch & listen to, that would get a bit messy... ;-) The rest is done in similar fashion as yours...

RoderickGI

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 8186
Re: Find files with similar, not equal filenames
« Reply #8 on: October 13, 2018, 02:04:51 am »

A backup should just be a backup. You're doing it wrong.

Get a good backup application (I use EaseUS) which uses a combination of Full and Incremental backups. Use that. Then you can pull an original backup from any point in time for which you have properly defined the backup. For example, I do a monthly Full backup of my important files, and then a daily Incremental backup, which gives me the ability to pull a version of a file from any day in the period for which I maintain incremental backups, without consolidating.
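
To make the Full plus Incremental idea concrete, here is a minimal Python sketch of which backup sets would be needed to get back to a given day. It is not how EaseUS or any particular product works internally; the function name `restore_chain` and the `(date, kind)` tuples are made up for illustration, and it assumes true incremental backups (each one holds only the changes since the previous backup of any kind).

```python
from datetime import date


def restore_chain(backups, target):
    """Return the backup sets needed to restore the state on `target`.

    `backups` is a list of (date, kind) tuples, kind being "full" or
    "incremental" (hypothetical representation, for illustration only).
    """
    # Only backups taken on or before the target day can be used.
    usable = sorted(b for b in backups if b[0] <= target)
    full_positions = [i for i, b in enumerate(usable) if b[1] == "full"]
    if not full_positions:
        raise ValueError("no full backup on or before the target date")
    # Latest full backup, plus every incremental taken after it.
    return usable[full_positions[-1]:]


backups = [
    (date(2018, 9, 1), "full"),
    (date(2018, 9, 2), "incremental"),
    (date(2018, 10, 1), "full"),
    (date(2018, 10, 2), "incremental"),
    (date(2018, 10, 3), "incremental"),
    (date(2018, 10, 4), "incremental"),
]
for taken, kind in restore_chain(backups, date(2018, 10, 3)):
    print(taken.isoformat(), kind)
```

The longer the period between Full backups, the longer this chain gets, which is the usual trade-off between storage space and restore effort.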

Also, you are asking for heuristic pattern recognition functionality. The sort of thing that people have spent their lives trying to build, as just a precursor to AI. Google has spent a few billion building something like that...

... okay, perhaps I'm overstating that, but it isn't as easy as it might sound. Try writing out every rule you would need to apply to make this work reliably. Unless the changes were very simple, as per your example, that would be quite a complex task in itself.
What specific version of MC you are running:MC27.0.27 @ Oct 27, 2020 and updating regularly Jim!                        MC Release Notes: https://wiki.jriver.com/index.php/Release_Notes
What OS(s) and Version you are running:     Windows 10 Pro 64bit Version 2004 (OS Build 19041.572).
The JRMark score of the PC with an issue:    JRMark (version 26.0.52 64 bit): 3419
Important relevant info about your environment:     
  Using the HTPC as a MC Server & a Workstation as a MC Client plus some DLNA clients.
  Running JRiver for Android, JRemote2, Gizmo, & MO 4Media on a Sony Xperia XZ Premium Android 9.
  Playing video out to a Sony 65" TV connected via HDMI, playing digital audio out via motherboard sound card, PCIe TV tuner

andrewberg

  • Galactic Citizen
  • ****
  • Posts: 418
Re: Find files with similar, not equal filenames
« Reply #9 on: October 13, 2018, 06:44:10 am »

Quote from RoderickGI: "A backup should just be a backup. You're doing it wrong."

So it is, maybe you missed my initial post? And an external tool won't be as useful for backing up files based on their ratings; that's why I prefer doing it in MC...

Quote from RoderickGI: "The sort of thing that people have spent their lives trying to build, as just a precursor to AI. Google has spent a few billion building something like that..."

I think it's not quite as difficult as that; finding similar file names is a feature many file managers offer (e.g. duplicate finders, at least for images)... That's why I believe some search expression / formula should exist to do the same in MC -- e.g. by looking for a given set of identical words, characters etc. Unfortunately, I don't know enough about the expression language to do this...
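
Outside of MC, the "similar but not equal" idea can be sketched in a few lines of Python with the standard library's `difflib`. This is only a rough illustration, not an MC expression: the function name `similar_names` and the threshold of 0.75 are arbitrary assumptions, and the list of names would have to come from a directory listing or an export of the library.

```python
from difflib import SequenceMatcher
from itertools import combinations
from pathlib import PurePath


def similar_names(filenames, threshold=0.75):
    """Return (name_a, name_b, score) for pairs whose stems look alike."""
    pairs = []
    for a, b in combinations(filenames, 2):
        # Compare without extension and ignoring case.
        stem_a = PurePath(a).stem.lower()
        stem_b = PurePath(b).stem.lower()
        score = SequenceMatcher(None, stem_a, stem_b).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


names = [
    "Movie Title (USA 2000), HD.mkv",
    "Movie Title (USA 2000).avi",
    "Another Film (UK 1995).mkv",
]
for a, b, score in similar_names(names):
    print(f"{score}: {a!r} ~ {b!r}")
```

Comparing every pair gets slow for large libraries, and a fixed threshold will produce both false matches (sequels, remakes) and misses, so the result is a shortlist to check by hand rather than a verdict.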

Quote from RoderickGI: "Unless the changes were very simple, as per your example, that would be quite a complex task in itself."

Differences are mostly simple, for example: I often back up larger, full HD, multiple language copies of movies while keeping smaller resolution copies on my 'active', always connected drives for watching... with the larger copy named in the pattern 'Movie Name (Country, Year), fullHD engl+germ+french.ext', and the smaller one named just 'Movie Name (Country, Year).ext'...
I only need a way to identify these files (both within the library) as two versions of the same movie, so I know it is backed up even when the drive is not connected.
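
For a naming scheme as regular as this one, a rough Python sketch (not an MC feature) could cut each file name down to its 'Movie Name (Country, Year)' part and group files that share it. The regular expression, `movie_key` and `group_versions` are illustrative assumptions; anything that does not fit the pattern is returned separately instead of being guessed at.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Assumed scheme: "Movie Name (Country, Year)", optionally followed by
# ", extra details", then the extension.
KEY_PATTERN = re.compile(r"^(?P<key>.+?\([^()]*\d{4}\))")


def movie_key(filename):
    """Return the 'Movie Name (Country, Year)' part, or None if it does not fit."""
    match = KEY_PATTERN.match(PurePath(filename).stem)
    return match.group("key").casefold() if match else None


def group_versions(filenames):
    """Group file names by movie key; also return the names that did not fit."""
    groups = defaultdict(list)
    unmatched = []
    for name in filenames:
        key = movie_key(name)
        if key is None:
            unmatched.append(name)
        else:
            groups[key].append(name)
    return dict(groups), unmatched


files = [
    "Movie Name (Germany, 1999), fullHD engl+germ+french.mkv",
    "Movie Name (Germany, 1999).avi",
    "holiday clip.mp4",
]
groups, unmatched = group_versions(files)
for key, versions in groups.items():
    print(key, "->", versions)
print("no pattern:", unmatched)
```

A group with only one entry would then point to a movie that has no second copy, while the unmatched list shows where the naming scheme was not followed.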


RoderickGI

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 8186
Re: Find files with similar, not equal filenames
« Reply #10 on: October 13, 2018, 07:04:32 pm »

Quote from andrewberg: "So it is, maybe you missed my initial post?"

Nope, I read it all.

My point is that you are making lots of hard work for yourself, and ongoing maintenance. A good backup solution, once set up, is pretty much set and forget, and it can back up all versions of your movies, whether they are included in the MC Library or not. If you make a mistake with a file and need to recover the original, there it is in the backup. Discover you made the mistake last week, there it is in the Incremental backup from last week.

Having to do maintenance both on your media files for MC, and then compare to what is already on a backup drive, is doubling your work. Simplify.

Quote from andrewberg: "Differences are mostly simple"

My emphasis on "mostly", and that is the issue. You could create rules that support your simple example, but then you have an even bigger job when the situation isn't as per your example, and you wouldn't be expecting the difference. Even if you had said the example was always the case, I would be sceptical. Structured names of anything fail at some stage, as situations change. Then you get caught out by exceptions.

Basically, hard drive space is cheap, and disk sizes are up to 12TB each now. However, life is short and spending more time than necessary for maintenance isn't a good use of it.


Anyway, given what you asked, I had a bit of a think about it. You would probably have to base any solution on the file name you structure, as the attributes of the two videos are likely to be different. I'm thinking resolution, audio format, compression type, etc. are all likely to change. There would be no obvious fingerprint for the files that matches.

So based just on name... sorting by name, the HD name includes the low definition version, or starts with it... but you aren't just looking for files that have a missing backup, you also want to identify files that have had changes done, or have been replaced...
Quote from andrewberg: "I would like to compare the contents of my library against a backup drive, and see what file names have been changed, or files replaced (e.g. after re-encoding to other formats)..."

What constitutes "changed"? That isn't reflected in the file name. A bookmark in a file will change its Date Modified. What is the rule for "changed"?

No, I don't know all of the parameters you want to consider, and even if it was only file name, I still can't think of a good solution.

andrewberg

  • Galactic Citizen
  • ****
  • Posts: 418
Re: Find files with similar, not equal filenames
« Reply #11 on: October 14, 2018, 08:33:31 am »

@RoderickGI -- Thanks, you are raising some very good points there... way beyond what I meant to suggest... ;-)

Still, as you rightly say, since any file property can change, only the unique movie title will remain (that is without my 'heuristic naming structure'... ;-) So file names still are the key -- "Nomen est Omen" as Latin has it... ;-), and for me it all comes down to identifying backup files by similarities to originals...

In terms of automated tools, I am rather untrusting, be they for backups or anything else (especially as I'm not backing up from scratch, just updating)... I have tried some folder synching tools before, but found them difficult to set up (often with unpredictable results), so that's not something I would like to pay for...

Anyway, I have now spent the better part of the weekend manually aligning my backups, luckily assisted by some of MC's advanced search filters, including some of my own (e.g. a new 'Backup state' field to enter values like 'identical, re-encoded, different, new source, all languages' and so on)... This helped a lot in narrowing down the list: first selecting duplicates, marking them as 'identical' and filtering them out, then selecting equal year & duration, and (if not duplicates) marking them as 'different' depending on source, and so forth... with each group filtered out once 'Backup state' has a value.
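
The same pass-by-pass narrowing can be sketched in Python, purely as an illustration of the workflow and not of anything MC does internally. The `Entry` class, the `classify` function and the choice of "same name and size" as the test for a duplicate are all assumptions made for this example; only the 'Backup state' values come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str
    size: int        # bytes
    year: int
    duration: int    # seconds
    backup_state: Optional[str] = None  # mirrors the custom 'Backup state' field


def classify(originals, backups):
    """Set backup_state on backup entries in passes; return the unresolved rest."""
    exact = {(o.name, o.size) for o in originals}
    loose = {(o.year, o.duration) for o in originals}
    for b in backups:
        if (b.name, b.size) in exact:
            b.backup_state = "identical"   # pass 1: plain duplicates
        elif (b.year, b.duration) in loose:
            b.backup_state = "different"   # pass 2: same year and duration
    # Whatever has no state yet still needs a manual look.
    return [b for b in backups if b.backup_state is None]


originals = [
    Entry("Film A (USA, 2000).avi", 700, 2000, 5400),
    Entry("Film B (UK, 1995).avi", 650, 1995, 6000),
]
backups = [
    Entry("Film A (USA, 2000).avi", 700, 2000, 5400),
    Entry("Film B (UK, 1995), fullHD engl+germ.mkv", 4000, 1995, 6000),
    Entry("Film C (France, 2010).mkv", 3000, 2010, 7000),
]
for entry in classify(originals, backups):
    print("check manually:", entry.name)
```

Matching on year and duration alone can pair up unrelated films, so the 'different' label here only marks a candidate that still deserves a quick look.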

Quite a time-consuming task, I must admit, but the only reliable choice for my purpose... If I were to begin backing up my media today, I would rather use MC's 'Rename, Move & Copy Files' function, possibly by setting up some folder creation rules...
