INTERACT FORUM

Author Topic: Managing Corrupt Files  (Read 2596 times)

glynor

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 19608
Managing Corrupt Files
« on: March 15, 2014, 01:13:52 am »

I've been mostly keeping silent with big new feature requests, for the obvious reasons, but I think this needs to be looked at sooner rather than later.

MC has trouble when you try to Import crazy broken files (or crazy broken permissions on those files), or run Auto Import on directories that contain these kinds of files.  I've seen this.  We've all seen reports of this over and over on the forums (or things that look like this anyway).  I don't think I need to defend this assertion very much.

First off, it would be really nice if MC could somehow handle these error conditions better.  I know, I know... How crazy can it be expected to be?  I don't know, can't the import be more thoroughly isolated from the main process so that if it dies, MC can detect the death and recover?

In any case, that is almost certainly more of a long-term solution, unless one of you sees something brilliantly simple that was previously missed.  I assume not.

But, in the meantime, could we get some kind of tool or report to find and destroy these bad files?  When I've seen it personally, they were almost always zero-byte files (or files wildly out of proportion to their expected sizes).  They're obviously broken, once you find them in the morass.  But how do you find them?  What if you don't see it in the logs?  What if you didn't just import a handful of new files, so you can't quickly narrow it down?  What if it is one of 3000, or 30000, or 100000?  Often, I find that MC dies before it successfully logs the actual bad file, and you're left trying to trace what it was doing through multiple threads, and then looking at things "nearby" in the filesystem (which has mixed results).
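As a rough illustration of the "zero-byte or suspiciously small" check, here is a minimal Python sketch that runs outside MC entirely.  The function name `find_suspect_files` and the `min_bytes` threshold are made up for this example; a real tool would want per-format expectations rather than one cutoff.

```python
import os

def find_suspect_files(root, min_bytes=1024):
    """Walk `root` and return (path, reason) pairs for files that look broken."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError as exc:
                # Broken permissions or a vanished file end up here.
                suspects.append((path, f"unreadable: {exc.strerror}"))
                continue
            if size == 0:
                suspects.append((path, "zero bytes"))
            elif size < min_bytes:
                # Assumed threshold; far too small to be a real media file.
                suspects.append((path, f"only {size} bytes"))
    return suspects
```

Pointing it at the folder that Auto Import watches gives a short list to eyeball before MC ever touches the files.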

How can we solve this?

Better import logging would be a start.  Maybe as it grabs new blocks of files, it can log the directory it is looking at first, or something to narrow it down?  Of course, always logging the filename (and ensuring this info gets committed to disk, even in the event of a crash in the very next code block) before any "dangerous" operation would be better.  I'm sure you try to do this, but I'm also quite sure the strategy doesn't always work.
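To make the "committed to disk before the dangerous operation" idea concrete, here is a small Python sketch.  It is not how MC logs anything; `import_with_trace` and the `import_one` callback are hypothetical names.  The point is only the ordering: write the line, flush it, `fsync` it, and only then touch the file.

```python
import os

def log_durably(log_file, message):
    """Write one line and force it to disk before returning."""
    log_file.write(message + "\n")
    log_file.flush()               # push Python's buffer to the OS
    os.fsync(log_file.fileno())    # ask the OS to commit it to the disk

def import_with_trace(paths, import_one, log_path):
    """Log each new directory once, and each file name before it is touched."""
    last_dir = None
    with open(log_path, "a", encoding="utf-8") as log_file:
        for path in paths:
            directory = os.path.dirname(path)
            if directory != last_dir:
                log_durably(log_file, f"DIR  {directory}")
                last_dir = directory
            log_durably(log_file, f"FILE {path}")
            import_one(path)       # the "dangerous" step (hypothetical callback)
```

If the process dies inside `import_one`, the last `FILE` line in the log names the file that was being handled.  The cost is one `fsync` per file, which is slow on large libraries, so this is probably something to switch on only while hunting a crash.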

But even that only helps if you only have one, or two, or a group, of corrupt files.  The nightmare scenario is that you are trying to recover data from some partially failed volume, which might be strewn with bad files.  Three hundred needles in a 60,000 file library haystack, distributed randomly.  What do you do?

The advice I've often seen, and even followed once or twice in times long since past, is to import methodically, and narrow down the bad files.  But this only works well if you have mostly a "known good" library, and you can narrow down the possible offenders to a relatively small group of "possible new offenders".  I can't really recommend it with a straight face to a complete newcomer who just wants the thing to work, and who finds it keeps crashing when trying to do the first thing you do with the program: import the files.  VLC and mpc-hc might crash when you throw the same kinds of issues at them, but they don't prevent you from effectively using them to view other content.  MC does, so I feel like MC needs to provide a solution.

Can this be automated somehow?  Could MC have some kind of "Hunt for the Bad Files" import mode?  One where it isn't actually trying to do full imports on the files, but is doing more of a "passes the smell test" look at them, without actually importing them?  Maybe you could build a small, stripped-down version of the import logic, thrown into another process entirely (and not trying to actually import usable data, just pass/fail the files), and then show them in a non-imported Drives & Devices view, with possible bad ones flagged, sorted, and color coded or something?
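Here is a sketch of what "another process entirely" could look like, in Python.  Everything in it is assumed for illustration: `smell_test` is a made-up name, and the `PROBE` script is a trivial stand-in (it only checks that the file can be opened and is not empty) for whatever stripped-down import logic would really be used.

```python
import subprocess
import sys

# Stand-in for the stripped-down import logic. It only reads the start of the
# file and exits non-zero if nothing could be read. A real probe would decode.
PROBE = """
import sys
with open(sys.argv[1], "rb") as fh:
    data = fh.read(1 << 20)
sys.exit(0 if data else 1)
"""

def smell_test(path, timeout_s=10):
    """Return 'pass', 'fail', or 'hung' without risking the calling process."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", PROBE, path],
            capture_output=True,
            timeout=timeout_s,      # the child is killed if it exceeds this
        )
    except subprocess.TimeoutExpired:
        return "hung"
    return "pass" if result.returncode == 0 else "fail"
```

A crash or hang in the probe only takes down the child process, so the caller can flag the file and move on to the next one.  Spawning a process per file is slow, so a real version would more likely batch files per worker and restart the worker when it dies.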

I'm throwing out ideas... I guess I'm saying that I think it is a real problem, and one that has been ongoing.  Can we come up with some mechanism to help people with this unpleasant (and unfortunately a bit more common than we'd like) reality?
Logged
"Some cultures are defined by their relationship to cheese."

Visit me on the Interweb Thingie: http://glynor.com/

Hendrik

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 10787
Re: Find and Murder Corrupt Files
« Reply #1 on: March 15, 2014, 05:36:09 am »

Trying to detect something that is stuck and shutting it down is a bad solution; it's generally problematic and error-prone in itself.
Instead, the input plugins should just fail gracefully on broken files, and put them in the "bad" database. For this to be possible, we'll need to get our hands on a bunch of broken files which don't fail gracefully right now though.

I suppose we could look into trying to improve logging to help identify these, but as you can imagine, even trying to identify the logging shortcomings might need a way to reproduce such a shortcoming first.

I recently fixed a case where broken FLAC files could cause the importer to get stuck.
Logged
~ nevcairiel
~ Author of LAV Filters

MrC

  • Citizen of the Universe
  • *****
  • Posts: 10462
  • Your life is short. Give me your money.
Re: Find and Murder Corrupt Files
« Reply #2 on: March 15, 2014, 03:53:00 pm »

I'd like to suggest that the Bad database is an OK solution, but there is no UI to it, so it is really only a partial technical solution (like a junk drawer).

Lacking a logging or reporting system, it would be better if Bad files were also placed in a Failed-To-Import playlist, and each file had an associated failure cause status message in some new field (which is cleared on successful import).  This playlist could auto-vivify just like the Recently Imported playlist, and users can examine the files at their leisure, and see the failure cause (the failure cause field column should be forced into the playlist's view).
Logged
The opinions I express represent my own folly.

AndrewFG

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 3392
Re: Find and Murder Corrupt Files
« Reply #3 on: March 17, 2014, 07:00:03 am »

I suppose you guys might not be on speaking terms with Spoon over at dbPowerAmp (??), but if you were, then perhaps you could run the imports past the AccurateRip database (http://www.accuraterip.com/) to see if they are good or not (only applies to audio though).
Logged
Author of Whitebear Digital Media Renderer Analyser - http://www.whitebear.ch/dmra.htm
Author of Whitebear - http://www.whitebear.ch/mediaserver.htm

Hendrik

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 10787
Re: Find and Murder Corrupt Files
« Reply #4 on: March 17, 2014, 07:11:24 am »

Quote from: MrC
I'd like to suggest that the Bad database is an OK solution, but there is no UI to it, so it is really only a partial technical solution (like a junk drawer).

Could always introduce a default smart list for bad files, sorted by import date, if that would alleviate your concerns.
Putting failure causes into some database field seems like an awkward solution, especially because many error checks don't record an actual error, as many input plugins use external decoding libraries, which just don't offer fine-grained information.

More important is to actually detect bad files instead of breaking down on them, and then ideally putting them somewhere where the user can find them, and they won't cause any trouble anymore.
Logging the exact problem with the file is rather technical and, as a first step, not something that's viable or important, imho.

Another question is, what to do with half-broken files? If a file has a few decoding glitches in the middle, should it go into the bad database? If the file only decodes half and then fails, should it go in the bad database?
Obviously it should never cause MC to break down, but how aggressive should it be?
Logged
~ nevcairiel
~ Author of LAV Filters

InflatableMouse

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 3978
Re: Find and Murder Corrupt Files
« Reply #5 on: March 17, 2014, 07:14:57 am »

I wonder if it would be possible for MC to generate a hash of each file during analysis, store it in some field in its DB and be able to recheck the file against the hash it made previously.

Obviously this would require a lot of cpu each time a tag changes, but what if you'd be able to generate the hash over the data (music bits) only? Then it wouldn't matter whether tags change or not.

Then if file corruption occurs, MC could detect this, log it and warn the user.

I suppose this would also help with files that import fine and play fine, but then spike or play noise.  If the file gets corrupted, MC would know.  Unless the file was already corrupt, but there's always something, isn't there ;).
Logged

Hendrik

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 10787
Re: Find and Murder Corrupt Files
« Reply #6 on: March 17, 2014, 07:20:23 am »

I don't think MC should replace a file system feature for you (checksumming).
Not to mention that it's hard/slow to check, as you need to run the whole file through the process first. And you cannot take "only the music" either: if the file is corrupted, you might not be able to read the music out anymore, and you would want to detect that before the parser has to deal with your broken file.

It also doesn't help the overall problem of corrupted files breaking auto-import or something, since it wouldn't have any info there yet. Let's focus on that for now. ;)
Logged
~ nevcairiel
~ Author of LAV Filters

JimH

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 71666
  • Where did I put my teeth?
Re: Managing Corrupt Files
« Reply #7 on: March 17, 2014, 07:34:35 am »

Quote from: AndrewFG
I suppose you guys might not be on speaking terms with Spoon over at dbPowerAmp (??), but if you were, then perhaps you could run the imports past the AccurateRip database (http://www.accuraterip.com/) to see if they are good or not (only applies to audio though).

This problem seemed to grow a month or two ago.  My suspicion is that some ripper started creating files that cause problems with one of the decoders.

We don't have anything against Spoon or his software.  He does a very professional job.  I don't see the point of the AccurateRip database though.  I don't think we're seeing a problem with bad rips from MC.
Logged

glynor

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 19608
Re: Find and Murder Corrupt Files
« Reply #8 on: March 17, 2014, 08:28:50 am »

Quote from: Hendrik
It also doesn't help the overall problem of corrupted files breaking auto-import or something, since it wouldn't have any info there yet. Let's focus on that for now. ;)

I agree.

That is, in a perfect world, the job of the filesystem.  Unfortunately, most filesystems in actual use on systems where people use MC are horribly brain-dead in this regard.  But still, I don't think it would help here; it is "out of scope" for MC, and it suffers from all the other issues that Hendrik mentioned.

Quote from: MrC
I'd like to suggest that the Bad database is an OK solution, but there is no UI to it, so it is really only a partial technical solution (like a junk drawer).

Quote from: Hendrik
Could always introduce a default smart list for bad files, sorted by import date, if that would alleviate your concerns.

While I don't think this is necessarily a bad idea, it wouldn't really help with what I'm referring to in the original post.  When the current "bad files" system works, it works pretty well.  I agree, the main problem with it is that it is hidden by default and it takes some "power usery" maneuvers to locate it.

The issues I'm referring to are those where the files never make it into the bad files database, because MC crashes out completely when it "touches" the files.

But, perhaps this can be something of a solution, with tweaking...

What if, every time MC imported any file, before it does anything to the file itself, it adds it to the bad files database first (or, if you prefer, some kind of "importing in progress" database or something)?  Then, only once the import task is complete, it moves the entry to the regular database (or deletes it from the bad files one, anyway).  That way, the "fallback position" if MC does completely crash would be with the files "orphaned" in the bad files database (or, again, a special purpose "importing" one if you prefer) rather than just not listed anywhere.  There's "no way in" without first passing through a place where it can be captured by MC's databases, and therefore searched upon and found from within MC.

The argument for making a new special "importing" database is that MC could crash during import for reasons not related to the specific files that get orphaned in the "queue".  In other words, if MC just happens to crash during an import, this could leave files in the bad database that aren't actually bad.  Likewise, I imagine there are a ton of threads all working at once on this process, so there might be several files "in flight" at any one time, and a crash caused by one of these might leave the others in the bad database too.  And then they'd be ignored on future auto-import runs, which might be bad.  I don't know (obviously) the code well enough to speak intelligently on what might be the best solution here.

But... I think something along those lines could work.  Just some kind of queue that if MC crashes while it was importing files, you can see easily "what was in flight" and check those files for sanity.
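A minimal sketch of that "in flight" queue, using Python and SQLite purely for illustration.  Nothing here reflects MC's actual database; the `in_flight` table, `import_file`, and the `do_import` callback are all hypothetical.

```python
import sqlite3

def open_journal(db_path):
    """Open (or create) the journal that tracks files currently being imported."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS in_flight (path TEXT PRIMARY KEY)")
    conn.commit()
    return conn

def import_file(conn, path, do_import):
    # Record the file before touching it, and commit so the row survives a crash.
    conn.execute("INSERT OR IGNORE INTO in_flight (path) VALUES (?)", (path,))
    conn.commit()
    do_import(path)    # hypothetical callback standing in for the real import
    # Only reached if the import returned normally.
    conn.execute("DELETE FROM in_flight WHERE path = ?", (path,))
    conn.commit()

def suspects_after_crash(conn):
    """Files that were being imported when the previous run ended."""
    rows = conn.execute("SELECT path FROM in_flight ORDER BY path")
    return [row[0] for row in rows]
```

On the next start, anything still listed by `suspects_after_crash` was in flight when the previous run died.  Because several files may be in flight at once, those rows are suspects rather than confirmed bad files, so retrying them one at a time before moving any to the bad list seems safer than ignoring them outright.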

Quote from: Hendrik
Another question is, what to do with half-broken files? If a file has a few decoding glitches in the middle, should it go into the bad database? If the file only decodes half and then fails, should it go in the bad database?
Obviously it should never cause MC to break down, but how aggressive should it be?

I'd say not aggressive at all.  You can't begin to guess what people might consider to be a "good file" versus a bad one.  Maybe the MP3 or video does have some messed up spots in the middle, but it is the only copy you have of the video from your daughter's second birthday party (or the last voicemail from your mom before she passed away)... So, even though it is a little borked up, it is still "very good" to you.

Only if they're so totally broken they can't be rendered at all should MC refuse to deal with them and shunt them off to the bad list.
Logged
"Some cultures are defined by their relationship to cheese."

Visit me on the Interweb Thingie: http://glynor.com/

JimH

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 71666
  • Where did I put my teeth?
Re: Managing Corrupt Files
« Reply #9 on: March 17, 2014, 10:48:45 am »

Logged

JimH

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 71666
  • Where did I put my teeth?
Re: Managing Corrupt Files
« Reply #10 on: March 17, 2014, 10:50:25 am »

If anyone wants to discuss off topic ideas, please use a different thread.
Logged

KingSparta

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 20054
Re: Managing Corrupt Files
« Reply #11 on: March 18, 2014, 05:06:27 pm »

You also have to understand that when the file changes, as when the ID3 tag in an MP3 changes, the checksum would also change, even though there would be nothing wrong with the encode.

To do this, all tags would need to be removed from the media file, and then the checksum could be computed on the actual media data.

This would also take some time to do.

I actually made a plugin once to do this on MP3s as an MJ plugin; it stored the MD5 in a user-defined tag, which was then saved into MJ's database.
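For anyone curious what "checksum without the tags" looks like, here is a rough Python sketch.  It is not the plugin mentioned above, and `audio_md5` is a made-up name.  It only skips a leading ID3v2 tag and a trailing ID3v1 tag; APE tags, Lyrics3 blocks, and other formats are not handled.

```python
import hashlib

def audio_md5(path):
    """MD5 of an MP3's bytes with a leading ID3v2 and trailing ID3v1 tag skipped."""
    with open(path, "rb") as fh:
        data = fh.read()   # whole file in memory; fine for a sketch

    start = 0
    if len(data) >= 10 and data[:3] == b"ID3":
        # The ID3v2 size is a 28-bit "syncsafe" integer: 7 useful bits per byte.
        size = ((data[6] & 0x7F) << 21) | ((data[7] & 0x7F) << 14) \
             | ((data[8] & 0x7F) << 7) | (data[9] & 0x7F)
        start = 10 + size              # 10-byte header plus the tag body
        if data[5] & 0x10:             # footer-present flag adds 10 more bytes
            start += 10

    end = len(data)
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128                     # ID3v1 is a fixed 128-byte trailer

    return hashlib.md5(data[start:end]).hexdigest()
```

With this, retagging a file leaves the hash unchanged, while any change to the audio bytes alters it.  It still has to read every byte of every file, which is the "takes some time" part.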
Logged
Retired Military, Airborne, Air Assault, And Flight Wings.
Model Trains, Internet, Ham Radio
https://MyAAGrapevines.com
https://centercitybbs.com
Fayetteville, NC, USA