INTERACT FORUM

Author Topic: Backups and Tagging  (Read 2974 times)

ErikN

  • Junior Woodchuck
  • **
  • Posts: 68
Backups and Tagging
« on: September 07, 2015, 11:06:21 am »


I do periodic, incremental backups of our storage including all of our media. This was the first backup since using MC. I had expected the backup to be about 10GB since there were only a couple of new videos and, size-wise, everything else is inconsequential.

For our music, I did audio analysis and some other tag adjustments. Of course that meant that every audio file changed. Not ideal but, in the grand scheme still not too bad. My 10GB backup became 20GB.

The real problem was videos. Luckily MC didn't directly tag mp4 and m2ts/ps. However, I have a small percentage of wmv. Every one of these changed. This caused my 20GB incremental backup to swell to 200GB.

Is this a problem for anyone else or did I miss a setting somewhere? 
Would it be possible to add a setting like:
  - Use sidecars for all video files
or
  - Use sidecars for all files > 100 MB  (size adjustable by the user)
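
To make the request concrete, here is a rough sketch of the rule I have in mind. It is purely hypothetical: the function name, the extension list, and the defaults are my own illustration, not anything MC exposes today.

```python
from pathlib import Path

# Hypothetical list of extensions treated as video; adjust to taste.
VIDEO_EXTENSIONS = {".wmv", ".mp4", ".m2ts", ".mkv", ".ts", ".avi"}


def wants_sidecar(path, size_bytes, threshold_mb=100, all_video=False):
    """Return True if tags should go to a sidecar instead of into the file.

    all_video    -- option 1: every video file gets a sidecar
    threshold_mb -- option 2: any file larger than this gets a sidecar
    """
    if all_video and Path(path).suffix.lower() in VIDEO_EXTENSIONS:
        return True
    return size_bytes > threshold_mb * 1024 * 1024


# A small audio file keeps embedded tags; a 2 GB wmv would get a sidecar.
print(wants_sidecar("song.flac", 30 * 1024 * 1024))
print(wants_sidecar("film.wmv", 2 * 1024 ** 3))
```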


Logged

Arindelle

  • Citizen of the Universe
  • *****
  • Posts: 2772
Re: Backups and Tagging
« Reply #1 on: September 07, 2015, 01:11:59 pm »

Hi,

I take it you want to reduce the time spent on incremental backups? (I presume space is not the issue, as 200 GB is not a whole lot.) Are you backing up from or to a slow NAS, or via a slow network connection? I have 4 TB of audio without counting any videos, so I'm always backing up a lot more than that incrementally (going by what you say about analyzing all files adding 10 GB).

I'm not sure if this will help or that you even want to do this, but you can choose which tags you want to be written to disk. The rest will still be kept, but only in the library file. To do this, go to Options=>Library&Folders and uncheck the box "Save in file tags when possible". Make sure you are disciplined about keeping JRiver library backups and archiving multiple copies.

I'd try and find what is changing and causing these larger video files to be chosen for an incremental set ... I hardly ever have video files, once initially tagged, changed to trigger an additional back-up. Some playback stats you don't need, somebody adding a rating  - whatever ... if the main tags aren't modified, the files shouldn't be added to your "incremental" set right?
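
If it helps with the detective work, a small script along these lines can list what changed on disk since the last backup, biggest files first. This is only a sketch: it goes by file modification time, the root folder and the one-day window are placeholders, and your backup software may use a different trigger.

```python
import os
import time


def changed_since(root, cutoff_epoch):
    """Return (size_bytes, path) for files under root modified after cutoff_epoch,
    largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime > cutoff_epoch:
                hits.append((st.st_size, full))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    one_day_ago = time.time() - 24 * 3600
    # "." is a placeholder; point it at the media folder you back up.
    for size, path in changed_since(".", one_day_ago)[:20]:
        print(f"{size / 1e6:10.1f} MB  {path}")
```

Run it right after a tagging session and the large files at the top of the list are the ones that will land in the next incremental set.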

In theory you can avoid this entirely by unchecking all the fields as I indicated above, but the result would be that no major modifications would be written to the files ... I wouldn't want to do that. Anyway, maybe you created a custom field that changes all the time? Or maybe there are just one or two fields that are the culprits.

Might want to look at what trigger choices you have for your back-up software too. You might want to separate your other "storage" from your media as you don't need to do "full backups" in the cycle normally. A backup set of MC library files and all media kept current daily more or less + an archive of separate drives,  set aside in a safe place updated basically 2 to 3 times per month is all I do, personally.
 
As for sidecar files, I don't believe you can force a particular file format to use them.  I suppose you could convert the files to a format that does use them -- is this wise? Doubtful :)

PS- If you didn't know this, you can always write tags retained just in the JR library to be embedded in the file itself (see library tools) or vice-versa.
Logged

ErikN

  • Junior Woodchuck
  • **
  • Posts: 68
Re: Backups and Tagging
« Reply #2 on: September 08, 2015, 01:57:13 am »

The issue is the time it takes to perform a backup. My main library sits on a 12 disk hardware RAID. My backups push new/changed files over esata to an encrypted store + read back w/ hash check. The net-net is that a backup averages about 100MB/sec. What should have been a 2 min backup became a 30 min backup.
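
As a back-of-envelope check on those numbers (a sketch only, assuming a sustained 100 MB/s and decimal units, so 1 GB = 1000 MB):

```python
def backup_minutes(size_gb, throughput_mb_per_s=100):
    """Minutes needed to push size_gb at a sustained throughput (1 GB = 1000 MB)."""
    return size_gb * 1000 / throughput_mb_per_s / 60


for label, size_gb in [("expected", 10), ("after audio retag", 20), ("after wmv retag", 200)]:
    print(f"{label:>18}: {size_gb:4d} GB -> {backup_minutes(size_gb):5.1f} min")
```

That works out to a little under 2 minutes for 10 GB and a little over 33 minutes for 200 GB, which matches what I saw.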

The vast majority of my library is video. My fear is taking some innocuous action in MC (audio analysis of video files, hand tagging, etc.) triggering a multi-TB incremental backup. It seems better to keep tags adjacent to large files instead of inside them.

I guess I'm asking if I am violating some kind of 'best practices' with MC? Or, is it normal for a small tag adjustment to cause an n-GB file to change?  Are most people's tags effectively invariant?

ps. For the 'wmv' files that got caught up in the backup, it appears MC set a field called 'beats-per-minute' into the wmv itself.
Logged

Arindelle

  • Citizen of the Universe
  • *****
  • Posts: 2772
Re: Backups and Tagging
« Reply #3 on: September 08, 2015, 07:58:53 am »

Quote
I guess I'm asking if I am violating some kind of 'best practices' with MC? Or, is it normal for a small tag adjustment to cause an n-GB file to change?

No, it's normal if you have made changes to tags and the fields are marked to write to disk. I gave you a solution to this above. Just remember to back up and archive the JRiver Library backup-zip files!


Quote
Are most peoples' tags effectively invariant?

I'd say so ...

I mess with my audio files all the time and add information etc. But for the video files, I import 'em, tag them, back them up once and archive them, and that's it ... no playback stats are written to disk for me.

I'd say what is common is to import, verify the tagging and re-tag if needed, back it up, archive it. However, it is also normal to analyse your audio, rebuild thumbnails and so on as part of the import process. If you do it later, it's logical that it will add the info to the file container, right?

Quote
ps. For the 'wmv' files that got caught up in the backup, it appears MC set a field called 'beats-per-minute' into the wmv itself.
Normal, it's part of audio analysis. You probably hadn't run this on these files before.
Logged

mwillems

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 5177
  • "Linux Merit Badge" Recipient
Re: Backups and Tagging
« Reply #4 on: September 08, 2015, 08:39:03 am »

Quote
The vast majority of my library is video. My fear is taking some innocuous action in MC (audio analysis of video files, hand tagging, etc.) triggering a multi-TB incremental backup. It seems better to keep tags adjacent to large files instead of inside them.

For almost every kind of video file I've encountered MC already uses sidecars instead of writing to the file.  Certainly true for .mkv, .ts, .avi files etc. So what you're asking for is kind of already the situation for most video files (IME anyway).

I don't have any .wmv files, but I would expect any issue you're experiencing would be limited to files that can store tags inside them (which is pretty atypical for video files).  Audio is different, as most formats support embedded tagging.

You can control whether files get tags written to them as Arindelle mentioned if this is a dealbreaker for you, but be sure to back up your library regularly.

Quote
I guess I'm asking if I am violating some kind of 'best practices' with MC? Or, is it normal for a small tag adjustment to cause an n-GB file to change?

Any change in tags that gets written to a file will obviously change the date stamp. Whether that triggers a backup of the entire file or just the changed section really depends on your backup solution, not on MC.  

For example, I use an rsync-based backup solution,* and if I edit embedded tags rsync just sends the changed parts of the file, not the whole file. Modern de-duplicating backup systems send even less information than rsync. However, some backup systems send the whole file every time, which is obviously suboptimal (but it sounds like that's what your solution is doing).
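
To illustrate why a delta-based tool has so little to send, here is a deliberately simplified sketch. It compares fixed-size blocks by hash and assumes the edit happens in place without changing the file length. That is not rsync's actual algorithm (rsync uses rolling checksums so it can also cope with inserted or shifted data), and the 4096-byte block size is just a number I picked.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative only; real tools choose their own block size


def block_digests(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Indices of blocks that differ between old and new (same-length inputs assumed)."""
    old_d = block_digests(old, block_size)
    new_d = block_digests(new, block_size)
    return [i for i, (a, b) in enumerate(zip(old_d, new_d)) if a != b]


original = bytes(1_000_000)              # stand-in for a large media file
edited = bytearray(original)
edited[10:20] = b"BPM=120\x00\x00\x00"   # overwrite 10 bytes near the start, like a small tag edit
diff = changed_blocks(original, bytes(edited))
print(f"{len(diff)} of {len(block_digests(original))} blocks changed")
```

Only the block containing the edited bytes differs, so only that block needs to travel. A whole-file backup tool would resend the full megabyte (or, in your case, the full multi-GB video).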

So you're not violating MC best practices, but you might want to look into a more granular backup solution if your solution sends the whole file anytime something changes (which is kind of surprising behavior with modern backup software TBH given that rsync is FOSS and solved this problem 20 years ago).

*NB: On the off chance you're already using an rsync-based solution, it may be defaulting to whole file transmission if both locations are perceived as "local," in which case you need to make sure to pass the --no-whole-file flag to force it to only do a delta backup (this will mostly help speed things up if bandwidth is your bottleneck, rather than disk i/o, but will transmit less data for sure).
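
If you want to script it, something like the following builds the command. Treat it as a sketch: the two paths are made-up placeholders, and it only prints the command rather than running it, so you can check it first. `-a`, `--stats` and `--dry-run` are standard rsync options alongside `--no-whole-file`.

```python
import shlex


def rsync_delta_command(source, dest, dry_run=True):
    """Build an rsync command that forces delta transfer even between local paths."""
    cmd = ["rsync", "-a", "--no-whole-file", "--stats"]
    if dry_run:
        cmd.append("--dry-run")  # report what would be done without copying anything
    cmd += [source, dest]
    return cmd


# Placeholder paths; substitute your own source and backup locations.
print(shlex.join(rsync_delta_command("/mnt/raid/media/", "/mnt/backup/media/")))
```

Pass `dry_run=False` once the printed command looks right, and hand the list to `subprocess.run` if you want Python to execute it.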

Quote
Are most peoples' tags effectively invariant?

After initial setup and analysis, my tags rarely change, but when there are significant tag changes to files with embedded tags (i.e. mostly audio files), I expect to see a somewhat larger backup load. Again, the load will depend on your backup solution. My load increases based on the number of files changed, but definitely does not require re-transmitting the full files (or anything close to it).

Quote
ps. For the 'wmv' files that got caught up in the backup, it appears MC set a field called 'beats-per-minute' into the wmv itself.

That's an audio analysis field, and shouldn't ever be "reset" once it's set, so I wouldn't expect this specific issue to recur.
Logged

ErikN

  • Junior Woodchuck
  • **
  • Posts: 68
Re: Backups and Tagging
« Reply #5 on: September 09, 2015, 01:11:28 pm »


Thank you for the patience and detailed responses. It gives me a pretty good idea of what to expect.

One additional question in case someone happens to know. As noted in the responses I can see that ts, mp4, avi are not internally tagged by MC. However, for ts and mp4 there is nothing that prevents internal tagging -- I could store a pdf, poetry, and a picture of my grandmother in either. It just might be in a non-standard program or box respectively. You could even put it in user data or a custom nalu of h.264 elementary streams if motivated.

So the question. Was the choice not to internally tag ts/mp4/etc. because of limited usefulness/portability outside of MC or because these files are usually big? Or, put differently, is the decision likely to change?
Logged

mwillems

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 5177
  • "Linux Merit Badge" Recipient
Re: Backups and Tagging
« Reply #6 on: September 09, 2015, 01:30:18 pm »

Quote
Thank you for the patience and detailed responses. It gives me a pretty good idea of what to expect.

One additional question in case someone happens to know. As noted in the responses I can see that ts, mp4, avi are not internally tagged by MC. However, for ts and mp4 there is nothing that prevents internal tagging -- I could store a pdf, poetry, and a picture of my grandmother in either. It just might be in a non-standard program or box respectively. You could even put it in user data or a custom nalu of h.264 elementary streams if motivated.

So the question. Was the choice not to internally tag ts/mp4/etc. because of limited usefulness/portability outside of MC or because these files are usually big? Or, put differently, is the decision likely to change?

I can't speak for mp4 (I mentioned .mkv above, not .mp4), and I don't know for certain what the rationale is for not tagging .ts (or .mkv for that matter).  I seem to recall some discussions about sidecars for .ts pointing to a desire to maintain compatibility with various hardware player boxes (i.e. improved portability/compatibility).

Yours is the first post I've seen expressing concern about internal tagging based on file size/backup consequences, so I'm not sure that's on the devs' radar (but it may be eventually as the Id project moves along).  Maybe Hendrik or Matt can chime in with the rationale for writing tags to some video formats (like .wmv) but not all?
Logged

glynor

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 19608
Re: Backups and Tagging
« Reply #7 on: September 09, 2015, 05:16:44 pm »

Quote
Maybe Hendrik or Matt can chime in with the rationale for writing tags to some video formats (like .wmv) but not all?

I can't speak to why, specifically, they chose to write tags to wmv.  However, I can speak a bit to why they don't write tags to many video formats that do support a tagging architecture (like MKV, etc).  It really comes down to this: Tagging standards. For audio file formats, there are existing, well supported standards for the tagging formats used. By this I do not mean how you technically embed the tags within the files. As you pointed out, many video containers are technically capable of having all sorts of tags and other bits of data stuffed into the container. Instead, I mean standards around how these tags are written, read, and interpreted by other applications.

Essentially, the equivalent of ID3 for video.  There is no generally agreed upon standard that many applications use when deciding precisely how to store [Series] or [Description] or [MPAA Rating] within a file, or even what tags should exist and what they should be named.

This is what causes the issue mwillems mentioned about fragile support in other players (particularly hardware players).  Many of them puke and die when they encounter something their simple playback code wasn't built to encounter, and they aren't built to encounter it because there is no good standard.

And, so, if you did embed them in the files, not only would you risk breaking playback on such-and-such dumbly coded player, but only MC would be able to use them. And tags embedded in the files that only MC can use are of very limited utility (since MC has a database, after all, and doesn't really even use the embedded tags except at import time).  Tags are all about interchange of metadata between applications, and without an agreed upon standard, this can't effectively be done.

You could go the Apple route and just make up your own standard, but then that doesn't support interchange with anything else. The sidecars, on the other hand, are simple XML and can be parsed by simple scripts (or even by a human in a text editor).
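
As a rough illustration of how little code that takes, here is a sketch that reads fields out of a sidecar-style XML document. The sample document is my own stand-in: I'm assuming a layout of `Field` elements with a `Name` attribute, and the element names and values may not match exactly what MC writes, so check one of your own sidecars first.

```python
import xml.etree.ElementTree as ET

# Assumed, illustrative layout; not copied from a real MC sidecar.
SAMPLE_SIDECAR = """<MPL Version="2.0" Title="JRSidecar">
  <Item>
    <Field Name="Name">Example Episode</Field>
    <Field Name="Series">Example Series</Field>
    <Field Name="BPM">120</Field>
  </Item>
</MPL>
"""


def read_sidecar_fields(xml_text):
    """Return {field name: value} for every Field element in the document."""
    root = ET.fromstring(xml_text)
    return {f.get("Name"): (f.text or "") for f in root.iter("Field")}


fields = read_sidecar_fields(SAMPLE_SIDECAR)
for name, value in fields.items():
    print(f"{name}: {value}")
```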

So, in general, the sidecars are more effective due to our current lack of standards around video tagging methodology.
Logged
"Some cultures are defined by their relationship to cheese."

Visit me on the Interweb Thingie: http://glynor.com/

Hendrik

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 10721
Re: Backups and Tagging
« Reply #8 on: September 10, 2015, 02:09:52 am »

I think the only reason we write to wmv is because we use a Microsoft reader for WMV which has a tagging interface.
Many other video containers all have their own way to store tags, and it would be a format-specific implementation for each and every one of them. Not only that, some video formats don't really allow adding tags after the fact, so you would somehow have to re-write the entire file to make space for tags in the right place, and validating that we don't break or change the remaining content of the files would be an insane effort.

Then of course a whole bunch of video formats don't even allow tags at all, or only an extremely limited subset of pre-defined tags.

So in short, video tag support is inconsistent at best, potentially days/weeks of development effort, and generally not worth it.
Logged
~ nevcairiel
~ Author of LAV Filters

ErikN

  • Junior Woodchuck
  • **
  • Posts: 68
Re: Backups and Tagging
« Reply #9 on: September 10, 2015, 07:12:50 pm »


Great responses.

In case there was any confusion, I wasn't asking for extending embedded tags to more video formats  :)  My hope was the opposite, that nothing changes -- except, apparently, not internally tagging wmv. I don't have many wmv and quality isn't a concern for the few I do have so I may just convert them all to mp4.

Logged

blgentry

  • Regular Member
  • Citizen of the Universe
  • *****
  • Posts: 8009
Re: Backups and Tagging
« Reply #10 on: September 10, 2015, 08:02:00 pm »

After reading what Glynor and Hendrik have to say, I'm inclined to say that MC should not try to tag ANY video files.  Even WMVs.  I know it works, but based on this thread, and the general philosophy of "video tagging is non-standard", it makes sense to just not do it on any format.  It would solve the OP's problem and make things more consistent.

IMHO.

Brian.
Logged

flight16

  • Junior Woodchuck
  • **
  • Posts: 50
Re: Backups and Tagging
« Reply #11 on: September 10, 2015, 09:00:11 pm »

I agree that MC shouldn't write any tags to video.  That would be the most consistent and make for the least number of surprises.  Then I just know "MC never writes tags to video" and not have to wonder "I'm tagging this video, what formats does MC tag?  Ok, now what format is this video?  Will this video be tagged...?"
Logged