Topic: Wish for an Integrated Automatic Meta Data System

rick.ca · « **on:** September 06, 2011, 04:30:25 pm »

Quote from: Bryanhoop on September 06, 2011, 03:15:06 pm

Now we just need an automatic scrape on import (fingers crossed).

We actually need considerably more than that. Information for recently released titles will be quite different in a month or so. Changes become less frequent and significant over time, but there's always something changing. We need the ability to batch update files as needed or on a scheduled basis.

We also need the ability to combine data from different sources in whatever manner we prefer. That means the ability to specify the order in which sources are queried, what fields available data are written to, and whether existing data should be overwritten. The latter is essential not only for combining sources, but also for retaining user modifications to data. Such a system should also allow incoming data to be modified by expressions. This would allow the data to be modified, transformed and/or reformatted in any manner desired.
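To make the idea concrete, here is a minimal sketch in Python of what such a merge could look like. It is only an illustration under stated assumptions: the `FieldRule` class, the `merge_metadata` function, the source names and the sample values are all hypothetical and are not part of MC or any existing plugin. The `transform` callable stands in for an MC expression.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class FieldRule:
    """Hypothetical rule: copy one source field into one library field."""
    source_field: str                                   # key in the source's result
    target_field: str                                   # library field to write to
    overwrite: bool = False                             # replace an existing value?
    transform: Optional[Callable[[str], str]] = None    # stand-in for an expression


def merge_metadata(
    record: Dict[str, str],
    sources: List[Tuple[str, Dict[str, str]]],
    rules: Dict[str, List[FieldRule]],
) -> Dict[str, str]:
    """Apply sources in list order; existing values survive unless a rule says overwrite."""
    merged = dict(record)
    for source_name, data in sources:            # list order is the query order
        for rule in rules.get(source_name, []):
            value = data.get(rule.source_field)
            if value in (None, ""):
                continue                         # source has nothing for this field
            if merged.get(rule.target_field) and not rule.overwrite:
                continue                         # keep existing or user-edited data
            if rule.transform:
                value = rule.transform(value)
            merged[rule.target_field] = value
    return merged


# Made-up sample data, purely for illustration.
record = {"Name": "Example Movie", "Description": "My own notes"}
sources = [
    ("source_a", {"overview": "Plot from source A.", "release": "2005-12-14"}),
    ("source_b", {"summary": "Plot from source B.", "director": "Some Director"}),
]
rules = {
    "source_a": [
        FieldRule("overview", "Description"),
        FieldRule("release", "Year", transform=lambda v: v[:4]),
    ],
    "source_b": [
        FieldRule("summary", "Description"),
        FieldRule("director", "Director"),
    ],
}
result = merge_metadata(record, sources, rules)
print(result)
```

With these rules the user's own [Description] is left alone because neither rule sets `overwrite`, [Year] is filled from the first source after being trimmed to four characters, and [Director] comes from the second source.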

There are also limits to the sources JRiver can provide, for financial and copyright reasons. Hopefully, whatever system is created will be open enough to allow for third-party plugins that get data from different sources. Ideally, an integrated script engine could be provided (at some later date) that would allow users to create and modify scripts for their own purposes.

For anything to happen automatically on import, MC must be able to reliably extract information from filenames so it's able to find the correct titles at the online sources. This entails pretty much the same thing as "Auto-tagging," so that often-requested feature might as well be considered a prerequisite.

So there are a number of important pieces to the puzzle, and they must be put together in an integrated system to produce the result most of us are hoping for. That's a MC that automatically recognizes, imports, tags and obtains meta data for any video files added to the file system. Developing such a system will be a huge undertaking—probably similar to RO in scope. But the payoff for JRiver is huge—the lack of this functionality seems to be the one remaining objection of many clinging to otherwise inferior media managers.

struct · « **Reply #1 on:** September 06, 2011, 05:36:21 pm »

+1 for Rick's suggestions.

It is the great elephant in the room with JRiver. All the tools in the world to sort, query, find, associate, etc. huge amounts of data. Very few tools to help you get that information into MC. This goes for music, movie and TV. Here's hoping for MC17.

I know my father-in-law went in another direction just because YADB had a 20% success rate on CDs (says something about his music :)), and my brother is going elsewhere because data scraping of movies didn't exist and he didn't want the hassle of a third party. I don't agree with their choices, but for them getting the information was more important than smartlist expressions. Seems a pity.

Craig

fitbrit · « **Reply #2 on:** September 06, 2011, 07:22:54 pm »

Also agreed here.

MC has seen lots of new interest. As Rick and struct have said/alluded to, this one missing feature set is the difference between the great MC we have now and the ultimate, show-stopping MC that would be the de facto standard IMO.

We're close now - closer than ever before. A first step that I'd welcome would be to be able to scrape the metadata from the existing sources for multiple file selections. Then we'd just have to alter the name of the files to match the official entries in the databases (if needed) and we'd get the data. This would be far better than having to right-click for every single file and re-enter the video look-up environment repeatedly.

Exciting times!

HTPC4ME · « **Reply #3 on:** September 06, 2011, 08:58:35 pm »

One thing I think would be cool with the movie, TV and music auto-tagging you're discussing: once a person has their whole library tagged, JRiver Media Center could scour its own library and notice when a new file becomes available for purchase or release via IMDB, AMAZON, TVDB etc...
In JRiver we could have a smartlist set up of all new releases available to buy. Example: I have all the Tesla CDs. If Tesla ever makes a new CD, JRiver would then notify me: hey, Amazon, IMDB or TVDB has just listed that a new CD is available by that artist, or a new DVD for a series (say a new Nightmare on Elm Street part 76), or that a new Entourage season has started.

I'm not sure how this would be done, or if it could be done, but it would be cool.

I guess it would be along the same lines as our distributors... When a new shirt or sweatshirt item is available, it just automatically updates on the generic webpages they give us.

A neat addon. Thoughts?

rick.ca · « **Reply #4 on:** September 06, 2011, 09:50:30 pm »

Quote from: struct on September 06, 2011, 05:36:21 pm

Very few tools to help you get that information into MC. This goes for music, movie and tv.

You're absolutely right. I shouldn't have implied the issue applies to video only. It's just that most users seem to take for granted whatever solution they've settled on for audio, and believe the issue is restricted to video. The system I'm advocating has no such restriction. As described here, I use my own methods to import album information from AllMovies. I'd be lost without that, so I do understand the value of this applying to all media.

Quote

All the tools in the world to sort, query, find, associate, etc huge amounts of data.

There's a subtle but very important general principle behind such a system. Our inclinations towards configuring such things vary widely, as do our preferences for the nature and amount of meta data required. But, in the end, we all want the same thing: for the media to be imported with all the information we want without any hassle. The awesome power and flexibility of MC should be brought to bear in the configuration of such a system. If the meta data is not exactly what I want, I'd much rather modify the configuration to fix it automatically as it's imported than apply the same tool to each and every file after the fact.

This is not about catering to "power users." Some of us like to configure and tweak things to our hearts' content, while others are frustrated by the need to configure anything. But the chances of a default configuration meeting the latter user's needs are much enhanced by the power and flexibility of such a system. It avoids the frustration of things not working because the user's circumstances don't comply with some arbitrary assumption used in an effort to make things "simpler." While I enjoy configuring such things, I prefer to do so only when I'm in the mood. Otherwise, my reaction to new media being unusable until I've done a bunch of tagging and housekeeping—regardless of the awesomeness of the tools provided—is the same as most users. It's an unacceptable PITA.

Quote from: fitbrit on September 06, 2011, 07:22:54 pm

Then we'd just have to alter the name of the files to match the official entries in the databases (if needed) and we'd get the data.

Such actions should be rare in a properly designed and configured system. There can be a chicken-or-egg issue around how files in the file system are named. With a little common sense, however, it's not difficult to adopt practices that ensure file pathnames are consistent enough that such a system can handle them automatically with an acceptably low error rate. That simply requires Regular Expressions (a capability recently added to MC's expression language) configured to recognize and extract the necessary information that both identifies the media sub type and provides accurate data for looking up data at online meta data sources. Then, for example, if you were to adopt a practice of including "Title (Year)" as the folder name or first part of the filename of all movies, you could pretty much guarantee 95% accurate meta data lookups. In the context of an integrated system, that means all those movies are fully tagged with all the information you want, automatically.
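As a rough illustration of the "Title (Year)" convention, the following Python sketch pulls the title and year out of a path. It is not MC's expression syntax; the pattern, the function name `parse_movie_path` and the sample paths are assumptions made for the example, and the year range (19xx or 20xx) is an arbitrary choice.

```python
import re
from pathlib import PureWindowsPath

# Pattern broken down:
#   ^(?P<title>.+?)      shortest possible run of characters from the start = the title
#   \s*                  optional whitespace before the bracket
#   \(                   a literal opening parenthesis
#   (?P<year>(?:19|20)\d{2})   four digits starting with 19 or 20 = the year
#   \)                   a literal closing parenthesis
MOVIE_PATTERN = re.compile(r"^(?P<title>.+?)\s*\((?P<year>(?:19|20)\d{2})\)")


def parse_movie_path(path):
    """Return {'Name': ..., 'Year': ...} or None if no 'Title (Year)' is found."""
    p = PureWindowsPath(path)
    # Try the filename (without extension) first, then the containing folder.
    for candidate in (p.stem, p.parent.name):
        match = MOVIE_PATTERN.match(candidate)
        if match:
            return {"Name": match.group("title").strip(),
                    "Year": int(match.group("year"))}
    return None


print(parse_movie_path(r"M:\video\Movie\Some New Movie Release (2011)\movie.mkv"))
print(parse_movie_path(r"M:\incoming\Some New Movie Release (2011) - 1080p.mkv"))
print(parse_movie_path(r"M:\incoming\random clip.mkv"))
```

The first two paths yield the same title and year, one taken from the folder name and one from the filename; the third returns `None`, which is the case a real system would have to hand back to the user.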

Quote from: xtacbyme on September 06, 2011, 08:58:35 pm

One thing I think would be cool with the movie, TV and music auto-tagging you're discussing: once a person has their whole library tagged, JRiver Media Center could scour its own library and notice when a new file becomes available for purchase or release via IMDB, AMAZON, TVDB etc...

I think this would be casting too wide a net, and I'm not sure how it could be intelligently restricted to new media most likely to be of interest. But I do hope to see a more general capability not yet mentioned here. That's the ability of MC to maintain records for media that does not exist in the file system. I currently use dummy files to record information about series, and about movies I've seen but do not own. I could do the same for movies I wish to see or purchase, but prefer to download and tag a trailer for those. I believe it's important this capability be built into MC so these things can be done in a more natural and intuitive manner. That would also allow such records to make a seamless transition to representing a real file when the media is acquired.

This also has an important role in the case of a currently running series. As mentioned, some way of recording information about the series itself is necessary. But information about upcoming episodes is usually available for at least a few weeks in advance. It doesn't make sense to restrict information to just the episodes for which files are available. I generally watch and delete episodes. That doesn't mean I wouldn't appreciate having information about what I've watched. I might also be more interested in seeing information about what's being downloaded right now or coming next week than anything else.

Even if not automatic, this capability—along with such a meta data system—would still be fairly efficient at obtaining such information. You would just create a pseudo record (e.g., with [Name]="Some New Movie Release" and maybe [Year]="2011") and update it with your chosen meta data sources. It should also be possible to import a list of such items, and then update them all. An import facility should be able to import a list, not adding a pseudo record for anything already existing in the library. You'd then be able to do things like import the IMDb Top 250, and create records for all the movies in that list that you don't own.
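A small Python sketch of that import step follows, assuming a library held as a list of dictionaries. The record layout, the `import_pseudo_records` name and the matching rule (case-insensitive name plus year) are all assumptions for illustration, not how MC stores or matches anything.

```python
def import_pseudo_records(library, titles):
    """Add a placeholder record for each (name, year) not already in the library.

    library: list of dicts with at least a 'Name' key (hypothetical layout).
    titles:  iterable of (name, year) pairs, e.g. parsed from an imported list.
    Returns the list of records that were added.
    """
    def key(name, year):
        # Assumed matching rule: case-insensitive name plus year.
        return (name.casefold().strip(), str(year))

    existing = {key(r["Name"], r.get("Year", "")) for r in library}
    added = []
    for name, year in titles:
        k = key(name, year)
        if k in existing:
            continue                      # already owned or already a pseudo record
        record = {"Name": name, "Year": str(year), "Filename": None}
        library.append(record)
        existing.add(k)                   # also guards against duplicates in the list
        added.append(record)
    return added


library = [{"Name": "Movie I Own", "Year": "1999", "Filename": r"M:\video\own.mkv"}]
wanted = [("Movie I Own", 1999), ("Some New Movie Release", 2011),
          ("some new movie release", 2011)]
new_records = import_pseudo_records(library, wanted)
print(len(new_records), len(library))
```

Records with `Filename` set to `None` play the role of the pseudo records; they could later be pointed at a real file when the media is acquired.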

fitbrit · « **Reply #5 on:** September 06, 2011, 10:40:25 pm »

Rick: I'm finding more and more that TMDB, at least, has trouble with movies where [name]=[title] ([year]). Title only works MUCH better. Rotten Tomatoes is better at this though. I originally added the year for use with PVDImport, and also to differentiate between movies of the same title, so that the same cover art wasn't applied to them automatically; for example, three different versions of King Kong all end up with the same cover art if you use a designated cover art folder... because MC renames the cover art as Video - [Name].jpg.
I'm now going to start removing the last 7 characters of all my movies that have a year in parentheses after the title. I can use the removeright expression to do so. However, I just remembered why I started adding the year in the first place (cover art). I really would like MC to handle cover art differently. That's another thread.
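One caution on removing a fixed 7 characters: it only works if every selected name really ends in " (YYYY)", which is a space, two brackets and four digits. A pattern-based approach leaves other names untouched. The sketch below shows the idea in Python rather than MC's expression language; the function name and the 19xx/20xx year range are assumptions.

```python
import re

# A space, then a four-digit year in parentheses, anchored at the end of the name.
YEAR_SUFFIX = re.compile(r"\s\((?:19|20)\d{2}\)$")


def strip_year_suffix(name):
    """Remove a trailing ' (YYYY)' if present; otherwise return the name unchanged."""
    return YEAR_SUFFIX.sub("", name)


for name in ["King Kong (1933)", "King Kong (2005)", "King Kong", "Movie (Director's Cut)"]:
    print(repr(name), "->", repr(strip_year_suffix(name)))
```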

So far I'm not really finding anything 'simple' about regular expressions because I simply don't have time to learn how they're used. Next month, maybe. Or if I find a step by step tutorial, or even a few examples broken down...

rjm · « **Reply #6 on:** September 06, 2011, 10:59:53 pm »

JRiver would be wise to study and learn from the Collectorz.com Movie Collector experience before launching this project. Collectorz had brilliant features for getting movie data from all the best sources. They removed the features after being threatened by several data owners (and I think because they hoped to generate revenue from their own database).

There are several public domain data sources available but I prefer the private data sources like AMG, IMDB, Amazon, and DVD Empire. The risk is JRiver could invest a lot of effort but not be able to provide data from the sources people really want.

rick.ca · « **Reply #7 on:** September 06, 2011, 11:58:12 pm »

Quote from: fitbrit on September 06, 2011, 10:40:25 pm

Rick: I'm finding more and more that TMDB, at least, has trouble with movies where [name]=[title] ([year])...

I'm not sure why you would want or need to do that. I suggested [Filename (path)] might include "Title (Year)" as one way of ensuring a system that extracts information from the filename (MC does not yet have this capability) is able to get the information it needs. It would save the Title in [Name] and Year in [Date (year)]. I'm not sure how the existing TMDb lookup works (or, for that matter, how the equivalent PVD plugin works), but surely that's what they assume those fields will contain. If a source can perform a more precise search with the year included, then it can be included. In the case of TMDb, I think you'll find it works much better with "Title Year," while "Title (Year)" may be worse than just "Title."

As for cover art, I have no problem with the covers being saved beside the video. That is, the image is named the same as the movie, with a JPG extension.

rick.ca · « **Reply #8 on:** September 07, 2011, 12:49:24 am »

Quote from: rjm on September 06, 2011, 10:59:53 pm

JRiver would be wise to study and learn from the Collectorz.com Movie Collector experience before launching this project.

You're right. JRiver providing a system that takes data from anywhere for the benefit of its customers would be a flagrant violation of copyright. But there has been no suggestion here that they should do that. There are a number of good sources that will grant permission to use their data without change. For other sources, JRiver must remain hands-off. But that doesn't mean the data available at any other sites cannot be legally used by any individual for their personal use. Once I've been allowed to view a web page, I'm not constrained from using it in any manner I choose. It's already downloaded to my HDD. If I choose to parse it and save data from it in my personal database, that's my business. So one solution is for JRiver to provide a script engine that does nothing more than support me in doing so.

rjm · « **Reply #9 on:** September 07, 2011, 01:06:16 am »

Quote from: rick.ca on September 07, 2011, 12:49:24 am

So one solution is for JRiver to provide a script engine that does nothing more than support me in doing so.

Maybe ok, maybe not. JRiver profiting from providing scripts that capture copyrighted data may not be ok. I don't know. They should check carefully before starting project.

justsomeguy · « **Reply #10 on:** September 07, 2011, 01:32:48 am »

As far as copyright and charging for a product that uses data from an outside source goes: would it make a difference if JRiver created a small stand-alone app that they provided free of charge to anyone who wanted it, with or without an MC license? The app could just provide raw info for anyone for personal use. Then MC could just be made to interface with that free external app to use that data.

steelman1991 · « **Reply #11 on:** September 07, 2011, 01:39:38 am »

What about creating a JRiver database purely for .....well JRiver. I would think there are more than enough users on here who, with a small script in MC, wouldn't mind contributing what data they have to a central db. Just thinking out loud. And only for movies at this time - thetvdb seems a far more stable project at the moment and would probably suffice for meta collection.

Isn't there already a similar option for audio data?

Keep your eye on the moviedb - there might be a free for all there shortly - maybe get all the data you need to start away

flac.rules · « **Reply #12 on:** September 07, 2011, 02:30:13 am »

Quote from: rjm on September 07, 2011, 01:06:16 am

Maybe ok, maybe not. JRiver profiting from providing scripts that capture copyrighted data may not be ok. I don't know. They should check carefully before starting project.

Seems like it should be ok; a lot of programs have options for user-generated scrapers and have no trouble. And is it legally questionable to capture this data for a user even? Is it copyrighted when it's "public data" they have published for free for all to see? (I am curious, I am not a lawyer, or even a US citizen.)

As for the technical side of things, a function like this would be very nice, I am more or less in 100% agreement with all that rick.ca has said in this thread.

MrHaugen · « **Reply #13 on:** September 07, 2011, 05:35:39 am »

This is a very important topic indeed. Scraping is really needed for MC, if it wishes to capture more customers from other dumbed-down media centers that do things automatically. Red October was a good and necessary step. The next big one is, in my opinion, this topic.

rick.ca · « **Reply #14 on:** September 07, 2011, 05:58:36 am »

Quote from: rjm on September 07, 2011, 01:06:16 am

Maybe ok, maybe not. JRiver profiting from providing scripts that capture copyrighted data may not be ok. I don't know. They should check carefully before starting project.

Sorry, but I'm having difficulty following your logic. Users might not be happy with freely available meta data sources, they may not be willing to pay for a subscription service, someone might take offence to users gathering data for their own use (even though this is perfectly legal)—therefore "the project" should not be started? This sounds like a good prescription for never doing anything.

I only mention the script engine as one option that would counter your suggestion the idea is not viable because users will not be happy with the available sources. I might have just pointed out that idea doesn't seem very plausible. Not long ago, MC itself offered nothing. All it offers now is a manual tool that adds a fixed set of data from one of a few different sources, one movie at a time. Clearly, most users would see an automated system providing data from TMDb, TVDb, Wikipedia and more a huge leap forward.

Will they want more? Of course they'll want more. The same system would be a perfect platform for providing one or more subscription services. The problem is, it doesn't seem there are any providers willing to offer a business arrangement that's viable. So I suggest the script engine idea. Not just as a solution to this specific problem, but because I believe the ability to efficiently harvest data from the web is eventually going to become essential to MC's survival. As web applications become more functional, the lines between the abilities of a local database and the same application on the web will blur. No matter how awesome its data management capabilities, MC can't afford to be the option with no data.

No, there's no good reason to delay this project any further. The only good reason for it not being done already is RO. Things take time. This will take time. There's plenty to do before any decisions are necessary on things like a scripting engine or subscription data services. What I've described does not require such things immediately. Yet without them, it will largely fill the biggest remaining gap in MC's features. Even if the same selection of data sources is not initially available, the functionality of the system itself will blow away the competition. It will surely boost JRiver's revenues enough to hire more bodies for the development of a scripting engine, and to retain a good lawyer to keep them out of trouble.

Quote from: justsomeguy on September 07, 2011, 01:32:48 am

As far as copyright and charging for a product that uses data from an outside source. Would it make a difference if jriver created a small stand alone app that they provided free of charge to anyone that wanted it with or without a MC license. The app could just provide raw info for anyone for personal use. Then MC could just be made to interface with that free external app to use that data.

No, I don't think this would make a difference. The substance is much the same as the script engine idea, except JRiver is more closely associated with the scraper, and its arbitrary separation would only make it less convenient and probably not as automatic. The point of the script engine idea is not to be coy about what is being done. What is being done is perfectly legitimate. If I want, I can get an application designed for harvesting data from the web, and configure it to get the data I want. (In fact, that's what I'm already doing, with my own scripts running in PVD's script engine.) But that's a lot of work, and it still isn't integrated with MC. (Fortunately, PvdImport does that for now.) All I'm suggesting is JRiver provide the same thing as an integrated solution.

Before assuming that no commercial web site could possibly tolerate anyone scraping their data, consider this. A user's primary purpose is just to gather reliable information with which they may categorize and organize their media. One of the pieces of data they capture is a link back to the source. This link serves a secondary purpose of enabling them to return to the site to view more information, find related media, and use the shopping links found there. Other users, like xtacbyme, will actually want the related media and shopping links automatically added to their database. What web site owner in sound mind is going to be against this scenario? Who knows—maybe some are even enlightened enough to encourage this sort of thing.

Quote from: steelman1991 on September 07, 2011, 01:39:38 am

What about creating a JRiver database purely for .....well JRiver. I would think there are more than enough users on here who, with a small script in MC, wouldn't mind contributing what data they have to a central db.

I don't think so. It would never come close to what's already available using TMDb and TVDb, and some don't consider them good enough because they're comprised of user-contributed data.

Quote from: Elvis133 on September 07, 2011, 02:30:13 am

And is it legally questionable to capture this data for a user even? Is it copyrighted when it's "public data" they have published for free for all to see? (I am curious, I am not a lawyer, or even a US citizen.)

No, not in the least—unless you've hacked into a private site. When you view a web page, you've downloaded it. The data is captured. You're free to do whatever you like with it, as long as it's for your own personal use. It's not a matter of whether or not the publisher's copyright exists. It does. It's just not applicable to the circumstances unless you violate it. That can't happen unless you give or sell the data to someone else.

The risk to JRiver is being construed as having provided you the data. That in itself would violate copyright, but I imagine the legal consequences might be worse to the extent it were evident this is part of what you're paying them for. But if JRiver is only providing a scripting engine, they're not associated with the data any more than they are with the media played using the software.

steelman1991 · « **Reply #15 on:** September 07, 2011, 11:46:41 am »

Quote from: rick.ca on September 07, 2011, 05:58:36 am

I don't think so. It would never come close to what's already available using TMDb and TVDb, and some don't consider them good enough because they're comprised of user-contributed data.

Agreed regarding the TVDb, as stated in my post at this time. It's unclear how long TMDb will continue to operate in its present format, therefore perhaps a move to a specific db purely for movies may be beneficial to JRiver.

Surely the anal retentive users on this site, and I include myself in that, could ensure that the data is uncompromised and in the absence of a properly regulated commercial db, at the moment TMDb is all we got. Hell even IMDb is user generated. The vast majority of users will already have scraped their data from that source, therefore all we would be doing is feeding it back in its original form, only this time to a proprietary db.

HTPC4ME · « **Reply #16 on:** September 07, 2011, 12:06:01 pm »

May I ask... what is wrong with the AutoMeta plugin?

Is it not possible to buy that plugin from the owner, and then have JRiver build on that? I've tried PVD and AutoMeta, and for me (not being as savvy as most here) it works awesome, and there isn't a big learning curve (new customers could use it easily). JRiver would just have to get the music side of AutoMeta to work. Statistically, AutoMeta has found over 99% of my movies and TV series when I've searched for them.

Just a question/suggestion.

glynor · « **Reply #17 on:** September 07, 2011, 02:10:57 pm »

Quote from: xtacbyme on September 07, 2011, 12:06:01 pm

may i ask... What is wrong with Autometa plugin?

Have you ever tried to use it thoroughly on a large library?

I could document ALL of the wacky problems I have with it. But mainly, the UI is absolutely TERRIBLE. I use it, and the actual data it grabs is quite good (usually), but man it is a pain to get a few new files updated.

For example, I've described my setup in great detail elsewhere, but I'll boil it down for this purpose. I have a large Media Drive that stores basically all of my long-term stored files mounted as drive M. All of the video files live in M:\video\[Media Sub Type], eventually, but they don't get moved there until they have been thoroughly tagged and "vetted". New files "arrive" generally in either T:\recordings\ or M:\incoming\ depending on the source. Including all of my external drives, I have probably 60-70 series total in the "All TV Shows" view. Both of the "new files" locations are constantly full of hundreds of new files that either haven't been tagged yet or haven't been checked yet (to make sure they are what they say they are). I'll clear them out eventually, but I have a constant medium back-log and a large set of files that I don't intend to ever tag (see below).

So... What happens is my recording system will record a few new episodes of shows through the course of the week. Most of these I don't bother to tag (or tag well), because I'm going to watch them once and delete them. The "good stuff" gets tagged. But, there is no point in tagging today's 3 episodes of the BBC news, if they're going to be auto-deleted tomorrow when tomorrow's episodes air. Likewise, I don't tag my recordings of The Daily Show, The Colbert Report, and shows like that that are "temporary". Plus, I have lots of recordings that I want, but I'm not going to ever watch more than once (Nova and Nature episodes, for example). For these, I tag [Media Sub Type] and [Series] the way I want them (so they show up in the right views in Theater View), but I'm not going to go through and look up and set Episode numbers for each of these and run them through AutoMeta. Most of these "junk files" just live in the T:\recordings\ directory until they're watched or auto-deleted. But, they're in the directory...

When I'm going through doing this with AutoMeta, this is the process:

1. I open AutoMeta in MC.

2. I switch to the AutoMeta options dialog (which is a great example of poor UI design) and set it to import ONLY either M:\incoming\ or T:\recordings\. (If I run it on the full Library, it is even MORE annoying, so that is basically useless, as you'll see in a moment.)

3. It scans this location. While it does this, I am forced to click through a seemingly unending set of one-at-a-time pop-up modal dialog boxes matching the Series names to the "data location" names. Even though I've done all of these before, it still pops up at least an OK dialog box for each one. One at a time. Sometimes it crashes while I'm doing this, but even when it doesn't, it takes FOREVER and is an absurd user experience. Probably 95% of these actually require me to do nothing but hit "OK" like a monkey. Why in god's name does it do this?

4. Worse, if the [Series] field in MC doesn't match the "Series Name" that AutoMeta wants to use (and many of mine don't because theirs are bad or just plain wrong, like Law & Order SVU is called "Law and Order: New York", for example), then AutoMeta complains about this. I usually have to go through 5 or 10 of these dialogs each time. It also doesn't remember what you picked before, because I don't like to keep their badly titled/styled results. This is despite the fact that AutoMeta appears to have some sort of Series ID internal database that it matches against. But you can't define your own [Series] names, and have it remember that "match" from then on. Nope, you have to use whatever badly formatted dreck it finds, or you have to click through and search through these tedious dialog boxes one at a time each time you scan for new files to tag. AND, many times, I go through all of these "OK, OK, OK, OK, OK, OK, OK, OK, search for the right series, OK, OK, OK, OK" dialogs for shows I'm not even going to bother to actually apply any metadata to at all! Even worse, if I happen to try to use AutoMeta on my full library, this set of OK dialogs is even more absurd, and includes TONS of files that are stored on external disks that are already fully tagged and that I have no desire to deal with.

5. Then, I finally get the list of files that I want to tag. This part works pretty well, but it can be VERY finicky to check what files are actually going to have metadata applied. My list usually is several hundred files long, of which, I'm applying metadata to maybe 5 or 10 files (often one or two, but I'm too lazy to do it then because it is such a pain, so I save them up a bit). That's because, like I said, I don't fully tag all of the recordings I make. You can't filter the list. You can't sort it to show all of the ones "checked" at the top. You can't do anything but manually scan through and check them one at a time. Want to check the data that is going to be applied side-by-side against what is already in the files? Nope. Just the list with the funky color scheme. Want to UNCHECK everything and go through and check only a handful of files? Get your clicking finger ready. Want to shift-select a set of files and toggle the checkbox on or off for the whole set at once? Nope. Get your clicking finger ready.

6. Lastly, I apply the results. This is the best part about AutoMeta. This part Just Works, in almost all cases. It is fast and works reliably.

7. Except, now all of those files where the [Series] tag autometa wants to use doesn't match what I want to use? Yep, it overwrites what I've manually tagged. No way to turn it off. I can de-select all of the other fields if I don't want them tagged, but [Series] is "special" and AutoMeta expects you to blindly follow whatever convention it finds in the "cloud" and overwrites what you manually tagged to get this far (which is why it shows the annoying one-at-a-time dialogs at the beginning, but still is a terrible idea). So, I have to go through and find all of the files it just tagged, and set "Law and Order: New York" back to "Law & Order SVU" and "C.S.I." back to "CSI" and "Star Trek: The Next Generation" back to "Star Trek TNG" and so on and so forth.

Rinse, wash, repeat next week.

It works, but it is very clunky, manual, and a tedious, time consuming process with such a poor user experience I'd never even HOPE to have my wife do it on occasion.

I want this:

1. I set, either manually or preferably auto-grabbed from the filename/path of the incoming file, the following tags: [Media Sub Type], [Series], [Season], [Episode].
2. MC AUTOMATICALLY, in the background and on-the-fly, without any user action of any kind, fills in the [Description] field, [Episode Title] field (I don't go directly to [Name] because I want the [Name] tag to follow a strict formatting system), [Actors], etc.
3. If I later go "oops" and change the [Episode] tag from 20 to 21, it fixes the relevant tags automatically.
4. If it finds a [Series] name and I need to match it, it pops up a "match to data source" dialog ONCE at the time I first tag that particular [Series]. Then, once I match it, it continues to use what I manually entered as the [Series] name, but relates it to the [Series ID] in the background itself. Leave my names alone!
5. If I tag only [Media Sub Type] and [Series] (but not [Season] and [Episode]) then it should fill what it can about the show, like [Series Description] for example.
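
A rough sketch of steps 1 and 4 above, in Python. Everything here is an assumption for illustration: the filename pattern, the `tags_from_filename` and `match_series` helpers, and the `series_ids` table are hypothetical, not anything MC or AutoMeta provides. The point is only that the user's own [Series] name stays untouched while lookups go through a stored ID.

```python
import re

# Assumed naming convention, e.g. "Law & Order SVU - S12E20.mkv"
# or "Star.Trek.TNG.S03E15.avi". Real collections will need more patterns.
EPISODE_RE = re.compile(
    r"^(?P<series>.+?)[ ._-]+S(?P<season>\d{1,2})E(?P<episode>\d{1,3})",
    re.IGNORECASE,
)

def tags_from_filename(filename):
    """Return the four starting tags, or None if the name does not match."""
    m = EPISODE_RE.match(filename)
    if m is None:
        return None
    return {
        "Media Sub Type": "TV Show",
        "Series": m.group("series").replace(".", " ").strip(),
        "Season": int(m.group("season")),
        "Episode": int(m.group("episode")),
    }

# Hypothetical one-time match table: user's series name -> source series ID.
series_ids = {}

def match_series(user_name, ask_user):
    """Call ask_user (a stand-in for the one-time match dialog) only the
    first time a series name is seen; afterwards reuse the stored ID."""
    if user_name not in series_ids:
        series_ids[user_name] = ask_user(user_name)
    return series_ids[user_name]
```

With something like this, changing [Episode] from 20 to 21 would just mean repeating the lookup with the same stored ID and the new episode number.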

Vincent Kars · « **Reply #18 on:** September 07, 2011, 03:41:03 pm »

I do think the meta data is the weak spot.
YADB is too small.
FreeDB is a bit chaotic

Even WMP allows you to select an album in the interface and do a manual lookup if needed.
You can’t even configure the lookup in MC.
It bugs me that a middle of the road player not costing you a penny like WMP does a better job.
As I’m into classical music I wouldn’t mind at all if MC would support AMG.
I’m willing to pay the licence fee (AMG is a paid service, just like Gracenote).

For ripping I use dBpoweramp; it is fast, reliable and has much better meta data.
For tagging I might even resort to MP3Tag or WMP to get better results.
Can’t stand it that my favourite player is down on this aspect.

rick.ca · « **Reply #19 on:** September 07, 2011, 05:11:14 pm »

Quote from: mark_h on September 07, 2011, 08:38:24 am

Clarification from IMDb: They still allow full access to datasets for non-commercial use, which isn't much use here and there's no public API anyway, just full datasets...

This seems to support my contention users scraping such sites is not copyright infringement, nor does a site like IMDb have any other objection to the practise. This is, no doubt, exactly the sort of thing Amazon wants us to be doing—because it ultimately brings us closer to some revenue-producing action. This also explains why this is exactly what some MC competitors have been doing for years—without any legal problems.

So I don't believe there is any risk to JRiver in providing a script engine that supports the creation and use of scripts that scrape data from IMDb and any similar site. They could even provide a "sample" script for a source like IMDb to get the user community started in developing its own scripts.

rick.ca · « **Reply #20 on:** September 07, 2011, 05:41:33 pm »

Quote from: steelman1991 on September 07, 2011, 11:46:41 am

Agreed regarding the TVDb, as stated in my post at this time. It's unclear how long TMDb will continue to operate in its present format, therefore perhaps a move to a specific db purely for movies, may be beneficial to JRiver.

There's no reason to believe TMDb will cease to operate in the short or medium term. Beyond that, everything ceases to operate, including JRiver, you and me. What's far more relevant is the proposed system will make it easier to use multiple sources, and to adapt as sources come and go (as they surely will).

Regardless of the risk of sources changing and disappearing, a database maintained by JRiver is simply a very bad idea. A mish-mash of inconsistent data provided by users doesn't suit anyone. It's completely contrary to the idea of (and significant benefits from) combining data from various sources. JRiver would be exposed to the legal risk of copyright infringement by users uploading data obtained from elsewhere. What would JRiver do if a company like Rovi advised them some of the records contained Rovi data? Having no way to identify such records, they would have no choice but to shut down the database.

Quote

Hell even IMDb is user generated. The vast majority of users will already have scraped their data from that source, therefore all we would be doing is feeding it back in its original form, only this time to a proprietary db.

And it certainly makes no sense to violate the copyright of a source that's perfectly happy to have users do whatever they want with their publicly available data in the first place. I doubt the fact it would be users uploading the IMDb data to JRiver's database would mean that JRiver would not be infringing upon IMDb's copyright.

Daydream · « **Reply #21 on:** September 07, 2011, 05:53:32 pm »

Adding my 2 cents on this big discussion, the XBMC crowd (as in people already having metadata in other formats) may find MC more appealing if there would be a nfo-2-sidecar or nfo-2-database import option. (nfo is well documented)
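
To make the nfo-2-database idea a bit more concrete, here is a minimal Python sketch. It assumes the usual XBMC movie .nfo layout (a `<movie>` root with `<title>`, `<year>`, `<plot>`, `<director>` and `<actor><name>` children). The target field names in `NFO_TO_FIELD` and the `fields_from_nfo` helper are guesses for illustration, not an existing MC import feature.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from .nfo element names to MC-style field names.
NFO_TO_FIELD = {
    "title": "Name",
    "year": "Year",
    "plot": "Description",
    "director": "Director",
}

def fields_from_nfo(nfo_text):
    """Parse one movie .nfo document and return a dict of field values."""
    root = ET.fromstring(nfo_text)
    fields = {}
    for tag, field in NFO_TO_FIELD.items():
        node = root.find(tag)
        if node is not None and node.text:
            fields[field] = node.text.strip()
    # Each <actor> element carries the person's name in a <name> child.
    actors = [a.findtext("name", "").strip() for a in root.findall("actor")]
    actors = [a for a in actors if a]
    if actors:
        fields["Actors"] = "; ".join(actors)
    return fields

# Placeholder data, not a real library entry.
sample = """<movie>
  <title>Some Movie</title>
  <year>2001</year>
  <plot>A placeholder plot.</plot>
  <actor><name>First Actor</name></actor>
  <actor><name>Second Actor</name></actor>
</movie>"""
fields = fields_from_nfo(sample)
```

Elements missing from the file (here `<director>`) are simply left out of the result, so they could still be filled from an online source later.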

Quote from: rick.ca on September 07, 2011, 05:11:14 pm

So I don't believe there is any risk to JRiver in providing a script engine that supports the creation and use of scripts that scrape data from IMDb and any similar site.

You think a specialized script engine or just let's add Python support (free even for commercial software) and we're done here? I'd really like the Python option.

rick.ca · « **Reply #22 on:** September 07, 2011, 06:10:57 pm »

Quote from: glynor on September 07, 2011, 02:10:57 pm

I want this...

Examining the shortcomings of current plugins is somewhat beyond the scope of the intended topic. And I'm intentionally trying to avoid discussing implementation details, of which there are many. But I would like to comment (and I'm sure you get this) that all of what you want would or could be delivered by the proposed system.

Quote from: xtacbyme on September 07, 2011, 12:06:01 pm

may i ask... What is wrong with Autometa plugin?

I'll attempt to answer this more directly. There would be no significant advantage in any attempt to "build on" AutoMeta or any other plugin or program. The developers could get some ideas about what to do and what not to do by trying out things like this, but I doubt the code or even the general design is of much use. The various functions that need to be performed are not complicated or foreign to JRiver developers. But most importantly, some of the most significant advantages of the proposed system stem from the fact it would all be part of, or fully integrated with, the main program. That means the meta data lookup function can be directly integrated with auto-import (and auto-tagging). It means data can be mapped directly with fields and the expression language used to manipulate the data. And just very generally speaking, however it works, its configuration and maintenance just has to be a much more seamless, intuitive and efficient experience.

rick.ca · « **Reply #23 on:** September 07, 2011, 06:30:07 pm »

Quote from: Daydream on September 07, 2011, 05:53:32 pm

Adding my 2 cents on this big discussion, the XBMC crowd (as in people already having metadata in other formats) may find MC more appealing if there would be a nfo-2-sidecar or nfo-2-database import option. (nfo is well documented)

Maybe. But unless there's user-created data in them (e.g., rating, date viewed, comment), I would wonder what purpose it might serve. Any other data is just going to be updated from the desired sources anyway. This is not to say there isn't a need to import existing data, but that's not a need that arises out of this proposed system. And there are existing ways to import such data.

Quote

You think a specialized script engine or just let's add Python support (free even for commercial software) and we're done here? I'd really like the Python option.

I'm not familiar with Python, and my programming phobia makes me very afraid. I find the process of writing a script for a very task-oriented script engine (that in PVD) challenging enough. Perhaps you could explain more, particularly its usability by non-programmers. Maybe it wouldn't make any difference to me, as my programming skills are limited to modifying what others have done by trial and error anyway. :-\

HTPC4ME · « **Reply #24 on:** September 07, 2011, 06:53:49 pm »

I agree with you Glynor, and Rick in regards to autometa... the tv series/discogs (music) would need A LOT of work, but for me it WAS a life saver (over 2 tera of tv shows) when i first tagged everything it saved me a lot of manual input. the autometa is rough around the edges, hence my suggestion of building on it. after all it has been the only (easy to use) plugin that jriver has, I still use it for FULL tv series that need to be tagged, and i use the movie/film side of auto meta religiously due to the fact it usually finds what i need it to find, and it fills it in for me...i thank the developer for his efforts and offering it to us. (It's not an end-all solution, but it's done well for me)

There def is a need for a change, i've yet to tag all my music due to not having the time, and having to use 3rd party software to accomplish it, It would be nice to have an all in one solution built into jriver.

Daydream · « **Reply #25 on:** September 07, 2011, 08:07:01 pm »

Quote from: rick.ca on September 07, 2011, 06:30:07 pm

Maybe. But unless there's user-created data in them (e.g., rating, date viewed, comment), I would wonder what purpose it might serve. Any other data is just going to be updated from the desired sources anyway. This is not to say there isn't a need to import existing data, but that's not a need that arises out of this proposed system. And there are existing ways to import such data.

As I see it there are 3 instances:
- people with no metadata anywhere; massive imports required
- people with metadata already somewhere else. Just as with PVD, me as an XBMC user would like to bring over what I already have locally and not rescrape data for 1000 movies and 5000 episodes.
- people with metadata in MC that need to populate metadata for new additions (sooner or later everybody will fall into this category after going through either of the previous ones). The scariest example so far for this is Glynor

(I don't record anything, I download; so my approach is different)

I agree the focus should be on what MC can do by itself, but as there are many users around that already use other media center solutions, it will matter, in a bigger picture, how JRiver wins them over.

Quote

I'm not familiar with Python, and my programming phobia makes me very afraid. I find the process of writing a script for a very task-oriented script engine (that in PVD) challenging enough. Perhaps you could explain more, particularly its usability by non-programmers. Maybe it wouldn't make any difference to me, as my programming skills are limited to modifying what others have done by trial and error anyway. :-\

I'm not a programmer either. I can script various things here and there but that's all. I understand regex but not at wizard level. I'm all for a script engine that would be relatively simple to use and able to do amazing stuff (scrape complex sources). But I did not see that turning up anywhere. So in case there is no solution like that, please remember Python. It's not for everybody, but it can do wonderful things. So much so that I would choose to learn it even if my programming-aversion is quite noticeable.

Non-programmers will do what they do everywhere else: take the script and run it; modify it if they feel so inclined. Maybe you envision a situation where we can get out of that loop. But as long as the target remains mobile (scrape stuff from various sources, accounting for the fact that some may change their format/sites, disappear or new ones may appear) I'm afraid we need a powerful engine.

fitbrit · « **Reply #26 on:** September 07, 2011, 09:58:03 pm »

Quote from: rick.ca on September 06, 2011, 11:58:12 pm

As for cover art, I have no problem with the covers being saved beside the video. That is, the image is named the same as the movie, with a JPG extension.

To be honest, what you have no problem with doesn't really help with the problem I have. I guess all I can say is that I'm happy for you.
I got into MC12 a few years ago, and maybe made a few sub-optimal (I'll let you disagree and explain in three paragraphs how 'abysmal' is more accurate than 'sub-optimal') decisions on how to organise my media. These were choices that the program offered, such as keeping cover art in a separate folder. Back then, I believe cover art in an external location took on [filename].jpg, which I much preferred. Now, I have 30TB+ of media and am in a situation where a cover art naming problem occurs if I don't differentiate between movies with the same title with something like the year.
Do you know of a way I can quickly (or in the background, with minimal intervention) move cover art into the relevant folders?
Any help would be appreciated.

glynor · « **Reply #27 on:** September 07, 2011, 10:59:10 pm »

Quote from: xtacbyme on September 07, 2011, 06:53:49 pm

i thank the developer for his efforts and offering it to us. (It's not an end-all solution, but it's done well for me)

Hear, hear!

I totally agree. You just asked what was wrong with it. I answered. But, yes, I've used it, and continue to use it, and it sure-as-heck beats having no way to import the data!

glynor · « **Reply #28 on:** September 07, 2011, 11:01:21 pm »

Quote from: glynor on September 07, 2011, 02:10:57 pm

Including all of my external drives, I have probably 60-70 series total in the "All TV Shows" view.

I guessed quite low. The actual number is 95. I have 52 in "online storage" (on drive M), and the rest on external disks.

Daydream · « **Reply #29 on:** September 08, 2011, 12:06:21 am »

Quote from: glynor on September 07, 2011, 11:01:21 pm

I guessed quite low. The actual number is 95. I have 52 in "online storage" (on drive M), and the rest on external disks.

Oh, snap! This changes _everything_. You heard I have 66 and thought to pull ahead, eh?! See what you did, now I have to go out of my way and get more...!

rick.ca · « **Reply #30 on:** September 08, 2011, 01:58:15 am »

Quote from: Daydream on September 07, 2011, 08:07:01 pm

I agree the focus should be on what MC can do by itself, but as there are many users around that already use other media center solutions, it will matter, in a bigger picture, how JRiver wins them over.

Yes, MC could support more methods of importing data. I've often wished it would support direct imports from Excel, or at least CSV. There's lots of reasons why users may want to import data. As I said, the need doesn't just arise in the context of the proposed system.

At the same time, my experience supporting PVD suggests new users tend to over-estimate the value of their existing data, especially when coming from an inferior application. In configuring the new system, they discover they want to make some change (for the better) to the data from a particular source, which means updating all existing records anyway. When running a script overnight can easily update thousands of records, doing so is not much of an issue for most people. I do agree, however, the existence of convenient import tools is an important factor in winning over new users understandably concerned about preserving existing data.

Quote

I'm all for a script engine that would be relatively simple to use and able to do amazing stuff (scrape complex sources).

Not knowing much about Python, your comments prompted me to do the obvious—I Googled "Python scraper." Near the top of the list was...

Quote

Scrapy is an application framework for crawling web sites and extracting structured data which can be used for a wide range of useful applications, like data mining, information processing or historical archival.

Now I'm not recommending this—I've barely scanned the website. But the fact I can find such a thing so readily suggests there might be something tailored for this particular purpose that would be much easier to use than Python itself, and save JRiver from having to reinvent the wheel. But it's also an "open source 100% Python" application, suggesting it could be modified to include any unique requirements of an MC implementation. My glance at some examples suggested to me it might even be simpler to use than the scripting engine I'm used to.

It would be helpful if someone who understands such things would take a look and comment. I'm now wondering if providing a "script engine" is not something a whole lot simpler than I assumed it would be. If so, it might make more sense to implement such a thing earlier rather than later. Interested users could then get to work on things like an IMDb script as JRiver continued to develop the host system.

CountryBumkin · « **Reply #31 on:** September 08, 2011, 11:04:29 am »

Maybe just using another program like "MyMovies Collection Management" would be the answer (at least for Video). It doesn't look like MM objects to other programs using their program and database, since it's free and already available (the last sentence in the quote below seems to indicate it is okay to use this data for other programs). MM does not charge for the basic level program (which is all you need). It collects and stores the meta data in various formats that could be easily imported into JRiver. I use it now for Movies and TV Shows I rip and it works well at identifying the program/movie and downloading the data (then I do some manual tagging in MC). If JRiver chooses a format that MM provides and provides some basic scripting to take "that" meta-data and populate the correct MC fields, I think it would help a lot of MC users.

Here's the MM product description:

Quote

Collection Management
Introduction

The My Movies Collection Management product lets you maintain and manage your entire movie collection, letting you add your movies with high-quality meta-data from our online service. You can either manage titles that you have stored on your computer, or home storage device, or you can manage titles you have on the shelf in your home, to bring you in control of which movies you have, including all the details about each movie.

You can also use the My Movies Collection Management product to store our high-quality movie meta-data for other products such as Sage TV, XBMC, Windows Media Center's built-in movie library and many other products.

newsposter · « **Reply #32 on:** September 08, 2011, 12:14:29 pm »

Metadata is really the holy grail of our machines here.

It's not the ripping, or the recording, or the playback. It's the human-readable info that tells us what programming we have stored in ways that are meaningful to us.

Much like the value of a conventional public library is not wholly in the books; it's in the cataloging system and the librarians that keep everything checked in/out and re-shelved as expected.

Google learned this long ago with their search engines; it's not the web content itself that holds the value. The value at google is in the search results.

Same thing at the IMDB. The raw data takes a back seat to the clickable interrelationships generated.

It seems as though we've been adding playback bells/whistles to MC for quite a while now. I've expressed the opinion before that it might be time to take a break and concentrate on the metadata/tagging and cataloging aspects of the system.

Crowd-sourcing the metadata through YADB is fine, it appears to work well, yada yada. But cross-referencing the data needs to start happening. And how do we crowd-source that?

There are several layers of complexity in MC. The ripping/input/playback engines run at one level, the management of the raw content/library data (vids, audio, pics, etc) runs at another level, and the metadata/tagging runs at a third level. Is it time to consider splitting off the metadata/tagging 'engine' from the rest of MC?

rick.ca · « **Reply #33 on:** September 08, 2011, 03:15:01 pm »

Quote

Crowd-sourcing the metadata through YADB is fine, it appears to work well, yada yada. But cross-referencing the data needs to start happening. And how do we crowd-source that?

I agree, but only by interpreting "cross-referencing" loosely. Clearly, just grabbing data from various sources without regard for what it means or where it comes from is a wasted effort. This is why I advocate a system powerful enough we can be selective of the data we incorporate into our database and how to do so. The value of any particular element of data depends on the standards that define it and the quality control applied in maintaining it. If it's of value to us, we can incorporate it into our data base—with suitable confidence we understand exactly what it means, that the data is consistent, etc. This is extremely important for large collections. When the data element is one of a hundred in a database of many thousands of records, manually reviewing and editing the data is generally not feasible. Consistent reliable data, however, can be used effectively to categorize and associate records—thereby "cross-referencing" them.

An example to illustrate the idea in practise...Director is a data element we would expect to be the same regardless of the source, and we want and need only one field to record it. To avoid problems like alternate spellings of a name, we would prefer it come from just one source (with standards ensuring the same name is consistently used for the same person). In the event a record does not exist at this primary source, but does at a secondary source, we would want to get the data from there. In other words, we would get Director from the secondary site, but not overwrite data obtained from the primary source.
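
The Director rule can be sketched in a few lines of Python. This is only an illustration under assumptions: `apply_source` and `fill_record` are hypothetical names, and each source is assumed to have already been fetched into a plain dict of field values.

```python
def apply_source(record, source_data, overwrite=False):
    """Copy non-empty values from one source into the record.
    Existing values are kept unless overwrite is set."""
    for field, value in source_data.items():
        if not value:
            continue  # this source has nothing for the field
        if overwrite or not record.get(field):
            record[field] = value
    return record

def fill_record(record, ordered_sources):
    """Apply sources in priority order; later sources only fill gaps."""
    for source_data in ordered_sources:
        apply_source(record, source_data, overwrite=False)
    return record

# Placeholder data: the secondary source spells the name differently.
primary = {"Director": "Jane Q. Smith"}
secondary = {"Director": "Jane Smith", "Year": "1999"}
merged = fill_record({}, [primary, secondary])
```

Because existing values win by default, the same rule also protects anything the user has edited by hand. The `overwrite` flag is there for the cases where a refresh from the source is wanted.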

A fundamentally different example...A movie Description is something that varies widely between sources, and even at the same source when user-contributed. Any attempt to save such data from different sources in the same field will only result in something even more inconsistent. The simple answer, of course, is to record the data from each source in a different field. Then the decision as to how to use that data can be made in configuring a view. The briefest one might be shown in a summary view, and all of them in a detailed view. Or an expression might be used to pick the most appropriate one. So in configuring a source in this system, we need the ability to specify a custom field (e.g., [Description.TMDb]).
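
Again as a hedged Python sketch rather than anything MC does today: each source writes to its own `Description.<source>` field, and a view-time helper picks the briefest one. The helper names and the sample texts are made up for illustration.

```python
def store_description(record, source_name, text):
    """Keep each source's description in its own custom field."""
    record[f"Description.{source_name}"] = text
    return record

def all_descriptions(record):
    """Return {source: text} for every non-empty per-source description."""
    prefix = "Description."
    return {
        key[len(prefix):]: value
        for key, value in record.items()
        if key.startswith(prefix) and value
    }

def briefest_description(record):
    """Pick the shortest available description, e.g. for a summary view."""
    texts = all_descriptions(record).values()
    return min(texts, key=len) if texts else ""

record = {"Name": "Some Movie"}
store_description(record, "TMDb", "A long, user-contributed synopsis with many details.")
store_description(record, "Rovi", "A short professional blurb.")
summary = briefest_description(record)
```

A detailed view would show everything from `all_descriptions`, while the summary view would use `briefest_description`. In MC itself the selection would presumably be done with the expression language instead.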

Putting these two ideas together, this is the sort of thought process one would go through in configuring such a system...The IMDb is clearly the most authoritative and complete source for recording the association of people to movies. It should be the primary source not just for Director, but Producers, Writers, Actors, etc. As for other factual information, it may or may not be the best source. Its Description (or Synopsis), however, is user-contributed and therefore inconsistent and often "unprofessional." Another source like AllRovi has descriptions written by a handful of professional writer-reviewers subject to some kind of editorial overview. So the system will be configured to get people from IMDb and [Description] (or [Description.Rovi] if there are to be alternatives) from AllRovi.

Quote

Is it time to split off the metadata/tagging 'engine' from the rest of MC?

I understand how meta data can be conceptualized as a third "level," but what do you mean by this? Why would it be necessary to separate anything?
