INTERACT FORUM
Media Center 15 (Development Ended) => Topic started by: glynor on March 08, 2011, 10:25:54 am
-
You guys might just want to grab this (http://blog.echonest.com/post/3639160982/million-song-dataset), just in case it could come in handy:
For far too long, researchers and engineers working on Music Information Retrieval (MIR) have been forced to pay a hefty ante before being able to conduct their research: namely, they’ve had to build a set of data on which to test their theories and hone their algorithms.
It may have started as a flippant suggestion for how to solve that problem, but The Million Song Dataset is now real, and anyone can download it. A collaboration between The Echo Nest and Columbia University’s LabROSA department (Laboratory for the Recognition and Organization of Speech and Audio), The Million Song Dataset has four main objectives:
* To encourage research on algorithms that scale to commercial sizes
* To provide a reference dataset for evaluating research
* As a shortcut alternative to creating a large dataset with The Echo Nest’s API
* To help new researchers get started in the MIR field.
The Million Song Dataset offers researchers, engineers and commercial developers detailed sonic and cultural attributes for each song, as well as extensive metadata, both provided by The Echo Nest.
-
Its size is 280 GB! ...or about 180 GB if you extrapolate from the sample file (1.8 GB for 1% of the data, 10,000 random songs). The sample is said to be compressed (odd that such loose data doesn't compress more).
MC should be able to store a library of 1,000,000 extensively tagged songs in about 150 MB, based on 15 MB per 100,000 files, which is about right for me. In my experience a zipped library backup of a 15 MB database is about 6 MB, so the complete 1 million song db should fit in a 60 MB delivery package.
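For anyone who wants to check the arithmetic, here is a rough sketch of the two estimates above. The function names are made up for illustration, and it assumes size scales linearly with the number of songs, which is only an approximation.

```python
# Back-of-the-envelope size estimates, using only the figures quoted in this post.
# Assumption: size grows linearly with the number of songs.

def scale_size(known_size, known_songs, target_songs):
    """Linearly extrapolate a size measured for known_songs to target_songs."""
    return known_size * target_songs / known_songs

def zipped_size(size_mb, zip_backup_mb, zip_source_mb):
    """Apply an observed zip ratio (backup size / source size) to another size."""
    return size_mb * zip_backup_mb / zip_source_mb

TOTAL_SONGS = 1_000_000

# Sample file: 1.8 GB for 10,000 songs (1% of the dataset).
full_dataset_gb = scale_size(1.8, 10_000, TOTAL_SONGS)

# MC library: about 15 MB per 100,000 files.
mc_library_mb = scale_size(15, 100_000, TOTAL_SONGS)

# Observed zip ratio: a 15 MB database zips to about 6 MB.
mc_package_mb = zipped_size(mc_library_mb, 6, 15)

print(f"Full dataset estimate: {full_dataset_gb:.0f} GB")
print(f"MC library estimate:   {mc_library_mb:.0f} MB")
print(f"Zipped MC package:     {mc_package_mb:.0f} MB")
```

The 280 GB figure is the stated download size, so the gap between it and the extrapolated figure is not something this calculation explains.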
-
I just don't understand why they aren't distributing it via BitTorrent.
-
It seems to me that this is worth having for the metadata alone. I've jumped on it. So much for having just 37,000+ songs. I see this as a great opportunity to be able to preview music before buying. I still plan to stick to lossless.
-
Does this actually include the songs? It sounds like it only includes the metadata and links to 30-second samples.
I could see JRiver using this to benefit their search and track lookup processes, but other than that, I don't see a good use for the "average" user.
However, I could certainly just be missing it :)
-
Nope. You got it. It is a developer tool.
-
Thanks for clarifying. It seems awfully big for metadata.
I wonder if/hope J River can use it to augment YADB and/or music fingerprinting/ID.
-
Thanks for clarifying.
I believe that it will help identify new music that I may not be familiar with. I haven't opened it yet but I am looking forward to seeing what they have developed. I agree, it would be great if this was available through JRiver lookup processes.
Tunetyme
-
The Performer Store inside MC has about 8 million tracks. It's free to play the samples.
If you find something you like, you can click on the $ sign to buy a high quality MP3 track or even a CD from Amazon.