INTERACT FORUM

More => Old Versions => Media Center 15 (Development Ended) => Topic started by: glynor on March 08, 2011, 10:25:54 am

Title: Free 1 Million Song Data Set
Post by: glynor on March 08, 2011, 10:25:54 am
You guys might just want to grab this (http://blog.echonest.com/post/3639160982/million-song-dataset), just in case it could come in handy:

Quote
For far too long, researchers and engineers working on Music Information Retrieval (MIR) have been forced to pay a hefty ante before being able to conduct their research: namely, they’ve had to build a set of data on which to test their theories and hone their algorithms.

It may have started as a flippant suggestion for how to solve that problem, but The Million Song Dataset is now real, and anyone can download it. A collaboration between The Echo Nest and Columbia University’s LabROSA department (Laboratory for the Recognition and Organization of Speech and Audio), The Million Song Dataset has four main objectives:

    * To encourage research on algorithms that scale to commercial sizes
    * To provide a reference dataset for evaluating research
    * As a shortcut alternative to creating a large dataset with The Echo Nest’s API
    * To help new researchers get started in the MIR field.

The Million Song Dataset offers researchers, engineers and commercial developers detailed sonic and cultural attributes for each song, as well as extensive metadata, both provided by The Echo Nest.
Title: Re: Free 1 Million Song Data Set
Post by: Alex B on March 08, 2011, 11:38:09 am
Its size is 280 GB! ...or 180 GB if you extrapolate from the 1.8 GB sample file (1% of the set, 10,000 random songs). The sample is said to be compressed (odd that such loose data doesn't compress more).

MC should be able to store a library of 1,000,000 extensively tagged songs in about 150 MB, based on 15 MB per 100,000 files, which is about right for my library. In my experience, a zipped library backup of a 15 MB database is about 6 MB, so the complete 1 million song database should fit in a 60 MB delivery package.
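
The arithmetic behind these estimates can be sketched in a few lines of Python. This is only a rough check that assumes sizes scale linearly with song count; the function names are made up for illustration and are not part of MC or the dataset tools.

```python
# Back-of-the-envelope check of the sizes quoted in this post.
# All inputs are the figures mentioned above; linear scaling is an assumption.

TOTAL_SONGS = 1_000_000


def extrapolate_dataset_gb(sample_gb: float, sample_songs: int,
                           total_songs: int = TOTAL_SONGS) -> float:
    """Scale the size of the sample download up to the full dataset."""
    return sample_gb * total_songs / sample_songs


def library_mb(songs: int, mb_per_100k: float = 15.0) -> float:
    """Estimate library size from an observed 15 MB per 100,000 files."""
    return songs / 100_000 * mb_per_100k


def zipped_mb(db_mb: float, ref_db_mb: float = 15.0,
              ref_zip_mb: float = 6.0) -> float:
    """Estimate zipped backup size from an observed 6 MB zip of a 15 MB database."""
    return db_mb * ref_zip_mb / ref_db_mb


if __name__ == "__main__":
    dataset_gb = extrapolate_dataset_gb(sample_gb=1.8, sample_songs=10_000)
    db_mb = library_mb(TOTAL_SONGS)
    print(f"Dataset extrapolated from the sample: {dataset_gb:.0f} GB")
    print(f"Estimated library for 1M songs: {db_mb:.0f} MB")
    print(f"Estimated zipped backup: {zipped_mb(db_mb):.0f} MB")
```

Changing the defaults lets you plug in the ratios from your own library backup instead of mine.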
Title: Re: Free 1 Million Song Data Set
Post by: glynor on March 08, 2011, 12:45:11 pm
I just don't understand why they aren't distributing it via bittorrent.
Title: Re: Free 1 Million Song Data Set
Post by: tunetyme on March 08, 2011, 01:27:14 pm
It seems to me that this is worth having for the metadata alone.  I've jumped on it.  So much for having just 37,000+ songs.  I see this as a great opportunity to be able to preview music before buying.  I still plan to stick to lossless.
Title: Re: Free 1 Million Song Data Set
Post by: JustinChase on March 09, 2011, 06:13:01 pm
Does this actually include the songs?  It sounds like it only includes the metadata and links to 30-second samples.

I could see JRiver using this to benefit their search and track lookup processes, but other than that, I don't see a good use for the "average" user.

However, I could certainly just be missing it :)
Title: Re: Free 1 Million Song Data Set
Post by: glynor on March 09, 2011, 08:27:43 pm
Nope.  You got it.  It is a developer tool.
Title: Re: Free 1 Million Song Data Set
Post by: JustinChase on March 09, 2011, 11:07:28 pm
Thanks for clarifying. It seems awfully big for metadata.

I wonder if/hope J River can use it to augment YADB and/or music fingerprinting/ID.

Title: Re: Free 1 Million Song Data Set
Post by: tunetyme on March 11, 2011, 08:28:01 am
Thanks for clarifying. 

I believe that it will help identify new music that I may not be familiar with.  I haven't opened it yet but I am looking forward to seeing what they have developed.  I agree, it would be great if this was available through JRiver lookup processes.

Tunetyme
Title: Re: Free 1 Million Song Data Set
Post by: JimH on March 11, 2011, 08:37:34 am
The Performer Store inside MC has about 8 million tracks.  It's free to play the samples.

If you find something you like, you can click on the $ sign to buy a high quality MP3 track or even a CD from Amazon.