Overview

In the last decade, the size of digital music collections has increased dramatically. The capacity of MP3 players has grown from a dozen songs to 40,000 songs or more. Online music stores offer millions of songs for sale at a dollar per song, and digital music subscription services offer unlimited access to millions of songs for a few dollars per month.

Even though music collections have grown, the tools offered to music consumers for finding music have not changed much. Consumers still browse by genre or search by artist, album or song title, just as they used to do in a record store. As collections get larger, these primitive search tools make it much harder for people to find music, especially new music that they will like.

The goal of the Search Inside the Music (SITM) project is to explore new methods of analyzing, categorizing, indexing and organizing large collections of music, so that we can build more effective tools to explore, discover and recommend music. The project extends music search to search 'inside the music': not just titles, keywords and artists, but music content and context. We want to help people find and organize their music based on all of its properties, including acoustic similarity, mood, lyrics, musical theme, melody, tempo, rhythm and instrumentation.

We are currently focusing on two areas:
- using social data to recommend and organize music based on the listening habits of people with similar musical tastes, and
- using signal processing and machine learning models to 'autotag' new or unpopular music.

Mapping Audio onto Words

One major goal of SITM is to build a machine learning model that can generate useful, descriptive words (which we call 'autotags') by "listening" to audio. The resulting word set can be used to measure similarity among songs and artists.
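To make the similarity idea concrete, here is a minimal sketch. It assumes each song's autotags are stored as a map from word to score (for example, a classifier's output) and uses cosine similarity between those word-score vectors, which is one common choice rather than the project's documented measure. The function names and the toy scores are hypothetical.

```python
import math


def cosine_similarity(tags_a: dict[str, float], tags_b: dict[str, float]) -> float:
    """Cosine similarity between two sparse word -> score vectors."""
    # Only words present in both tag sets contribute to the dot product.
    dot = sum(score * tags_b[word] for word, score in tags_a.items() if word in tags_b)
    norm_a = math.sqrt(sum(s * s for s in tags_a.values()))
    norm_b = math.sqrt(sum(s * s for s in tags_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty or all-zero tag set is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query: str, library: dict[str, dict[str, float]], k: int = 3) -> list[tuple[str, float]]:
    """Rank every other item in `library` by cosine similarity to `query`."""
    ranked = [
        (name, cosine_similarity(library[query], tags))
        for name, tags in library.items()
        if name != query
    ]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy, made-up autotag scores for three hypothetical songs.
library = {
    "song_a": {"bluegrass": 0.9, "sad": 0.4, "gentle": 0.6},
    "song_b": {"bluegrass": 0.8, "gentle": 0.5, "americana": 0.7},
    "song_c": {"metal": 0.9, "aggressive": 0.8},
}
print(most_similar("song_a", library, k=2))
```

Because autotags and social tags are both just words, the same map representation could hold either. Blending them, for example through a weighted sum of the two score maps before computing similarity, would be a design choice rather than something this sketch prescribes.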
Autotags can also be mixed with other descriptive word sets, such as tags gathered from social websites like Last.fm.

As an example, consider an old version of a song Doug Eck wrote 15 years ago, Keep the Change (mp3). (We use Doug's music because the only time anyone listens to it is when we link to it in scientific demos. Also, his music was not used to build our models, so it is a fair test of the model.) Here are the most relevant results for genre and emotion ("emotion" here construed loosely):

| Rank | Genre words | Emotion words |
|------|-------------|---------------|
| 1 | bluegrass | beautiful |
| 2 | irish | sad |
| 3 | slowcore | gentle |
| 4 | indie pop | melancholy |
| 5 | alt-country | sexy |
| 6 | americana | relaxing |

Are these good? It's hard to say. We can certainly find bad words a bit further down the list ("female vocalist", for example), but overall we are pleased to see these words come out of our models.

The paper "Automatic generation of social tags for music recommendation" has technical details on how these models work. Here is a short summary. We assign each word in our vocabulary its own machine learning model, specifically a large-margin ensemble learner (AdaBoost or FilterBoost) that maps features from 5-second segments of audio onto a label such as americana. During training, positive examples for a given word are selected from our audio database by mining social tag data; in other words, we use the words applied by social taggers to generate our machine learning training sets. For example, the positive examples for the word americana are drawn from the songs most often tagged americana by Last.fm users. We take the audio for these songs from our lab database and train a classifier to find the features in the audio needed to "hear" whether a song is americana or not. Training is done over 5-second segments of audio.
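The per-word training described above, together with the song-level averaging described in the text that follows, can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the models from the paper. It uses discrete AdaBoost with single-feature decision stumps as weak learners, treats songs never tagged with the word as negatives, and replaces real audio features with synthetic Gaussian data. Scaling the averaged vote by the total stump weight is also our own choice. The helper names (`pick_examples`, `fit_stump`, `adaboost_train`, `song_score`) and the toy tag counts are hypothetical.

```python
import numpy as np


def pick_examples(tag_counts: dict[str, int], n: int) -> tuple[list[str], list[str]]:
    """Hypothetical helper: the n songs most often tagged with the word are positives;
    n songs never tagged with it are negatives (an assumption, not the paper's rule)."""
    ranked = sorted(tag_counts, key=tag_counts.get, reverse=True)
    positives = ranked[:n]
    negatives = [song for song in ranked if tag_counts[song] == 0][:n]
    return positives, negatives


def stump_predict(X, feature, threshold, polarity):
    """A decision stump: +1 on one side of a single-feature threshold, -1 on the other."""
    return np.where(polarity * (X[:, feature] - threshold) > 0, 1, -1)


def fit_stump(X, y, w):
    """Exhaustively find the stump with the lowest weighted error."""
    best = (0, 0.0, 1, np.inf)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                err = w[stump_predict(X, feature, threshold, polarity) != y].sum()
                if err < best[3]:
                    best = (feature, threshold, polarity, err)
    return best


def adaboost_train(X, y, n_rounds=10):
    """Discrete AdaBoost over segment features X with labels y in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))  # start with uniform segment weights
    ensemble = []
    for _ in range(n_rounds):
        feature, threshold, polarity, err = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid division by zero and log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this stump
        pred = stump_predict(X, feature, threshold, polarity)
        w *= np.exp(-alpha * y * pred)  # up-weight misclassified segments
        w /= w.sum()
        ensemble.append((alpha, feature, threshold, polarity))
    return ensemble


def song_score(ensemble, segments):
    """Average the per-segment ensemble vote over a song, scaled to [-1, 1]."""
    total = sum(alpha for alpha, *_ in ensemble)
    votes = sum(alpha * stump_predict(segments, f, t, p) for alpha, f, t, p in ensemble)
    return float(np.mean(votes) / total)


# Toy data for one word: tag counts plus 12 segments x 5 features per song.
rng = np.random.default_rng(0)
tag_counts = {f"pos{i}": 100 - i for i in range(5)}
tag_counts.update({f"neg{i}": 0 for i in range(5)})
features = {s: rng.normal(2.0 if s.startswith("pos") else -2.0, 1.0, size=(12, 5)) for s in tag_counts}

positives, negatives = pick_examples(tag_counts, n=5)
X = np.vstack([features[s] for s in positives + negatives])
y = np.concatenate([np.ones(12 * len(positives)), -np.ones(12 * len(negatives))])
model = adaboost_train(X, y, n_rounds=10)

new_pos = rng.normal(2.0, 1.0, size=(12, 5))
new_neg = rng.normal(-2.0, 1.0, size=(12, 5))
print(song_score(model, new_pos), song_score(model, new_neg))
```

Training one such model per vocabulary word and keeping the highest-scoring words for a song could yield a ranked word list of the kind shown in the genre and emotion table.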
To label an entire song with a particular word, we average that word's predictions over all 5-second segments of the song. By training many individual word models, we can build a set of relevant words for a song, album or artist. To summarize: a set of non-linear classifiers takes audio as input and generates relevant words as output.

Very broadly speaking, the features used to train the model are sensitive to rhythm and meter (autocorrelation), pitch (spectrum), and musical timbre/instrumentation (cepstrum).

Figure: features used to train the model: autocorrelation (top), spectrum (middle), cepstrum (bottom).

Autotag similarity

We use the term "autotag" for a word generated by our machine learning models. Once we have generated a set of autotags for a music collection, we can use them to measure song, album and artist similarity. Because autotags are words like any other, we can easily blend them with evidence from other data sources such as Last.fm and Wikipedia.

Here are a few examples of what it sounds like to navigate the resulting space of similar artists. We used Isomap, a dimensionality reduction technique, to build a nearest-neighbor graph of artists, and then found the shortest path joining two artists. We sampled 5-second segments from songs by the artists along each path and chained them together in an MP3 file. These demos are a bit rough because, for each artist, we took 5 random seconds from a randomly selected song. There are many better ways to do this, such as choosing the song and the 5-second excerpt more carefully.

- Beethoven to The Prodigy (mp3)
- Coltrane to System of a Down (mp3)
- Mozart to Nirvana (mp3)

Visualization

Another goal of Search Inside the Music is to create new ways to help people explore and discover new music.
In particular, we have explored using interactive 3D visualizations of a music similarity space to let listeners explore their music collections, receive recommendations for new music, generate interesting and coherent playlists, and interact with the album artwork of a collection. The resulting user interface is arguably more engaging and enjoyable to use than currently available interfaces. More details can be found in the paper "Using 3D Visualizations to Explore and Discover Music".

People and Places

Search Inside the Music is a project of Sun Labs, Burlington, MA.

Current Team:
- Paul Lamere is a Senior Staff Engineer at Sun Labs, Burlington, MA, and is the Principal Investigator of the project.
- Douglas Eck was a Visiting Professor at Sun Labs in 2007 and has since returned to University of Montreal. He remains involved in the project, working on machine learning algorithm development.
- Francois Maillet is a summer 2008 intern at Sun Labs and a Master's student at University of Montreal with Douglas Eck.
We collaborate closely with the Advanced Search Technology group in Sun Labs, which includes:
- Steve Green, Senior Staff Engineer at Sun Labs, Burlington, MA, and Principal Investigator of the Advanced Search Technology group.
- Jeff Alexander, Sun Labs, Burlington, MA and member of the Advanced Search Technology group.
Alumni:
- Sten Anderson, an independent contractor, contributed to the 3D visualizations of the music space.
- Thierry Bertin-Mahieux was an intern at Sun Labs in summer 2007 and is about to finish his Master's at University of Montreal with Douglas Eck.
- Rebecca Fiebrink was an intern at Sun Labs in 2006 and is now a Ph.D. student at Princeton with Perry Cook.
- Kris West was an intern at Sun Labs in 2005.