未加星标

Building Spotify's Song Radio in 100 lines of Python

字体大小 | |
[开发(python) 所属分类 开发(python) | 发布者 店小二05 | 时间 2017 | 作者 红领巾 ] 0人收藏点击收藏

Somebody said that when it comes to deep learning, it is still very early days, Like 1995 (I was going to lookup the quote, but I can't find - it was probably Churchill or Mark Twain ). I disagree. The early days are gone, it is more like 1958. Fortran has just been invented . The early days of having to implement the mechanics of neural networks is akin to writing machine code in the Fifties. Platforms like Tensorflow, Theano and Keras let us focus on what we can do, rather than how.


Building Spotify's Song Radio in 100 lines of Python
Marconi the inventor ( from Wikipedia )
Marconi is a demonstration of this. It shows how to build a clone of Spotify's Song R adio in a less than a hundred of python using open source libraries and data easily scraped from the Internet. If you include the code that scrapes and preprocesses the data, it is almost 400 lines, but still.

Pandora was the first company to successfully launch a product that produced playlists based on a song. It did this by employing humans who would listen to songs and characterize each on 450 musical dimensions . It worked well. Once you have all songs mapped into this multi-dimensional space, finding similar songs is a mere matter of finding songs that occupy a similar position in this space.

Employing humans is expensive though (even underpaid musicians). So how do we automatically map songs into a multi-dimensional space? These days the answer has to be Machine Learning. Now you could build some sort of model that really understands music. Probably. It sounds really hard though. Most likely you'd need more than a hundred if not more than a thousand lines of code.

Marconi doesn't know anything about the actual music. It is trained on playlists. The underlying notion here is that similar songs will occur near each other in playlists. When you are building a playlist, you do this based on the mood you are in, or maybe the mood you want to create.

A good analogy here is Word2Vec . Word2Vec learns word similarities by feeding it sentences. After a lot of sentences, it knows that "coffee" and "espresso" are similar, because it notices that in a sentence where you use the one, you might as well expect the other. It even learns deeper relationships between words, for example that the words "king" and "man" have the same relationship as the words "queen" and "man" .

Fascinating stuff. Usefully, the Python library GenSim contains a great implementation of Word2Vec . So if we feed this playlists containing song ids, rather than sentences containing words, it will after a while learn relationships between songs. Suggesting a playlist based on a song becomes than again a straightforward nearest neighbor search.

The trick is collecting the data. How do we get our hands on a large set of playlists? To their great credit, Spotify has a wonderful API that lets you get info on songs, artists and playlists. It does not, however, grant access to a dump of (public) playlists, which would be the ideal input for this project.

The workaround I use, is to search for words in the title of playlists . We start with the word 'a'. This will return a thousand playlists containing the word 'a'. We store those. Then we count for all playlists scraped so far, how often we see any of the words in the titles of those playlists. We pick the word that appears most often that we haven't searched for. Rinse and repeat. So after 'a', you'll see 'the', 'of' etc. After a while 'greatest' and 'hits' appear.

It works and quickly returns a largish set of public playlists. It's not ideal in that it is hardly a natural way to sample playlists. For example, by searching for the most popular words, chances are you'll get the most popular playlists. The playlists returned also seemed very long (hundreds of songs), but maybe that's normal.

Next up: get the tracks that are actually in those playlists. Thanks to Spotify's API, that's quite simple. Just keep calling:

res = session.user_playlist_tracks(owner_id, playlist_id,

fields='items(track(id, name, artists(name, id), duration_ms)),next')

There's a bunch of parsing, boiler plate and handling timeouts etc to make it work in practice, but it's all fairly straight-forward.

Once we have the training data, building the model is also quite easy . I throw out playlists that are too long or that have only songs from one artist and the model is trained with a lowish number of dimensions. To make this accessible online, I import the song vectors into postgres. Recommending music then becomes as simple as this sql statement:

SELECT song_id, song_name, artist_name,

cube_distance(cube(vec), cube(%(mid_vec)s)) as distance

FROM song2vec ORDER BY distance

LIMIT 10;

Where mid_vec is the vector representing the song that was used as an input, or the middle of a set of vectors if multiple songs were provided.

How does it perform? Well, you can tryfor yourself, but I think it works pretty well !

There's a lot of room for more experiments here, I think. Building an artists only model would be a simple extension. Looking at the meta information of the songs and using it to build classifiers might also be interesting. Even more interesting would be to look at the actual music and see if there are features in the wave patterns that we can map onto the song vectors.

本文开发(python)相关术语:python基础教程 python多线程 web开发工程师 软件开发工程师 软件开发流程

主题: PythonWord
分页:12
转载请注明
本文标题:Building Spotify's Song Radio in 100 lines of Python
本站链接:http://www.codesec.net/view/522553.html
分享请点击:


1.凡CodeSecTeam转载的文章,均出自其它媒体或其他官网介绍,目的在于传递更多的信息,并不代表本站赞同其观点和其真实性负责;
2.转载的文章仅代表原创作者观点,与本站无关。其原创性以及文中陈述文字和内容未经本站证实,本站对该文以及其中全部或者部分内容、文字的真实性、完整性、及时性,不作出任何保证或承若;
3.如本站转载稿涉及版权等问题,请作者及时联系本站,我们会及时处理。
登录后可拥有收藏文章、关注作者等权限...
技术大类 技术大类 | 开发(python) | 评论(0) | 阅读(20)