Hi there, my name is Robby Macdonell, I'm a user experience designer and sometimes front-end developer. I live in Seattle, WA but grew up in The South (and spent time in Alaska and Massachussetts in between).

Getting data for a visualization design with Nokogiri, mysql, and R

I recently did some analysis of my Last.fm listening history to figure out what my ideal rainy fall mix would be (see final list here). It was fun, and I learned a couple cool things in the process.

Step one, Get the data:

First things first, I needed to get a list of all the songs I've ever scrobbled on Last.fm, and some historical weather data to mash it up with.

Last.fm gives you some fantastic data feeds, but there's no way to export your entire listening history. And for all the weather sites out there, I couldn't track down a single one that would give you historical data in a structured format (like JSON or even CSV).

However, both Last.fm and Weather Underground had an HTML interface for browsing the data I wanted, page-by-page, in small pieces.

The thought of writing a scraper for these sites seemed pretty daunting, but it actually turned out to be REALLY easy. Using this post as a guide, I wrote a couple of small Ruby scripts using Nokogiri and FasterCSV to scrape the pages and save the relevant data to a .csv file.

disclaimer: this code isn't pretty, by any means, but it was easy and it worked.

See the scraper for Last.fm

See the scraper for Weather Underground

Once I had those written, I could just do this in an IRB sesion:

> require 'LastFM'
> require 'WeatherData'
> LastFM.get_tracks
> WeatherData.get_weather

All in all, it took me about 45 minutes to wrap my head around Nokogiri and hack together something that got the data I wanted from both sites. Not too bad. Way easier than I expected it to be, anyway.

STEP TWO: Get rid of the sunny good-weather stuff

Now, I needed to filter it, so I dumped both csv files into mysql and ran this query. I think it's pretty self explanatory:

SELECT s.artist, s.song, w.conditions FROM songs s JOIN weather w ON w.date = s.date WHERE w.conditions IN('Rain','Partly Cloudy','Scattered Clouds','Snow','Overcast','Fog','Mostly Cloudy') AND MONTH(s.date) IN(9,10,11,12)')

Now, I've got over 9,000 songs that I listened to between September and December, when it was overcast, raining, or snowing.

STEP THREE: Visualize

I could've done all the analysis with mysql queries, but I know this will probably turn into some sort of visual exploration on the mixtape jacket design, so I exported the results to a CSV and brought them into R.

From there, it was pretty easy to get to what I wanted, the top artists I listed to on rainy days.

fall_song_data = read.table('~/fall-rainy-songs.csv', header= TRUE, sep="\t")
fall_artist_counts <- table(fall_song_data$artist)
pie(fall_artist_counts[1:30])

The first pass at visualizing my fall bands

Not super pretty at this point, but interesting. Also, relieving. I can actually make, in my opinion, a pretty damn good fall mix out of these bands.

There were all sorts of other neat facts I was able to pull out of this data, but this gives a general idea at how easy it was to get the data and do something with it.