A few days ago, I wrote a bit about my IMDb ratings and watching patterns, so I decided to follow up and show how I got the data and how I manipulated it.
1. Getting Data from IMDb
This is the easiest part. When you’re logged in and you rate a movie, it automatically gets added to a list. Click on your name in the top right, then “Your Ratings”, and use the export option on that page.
A download should start, with a file named ratings.csv that should look like this:
The structure is fairly simple, and it’s worth making a note of how the information is laid out, because the parsing code below refers to each column by its position.
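A quick way to make that note is to print each column name next to its index. This is only a sketch: the header line here is an assumption, reconstructed from the column positions used in the parsing code, so check it against the first line of your own ratings.csv.

```php
<?php
// Assumed header line; the real one is the first line of your ratings.csv.
$header = 'Const,Your Rating,Date Rated,Title,URL,Title Type,IMDb Rating,'
        . 'Runtime (mins),Year,Genres,Num Votes,Release Date,Directors';

// Split the header into column names (delimiter, enclosure and escape passed explicitly).
$columns = str_getcsv($header, ',', '"', '\\');

// Print "index => name" so the numeric positions are easy to look up later.
foreach ($columns as $i => $name) {
    echo $i, ' => ', $name, PHP_EOL;
}
```

With this layout, index 1 is your own rating, 2 the date you rated the title, and 12 the directors.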
2. Parsing the IMDb data
I’ve decided to do the parsing using PHP, as it’s the language I’m most familiar with. You can use Python, Go, or whatever else you know. Experiment, have fun!
First we need to initialize our variables, to have our code error-free. We’ll declare the “day of the week” variable a bit more explicitly, because otherwise the days will be scrambled and the graph will be a bit harder to order. Put Sunday at the beginning or at the end of the week, depending on your country’s calendar custom.
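The initialization could look something like the sketch below. The variable names are the ones the parsing loop uses, but the exact declarations are an assumption, and the three dates at the end are made up just to show that the weekday keys keep their declared order.

```php
<?php
// Hypothetical initialization; names match the ones used in the parsing loop.
$index = 0;        // row counter, row 0 is the CSV header
$totalruntime = 0; // running sum of runtimes, in minutes

// Plain counters start empty; their keys are created as values show up.
$my_ratings = $imdb_ratings = $types = [];
$rated_year = $rated_month = $rated_day = [];
$release_years = $allgenres = $alldirectors = [];

// Declared explicitly so the keys stay in calendar order.
// Move 'Sunday' to the front if your calendar starts the week there.
$rated_dotw = array_fill_keys(
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
    0
);

// Quick check with three made-up rating dates, deliberately out of order.
foreach (['2019-03-17', '2019-03-15', '2019-03-11'] as $date) {
    $rated_dotw[date('l', strtotime($date))]++;
}
print_r($rated_dotw);
```

Counters that are not pre-seeded still work, but PHP warns about the undefined key on the first increment, which is worth keeping in mind if the script's output goes straight into a file.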
$csv = array_map('str_getcsv', file('path/to/ratings.csv'));

foreach ($csv as $data) {
    if ($index > 0) { // we're skipping the first line, the CSV header
        $my_ratings[$data[1]]++;   // your rating
        $imdb_ratings[$data[6]]++; // IMDb rating
        $types[$data[5]]++;        // title type
        $totalruntime += (int) $data[7]; // cast, in case the runtime is empty

        // break the "date rated" column into the parts we want to chart
        $rating_dotw = date('l', strtotime($data[2]));
        $rated_dotw[$rating_dotw]++;
        $rating_year = date('Y', strtotime($data[2]));
        $rated_year[$rating_year]++;
        $rating_month = date('n', strtotime($data[2]));
        $rated_month[$rating_month]++;
        $rating_day = date('j', strtotime($data[2]));
        $rated_day[$rating_day]++;

        $release_year = date('Y', strtotime($data[11]));
        $release_years[$release_year]++;

        // genres come as a comma-separated list
        $genres = explode(', ', $data[9]);
        foreach ($genres as $genre) {
            $allgenres[$genre]++;
        }

        // directors are comma-separated too, and can be missing
        if ($data[12] != '') {
            $directors = explode(', ', $data[12]);
            foreach ($directors as $director) {
                $alldirectors[$director]++;
            }
        }
    }
    $index++; // increase the index, so we'll know later how many items we have
}
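To see what a single row contributes, here is a small sketch that runs the same date() and explode() calls on one made-up row. The row contents (title, ID, names) are hypothetical, and the column order is assumed to match the indexes used in the loop.

```php
<?php
// One hypothetical row, in the column order the parsing loop assumes.
$data = [
    'tt0000000', '8', '2019-03-15', 'Example Title',
    'https://www.imdb.com/title/tt0000000/', 'movie', '7.5', '120',
    '2001', 'Drama, Thriller', '1000', '2001-06-01', 'Director A, Director B',
];

$rated = strtotime($data[2]); // "date rated" as a Unix timestamp

$parts = [
    'weekday'   => date('l', $rated), // full English day name
    'year'      => date('Y', $rated), // four-digit year
    'month'     => date('n', $rated), // month number without leading zero
    'day'       => date('j', $rated), // day of month without leading zero
    'genres'    => explode(', ', $data[9]),
    'directors' => explode(', ', $data[12]),
];

print_r($parts);
```

Each of these values becomes a key in one of the counter arrays, which is why the month and day use the formats without leading zeros: they sort cleanly as numbers later.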
Then we just need to do a bit of cleanup: we sort the arrays by key (day, rating value, etc.) and iterate over the directors array again to keep only the ones with at least 5 movies (the full list was pretty big). Note that we don’t need to sort $rated_dotw, as we already ordered it when we declared it.
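A minimal sketch of that cleanup, using small made-up arrays in place of the real counters. Sorting the kept directors by count with asort() is an assumption on my part; the threshold of 5 comes from the text.

```php
<?php
// Made-up sample data standing in for the counters built by the parsing loop.
$my_ratings   = [8 => 3, 6 => 1, 10 => 2];
$alldirectors = ['Director A' => 7, 'Director B' => 2, 'Director C' => 5];

// Sort by key so the chart axis runs in order: 6, 8, 10.
ksort($my_ratings);

// Keep only the directors with at least 5 rated titles.
$famousdirectors = [];
foreach ($alldirectors as $director => $count) {
    if ($count >= 5) {
        $famousdirectors[$director] = $count;
    }
}

// Sort by count, lowest first, keeping the names as keys.
asort($famousdirectors);

print_r($my_ratings);
print_r($famousdirectors);
```

The same ksort() call would be repeated for the other keyed arrays (years, months, days), skipping the weekday array, which is already in order.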
We then add all the arrays to a single variable and output it as a json_encoded string that we can use later on the website. We also reverse the $famousdirectors array, so the most prolific directors come first.
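Roughly, this step could look like the sketch below. The sample values and the key names in $output are hypothetical; use whatever names your chart script expects.

```php
<?php
// Made-up sample data standing in for the cleaned-up counters.
$my_ratings      = [6 => 1, 8 => 3];
$rated_dotw      = ['Monday' => 2, 'Tuesday' => 0];
$famousdirectors = ['Director C' => 5, 'Director A' => 7]; // lowest count first

// Reverse so the most prolific director comes first (string keys are kept).
$famousdirectors = array_reverse($famousdirectors);

// Collect everything in one array; the key names here are hypothetical.
$output = [
    'my_ratings' => $my_ratings,
    'rated_dotw' => $rated_dotw,
    'directors'  => $famousdirectors,
];

// Print the JSON so the shell can redirect it into a file.
$json = json_encode($output);
echo $json;
```

Because the script only echoes the JSON, anything else it prints (including PHP warnings) ends up in the same file, so it's worth running it once without the redirect to check the output is clean.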
We output the result this way because it’s very easy to script and add to your Hugo (or whatever site) build system, running it with something like:
php imdb-data.php > static/imdb-ratings.json
The resulting file, in my case, can be found at /imdb-ratings.json. If you want, you can make it readable using JSONLint.
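If you'd rather not paste the file into an online tool, PHP can do the formatting itself with the JSON_PRETTY_PRINT flag. A minimal sketch, with a tiny made-up array in place of the real data:

```php
<?php
// Tiny made-up array standing in for the real output data.
$output = ['my_ratings' => [6 => 1, 8 => 3]];

// JSON_PRETTY_PRINT adds line breaks and indentation to the encoded string.
$pretty = json_encode($output, JSON_PRETTY_PRINT);

echo $pretty, PHP_EOL;
```

The pretty-printed file is larger, so it makes more sense for debugging than for the version the site actually serves.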
3. Displaying the data
In order to display the data, I’ve used the same code that I’m using for my site statistics page.
In the markdown file, I use the shortcode:
{{< graph graph_name >}}
In the shortcodes folder, I have a file named “graph.html” which contains:
{{- if .Get "id" -}}
<div class="graph"><canvas id={{ .Get "id" }}></canvas></div>
{{- else -}}
<div class="graph"><canvas id={{ .Get 0 }}></canvas></div>
{{- end -}}
The magic happens, though, in the stats_imdb.js file, which I won’t include in the article because you can see it uncompressed at /js/stats_imdb.js. Keep in mind that it uses the Chart.js library from chartjs.org, in a fairly old version (2.8.0 in my case), because it does its job for now.