A few days ago, I wrote a bit about my IMDb ratings and watching patterns, so I decided to show how I got the data and how I processed it.

1. Getting Data from IMDb

This is the easiest part. When you’re logged in and you rate a movie, it automatically gets added to a list. Click on your name in the top right, then “Your Ratings”, and export the list from that page.

A download should start, with a file named ratings.csv that should look like this:

Const,Your Rating,Date Rated,Title,URL,Title Type,IMDb Rating,Runtime (mins),Year,Genres,Num Votes,Release Date,Directors
tt1001526,10,2014-07-22,Megamind,https://www.imdb.com/title/tt1001526/,movie,7.3,95,2010,"Animation, Action, Comedy, Crime, Family, Mystery, Sci-Fi, Thriller",247221,2010-10-28,Tom McGrath
tt0100405,8,2014-06-24,Pretty Woman,https://www.imdb.com/title/tt0100405/,movie,7.1,119,1990,"Comedy, Romance",310915,1990-03-23,Garry Marshall
tt0100502,7,2014-04-21,RoboCop 2,https://www.imdb.com/title/tt0100502/,movie,5.8,117,1990,"Action, Crime, Sci-Fi, Thriller",84869,1990-06-22,Irvin Kershner
.....................................

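The structure is fairly simple: a header row that names the columns, followed by one row per rated title. Take note of the column order, because the parser below reads each field by its position (index 1 is “Your Rating”, index 2 is “Date Rated”, and so on).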
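
To avoid counting columns by hand, the positions can be listed from the header itself. This is only a small sketch: the header string is copied from the sample export above, and `$columns` and `$column_index` are names made up for this illustration.

```php
<?php
// Header row copied from the sample export above.
$header = 'Const,Your Rating,Date Rated,Title,URL,Title Type,IMDb Rating,Runtime (mins),Year,Genres,Num Votes,Release Date,Directors';

// Split the header into column names; delimiter, enclosure and escape are passed explicitly.
$columns = str_getcsv($header, ',', '"', '\\');

// array_flip() swaps keys and values, giving a "column name => position" lookup.
$column_index = array_flip($columns);

foreach ($column_index as $name => $position) {
	echo $position . ': ' . $name . PHP_EOL;
}
```

The printed positions are the same numbers used as `$data[...]` indexes in the parsing code, so if IMDb ever changes the export layout, this is a quick way to spot which indexes need updating.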

2. Parsing the IMDb data

I’ve decided to do the parsing in PHP, as it’s the language I’m most familiar with. You can just as well use Python, Go, or whatever else you know. Experiment, have fun!

First we initialize our variables, so the code runs without errors. We declare the “day of the week” array explicitly, with the days already in order, because PHP arrays keep their insertion order: otherwise the days would appear in whatever order they first show up in the data, and the graph would be harder to order. Put Sunday at the beginning or at the end of the week, depending on your country’s calendar custom.

$index = 0;
$allmoviesdata = array();
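// Days are declared in display order; each counter starts at 0,
// so a day without any ratings is exported as 0 instead of an empty string
$rated_dotw = array(
	'Monday' => 0,
	'Tuesday' => 0,
	'Wednesday' => 0,
	'Thursday' => 0,
	'Friday' => 0,
	'Saturday' => 0,
	'Sunday' => 0,
);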
$rated_year = array();
$rated_month = array();
$rated_day = array();
$my_ratings = array();
$imdb_ratings = array();
$release_years = array();
$types = array();
$totalruntime = 0;
$allgenres = array();
$alldirectors = array();

Then comes the main part, which is simply parsing the CSV and incrementing the relevant counters for each row.

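$csv = array_map('str_getcsv', file('path/to/ratings.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));
array_shift($csv); // drop the first line, the CSV header
foreach ($csv as $data) {
	// "?? 0" (PHP 7+) creates each counter on first use instead of incrementing an undefined key
	$my_ratings[$data[1]] = ($my_ratings[$data[1]] ?? 0) + 1;
	$imdb_ratings[$data[6]] = ($imdb_ratings[$data[6]] ?? 0) + 1;
	$types[$data[5]] = ($types[$data[5]] ?? 0) + 1;
	$totalruntime += (int) $data[7]; // an empty runtime counts as 0

	$rated = strtotime($data[2]); // parse "Date Rated" once

	$rating_dotw = date('l', $rated); // full day name, e.g. Monday
	$rated_dotw[$rating_dotw]++; // every day was declared up front

	$rating_year = date('Y', $rated);
	$rated_year[$rating_year] = ($rated_year[$rating_year] ?? 0) + 1;

	$rating_month = date('n', $rated); // 1 to 12
	$rated_month[$rating_month] = ($rated_month[$rating_month] ?? 0) + 1;

	$rating_day = date('j', $rated); // 1 to 31
	$rated_day[$rating_day] = ($rated_day[$rating_day] ?? 0) + 1;

	$release_year = date('Y', strtotime($data[11]));
	$release_years[$release_year] = ($release_years[$release_year] ?? 0) + 1;

	$genres = explode(', ', $data[9]); // genres come as a comma-separated list
	foreach ($genres as $genre) {
		$allgenres[$genre] = ($allgenres[$genre] ?? 0) + 1;
	}

	if ($data[12] != '') { // directors are comma-separated too, and may be missing
		$directors = explode(', ', $data[12]);
		foreach ($directors as $director) {
			$alldirectors[$director] = ($alldirectors[$director] ?? 0) + 1;
		}
	}

	$index++; // count the titles (header excluded), so we'll know later how many items we have
}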
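
The `date()` format characters used in the loop are easy to mix up, so here is a small sketch that applies them to one sample value. The date is the “Date Rated” field of the Megamind row above; `$parts` is a name made up for this illustration.

```php
<?php
// Sample "Date Rated" value taken from the Megamind row above.
$timestamp = strtotime('2014-07-22');

$parts = array(
	'day_of_week' => date('l', $timestamp), // full English day name
	'year'        => date('Y', $timestamp), // four-digit year
	'month'       => date('n', $timestamp), // month number without leading zero
	'day'         => date('j', $timestamp), // day of the month without leading zero
);

print_r($parts);
```

Using `n` and `j` rather than `m` and `d` means the month and day keys have no leading zeros, which presumably makes them easier to use directly as chart labels.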

Then we just need a bit of cleanup: we sort most of the arrays by key (rating, year, month, day, and so on) and the directors array by number of titles. We then iterate over the directors array once more, keeping only the directors with at least 5 titles (the full list was pretty big). Note that we don’t need to sort $rated_dotw, as we already ordered it when we declared it.

ksort($imdb_ratings);
ksort($my_ratings);
ksort($release_years);
ksort($rated_year);
ksort($rated_month);
ksort($rated_day);
ksort($types);
ksort($allgenres);
asort($alldirectors);
$famousdirectors = array(); // initialized so it exists even if no director reaches 5 titles
foreach($alldirectors as $director=>$count) {
	if($count > 4) {
		$famousdirectors[$director] = $count;
	}
}
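
As a possible alternative to `asort()` plus a manual loop, the same filtering can be sketched with `array_filter()` and `arsort()`. The director names and counts below are made up purely for illustration; this is not the code the site runs.

```php
<?php
// Made-up counts, purely for illustration.
$alldirectors = array(
	'Director A' => 7,
	'Director B' => 2,
	'Director C' => 5,
	'Director D' => 9,
);

// Keep only directors with at least 5 titles; array_filter() preserves the keys.
$famousdirectors = array_filter($alldirectors, function ($count) {
	return $count >= 5;
});

// arsort() sorts by value, highest first, and keeps the director names as keys.
arsort($famousdirectors);

print_r($famousdirectors);
```

Because `arsort()` already puts the highest count first, this variant would not need a separate reversal step afterwards.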

We then add all the arrays to a single variable and output it as a JSON-encoded string that the website can load later. We also reverse the $famousdirectors array, which was sorted in ascending order, so the most prolific directors come first.

$alldata = array();
$alldata['alldirectors'] = array_reverse($famousdirectors);
$alldata['allgenres'] = $allgenres;
$alldata['types'] = $types;
$alldata['rated_dotw'] = $rated_dotw;
$alldata['rated_year'] = $rated_year;
$alldata['rated_month'] = $rated_month;
$alldata['rated_day'] = $rated_day;
$alldata['release_years'] = $release_years;
$alldata['imdb_ratings'] = $imdb_ratings;
$alldata['my_ratings'] = $my_ratings;
$alldata['totalruntime'] = $totalruntime;
$alldata['counttitles'] = $index;

// utf8ize() is not a PHP built-in, so it has to be defined elsewhere in the script
echo json_encode(utf8ize($alldata));
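
The `utf8ize()` helper is called above but not shown in this article. What follows is a hypothetical stand-in, not the original function: it assumes the mbstring extension is available and that any string which is not valid UTF-8 is ISO-8859-1. `json_encode()` fails on invalid UTF-8, which is presumably why such a helper is there.

```php
<?php
// Hypothetical stand-in for utf8ize(); the real helper is not shown in the article.
// Assumes the mbstring extension and that non-UTF-8 input is ISO-8859-1.
function utf8ize($value) {
	if (is_array($value)) {
		foreach ($value as $key => $item) {
			$value[$key] = utf8ize($item); // recurse into nested arrays (keys are left as-is)
		}
		return $value;
	}
	if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
		return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
	}
	return $value; // numbers, booleans, null and valid UTF-8 strings pass through
}

// Sample values in the same shape as part of $alldata.
$sample = array('totalruntime' => 331, 'types' => array('movie' => 3));
echo json_encode(utf8ize($sample)) . PHP_EOL;
```

Checking the encoding first avoids double-encoding strings that are already valid UTF-8, which is likely the case for a recent IMDb export.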

We output the result this way because it is very easy to script and to add to your Hugo (or any other site) build, by running something like:

php imdb-data.php > static/imdb-ratings.json

The resulting file, in my case, can be found at /imdb-ratings.json. If you want, you can make it readable using JSONLint.
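
If you would rather have the file readable straight out of the script, `json_encode()` accepts flags for that. This is a minimal sketch with a made-up sample array; the script described here does not use these flags.

```php
<?php
// Small sample in the same shape as part of the generated data.
$alldata = array(
	'types'       => array('movie' => 3),
	'counttitles' => 3,
);

// JSON_PRETTY_PRINT adds line breaks and indentation;
// JSON_UNESCAPED_UNICODE keeps accented names as-is instead of \uXXXX escapes.
$json = json_encode($alldata, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);

echo $json . PHP_EOL;
```

The trade-off is a somewhat larger file, which may or may not matter for a static site.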

3. Displaying the data

In order to display the data, I’ve used the same code that I’m using for my site statistics page.

In the Markdown file, I use the shortcode with the graph name as its argument:

{{< graph graph_name >}}

In the shortcodes folder, I have a file named “graph.html” which contains:

{{- if .Get "id" -}}
<div class="graph"><canvas id="{{ .Get "id" }}"></canvas></div>
{{- else -}}
<div class="graph"><canvas id="{{ .Get 0 }}"></canvas></div>
{{- end -}}

The real work happens in the stats_imdb.js file, which I won’t include in the article because you can see it uncompressed at /js/stats_imdb.js. Keep in mind that it uses the Chart.js library from chartjs.org, and a fairly old version of it (2.8.0), because that does the job for now.