Disclaimer

This is NOT an advertisement for gambling; it’s actually the opposite. That’s why you won’t find links to lottery websites, only some screenshots here and there.

Don’t fall for the scam that the lottery is! This is not scientific research on how the lottery works, but it will show some statistical patterns that occur in real-life lotteries as well.

If you decide to play, never play with money you can’t afford to lose.

I got overwhelmed with news about big lotto jackpots, which reminded me how they usually pump up the advertising around the holidays to get people to play, as the game is usually rigged for someone to win. So I decided to make a Lotto simulator in PHP, to show how many winners there would be.

The type of game I chose is 6 numbers out of 49, as this is the format currently played in Romania and it’s the one most of my readers are familiar with. The Romanian National Lottery company also runs a 5 out of 40 version, but it’s not as popular.

$max_number			= 49;
$numbers_on_ticket 	= 6;

The code part was pretty simple. I created a function that is basically a random array_pop: it randomly plucks numbers from a series. It takes as parameters the $pool of numbers to pick from (in our case, the numbers up to 49) and how many numbers to pick (in our case 6).

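// Randomly pick $qty distinct numbers from the $pool array.
function pick_numbers($pool, $qty) {
	$numbers = array();
	shuffle($pool); // $pool is passed by value, so the caller's array is untouched
	for($i = 1; $i <= $qty; $i++) {
		$picked = array_pop($pool); // take the last number of the shuffled pool
		$numbers[] = $picked;
	}
	return $numbers;
}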

This is used both for picking the winning combination

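// shuffle() needs an array, so the pool is the list 1..$max_number, not the integer itself
$winning_combination = pick_numbers(range(1, $max_number), $numbers_on_ticket);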

and for the randomly played tickets

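$registered_tickets = array();
// $number_of_players is the batch size, set elsewhere (e.g. 100000)
for($i = 1; $i <= $number_of_players; $i++) {
	$registered_tickets[] = pick_numbers(range(1, $max_number), $numbers_on_ticket);
}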

Additionally, for an extra layer of realism, I could use a distribution of played numbers, but I haven’t found a reliable one. Some numbers are definitely played more (3, 7, 12, 33) than others (4, 13, 17), and this also changes a lot from culture to culture: 4, 9 and 43 are considered unlucky by the Chinese, 17 by Italians, 13 by many cultures, and so on. And there are also the crazy people who play unlucky numbers, hoping they’ll be lucky for them.
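If such a distribution were available, picking could be skewed by it. Below is a minimal sketch of weighted picking without repeats. The function name `pick_weighted_numbers` and the weights are made up for illustration; they are not part of the simulator and not real play data.

```php
<?php
// Hypothetical helper, not part of the original simulator.
// $weights maps each number to a relative popularity (assumed values).
function pick_weighted_numbers(array $weights, int $qty): array {
    $numbers = array();
    $qty = min($qty, count($weights)); // cannot pick more numbers than exist
    for ($i = 0; $i < $qty; $i++) {
        $total = array_sum($weights);
        // Random point in [0, $total)
        $target = mt_rand() / (mt_getrandmax() + 1) * $total;
        $picked = null;
        foreach ($weights as $number => $weight) {
            $picked = $number; // falls back to the last number on float rounding
            $target -= $weight;
            if ($target < 0) {
                break;
            }
        }
        $numbers[] = $picked;
        unset($weights[$picked]); // a number can appear only once per ticket
    }
    return $numbers;
}

// Made-up weights: every number starts equal, then a few get nudged.
$weights = array_fill(1, 49, 1.0);
foreach (array(3, 7, 12, 33) as $popular) {
    $weights[$popular] = 1.5;
}
foreach (array(4, 13, 17) as $unpopular) {
    $weights[$unpopular] = 0.7;
}

$ticket = pick_weighted_numbers($weights, 6);
echo implode(', ', $ticket) . "\n";
```

The weights are re-summed after every pick, so removing a number does not distort the odds of the remaining ones.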

What could go wrong?

Then it’s a simple task of checking each played ticket against the winning numbers and seeing how many numbers were guessed on it. For the sake of brevity, I show only tickets with 4 or more guessed numbers, because they are a small share of the winning tickets, and also because in the Romanian Lottery tickets with 4, 5 and 6 numbers pay significantly bigger prizes.

$all_played_numbers = array();
$winners = array(); // number of tickets per count of guessed numbers
$index = 0; // running ticket number
$fourlucky = 0; // tickets with 4 or more guessed numbers
foreach ($registered_tickets as $ticket) {
	$index++;
	foreach($ticket as $number) {
		$all_played_numbers[] = $number;
	}
	$guessed_numbers = array_intersect($ticket, $winning_combination);
	$matches = count($guessed_numbers);
	$winners[$matches] = ($winners[$matches] ?? 0) + 1; // avoids an undefined index on first use
	if($matches > 3) { // show only tickets with 4 numbers and up
		echo "#$index: " . implode(', ', $ticket) . " guessed $matches numbers.<br>";
		$fourlucky++;
	}
}

After that, it’s just a matter of statistics. I take the winners and sort them by numbers guessed and print them in a pretty table.

krsort($winners);
foreach($winners as $key=>$value){
	echo "<tr><th>$key numbers</th><td>$value</td></tr>";
}

Numbers Found    Winning Tickets
6 numbers        2
5 numbers        188
4 numbers        9697
3 numbers        176033
2 numbers        1323360
1 numbers        4130250
0 numbers        4360470
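As a sanity check, the simulated counts can be compared with the textbook odds for a 6 out of 49 draw. The sketch below assumes every ticket is an independent, uniformly random pick; the helper names `choose` and `match_probability` are illustrative and not part of the simulator.

```php
<?php
// Binomial coefficient C(n, k), built up step by step so it stays an exact integer.
function choose(int $n, int $k): int {
    if ($k < 0 || $k > $n) {
        return 0;
    }
    $k = min($k, $n - $k);
    $result = 1;
    for ($i = 1; $i <= $k; $i++) {
        // After this step $result equals C($n - $k + $i, $i), so the division is exact.
        $result = intdiv($result * ($n - $k + $i), $i);
    }
    return $result;
}

// Probability that a random ticket matches exactly $matches winning numbers.
function match_probability(int $matches, int $max_number = 49, int $numbers_on_ticket = 6): float {
    return choose($numbers_on_ticket, $matches)
        * choose($max_number - $numbers_on_ticket, $numbers_on_ticket - $matches)
        / choose($max_number, $numbers_on_ticket);
}

$tickets = 10000000; // same batch size as the table above
for ($k = 6; $k >= 0; $k--) {
    printf("%d numbers: %.1f expected tickets\n", $k, $tickets * match_probability($k));
}
```

Under these assumptions the expected values should land close to the simulated table. For example, about 184.5 tickets are expected to match 5 numbers, against the 188 in the run above.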

I’ve also added a table to show the occurrence of the numbers that were played, but the graph here is pretty flat: given enough random picks, the distribution is fairly equal (in a 10 million ticket run, every number was picked about 1.22 million times, with less than 5.000 picks difference between the least picked and the most picked).
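The 1.22 million figure matches what uniform picking predicts. Assuming N = 10 million independent tickets, each number lands on a given ticket with probability 6/49, so the expected count per number and its standard deviation work out to:

```latex
\[
\begin{aligned}
\mathbb{E}[\text{picks per number}] &= N \cdot \frac{6}{49} = 10^{7} \cdot \frac{6}{49} \approx 1\,224\,490 \\
\sigma &= \sqrt{N \cdot \frac{6}{49} \cdot \frac{43}{49}} \approx 1\,037
\end{aligned}
\]
```

With a standard deviation of roughly a thousand picks, a gap of a few thousand between the least and the most picked of the 49 numbers is about what one would expect from chance alone.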

While doing some research on the lottery’s page, I found out that they’ve shown the lucky numbers distribution since 2004. Maybe one day I’ll skew the random picking of the lucky numbers using their weights.

Also, at the end of the page, I’m showing how long it took to generate the page and how many tickets were generated.
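One simple way to get such a footer is to wrap the generation in `microtime(true)` calls. This is a minimal sketch with an assumed batch of 10.000 tickets and inlined picking, not necessarily how the actual page measures it.

```php
<?php
$start = microtime(true); // timestamp in seconds, as a float

$number_of_players = 10000; // assumed small batch, for illustration only
$registered_tickets = array();
for ($i = 1; $i <= $number_of_players; $i++) {
    $pool = range(1, 49);
    shuffle($pool);
    $registered_tickets[] = array_slice($pool, 0, 6); // first 6 of the shuffled pool
}

$elapsed = microtime(true) - $start;
printf(
    "Generated %s tickets in %.2f seconds.<br>",
    number_format(count($registered_tickets), 0, ',', '.'),
    $elapsed
);
```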


You can test this yourself on a batch of 100.000 played tickets by following this link, or you can follow the links below for already generated simulations using 1 Million and 10 Million tickets. I’ve had to cache these, as the generation time is pretty long (running at about 2 million tickets per minute on my Apple M2 Max).

One Million Tickets (82-90KB each) - Run 1, Run 2, Run 3

Ten Million Tickets (800-900KB each) - Run 1, Run 2, Run 3, Run 4, Run 5

In the 5 simulations I’ve run on 10 million played tickets, the six numbers were guessed only twice, during the same run, so that means the grand prize was split. The other runs did not hit the jackpot out of 10 million tickets. This is somewhat expected, considering that Loteria Nationala reported 80 jackpots since January 1st 2005, which means 80 in 963 weeks (about 8.3%).

In the same period, only four of the 963 draws ended in a split pot, with two tickets guessing all six numbers. There was also the rare case on 10.10.2010, when the jackpot was split three ways.
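For comparison, here is what plain probability says about a single simulated run, assuming N = 10 million independent, uniformly random tickets:

```latex
\[
\begin{aligned}
p &= \frac{1}{\binom{49}{6}} = \frac{1}{13\,983\,816} \\
P(\text{no jackpot in a run}) &= (1 - p)^{N} \approx e^{-Np} = e^{-10^{7}/13\,983\,816} \approx e^{-0.715} \approx 0.489 \\
P(\text{at least one jackpot}) &\approx 1 - 0.489 = 0.511
\end{aligned}
\]
```

So under these assumptions roughly every second run of 10 million tickets should contain at least one six-number ticket. One jackpot run out of five is on the low side, but five runs are too few to read much into it.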

In the end, I’ve gathered a few other observations, which I’ll list below in no particular order:

  • More than 98% of tickets played are not winning (0, 1 or 2 guessed numbers).
  • About 1.7% of tickets played are of the lowest winning category (3 guessed numbers).
  • On the same sample size of 10 Million tickets, four numbers are always guessed fewer than 10.000 times, so that means 1 in 1000 tickets gets a decent prize (what “decent” means will be discussed below).
  • Using the same dataset, 5 numbers were always guessed fewer than 200 times, so that means about 0.002% of players actually get something more consistent.
  • If a medium-skilled developer with basic statistics knowledge could come up, in half an hour of coding, with a system to know which combination was not played, imagine what a multi-billion company that works with money can do. I’m not saying that the game is rigged, I’m just implying it.

Rigging a ticket

I was planning to expand the article by adding some code that takes all the played tickets and finds a combination that wasn’t played on any of them, but statistically, on a batch of 10 million tickets, the wins are so rare that they fall into the “let’s give the suckers a bone to chew, so they won’t think this is rigged” category.
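For the curious, the search for an unplayed combination could be sketched roughly as below. The function names `ticket_key` and `find_unplayed_combination` are hypothetical, and the sketch assumes at least one combination is still unplayed (otherwise the loop would never end).

```php
<?php
// Canonical string key for a ticket, so the order of the numbers does not matter.
function ticket_key(array $ticket): string {
    sort($ticket);
    return implode('-', $ticket);
}

// Hypothetical helper: returns a combination that appears on none of the played tickets.
function find_unplayed_combination(array $registered_tickets, int $max_number = 49, int $numbers_on_ticket = 6): array {
    // Index every played combination for O(1) lookups.
    $played = array();
    foreach ($registered_tickets as $ticket) {
        $played[ticket_key($ticket)] = true;
    }
    // Draw random candidates until one is not in the played set.
    do {
        $pool = range(1, $max_number);
        shuffle($pool);
        $candidate = array_slice($pool, 0, $numbers_on_ticket);
    } while (isset($played[ticket_key($candidate)]));
    sort($candidate);
    return $candidate;
}

// Tiny usage example: both tickets are the same combination in a different order.
$played_tickets = array(
    array(1, 2, 3, 4, 5, 6),
    array(6, 5, 4, 3, 2, 1),
);
$free = find_unplayed_combination($played_tickets);
echo implode(', ', $free) . "\n";
```

Random retries are good enough here because even 10 million random tickets leave a large share of the 13.983.816 possible combinations unplayed, so a free one turns up after a handful of attempts.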

Out of curiosity, I went to the Loteria Romana page to see what the prizes were. And I was fucking flabbergasted. The amounts below are in Romanian Lei, so mentally divide those sums by 5 to get an approximate value in euros.

'Winnings' kek

Comparing the numbers from my simulation with the reported winnings, we can conclude that with a ticket that costs 1.5 Euro, you have:

  • 98% chance to get nothing
  • 1.7% chance of getting a pack of cigarettes and a coke
  • 0.1% chance of getting two-three days’ worth of salary
  • 0.002% chance of getting about 1/3 of an apartment’s value, or a fancy car (32k euro)
  • a very rare chance (less than ONE IN TEN MILLION; the exact odds are about 1 in 14 million) to win a life-changing amount of money.

If I were to extrapolate from their number of winners to the number of winning tickets in my simulation (fewer than 300 winners with 4 numbers, as opposed to almost 10000 in my sim), I’d venture to say that each draw of the Romanian Lottery is played on about 300.000 tickets. Lol, with that number of tickets, no one should ever find the six lucky numbers, unless they’re super extra mega lucky, or if the game is rigged.
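The back-of-the-envelope estimate above, written out. It assumes real tickets behave like the random ones in the simulation, which is a simplification:

```latex
\[
\begin{aligned}
N_{\text{real}} &\approx \frac{300}{10\,000} \cdot 10^{7} = 300\,000 \text{ tickets per draw} \\
\lambda &= \frac{N_{\text{real}}}{\binom{49}{6}} = \frac{300\,000}{13\,983\,816} \approx 0.0215 \\
P(\text{at least one jackpot per draw}) &\approx 1 - e^{-\lambda} \approx 0.021
\end{aligned}
\]
```

Under these assumptions, a draw with about 300.000 tickets would produce a six-number winner only about 2% of the time.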


This post is a part of Agora Road’s Travelogue for the month of December, an effort to promote blogging.