04 December 2011

Counting Stuff – Statistics by Simulation

Ontario's Lotto 649 lottery involves a draw of 7 different numbers between 1 and 49. Each ticket has 6 different numbers from the same range. Prizes are awarded for various coincidences. This is a good excuse to do some 'statistics by simulation.'

Here's the challenge: What's the probability that at least one number on a ticket will match one of the numbers drawn?

I'm not a statistics wiz, so I remind myself that statistics is just about counting stuff. The clever formulas are there to help us avoid the drudgery of counting stuff, but now we have computers and we can let them do the counting. If you're a statistics wiz, you can probably work up a formula – you could do it right now and see if we get the same answer. (Update: It could be fun - I just ran into a description of this problem that refers to a "Multivariate Hypergeometric Distribution")

The easy solution is to simulate the process and let the computer do the counting. In this case I'm going to use javascript to whip up a small model of a lottery draw and numbers on a ticket, run it thousands of times, and count the number of times there's a hit – at least one match between the lottery draw and the ticket.

To keep it simple, I'll embed the javascript in an HTML file, so the simulation can run in any browser. Oh, yeah: This is not about doing production-quality html or javascript; it's just a hack. There's a running version Here.

The first thing is to set up a bit of a skeleton to fill in with the code.

<!DOCTYPE html>
<html>
<head>
  <script>
    numDraws = 100000 
    simulate = function() { 
     document.write("Number of Draws: "+numDraws)
     document.write("<br>Hits: "+"count")
     document.write("<br>Probability: "+"probability"+"%")
    }
  </script>
</head>
<body>
  <h1>Counting Stuff - Lotto 649</h1>
  <h2>
  <script>
    simulate(); 
  </script>
  </h2>
</body>
</html>

I set numDraws to the number of draws I want to simulate. simulate() and a couple of helper functions are where all the code will end up. For now, the document.write lines just write placeholder text into the web page between the H2 tags; later they will write the real results.

The next step is to set up for the model. First, we're going to need something that gives us random selections without duplicates between 1 and 49: six for the ticket, seven for the draw. Except for that difference, it's the same for both cases. Just generating random numbers between 1 and 49 won't do it, because the same number could come up more than once, but there's a variation on the KFY (Knuth-Fisher-Yates) shuffling algorithm to do this kind of sampling.

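shuffle = function(n, vector){
    // Partial shuffle: afterwards vector[0..n-1] holds n distinct random picks
    for(var i=0; i<n; i++){
      // Pick a random index r from i to 48 (the positions not yet chosen)
      var r = i + Math.floor((49-i)*Math.random())
      // Swap the picked element into position i
      var t = vector[i]
      vector[i] = vector[r]
      vector[r] = t
    }
  }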

The vector argument elements are the numbers between 1 and 49, in any order. The first pass through the loop selects one at random and exchanges it with the first element. The second pass selects an element not including the first and exchanges it with the second. This continues on, accumulating n randomly selected numbers at the front of the vector.
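To see the sampling idea on its own, here is a minimal self-contained sketch. The name sampleDistinct and its arguments are made up for this illustration and are not part of the simulation; it builds its own pool of numbers, does the same partial shuffle, and returns just the front of the array.

```javascript
// Hypothetical helper: return n distinct random numbers from 1..max
function sampleDistinct(n, max) {
  // Build the pool 1, 2, ..., max
  var pool = [];
  for (var k = 0; k < max; k++) { pool[k] = k + 1; }
  // Partial shuffle: on pass i, swap a random element from
  // positions i..max-1 into position i
  for (var i = 0; i < n; i++) {
    var r = i + Math.floor((max - i) * Math.random());
    var t = pool[i];
    pool[i] = pool[r];
    pool[r] = t;
  }
  // The first n positions now hold the selection
  return pool.slice(0, n);
}

// Six different numbers between 1 and 49, different on every run
console.log(sampleDistinct(6, 49));
```

Because each pass only swaps elements within the same array, a number can never be picked twice.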

Javascript passes a reference to the array rather than a copy, so the array passed as the argument is changed in place and there's no need for a return value.
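A quick way to convince yourself of this is the small sketch below. The function names swapFirstTwo and replaceAll are invented for the example. Changing the elements of the array inside a function is visible to the caller, while assigning a whole new array to the parameter is not.

```javascript
// Swaps two elements of the caller's array; note there is no return statement
function swapFirstTwo(vector) {
  var t = vector[0];
  vector[0] = vector[1];
  vector[1] = t;
}

// Reassigning the parameter only changes the local variable
function replaceAll(vector) {
  vector = [9, 9, 9];
}

var numbers = [1, 2, 3];
swapFirstTwo(numbers);
console.log(numbers); // numbers is now [2, 1, 3]
replaceAll(numbers);
console.log(numbers); // still [2, 1, 3]
```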

Now we're ready to simulate one draw – with random number selections both for the draw and the ticket.

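oneDraw = function() {
    shuffle(7, draw)    // draw[0..6] are now the seven numbers drawn
    shuffle(6, ticket)  // ticket[0..5] are now the six numbers on the ticket
    // Compare every ticket number with every drawn number (indexes start at 0)
    for(var i=0; i<6; i++){
      for(var j=0; j<7; j++){
        if(draw[j] == ticket[i]){return 1;}
      }
    }
    return 0;
  }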
 

The two vectors draw and ticket will have to be initialized. We'll do that in the main function. oneDraw() uses shuffle to get the draw and ticket number sets, then looks for a match. If it finds one, it returns 1; if not, it returns 0.
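Since prizes are awarded for various coincidences, a natural variation is to count how many numbers match instead of stopping at the first one. The sketch below is only an illustration: countMatches and its parameters are hypothetical names, and it uses a lookup object instead of nested loops.

```javascript
// Hypothetical helper: count how many of the first ticketSize entries of
// ticket appear among the first drawSize entries of draw
function countMatches(draw, drawSize, ticket, ticketSize) {
  // Record the drawn numbers as keys of a lookup object
  var drawn = {};
  for (var j = 0; j < drawSize; j++) {
    drawn[draw[j]] = true;
  }
  // Count the ticket numbers that were drawn
  var matches = 0;
  for (var i = 0; i < ticketSize; i++) {
    if (drawn[ticket[i]]) { matches++; }
  }
  return matches;
}

var exampleDraw = [3, 11, 19, 27, 35, 42, 49];
var exampleTicket = [1, 11, 20, 30, 42, 48];
console.log(countMatches(exampleDraw, 7, exampleTicket, 6)); // 2 (11 and 42)
```

A "hit" in the sense used here is then simply a count greater than zero.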

Now we can pull it all together. simulate() initializes the two arrays and then runs oneDraw() as often as requested, keeping count of the number of draws that produced a match. Finally, it writes the results to the web page.

simulate = function() {
    for(var i=0; i<49; i++) {
      ticket[i] = draw[i] = i+1
    }
    for(var i=0; i<numDraws; i++){
      count += oneDraw()
    }
    probability = 100*count/numDraws
    document.write("Number of Draws: "+numDraws)
    document.write("<br>Hits: "+count)
    document.write("<br>Probability: "+ probability+"%")
  }

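The results should look something like this. (The sample figures below are consistent with comparison loops that start at index 1, which check only five ticket numbers against six drawn numbers; with both loops starting at 0, the probability comes out near 62.5%.)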

Counting Stuff - Lotto 649

Number of Draws: 100000
Hits: 49437
Probability: 49.437%

Hit Refresh to run it again. Change numDraws to get more or less resolution.
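For anyone who wants to check the simulation against a formula, here is a hedged sketch of the exact calculation. The function names are my own for this example. The idea is to compute the chance that no ticket number is among the drawn numbers, one ticket number at a time, and subtract from 1. A second helper gives the usual standard error of a simulated proportion, which shows how much the result should wobble for a given numDraws.

```javascript
// Hypothetical helper: exact probability of at least one match when
// `drawn` numbers are drawn from 1..poolSize and the ticket has
// `onTicket` different numbers
function probAtLeastOneMatch(poolSize, drawn, onTicket) {
  var pNoMatch = 1;
  for (var k = 0; k < onTicket; k++) {
    // Chance that the (k+1)th ticket number is also undrawn,
    // given that the previous ones were
    pNoMatch *= (poolSize - drawn - k) / (poolSize - k);
  }
  return 1 - pNoMatch;
}

// Standard error of a proportion p estimated from n simulated draws
function standardError(p, n) {
  return Math.sqrt(p * (1 - p) / n);
}

var exact = probAtLeastOneMatch(49, 7, 6);
console.log((100 * exact).toFixed(2) + "%");                           // 62.49%
console.log((100 * probAtLeastOneMatch(49, 6, 5)).toFixed(2) + "%");   // 49.52%
console.log((100 * standardError(exact, 100000)).toFixed(2) + "%");    // 0.15%
```

The first figure is for six ticket numbers against seven drawn numbers. The second is for five against six, which is what gets compared if the loops skip index 0. The last line says that with 100000 draws, successive runs should typically differ from the exact value by a couple of tenths of a percentage point at most.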

A related question is whether the probability changes if the same numbers are played on every draw. To find out, comment out the shuffle(6, ticket) line.
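The same experiment can be run side by side. The sketch below is a standalone version, assuming the rules described above (7 drawn, 6 on the ticket, numbers 1 to 49); pickFront and estimateHitRate are names made up for this example. It estimates the hit rate once with a fresh random ticket on every draw and once with the ticket left at its initial numbers 1 to 6.

```javascript
// Partial shuffle: moves n distinct random picks to the front of pool
function pickFront(n, pool) {
  for (var i = 0; i < n; i++) {
    var r = i + Math.floor((pool.length - i) * Math.random());
    var t = pool[i];
    pool[i] = pool[r];
    pool[r] = t;
  }
}

// Hypothetical helper: fraction of simulated draws with at least one match.
// If reshuffleTicket is false, the ticket stays at 1..6 for every draw.
function estimateHitRate(trials, reshuffleTicket) {
  var draw = [], ticket = [], hits = 0;
  for (var k = 0; k < 49; k++) { draw[k] = ticket[k] = k + 1; }
  for (var n = 0; n < trials; n++) {
    pickFront(7, draw);
    if (reshuffleTicket) { pickFront(6, ticket); }
    var hit = 0;
    for (var i = 0; i < 6 && !hit; i++) {
      for (var j = 0; j < 7; j++) {
        if (draw[j] === ticket[i]) { hit = 1; break; }
      }
    }
    hits += hit;
  }
  return hits / trials;
}

console.log("New ticket each draw:  " + (100 * estimateHitRate(100000, true)).toFixed(2) + "%");
console.log("Same ticket each draw: " + (100 * estimateHitRate(100000, false)).toFixed(2) + "%");
```

The two estimates should agree to within the usual run-to-run noise, since a fair draw is equally likely to hit any particular set of six numbers.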

Here's the finished html/javascript:

<!DOCTYPE html>
<html>
<head>
  <script>
    numDraws = 100000    //Number of draws to simulate
    draw = []
    ticket = []
    count = 0
    probability = 0
  
    //Run the simulation
    simulate = function() {
      // Set up the initial states for the draw and ticket
      for(var i=0; i<49; i++) {
        ticket[i] = draw[i] = i+1
      }
      // Now do the draws
      for(var i=0; i<numDraws; i++){
        count += oneDraw()
      }
      // Change the count to a percentage
      probability = 100*count/numDraws
      // Display the results
      document.write("Number of Draws: "+numDraws)
      document.write("<br>Hits: "+count)
      document.write("<br>Probability: "+ probability+"%")
    }
    //Shuffle draw[] or ticket[] to randomize the numbers
    //Go through the vector, exchanging pairs at random
    shuffle = function(n, vector){
      for(var i=0; i<n; i++){
        var r = i + Math.floor((49-i)*Math.random())
        var t = vector[i]
        vector[i] = vector[r]
        vector[r] = t
      }
    }

    // Do one draw, return a 1 if there's a hit
    oneDraw = function() {
      shuffle(7, draw) 
      shuffle(6, ticket)
      //Compare the draw and the ticket
      for(var i=0; i<6; i++){
        for(var j=0; j<7; j++){
          if(draw[j] == ticket[i]){return 1;}
        }
      }
      return 0;
    }
  </script>
</head>
<body>
 <h1>Counting Stuff - Lotto 649</h1>
 <h2>
 <script>
  simulate(); 
 </script>
 </h2>
</body>
</html>

3 comments:

  1. Hi Marc,

    This is very good indeed to have access to the running version of your code and to see the numbers varying in real time.

    You are mastering the javascript language no doubt. However, I am losing you when you write the javascript code. My javascript skills are minimal and I cannot follow your train of thought by reading the code. Could you add more explanations if possible (or comments in the code)?

    Thanks!

    Gilles

  2. Hi Gilles,

    Thanks for the feedback. Commenting the code is now on my to do list.

    Cheers,
    Marc
