13 August 2011

A Proper Shuffle

Working with sample distributions, we often need to permute them in a random fashion, that is, shuffle them. The reasons for doing this have to do with managing correlation, and that's a separate subject.


The typical approach is to go through the vector swapping each element with another randomly chosen element. Something like

 n = dist.length
 for(i=0; i<n; i++){
  r = randomInteger() % n
  t = dist[r]
  dist[r] = dist[i]
  dist[i] = t
 }

Unfortunately, this doesn't make all possible permutations equally likely. Some elements are swapped more than once, and that creates a bias.

The Knuth Fisher Yates Shuffle removes the bias without adding any computational load. Instead of swapping with any other element, you swap only with downstream elements, like this:

 n = dist.length
 while(--n){
  r = randomInteger() % n
  t = dist[r]
  dist[r] = dist[n]
  dist[n] = t
 }

If you want to study this in more depth, Wikipedia is a good place to start.

No comments:

Post a Comment