16 February 2012

The Expected Value is Not Expected

The time and duration estimates produced by conventional planning tools are usually the Expected Value (EV), also known as the mean or the average. In most cases the EV is very close to the median – the 50% probability region.

The thing is, 50% is also the region of greatest uncertainty, subjective and objective. Heads you're under budget, tails you're over budget.

That doesn't seem like a wise bet, especially since the choice is between a little under budget and a lot over budget. There's this thing called Jensen's Inequality: when cost grows faster than linearly with an overrun, the average cost is higher than the cost of the average scenario. One of the ways it gets you is that there's a floor on how little a project can cost but no ceiling on how much it can cost.
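Here is a minimal sketch of that asymmetry in Python. Everything in it is made up for illustration: the five equally likely durations, the `cost` function, and its `deadline`, `daily_rate` and `late_penalty` parameters are assumptions, not figures from any real project. The point is only that a plan priced at the average duration understates the average price.

```python
from statistics import mean

# Hypothetical, equally likely task durations in days (illustrative numbers only).
durations = [8, 9, 10, 11, 12]

def cost(days, deadline=10, daily_rate=1000, late_penalty=5000):
    """Hypothetical convex cost: a daily rate plus a penalty for each day late."""
    return daily_rate * days + late_penalty * max(0, days - deadline)

# Cost of the plan built on the average duration.
cost_at_average = cost(mean(durations))

# Average of the costs over every possible duration.
average_cost = mean(cost(d) for d in durations)

print(cost_at_average, average_cost)
```

Finishing two days early saves 2,000 in this toy model, while finishing two days late adds 12,000. That lopsidedness is why the average of the costs comes out above the cost of the average.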

As Sam Savage has pointed out many times, in many ways: on average, the average is wrong. So when you're tempted to perpetuate the Flaw of Averages, use a distribution instead.

08 February 2012

What's a Sample Distribution?

A sample distribution is a list of numbers, in most cases lots of numbers – hundreds or thousands of them. Each item in the list holds a possible value of something whose actual value is unknown – an uncertain variable.

How long will it take to complete this project or task? There isn’t just one answer; there’s a whole bunch of answers – each with its own probability of being right. This is an uncertain variable. A sample distribution establishes a connection between all those answers and their probabilities. A sample distribution is a simple and versatile form of probability distribution.

The essential feature of a conventional, parametric probability distribution is a formula. The formula is based on expert analysis and curve-fitting of a suitably large number of observations of an uncertain variable. Coupled with a pseudo-random number generator, it produces equally probable samples that are, hopefully, characteristic of the uncertainty in the real-world variable being modeled.
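As a rough sketch of the parametric approach, the Python below pairs a formula with a pseudo-random number generator. The triangular distribution and its `low`, `mode` and `high` values are assumptions picked for illustration, and `parametric_samples` is a hypothetical helper name; a real model would use whatever formula the expert analysis produced.

```python
import random

def parametric_samples(n, low, mode, high, seed=0):
    """Draw n equally probable samples from a triangular distribution.

    The triangular shape and its parameters stand in for the fitted formula;
    they are illustrative assumptions, not a recommendation.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [rng.triangular(low, high, mode) for _ in range(n)]

# Hypothetical task duration in days: at least 8, most likely 10, at most 20.
samples = parametric_samples(1000, low=8.0, mode=10.0, high=20.0)
print(min(samples), max(samples), sum(samples) / len(samples))
```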

To make a sample distribution, we do away with the analysis and the formula but keep the observations – renamed ‘samples’ – and put them in a vector (array, matrix, list). We let the real-world data speak for themselves. There are some constraints on how we do this, so that the sample distribution meets some basic specifications, but that is essentially it.
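In Python, the bare idea might look like the sketch below. The ten durations are invented for illustration, and `probability_at_or_below` and `draw` are hypothetical helper names. The sketch ignores the constraints mentioned above and uses far fewer samples than the hundreds or thousands a real sample distribution would hold.

```python
import random

# Hypothetical observed task durations in days (made-up numbers for illustration).
samples = [12, 15, 11, 14, 30, 13, 12, 18, 16, 13]

def probability_at_or_below(samples, value):
    """Fraction of samples that are less than or equal to value."""
    return sum(1 for s in samples if s <= value) / len(samples)

def draw(samples, rng):
    """Pick one sample at random; every entry in the list is equally likely."""
    return rng.choice(samples)

rng = random.Random(42)  # seeded so the run is repeatable
trial = draw(samples, rng)

print(trial, probability_at_or_below(samples, 15))
```

No formula appears anywhere: the probabilities come from counting entries in the list, and a simulation trial is just a random pick from it.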