22 April 2012

Reading Sample Distributions

Usually, "How long will it take to do this?" doesn't have just one answer. Whether 'this' is a small task or a large project, it has a whole bunch of answers - each with its own probability of being right or nearly right. The answer is an uncertain variable; it's a variable because it can have more than one value and it's uncertain because we don't know what the variable's value is (or will be). We don't know how long it will take to finish the task and we won't know for sure until it's done. Until then, it's an uncertain variable. After that, we refer to it as an actual value.

Unfortunately, we can't leave it at that. There's a third value we call the plan value. That's the value we aim at, the value we hope will be right, the value we use to coordinate with other related activities. That's a value we choose and, hopefully, we make that choice knowing what the associated risks and probabilities are. Tagging along with that choice is the project schedule we hope will produce it, and the associated project cost we hope not to exceed. We don’t “manage” uncertainty; we stare it in the eye and make a decision.

The probability management approach to uncertain variables is to quantify that uncertainty with a sample distribution. If you aren't familiar with sample distributions, I cover them in this blog post.

The key to reading a sample distribution is that it's a list of numbers and each number is a possible value of the uncertain variable. Also, it's unbiased; each number is as likely as any of the others to be the closest to the variable's actual value.

The list will usually have hundreds or thousands of sample values, so looking at the raw numbers is not going to be terribly useful. We could calculate its average and other statistical properties, but this throws away a lot of information and gives only the illusion of relevance.
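To see how much an average can hide, here is a minimal Python sketch. The two lists are made-up sample values (in weeks), invented purely for illustration, and `chance_of_meeting` is a hypothetical helper name. Both lists have the same average, yet they give very different odds for a six-week plan.

```python
from statistics import mean

# Hypothetical sample values in weeks, invented for illustration only.
tight = [7, 8, 8, 9]
skewed = [2, 3, 4, 23]

def chance_of_meeting(samples, plan):
    """Fraction of sample values at or below the plan value."""
    return sum(1 for s in samples if s <= plan) / len(samples)

for name, samples in (("tight", tight), ("skewed", skewed)):
    print(name,
          "average:", mean(samples),
          "chance of meeting a 6-week plan:", chance_of_meeting(samples, 6))
```

The averages match, but one list never finishes within six weeks while the other usually does.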

We need to see the range of possibilities, the relative probabilities, and most of all, if we choose a particular value as the plan, the odds of meeting or beating the plan. We need some pictures.

One marginally useful picture is a histogram.

Histogram

This shows you the relative probability of different values. In this histogram we see that the most likely value is around 4, in whatever the unit is. On the other hand, there's a lot of the graph to the right: each of those values is less likely, but there are many more of them. Since we're really more interested in the probability of meeting or beating any particular plan, we need a better picture.
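For readers who want to build one, a histogram is just a count of samples per bin. The sketch below is one simple way to do it in Python; the `SAMPLES` list is hypothetical, invented to resemble the shape described here (a peak near 4 and a long tail to the right), and is not the data behind the chart.

```python
from collections import Counter

# Hypothetical sample values in weeks, invented for illustration only.
SAMPLES = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 10, 12, 14, 16.5, 21, 28]

def histogram(samples, bin_width=1):
    """Count how many samples fall in each bin, keyed by the bin's lower edge."""
    counts = Counter(int(s // bin_width) * bin_width for s in samples)
    return dict(sorted(counts.items()))

bins = histogram(SAMPLES)
for edge, count in bins.items():
    print(f"{edge:>3} | {'#' * count}")

# The tallest bar marks the most likely value.
most_likely = max(bins, key=bins.get)

# Everything to the right of the peak: each bar is short, but they add up.
right_of_peak = sum(count for edge, count in bins.items() if edge > most_likely)
print("most likely:", most_likely, "samples to the right:", right_of_peak)
```

In this made-up list the peak is at 4, yet most of the samples sit to its right, which is exactly what a histogram makes hard to judge by eye.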

The better picture is a probability management version of what statisticians call a cumulative distribution function, but we won't need any calculus to build it. We call it a percentile curve, and it just plots the sample values against their rank.
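Plotting values against rank really is as simple as it sounds. Here is a minimal Python sketch, assuming the rank is expressed as a percentage of the sample count; `percentile_curve` is a hypothetical name and `SAMPLES` is an invented list, not the data behind the chart.

```python
# Hypothetical sample values in weeks, invented for illustration only.
SAMPLES = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 10, 12, 14, 16.5, 21, 28]

def percentile_curve(samples):
    """Pair each sorted sample value with its rank as a percentage of the list.

    When values are tied, the highest-ranked copy carries the
    "equal to or less than" percentage for that value.
    """
    ordered = sorted(samples)
    n = len(ordered)
    return [(value, 100.0 * rank / n)
            for rank, value in enumerate(ordered, start=1)]

curve = percentile_curve(SAMPLES)
for value, pct in curve:
    print(f"{value:>5} weeks  {pct:5.1f}%")
```

Feeding the resulting points to any charting tool, with value on the horizontal axis and percentage on the vertical, gives the curve.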

pChart

If you pick any point on the curve, the horizontal axis gives you a value and the vertical axis tells you what percentage of samples are equal to or less than that value. There's a complementary version of the chart for "equal to or more" for things like revenue and profit. The dotted red line is usually hooked to a control so it can be moved, making it easy to read off values and percentiles.

On this chart, if the value axis shows weeks to complete the task, we can see that 50% of the sample values are six weeks or less, and 80% are twelve weeks or less.
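Reading the chart in either direction can also be done straight from the list. The sketch below assumes a hypothetical `SAMPLES` list that I constructed to agree with the two readings quoted above; it is not the data behind the chart, and both function names are my own.

```python
import math

# Hypothetical sample values in weeks, constructed so that 50% are at or
# below 6 and 80% are at or below 12. Not the data behind the chart.
SAMPLES = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 10, 12, 14, 16.5, 21, 28]

def percent_at_or_below(samples, value):
    """Read up from the value axis: percentage of samples <= value."""
    return 100.0 * sum(1 for s in samples if s <= value) / len(samples)

def value_at_percentile(samples, pct):
    """Read across from the percentile axis: the smallest sample value
    with at least pct percent of the samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

print(percent_at_or_below(SAMPLES, 6))    # value -> percentile
print(value_at_percentile(SAMPLES, 80))   # percentile -> value
```

Moving the dotted line on the chart amounts to calling one of these two functions with a different argument.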

How we read this is a function of where we got the sample values, who the audience is, and what kind of decision we're making.

The Judgement of History

"We looked at a large collection of similar tasks and used their actual durations as the sample values. We don't have any special magic, so the probability that we'll meet or beat a particular target is no better than that of the sample tasks."

The Judgement of Experts

"Since we started our program of estimator training and periodic calibrations, our senior developers have been giving us fairly reliable estimates. This is their consensus distribution for this job."

The Spin

If we plan on six weeks, it's an even-money bet: heads we make it, tails we run over.

If we want something more like a sure thing, the 90% scenario is 17 weeks.

There's a 75% probability, 3 to 1 odds, that 10 weeks will be enough.

With a twelve-week plan there's a 20% probability of missing the date, and a 10% probability of overrunning by five weeks or more.

These are all characterizations of the same estimate.
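To check that the statements above really do come from one distribution, here is a small Python sketch. The `SAMPLES` list is hypothetical, constructed by me to be consistent with the percentages quoted in this post, and the function names are illustrative.

```python
from fractions import Fraction

# Hypothetical sample values in weeks, constructed to be consistent with
# the percentages quoted in the post. Not the data behind the chart.
SAMPLES = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 10, 12, 14, 16.5, 21, 28]

def p_meet(samples, plan):
    """Probability of meeting or beating the plan."""
    return Fraction(sum(1 for s in samples if s <= plan), len(samples))

def p_overrun_by(samples, plan, weeks):
    """Probability of finishing at least `weeks` later than the plan."""
    return Fraction(sum(1 for s in samples if s >= plan + weeks), len(samples))

def odds_in_favor(p):
    """Express a probability as (for, against) odds in lowest terms."""
    if p >= 1:
        raise ValueError("a certain outcome has no finite odds")
    ratio = p / (1 - p)
    return ratio.numerator, ratio.denominator

for plan in (6, 10, 12, 17):
    p = p_meet(SAMPLES, plan)
    print(f"{plan:>2} weeks: {float(p):.0%} to meet, odds {odds_in_favor(p)}, "
          f"{float(1 - p):.0%} to miss")
print("overrun a 12-week plan by 5+ weeks:",
      f"{float(p_overrun_by(SAMPLES, 12, 5)):.0%}")
```

Every figure is a different question put to the same list of numbers.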

Manage the Risk

In other words, "Do you feel lucky?" The choice to peg the plan at six weeks or seventeen or somewhere in between is about managing risk. What are the consequences of being late? What are the consequences of estimating high? Somewhere between those two is a management decision.
