24 November 2012

Calculating Uncertainty


It took me a lot longer than I thought it would to write this paper. I wanted a gentle introduction to simulation with SIPs and it turned out that gentle is not easy. So it's 20 pages with lots of examples and charts and two Excel workbooks to go along with the text. The workbooks aren't necessary but they help.
In many ways, the essence of Probability Management is how to do probabilities by counting stuff – and having a computer do the counting. This monograph focuses on that.

Update:
Now available as a paperback and as a Kindle e-book.

PDF format and Excel workbooks:
http://smpro.ca/sipmath

20 November 2012

Risk = PxI is wrong

You're estimating a project.

Let’s say you have a risk element and the event has a 25% chance of happening. If it does, it will add $100,000 to the cost of a particular task. You’ll resist the temptation to just add $25,000 to the task cost, because that’s not what happens in the real world. It’s one project, not a million transactions, so the average is invalid. In each possible future, it’s $100,000 or nothing.

It’s possible that downstream events would be triggered by the $100,000 while $25,000 would fly under the radar. Also, looking at the range of possible project costs, the high numbers would be $75,000 low, and the low numbers would be $25,000 high.
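
To see the difference, here's a minimal VBA sketch (only the 25% and $100,000 figures come from the example above; the trial count is arbitrary):

Sub RiskIsNotPxI()
    ' In each simulated future the risk event either hits or it doesn't.
    Dim i As Long, hits As Long
    Dim trials As Long: trials = 10000
    Randomize
    For i = 1 To trials
        If Rnd() < 0.25 Then hits = hits + 1
    Next i
    ' The long-run average converges on $25,000 ...
    Debug.Print "Average impact: "; 100000# * hits / trials
    ' ... but not one simulated future costs $25,000:
    ' about 75% cost $0 and about 25% cost $100,000.
    Debug.Print "Futures at $0: "; (trials - hits) / trials
    Debug.Print "Futures at $100,000: "; hits / trials
End Sub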

So don’t use Probability x Impact. Ever.

08 November 2012

The Art of the SIP

Sam Savage has put another brick in the wall with Distribution Processing and the Arithmetic of Uncertainty, an article in the ORMS Analytics Magazine (2012 Nov-Dec).

The article expands on the concept of SIPs (Stochastic Information Packets) as packaged uncertainty. It shows how to use SIP math and raw Excel to do Monte Carlo Simulation "without the Monte Carlo."

He also introduces SIPmath – an Excel add-in to simplify building models that use SIP math. Once the model is built, the add-in is no longer needed and the simulation can run without it.

Probability Management is on a roll. Read the article and then go to sipmath.com to learn more.

10 September 2012

The Underestimation Double-Whammy

The main thing we're trying to fix with Probability Management for projects is that conventional tools and techniques give us wrong estimates, and the errors are all one-sided; they consistently underestimate project cost and duration.

Underestimating resources makes it more likely that a project will be approved, and makes it more likely that it will fail. That's a double-whammy that results in more failed projects.

04 September 2012

Just Fix The Math

You see, there's this mystery: Spend a few minutes with Google and you can get a long list of the things that cause projects to fail; we know what they are and how to deal with them. To that easily accessed tradecraft, add the fact that institutions like PMI are certifying over 50,000 project managers a year. Project failure rates should be plummeting. But, for any given industry, failure rates have remained unchanged for decades. This leaves one thing to fix - the math.

Sam Savage has shown us what the problem is, and pointed us at the solution. The Art of the Plan includes my attempt at fixing the problem in project planning.

31 August 2012

The Only Good Risk Register Is An Empty Risk Register

By Mark Powell

Have you ever seen a risk register with 500 or more risks on it? It seems that these days a lot of projects have huge risk registers. How does this happen?

Most people believe that this is natural for a large and complex project.

A good friend recently described a proposal for the California High Speed Train that would go from San Diego to San Francisco and Sacramento. His pre-project draft risk register covered everything from track, signals, routes, station interchanges, software, train sets, health and safety, environment, etc., and it was huge. Well, that, of course, is no surprise; it is one big, complex project!

18 August 2012

The Book is Done

It's taken way longer than I thought it would, but I've finally got The Art of the Plan written and published. The e-book version is available from Smashwords in all the useful formats.

The printed version is still in process. I'm guessing early September for release.

The book covers most of the topics I've been writing about in The Art of the Plan blog – from identifying crystal-clear objectives and requirements through to modeling and simulation using Probability Management techniques to produce realistic project plans. There's an Excel workbook loaded with examples to go with it and, of course, it uses SDXL.

13 July 2012

Benefits Realization

Benefits realization – building on (un) safe foundations or planning for success?

Here's a really good article on closing the gap between project predictions and realization. Jenner covers the well-known sources of error and misrepresentation. Unlike other writers on this topic, he doesn't just wring his hands but responds with well-thought-out prescriptions.

His prescriptions include effective planning (start with benefits and requirements, design the solution later), science (seek disconfirming evidence), Reference Class Analysis, and Probability Management (distributions rather than point forecasts).

In short, this is an article I wish I had written.

28 June 2012

More Evidence for More Graphics

An article in the Harvard Business Review adds more evidence that presenting analytic results as charts instead of numbers improves the interpretation of the data.

"Economists Are Overconfident. So Are You" reports on a study that makes a good case for just charts and no numbers – charts accompanied by numbers produced worse interpretations.

It's a good read.

11 June 2012

Sample Distributions for Excel - SDXL Ver.0.4.0

SDXL is a free, open-source Excel Add-in I developed to bring Probability Management techniques and strong array handling to Excel. Most of the functions manipulate and calculate with sample distributions represented as arrays. The arrays passed into functions as arguments can be of several types that the functions detect and handle automatically. The types are:

  • a VBA array
  • an XML string
  • a CSV (comma-separated values) string
  • a cell range

There are functions for converting one format to another and for reading and writing XML files containing sample distributions. There are also simple functions to present sample distributions graphically with histograms, scatter charts and percentile charts (cumulative probability).

The math functions include all the usual arithmetic, Boolean and comparison functions. It's also possible to mix array and scalar arguments.

The code to multiply two sample distributions and return the result as a CSV string in a cell is:

= toCSV( sdMul( arrayA, arrayB))


The arguments arrayA and arrayB could be just about anything that can be interpreted as a sample distribution.
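
Because the math functions accept mixed array and scalar arguments, scaling an entire distribution is just as terse. For example (the 1.15 contingency factor here is made up for illustration), this multiplies every sample in arrayA by 1.15:

= toCSV( sdMul( arrayA, 1.15 ))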

There are also a lot of functions that assume the arrays are sample distributions and do statistical stuff. There's a complete set of sampling, sorting and permuting functions dedicated to manipulating the array elements – in particular, their order.

And, of course, there are a bunch of random number generators using both VBA's built-in generator and an efficient Mersenne Twister implementation.

I think everything needed to develop fairly interesting simulations and models is there. There are two things that are conspicuous by their absence: Coherent sets of distributions and distribution time-series. I like to have real use cases to drive a design and since I've been concentrating on project planning and estimating, I haven't seen any of these – yet.

To get SDXL, go to smpro.ca/SDXL.

There's also a Google+ Community at +SDXL Users.

10 June 2012

Probability Management for Project Estimates

Calculating with averages, point estimates and expected values runs afoul of the Flaw of Averages. The answers we get aren’t just wrong; in the case of project estimates, they’re hopelessly optimistic. This bears getting hammered on: the math, the formulas, the computations are fundamentally, intrinsically wrong, on several fronts.

And the errors are one-sided; they underestimate cost and time.

There’s a long dog-eared list of things that contribute to project failure. Most of them have to do with human factors or natural disasters like hundred-year storms – hard problems to deal with. Here, on the other hand, is an easy problem to deal with – just fix the math.

03 June 2012

A Tale of Two Dice

Let’s do a little experiment.

Take two dice, one six-sided, the other ten-sided. If you don’t have the dice and there isn’t a game shop handy, you can always use Excel to simulate the experiment. We’re going to throw both dice many times and each time we’re going to record the larger of the two values that turn up. After a few hundred throws, we’ll calculate the average of the values we’ve recorded.

But do we need to do the calculation? We know that the average of the six-sided throws is 3.5 and that the average of the ten-sided throws is 5.5. The maximum of 3.5 and 5.5 is 5.5.

But let's do the calculation anyway. What we get is an average of about 6.1. That can’t be! So we do it again. And again the average is about 6.1.

So what's going on here? The common sense explanation is that, since you always take the larger of the two values, the combined average should be bigger. How much bigger?
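
If you'd rather not throw physical dice a few hundred times, here's a minimal VBA sketch of the experiment (nothing beyond VBA's built-in Rnd):

Sub TwoDiceMax()
    ' Throw a d6 and a d10 together, record the larger value each time,
    ' then average the recorded values.
    Dim i As Long, d6 As Long, d10 As Long
    Dim total As Double
    Dim trials As Long: trials = 100000
    Randomize
    For i = 1 To trials
        d6 = Int(Rnd() * 6) + 1
        d10 = Int(Rnd() * 10) + 1
        If d6 > d10 Then total = total + d6 Else total = total + d10
    Next i
    ' Prints about 6.08, not max(3.5, 5.5) = 5.5.
    Debug.Print "Average of the maxima: "; total / trials
End Sub

For this simple case we can even check the answer by counting: over all 60 equally likely (d6, d10) pairs, the larger values sum to 365, so the exact average is 365/60 ≈ 6.083.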

More generally, is there a mathematical/statistical function that will produce the probability distribution of the maximum of two probability distributions? The dice problem is easy – it involves the simplest distribution there is. And yet ...

I dredged the internet looking for a formula and came up empty – except for this valiant effort: http://www.cecs.uci.edu/~papers/iccad06/papers/3C_4.pdf

These guys find a formula that works for a very special shape of distribution that may or may not match the real world and, if I read it right, they force the result into the same shape.

The irony in this paper is that they use simulation to validate the formula. If the simulation results are the measure of the formula's validity, why not just simulate and be done with it, especially since simulation solves the problem for any shape of distribution?

That's one reason I say "Don't formulate. Simulate!"

25 May 2012

Why Simulate?

The inputs to a project - primarily the time to execute a given task - are uncertain; they have many possible values and each of those values has its own probability of being right. You can't just take the averages (expected values) and calculate with them because they aren't additive. One example of the problem is two workflows converging on a milestone. The milestone date is the maximum of the two finish dates, and the maximum of two averages has no meaning. That's one reason PERT and CPM are so consistently wrong.

The uncertainty in the inputs translates to uncertainty in the outputs. You have to calculate with the inputs as probability distributions, producing probability distributions for the project duration and cost. The standard way of doing this is to simulate running the project many times, each time with a different combination of input values (a trial). What CPM does once, you do a thousand times. You do this in a way that makes sure all the trials are equally probable, so the outputs will all be equally probable (more or less - this is such a huge improvement that you don't need to worry about minor discrepancies).
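
In raw Excel, the converging-workflows example might look like this – one row per trial, with illustrative (not prescriptive) names and layout:

B2:C1001   one finish-time sample per workflow, one row per trial
D2         =MAX(B2,C2)         the milestone date for that trial (fill down to D1001)
F1         =AVERAGE(D2:D1001)
F2         =MAX(AVERAGE(B2:B1001),AVERAGE(C2:C1001))

F1 is the simulated expected milestone date; F2 is the CPM-style answer, and it comes out smaller, never larger.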

A bit of curve plotting gives us some insight into the range of possible project outcomes and their probabilities. We can use that as input to whatever resourcing decisions we might want to make.

Monte Carlo Simulation is one way to do the simulation. Probability Management and SIP math are a superior way. They have one thing in common: both use pseudo-random number generators (PRNGs) to get the input combinations sufficiently mixed to meet the equal-probability requirement. Beyond that they part company.

04 May 2012

Order in the Distribution

Sample distributions have two independent properties: shape and order. The shape is the list of values in the distribution without reference to the sequence they're in. Order is the sequence they're in. Order has no impact on a distribution's statistical properties when the distribution is looked at in isolation, but it has an effect when we're calculating with multiple distributions.

I've posted about Shape a few times; now it's Order's turn.
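
A toy illustration of why order matters: take two sample distributions that each contain the values 1 through 10, and add them element by element.

A:    1  2  3  4  5  6  7  8  9 10
B:    1  2  3  4  5  6  7  8  9 10    A+B:  2  4  6  8 10 12 14 16 18 20
B:   10  9  8  7  6  5  4  3  2  1    A+B: 11 11 11 11 11 11 11 11 11 11

Same two shapes both times; only B's order changed. The first pairing spreads the sum from 2 to 20; the second collapses it to a constant 11. That ordering effect is how sample distributions carry correlation between uncertain variables.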

23 April 2012

Why Critical Path Scheduling (CPM) is Wildly Optimistic

Pat Weaver has a really well-written paper about the fatal problems with CPM. He goes beyond the flaw-of-averages issues and covers the topic in depth.

I'm not sure I agree with all his prescriptions, but it does have me thinking about my preconceptions.

The abstract and link to download the pdf are here.

22 April 2012

Reading Sample Distributions

Usually, "How long will it take to do this?" doesn't have just one answer. Whether 'this' is a small task or a large project, it has a whole bunch of answers - each with its own probability of being right or nearly right. The answer is an uncertain variable; it's a variable because it can have more than one value and it's uncertain because we don't know what the variable's value is (or will be). We don't know how long it will take to finish the task and we won't know for sure until it's done. Until then, it's an uncertain variable. After that, we refer to it as an actual value.

Unfortunately, we can't leave it at that. There's a third value we call the plan value. That's the value we aim at, the value we hope will be right, the value we use to coordinate with other related activities. That's a value we choose and, hopefully, we make that choice knowing what the associated risks and probabilities are. Tagging along with that choice is the project schedule we hope will produce it, and the associated project cost we hope not to exceed. We don’t “manage” uncertainty; we stare it in the eye and make a decision.
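
As a sketch of what making that choice knowingly can look like in Excel: if a task's duration samples live in a range named taskSamples (a name made up for this example), then

=PERCENTILE(taskSamples, 0.8)

gives a plan value with roughly a 20% chance of being overrun – a risk level that's chosen, not discovered.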

14 April 2012

Cobb's Paradox

"We know why projects fail, we know how to prevent their failure -- so why do they still fail?" Martin Cobb, CIO, Treasury Board Secretariat, 1995

Seventeen years - and still no answer.

Actually, there is an answer, but the question seems to have become unpopular.

28 March 2012

Somewhat Object-Oriented Excel

One of the good things about Excel/VBA is that VBA supports classes and object-oriented designs.

The bad thing is that the objects and methods aren't accessible with worksheet cell formulas. That means that if you define a class module, you still need an ordinary module for the functions that will provide an interface between the spreadsheet and the class.
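
A minimal sketch of the pattern (the class and function names are made up for illustration). First the class module, named Accumulator:

' Class module: Accumulator
Private total As Double

Public Sub Add(ByVal x As Double)
    total = total + x
End Sub

Public Property Get Value() As Double
    Value = total
End Property

Then the ordinary module that fronts for it:

' Ordinary module: the worksheet-facing interface
Public Function AccumulateRange(r As Range) As Double
    Dim acc As New Accumulator
    Dim c As Range
    For Each c In r
        acc.Add c.Value
    Next c
    AccumulateRange = acc.Value
End Function

A cell can now call =AccumulateRange(A1:A10), but only through the wrapper; the worksheet can never touch the Accumulator object directly.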

20 March 2012

Monty Hall Simulation

Not to be confused with Monte Carlo Simulation, this is about the so-called "Monty Hall Paradox."

In the Monty Hall game, there are three doors, one of which hides a prize. The contestant chooses one of the doors, at which time Monty opens another door with no prize behind it. Now the contestant is given an option to switch doors. The question is, "Should the contestant switch or not, or does it matter?"
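
Counting settles it. A minimal VBA simulation (the trial count is arbitrary):

Sub MontyHall()
    Dim i As Long, prize As Long, pick As Long
    Dim stickWins As Long, switchWins As Long
    Dim trials As Long: trials = 100000
    Randomize
    For i = 1 To trials
        prize = Int(Rnd() * 3) + 1
        pick = Int(Rnd() * 3) + 1
        If pick = prize Then
            ' Sticking wins only when the first pick was right.
            stickWins = stickWins + 1
        Else
            ' Monty removes the other losing door, so switching wins.
            switchWins = switchWins + 1
        End If
    Next i
    Debug.Print "Stick wins: "; stickWins / trials    ' about 1/3
    Debug.Print "Switch wins: "; switchWins / trials  ' about 2/3
End Sub

Switch. The simulation counts a win for switching in about two trials out of three.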

07 March 2012

How to do a proper histogram in Excel

A few months ago I posted an article on the right way to do a histogram. The example I showed was done with JavaScript, because there was no apparent way to do it in Excel.

I found this annoying, so I've been picking away at it. The problem is that the labels should be between the bars, not underneath them, and Excel insists on putting them underneath.

16 February 2012

The Expected Value is Not Expected

The cost and duration estimates produced by conventional planning tools are usually the Expected Value (EV), also known as the mean or the average. In most cases the EV is very close to the median – the 50% probability point.

The thing is, 50% is also the region of greatest uncertainty, subjective and objective. Heads you're under budget, tails you're over budget.

That doesn't seem like a wise choice, especially since the toss is between a little under budget and a lot over budget. There's this thing called Jensen's Inequality. One of the ways it gets you is that there's a limit to how little a project can cost, but there's no limit to how much it can cost.
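
A made-up miniature of that asymmetry – five equally likely cost outcomes, bounded below but not above:

=MEDIAN(80, 90, 100, 110, 300)    returns 100, the coin-flip point
=AVERAGE(80, 90, 100, 110, 300)   returns 136, dragged up by the long tail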

As Sam Savage has pointed out many times many ways: On average, the average is wrong. So when you're tempted to perpetuate the Flaw of Averages, use a distribution instead.

08 February 2012

What's a Sample Distribution?

A sample distribution is a list of numbers, in most cases lots of numbers – hundreds or thousands of them. Each item in the list holds a possible value of something whose actual value is unknown – an uncertain variable.

How long will it take to complete this project or task? There isn’t just one answer; there’s a whole bunch of answers – each with its own probability of being right. This is an uncertain variable. A sample distribution establishes a connection between all those answers and their probabilities. A sample distribution is a simple and versatile form of probability distribution.

The essential feature of a conventional, parametric probability distribution is a formula. It’s based on expert analysis and curve-fitting of a suitably large number of observations of an uncertain variable. Coupled with a pseudo-random number generator, it produces equally-probable samples that are, hopefully, characteristic of the uncertainty in the real-world variable being modeled.

To make a sample distribution, we do away with the analysis and the formula but keep the observations – renamed ‘samples’ – and put them in a vector (array, matrix, list). We let the real world data speak for itself. There are some constraints on how we do this, so that the sample distribution meets some basic specifications, but that is essentially it.
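
In Excel terms the contrast looks something like this (the NORM.INV parameters and the range name obs are made up for illustration):

=NORM.INV(RAND(), 100, 15)                  a parametric sample: fitted formula plus PRNG
=INDEX(obs, RANDBETWEEN(1, COUNT(obs)))     a sample drawn straight from the observed data

The first line trusts the curve fit; the second lets the real-world data speak for itself.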

29 January 2012

The Shape of a Sample Distribution

When I write about sample distributions, I throw the 'shape' and 'order' terminology around, assuming they're understood in this context. What I've never done is give either one a proper treatment, so here's 'shape'.

The use of the term shape comes from its informal use in statistics, referring to the characteristic shape of the graphs used to picture various probability distributions. A distribution’s shape refers to the relationship between values and their probabilities. The normal, or Gaussian, distribution has its characteristic bell shape, the power law distribution has its ski-hill shape, and so on. That also applies to sample distributions, but they just are what they are – we don't give them names.

There are a bunch of things you can say about a sample distribution's shape:

26 January 2012

Risk is not discovered, it's chosen

Every so often I get a thought that puts an edge on one of my knives, and "Risk is not discovered, it's chosen" is one of them.

When you tell a stakeholder, "This project will be done in seven months," are you honest enough to tell her that there's a risk of not meeting that target, but she shouldn't worry about it, that you've made the decision about how much risk is acceptable on her behalf?

Given whatever method you use for estimating, do you even know what the probability of missing the target is?

Risk is a choice. The big question to ask is, "Has the choice been made knowingly? – And by the right person?"

13 January 2012

Bake Risk Management into the Plan

How do you manage the risk in a project? The default is a risk management program that's essentially out of band. The project and the risk management are separate tracks, often staffed with different people.

The first problem this creates is synchronization - making sure that changes in the project are reflected in the risk management plan and vice versa. The next problem is that there are two different groups of people with diverse objectives - and only one of them is focused on a successful project.

Integrated Risk Management

The probability management solution to this is to bake the risks into the plan with everything else. This way, there's one plan, one process, one team, one manager executing the project.

What does that mean? It means that we include risk events and responses as elements of the plan along with the tasks and milestones.
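
In simulation terms, a risk event is just another cost (or duration) element that is sometimes zero. Reusing the 25% / $100,000 risk from the "Risk = PxI is wrong" entry above, its per-trial contribution is:

=IF(RAND() < 0.25, 100000, 0)

Each trial either carries the full impact or none of it, and response tasks can be switched in and out the same way – so the risks live, and get re-estimated, inside the one plan.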

04 January 2012

Simulation the Probability Management Way


Let's start at the beginning:

Modeling with Uncertainty

We use computer models to help us foresee the course and consequences of decisions we might make. As it is with all science, the goal is clairvoyance.

If it's well-designed, a model is faithful to the real-world process the decisions will affect; it tells us what's likely to change in the real world if we make this decision or that assumption.

Modeling a process when the inputs are assumed to be correct and exact – using a formula that puts out The One True Answer as a result – is relatively straightforward. It's what we do when we calculate what will happen to our bank balance if we decide to buy an expensive toy. It's also what we do to estimate projects with CPM or PERT, assuring optimistic estimates that help get projects approved.

But it's not always so simple; sometimes a scalar mathematical model won't give us what we need to make an informed decision. This is the case when some of the input assumptions are uncertain variables.
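
To make the contrast concrete (the names and numbers here are made up for illustration):

=StartBalance - ToyPrice                                                     one true answer, when the inputs are exact
=StartBalance - INDEX(priceSamples, RANDBETWEEN(1, COUNT(priceSamples)))     one trial's answer, when the price is uncertain

Recalculate the second line a thousand times and the output stops being a number and becomes a distribution.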