13 January 2012

Bake Risk Management into the Plan

How do you manage the risk in a project? The default is a risk management program that's essentially out of band. The project and the risk management are separate tracks, often staffed with different people.

The first problem this creates is synchronization - making sure that changes in the project are reflected in the risk management plan and vice versa. The next problem is that there are two different groups of people with diverse objectives - and only one of them is focused on a successful project.

Integrated Risk Management

The probability management solution to this is to bake the risks into the plan with everything else. This way, there's one plan, one process, one team, one manager executing the project.

What does that mean? It means that we include risk events and responses as elements of the plan along with the tasks and milestones.

It means we assume the risks and put them anywhere in the workflow where they can have an impact. The only difference between ordinary elements and risk elements is that the risk elements include a probability filter.

I have an Excel spreadsheet with a plan snippet to go with this discussion. You may want to fire it up. It's at http://smpro.ca/ProbMan/snipBakeRiskIn.xlsm. It shows a small piece of a development plan with the possibility of a strike or lockout that could cause work to stop for some period of time. It's a little busy because I've crammed everything into one sheet that would normally be spread over three or four.

Here's how it works for time planning the probability management way:

We give each risk element two sample distributions: one for the probability of the event, the other for the time impact if the risk event occurs.

We carry the probability in a random binary distribution - all 1s and 0s. If the number of trials is N, and the probability is P, we'll have P*N ones in the distribution. We'll also shuffle the distribution with a random permutation index.
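Here's a minimal sketch of that filter distribution in Python rather than Excel. The function name, the seed and the 25% probability are my own illustrative choices, not values from the spreadsheet, and `random.shuffle` stands in for the random permutation index.

```python
import random

def binary_filter(n_trials, probability, seed=None):
    """Build a shuffled list of 1s and 0s with round(P * N) ones.

    The count of ones is fixed, so the filter carries the probability
    exactly; only the order of the trials is random.
    """
    n_ones = round(probability * n_trials)
    filt = [1] * n_ones + [0] * (n_trials - n_ones)
    # Stand-in for applying a random permutation index.
    random.Random(seed).shuffle(filt)
    return filt

# Hypothetical example: 1000 trials, 25% chance of a strike/lockout.
strike_filter = binary_filter(n_trials=1000, probability=0.25, seed=42)
```

Because the ones are counted out rather than drawn trial by trial, the filter always holds exactly P*N ones; it doesn't drift around P the way a thousand independent coin flips would.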

In the model, the upper spinner updates the Gantt chart for each individual trial. There's no real Trial #0, so we use that slot to present the averages. The averages never happen in the model or in the real world, but some people seek comfort in them, so I put them there. As you click through the trials, you'll see that the strike/lockout bar is often absent. That's the zeros in the filter distribution. (There's some kind of race condition with the spinner control that I haven't been able to fix, so the first click may not always work - just hit it again.)

For the second distribution, we use a sample distribution that assumes the risk event has occurred. The uncertain variable is the incremental time the response (or the sitting-on-your-hands delay) adds to the workflow. In the model, this distribution sets the size of each strike/lockout bar, when it's there.

This way, we're not concerned with the PxI (probability times impact) calculation issue. A 25% probability of winning a million dollars never delivers $250,000. It delivers a million dollars or nothing. In our simulation, the same principle applies. For any one scenario, workflow is either impacted or not, and the probability affects how often it's impacted, not the severity of the impact. This puts the risk calculation where it belongs - in the frequency, just as it is in the real world.
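To make that concrete, here's a rough sketch that multiplies the filter by the impact distribution trial by trial. The triangular 2-to-10-week impact, the 25% probability and the seed are assumptions picked for illustration; the spreadsheet uses its own numbers.

```python
import random

rng = random.Random(7)
N = 1000   # number of trials (assumed)
P = 0.25   # probability of the strike/lockout (assumed)

# Filter distribution: exactly P*N ones, shuffled.
n_ones = round(P * N)
occurs = [1] * n_ones + [0] * (N - n_ones)
rng.shuffle(occurs)

# Impact distribution, sampled as if the event HAS occurred.
# Hypothetical shape: 2 to 10 weeks, most likely 4.
impact_if_occurs = [rng.triangular(2, 10, 4) for _ in range(N)]

# Per-trial delay: the full impact or nothing, never a blend.
delay = [o * d for o, d in zip(occurs, impact_if_occurs)]

hit = [d for d in delay if d > 0]
print(f"{len(hit)} of {N} trials are delayed, "
      f"by {min(hit):.1f} to {max(hit):.1f} weeks")
```

Every delayed trial carries a full-size delay and every other trial carries none. No trial ever sees a quarter of a strike, which is what a PxI figure would imply.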

In the model, the strike/lockout risk affects the time distribution of Milestone A, shown in the weeks to finish pChart. The lower spinner changes the probability assumption, with a corresponding effect on both the Gantt chart and the pChart.
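If you'd rather poke at this outside Excel, the sketch below mimics the effect of changing the probability assumption on a milestone's finish time. Everything numeric here is made up for illustration: the base duration, the strike impact and the 14-week target are not taken from the spreadsheet, and `milestone_weeks` and `chance_within` are hypothetical helper names.

```python
import random

def milestone_weeks(p_strike, n_trials=1000, seed=3):
    """Simulate weeks to finish a milestone, one value per trial."""
    rng = random.Random(seed)
    n_ones = round(p_strike * n_trials)
    occurs = [1] * n_ones + [0] * (n_trials - n_ones)
    rng.shuffle(occurs)
    weeks = []
    for o in occurs:
        base = rng.triangular(8, 14, 10)    # ordinary work (assumed)
        strike = rng.triangular(2, 10, 4)   # delay if the strike happens (assumed)
        weeks.append(base + o * strike)
    return weeks

def chance_within(weeks, target):
    """Fraction of trials that finish in `target` weeks or fewer."""
    return sum(w <= target for w in weeks) / len(weeks)

# Step the probability assumption, much like clicking the lower spinner.
for p in (0.0, 0.25, 0.5):
    print(p, chance_within(milestone_weeks(p), 14))
```

With the probability at zero, every trial finishes inside the assumed 14-week base range. Raising it pushes more trials past the target without changing how long any individual strike lasts.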

One of the things probability management handles especially well is the case of a risk that has impacts in multiple workflows, possibly in many different plans. A strike or lockout could well affect plans and projections in a variety of company divisions. It's important that, when division plans are aggregated into a corporate projection, the strike versus no-strike scenarios are coordinated. The results would be misleading if, for a particular scenario, Division A's numbers assumed a strike while Division B's didn't.

The probability management solution is for a corporate source to send a DIST of the strike/lockout sample distribution to all the divisions. All the plans and projections will use it without permuting it, synchronizing all of the division-level results. This makes sure that, for a particular scenario, all the divisions and the aggregate model agree on the risk event either happening or not.
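Here's a small sketch of that coordination, with a plain Python list standing in for the DIST. The two divisions, their impact ranges and the seeds are hypothetical. The point is only that both divisions index the same filter, in the same order, without reshuffling it.

```python
import random

N = 1000          # number of trials (assumed)
P_STRIKE = 0.25   # corporate probability assumption (assumed)

# Corporate builds and shuffles the strike filter ONCE.
corp_rng = random.Random(1)
n_ones = round(P_STRIKE * N)
shared_strike = [1] * n_ones + [0] * (N - n_ones)
corp_rng.shuffle(shared_strike)

def division_delays(strike_filter, low, high, seed):
    """Each division has its own impact, but uses the filter as received."""
    rng = random.Random(seed)
    return [s * rng.uniform(low, high) for s in strike_filter]

div_a = division_delays(shared_strike, 2, 6, seed=10)  # hypothetical Division A
div_b = division_delays(shared_strike, 1, 3, seed=20)  # hypothetical Division B

# Aggregate trial by trial: trial i means the same scenario everywhere.
corporate = [a + b for a, b in zip(div_a, div_b)]

# In every trial, either both divisions are hit or neither is.
coherent = all((a > 0) == (b > 0) for a, b in zip(div_a, div_b))
print("Divisions agree on every trial:", coherent)
```

If either division permuted its copy of the filter, the two would each still show a 25% strike rate, but the trial-by-trial sum would mix strike and no-strike scenarios.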

This approach lends itself to a risk management strategy that's global in scope and local in application. It puts the concerns and capabilities exactly where they should be for optimum effectiveness.


I've added a software development model with an instance of integrated risk management to the SDXL site.
