Dread-free Software Estimating

You may believe that time estimates are useless. You may be correct. Estimates fulfill a specific purpose: predictability, which may not apply to your project or initiative. Any schedule-driven project, Agile or otherwise, hinges on the accuracy of its estimates. If you don’t need predictable delivery, you can skip this headache altogether. If you’re lucky enough to work on a product that has no timetable, don’t waste your time on extensive planning. Make yourself a list of things to do, a Product Backlog, and knock them off at whatever pace your team can manage.

You may also prefer Story Points over time estimates. The combination of Story Points and average Velocity can yield predictable timelines, especially for stable, experienced teams who’ve worked together through many Sprints. Points differ enough from time estimates to merit a separate article, so I won’t address that technique here.

I’ve explained why and when time-based estimates are useful or essential in my previous four-part series of posts, “The Date-driven Project: Cracking the Agile Paradox.” In this article, I’ll move on to estimating’s specific challenges and how to tackle them.

Accuracy

Stating the obvious, what makes an estimate useful is its accuracy. I wish I could say there was a prescription for accurate estimates. There isn’t. Estimating is a soft science whose accuracy depends on several factors, especially experience. It’s unreasonable to expect inexperienced engineers to estimate accurately; conversely, the greater the cumulative experience among members of the team, the greater the predictability you can expect.

To complicate matters, the raw career experience of the people responsible for estimating isn’t the only experience factor. Their tenure with the team, product (aka program), project, and subject matter also affect accuracy, often by wide margins.

Software estimating truly is uniquely difficult. Software development is not entirely repetitive. Even the most mundane enterprise applications can be riddled with novel problems. By definition, it’s all but impossible to accurately predict the effort required to solve a problem never before seen.

It seems like an intractable problem. However, there are ways to calculate predictable schedules from fuzzy estimates. First, let’s define fuzziness, or more precisely, precision.

Precision

Accuracy describes how close a measurement is to the true value; precision describes the variability of that measurement. You could say, for example, that I’ve estimated 5 days for a task, and that would have an implied precision of plus or minus one day. If you were to express the same estimate as 40 hours, the implied precision might be plus or minus one hour, but certainly not plus or minus one minute. You could also qualify either of these with explicit precision, e.g. +/- 2 days, or +/- 16 hours. When you’re estimating tasks, you want to choose a degree of precision — hours, days, weeks, or months — that appropriately expresses your certainty — or lack thereof — at the time of estimation.

Imprecise estimates are useful for roughly sizing projects, but not for nailing down ship dates. Precise estimates, when they’re accurate, lead to predictable schedules, at a cost. In general, both accuracy and precision correlate with the degree of detail furnished in design documents, such as functional requirements or technical specifications. Design takes time of its own.

Generally, the more granular your tasks, the better your estimates. It’s not that estimates on smaller tasks are intrinsically more accurate. The quality of your estimates comes from the work you do to break down the problem. Decomposition teases out the relevant questions and answers needed to stake out measurable units of work. You arrive at better estimates by increasing knowledge and decreasing ambiguity.

Measuring Uncertainty

Now let’s talk about the dark side of qualifying estimates with uncertainty. Say you have a hundred or more estimated tasks. The next step is to drop them into a schedule and calculate a completion date. But what are you supposed to do with the uncertainty? Depending on how well the implementation problems and solutions are understood in advance, you’ll likely end up with precision that varies from task to task. Usually, this is expressed not with precision, but as certainty, e.g. “We’ll complete the credit card address validation API in 20 hours, with a certainty of 80%.”

How does 80% certainty translate to days and hours on a calendar? Does it mean the task could take 20% more or less than 20 hours, i.e. 16 to 24 hours? Certainty isn’t meaningful without a significant statistical sample. When a meteorologist says there’s an 80% chance of rain, we assume it’s likely to rain. What she’s really saying is that out of 100 such predictions, it will rain on 80 of those days. On any particular day it will either rain or it won’t. And the prediction says nothing about how much it will rain. Could be a quick shower or a twenty-four-hour deluge. Another way to state this type of certainty is, “There exists an 80% chance I will complete this task within the time estimated,” which only says there is a 20% chance of failure, and nothing at all about how much additional time failure might require: yes or no, but not how much longer.

Knowing that 80 out of a hundred tasks will finish within the estimated time isn’t helpful. Let’s say you lined them all up sequentially and calculated a project duration of 175 days. Great. How will you use the value of 80% certainty to determine best and worst case? You might assume you could use the broken math from above and calculate precision by applying 20% in either direction, for a range of 140 to 210 days (+/- 35 days). If that’s close enough for your business stakeholders, run with it. The low number is the sum of all best cases, and the high number, the sum of all worst cases. The actual duration will land somewhere within that 70-day gap, an example of low, and mathematically dubious, precision.
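The naive arithmetic works out like this. The task list below is hypothetical, sized so the totals match the 140-to-210-day example:

```python
# Naive range aggregation: sum all best cases and all worst cases
# independently. The tasks here are hypothetical: 25 identical tasks
# of 7 days each, +/- 20% (5.6 to 8.4 days).

def naive_range(tasks):
    """tasks: list of (low, high) pairs. Returns (sum of lows, sum of highs)."""
    low = sum(lo for lo, hi in tasks)
    high = sum(hi for lo, hi in tasks)
    return low, high

tasks = [(5.6, 8.4)] * 25
low, high = naive_range(tasks)
print(round(low), round(high))  # 140 210
```

Every task would have to hit its best case to reach the low end, and its worst case to reach the high end, which is why the resulting 70-day spread is so wide and so unlikely at either extreme.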

Home In On the Range

Given that high-precision estimates are unrealistic, it may seem I’m conceding the entire exercise. I’m not. There is an estimating technique that factors in uncertainty in a way that actually, if counter-intuitively, leads to predictable schedules.

It’s not as simple as adding up the lows and the highs. That will give you an enormous range of uncertainty. Instead, the summation is based on the statistical method of summing the variance. Think of it this way: If you state a range of uncertainty for each task, some will end up requiring the worst case, some the best, and many in between. Over a large number of tasks, we want to find the average effect of all that uncertainty, not the cumulative best and worst cases.

In his book, Software Estimation: Demystifying the Black Art, Steve McConnell explains the mathematical details well, and it’s simpler than it sounds, requiring only basic arithmetic. If you’re familiar with a few concepts, you may already understand the steps:

  1. Calculate the Standard Deviation (SD) by dividing the difference of each high/low pair by 2.6 (Task_SD = (high – low)/2.6).
  2. Square each SD to obtain the Variance (Task_Variance = Task_SD^2).
  3. Sum the task Variances.
  4. Take the square root of the summed Variance to get the aggregate SD.
  5. Sum the means (averages) of each high/low pair.

  6. Apply the aggregate SD to the overall mean (average) to yield the overall low and high estimates (Overall_Low = Average – Aggregate_SD/2, Overall_High = Average + Aggregate_SD/2).
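In code, the steps above reduce to a few lines of arithmetic. Here’s a minimal sketch, using a hypothetical list of low/high estimates in hours:

```python
import math

# Range-based aggregation following the steps above. The 2.6 divisor
# treats each low/high pair as an 80% confidence band. The task data
# is hypothetical.

def aggregate_estimate(tasks):
    """tasks: list of (low, high) pairs. Returns (overall_low, mean, overall_high)."""
    # Steps 1-2: standard deviation, then variance, of each task.
    variances = [((hi - lo) / 2.6) ** 2 for lo, hi in tasks]
    # Steps 3-4: sum the variances, then take the square root for the aggregate SD.
    aggregate_sd = math.sqrt(sum(variances))
    # Step 5: sum the means of each low/high pair.
    mean = sum((lo + hi) / 2 for lo, hi in tasks)
    # Step 6: apply the aggregate SD around the overall mean.
    return mean - aggregate_sd / 2, mean, mean + aggregate_sd / 2

tasks = [(4, 12), (8, 24), (2, 6), (16, 40), (8, 16)]
low, mean, high = aggregate_estimate(tasks)
print(f"{low:.1f} to {high:.1f} hours (mean {mean:.1f})")  # 62.0 to 74.0 hours (mean 68.0)
```

Note how much narrower the result is than the naive sums: adding up the raw lows and highs of those same five tasks gives 38 to 98 hours.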

The result is an unusually strong predictor, based on evidence from a sizable industry sample. The magic number, 2.6, represents the 80% band within the normal distribution. In other words, it assumes that your team will correctly identify the minimum and maximum effort 80% of the time, which is about the best you can hope for, unless you allow them to give extremely broad ranges (e.g. 1 to 1000 hours), and therefore forfeit so much precision that the entire exercise becomes meaningless. The risk in your schedule is in the outliers, so I actually do encourage my teams to toss out a broad spread for any task they find especially tricky to predict, but those are the exceptions.
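If you want to verify the 2.6 constant yourself, it falls out of the normal distribution: an 80% band leaves 10% in each tail, so it spans the 10th through 90th percentiles. A quick check using Python’s standard library (statistics.NormalDist, available since Python 3.8):

```python
from statistics import NormalDist

# Width, in standard deviations, of the central 80% of a normal
# distribution: from the 10th to the 90th percentile.
z90 = NormalDist().inv_cdf(0.90)   # ~1.28 SDs above the mean
band_width = 2 * z90               # ~2.56, rounded up to 2.6 in practice
print(round(band_width, 2))        # 2.56
```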

The arithmetic of range-based estimates is not intuitive, yet not terribly complex. It’s a lightweight statistical method that yields an aggregate range, which is much narrower and more refined than the sum of the lows and highs. The basic calculation is straightforward when you’re calculating total effort for a project as a costing exercise.

Note: No matter what estimating method you use, be careful how you translate estimates into schedules. You can’t simply divide the total expected effort by team capacity per week, month, or Sprint to calculate calendar time. Functional and resource dependencies always stretch timelines.

The range-based method can be incorporated into a schedule, although the math becomes trickier. Estimating total effort for a list of tasks is not the same as estimating completion dates for daisy-chained tasks, with cumulative uncertainty. If you were completing each task in succession, one at a time, the cumulative low and high would predict a range of completion dates. But that’s not how a project works. Some work occurs in parallel, and some in sequence. The milestones and final deadline depend on the critical path. It’s not easy to construct a timeline from best and worst cases, which, as I explained above, are not simple sums, and depend on the path the team takes through the scheduled work. Add to that the gradual decrease in uncertainty as tasks are completed. For a project large enough to benefit from this technique, it’s too painful to calculate the timeline by hand and then revise it continually as the project progresses. You could try a spreadsheet, but spreadsheets are not great tools for modeling complex schedules.
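To see why sequential and parallel work must be treated differently, here is a minimal critical-path rollup. The task names, durations (mean estimates, in days), and dependencies are invented for illustration:

```python
from functools import lru_cache

# Earliest finish of each task = its mean duration plus the latest
# finish among its dependencies. The project ends when the longest
# (critical) path through the graph ends. All data here is hypothetical.
durations = {"design": 5, "api": 8, "ui": 6, "tests": 4}
depends_on = {"design": [], "api": ["design"], "ui": ["design"], "tests": ["api", "ui"]}

@lru_cache(maxsize=None)
def finish(task):
    start = max((finish(dep) for dep in depends_on[task]), default=0)
    return start + durations[task]

project_days = max(finish(t) for t in durations)
print(project_days)  # 17: design -> api -> tests is the critical path
```

Summing all four durations sequentially would predict 23 days, but the ui work happens in parallel with the api work and never touches the critical path, so it doesn’t stretch the schedule.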

The best option is to use a project management tool that calculates rolled up durations, such as LiquidPlanner.com, the only project management application I’m aware of that runs on range-based estimates. I’m not affiliated with LiquidPlanner.com in any way, but I do recommend them because of this feature. It’s possible that Smartsheet’s (Smartsheet.com) hybrid spreadsheet and project management features, with some finagling, could work as well. If you don’t use a project management tool with these capabilities, you can still benefit from range-based estimates to pre-calculate total effort, and then plan your schedule using the high/low averages (means) for each task.

Range estimates work best over a largish number of tasks, obeying the so-called “law of large numbers.” Your uncertainty on individual tasks can vary widely. Yet over a hundred, or several hundred tasks, it averages out into a surprisingly accurate prediction. At least, that’s been my experience so far. For more thorough coverage of this and other methods, purchase Steve McConnell’s Software Estimation, mentioned above.
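The averaging-out effect is easy to demonstrate with a quick simulation. The task count, estimate range, and triangular distribution of actuals below are all assumptions made for illustration:

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# 200 hypothetical tasks, each estimated at 4 to 12 hours. Simulate the
# actual duration of each with a triangular distribution peaked at the
# midpoint, then compare the total against the sum of the mean estimates.
tasks = [(4, 12)] * 200
expected = sum((lo + hi) / 2 for lo, hi in tasks)            # 1600 hours
actual = sum(random.triangular(lo, hi) for lo, hi in tasks)  # varies per task

drift = abs(actual - expected) / expected
print(f"total drifted {drift:.1%} from the estimate")
```

Individual tasks land anywhere in their 4-to-12-hour range, yet the total typically drifts only a percent or two from the 1600-hour estimate, because the overruns and underruns largely cancel.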