Point Deception

Dead Reckoning

Story Points originated deep in the annals of early Agile methodologies, before Scrum. Points were meant to address a specific problem: a feature’s complexity is distinct from the time it takes to implement, because the skills and productivity of individual software engineers vary widely. Scrum teams use Story Points to calculate their Velocity, a measurement of productivity. With a known Velocity and a Backlog of Stories sized in Points, it should be possible to predict the duration of a project or initiative.
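The forecast itself is simple arithmetic. Here is a minimal sketch, assuming Velocity holds steady from Sprint to Sprint; the function name and the figures are illustrative only and do not come from any Scrum tool.

```python
import math


def sprints_to_finish(backlog_points: float, velocity: float) -> int:
    """Return the number of whole Sprints needed to burn down the Backlog.

    Assumes Velocity (Points per Sprint) stays constant, which is the
    very assumption the rest of this article questions.
    """
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(backlog_points / velocity)


# Hypothetical numbers: a 240-Point Backlog at 30 Points per Sprint.
print(sprints_to_finish(240, 30))  # 8
```

The forecast is only as good as the Velocity figure fed into it.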

Hold on, then. If the time required to complete a Story varies widely from engineer to engineer, how does a complexity index increase predictability? The answer depends on the law of averages. For a team of non-trivial size, the total number of Points completed in any given time period, the team’s Velocity, tends toward a constant. The problem is that teams often aren’t large enough, or don’t remain stable long enough, to achieve a constant Velocity.
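To see why team size matters, consider a rough sketch under a strong simplifying assumption that is mine, not part of Scrum: each engineer's output per Sprint is independent of the others and has the same average and the same relative spread (standard deviation divided by mean). Under that assumption, the relative spread of the team total shrinks with the square root of the team size.

```python
import math


def team_velocity_spread(team_size: int, individual_spread: float) -> float:
    """Relative spread (std dev / mean) of the whole team's Velocity.

    Assumption: engineers are independent and statistically identical,
    so the team's mean grows with n while its std dev grows with sqrt(n).
    """
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    return individual_spread / math.sqrt(team_size)


# Hypothetical: each engineer's output varies by about 50% around its mean.
for size in (1, 4, 16):
    print(size, team_velocity_spread(size, 0.5))
```

Even in this idealized model, quadrupling the team only halves the variation, so a small team never gets close to a constant Velocity.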

Studies and organizations that advocate Points and Planning Poker as an estimating method often omit context. You can’t compare a new team hired to develop a brand new product from the ground up with an established team that’s been working together for several years on a mature code base. Jeff Sutherland (co-creator of Scrum) himself declared in a blog post on scruminc.com (May 16th, 2013), “A three point story today is three points next year and is a measurable part of the product release for a Product Owner.” Sutherland defends Points as the metric by which to achieve predictability. It may be unwise for me to contradict a founding father, but here goes: I disagree. In his very next sentence, Sutherland states, “The hours to do a story depend on who is doing it and what day that person is doing it.” Isn’t that the entire problem?

Obviously, Points are not a universal standard. Depending on who’s making the subjective judgment, a three point story today could be a one point or a five point story a year from now. And it’s not just person-dependent. In the early stages of a new product, many things are new and difficult. As the product matures, both the business and technical architecture become more familiar to the developer, and furthermore, more of the code base becomes reusable, tending to drive Story complexity down, often dramatically. That’s why acclimated teams can accomplish more work in less time than newly formed teams.

Scrum teams that are fully aware of the shifting magnitude of a Story Point compensate by estimating only the Stories planned for the next Sprint. However, if you estimate only the next one to four weeks of Stories, and your product will require many Sprints to complete, then this short-term estimating process won’t predict a completion date. Some businesses are structured to accept product delivery as it comes. If that’s your company, then you’re the perfect candidate for orthodox Scrum. Often, that’s not possible. Many enterprise customers won’t enter into a contract if you tell them they’ll get their mission-critical application whenever it happens to be finished. The cure is supposed to be the Minimum Viable Product (MVP), but the MVP almost always requires the effort of multiple Sprints and therefore doesn’t solve predictability, except insofar as the MVP may be limited enough to mitigate the impact of a big schedule slip.

Are Hours Any Better Than Points?

Intrinsically, no. If you estimate tasks in hours with the same sort of snap judgment you apply to Points, the results will be just as nebulous. The time one software engineer needs to complete a task can differ widely from the time another needs, depending on skills, experience, and domain knowledge.

The reason I personally prefer hours is twofold. First, you can converge on a more meaningful time estimate by decomposing the Story into more atomic technical tasks, each with its own estimate. With deeper analysis, you’re already improving the quality of your estimates. Second, an estimate in hours should presume a certain skill level. You wouldn’t plan on assigning a particularly complex problem to a junior software engineer, and conversely, you wouldn’t assign straightforward tasks to your most senior staff. The estimates are made within the context of the team assembled to complete the work.

By the way, hourly estimates become even more useful and result in more predictable outcomes when you apply statistical methods to best case/worst case ranges. But that’s a topic for another time (if you can’t wait, see Steve McConnell’s “Software Estimation”).
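As a taste of what such methods look like, here is a minimal sketch of one widely used approach, the three-point (PERT) estimate. It is not necessarily the method I would recommend for every project. The task figures are hypothetical, and summing the spreads this way assumes the tasks are independent of one another.

```python
import math
from typing import List, Tuple


def pert(best: float, likely: float, worst: float) -> Tuple[float, float]:
    """Return (expected hours, standard deviation) for a single task."""
    expected = (best + 4 * likely + worst) / 6
    std_dev = (worst - best) / 6
    return expected, std_dev


def combine(tasks: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Add up task estimates; variances add when tasks are independent."""
    results = [pert(*task) for task in tasks]
    total_expected = sum(expected for expected, _ in results)
    total_std_dev = math.sqrt(sum(std_dev ** 2 for _, std_dev in results))
    return total_expected, total_std_dev


# Hypothetical Story decomposed into four tasks: (best, likely, worst) hours.
story = [(6, 12, 18)] * 4
print(combine(story))  # (48.0, 4.0)
```

Each task here carries a spread of two hours, yet the four together carry a spread of only four hours rather than eight, because overruns and underruns partly cancel.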

Some anecdotal evidence suggests that high-level estimates in Points are no more or less accurate than carefully considered estimates in hours. I believe it. Some teams know their products and markets so well that they can throw out estimates their companies can hang their hats on. But again, it’s circumstantial. Just because it’s possible doesn’t mean it works in all, or even most, cases.

Cone of Uncertainty

If certainty increases over time, how would the estimate change over several Sprints for a Story originally estimated at five Points? Most teams assign Points in Fibonacci numbers, so the change is quantized. A five Point Story, if it changes, would snap to either three or eight Points, which is an extremely wide range. If either your Points are “small” enough, or your team is large enough to populate each Sprint with hundreds of Points, then a few of these wild variations will likely average out and all will be well. On the other hand, if you typically take on under a hundred Points per Sprint, then your delivery dates can swing dramatically, and usually for the worse.
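The size of that snap is easy to quantify. Here is a small sketch, assuming a typical Fibonacci-style Point scale; the scale and the helper names are illustrative, and your team's scale may differ.

```python
from typing import Optional, Tuple

# Assumed Point scale; many teams use a variant of this.
FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21]


def neighbors(points: int) -> Tuple[Optional[int], Optional[int]]:
    """Return the next lower and next higher values on the scale."""
    i = FIBONACCI_SCALE.index(points)
    lower = FIBONACCI_SCALE[i - 1] if i > 0 else None
    upper = FIBONACCI_SCALE[i + 1] if i < len(FIBONACCI_SCALE) - 1 else None
    return lower, upper


def relative_change(old: float, new: float) -> float:
    """Fractional change from the old estimate to the new one."""
    return (new - old) / old


lower, upper = neighbors(5)
print(lower, upper)  # 3 8
print(f"{relative_change(5, lower):+.0%} to {relative_change(5, upper):+.0%}")
# -40% to +60%
```

The smallest possible revision to a five Point Story is a 40 percent cut or a 60 percent increase; there is nothing in between.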

If you estimate a task in hours, say 30, for example, then you may tweak this number up to 35 or 40, or down to 20, as you gain more domain knowledge. That’s a more precise adjustment, and less likely to randomize your ship date.

I know, Agile projects don’t have ship dates. Every Sprint ships. Fantastic. That would be useful if users could run their businesses with partially baked applications, but they almost never can. The point of shipping often is to force quality and stability, not necessarily utility. An invoicing system that accepts customer account data but not transactions is not useful. The MVP could be a dozen or more Sprints away.

That brings me back to the idea that you need to apply the most appropriate techniques for the project at hand. If your product goals are open-ended, meaning you make continual improvements at your own pace, then Points-driven Scrum is an ideal way to operate your team, especially if your team remains stable over many iterations. However, for date-driven projects, with customers waiting to receive specific deliverables, you need to manage your time, and time is measured in hours, not Points.