Thursday, May 22, 2014

The Mythical Story Point

I fairly recently became embroiled in an argument about whether Story Points or hours are better for estimating the effort associated with Software Engineering tasks. Having helped a lot of teams adopt Scrum and other agile practices, this is not the first time I have danced this dance, and my experience has left me with a strong preference for using hours rather than an abstraction.

The motivations for using Story Points (or any other abstraction for that matter, e.g. T-shirt Sizes) to estimate effort seem very reasonable, and the arguments for their use are initially very compelling. Consistent high-accuracy software estimation is probably beyond current human cognitive capability, so anything that results in improvements, even small ones, constitutes a good thing.

The primary motivation for using Story Points is that they represent a unit of work that is invariant in the face of differing levels of technical skill, differing levels of familiarity with the code or domain, the time of day or year, or how much caffeine either the estimator or the developer doing the work has ingested. They also provide a typically more coarse-grained unit of estimation than hours, which by necessity will result in apparently more accurate estimates. By combining this coarse-grained unit of work with mandatory refactoring of Stories (or Epics, or Product Backlog Items, or whatever nomenclature you choose to use) larger than a particular effort size, a team is bound to improve the accuracy of their estimates.

The use of estimation abstractions also seems to be beneficial when a team follows the Principle of Transparency, which is espoused by most Agile philosophies. When a team follows this principle, they make the team’s velocity, estimates, actuals and other data available to all stakeholders (e.g. Sales, Marketing, Support and Management), who almost invariably care a great deal about the work items that they have a stake in, and particularly about when those work items will be DONE. By using Story Points for estimates one initially avoids setting unrealistic expectations with stakeholders, who may not necessarily understand the prevalence of emergent complexity in the creation of software.

I would imagine that the human brain has a lot of deep circuitry designed exclusively to deal with time. It was clearly highly adaptive for an early hominid to be able to predict how far in the future a critical event might occur, whether that was knowing when the sun might go down and nocturnal predators appear, or when a particular migratory species would be in the neighbourhood. We are clearly genetically hard-wired for understanding coarse-grained time, e.g. circadian rhythms, synodic months, the 4 seasons, and the solar year. And human cultural evolution has yielded many powerful and ubiquitous time-related memes, which have added a deep and fine-grained temporal awareness to the human condition, measured in seconds, minutes and hours. Almost every modern electronic device’s default screen displays the time and date, including phones, microwaves, computers, thermostats etc. Time is so ubiquitously available in our modern digital lives that the sight of an analog wall clock will require a Tweet or post to Instagram. And everyone has a calendar of some sort that they use to manage their futures. We have clearly become the Children of Time.

Unfortunately, being the Children of Time has obviously not made us capable of even vaguely accurate estimation of the time any task of significant complexity will take. However, we are also terrible at estimating pretty much everything else, so I suspect this is not indicative of a specific limitation of our time-related neural circuitry. 

It is also our aforementioned parentage that limits the usefulness of Story Points and similar abstractions for estimating effort in general. After some, typically short, period of time everyone on the team and all the stakeholders unconsciously construct a model in their minds that maps the unit of the abstraction back to time in hours or days. And as soon as this model has been constructed they effectively go back to thinking in hours or days, though they now require an extra cognitive step to apply the necessary transformation.

So why bother with using the abstraction in the first place?

I have experimented with the use of estimation abstractions with teams in the past and I can confidently say that using abstractions has proven to be a distraction in the long run. I have settled on an approach that uses coarse-grained time buckets for initial estimates, e.g. 1, 5, 10, 25, 50 and 100 hours. The key performance indicator for estimation should be a steady improvement in estimation accuracy over time, rather than the absolute accuracy of the estimate.
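
For anyone who wants to try the bucket approach in a spreadsheet export or a small script, here is a minimal sketch in Python. The bucket sizes come from the paragraph above; the rule of rounding a raw guess up to the next bucket, the function name to_bucket, and the choice to reject anything over 100 hours are my assumptions for illustration, not a prescribed method.

```python
import bisect

# Bucket sizes in hours, as listed in the post.
BUCKETS = [1, 5, 10, 25, 50, 100]


def to_bucket(raw_hours: float) -> int:
    """Snap a raw gut-feel estimate to a coarse-grained bucket.

    Assumption: we round UP to the next bucket, on the theory that
    work items overrun more often than they underrun.
    """
    if raw_hours <= 0:
        raise ValueError("estimate must be positive")
    # Index of the first bucket that is >= raw_hours.
    i = bisect.bisect_left(BUCKETS, raw_hours)
    if i == len(BUCKETS):
        # Assumption: anything bigger than the largest bucket should be
        # broken into smaller work items rather than estimated as-is.
        raise ValueError("larger than the biggest bucket; split the work item")
    return BUCKETS[i]


if __name__ == "__main__":
    for guess in (0.5, 5, 7, 30, 100):
        print(guess, "->", to_bucket(guess))
```

Rounding to the nearest bucket instead of up would work just as well; what matters is that nobody argues about 7 hours versus 8.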

Accuracy in the software estimation process is emergent and improvements are predicated on iteration, increased familiarity with the code and the domain, and visibility into the historical data (and analysis thereof). Showing team members how far their estimates have been off over time, just before they estimate other work, is a good way to prime them, and give them an appropriate anchor.
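
The priming step can be as simple as a one-line summary shown to the estimator just before they estimate. The sketch below is only an illustration: the (estimated, actual) pair format, the function names, the use of the median, and the window of five recent items are all assumptions of mine rather than features of any particular tool.

```python
from statistics import median


def error_ratios(history):
    """Return actual/estimate for each (estimated_hours, actual_hours) pair.

    A ratio of 2.0 means the work took twice as long as estimated.
    """
    return [actual / estimate for estimate, actual in history]


def overall_and_recent(history, recent=5):
    """Median ratio over all items and over the most recent `recent` items."""
    ratios = error_ratios(history)
    return median(ratios), median(ratios[-recent:])


def priming_summary(history, recent=5):
    """Build the sentence to show an estimator before their next estimate."""
    if not history:
        return "No history yet."
    overall, latest = overall_and_recent(history, recent)
    shown = min(recent, len(history))
    return (
        f"Your actuals have run {overall:.2f}x your estimates overall, "
        f"and {latest:.2f}x over your last {shown} items."
    )


if __name__ == "__main__":
    # Hypothetical data, oldest first: (estimated hours, actual hours).
    past = [(10, 20), (10, 15), (5, 5)]
    print(priming_summary(past, recent=2))
```

If the recent ratio drifts toward 1.0 while the overall ratio lags behind, that is the steady improvement worth tracking, regardless of how far off any single estimate was.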

I suspect that I will dance this dance again in the future.


  1. Have you had any success with the type of model that using Fogbugz for your task estimation tool espouses? Ideally, regardless of how you estimate, it tracks how you estimate vs actuals statistically over many projects until eventually, 6 or 7 (or 20) projects down the line it auto-corrects your estimation based upon what that estimation has meant related to actuals in the past. Ever narrowing in on the perfect estimates as it were.

    I'm not saying i have had success. the tool was great in theory but we were pretty ass at putting the estimates and/or actuals into the tool and therefore it couldn't build up the lexicon. when that organization went to Agile and adopted enough adherence to specific process that we could track the numbers we'd moved to Jira and Greenhopper which didn't have the same projecting capability from project to project.

    In theory, you should be able to analyse your capabilities for estimation and use that yourself to improve over time but i don't think that level of retrospective is done well. It strikes me that stories themselves might lend themselves better to improving your estimation capability overall but that itself requires you to have an ability to normalize your stories effort-wise to gain any benefit. Normalizing stories is, i believe, itself a myth because you don't really do the same thing over and over again in development and as such it's difficult to say from project to project that any particular task is going to take even roughly the same amount of effort. (or you lose your developers to some shop doing more interesting things and with the loss of those developers you lose that normalization again, in a different way)

    ah well. to summarize - accurate estimation at project start is a myth. start coarse and narrow as you go. pummel this fact into marketing and sales until they get it.

    1. Mike,
      I have experimented with using analysis of historical data to automatically adjust estimates, with limited success. I have found that continuously adjusting the model in each estimator's brain yields far more accurate estimates over time. By showing estimators their historical Estimates versus Actual data often, and just before they provide subsequent estimates, you create a positive feedback loop by providing them with an anchor based on their own historical accuracy, rather than the last random number that they saw, e.g. the number 42 on my favorite t-shirt.
      There will always be a strong correlation between how long the estimator has been working with the code and the accuracy of their estimates; I don’t believe you can mitigate the negative impact of the loss of an experienced estimator, no matter how sophisticated your tools are.
      Despite our inability to provide accurate estimates I still believe that early estimates are useful in planning. The trick is to make sure that stakeholders do not take those estimates as commitments. I have found that providing a Confidence, displayed right next to the estimate, goes a long way to achieving this. An early estimate of a 100 hour work item with a Confidence of 10% should set expectations appropriately.

  2. How long do you have to present how people estimated last time to them before they stop doing the knee-jerk jump in the opposite direction?

    Yeah, I agree on early estimation. If you don't give some level of effort to portfolio management they don't have that weight to help in their prioritization of projects. Without an idea of effort/cost for any particular item, management of your portfolio is pretty much a joke.

    I like your idea of including a confidence. I wonder how often confidence will go above 30%.