I fairly recently became embroiled in an argument about whether Story Points or hours are better for estimating the effort associated with Software Engineering tasks. Having helped a lot of teams adopt Scrum and other agile practices, I have danced this dance before, and my experience has left me with a strong preference for using hours rather than an abstraction.
The motivations for using Story Points (or any other abstraction for that matter, e.g. T-shirt Sizes) to estimate effort seem very reasonable, and the arguments for their use are initially very compelling. Consistent, high-accuracy software estimation is probably beyond current human cognitive capability, so anything that results in improvements, even small ones, constitutes a good thing.
The primary motivation for using Story Points is that they represent a unit of work that is invariant in the face of differing levels of technical skill, differing levels of familiarity with the code or domain, the time of day or year, or how much caffeine either the estimator or the developer doing the work has ingested. They also provide a typically more coarse-grained unit of estimation than hours, which necessarily makes estimates appear more accurate. By combining this coarse-grained unit of work with the mandatory splitting of Stories (or Epics, or Product Backlog Items, or whatever nomenclature you choose to use) larger than a particular effort size, a team is bound to improve the accuracy of its estimates.
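To make that mechanism concrete, here is a minimal sketch in Python; the Fibonacci-like scale and the splitting threshold of 13 points are my own illustrative assumptions, not part of any standard:

```python
# A coarse-grained point scale plus a mandatory splitting threshold.
POINT_SCALE = [1, 2, 3, 5, 8, 13, 21]  # Fibonacci-like scale (assumed)
SPLIT_THRESHOLD = 13                   # split stories above this (assumed)

def snap_to_scale(raw_points: float) -> int:
    """Round a raw gut-feel estimate up to the nearest value on the scale."""
    for p in POINT_SCALE:
        if p >= raw_points:
            return p
    return POINT_SCALE[-1]  # off the scale entirely; it must be split anyway

def needs_splitting(points: int) -> bool:
    """Flag a story as too large to estimate reliably."""
    return points > SPLIT_THRESHOLD

print(snap_to_scale(6.5))    # 8: the coarse scale hides false precision
print(needs_splitting(21))   # True: too big, break it down and re-estimate
```

The coarseness of the scale is doing the real work here: there is simply no way to express an estimate of 6.5 points, so a spurious level of precision is unrepresentable.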
The use of estimation abstractions also seems beneficial when a team follows the Principle of Transparency, which is espoused by most Agile philosophies. When a team follows this principle, it makes the team’s velocity, estimates, actuals and other data available to all stakeholders (e.g. Sales, Marketing, Support and Management), who almost invariably care a great deal about the work items that they have a stake in, and particularly about when those work items will be DONE. By using Story Points for estimates one initially avoids setting unrealistic expectations with stakeholders, who may not necessarily understand the prevalence of emergent complexity in the creation of software.
I would imagine that the human brain has a lot of deep circuitry designed exclusively to deal with time. It was clearly highly adaptive for an early hominid to be able to predict how far in the future a critical event might occur, whether that was knowing when the sun would go down and nocturnal predators appear, or when a particular migratory species would be in the neighbourhood. We are clearly genetically hard-wired for understanding coarse-grained time, e.g. circadian rhythms, synodic months, the four seasons, and the solar year. And human cultural evolution has yielded many powerful and ubiquitous time-related memes, which have added a deep and fine-grained temporal awareness to the human condition, measured in seconds, minutes and hours. Almost every modern electronic device’s default screen displays the time and date: phones, microwaves, computers, thermostats, etc. Time is so ubiquitously available in our modern digital lives that the sight of an analog wall clock warrants a Tweet or a post to Instagram. And everyone has a calendar of some sort that they use to manage their futures. We have clearly become the Children of Time.
Unfortunately, being the Children of Time has not made us capable of even vaguely accurate estimates of how long any task of significant complexity will take. However, we are terrible at estimating pretty much everything else too, so I suspect this is not indicative of a specific limitation of our time-related neural circuitry.
It is also our aforementioned parentage that limits the usefulness of Story Points and similar abstractions for estimating effort in general. After some (typically short) period of time, everyone on the team and all the stakeholders unconsciously construct a model in their minds that maps the unit of the abstraction back to time in hours or days. And as soon as this model has been constructed, they are effectively back to thinking in hours or days, except that they now require an extra cognitive step to apply the transformation.
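The model everyone quietly builds is not sophisticated; it is little more than a division. A sketch of it, with all figures invented purely for illustration:

```python
# The unconscious conversion model: given a few sprints of history, an
# hours-per-point ratio falls out, and points become a thin alias for time.
# Every number below is invented for illustration.

sprint_capacity_hours = 240     # e.g. 4 developers x 2 weeks x 30 focused hours
observed_velocity_points = 30   # points completed per sprint, from history

hours_per_point = sprint_capacity_hours / observed_velocity_points  # 8.0

def points_to_hours(points: float) -> float:
    """The extra cognitive step: translate the abstraction back into time."""
    return points * hours_per_point

print(points_to_hours(5))  # 40.0 hours; "so, about a week then"
```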
So why bother with using the abstraction in the first place?
I have experimented with the use of estimation abstractions with teams in the past, and I can confidently say that using abstractions has proven to be a distraction in the long run. I have settled on an approach that uses coarse-grained time buckets for initial estimates, e.g. 1, 5, 10, 25, 50 and 100 hours. The key performance indicator for estimation should be a steady improvement in estimation accuracy over time, rather than the absolute accuracy of the estimate.
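In code, the bucketing works just like the point scale sketched earlier, only denominated in hours; rounding up rather than to the nearest bucket is my own assumption, reflecting the usual bias towards underestimation:

```python
# Coarse-grained hour buckets for initial estimates, as described above.
HOUR_BUCKETS = [1, 5, 10, 25, 50, 100]

def bucket_estimate(raw_hours: float) -> int:
    """Snap a raw estimate up to the next coarse bucket."""
    for bucket in HOUR_BUCKETS:
        if bucket >= raw_hours:
            return bucket
    return HOUR_BUCKETS[-1]  # beyond 100 hours, break the work down first

print(bucket_estimate(3))    # 5
print(bucket_estimate(18))   # 25
```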
Accuracy in the software estimation process is emergent, and improvements are predicated on iteration, increased familiarity with the code and the domain, and visibility into the historical data (and analysis thereof). Showing team members how far their estimates have been off over time, just before they estimate other work, is a good way to prime them and give them an appropriate anchor.
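The readout needed for that priming is nothing more than relative error over time; a sketch, with invented sample data:

```python
# Per-item relative estimation error, oldest first, plus the trend.
# The history below is invented sample data for illustration.

history = [
    (10, 22), (5, 9), (25, 31), (10, 14), (5, 6), (25, 27),
]  # (estimated_hours, actual_hours)

def relative_error(estimated: float, actual: float) -> float:
    """Positive values mean the work took longer than estimated."""
    return (actual - estimated) / estimated

errors = [relative_error(e, a) for e, a in history]
for (e, a), err in zip(history, errors):
    print(f"estimated {e:>3}h, actual {a:>3}h, off by {err:+.0%}")

# The KPI from above is the trend, not the absolute accuracy.
early = sum(errors[:3]) / 3
recent = sum(errors[3:]) / 3
print(f"mean error, early: {early:+.0%}, recent: {recent:+.0%}")
```

In this invented history the mean error falls from +75% to +23%, which is exactly the kind of steady improvement the previous paragraph treats as the key performance indicator.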
I suspect that I will dance this dance again in the future.