Can We Estimate Software Development
Time?
Chuck Connell
Background
n
This
talk is a response to a paper by J.P. Lewis titled Large Limits to Software Estimation in ACM Software Engineering Notes,
July 2001.
n
Lewis’s
paper generated some good discussion within the software engineering community
and on Slashdot.org.
n
My article on this topic appeared on Developer.com and Slashdot in
January 2002.
Outline
n Motivation for the topic
n Summary of Lewis’s paper
n My critique of Lewis’s paper
Why estimating matters
n
Estimating
the size of future projects is one of the most important areas of software
engineering. (Size = $$, person-days, calendar days, opportunity cost.)
n
Estimates
are crucial to go/no-go decisions on every project. (Example: $1M project for
this group.)
n
Questions
about the ability of SE to provide meaningful estimates call into question all
plans for future projects. (And all current projects.)
SE status within CS
n
Software
engineering’s ability to estimate is part of a broader discussion about the
overall validity of SE.
n
If SE
cannot make good estimates about future projects, what is the whole field
doing?
n
SE is
frequently derided by computer scientists as unscientific, non-rigorous, or
“just programmer sociology”.
Lewis: what software is
n
The
behavior of any program can be thought of as a table of inputs and outputs.
n
There
is one row for each possible set of inputs. (Example: 0-3 x 0-3)
n
Assumes
finite number of finite inputs.
n
Assumes
we can represent human interaction as another finite/finite input.
n
Assumes
finite/finite output for each set of inputs.
n
Above
are reasonable assumptions for practical purposes.
Lewis: what software is
n Real software would have a very large table.
n Something like:
u 232 raised to # of numeric inputs, times
u 28 raised to # of text input chars
n But the table is finite and mechanically
constructible.
Lewis: software spec as a string
n A software description is a finite table.
n Any such table can be transformed into a string.
n The string represents what we want the software to do.
n Example: using previous
n So a software specification is a finite string.
Lewis: estimating software tasks
n
Estimating
a software task is the same as asking, “How hard is it to write a program to
create the spec string?”
n
(We
assume a small driver to call program with every input.)
n
Lewis
claims that writing a smaller program is easier/faster than writing a larger
program.
n
So,
Lewis asserts that estimating is equivalent to asking: “What is the smallest
program (in bytes) to create spec string?”
Lewis: complexity of estimating
n
What is
the smallest program to create spec string?
n
By
definition, this is identical to asking for the algorithmic complexity of the string (related to Kolmogorov
complexity).
n
But,
there is a well-known result from complexity theory: There is no program to compute the
algorithmic complexity of arbitrary strings.
n
Therefore,
there is no rigorous method for estimating software development!
n
Lewis
extends this result to estimates with a given percentage accuracy.
Summary of Lewis’s paper
n
Software
specs are tables / strings.
n
It is
faster to write small programs than large programs.
n
Software
estimates seek the algorithmic complexity of the specification string. (What is
the smallest program to print out the spec table?)
n
But
there is no method for finding the algorithmic complexity of arbitrary strings.
n
Therefore,
rigorous/mechanical software estimates are not possible, even allowing % error.
My critique
n
Lewis
is “correct”, given the definitions he uses and the technical results he
obtains, but…
n
He
mischaracterizes the programming process.
n
He
fails to show a relationship between algorithmic complexity and software
development times.
n
He
mischaracterizes promises made by well-known development frameworks, especially
the Software Capability Maturity Model (CMM).
Programming process
n
Do
programmers try to write the shortest possible program?
n
Maybe,
sometimes.
n
But
modern software engineering teaches that other aspects of software are more
important: readability, modifiability, scalability, speed…
n
The
shortest program may not have these qualities.
n
So
programmers may not be trying to do something directly related to the AC of the
spec table.
Estimation problem
n
Assume
Lewis’s description of software algorithmic complexity (AC) is well-defined…
n
Does
it take longer for people to write a program for a high AC task than for a
lower AC task?
n
The
answer to the above must be YES, or AC is unrelated to the software
estimation problem.
n
But we
don’t know the answer to this question, a priori.
Estimation problem
n
Does
it take longer for people to write a program for a high AC task than for a
lower AC task?
n
This is
an empirical question for psychology.
n
Maybe
there are tricky low-AC tasks that are hard (long) to program…
n
Maybe
there are high AC tasks that a human can grasp quickly…
n
So
programming estimates may not be directly related to differences in AC.
Claims about estimating
n
Lewis
attacks a straw man: objective, mechanical estimating procedures with
guaranteed accuracy.
n
It is
true that some writers (or book marketers) have appeared to offer such methods,
but no serious researchers are trying to achieve this.
n
Software
engineering researchers are really looking for an estimating method that is
somewhat accurate, most of the time.
Claims about estimating
n
Imagine
an estimating technique that is 80% right 80% of the time.
n
For the
other 20%, the estimate is worthless.
n
Everyone
in the software business would be leaping with joy!
n
Such average-case algorithms are well-studied in computer
science, and have been used for prime number tests and solving simultaneous
equations.
n
Lewis’s
mathematical arguments do not disallow such a solution, because of the
unbounded errors.
Claims about estimating
n
In
particular, Lewis claims that CMM promises guaranteed accuracy of software
estimates, and provides a quote from CMM to show this.
n
In
fact, CMM promises no such thing; the quote is taken out of context.
n
CMM
does predict better estimates on average the more closely one follow CMM
guidelines. (A claim which may or may not be true.)
n
Lewis
is not alone in this error; because CMM is so dominant, it is often perceived
to be more rigid than it is.
How should we estimate software
projects?
n
1.
Expert judgment based on experience with the project particulars. (Example from
Lotus NLM.)
n
2.
Iterative development, with iterative estimates (e.g. study, POC, prototype,
R1, R2, etc.)
n
Iterative
development accepts the fact that large, complex projects cannot be estimated
up front. Therefore don’t do large, complex projects all at once.
Questions ?