Can We Estimate Software Development Time?

Chuck Connell

Background

n This talk is a response to a paper by J.P. Lewis titled Large Limits to Software Estimation in ACM Software Engineering Notes, July 2001.

n Lewis’s paper generated some good discussion within the software engineering community and on Slashdot.org.

n My article on this topic appeared on Developer.com and Slashdot in January 2002.

Outline

n Motivation for the topic

n Summary of Lewis’s paper

n My critique of Lewis’s paper

Why estimating matters

n Estimating the size of future projects is one of the most important areas of software engineering. (Size = $$, person-days, calendar days, opportunity cost.)

n Estimates are crucial to go/no-go decisions on every project. (Example: $1M project for this group.)

n Questions about the ability of SE to provide meaningful estimates call into question all plans for future projects. (And all current projects.)

SE status within CS

n Software engineering’s ability to estimate is part of a broader discussion about the overall validity of SE.

n If SE cannot make good estimates about future projects, what is the whole field doing?

n SE is frequently derided by computer scientists as unscientific, non-rigorous, or “just programmer sociology”.

n I address this broad question in Why Software Engineering Is Not B.S., inspired by a CS professor’s remark.

Lewis: what software is

n The behavior of any program can be thought of as a table of inputs and outputs.

n There is one row for each possible set of inputs. (Example: 0-3 x 0-3)

n Assumes finite number of finite inputs.

n Assumes we can represent human interaction as another finite/finite input.

n Assumes finite/finite output for each set of inputs.

n Above are reasonable assumptions for practical purposes.

Lewis: what software is

n Real software would have a very large table.

n Something like:

u 2³² raised to # of numeric inputs, times

u 2⁸ raised to # of text input chars

n But the table is finite and mechanically constructible.

Lewis: software spec as a string

n A software description is a finite table.

n Any such table can be transformed into a string.

n The string represents what we want the software to do.

n Example: using previous

n So a software specification is a finite string.

Lewis: estimating software tasks

n Estimating a software task is the same as asking, “How hard is it to write a program to create the spec string?”

n (We assume a small driver to call program with every input.)

n Lewis claims that writing a smaller program is easier/faster than writing a larger program.

n So, Lewis asserts that estimating is equivalent to asking: “What is the smallest program (in bytes) to create spec string?”

Lewis: complexity of estimating

n What is the smallest program to create spec string?

n By definition, this is identical to asking for the algorithmic complexity of the string (related to Kolmogorov complexity).

n But, there is a well-known result from complexity theory: There is no program to compute the algorithmic complexity of arbitrary strings.

n Therefore, there is no rigorous method for estimating software development!

n Lewis extends this result to estimates with a given percentage accuracy.

Summary of Lewis’s paper

n Software specs are tables / strings.

n It is faster to write small programs than large programs.

n Software estimates seek the algorithmic complexity of the specification string. (What is the smallest program to print out the spec table?)

n But there is no method for finding the algorithmic complexity of arbitrary strings.

n Therefore, rigorous/mechanical software estimates are not possible, even allowing % error.

My critique

n Lewis is “correct”, given the definitions he uses and the technical results he obtains, but…

n He mischaracterizes the programming process.

n He fails to show a relationship between algorithmic complexity and software development times.

n He mischaracterizes promises made by well-known development frameworks, especially the Software Capability Maturity Model (CMM).

Programming process

n Do programmers try to write the shortest possible program?

n Maybe, sometimes.

n But modern software engineering teaches that other aspects of software are more important: readability, modifiability, scalability, speed…

n The shortest program may not have these qualities.

n So programmers may not be trying to do something directly related to the AC of the spec table.

Estimation problem

n Assume Lewis’s description of software algorithmic complexity (AC) is well-defined…

n Does it take longer for people to write a program for a high AC task than for a lower AC task?

n The answer to the above must be YES, or AC is unrelated to the software estimation problem.

n But we don’t know the answer to this question, a priori.

Estimation problem

n Does it take longer for people to write a program for a high AC task than for a lower AC task?

n This is an empirical question for psychology.

n Maybe there are tricky low-AC tasks that are hard (long) to program…

n Maybe there are high AC tasks that a human can grasp quickly…

n So programming estimates may not be directly related to differences in AC.

Claims about estimating

n Lewis attacks a straw man: objective, mechanical estimating procedures with guaranteed accuracy.

n It is true that some writers (or book marketers) have appeared to offer such methods, but no serious researchers are trying to achieve this.

n Software engineering researchers are really looking for an estimating method that is somewhat accurate, most of the time.

Claims about estimating

n Imagine an estimating technique that is 80% right 80% of the time.

n For the other 20%, the estimate is worthless.

n Everyone in the software business would be leaping with joy!

n Such average-case algorithms are well-studied in computer science, and have been used for prime number tests and solving simultaneous equations.

n Lewis’s mathematical arguments do not disallow such a solution, because of the unbounded errors.

Claims about estimating

n In particular, Lewis claims that CMM promises guaranteed accuracy of software estimates, and provides a quote from CMM to show this.

n In fact, CMM promises no such thing; the quote is taken out of context.

n CMM does predict better estimates on average the more closely one follow CMM guidelines. (A claim which may or may not be true.)

n Lewis is not alone in this error; because CMM is so dominant, it is often perceived to be more rigid than it is.

How should we estimate software projects?

n 1. Expert judgment based on experience with the project particulars. (Example from Lotus NLM.)

n 2. Iterative development, with iterative estimates (e.g. study, POC, prototype, R1, R2, etc.)

n Iterative development accepts the fact that large, complex projects cannot be estimated up front. Therefore don’t do large, complex projects all at once.

n See Rapid Development by Steve McConnell.

Questions ?