A site devoted to discussing techniques that promote quality and ethical practices in software development.

Thursday, August 20, 2009

Is the application of McCabe Complexity just another length reduction technique?

I was reading a book by Tom DeMarco [1] that makes some comments about McCabe Cyclomatic Complexity [2] which, I believe, indicate he has misread the theory and is showing a lack of knowledge in this area. He said this:
... But when you let it guide you to produce alternate designs with lower values, you almost always end up dividing the modules into smaller ones. Now that is a pretty sensible thing to do in general, but it is not really a complexity reduction technique. It's a length reduction technique. You could do the same thing with a much simpler metric, like lines of code: If a module has too many lines, divide it up. V(G) is, most of all, a length metric. Longer modules have higher V(G). If you factor out the hidden effect of length, V(G) is relatively meaningless.
Before I discuss the many incorrect assertions in the above paragraph, note that Tom DeMarco also made the following remark:
... Tom McCabe's Cyclomatic Complexity - also called V(G) for some reason now forgotten.
Well, sorry DeMarco, it has not been forgotten. For DeMarco's information, V(G) is standard notation in graph theory [3], where V(G) denotes the vertices of graph G. It is not a notation invented by McCabe. In fact, this apparent unfamiliarity with the notation, and with graph theory generally, may explain why DeMarco makes such a rash and incorrect observation; from these mistakes one can safely assume he has not used the McCabe complexity metric on even a toy project.

Let's consider his assertion: "It's a length reduction technique. You could do the same thing with a much simpler metric, like lines of code". Could you? What better way to start this debate than by stating Tom McCabe's expression for V(G) [2]:
V(G) = e - n + 2p
where
e = number of edges in the graph
n = number of vertices
p = number of connected components (p = 1 for a single routine).
(McCabe defines the cyclomatic number of a general graph as e - n + p; for a program's control-flow graph, which has a single entry and a single exit, he uses e - n + 2p, and that is the form that produces the values quoted below.)
A control-flow graph is not about the number of assignments, expressions, or function calls; it is made up of the branching and logical expressions that cause transitions from one block of code to another, hence the edges. Where are lines of code in the above expression, Tom DeMarco?
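
To see the formula in action, here is a quick worked example of my own (not from McCabe's paper [2]): the control-flow graph of a single if/else has four vertices, namely the condition, the two branch blocks, and the join point, and four edges linking them. With p = 1:

V(G) = e - n + 2p = 4 - 4 + 2 = 2

One decision gives a CC of 2. Only branching changes e relative to n; plain sequential statements do not.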

The following block of code gives some idea of how complexity and LOC relate:

public static Int32 f1( Int32 a, Int32 b, Int32 c, Int32 d, Int32 e )
{
    Int32 t = a;

    t = a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + 3;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    t = t + a + 2 * b + c + d;
    return t;
}
This contains 23 statements, thus LOC=23, but according to McCabe it yields only CC=1. It does not matter how complex or how simple the expression on the right-hand side of each assignment is; the function remains at CC=1. The statements could all be repeatedly initializing t to, say, 0, or they could be very complicated polynomial expressions of arbitrary length involving the function arguments: CC remains 1 while LOC can grow to a very large number.
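
To see why length cancels out of the formula: straight-line code like f1() is just a chain of basic blocks, n vertices joined by e = n - 1 edges, with p = 1, so

V(G) = (n - 1) - n + 2(1) = 1

no matter how many statements the chain contains. Every statement added increments e and n together, and the two increments cancel.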

No matter how Tom DeMarco slices and dices the above block, even down to functions of one line each (LOC=1), CC is still 1. This alone undermines his wild claim, as the sketch below shows.
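
Here is one illustrative way to divide f1() (my own sketch, not from the book; the helper name Step is hypothetical). Each resulting function is branch-free, so each one, measured on its own, still has CC=1:

public static Int32 Step( Int32 t, Int32 a, Int32 b, Int32 c, Int32 d )
{
    // One step of the repeated accumulation; no branches, CC = 1.
    return t + a + 2 * b + c + d;
}

public static Int32 f1Sliced( Int32 a, Int32 b, Int32 c, Int32 d, Int32 e )
{
    // f1 rebuilt from the helper; still branch-free, so CC remains 1
    // even though the LOC of each individual function has collapsed.
    Int32 t = a + 2 * b + c + d;
    t = Step( t, a, b, c, d );
    t = Step( t, a, b, c, d );
    // ... repeat for the remaining terms ...
    return t;
}

Dividing by lines shrinks the LOC per module, but it cannot manufacture or remove a single decision, and decisions are all V(G) counts.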

Now consider this block containing 4 statements, thus LOC=4:

public static Int32 f2( Int32 a, Int32 b, Int32 c, Int32 d, Int32 e )
{
    // A single compound predicate: seven atomic conditions
    // packed into one if statement.
    if( ( a == b && ( c != 5 || d != 6 ) ) ||
        ( c == e && ( e == 2 * a && c > b && d == 3 * a ) ) )
        return 23;
    else
        return 0;
}
This yields CC=9.

If Tom DeMarco's assertion that "You could do the same thing with a much simpler metric, like lines of code" were right, then the first block of code would be more complex than the second. Clearly that is not true! It does not take much brain power to scan through, say, 23 assignment statements, but working out what is going on in the second block takes a fair bit of comprehension.
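
For readers who want to reproduce such numbers, here is a minimal sketch of the common "decision points + 1" approximation of CC. This is my own simplification, not McCabe's graph construction, and real tools differ in exactly which constructs they count (else arms, ternaries, catch clauses, and so on), so their reported values for the same function can differ by one or two:

using System;
using System.Text.RegularExpressions;

public static class CcEstimator
{
    // Approximates cyclomatic complexity as 1 + the number of
    // decision points found in the source text. Branch keywords
    // and short-circuit operators each add one decision.
    public static Int32 Estimate( string code )
    {
        Int32 decisions = 0;
        decisions += Regex.Matches( code, @"\b(if|while|for|case)\b" ).Count;
        decisions += Regex.Matches( code, @"&&|\|\|" ).Count;
        return decisions + 1;
    }
}

Running such a counter over f2() tallies the if plus the six short-circuit operators; a graph-based tool that also counts each atomic condition or the else arm will land a little higher, which is why published figures for the same function vary by convention.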

In general, the technique for reducing CC is to make the code more readable to humans (the machine could not care less) by rewriting the logical or control expressions using meaningful names, so that instead of presenting a logical expression in its raw form, the names help the reader grasp what the expression means. It is akin to literate programming, and the same move both improves readability and reduces the CC.

To prove this point, and to show that CC and LOC are unrelated, let's refactor the function f2() above in accordance with this recommendation to produce f3():

public static Int32 f3( Int32 a, Int32 b, Int32 c, Int32 d, Int32 e )
{
    // Name the sub-expressions so the predicate reads as intent.
    bool danger = ( a == b && ( c != 5 || d != 6 ) );
    bool risky = ( e == 2 * a );
    bool makesLotOfMoney = ( c > b && d == 3 * a );

    if( danger ||
        ( c == e && ( risky && makesLotOfMoney ) ) )
        return 23;
    else
        return 0;
}
This function has LOC=7, an increase from 4, yet its CC falls from 9 to 6. By DeMarco's reasoning I have increased the complexity, since I increased the LOC, yet the measured complexity went down. This simple experiment directly contradicts his claim.

The above demonstration clearly shows the lack of any fixed relationship between LOC and CC. As a result, we can safely discard DeMarco's assertion as being without foundation.

I have great respect for many of his other writings, which is why I am shocked to find so many false statements in just one paragraph of this book [1].



[1] "Why does software cost so much? And other puzzles of the information age" Tom DeMarco, Dorset House Publishing 1995

[2] "A Complexity Measure" by Thomas J. McCabe, IEEE Trans. Software Engineering, Vol SE-2, No. 4, Dec 1976, pp 308-320.

[3] "Discrete Mathematics, 2nd Edition", K.A. Ross and C.R.B. Wright, Chapter 8, Prentice-Hall International Editions, 1988
