Do The Right Things: August 2009

Thursday, August 20, 2009

Is application of McCabe Complexity just another length reduction technique?

I was reading a book by Tom DeMarco [1] in which he makes some comments about McCabe's Cyclomatic Complexity [2] that, I believe, show he has misread the theory and reveal his lack of knowledge in this area. He wrote:

... But when you let it guide you to produce alternate designs with lower values, you almost always end up dividing the modules into smaller ones. Now that is a pretty sensible thing to do in general, but it is not really a complexity reduction technique. It's a length reduction technique. You could do the same thing with a much simple metric, like line of code: If a module has too many lines divide it up. V(G) is, most of all, a length metric. Longer modules have higher V(G). If you factor out the hidden effect of length, V(G) is relatively meaningless.

Before I discuss the many incorrect assertions in the above paragraph, note that Tom DeMarco also made the following remark:

... Tom McCabe's Cyclomatic Complexity - also called V(G) for some reason now forgotten.

Well, sorry, DeMarco, it has not been forgotten. For DeMarco's information, V(G) follows the standard graph-theory convention [3] of writing a quantity that belongs to a graph G as a function of G; McCabe's V(G) is the cyclomatic number of the program's control-flow graph G. It is not a notation invented by McCabe. Possibly a lack of familiarity with this notation and with graph theory explains why DeMarco makes such a rash and incorrect observation, and from these mistakes one can safely assume he has not used the McCabe Complexity metric even in toy projects.

Let's consider his assertion: "It's a length reduction technique. You could do the same thing with a much simple metric, like line of code". Could you? What better way to start this debate than by stating Tom McCabe's expression for V(G) [2]:

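V(G) = e - n + p
where
e = number of edges in the graph
n = number of vertices (nodes)
p = number of connected components.

McCabe applies this to a program's control-flow graph after adding, for each component, a virtual edge from its exit node back to its entry node, which makes the graph strongly connected; measured on the unmodified control-flow graph, the same quantity is e - n + 2p.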

A control-flow graph is not about the number of assignments, lines of expressions or function calls; its shape is determined by the branches and logical expressions that cause transitions from one block of code to another, and those transitions are the edges. Where are lines of code in the above expression, Tom DeMarco?
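To make the formula concrete, here is a minimal C# sketch that computes V(G) = e - n + p for a control-flow graph given as numbered nodes and directed edges. The class name Cyclomatic and the edge-list representation are my own, hypothetical choices, not McCabe's. Following McCabe's paper, it first adds an edge from the exit node back to the entry node so the graph is strongly connected, and it counts p with a simple union-find that ignores edge direction. Note that nothing in the calculation looks at how many statements a node contains.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: computes McCabe's v(G) = e - n + p for a control-flow graph.
// Nodes are numbered 0..n-1; each node stands for a basic block of any length.
public static class Cyclomatic
{
    // Counts connected components (ignoring edge direction) with a simple union-find.
    static int Components(int n, List<(int From, int To)> edges)
    {
        var parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;

        int Find(int x)
        {
            while (parent[x] != x) x = parent[x];
            return x;
        }

        int p = n;
        foreach (var (src, dst) in edges)
        {
            int rootSrc = Find(src), rootDst = Find(dst);
            if (rootSrc != rootDst)
            {
                parent[rootSrc] = rootDst;
                p--; // two components merged into one
            }
        }
        return p;
    }

    // Adds the virtual exit -> entry edge McCabe uses to make the graph
    // strongly connected, then applies v(G) = e - n + p.
    public static int V(int n, List<(int From, int To)> edges, int entry, int exit)
    {
        var g = new List<(int From, int To)>(edges) { (exit, entry) };
        return g.Count - n + Components(n, g);
    }
}
```

For f1 the whole body is a single basic block, so `Cyclomatic.V(1, <empty edge list>, 0, 0)` returns 1 no matter how many assignments that block holds, while a single if/else (node 0 = test, 1 = then, 2 = else, 3 = join, with edges 0→1, 0→2, 1→3, 2→3, entry 0, exit 3) returns 2.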

The following block of code illustrates the difference between complexity and LOC:


public static Int32 f1( Int32 a, Int32 b, Int32 c, Int32 d, Int32 e )
{
Int32 t = a;

t = a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + 3;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
t = t + a + 2 * b + c + d;
return t;
}

This function contains 23 statements, thus LOC=23, but according to McCabe it yields only CC=1. No matter how complex or how simple the expression on the right-hand side of each assignment is, CC remains 1. The statements could all repeatedly initialize t to, say, 0, or each could be a very complicated polynomial expression of any length involving the function arguments; CC remains 1 while LOC can grow to a very large number.

No matter how Tom DeMarco slices and dices the above block, even down to one-line functions with LOC=1, CC is still 1 for each of them. This invalidates his wild claim.
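The converse also holds: LOC can fall while CC rises. As a hypothetical illustration (the names LoopVersion and F1Loop are mine), f1 can be collapsed into a loop. Because the third assignment in f1 adds 3 instead of d, the sketch corrects for that after the loop. The body shrinks to a handful of lines, yet the loop test is a decision point, so by McCabe's count CC goes from 1 to 2.

```csharp
using System;

public static class LoopVersion
{
    // Hypothetical rewrite of f1 that returns the same value in far fewer lines.
    // f1 sums 21 copies of (a + 2 * b + c + d), except that its third assignment
    // uses 3 in place of d. Parameter e is unused, as in f1.
    public static Int32 F1Loop(Int32 a, Int32 b, Int32 c, Int32 d, Int32 e)
    {
        Int32 t = 0;
        for (int i = 0; i < 21; i++)   // the loop test is a decision point: CC = 2
            t += a + 2 * b + c + d;
        return t - d + 3;              // swap one d for the 3 used in f1
    }
}
```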

Now consider this block containing 4 statements, thus LOC=4:


public static Int32 f2( Int32 a, Int32 b, Int32 c, Int32 d, Int32 e )
{
if( (a == b && (c != 5 || d != 6 ) )  ||
    (c == e && (e == 2 * a && c > b && d  == 3 * a ) ) )
    return 23;
else
    return 0;
}

This function yields CC=9.

If Tom DeMarco's assertion that "You could do the same thing with a much simple metric, like line of code" were right, then the first block of code would be more complex than the second. Clearly that is not true! It does not take much brain power to scan through, say, 23 assignment statements, but working out what the second block is doing takes a bit of comprehension.

In general, the technique to reduce CC is to make the code more readable to humans (the machine couldn't care less) by rewriting the logical or control expressions using meaningful function names. Instead of stating a logical expression in its raw form, the names help the reader understand what the expression means. It is somewhat like literate programming. Or one could use this to make code more readable and to reduce CC.
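Here is a minimal sketch of that idea applied to f2, using small named predicate methods instead of a raw expression. The names IsDanger, IsRisky and MakesLotOfMoney (echoing the variable names in f3), the class NamedPredicates and the method f2Named are hypothetical; f2 is repeated unchanged so the two versions can be compared. Each named method carries part of the decision logic, so the condition left in f2Named is shorter and reads closer to plain English, while the result is the same for every input.

```csharp
using System;

public static class NamedPredicates
{
    // f2 from the post, unchanged, kept here for comparison.
    public static Int32 f2(Int32 a, Int32 b, Int32 c, Int32 d, Int32 e)
    {
        if ((a == b && (c != 5 || d != 6)) ||
            (c == e && (e == 2 * a && c > b && d == 3 * a)))
            return 23;
        else
            return 0;
    }

    // Hypothetical predicate names: each one states what its condition means.
    static bool IsDanger(Int32 a, Int32 b, Int32 c, Int32 d) => a == b && (c != 5 || d != 6);
    static bool IsRisky(Int32 a, Int32 e) => e == 2 * a;
    static bool MakesLotOfMoney(Int32 a, Int32 b, Int32 c, Int32 d) => c > b && d == 3 * a;

    // Same logic as f2, but the condition now reads almost like a sentence.
    public static Int32 f2Named(Int32 a, Int32 b, Int32 c, Int32 d, Int32 e)
    {
        if (IsDanger(a, b, c, d) ||
            (c == e && IsRisky(a, e) && MakesLotOfMoney(a, b, c, d)))
            return 23;
        return 0;
    }
}
```

For example, both f2(1, 0, 2, 3, 2) and f2Named(1, 0, 2, 3, 2) return 23 through the second branch of the condition.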

To prove this point and to show that CC and LOC are unrelated, let's refactor the function f2() above in accordance with the recommendation to produce f3() as follows:


public static Int32 f3( Int32 a, Int32 b, Int32 c, Int32 d, Int32 e )
{
  bool danger = ( a == b && ( c != 5 || d != 6 ) );
  bool risky = (e == 2 * a);
  bool makesLotOfMoney = (c > b && d == 3 * a);
  if ( danger ||
      ( c == e && ( risky && makesLotOfMoney ) ) )
          return 23;
  else
      return 0;
}

This function has LOC=7, an increase from 4, yet its CC drops from 9 to 6. According to DeMarco's claim, I have in fact increased the complexity (read: LOC). This simple experiment clearly refutes DeMarco's unsubstantiated wild claims.

The above demonstration clearly shows the lack of relationship between LOC and CC. As a result, we can safely discard DeMarco's assertion as being without foundation.

I have great respect for many of his other writings and therefore I am shocked by so many false statements in just one paragraph of this book [1].

[1] Tom DeMarco, "Why Does Software Cost So Much? And Other Puzzles of the Information Age", Dorset House Publishing, 1995.

[2] Thomas J. McCabe, "A Complexity Measure", IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, Dec. 1976, pp. 308-320.

[3] K.A. Ross and C.R.B. Wright, "Discrete Mathematics", 2nd Edition, Chapter 8, Prentice-Hall International Editions, 1988.

Monday, August 17, 2009

How to register a TLB?

During one conversation, the topic of how to register a TLB in COM deployment came up. In the past I used a little-publicized program called RegTLIB.exe; from memory, it was installed on my machine as part of a Word upgrade, a Service Pack or something like that.

Since then, I have lost this program. It is rare to distribute a bare TLB; normally it is better to embed it into a DLL, a kind of resource-only DLL, and leverage the packaged registration script so that users can register the TLB with the well-known system tool Regsvr32.exe.

You can still distribute the TLB for development, but you do not need to register it unless you use the importlib statement in your COM IDL to import a foreign type library. There is a caveat that distinguishes importlib from the #import statement when developing your COM IDL, and it is this:

Note that the imported library, as well as the generated library, must be distributed with the application so that it is available at run time.

The emphasis is on the last four words: available at run time. Failing to do that will cause a run-time error. If you are not sure, use OleView to examine your DLL or TLB to see whether it uses importlib and how.

Anyway, if you have used importlib to import a standalone type library rather than a DLL containing the type library, you have to register that type library using a tool that is not commonly available. Hence, if you are developing a COM component, even one that only publishes common interfaces, put the TLB inside the DLL and make life easy for everyone. While importlib of a DLL does not require you to register the imported type library separately, you still have to register the imported DLL using Regsvr32. The difference is basically which tool you use to make the type library available at run time: RegTlib or Regsvr32.

In situations where you must register the type library, .NET Framework 2.0 offers a lifeline. Go to "%windir%\Microsoft.net\framework\v2.0.50727" and you will find a program called RegTLibv12.exe. Do not expect to find much information on how to drive this program: the MSDN site has nothing on it, even though it has been mentioned several times in Aaron Stebner's blog.

Experimenting with this program tells me this:
* To register MyComStuff.tlb do this:
RegTlibV12 MyComStuff.tlb
* To unregister MyComStuff.tlb do this:
RegTlibV12 -u MyComStuff.tlb
* This program, while distributed with .NET Framework 2.0, does not depend on any supporting DLL in the "%windir%\Microsoft.net\framework\v2.0.50727" directory, so you can safely use it on a machine without .NET 2.0. Besides, it is a native-code program.
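If neither tool is at hand, the same job can be sketched in C# by calling the documented oleaut32 functions LoadTypeLibEx (with REGKIND_REGISTER) and UnRegisterTypeLib through P/Invoke. This is a hypothetical stand-in, not a description of how RegTLibv12.exe works internally, which I do not know; the class name TlbReg and its -u switch (mirroring the RegTlibV12 usage above) are my own choices. It runs only on Windows, and machine-wide registration typically needs administrator rights because it writes to the registry.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.ComTypes;

// Hypothetical minimal RegTlib-like helper (Windows only).
public static class TlbReg
{
    enum RegKind { Default = 0, Register = 1, None = 2 } // REGKIND values from oleauto.h

    [DllImport("oleaut32.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
    static extern int LoadTypeLibEx(string szFile, RegKind regKind, out ITypeLib typeLib);

    [DllImport("oleaut32.dll", ExactSpelling = true)]
    static extern int UnRegisterTypeLib(ref Guid libId, short verMajor, short verMinor,
                                        int lcid, SYSKIND sysKind);

    // Parses "[-u] file.tlb"; returns null when the arguments are not understood.
    public static (bool Unregister, string Path)? ParseArgs(string[] args)
    {
        if (args.Length == 1) return (false, args[0]);
        if (args.Length == 2 && args[0] == "-u") return (true, args[1]);
        return null;
    }

    // Registers (or unregisters) the type library at 'path'; throws on failure.
    public static void Run(bool unregister, string path)
    {
        string fullPath = Path.GetFullPath(path); // the registry records this path

        if (!unregister)
        {
            // REGKIND_REGISTER: load the library and register it in one call.
            Marshal.ThrowExceptionForHR(LoadTypeLibEx(fullPath, RegKind.Register, out _));
            return;
        }

        // Unregistering needs the library's GUID, version, LCID and SYSKIND.
        Marshal.ThrowExceptionForHR(LoadTypeLibEx(fullPath, RegKind.None, out ITypeLib lib));
        lib.GetLibAttr(out IntPtr attrPtr);
        try
        {
            var attr = Marshal.PtrToStructure<TYPELIBATTR>(attrPtr);
            Marshal.ThrowExceptionForHR(UnRegisterTypeLib(ref attr.guid, attr.wMajorVerNum,
                                                          attr.wMinorVerNum, attr.lcid, attr.syskind));
        }
        finally
        {
            lib.ReleaseTLibAttr(attrPtr);
        }
    }
}
```

A Main method could pass `TlbReg.ParseArgs(args)` on to `TlbReg.Run`, so that `TlbReg.Run(false, "MyComStuff.tlb")` registers the library and `TlbReg.Run(true, "MyComStuff.tlb")` unregisters it.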

Monday, August 10, 2009

WSJ trying to annoy readers the dumb way.

I have been collecting the RSS feed from my WSJ subscription at the following location:
http://feeds.wsjonline.com/wsj/xml/rss/3_7013.xml

Seemingly at random, WSJ returns articles in a way that really annoys its readers. For example, this article appears with black over most of the text, like this:

Don't despair; WSJ is not all that smart. To read it, simply press Ctrl-A to select all the text, and there you are:

If you do not want to use Ctrl-A, simply delete "public/b" from the URL and you can read the article.

I have no idea why WSJ does this kind of stupid thing just to annoy people. Could it be that Rupert Murdoch, the owner of WSJ, is experimenting with different ways to annoy people into paying for his online information?

Tuesday, August 4, 2009

Do software systems age analogously to humans?

Having worked in the software industry for a long time, I find this question, posed in the introduction to a paper by the noted and respected computer scientist David Parnas in a book of his collected papers, rather obvious, as no doubt do many practitioners. However, there are software companies that still hold to the "design once, last forever" philosophy and believe software need not change at all. Unfortunately, they are also pursuers of the elusive Silver Bullet that another noted and respected scientist, Frederick Brooks, has warned about. Naturally, they do not believe software can age.

Having worked in such a nescient organisation, I was interested to read what Parnas had to say when I came across his paper "Software Aging". There are specifics that I believe are worth mentioning here.

One good way to slow down aging, or at least to deal with change, is to design for change using well-known design principles. Sadly, from his research and experience, "I do not see much software that is well designed from this point of view", and he gave some of the reasons why:

... The principle is simple; applying it properly requires a lot of thoughts about the application and the environment. The textbooks do not make that clear.

Many programmers are impatient with such considerations; they are so eager to get the first version working or to meet some imminent deadline, that they do not take the time to design for change. Management is so concerned with the next deadline (and so eager to get to a higher position) that future maintenance costs don't have top priority.

Designs that result from a careful application of information hiding are quite different from the "natural" designs that are the result of most programmers' intuitive work.....

Programmers tend to confuse design principles with languages.... Even worse, they think that one has applied the techniques, if one has used such a language. [Sadly, in my professional experience many managers are the big champions of this (mis)belief creating massive software problems and wastes.]

Many people who are doing software development, do not have an education appropriate to the job.

The bracketed text is my comment.

One topic he commented on, and one I feel passionately about, is reviews. Reviews are often not done in the software industry, in contrast to other fields such as other forms of engineering and medicine, despite an abundance of well-publicized material supporting their effectiveness. Robert Glass even proclaims that it is "being a breakthrough as anything we have". Parnas said:

it is quite astonishing to see how often commercial programs are produced without adequate reviews. There are many reasons for this:
Many programmers have no professional training in software at all. Some are engineers from other fields; some are "fallen scientists" who learned programming incidentally while getting education... In many of those areas, the concept of preparing and holding a design review is nonexistent.
Even among those that have Computer Science degrees many have had an education that neglected such professional concerns as the need for design documentation and reviews....
Many practitioners (and many researchers) do not know how to provide readable precise documentation of a design, as distinct from an implementation. No precise description, other than the detailed code, is available for review. Design reviews early in a project, when they would do the most good, are often reduced to chat sessions....
Much of software is often produced as a cottage industry, where there are no people who could serve as a qualified reviewers ....
Software is often produced under time pressure that misleads the designers into thinking that they have no time for proper reviews.
Many programmers regard programming as an "art" and resent the idea that anyone could or should review the work that they have done.

I also share Parnas' concern that "it is common to hear a programmer saying that the code is its own documentation; even highly respected language researchers take this position, arguing that if you use their language, the structure will be explicit and obvious".

My argument with proponents of this view is that code does not document a design; it merely documents an implementation of design decisions. There is plenty of material a designer has discarded that is vitally important to maintaining the implementation but is not, and cannot be, translated into code. For example, the technologies or algorithms a designer considered but discarded, and the reasons why, are not found in code and, for practical reasons, cannot be. The reasons why something is done one way and not another are not found in code either; in fact, without the supplemental information we call documentation, one could be misled by reading the code. These reasons cannot be deduced from code no matter what language one uses or how cleverly one uses its constructs to aid documentation.

Parnas went on to say, "Even if we take all reasonable preventive measures, and do so religiously, aging is inevitable", and in that paper he offers much practical advice for dealing with software aging, including "start taking documentation more seriously".

Daniel M. Hoffman and David M. Weiss, "Software Fundamentals: Collected Papers by David L. Parnas", 2001, Chapter 29, "Software Aging".

Robert L. Glass, "Facts and Fallacies of Software Engineering", page 104

Sunday, August 2, 2009

Customers/Consumers of electronic services always considered last

A recent report of the way Amazon resolved what was essentially a business transaction with its supplier simply highlights the power that suppliers of electronic services hold over customers and consumers, a kind of unfettered power that is not permitted in other forms of service provision.

Why should something rendered as 1s and 0s be treated any differently from something tangible? If it were tangible, Amazon would be prosecuted for breaking and entering and theft. But when it is delivered as 1s and 0s, it escapes scot-free. Why is this allowed, and why does the difference exist? Apple is a master of this kind of practice.

This kind of attitude, that it is better to get the customers to foot the bill for the provider's problems or mistakes, is widespread in the software world, and it is time licenses were written to provide a more level playing field; subjecting commercial software to public scrutiny would be a start.
