A site devoted to discussing techniques that promote quality and ethical practices in software development.

Monday, March 31, 2008

Some worrying experience with TrueCrypt 5.1

With a recommendation from Bruce Schneier, how could I not give this open-source on-the-fly encryption tool a try?

In summary, my confidence in this tool has been severely dented. The problem is not its encryption capability but its operation:
  1. When it goes bad, it appears to lock up the machine, which then becomes uncooperative. In the more than two years I have had my XP Tablet PC (512M, running XP Pro with minimal AV running on demand, and set up to run under LUA), I cannot remember having to force a reboot so frequently until I started testing TrueCrypt. It has almost worn out the reset button on my tablet.
  2. Often it is not a lock-up at all; for some unknown reason it takes a long time (~4 minutes) for the machine to come out of a comatose state. There is no sign that it will ever come out of it. All one can do is wait and pray.
  3. When the file container is on a network share, copying files out of it produces some very worrying results. The only way to get the PC back to normal is to reboot it.
  4. It is unreliable when the machine wakes from hibernation and TrueCrypt.exe is run again. This leads to cases where it cannot reconnect to the device driver in Traveler mode. It does not happen every time, and the pattern leading to it has not yet been identified.
Since it is a tool designed to protect the most precious information, it should function as reliably as the NTFS file system driver or even the network driver. It should be rock solid and must not misbehave as it does in the scenarios I describe below.

Not knowing when it will fail, when I will need to restart my machine because it cannot deal with waking from hibernation, or when it will lock up my machine has dented my confidence in this tool. The unknown is killing the confidence.

Mounting File Container from Network Share.

In this scenario, two PCs were involved. The remote machine, running XP Pro, held the file container, and a client machine, also running XP Pro, was used to mount it. There was never any chance of multiple clients mounting the same container. The exercise was to copy the entire contents, a directory tree full of PDF and chm files, from this container to a directory on the remote machine.

No file in the container was large (~24M each), but there were many of them. The XCopy command was used rather than Windows Explorer.

After almost 80% of the container (2G) had been copied, the client machine suddenly reported Delayed Write Failed, with Event IDs 26 and 50. After that, the network share was inaccessible by name, though I could still connect to it by IP address. The remote machine had not gone down at all, as I could use RDP (Remote Desktop) to access it.

What was strange was that Windows Explorer on the client machine showed the remote directory holding the container as compressed, and the file container itself as missing.

The RDP session, however, confirmed otherwise. After the client PC was rebooted, the directory on the remote machine and the container file in it were both shown as present and not compressed.

What had gone wrong? Why did the Delayed Write Failure cause the client machine to misbehave so badly?

Using it in Traveler Mode.

On my tablet PC, there were occasions when I tried to restart TrueCrypt.exe after the machine came out of hibernation, only to be confronted by a series of dialog boxes.
The report from System Information confirmed that TrueCrypt.sys had been stopped but was still hanging around.
This forced me to restart the machine, defeating the purpose of hibernation.

I had been diligent in ensuring that all mounted volumes were dismounted and TrueCrypt.exe was terminated properly before putting my machine into hibernation. What was causing this problem? Was it because the machine was running under LUA? But then again, why had earlier wake-ups from hibernation not caused this problem?

The disturbing fact was that the machine had gone into and out of hibernation a number of times before encountering this problem. So far there did not appear to be any pattern to what caused it, and this unpredictable behavior was far worse than outright failure.

Lock up but technically not a lock up, if you wait long enough.

There were operations with TrueCrypt.exe, or interactions with the mounted drives, that caused the Tablet PC (the desktop XP machine did not seem to be affected) to appear to lock up. The parts of the tablet's XP that seemed to be affected were:
  • Windows Explorer, which became unusable; even the File Open dialog box was affected when invoked from another program, such as MSPaint.
  • Process Explorer, which appeared unable to kill a process.
  • Command-line operations, which were unhindered unless they queried a mounted drive's directory information.
Initially, I thought my machine had frozen, as I could not even kill the TrueCrypt process and had to cold boot the machine. Later, after waiting patiently for a few minutes (~3-5 minutes), I found that the locked state was finally released.

During that time, the CPU was just ticking over with plenty of reserve around and memory was not in demand. So what was happening? I suspected the TrueCrypt driver was the culprit.

Why was the problem so severe that I could do nothing but wait, and for how long? Everything seemed to be held back until TrueCrypt came out of its comatose state. There was no feedback to the user whatsoever. Often I did not start Windows Explorer at all, relying solely on the command prompt/PowerShell until all volumes were mounted. Still the lock-up affected the desktop and the Start button. Terrible.

At other times, when I either mounted or dismounted a volume, it took an inordinately long time (~3-4 minutes) before TrueCrypt came out of the operation, so long that the XP ghost window came into effect. This meant that TrueCrypt was blocked in a method call and unable to reach the GetMessage() call to process its window messages. Sometimes it was quick to mount but took terribly long to make the drive accessible. Initially I thought TrueCrypt and/or my machine had frozen, but patience was the key ingredient in getting it working. This is not good enough! I can mount a USB drive in seconds, reliably, every time. All my file containers were on my hard drive, so the response should have been better than that of a USB drive.
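To make the ghost-window symptom concrete, here is a toy model in Python. It is not TrueCrypt's code and does not use the Win32 API; slow_mount and pump are hypothetical names invented for this sketch. A single loop stands in for the GetMessage() loop: if a handler blocks, every later message (such as a repaint) waits behind it, whereas handing the slow call to a worker thread lets the loop keep servicing messages.

```python
import queue
import threading
import time


def slow_mount(log):
    """Stand-in for a mount call that blocks for a while (hypothetical)."""
    time.sleep(0.2)
    log.append("mount done")


def pump(messages, use_worker):
    """Toy message loop: handle each queued message in order."""
    log = []
    workers = []
    inbox = queue.Queue()
    for message in messages:
        inbox.put(message)

    while not inbox.empty():
        message = inbox.get()
        if message == "mount":
            if use_worker:
                # Hand the slow call to a worker so the loop stays responsive.
                worker = threading.Thread(target=slow_mount, args=(log,))
                worker.start()
                workers.append(worker)
            else:
                # Blocking inline: nothing else is handled until this returns.
                slow_mount(log)
        elif message == "paint":
            log.append("paint")

    for worker in workers:
        worker.join()
    return log


blocking_order = pump(["mount", "paint"], use_worker=False)
worker_order = pump(["mount", "paint"], use_worker=True)
print(blocking_order)
print(worker_order)
```

In the blocking run the repaint is only handled once the mount has finished, which is the behavior Windows reports as a ghost window; in the worker run the repaint is handled first.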

At the moment, I am using TrueCrypt only for testing, until my confidence is restored and its behavior becomes more predictable.

Tuesday, March 25, 2008

Get WSJ articles for Free

The other day, I was shown a technique for getting WSJ articles for nothing. Apparently WSJ releases some articles so that Google can present them in its News section.

There are several issues with this technique:
  1. Some links on www.wsj.com are accessible only to subscribers, and if you can't open those pages, you can neither see the titles of the articles nor bring up any portion of them. For example, the link for 'US Business' is accessible only to subscribers.
  2. The technique seems to be back to front, under-utilizing the power of Google's search engine.
Surely there is a better way, and indeed there is:
  1. Open your browser and go to www.google.com
  2. Then click on the 'News' link.
  3. On the News page, click on 'Advanced News Search'
  4. On the Advanced News Search options page, in the edit box for 'Return only articles from the news source named', type 'Wall Street Journal'
  5. Then press the Google Search button and you'll be presented with a list of WSJ free articles.
This is using the power of Google Search! If you are having trouble following those instructions, just click here for the list of articles.

As a paying subscriber to WSJ, I can compare the articles returned by Google with what is available to subscribers only. I can tell you that what is released for free is a small part, and that a paid subscription gives access to many more facilities and much more data.

Still, the availability of some free articles makes it easier to share information with others without violating copyright or subscription conditions.

Tuesday, March 18, 2008

CodeGear Delphi 2006.Net's TRegistry fails in Framework 2 SP1

An error has been detected when a Delphi 2006.Net program that uses TRegistry.ReadString is promoted to run on .Net Framework 2.0 SP1.

The error results from a coding error in Borland's VCL.Net library. It manifests itself as data corruption now that Microsoft has tightened the compliance rules in Framework 2 SP1 to conform to Unicode 5.

This article pinpoints the exact cause in Borland's code. It is a very common coding error that was not picked up in code review, and earlier versions of the .Net Framework chose to ignore the mistake, thus masking the coding error.

The VCL bug causes TRegistry.ReadString() to return a string with an additional Unicode character of value 0xFFFD appended to the end. This is Unicode's standard replacement character, substituted whenever the decoder detects an invalid Unicode character. Substituting this character is the default action in the .Net Framework.

It is worth noting that Microsoft.Win32.RegistryKey.GetValue() for REG_SZ data does not produce this error and is not affected by the installation of Framework 2 SP1.

Let's begin the code review from TRegistry.ReadString(), which can be found in Borland.Vcl.Registry.Pas line 546.
function TRegistry.ReadString(const Name: string): string;
var
  Len: Integer;
  RegData: TRegDataType;
  Buffer: TBytes;
begin
  Len := GetDataSize(Name);
  if Len > 0 then
  begin
    SetLength(Buffer, Len);
    GetData(Name, Buffer, Len, RegData);
    if (RegData = rdString) or (RegData = rdExpandString) then
    begin
      SetLength(Buffer, Len - 1); // <<--- Line(A) - The mistake.
      // ....
end;
The coding error is in the SetLength() call marked Line(A). To understand why this is a mistake, we need to refer to the PInvoke declaration for the registry access function RegQueryValueEx(), which is the cornerstone of GetDataSize() and GetData().

The declaration can be found in Borland.Vcl.Windows.Pas, line 21,265 and is reproduced in part here:
[SuppressUnmanagedCodeSecurity, DllImport(advapi32, CharSet = CharSet.Auto,
  SetLastError = True, EntryPoint = 'RegQueryValueEx')]
function RegQueryValueEx(hKey: HKEY; lpValueName: string;
  lpReserved: IntPtr; ..... ): Longint; external;
According to the MSDN documentation for CharSet.Auto, on NT-based Windows such as XP this declaration causes all strings to be marshaled as 2-byte Unicode strings and the RegQueryValueExW variant of the RegQueryValueEx function to be called.

According to the documentation for RegQueryValueEx(), the data returned by RegQueryValueExW() for type REG_SZ is a 2-byte Unicode string, and the sixth parameter (lpcbData) receives the size of that data:
If the data has the REG_SZ, REG_MULTI_SZ or REG_EXPAND_SZ type, this size includes any terminating null character or characters unless the data was stored without them.
It is also worth noting that the unit of this parameter is bytes, not characters. Therefore, for a 2-byte Unicode string, this value is always even.

Now return to Line(A) above. Buffer is of type TBytes, an array of bytes, so subtracting 1 from its even length produces an odd number of bytes. The result is a nonsensical UTF-16 string, which is expected to consist of an even number of bytes. Instead of ending with two zero bytes, the UTF-16 null terminator, the buffer now ends with a single odd zero byte, which is clearly not a valid UTF-16 character.

According to the knowledge base article:
the trailing NULL byte was removed. However, now the NULL byte is converted to the Unicode replacement character.
As a result, a string returned from TRegistry.ReadString() that used to be, for example, "C:\Program Files" now becomes "C:\Program Files\xFFFD", which appears as "C:\Program Files�".
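The odd-byte arithmetic can be reproduced outside Delphi. The sketch below uses Python's UTF-16 decoder only as a stand-in for the .Net one, so it illustrates the mechanism rather than the Framework's exact behavior. The helper encode_reg_sz is hypothetical, and trimming two bytes is my assumption of what a correct trim looks like, not necessarily how Borland fixed it.

```python
def encode_reg_sz(text):
    """Bytes shaped like REG_SZ data from RegQueryValueExW:
    UTF-16LE characters followed by a 2-byte null terminator."""
    return (text + "\0").encode("utf-16-le")


raw = encode_reg_sz("C:\\Program Files")
assert len(raw) % 2 == 0  # the size in bytes is always even

buggy = raw[:-1]  # mirrors SetLength(Buffer, Len - 1): odd byte count
fixed = raw[:-2]  # assumed correct trim: drop the whole 2-byte terminator

# A lenient decoder silently drops the stray byte, masking the bug.
lenient = buggy.decode("utf-16-le", errors="ignore")
# A stricter decoder substitutes U+FFFD for the stray byte.
replaced = buggy.decode("utf-16-le", errors="replace")
correct = fixed.decode("utf-16-le")

print(repr(lenient))
print(repr(replaced))
print(repr(correct))
```

The "ignore" and "replace" error handlers play the roles of the old and the new Framework behavior respectively: the same malformed buffer yields a clean string in one case and a trailing U+FFFD in the other.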

In conclusion, the extra character tagged onto the end is the result of Framework 2 SP1 exposing the programming error in the VCL library. As mentioned, the Microsoft.Win32.RegistryKey class does not mishandle the data in this way in any version of the framework. It is not a bug in Framework 2 SP1.

If you have a Delphi 2006.Net program, it is therefore recommended that, as a safety measure, you include an application configuration file containing the <supportedRuntime> element to constrain your application to run only on Framework 1. Apparently this bug has been rectified in Delphi 2007.Net.
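As a sketch of such a configuration file, assuming the program is called MyApp.exe (a hypothetical name) and that "Framework 1" here means the 1.1 runtime, whose version string is v1.1.4322, a file named MyApp.exe.config placed next to the executable might look like this. Treat it as a starting point and test it on a machine that has both runtimes installed.

```xml
<?xml version="1.0"?>
<configuration>
  <startup>
    <!-- Assumption: v1.1.4322 (.NET Framework 1.1) is installed on the target machine. -->
    <supportedRuntime version="v1.1.4322" />
  </startup>
</configuration>
```

With only one runtime listed, the program should refuse to start on a machine that lacks it rather than quietly run on Framework 2 SP1, which is the point of the safety measure.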

Thursday, February 28, 2008

No excuse for using misleading message.

There are two major share registrar companies in Australia, namely ComputerShare and LinkMarketServices. The former's system is well built and its reports are comprehensive, but I can't say as much for the latter, which is the subject of this post.

If you want an example of a commercial enterprise hell-bent on misleading its customers, you need go no further.

Link updates its system at a time (about 11pm EST) when people on the Eastern Seaboard are still up and those on the Western Seaboard are just finishing their dinner. During the update there is no visible sign to inform users that an update is in progress and that the system is running with reduced capability. Perhaps the developers are too lazy to deal with error messages properly. The update time is not published prominently on their web site.

Worse still, they use a totally misleading message to tell the user that he/she cannot access the data. Wouldn't a phrase like "System in Maintenance Mode. Features unavailable" do? Instead, they throw up a message telling you that the share you selected has incorrect information.

This appears after they have let you log in to your account successfully, when you click on one of the shares in your portfolio. It is a very dangerous message. It could indicate corruption of the system's data, because you are being told that the share you have selected has incorrect information. Who changed it between now and several hours ago, when you could still view the material? What should the user do?

When I first encountered this message, I was about to delete the share and re-enter it.

You get the same message when you enter a new one. Hence you do not know whether the details you have just entered are truly invalid or whether the system developers were simply too damn lazy to tell users properly that maintenance is in progress.

Perhaps the code in this system is in a mess, with exception handling left to chance rather than dealt with methodically.
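To show how little it takes to keep the two situations apart, here is a minimal sketch in Python. Every name in it (MaintenanceInProgress, InvalidHoldingDetails, load_holding, user_message) is hypothetical; I have no knowledge of how Link's system is actually built. The point is only that "we are down for maintenance" and "your details are wrong" are different failures and deserve different exceptions and different messages.

```python
class MaintenanceInProgress(Exception):
    """The system is being updated; the user's data is untouched."""


class InvalidHoldingDetails(Exception):
    """The details supplied by the user do not match any holding."""


def load_holding(holding_id, holdings, maintenance_active):
    """Return the holding, raising a distinct exception for each failure."""
    if maintenance_active:
        raise MaintenanceInProgress(
            "System in maintenance mode. Features unavailable.")
    if holding_id not in holdings:
        raise InvalidHoldingDetails(f"No holding found for {holding_id!r}.")
    return holdings[holding_id]


def user_message(holding_id, holdings, maintenance_active):
    """Translate each failure into a message that tells the user the truth."""
    try:
        units = load_holding(holding_id, holdings, maintenance_active)
    except MaintenanceInProgress as exc:
        return f"{exc} Your holdings have not been changed."
    except InvalidHoldingDetails as exc:
        return f"{exc} Please check the details you entered."
    return f"{holding_id}: {units} units"


portfolio = {"ABC": 500}  # made-up holding for illustration
print(user_message("ABC", portfolio, maintenance_active=True))
print(user_message("XYZ", portfolio, maintenance_active=False))
print(user_message("ABC", portfolio, maintenance_active=False))
```

During maintenance the user is told exactly that, and is reassured that nothing needs to be deleted or re-entered.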

Misleading users is just another form of bug in the system. Link has updated its web site recently, but apart from the sugar coating it is still primitive and buggy underneath. The small consolation is that instead of misleading the user with a message about the session timing out, it now gives me a much shorter and equally misleading message.

Why do software users have to put up with this kind of sloppiness and buggy rubbish? Is it trying to project a 24x7 operation that is only skin deep?

By comparison, I have yet to be misled by ComputerShare. This shows that a good share registrar web site that does not mislead its users is possible.

Saturday, February 23, 2008

Open Office 2 - coming of age?

I have been testing Open Office on and off, and I must say Open Office 2.3 is pretty impressive.

I am not here to give it free advertising by extolling its features but to flag areas that are still raw:
1) Dialog box modality is still a problem. The program fails to disable the owner window, allowing the user to click on or access that window when the dialog box is supposed to be the topmost window. For example, try Tools | Options... and, while the dialog box is up, click on the resize button of the window behind it. In a well-designed Windows program, you cannot click on the window behind a modal dialog.

This weakness is the result of Java, which still has a great deal of difficulty handling this simple UI task.

2) It is great to see a portable version of OpenOffice that I can take with me on my USB drive. It takes only about 250M.

Sadly, it does not support the Tablet PC's Tablet Input Panel (TIP). Without this floating panel, the only option is to use the Input Panel from the task bar, which has the side effect of causing every visible window to resize.

Otherwise, it is quite steady and I have not experienced any crashes. However, I must admit that I have not given it a reasonably large document to chew on.

Monday, February 18, 2008

Resistance of knowledge

It is amazing that some of the topics discussed in a book by Ed Deming [DEMING], first published in 1982 and dealing with quality issues in factory production, are as relevant to the IT industry today as they were to car factories then.

One of his observations, which unfortunately still plagues the IT industry, is the "resistance of knowledge" in a 'knowledge' industry. Deming observes:
"There is a widespread resistance of knowledge. Advances of the kind in Western industry require knowledge, yet people are afraid of knowledge. Pride may play a part in resistance to knowledge. New knowledge brought into the company might disclose some of our failings. A better outlook is of course to embrace new knowledge because it might help us to do a better job."
So true and precise, in an assessment made so many years ago. Imagine if this simple obstinacy were eliminated by swallowing one's pride: how much greatness could be accomplished with so little expenditure! So little process would need to be set in place.

The IT industry is more prone to this kind of resistance because of the character of its players. As a result, one has to be vigilant against this disease to prevent it from blinding oneself.

I have a collection full of examples to substantiate this observation.

[DEMING] "Out of the Crisis" by W. Edwards Deming, MIT Press 2000

Thursday, January 24, 2008

2 Billion dollars to pay for ignoring sound principle

The recent exposure of the attack on the Dutch transit system's electronic ticketing system should be a lesson for anyone contemplating implementing any form of security in their environment and system.

As Ed Felten dissects and analyses this mess, he concludes that:
Unmasking of the algorithm should have been no problem, had the system been engineered well. Kerckhoffs’s Principle, one of the bedrock maxims of cryptography, says that security should never rely on keeping an algorithm secret. It’s okay to have a secret key, if the key is randomly chosen and can be changed when needed, but you should never bank on an algorithm remaining secret.

Unfortunately the designers of Mifare Classic did not follow this principle. Instead, they chose to combine a secret algorithm with a relatively short 48-bit key.
[...]
This kind of disaster would have been less likely had the design process been more open. Secrecy was not only an engineering mistake (violating Kerckhoffs’s Principle) but also a policy mistake, as it allowed the project to get so far along before independent analysts had a chance to critique it. A more open process, like the one the U.S. government used in choosing the Advanced Encryption Standard (AES) would have been safer. Governments seem to have a hard time understanding that openness can make you more secure.
Perhaps the organization that designed and implemented this system was warned internally by people aware of this principle, which can be found in any cryptography text, but chose to ignore them. This is not an unusual reaction in many software organizations.

Many managers also hold the view that if you can program in one area of expertise, you can program in any area.

I have encountered so many mutterings like this: We can't crack this key or reverse engineer it, so it must be secure!
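A back-of-the-envelope calculation shows why "we can't crack it" is a poor yardstick for the 48-bit key mentioned in the quote. The Python sketch below counts the keys and estimates an exhaustive search at one billion trials per second. That rate is purely my assumption for illustration, not a measured figure for any real attack on Mifare Classic, and the published attacks did not need brute force at all.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

# Assumption for illustration only: an attacker testing 1e9 keys per second.
ASSUMED_KEYS_PER_SECOND = 1e9


def keyspace(bits):
    """Number of distinct keys of the given length."""
    return 2 ** bits


def exhaustive_search_seconds(bits, keys_per_second):
    """Worst-case time to try every key at the given rate."""
    return keyspace(bits) / keys_per_second


days_48 = exhaustive_search_seconds(48, ASSUMED_KEYS_PER_SECOND) / SECONDS_PER_DAY
years_128 = exhaustive_search_seconds(128, ASSUMED_KEYS_PER_SECOND) / SECONDS_PER_YEAR

print(f"48-bit keyspace: {keyspace(48)} keys")
print(f"48-bit search at the assumed rate: about {days_48:.1f} days")
print(f"128-bit search at the assumed rate: about {years_128:.1e} years")
```

At that assumed rate the whole 48-bit keyspace falls in a matter of days, while a 128-bit key of the kind AES uses is out of reach by many orders of magnitude. Once the secret algorithm is unmasked, the short key is all that is left.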

Ed Felten correctly identifies the other failure: the lack of checks and inspections in a system of this magnitude and importance. I wonder how they can now argue that inspection would have cost their project more. This is a classic example where inspection (using subject experts, of course) would have saved $2 billion!
