Wednesday, December 2, 2009

Treat Strings containing '\0' with care

Recently I had to decode a sequence of octets holding UCS-2 encoded data, and the sequence included the C-style string's null terminator. The octets were naturally represented in C# as a Byte[], and Encoding.Unicode.GetString() was used to decode them into a Unicode string.

The decoding process did not trim off the terminating '\0', so it produced a Unicode string like this: "Hello World\0".

During testing, I discovered several interesting things that could have a significant impact on your code and on the validity of your unit tests:
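One way to avoid carrying the terminator around is to strip it right after decoding. The following is only a minimal sketch, assuming the octets are little-endian UTF-16 (which is what Encoding.Unicode expects) and that any trailing '\0' is a terminator rather than data. The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class Ucs2Decoder
{
    // Hypothetical helper: decodes little-endian UTF-16 octets and
    // removes every '\0' found at the end of the decoded string.
    public static string DecodeTrimmed(byte[] octets)
    {
        string decoded = Encoding.Unicode.GetString(octets);
        return decoded.TrimEnd('\0');
    }
}
```

For example, Ucs2Decoder.DecodeTrimmed(new byte[] { 0x48, 0x00, 0x69, 0x00, 0x00, 0x00 }) returns "Hi" with a Length of 2. A '\0' in the middle of the string is left untouched.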
1) If you write out the string to console like this:
String s = "Hello World\0"; 
Console.WriteLine( "This is the Unicode String = '{0}'", s );

The output will not contain the closing ' that follows the string.

2) If you write out several lines like this:
String s = "Hello World\0"; 
Console.WriteLine( "This is the Unicode String = '{0}'", s );
Console.WriteLine( "See anything strange?" );

It will produce a line like this:
This is the Unicode String = 'Hello WorldSee anything strange?

In other words, the output of the first Console.WriteLine() stops at the '\0': both the closing ' and the newline are lost.
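When printing a string that may contain '\0' for diagnostics, it can help to make control characters visible first. This is a rough sketch, not something from the original code. ControlCharEscaper and Escape are hypothetical names, and the \uXXXX notation is just one possible choice.

```csharp
using System;
using System.Text;

public static class ControlCharEscaper
{
    // Hypothetical helper: replaces each control character (including '\0')
    // with a visible \uXXXX escape so it cannot disturb console output.
    public static string Escape(string value)
    {
        var builder = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (char.IsControl(c))
                builder.AppendFormat("\\u{0:X4}", (int)c);
            else
                builder.Append(c);
        }
        return builder.ToString();
    }
}
```

With this, Console.WriteLine( "This is the Unicode String = '{0}'", ControlCharEscaper.Escape( s ) ) writes the text Hello World followed by a literal \u0000, so the terminator shows up in the output instead of silently cutting it short.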

3) Make sure you use the correct version of String.Compare() and use the correct comparison type as specified by the StringComparison enumeration:
String s0 = "Hello World"; 
String s1 = s0 + "\0";

Assert.IsFalse( s0.EndsWith( "\0" ) ); // Just to check
Assert.IsTrue( s1.EndsWith( "\0" ) ); // Just to check
Assert.IsTrue( s0.Equals( s1 )==false ); 
Assert.IsTrue( String.Compare( s1, s0 )!=0 );     // Fail but should Pass
Assert.IsTrue( String.Compare( s1, s0, false )!=0 ); // Fail but should Pass
Assert.IsTrue( String.Compare( s1, s0, StringComparison.Ordinal )!=0 );
Assert.IsTrue( String.Compare( s1, s0, 
   StringComparison.InvariantCulture )!=0 ); // Fail but should Pass

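The lines marked "Fail but should Pass" are incorrectly processed. They all use culture-sensitive comparison, which ignores certain characters, and the results above show '\0' being treated as one of them. Only the StringComparison.Ordinal version tells s0 and s1 apart.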
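If the goal is simply to know whether two strings hold exactly the same characters, an ordinal check avoids the problem. A minimal sketch follows. OrdinalCheck and SameChars are illustrative names, while String.Equals() with StringComparison.Ordinal is the standard framework call.

```csharp
using System;

public static class OrdinalCheck
{
    // Hypothetical helper: true only when both strings contain exactly
    // the same sequence of characters, '\0' included.
    public static bool SameChars(string a, string b)
    {
        return string.Equals(a, b, StringComparison.Ordinal);
    }
}
```

OrdinalCheck.SameChars( "Hello World", "Hello World\0" ) returns false, because the second string is one character longer.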

Be careful when using NUnit's Assert.AreEqual(). In version 2.4.x, AreEqual() handles this case incorrectly and reports the two strings as equal. This mistake has been fixed in version 2.5. MSTest's AreEqual() handles this case correctly.

This illustrates the need to exercise care when comparing strings.
