A site devoted to discussing techniques that promote quality and ethical practices in software development.

Friday, October 5, 2012

Microsoft's System.Xml.Serialization.XmlSerializer generates uncompilable code for some xs:double default values

Consider the following valid schema:
<xs:schema id="MyDemoSchema"
    targetNamespace="http://tempuri.org/MyDemoSchema.xsd"
    elementFormDefault="qualified"
    xmlns="http://tempuri.org/MyDemoSchema.xsd"
    xmlns:mstns="http://tempuri.org/MyDemoSchema.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
>
  <xs:complexType name="Foo">
    <xs:sequence>
      <xs:element name="Name" type="xs:string" minOccurs="0" maxOccurs="1" />
      <xs:element name="SomeNumber" type="xs:double" 
                  minOccurs="0" maxOccurs="1" 
                  default="NaN" />
    </xs:sequence>
  </xs:complexType>

  <xs:element name="Foo" type="Foo" />
</xs:schema>

The minute you execute the following piece of code:
Foo f = new Foo();
XmlSerializer ser = new XmlSerializer( typeof(Foo) ); // ---- Error CS0103 ---

will generate a CS0103 compilation error. This is caused by the CLR runtime in which it will at that moment generates a temporary serialization assembly that actually performs the serialization and deserialization process.

It is the code generation process that fails to handle W3C acceptable xs:double values such as NaN, INF and -INF producing uncompilable code.

In brief, but will provide a more detail expose below, is that it generates code like this:
   if( x != NaN ) { < --- causes CS0103
     // do something 
   }

Which naturally does not compile and it generates the same error message as when the XmlSerializer is instantiated. You can't even use the logical operator to test if the value of System.Double is Double.NaN.

This is the exact code fragment of the generated code causing the CS0103:
if (((global::System.Double)o.@SomeNumber) != NaN) {
    WriteElementStringRaw(@"SomeNumber", @"http://tempuri.org/MyDemoSchema.xsd", 
    System.Xml.XmlConvert.ToString((global::System.Double)((global::System.Double)o.@SomeNumber)));
}

There is no way this line can compile. It has to be corrected like this:
if ( !System.Double.IsNaN ((global::System.Double)o.@SomeNumber)  ) {
    WriteElementStringRaw(@"SomeNumber", @"http://tempuri.org/MyDemoSchema.xsd", 
    System.Xml.XmlConvert.ToString((global::System.Double)((global::System.Double)o.@SomeNumber)));
}

Detail Discussion and work around

To see how this manifested into a runtime error, you can use sgen.exe together with the /keep switch to look at what happen. The catch-22 is that if your schema's default value is NaN, sgen.exe will fail to generate the serialization dll because the generation process generates the above mentioned uncompilable code and you can't see the generated code.

So in order to see what is happening, you replace the default value with a double that you can recognised. In my experiment, I used -5.55.

Once your schema is modified, xsd.exe is used to generate C# classes that are bundled into an assembly, you can then use sgen.exe with the /keep switch to generate the serialization dll together with the source file. It also generates a file with the extension .cmdline which is essentially the response file for CSC.

Using our default value as search string, you can quickly locate the potentially offending piece of code. Below are the steps to prove that code generated by the serialization process is faulty and how to rectify it.

Now correct the line where it checks for -5.55 to using System.Double.IsNaN() as shown above.

The next step is to rebuild the serialization dll and to do that you need to modify the schema to change the default value back to NaN.

Then you use XSD to generate the class and rebuild your assembly.

Next you modify the cmdline file as follows:
  • remove the setting where it references System.Xml.dll to avoid duplicate reference compilation error. 
  • remove the /D: _DYNAMIC_XMLSERIALIZER_COMPILATION 
  • to aid experimentation change the /debug- to /debug.
Check the response file for the location of the CS file. You should execute this response file in that directory. For example if the source file is like this "OutTestDir\5fm3rmet.0.cs", you should run your CSC from the parent directory of "OutTestDir".

Next is to compile and generate the serialization dll which by now should have the offending line corrected using Double.IsNaN().

Next you modify the project for the dll that once called XmlSerializer constructor by
  • adding a reference to the serialization dll. 
  • then replace following piece of code where you instantiate an instance of XmlSerializer:
Foo f = new Foo();
StringBuilder buffer = new StringBuilder();
using( XmlWriter writer = XmlWriter.Create( buffer, null ) )
{
   XmlSerializer ser = new XmlSerializer( typeof(Foo) ); // <-- to be replaced
   ser.Serialize( writer, f );
}

With this:
Foo f = new Foo();
StringBuilder buffer = new StringBuilder();
using( XmlWriter writer = XmlWriter.Create( buffer, null ) )
{
   Microsoft.Xml.Serialization.GeneratedAssembly.FooSerializer ser = 
       new Microsoft.Xml.Serialization.GeneratedAssembly.FooSerializer();
   // XmlSerializer ser = new XmlSerializer( typeof(Foo) ); // <-- to be replaced
   ser.Serialize( writer, f );
}

If you have the PDB, you can easily step through the code to convince yourself that your fix indeed does work.

Let's hope Microsoft will fix this bug which is in .Net 4. May be .Net 5 has fixed this?

No comments:

Blog Archive