Over the last several days I have been digging deeply into WCF, in particular into
DataContract and the serialization issues that arise when using
DataContractSerializer.
I constructed a very simple data type like this:
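using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.Serialization;

namespace Bookshop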
{
[ DataContract ]
public class Book
{
string title;
List<Person> authors = new List<Person>();
public Book() {
Debug.WriteLine( "Book Default C'tor" );
}
public Book( string title, params Person[] authors ) {
Debug.WriteLine( "Book Other C'tor" );
this.title = title;
this.authors.AddRange( authors );
}
[ DataMember (Name="ti") ]
public String Title {
get{ return this.title; }
set{
Debug.WriteLine( "Title set" );
this.title = value;
}
}
[ DataMember ( Name="au" )]
public Person[] Authors {
get{
Debug.WriteLine( "Getting an array of Persons" );
return this.authors.ToArray(); // the property type is Person[], not List<Person>
}
set {
Debug.WriteLine( "Authors set" );
//if ( this.authors == null )
// this.authors = new List<Person>();
//else
// Why NullReferenceException?
this.authors.Clear();
this.authors.AddRange( value );
}
}
}
//=========================================================
[ DataContract ]
public class Person {
string firstName;
string lastName;
Int32 id;
public Person( Int32 id, string firstName, string lastName ) {
this.id = id;
this.firstName = firstName;
this.lastName = lastName;
}
[ DataMember ]
public Int32 Id {
get{ return this.id; }
set{ this.id = value; }
}
[ DataMember ( Name="fn" ) ]
public String FirstName {
get{ return this.firstName; }
set{ this.firstName = value; }
}
[ DataMember (Name="ln") ]
public String LastName {
get { return this.lastName; }
set { this.lastName = value; }
}
}
}
The classes look contrived, but that is beside the point. The coding style and usage are in line with the recommendations.
However, consider a unit test like this:
[Test]
public void TestSerializeAndDeserializeBook()
{
Book book = new Book( "Testing Serialization and Deserialization",
new Person( 100, "Tom", "Smith" ),
new Person( 101, "Peter", "Pan" ) );
DataContractSerializer ser = new DataContractSerializer( typeof(Book) );
using( Stream stm = new MemoryStream() )
{
ser.WriteObject( stm, book );
stm.Seek( 0, SeekOrigin.Begin );
Debug.WriteLine( "Begin to deserialize" );
Book regen = ser.ReadObject( stm ) as Book;
Assert.IsTrue( Object.ReferenceEquals( book, regen )!=true,
"Expecting them to be distinct objects" );
Debug.WriteLine( "Asserting regenerated Book object" );
AssertBook( book, regen );
}
}
The test reveals a runtime error that only manifests during deserialization, and it puzzled me at first. I will deal with the cause first and my solution later.
I had obviously forgotten my reading of
ECMA-335 and only remembered section 8.9.6.6 (Partition I):
New values of an object type are created via constructors. Constructors shall be instance methods, defined via a special form of method contract, which defines the method contract as a constructor for a particular object type. The constructors for an object type are part of the object type definition. While the CTS and VES ensure that only a properly defined constructor is used to make new values of an object type, the ultimate correctness of a newly constructed object is dependent on the implementation of the constructor itself.
What I had forgotten is that there is a CLS (framework) note attached to this section that reads:
[Note:
[...]
CLS (framework): Can assume that object creation includes a call to one of the constructors, and that no object is initialized twice. System.Object.MemberwiseClone (see Partition IV) and deserialization (including object remoting) shall not run constructors. end note]
This note gives a deserializer permission not to run the constructor, which explains precisely why the unit test above hits a NullReferenceException when deserializing the Book object from the stream. Drilling deeper reveals that the exception is thrown when I try to clear the List<Person>
in the property setter of Authors.
In normal usage, the Book.authors initializer looks after that: it is guaranteed to execute before the first line of code in the body of the constructor. But because a deserializer is permitted not to call the constructor, the initialization code for Book.authors is skipped and the field is left null. Interesting, and more on this below.
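The effect can be reproduced in isolation. The following is a minimal sketch, not the Book class itself: Shelf, its members and ShelfDemo are hypothetical names introduced only for illustration. It round-trips an object through DataContractSerializer and then reports whether the constructor and the field initializer ran on the regenerated copy.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical type used only for this illustration.
[DataContract]
public class Shelf
{
    // Field initializer: the compiler folds it into every constructor,
    // so it is skipped whenever no constructor is called.
    private List<string> tags = new List<string>();

    // Not a data member, so it is never written to or read from the stream.
    public bool ConstructorRan;

    public Shelf() { ConstructorRan = true; }

    [DataMember]
    public string Name { get; set; }

    public bool TagsIsNull { get { return tags == null; } }
}

public static class ShelfDemo
{
    // Serialize to a memory stream and read the object straight back.
    public static Shelf RoundTrip(Shelf original)
    {
        var ser = new DataContractSerializer(typeof(Shelf));
        using (var stm = new MemoryStream())
        {
            ser.WriteObject(stm, original);
            stm.Seek(0, SeekOrigin.Begin);
            return (Shelf)ser.ReadObject(stm);
        }
    }

    public static void Main()
    {
        Shelf copy = RoundTrip(new Shelf { Name = "fiction" });
        Console.WriteLine("Constructor ran: " + copy.ConstructorRan); // False
        Console.WriteLine("tags is null:    " + copy.TagsIsNull);     // True
    }
}
```

The data member Name survives the round trip, while everything the constructor and the initializer would normally set up is left at its zero value.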
Digging further into
ECMA-335 to understand the mechanics behind this issue, I found in Partition III, section 4.5, an IL instruction called
initobj, which simply
initializes an address with a default value. typeTok is a metadata token (a typedef, typeref, or typespec). dest is an unmanaged pointer (native int), or a managed pointer (&). If typeTok is a value type, the initobj instruction initializes each field of dest to null or a zero of the appropriate built-in type. If typeTok is a value type, then after this instruction is executed, the instance is ready for a constructor method to be called. If typeTok is a reference type, the initobj instruction has the same effect as ldnull followed by stind.ref.
Unlike newobj, the initobj instruction does not call any constructor method.
and the
newobj instruction (in section 4.21), which
allocates a new instance of the class associated with ctor and initializes all the fields in the new instance to 0 (of the proper type) or null as appropriate. It then calls the constructor with the given arguments along with the newly created instance. After the constructor has been called, the now initialized object reference is pushed on the stack.
From the constructor’s point of view, the uninitialized object is argument 0 and the other arguments passed to newobj follow in order.
All zero-based, one-dimensional arrays are created using newarr, not newobj. On the other hand, all other arrays (more than one dimension, or one-dimensional but not zero-based) are created using newobj.
Value types are not usually created using newobj. They are usually allocated either as arguments or local variables, using newarr (for zero-based, one-dimensional arrays), or as fields of objects. Once allocated, they are initialized using initobj. However, the newobj instruction can be used to create a new instance of a value type on the stack, that can then be passed as an argument, stored in a local, etc.
So in most situations where one creates an object using the new operator, the
newobj instruction is used. In deserialization, however, the standard permits the implementer to bypass object creation via
newobj and instead hand back an instance whose fields are merely zeroed, as
initobj does for a value type, without calling any constructor. In the .NET Framework this is what FormatterServices.GetUninitializedObject provides.
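To see the difference between the two creation paths without involving a serializer at all, here is a small sketch. It assumes the .NET Framework helper FormatterServices.GetUninitializedObject, which allocates a zeroed instance without running any constructor (on recent .NET versions it is flagged obsolete and RuntimeHelpers.GetUninitializedObject plays the same role). Crate and CrateDemo are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

// Hypothetical type: both fields rely on initializers.
public class Crate
{
    public List<int> Items = new List<int>();
    public int Count = 42;
}

public static class CrateDemo
{
    // Allocates a Crate the way a deserializer may: no constructor is called.
    public static Crate CreateWithoutConstructor()
    {
        return (Crate)FormatterServices.GetUninitializedObject(typeof(Crate));
    }

    public static void Main()
    {
        Crate viaNew = new Crate();                // compiles to newobj, initializers run
        Crate raw = CreateWithoutConstructor();    // fields are only zeroed

        Console.WriteLine(viaNew.Items != null);   // True
        Console.WriteLine(viaNew.Count);           // 42
        Console.WriteLine(raw.Items == null);      // True
        Console.WriteLine(raw.Count);              // 0
    }
}
```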
Several solutions were offered to me. One of them was to create the List<Person> in the property setter of Authors and then use the property instead of the member field inside Book. In this contrived class that is a valid solution, but note the property type of Authors, which is an array of Person and not List<Person>. One of the attractions of the DataContract/DataMember attributes in WCF is that they allow developers to reuse their business classes as much as possible (I doubt this really works in practice, more on this below), and hence the Book class may be serving other purposes. So it is a solution, but not a practical one in general.
Another solution is to create the List<Person>
in each constructor as well as in the property setter of Authors if it is null. Doing it in the constructors caters for normal usage, while doing it in the property setter works for deserialization. But the coding style is not very appealing, and in fact the constructor part is identical to what the C# compiler generates from the initializer. From this:
public class Book
{
string title;
List<Person> authors = new List<Person>();
public Book()
{
}
public Book( string title, params Person[] author )
{
this.title = title;
this.authors.AddRange( author );
}
to this
.class public auto ansi beforefieldinit Book
extends [mscorlib]System.Object
{
.custom instance void [System.Runtime.Serialization]System.Runtime.Serialization.DataContractAttribute::.ctor()
.method public hidebysig specialname rtspecialname instance void .ctor() cil managed
{
.maxstack 8
L_0000: ldarg.0
L_0001: newobj instance void [mscorlib]System.Collections.Generic.List`1<class LMar.Demo.Bookshop.Person>::.ctor()
L_0006: stfld class [mscorlib]System.Collections.Generic.List`1<class LMar.Demo.Bookshop.Person> LMar.Demo.Bookshop.Book::authors
L_000b: ldarg.0
L_000c: call instance void [mscorlib]System.Object::.ctor()
L_0011: nop
L_0012: nop
L_0013: nop
L_0014: ret
}
.method public hidebysig specialname rtspecialname instance void .ctor(string title, class LMar.Demo.Bookshop.Person[] author) cil managed
{
.param [2]
.custom instance void [mscorlib]System.ParamArrayAttribute::.ctor()
.maxstack 8
L_0000: ldarg.0
L_0001: newobj instance void [mscorlib]System.Collections.Generic.List`1<class LMar.Demo.Bookshop.Person>::.ctor()
L_0006: stfld class [mscorlib]System.Collections.Generic.List`1<class LMar.Demo.Bookshop.Person> LMar.Demo.Bookshop.Book::authors
L_000b: ldarg.0
L_000c: call instance void [mscorlib]System.Object::.ctor()
L_0011: nop
L_0012: nop
L_0013: ldarg.0
L_0014: ldarg.1
L_0015: stfld string LMar.Demo.Bookshop.Book::title
L_001a: ldarg.0
L_001b: ldfld class [mscorlib]System.Collections.Generic.List`1<class LMar.Demo.Bookshop.Person> LMar.Demo.Bookshop.Book::authors
L_0020: ldarg.2
L_0021: callvirt instance void [mscorlib]System.Collections.Generic.List`1<class LMar.Demo.Bookshop.Person>::AddRange(class [mscorlib]System.Collections.Generic.IEnumerable`1<!0>)
L_0026: nop
L_0027: nop
L_0028: ret
}
So whether you write the creation code in the constructors or rely on the compiler to do it for you, the end effect is the same.
The biggest drawback: the code does not make clear why one needs to do this, because in normal usage such a construction scheme is totally unnecessary.
The solution can be found in
Lowy
Using the deserializing event
Since no constructor calls are ever made during deserialization, the deserializing event-handling method is logically your deserialization constructor. It is intended for performing some custom pre-deserialization steps - typically initialization of class member not marked as data members.
So, following this recommendation, the Book class is modified by adding a new member function as follows:
public class Book {
// old code
[ OnDeserializing ]
public void OnDeserializing(StreamingContext context)
{
if( this.authors == null )
this.authors = new List<Person>();
// any other initialization here.
}
}
The purpose of the code inside the OnDeserializing handler is unambiguous, and it does not interfere with normal usage of the Book class. This is almost the same as the special constructor required when implementing
ISerializable. I just wish that once a class is adorned with DataContract, the compiler would require a handler like the constructor for ISerializable, even if it is an empty one. That way users would not forget.
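A complete round trip with such a handler might look like the sketch below. Album, its members and AlbumDemo are hypothetical names chosen for illustration. The shape mirrors Book: an array-typed data member backed by a List that only a constructor or the deserializing handler ever creates.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical type with the same shape as Book.
[DataContract]
public class Album
{
    private List<string> tracks = new List<string>();

    [DataMember(Name = "tr")]
    public string[] Tracks
    {
        get { return tracks.ToArray(); }
        set
        {
            tracks.Clear();          // safe: the handler below runs before any setter
            tracks.AddRange(value);
        }
    }

    // Called by the serializer on the freshly allocated instance,
    // before any data member is assigned.
    [OnDeserializing]
    private void InitializeBeforeDeserialization(StreamingContext context)
    {
        tracks = new List<string>();
    }
}

public static class AlbumDemo
{
    public static Album RoundTrip(Album original)
    {
        var ser = new DataContractSerializer(typeof(Album));
        using (var stm = new MemoryStream())
        {
            ser.WriteObject(stm, original);
            stm.Seek(0, SeekOrigin.Begin);
            return (Album)ser.ReadObject(stm);
        }
    }

    public static void Main()
    {
        Album copy = RoundTrip(new Album { Tracks = new[] { "one", "two" } });
        Console.WriteLine(copy.Tracks.Length); // 2
    }
}
```

The handler does not need to be public; keeping it private stops ordinary callers from invoking it by accident.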
Caveats on using business classes as data contracts
While this sounds attractive, in reality there are plenty of gotchas. For one, a business class in a non-trivial solution is not just a simple class, or even one class containing another. It is often a class tree or object graph. WCF claims to be able to handle this: it can serialize classes that contain back references (or circular references), and remove duplicate records, by using preserveObjectReferences as part of the settings for DataContractSerializer. But be
warned:
this approach has the following characteristics, which may be undesirable:
- Performance. Replicating data is inefficient.
- Circular references. If objects refer to themselves, even through other objects, serializing by replication results in an infinite loop. (The serializer throws a SerializationException if this happens.)
- Semantics. Sometimes it is important to preserve the fact that two references are to the same object, and not to two identical objects.
[...]
It is important to understand the limitations of this mode:
- The XML the DataContractSerializer produces with preserveObjectReferences set to true is not interoperable with any other technologies, and can be accessed only by another DataContractSerializer instance, also with preserveObjectReferences set to true.
- There is no metadata (schema) support for this feature. The schema that is produced is valid only for the case when preserveObjectReferences is set to false.
- This feature may cause the serialization and deserialization process to run slower. Although data does not have to be replicated, extra object comparisons must be performed in this mode.
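A small sketch of both modes on a circular graph follows. It assumes a runtime that offers DataContractSerializerSettings (.NET Framework 4.5 or later; on the original WCF release the same flag is passed through the long DataContractSerializer constructor overload instead). Node and NodeDemo are hypothetical names.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical graph node that can point back to an earlier node.
[DataContract]
public class Node
{
    [DataMember] public string Name;
    [DataMember] public Node Next;
}

public static class NodeDemo
{
    public static Node RoundTrip(Node root, bool preserveReferences)
    {
        var settings = new DataContractSerializerSettings
        {
            PreserveObjectReferences = preserveReferences
        };
        var ser = new DataContractSerializer(typeof(Node), settings);
        using (var stm = new MemoryStream())
        {
            ser.WriteObject(stm, root);
            stm.Seek(0, SeekOrigin.Begin);
            return (Node)ser.ReadObject(stm);
        }
    }

    // Builds a two-node cycle: a -> b -> a.
    public static Node BuildCycle()
    {
        var a = new Node { Name = "a" };
        var b = new Node { Name = "b", Next = a };
        a.Next = b;
        return a;
    }

    public static void Main()
    {
        // With reference preservation the cycle should come back as a cycle.
        Node copy = RoundTrip(BuildCycle(), true);
        Console.WriteLine(ReferenceEquals(copy, copy.Next.Next));

        // Without it, serializing by replication cannot terminate,
        // so the serializer rejects the graph.
        try
        {
            RoundTrip(BuildCycle(), false);
        }
        catch (SerializationException ex)
        {
            Console.WriteLine("Rejected: " + ex.GetType().Name);
        }
    }
}
```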
More information on WCF Data Transfer in Service Contracts is
here. If you decide to ignore the above-mentioned issues and push on with preserveObjectReferences, this is a
discussion on how to overcome this 'deficiency'.
In my opinion, it is best to use a specially crafted family of classes to support the
Data Transfer Object Pattern.
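As a rough sketch of what that could look like, the business class below carries no serialization attributes at all, while a separate flat class exists only for the wire. Order, OrderDto and OrderMapper are hypothetical names, and the mapping is written by hand purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

// Business class: free to keep its own invariants and internal representation.
public class Order
{
    private readonly List<string> lines = new List<string>();

    public Order(int id) { Id = id; }

    public int Id { get; private set; }
    public IList<string> Lines { get { return lines; } }
}

// Hypothetical DTO: a wire-only shape with no behaviour and no initializers to forget.
[DataContract(Name = "order")]
public class OrderDto
{
    [DataMember(Name = "id")] public int Id { get; set; }
    [DataMember(Name = "ln")] public string[] Lines { get; set; }
}

public static class OrderMapper
{
    public static OrderDto ToDto(Order order)
    {
        var lines = new string[order.Lines.Count];
        order.Lines.CopyTo(lines, 0);
        return new OrderDto { Id = order.Id, Lines = lines };
    }

    public static Order FromDto(OrderDto dto)
    {
        // The business object is always built through its real constructor.
        var order = new Order(dto.Id);
        // A member missing from the stream arrives as null, so guard against it.
        foreach (string line in dto.Lines ?? new string[0])
            order.Lines.Add(line);
        return order;
    }

    public static void Main()
    {
        var order = new Order(7);
        order.Lines.Add("first line");
        Order back = FromDto(ToDto(order));
        Console.WriteLine(back.Id + ": " + back.Lines.Count);
    }
}
```

Because only the DTO ever passes through the serializer, the business class never has to care that constructors are skipped during deserialization.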
Caveats on using too many variable initializers
As briefly mentioned above, a variable initializer is basically a compiler trick, as there is no direct IL instruction for it. All it does is cause the initialization and object creation code to be injected into each constructor as if you had written it yourself. The only difference is that the compiler always places it ahead of your code.
If you have several constructors, these variable initializers can cause a mild form of code bloat. I am not sure if the end effect is as drastic as in a statically compiled language such as native C++. Since the JIT only compiles those methods that are actually used, unless a scenario uses every one of the constructors in your class, they may not all be jitted and loaded.
One recommendation is to remove all the initializers and place the creation logic inside a function that is called by each of the constructors. That way the construction logic is centralised and no duplicated IL code is injected into the constructor bodies.
However, this will not work for
readonly members, which can only be created and initialized in a variable initializer or in the constructor body. Doing so in a function, even one called by the constructor, is not allowed.
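The sketch below shows both halves of that advice. Catalog and its members are hypothetical names. The ordinary field is created in one shared helper, while the readonly field has to stay in an initializer (or a constructor body) because the compiler rejects any assignment to it elsewhere.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class with one ordinary and one readonly collection field.
public class Catalog
{
    // readonly: may only be assigned here or inside a constructor body.
    private readonly List<string> index = new List<string>();

    // Ordinary field: created in one place, by the shared helper below.
    private List<string> entries;

    public Catalog()
    {
        Initialize();
    }

    public Catalog(string first)
    {
        Initialize();
        entries.Add(first);
    }

    // Centralised creation logic, called by every constructor.
    private void Initialize()
    {
        entries = new List<string>();
        // index = new List<string>();  // would not compile: error CS0191
    }

    public int EntryCount { get { return entries.Count; } }
    public int IndexCount { get { return index.Count; } }

    public static void Main()
    {
        var catalog = new Catalog("first entry");
        Console.WriteLine(catalog.EntryCount); // 1
        Console.WriteLine(catalog.IndexCount); // 0
    }
}
```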
"Programming WCF Services" by Juval Lowy, 2007. Page 93.