7.3 Serialization, Marshaling, and DataSet
Rather than use a binary format by default, as in ADO classic, the DataSet default is XML serialization and marshaling. This means that you can populate DataSet from non-Microsoft data sources, and the data is consumable from non-Microsoft platforms. Because DataSet marshals as an XML document, it is natively supported without transformation by Web Services.
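As a minimal sketch of the XML-by-default behavior described above, the following writes a small DataSet to an XML string with an inline schema and reads it back. The table, column names, and sample rows are assumptions made up for illustration; only WriteXml and ReadXml come from the text.

```csharp
using System;
using System.Data;
using System.IO;

public static class DataSetXmlRoundTrip
{
    // Builds a small DataSet with one table and two rows (illustrative data).
    public static DataSet BuildSample()
    {
        DataSet ds = new DataSet("Inventory");
        DataTable t = ds.Tables.Add("Item");
        t.Columns.Add("Id", typeof(int));
        t.Columns.Add("Name", typeof(string));
        t.Rows.Add(new object[] { 1, "Widget" });
        t.Rows.Add(new object[] { 2, "Gadget" });
        return ds;
    }

    // Writes the DataSet, together with an inline XSD schema, to a string.
    public static string ToXml(DataSet ds)
    {
        StringWriter sw = new StringWriter();
        ds.WriteXml(sw, XmlWriteMode.WriteSchema);
        return sw.ToString();
    }

    // Rebuilds a DataSet from XML produced by ToXml.
    public static DataSet FromXml(string xml)
    {
        DataSet ds = new DataSet();
        ds.ReadXml(new StringReader(xml), XmlReadMode.ReadSchema);
        return ds;
    }

    public static void Main()
    {
        string xml = ToXml(BuildSample());
        Console.WriteLine(xml);

        DataSet copy = FromXml(xml);
        Console.WriteLine(copy.Tables["Item"].Rows.Count);
    }
}
```

Because the result is an ordinary XML document with an XSD schema, any consumer that understands XML can read it, whether or not it runs on .NET.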
The .NET platform includes two libraries that serialize and deserialize classes in XML format:
System.Runtime.Serialization is used for marshaling in .NET implementations.
System.Xml.Serialization is used in Web Services to support unlike implementations.
DataSet is compatible with both of them.
System.Runtime.Serialization serializes .NET classes using two formatters included in the .NET Framework: the Binary formatter and the SOAP formatter. The Binary formatter uses a .NET-specific format and protocol to optimize size by reducing the number of bytes transmitted. The SOAP formatter uses an XML-based format and the Simple Object Access Protocol. The standardization of SOAP details is in progress under the auspices of the W3C XML Protocol Working Group. System.Runtime.Serialization.Formatters.Soap.SoapFormatter uses SOAP 1.1 as its format. SOAP 1.1 is currently a W3C note; an updated version (SOAP 1.2) has been released.
The SOAP formatter is CLR type-centric. It can serialize any CLR type to SOAP format but cannot serialize arbitrary XML; some XML types cannot be processed, and others are serialized differently from the expected XSD-defined format. For example, arrays are serialized according to SOAP section 5, which is not consistent with XSD schemas.
System.Xml.Serialization is XML-centric in its approach. It can serialize any XML simple or complex type that can be represented in an XML schema, but it may not be able to serialize all CLR types with 100 percent fidelity. It is used in the System.Web.Services library for greatest compatibility with unlike platforms. The inability of System.Runtime.Serialization to serialize all schema types, and the inability of System.Xml.Serialization to handle all CLR types, is not a deficiency of either implementation; rather, it is a result of the inherent difference between the schema type system and the CLR type system.
To indicate support for serialization using System.Runtime.Serialization, the class must mark itself with the [Serializable] attribute. Classes that use the [Serializable] attribute can either accept the system's default serialization mechanism or implement ISerializable in a class-specific manner. Listing 7-19 shows how to use the [Serializable] attribute and implement a custom version of ISerializable.
Listing 7-19 A class that implements ISerializable
using System;
using System.Runtime.Serialization;

[Serializable]
public class Foo : ISerializable
{
    public int x, y;

    public Foo() {}

    // Deserialization constructor, called by the formatter.
    internal Foo(SerializationInfo si, StreamingContext context)
    {
        // Restore our values, using the same names that GetObjectData wrote.
        x = si.GetInt32("x");
        y = si.GetInt32("y");
    }

    public void GetObjectData(SerializationInfo si, StreamingContext context)
    {
        // Add our two scalar values and the type object.
        si.AddValue("x", x);
        si.AddValue("y", y);
        Type t = this.GetType();
        si.AddValue("TypeObj", t);
    }
}
Note that implementing ISerializable requires two things: implementing the GetObjectData method to fill in the SerializationInfo property bag, and implementing a constructor that takes the SerializationInfo and StreamingContext parameters. Custom serialization methods can be implemented to optimize serialization based on the StreamingContext. The DataSet class implements a custom version of ISerializable.
XML schema-centric serialization is controlled by the XmlSerializer class in the System.Xml.Serialization namespace. This class can generate custom XmlSerializationReader/XmlSerializationWriter pairs on a per-type basis. By default, XmlSerializer uses a one-to-one CLR-class-to-XML-complex-type mapping. Classes can customize the exact serialization by decorating their class declarations with a series of CLR attributes from the System.Xml.Serialization namespace. DataSet uses a custom mechanism to interact with XmlSerializer.
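To illustrate how attributes from System.Xml.Serialization change the default class-to-complex-type mapping, here is a small sketch. The Customer type, its members, and the element and attribute names are hypothetical and chosen only for this example.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type used only for illustration.
[XmlRoot("customer")]
public class Customer
{
    // Serialized as an XML attribute instead of a child element.
    [XmlAttribute("id")]
    public int Id;

    // Serialized as a child element with a custom name.
    [XmlElement("fullName")]
    public string Name;

    // Excluded from the XML entirely.
    [XmlIgnore]
    public string CachedDisplayText;
}

public static class XmlSerializerDemo
{
    public static string Serialize(Customer c)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Customer));
        StringWriter sw = new StringWriter();
        ser.Serialize(sw, c);
        return sw.ToString();
    }

    public static Customer Deserialize(string xml)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Customer));
        return (Customer)ser.Deserialize(new StringReader(xml));
    }

    public static void Main()
    {
        Customer c = new Customer();
        c.Id = 7;
        c.Name = "Ann";
        c.CachedDisplayText = "not serialized";

        string xml = Serialize(c);
        Console.WriteLine(xml);

        Customer back = Deserialize(xml);
        Console.WriteLine(back.Id + " " + back.Name);
    }
}
```

Without the attributes, XmlSerializer would emit a Customer root element with Id, Name, and CachedDisplayText child elements, which is the one-to-one default mapping.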
DataSet supports both System.Runtime.Serialization and System.Xml.Serialization. It supports each one through its implementations of ReadXmlSchema/ReadXml and WriteXmlSchema/WriteXml. When System.Runtime.Serialization is used, GetObjectData uses the WriteXmlSchema and WriteXml methods directly. In addition, DataSet has the appropriate constructor for custom serialization and invokes ReadXmlSchema and ReadXml to populate itself from SerializationInfo. There are no optimizations for different streaming contexts; DataSet is marshaled by value even across appdomain boundaries.
DataSet supports custom XML-centric serialization by implementing a special interface, IXmlSerializable. Currently it is the only class in the base class libraries to implement this interface. IXmlSerializable has three methods (ReadXml, WriteXml, and GetSchema), which DataSet implements by calling its own ReadXml or WriteXml and ReadXmlSchema or WriteXmlSchema, just as it does for System.Runtime.Serialization.
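The shape of the interface can be sketched with a small class that takes over its own XML representation. The Temperature type and its celsius attribute are hypothetical; this is not how DataSet itself is implemented, only an illustration of the three methods.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type used only for illustration.
public class Temperature : IXmlSerializable
{
    public double Celsius;

    public Temperature() {}

    // Returning null means no schema is supplied for this type.
    public XmlSchema GetSchema()
    {
        return null;
    }

    // XmlSerializer has already written the wrapper element's start tag;
    // this method only adds the content (here, a single attribute).
    public void WriteXml(XmlWriter writer)
    {
        writer.WriteAttributeString("celsius", XmlConvert.ToString(Celsius));
    }

    // The reader is positioned on the wrapper element's start tag;
    // this method must read its data and consume the whole element.
    public void ReadXml(XmlReader reader)
    {
        Celsius = XmlConvert.ToDouble(reader.GetAttribute("celsius"));
        bool isEmpty = reader.IsEmptyElement;
        reader.ReadStartElement();
        if (!isEmpty)
        {
            reader.ReadEndElement();
        }
    }
}

public static class XmlSerializableDemo
{
    public static string Serialize(Temperature t)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Temperature));
        StringWriter sw = new StringWriter();
        ser.Serialize(sw, t);
        return sw.ToString();
    }

    public static Temperature Deserialize(string xml)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Temperature));
        return (Temperature)ser.Deserialize(new StringReader(xml));
    }

    public static void Main()
    {
        Temperature t = new Temperature();
        t.Celsius = 21.5;
        string xml = Serialize(t);
        Console.WriteLine(xml);
        Console.WriteLine(Deserialize(xml).Celsius);
    }
}
```

When XmlSerializer encounters a type that implements the interface, it delegates reading and writing to these methods instead of generating a member-by-member mapping.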
If you want to use complex types as DataColumns, it is useful to know exactly how DataSet is serialized. When DataSet is serialized, WriteXml calls XmlDataTreeWriter, which eventually writes each row with an XmlDataRowWriter. Then XmlDataRowWriter calls DataColumn.ObjectToXml on every column. DataColumn.ObjectToXml simply calls System.Data.Common.DataStorage.ObjectToXml. The System.Data.Common.DataStorage class has a static method called CreateStorage. It creates Storage classes for any of the concrete types it supports; that is, it calls the constructor on the concrete classes: System.Data.Common.XXXStorage.
A final storage class is called ObjectStorage. Any class that is not directly supported by DataSet will use the ObjectStorage class. This is important when you think back to the example in Chapter 4 that stores Object types in DataSet.
Every DataColumn value in a DataTable is represented as XML by calling its ToString method. It is rehydrated from XML by using a constructor that takes a single string as input. Therefore, to use arbitrary objects as DataColumn types, they must have a ToString method that renders their value as XML and a constructor that takes a single string. This is a difficult design decision because a method (ToString) that might otherwise produce string output for reports must be reserved for XML, but the decision must be tempered by the fact that a complex type usually cannot be represented as a single string. Listing 7-20 illustrates this type of object using the Person class from Chapter 4.
Listing 7-20 Producing correct XML with the Person class
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

public class Person
{
    public String name;
    public int age;

    // XmlSerializer requires a public parameterless constructor.
    public Person() {}

    // Rehydrate from the XML string produced by ToString.
    public Person(String serstr)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Person));
        Person p = (Person)ser.Deserialize(new StringReader(serstr));
        this.age = p.age;
        this.name = p.name;
    }

    // Render this instance as XML.
    public override string ToString()
    {
        StringBuilder mysb = new StringBuilder();
        StringWriter myStringWriter = new StringWriter(mysb);
        XmlSerializer ser = new XmlSerializer(this.GetType());
        ser.Serialize(myStringWriter, this);
        return myStringWriter.ToString();
    }
}
To use an embedded DataTable in a DataColumn, as you did in Chapter 4, you must override the DataTable's implementation of these two methods. Unfortunately, DataTable already has a single-string constructor (it takes the table name), and reimplementing that constructor to accept XML changes the semantics of the base class and is suboptimal. SQL Server's UNIQUEIDENTIFIER data type is an example of using this pair of methods: it maps to System.Guid, which has the appropriate constructor and ToString method to be correctly marshaled as a column inside DataSet. The DataSet class implements two additional public methods, ShouldSerializeTables and ShouldSerializeRelations, to allow serialization to work with subclasses, such as strongly typed DataSets.
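The System.Guid case mentioned above can be checked with a short sketch: a Guid survives a ToString and single-string-constructor round trip, and a Guid column survives being written to XML and read back. The table and column names are assumptions made up for this example.

```csharp
using System;
using System.Data;
using System.IO;

public static class GuidColumnDemo
{
    // ToString plus the single-string constructor reproduce the same value.
    public static bool RoundTripsThroughString(Guid g)
    {
        string text = g.ToString();
        Guid back = new Guid(text);
        return back == g;
    }

    // Stores a Guid in a DataSet column, marshals the DataSet as XML
    // (with schema), and returns the value read back from the copy.
    public static Guid RoundTripThroughDataSet(Guid g)
    {
        DataSet ds = new DataSet("Demo");
        DataTable t = ds.Tables.Add("Row");
        t.Columns.Add("Key", typeof(Guid));
        t.Rows.Add(new object[] { g });

        StringWriter sw = new StringWriter();
        ds.WriteXml(sw, XmlWriteMode.WriteSchema);

        DataSet copy = new DataSet();
        copy.ReadXml(new StringReader(sw.ToString()), XmlReadMode.ReadSchema);
        return (Guid)copy.Tables["Row"].Rows[0]["Key"];
    }

    public static void Main()
    {
        Guid g = Guid.NewGuid();
        Console.WriteLine(RoundTripsThroughString(g));
        Console.WriteLine(RoundTripThroughDataSet(g) == g);
    }
}
```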