
DataSets, Web Services, DiffGrams, Arrays, and Interoperability

Posted on 2005-05-18 10:15 by Rickel

Matt Powell
Microsoft Corporation

February 11, 2003

Summary: Matt Powell examines the issues involved in deciding whether to send DataSets to and from Web services, while also showing you all your options. (11 printed pages)


Over the last few months there has been something of a DataSet and Web services debate around MSDN: Is it or is it not okay to send DataSets to and from Web services?

First, here's a little background for those of you who are unfamiliar with the relationship between DataSets and Web services. I think it is safe to say that DataSets are the coolest part of Microsoft® ADO.NET. In a lot of ways they are like having a simple, cached version of a relational database. Throw a DataAdapter in the loop and you now have seamless synchronization with your cached DataSet and your back-end SQL database. DataSets can be typed or untyped, which simply means that the schema of the data is either fixed or not. If it is fixed (typed), you get the added benefit of having the data wrapped up in strongly typed, managed classes so that rows look like a simple class with properties that map to the columns in your database.

There are a lot of other cool things about DataSets, but one particularly interesting thing is the ability for a DataSet to be passed in a SOAP message. This is possible because a DataSet is capable of being serialized into XML in such a manner that it can be deserialized back into a DataSet. So, for instance, if a Microsoft® .NET Framework Web service returns a DataSet, then a .NET Framework client will get a DataSet object as the return value when it makes the method call on the generated proxy.

The Pros of DataSets and Web Services

Having a DataSet returned from a Web service and having it appear on the client may not seem like a big deal, but it is bigger than is immediately apparent. DataSets are very powerful, so having the ability to move them around and reuse them on the client can prove useful to a lot of applications. On top of the simple rows that are included in a particular table, DataSets also contain the cached changes to the data in the form of the inserts, updates, and deletions that have occurred on the DataSet. This gives the DataSet the ability to resynchronize with a back-end database. Now don't get me wrong—the DataSet on your smart client cannot synchronize seamlessly with the back-end database like it can from the server—but you will be able to query, filter, have multiple tables, and have the special class interfaces that typed DataSets provide, just like you could on the server.
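To make the idea of cached changes concrete, here is a minimal sketch that builds an untyped DataSet shaped like the Items table used later in this article and then inspects the pending inserts, updates, and deletions. The class and method names (ChangeTrackingSketch, BuildItems, ApplyEdits) and the "Paper Clips" row are made up for illustration, and the rows are added by hand rather than filled by a DataAdapter.

```csharp
using System;
using System.Data;

public static class ChangeTrackingSketch
{
    // Builds an untyped DataSet with the same columns as the Items table.
    public static DataSet BuildItems()
    {
        DataSet ds = new DataSet("TypedDataSet");
        DataTable items = ds.Tables.Add("Items");
        DataColumn key = items.Columns.Add("ItemNumber", typeof(int));
        items.Columns.Add("Description", typeof(string));
        items.Columns.Add("Price", typeof(decimal));
        items.PrimaryKey = new DataColumn[] { key };

        items.Rows.Add(1, "Pink Erasers", 1.75m);
        items.Rows.Add(2, "#2 Pencil", 32.75m);
        items.Rows.Add(3, "Box Staples", 23.21m);
        items.Rows.Add(4, "Stapler", 0.32m);

        // Mark every row as Unchanged, as if a DataAdapter had just loaded it.
        ds.AcceptChanges();
        return ds;
    }

    // Makes one update, one deletion, and one insert (hypothetical edits).
    public static void ApplyEdits(DataSet ds)
    {
        DataTable items = ds.Tables["Items"];
        items.Rows[0]["Price"] = 1.95m;
        items.Rows[3].Delete();
        items.Rows.Add(5, "Paper Clips", 0.99m);
    }

    public static void Main()
    {
        DataSet ds = BuildItems();
        ApplyEdits(ds);

        // Each row remembers what happened to it since the last AcceptChanges.
        foreach (DataRow row in ds.Tables["Items"].Rows)
            Console.WriteLine(row.RowState);

        // GetChanges returns a DataSet that holds only the pending changes.
        DataSet pending = ds.GetChanges();
        Console.WriteLine(pending.Tables["Items"].Rows.Count);
    }
}
```

The DataSet returned by GetChanges is the part that carries the cached changes; it is what you would typically hand back to the server so that a DataAdapter there can apply them.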

The Cons of DataSets and Web Services

SOAP is all about the wire format. For this reason, interoperability with Web services is possible because the focus is not on how to handle a SOAP message, but on the message itself. This is why, in the SOAP 1.2 specification, the "Simple Object Access Protocol" description was removed and now the SOAP acronym lives on its own. SOAP isn't about objects, it is about the message.

So how does this relate to DataSets? DataSets are objects, and in the scenario where they are being passed in a SOAP message, there is an implicit relationship between the message and the object. This is because DataSets use a special method of XML serialization that creates an XML format unique to DataSets. This format is called a DiffGram. Although we may dream that someday all platforms will have support for dealing with DiffGrams, the reality is that they are really unique to .NET Framework DataSets. From an XML infoset standpoint, it is possible to make sense of DiffGrams, but it can complicate matters when the underlying data from the DataSet should be straightforward. Also, because the format is proprietary, it could change in future versions and break any ad hoc mechanisms created for interpreting them.

Let's look at how DataSets and DiffGrams work. Consider the Items database table shown in Figure 1.

Figure 1. A simple database table

If you were to ask your average XML aficionado to serialize this data into XML, they would probably come up with something very similar to this:

<?xml version="1.0" encoding="utf-8"?>
<Table xmlns="http://msdn.microsoft.com/samples/Table/">
  <Items>
    <ItemNumber>1</ItemNumber>
    <Description>Pink Erasers</Description>
    <Price>1.75</Price>
  </Items>
  <Items>
    <ItemNumber>2</ItemNumber>
    <Description>#2 Pencil</Description>
    <Price>32.75</Price>
  </Items>
  <Items>
    <ItemNumber>3</ItemNumber>
    <Description>Box Staples</Description>
    <Price>23.21</Price>
  </Items>
  <Items>
    <ItemNumber>4</ItemNumber>
    <Description>Stapler</Description>
    <Price>0.32</Price>
  </Items>
</Table>

This may seem pretty straightforward, but if you were to load the data from this table into a DataSet and then serialize it, you would instead get something like:

<?xml version="1.0" encoding="utf-8"?>
<TypedDataSet xmlns="http://msdn.microsoft.com/samples/DataSetService/">
  <xs:schema id="TypedDataSet" 
      targetNamespace="http://msdn.microsoft.com/samples/TypedDataSet.xsd" 
      xmlns:mstns="http://msdn.microsoft.com/samples/TypedDataSet.xsd" 
      xmlns="http://msdn.microsoft.com/samples/TypedDataSet.xsd" 
      xmlns:xs="http://www.w3.org/2001/XMLSchema" 
      xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" 
      attributeFormDefault="qualified" 
      elementFormDefault="qualified">
    <xs:element name="TypedDataSet" msdata:IsDataSet="true">
      <xs:complexType>
        <xs:choice maxOccurs="unbounded">
          <xs:element name="Items">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="ItemNumber" msdata:ReadOnly="true" 
                    msdata:AutoIncrement="true" type="xs:int" />
                <xs:element name="Description" type="xs:string" />
                <xs:element name="Price" type="xs:decimal" />
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:choice>
      </xs:complexType>
      <xs:unique name="Constraint1" msdata:PrimaryKey="true">
        <xs:selector xpath=".//mstns:Items" />
        <xs:field xpath="mstns:ItemNumber" />
      </xs:unique>
    </xs:element>
  </xs:schema>
  <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" 
      xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
    <TypedDataSet xmlns="http://msdn.microsoft.com/samples/TypedDataSet.xsd">
      <Items diffgr:id="Items1" msdata:rowOrder="0">
        <ItemNumber>1</ItemNumber>
        <Description>Pink Erasers</Description>
        <Price>1.75</Price>
      </Items>
      <Items diffgr:id="Items2" msdata:rowOrder="1">
        <ItemNumber>2</ItemNumber>
        <Description>#2 Pencil</Description>
        <Price>32.75</Price>
      </Items>
      <Items diffgr:id="Items3" msdata:rowOrder="2">
        <ItemNumber>3</ItemNumber>
        <Description>Box Staples</Description>
        <Price>23.21</Price>
      </Items>
      <Items diffgr:id="Items4" msdata:rowOrder="3">
        <ItemNumber>4</ItemNumber>
        <Description>Stapler</Description>
        <Price>0.32</Price>
      </Items>
    </TypedDataSet>
  </diffgr:diffgram>
</TypedDataSet>

Obviously there is some bloating of the data when it is put into a DataSet. For a small table like the one in question, this is caused by including the schema for the data in the response. But there are some other differences as well—the schema and data are packaged under the TypedDataSet element. Notice that table constraint information is included, and a primary key for the table is indicated.
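If you want to see the size difference for yourself without standing up a Web service, the following sketch writes the same DataSet three ways using DataSet.WriteXml. It is only an approximation of what ASP.NET puts on the wire (the service response carries the schema and the DiffGram together), and the class and method names (SerializationSizeSketch, BuildItems, Write) are my own.

```csharp
using System;
using System.Data;
using System.IO;

public static class SerializationSizeSketch
{
    // Builds an untyped stand-in for the typed DataSet used in the article.
    public static DataSet BuildItems()
    {
        DataSet ds = new DataSet("TypedDataSet");
        ds.Namespace = "http://msdn.microsoft.com/samples/TypedDataSet.xsd";
        DataTable items = ds.Tables.Add("Items");
        items.Columns.Add("ItemNumber", typeof(int));
        items.Columns.Add("Description", typeof(string));
        items.Columns.Add("Price", typeof(decimal));
        items.Rows.Add(1, "Pink Erasers", 1.75m);
        items.Rows.Add(2, "#2 Pencil", 32.75m);
        items.Rows.Add(3, "Box Staples", 23.21m);
        items.Rows.Add(4, "Stapler", 0.32m);
        ds.AcceptChanges();
        return ds;
    }

    // Serializes the DataSet to a string in the requested mode.
    public static string Write(DataSet ds, XmlWriteMode mode)
    {
        StringWriter writer = new StringWriter();
        ds.WriteXml(writer, mode);
        return writer.ToString();
    }

    public static void Main()
    {
        DataSet ds = BuildItems();

        string dataOnly = Write(ds, XmlWriteMode.IgnoreSchema);   // rows only
        string withSchema = Write(ds, XmlWriteMode.WriteSchema);  // inline XSD + rows
        string diffGram = Write(ds, XmlWriteMode.DiffGram);       // DiffGram wrapper + rows

        Console.WriteLine("Data only:   {0} characters", dataOnly.Length);
        Console.WriteLine("With schema: {0} characters", withSchema.Length);
        Console.WriteLine("DiffGram:    {0} characters", diffGram.Length);
    }
}
```

For a table this small, the inline schema dominates the output; as the row count grows, the per-row DiffGram attributes become the larger share of the overhead.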

One very important but very subtle difference is the following attribute on the TypedDataSet element declaration in the schema:

 msdata:IsDataSet="true"

This attribute allows the .NET Framework to realize that this is a serialized DataSet rather than just another complex class.

There is a key component of the DiffGram not shown in this example. If changes are made to the DataSet without calling the AcceptChanges method to formally accept these changes, then there will also be rollback information within the DiffGram that appears as another child of the DiffGram element, listing pre-changed rows with references to the rows that are changed.
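A quick way to see that rollback information is to change a row and write the DiffGram before calling AcceptChanges. The sketch below does this with a two-row untyped DataSet; the names PendingDiffGramSketch, BuildItems, and ToDiffGram are hypothetical, and the new price is invented for the example.

```csharp
using System;
using System.Data;
using System.IO;

public static class PendingDiffGramSketch
{
    // Two rows are enough to show one changed and one unchanged row.
    public static DataSet BuildItems()
    {
        DataSet ds = new DataSet("TypedDataSet");
        DataTable items = ds.Tables.Add("Items");
        items.Columns.Add("ItemNumber", typeof(int));
        items.Columns.Add("Description", typeof(string));
        items.Columns.Add("Price", typeof(decimal));
        items.Rows.Add(1, "Pink Erasers", 1.75m);
        items.Rows.Add(2, "#2 Pencil", 32.75m);
        ds.AcceptChanges();
        return ds;
    }

    // Writes only the DiffGram part (no schema) to a string.
    public static string ToDiffGram(DataSet ds)
    {
        StringWriter writer = new StringWriter();
        ds.WriteXml(writer, XmlWriteMode.DiffGram);
        return writer.ToString();
    }

    public static void Main()
    {
        DataSet ds = BuildItems();

        // Change a price but do not accept the change yet.
        ds.Tables["Items"].Rows[0]["Price"] = 1.95m;

        // The changed row is flagged with diffgr:hasChanges, and the
        // original values appear under a diffgr:before element.
        Console.WriteLine(ToDiffGram(ds));

        // After AcceptChanges the rollback section is no longer written.
        ds.AcceptChanges();
        Console.WriteLine(ToDiffGram(ds));
    }
}
```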

Obviously all this extra information makes the data larger and more complex, but it is the price to pay to have the additional functionality provided by the DataSet class.

If, as a Web service consumer, you are really only concerned with the data represented in the first non-DataSet serialization of the table and you are not interested in or do not have platform access to .NET Framework DataSets, do we really want to force you to dissect the complexities of DiffGram and DataSet serialization?

Or look at the Web service and DataSet problem from the other side. Suppose you know that the consumers of your Web service are all .NET Framework clients; you may make the decision to go ahead and return DataSets from your Web service so that client applications can use them in their code. This is great, but if you know that the communication is from .NET Framework to .NET Framework, why wouldn't you set up the proprietary .NET Remoting interface to do faster binary serialization, and thus get better performance than by using XML serialization?

Of course, there are other reasons for choosing Web services over .NET Remoting, not the least of which is that even if you have only .NET Framework clients today, there may be a time in the future when you will want to access the information from some other platform.

So where does that leave us? Do we go ahead and send and receive DataSets from our Web services? Do we avoid them at all costs? Is there some happy medium?

Making Everyone Happy: DataSets for .NET Framework Platforms and Arrays for Everyone Else

The reality behind our little debate is that we really don't have to choose. With a couple of simple tricks we can expose our data as 1) a DataSet, 2) raw XML, or 3) a simple array of strongly typed elements. Clients from any number of different platforms can choose which form they would like to see. Even .NET Framework clients can decide if they want to deal with data as a DataSet, an XmlElement (which can be convenient for using XSLT, XPath, DOM or a plethora of other XML capabilities), or as an array of custom .NET Framework objects. Each option has its own set of benefits.

To see how to do this, consider the following simple Web service that returns a DataSet based on the table in Figure 1. To build this Web service in Microsoft® Visual Studio® .NET, I first created a Web Service project, dragged a SqlDataAdapter from the Toolbox onto my Service1.asmx design page, right-clicked the Data Adapter, and then chose the Generate DataSet option. After completing two wizards, I created my WebMethod with the following code:

[WebMethod()]
public TypedDataSet GetDataSet()
{
    sqlDataAdapter1.Fill(typedDataSet1);
    return typedDataSet1;
}

TypedDataSet is the name of the DataSet type that was generated by the Generate DataSet wizard, typedDataSet1 is an instance of that type, and sqlDataAdapter1 is the instance of the SqlDataAdapter class created to access the table in Figure 1 from a SQL database. Therefore, this WebMethod returns a .NET Framework DataSet object and the DiffGram format will be used.

As a client consuming this Web service, I used the Add Web Reference option to import this Web service from my machine, generically referred to as "localhost". The code to create a DataSet from the above WebMethod looks like this:

localhost.Service1 proxy = new localhost.Service1();
localhost.TypedDataSet ds = proxy.GetDataSet();

ds now is an instance of a typed DataSet and can be queried, filtered, or otherwise manipulated as such.

It is important to note that if your goal is simply to allow DataSet data to be accessible as an array of objects, you do not have to convert the data beyond this point. Typed DataSets give you this kind of access through the generated typed collection. In our case, we could get the description of the second object in the database in either of these two ways:

string description;
description = (string)ds.Tables["Items"].Rows[1]["Description"];
description = ds.Items[1].Description;

The second and third lines of this code do the same thing. The second line accesses data through the generic DataSet fashion and is the mechanism we would need to use if this was an untyped DataSet. The third line uses the intuitive typed class mechanism provided by typed DataSets.
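For readers without the generated typed classes at hand, here is a small sketch of the generic access style combined with the kind of filtering and sorting a DataSet gives you on the client. It uses a hand-built untyped DataSet in place of the proxy result, and the names UntypedQuerySketch, BuildItems, and CheapItems are invented for the example.

```csharp
using System;
using System.Data;

public static class UntypedQuerySketch
{
    // Stand-in for the DataSet that the proxy call would return.
    public static DataSet BuildItems()
    {
        DataSet ds = new DataSet("TypedDataSet");
        DataTable items = ds.Tables.Add("Items");
        items.Columns.Add("ItemNumber", typeof(int));
        items.Columns.Add("Description", typeof(string));
        items.Columns.Add("Price", typeof(decimal));
        items.Rows.Add(1, "Pink Erasers", 1.75m);
        items.Rows.Add(2, "#2 Pencil", 32.75m);
        items.Rows.Add(3, "Box Staples", 23.21m);
        items.Rows.Add(4, "Stapler", 0.32m);
        ds.AcceptChanges();
        return ds;
    }

    // Filters with an expression string and sorts by price, cheapest first.
    public static DataRow[] CheapItems(DataSet ds)
    {
        return ds.Tables["Items"].Select("Price < 2", "Price ASC");
    }

    public static void Main()
    {
        DataSet ds = BuildItems();

        // Generic access: table and column are looked up by name, and the
        // value comes back as object, so a cast is needed.
        string description = (string)ds.Tables["Items"].Rows[1]["Description"];
        Console.WriteLine(description);

        foreach (DataRow row in CheapItems(ds))
            Console.WriteLine("{0}: {1}", row["Description"], row["Price"]);
    }
}
```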

We also want to make our Web service's information available through DataSet-less options for our clients. One option is to translate the data from the DataSet into the XML that we converted the table into when we first considered how to return the data. Our WebMethod could then simply return this XML.

It turns out that converting a DataSet to XML is made trivial by something called an XmlDataDocument. The XmlDataDocument class allows you to wrap your DataSet with an XmlDocument object. Changes made using the XmlDocument interface get reflected in the DataSet and changes to the DataSet get reflected in the XmlDocument. Therefore, using XmlDataDocument to have our WebMethod return XML is simple:

[WebMethod()]
public XmlDocument GetXmlDataDocument()
{
    sqlDataAdapter1.Fill(typedDataSet1);
    XmlDataDocument dataDoc = new XmlDataDocument(typedDataSet1);
    return dataDoc;
}

As a result, our Web service no longer returns that big, complex DiffGram; instead, we get a simple SOAP message that looks like this:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope 
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <GetXmlDataDocumentResponse 
  xmlns="http://msdn.microsoft.com/samples/DataSetService/">
      <GetXmlDataDocumentResult>
        <TypedDataSet 
  xmlns="http://msdn.microsoft.com/samples/TypedDataSet.xsd">
          <Items>
            <ItemNumber>1</ItemNumber>
            <Description>Pink Erasers</Description>
            <Price>1.75</Price>
          </Items>
          <Items>
            <ItemNumber>2</ItemNumber>
            <Description>#2 Pencil</Description>
            <Price>32.75</Price>
          </Items>
          <Items>
            <ItemNumber>3</ItemNumber>
            <Description>Box Staples</Description>
            <Price>23.21</Price>
          </Items>
          <Items>
            <ItemNumber>4</ItemNumber>
            <Description>Stapler</Description>
            <Price>0.32</Price>
          </Items>
        </TypedDataSet>
      </GetXmlDataDocumentResult>
    </GetXmlDataDocumentResponse>
  </soap:Body>
</soap:Envelope>

Simplicity of our on-the-wire message is probably not our primary concern—the main concern is how an application on the other side can access the message. In this case we are sending an array of items in XML. The client can read the XML using any number of XML techniques. A .NET Framework client that does this is shown in the code below. It is reading the same description field that we read previously out of the typed DataSet, but now it is accessing the data using an XPath query. The task is made slightly more complex because there is a default namespace defined on the data in question, but the code to take this into account and make the appropriate query is as follows:

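// Requires: using System.Xml;
localhost.Service1 proxy = new localhost.Service1();
XmlNode node = proxy.GetXmlDataDocument();
string description;
XmlNamespaceManager nsmgr 
    = new XmlNamespaceManager(new NameTable());
nsmgr.AddNamespace("def", 
    "http://msdn.microsoft.com/samples/TypedDataSet.xsd");
// The default namespace applies to Description as well as Items, so
// both steps need the prefix. The path is relative to the returned
// node, which is assumed here to be the TypedDataSet element.
XmlNode item 
    = node.SelectSingleNode("def:Items[2]/def:Description",
                            nsmgr);
description = item.InnerText;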

One of the key things to note about our client code is that the call to the Web service returns an XmlNode instead of the previous typed DataSet. Visual Studio .NET determines these types when it reads the WSDL for the Web service during the Add Web Reference process. The WSDL for the method we created indicates that the data being returned from this method is of XML type any. The .NET Framework exposes an XML element of type any as an XmlNode, which is what the return type becomes for the method call on our proxy.
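The default-namespace wrinkle in the XPath query is easy to get wrong, so here is a self-contained sketch that can be run without the Web service. It loads a two-row copy of the response payload from a string and compares a prefixed query with an unprefixed one. The class and method names (XPathNamespaceSketch, LoadSample, DescriptionOfSecondItem) are hypothetical.

```csharp
using System;
using System.Xml;

public static class XPathNamespaceSketch
{
    public const string Ns = "http://msdn.microsoft.com/samples/TypedDataSet.xsd";

    // Stand-in for the XmlNode that the proxy call would return.
    public static XmlElement LoadSample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<TypedDataSet xmlns=\"" + Ns + "\">" +
            "<Items><ItemNumber>1</ItemNumber>" +
            "<Description>Pink Erasers</Description><Price>1.75</Price></Items>" +
            "<Items><ItemNumber>2</ItemNumber>" +
            "<Description>#2 Pencil</Description><Price>32.75</Price></Items>" +
            "</TypedDataSet>");
        return doc.DocumentElement;
    }

    // Returns the second item's description, or null if nothing matches.
    public static string DescriptionOfSecondItem(XmlNode root)
    {
        XmlNamespaceManager nsmgr =
            new XmlNamespaceManager(root.OwnerDocument.NameTable);
        nsmgr.AddNamespace("def", Ns);

        // Every element inherits the default namespace, so every step
        // in the path carries the prefix.
        XmlNode item = root.SelectSingleNode("def:Items[2]/def:Description", nsmgr);
        return item == null ? null : item.InnerText;
    }

    public static void Main()
    {
        XmlElement root = LoadSample();
        Console.WriteLine(DescriptionOfSecondItem(root));

        // In XPath 1.0 an unprefixed name means "no namespace", so this
        // query finds nothing even though the XML itself uses no prefixes.
        XmlNode miss = root.SelectSingleNode("Items[2]/Description");
        Console.WriteLine(miss == null ? "no match" : miss.InnerText);
    }
}
```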

XmlNodes are pretty powerful things, and by exposing our returned data as XML type any, we open up our Web service to a plethora of platforms that will be able to load the data into their object models for XML infosets and happily use the various APIs that are available in the world of XML. However, we still have a problem.

By using XML type any in the definition of our interface, we lose the fact that the data is not free-form. The data has a very fixed format that is easily defined in XML Schema. If we want to provide as much information about our interface as possible and make programming for the consumers of our Web service as easy as possible, we should include an explicit schema definition of our data in the WSDL that defines our interface. For instance, if we define the schema, the Add Web Reference option in Visual Studio .NET will create a class to deserialize the data into.

We like the on-the-wire format of the data, we just want to describe the format better in the WSDL for our Web service. The format itself can be described with the following simple type definition.

<s:complexType name="ArrayOfItems">
  <s:sequence>
    <s:element minOccurs="0" maxOccurs="unbounded" 
        name="Items" nillable="true" type="s0:Items" />
  </s:sequence>
</s:complexType>
<s:complexType name="Items">
  <s:sequence>
    <s:element minOccurs="1" maxOccurs="1" 
        name="ItemNumber" type="s:int" />
    <s:element minOccurs="0" maxOccurs="1" 
        name="Description" type="s:string" />
    <s:element minOccurs="1" maxOccurs="1" 
        name="Price" type="s:decimal" />
  </s:sequence>
</s:complexType>

We can use C# and the .NET Framework to create a class that matches this schema. A simple class serialized into XML that matches the Items type above looks like this:

// Must be public: XmlSerializer only works with public types.
public class Items
{
    public int ItemNumber;
    public string Description;
    public Decimal Price;
}

By declaring an array of Items, we generate XML that matches the ArrayOfItems type in the schema definition.
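To check that an array of Items really does line up with the wire format, the sketch below round-trips an Items array through XmlSerializer outside of any Web service. The root element name and namespace are supplied by hand to mimic the TypedDataSet element from the earlier response; inside a WebMethod, ASP.NET takes care of that wrapping. The helper names (ItemsArraySketch, ToXml, FromXml) are my own.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Items
{
    public int ItemNumber;
    public string Description;
    public decimal Price;
}

public static class ItemsArraySketch
{
    public const string Ns = "http://msdn.microsoft.com/samples/TypedDataSet.xsd";

    static XmlSerializer CreateSerializer()
    {
        // Name the root element after the DataSet and put it in its namespace.
        XmlRootAttribute root = new XmlRootAttribute("TypedDataSet");
        root.Namespace = Ns;
        // The last argument makes Ns the default namespace for child elements.
        return new XmlSerializer(typeof(Items[]), null, Type.EmptyTypes, root, Ns);
    }

    public static string ToXml(Items[] items)
    {
        StringWriter writer = new StringWriter();
        CreateSerializer().Serialize(writer, items);
        return writer.ToString();
    }

    public static Items[] FromXml(string xml)
    {
        return (Items[])CreateSerializer().Deserialize(new StringReader(xml));
    }

    public static void Main()
    {
        Items first = new Items();
        first.ItemNumber = 1;
        first.Description = "Pink Erasers";
        first.Price = 1.75m;

        Items second = new Items();
        second.ItemNumber = 2;
        second.Description = "#2 Pencil";
        second.Price = 32.75m;

        string xml = ToXml(new Items[] { first, second });
        Console.WriteLine(xml);

        // Reading the XML back yields the same strongly typed array.
        Items[] back = FromXml(xml);
        Console.WriteLine(back[1].Description);
    }
}
```

When XmlSerializer describes an array of Items in a schema, it names the type ArrayOfItems by default, which is where the name in the type definition above comes from.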

Now we have several options to use to modify our original Web method to get the wire format we like, and our WSDL represents the format of the data completely. One option is to change our Web method to return an array of Items instead of the XmlDataDocument that it returned previously. The problem with this approach is that we must copy the data out of the typed DataSet into the instances of the Items class. This seems inefficient since we already like the format of our response, we just want it described differently.

Another option is to simply turn off the automatic WSDL generation that Microsoft® ASP.NET creates for us and manually create our own WSDL. This may seem like a pretty good option but it also has its shortcomings, including the fact that making any changes to the interface requires you to manually change the WSDL as well.

A third option is to use XML serialization attributes to manipulate how the .NET Framework generates the schema for the data we are returning and change the WSDL accordingly. This gives us the benefit of automatic WSDL generation with strongly defined data types without having to manually copy the data out of the DataSet. We must still define the Items class in our code, but we do not have to copy the data into it or even use it. The modified WebMethod using this approach is shown below.

[WebMethod()]
[return: XmlElement(typeof(Items[]))]
public XmlDataDocument GetTypedXmlDataDocument()
{
    sqlDataAdapter1.Fill(typedDataSet1);
    XmlDataDocument dataDoc 
        = new XmlDataDocument(typedDataSet1);
    return dataDoc;
}

The only difference between this code and the code used previously for returning an XmlDataDocument is the addition of the XmlElement attribute. The syntax in this example specifically uses XmlElement to address the return value of our WebMethod. It tells the XmlSerializer used to create the SOAP message that the return element should be treated as an Items array instead of the raw XML that it would normally use. The WSDL definition for the return type of this method will now reflect the schema that we defined earlier.

The .NET Framework client code for calling this Web method will now be able to treat the results as an array of Items objects. The code is shown below.

localhost.Service1 proxy = new localhost.Service1();
localhost.Items[] items = proxy.GetTypedXmlDataDocument();
string description;
description = items[1].Description;

The Add Web Reference option of Visual Studio has created a class that now includes the definition of an Items class. This makes accessing a particular field of a particular record easy, as shown in the last line of the code above. Of course, if you only provide this approach, then you lose the ability to access the raw XML that was shown in the previous option.

Silencing the Debate

Now you see how easy it is to convert a DataSet into something that can be consumed on all platforms in whatever way makes the most sense for the consumer. A DataSet can be consumed as a DataSet, as easily understood XML, or even as an array of well-defined structures. You simply need to expose your WebMethods with the easy variations mentioned here. The XML and array options are easily digested from numerous other platforms, thus ensuring interoperability, while the DataSet option gives additional power to the .NET Framework consumers on your network.

But...I still have a dream that someday, support for DiffGrams will indeed span many platforms and DataSet-type functionality will be shared by all.