Thursday, 9 April 2009

Using ShouldSerialize for conditional omission of properties in the XmlSerializer

Wow

I've just found out about the largely undocumented ShouldSerialize technique for XmlSerializers in .NET. It's possibly the most useful System.Xml discovery I've made in years. Basically, while it's not very OO (but when is codegen particularly OO?), it gives you programmatic control over whether a property is written to the XML result stream by the generated XmlSerializer for that type.




Here's a link to the only MSDN article I've seen on the subject: http://msdn.microsoft.com/en-us/library/53b8022e(VS.71).aspx

Why

You have to comply with an existing XML consumer (probably bespoke and third party) that isn't tolerant of fully schema-bound XML, and you want to use the built-in XmlSerializer in .NET. And why wouldn't you? It's extremely powerful and configurable, and there's a huge sense of satisfaction to be gained from getting it working with some esoteric consumer. Done well, your code is clean, readable and functional, and far better than messing around with StringBuilders and XmlWriters.




Now I've gotten quite good at manipulating the XmlSerializer over the years, so it tends to be my first port of call when I need to generate some XML. Who wouldn't want to replace gobs and gobs of string concatenation code with a three-line call to a serializer? It's clear what you're doing and far easier to maintain. The point is that XML is structured data; it isn't just text merely because it can be represented that way.

The Scenario

I'm trying to automatically serialize some CAML to send to SharePoint. Specifically, a Query which looks something like this:

<Query xmlns="http://schemas.microsoft.com/sharepoint/soap/">
  <OrderBy>
    <FieldRef Name="Title" />
  </OrderBy>
</Query>

(1) Required XML from the serializer

I have a Query class which contains a List<FieldRef> called OrderBy. So far so good. I also added a subclass of List<FieldRef> called GroupBy, which adds the extra property Collapse that the schema requires. Now consider how I get the XML above from the serializer. My API looks something like this:


Query query = new Query();
query.OrderBy.Add(new FieldRef("Title"));

(2) API to build the XML in (1) above.

I default the GroupBy and OrderBy properties of the Query to new instances, but since I haven't added anything to GroupBy, when I run this I get the following XML back:

<Query xmlns="http://schemas.microsoft.com/sharepoint/soap/">
  <OrderBy>
    <FieldRef Name="Title" />
  </OrderBy>
  <GroupBy />
</Query>

(3) XML generated by the serializer under normal circumstances.


See the extra GroupBy? That's no good. So how do we get rid of this empty element? OK, I could annotate the GroupBy property with [DefaultValue(null)] and have the GroupBy property instantiate lazily. That's all well and good, and it would work... until we remove the last FieldRef from the list, leaving it empty. Then it's the same problem: the list is empty but not null, so it serializes and we get the XML in (3) above.




The problem is that DefaultValue doesn't allow conditional evaluation. What we need is something that behaves like the DefaultValueAttribute, telling the serializer to skip the property when it has a given value, but that also allows conditional evaluation. Of course we could perform some weird hacks, such as checking for an empty list in the property getter and returning null. That would keep the serializer happy, but it would break the API, forcing client code to create the list explicitly, and that's easily forgotten and a likely source of errors.



Enter ShouldSerialize. It turns out that creating a public method called ShouldSerialize[PropertyName] that returns bool in the serializable class tells the generated XmlSerializer to call that method to determine whether it should serialize the property. So I create a method called ShouldSerializeGroupBy in my Query class which returns false when the list is null or empty, and BAM! Tests pass, and I am happy. My Query class now looks like this:


using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Xml.Serialization;

[
    XmlRoot("Query", Namespace = Namespaces.SharePointSoap),
    XmlType("Query", Namespace = Namespaces.SharePointSoap),
    Serializable
]
public class Query
{
    private List<FieldRef> orderBy = new List<FieldRef>();
    private GroupBy groupBy = new GroupBy();

    [XmlArray("OrderBy")]
    public List<FieldRef> OrderBy
    {
        get { return orderBy; }
        set { orderBy = value; }
    }

    // Called by the generated serializer: GroupBy is skipped when this returns false.
    public bool ShouldSerializeGroupBy()
    {
        return groupBy != null && groupBy.Count > 0;
    }

    [XmlArray("GroupBy"), DefaultValue(null)]
    public GroupBy GroupBy
    {
        get { return groupBy; }
        set { groupBy = value; }
    }
}
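
To try the behaviour end to end, here is a minimal self-contained sketch. It makes some simplifying assumptions that are not from my real code: FieldRef is reduced to a single Name attribute, GroupBy is a plain List<FieldRef> rather than the subclass with Collapse, the namespace is inlined as a string instead of the Namespaces.SharePointSoap constant, and QueryXml.Serialize is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Simplified stand-in for FieldRef (assumption: only a Name attribute).
public class FieldRef
{
    public FieldRef() { }                          // XmlSerializer requires a parameterless constructor
    public FieldRef(string name) { Name = name; }

    [XmlAttribute("Name")]
    public string Name { get; set; }
}

[XmlRoot("Query", Namespace = "http://schemas.microsoft.com/sharepoint/soap/")]
[XmlType("Query", Namespace = "http://schemas.microsoft.com/sharepoint/soap/")]
public class Query
{
    private List<FieldRef> orderBy = new List<FieldRef>();
    private List<FieldRef> groupBy = new List<FieldRef>();   // assumption: plain list, no Collapse

    [XmlArray("OrderBy")]
    public List<FieldRef> OrderBy
    {
        get { return orderBy; }
        set { orderBy = value; }
    }

    [XmlArray("GroupBy")]
    public List<FieldRef> GroupBy
    {
        get { return groupBy; }
        set { groupBy = value; }
    }

    // Found by naming convention: "ShouldSerialize" + property name, public, returns bool.
    public bool ShouldSerializeGroupBy()
    {
        return groupBy != null && groupBy.Count > 0;
    }
}

// Hypothetical helper that wraps the usual three-line serializer call.
public static class QueryXml
{
    public static string Serialize(Query query)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Query));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, query);
            return writer.ToString();
        }
    }
}
```

Calling QueryXml.Serialize on a Query with only an OrderBy entry should produce no GroupBy element at all, while adding a FieldRef to GroupBy brings the element back. Removing that FieldRef again makes it disappear, which is exactly the case DefaultValue couldn't handle.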


Hope this helps you. It's saved me a lot of trouble. TTFN :)

4 comments:

Gita said...
This comment has been removed by the author.
dc_united said...

Why is it that this 'ShouldSerialize' must be a public method within the class? Would it not make more sense for this to be private?

Leon Jollans said...

Well, I didn't write the XmlSerializer so you'd have to ask Microsoft, but why shouldn't it be public? Custom clients get to check the condition for themselves without reflection. I mean OK, reflection's necessary for the initial serializer generation, but not for subsequent usage. If the serializer had to reflect every time it serialized an object, it could hurt performance at serious volume.

Additionally, client code of classes shouldn't depend on encapsulated private functionality - it's not good OO to rely on such dependencies. Private members are by definition implementation details, and not necessarily reliable for client code either to act the same way, or even to exist.

Now if you feel that adding this public method pollutes your class, I imagine it's possible to keep it out of the way by exposing the clean surface of that class through a separate interface. In some ways it's probably better to do that anyway, since working to interfaces gives you the ability to substitute one class for another at runtime.

So imagine your serializable class is.. Let's say "Account". It has ShouldSerializeBalance as a public member for the serializer's benefit. It could also implement an interface IAccount, which could declare only the important details.. ShouldSerializeXXX wouldn't be in there. Then all your client code using IAccount references would never know about the ShouldSerializeBalance method, nor would intellisense, and you could swap it for some test class in unit tests (or swap it for IAccount implementations from external sources) without any trouble, or confusion.

Leon Jollans said...

Also don't forget that the XmlSerializer generates a class to perform the work against your business object. It may (unless you specify) exist in another namespace, and certainly in a different assembly. The generated serializer is a client of your business object; you're not augmenting the business object with the ability to serialize itself. I mean sure, go ahead and implement IXmlSerializable on your business object and write shed loads of System.Xml code (or LINQ to XML these days), and you could always do that. In that case it would be proper OO to leave the implementation details private, and implement the IXmlSerializable methods explicitly to keep the public API clean.

But that's not the point of this article or the feature. ShouldSerializeXXX *simplifies* serialization, just as Xml attributes do. And the less code you have to write to achieve a goal, the less that's going to go wrong in production.