Best Practice for Data Containers
So, what is the best practice? Let's take a step back and think about the time before .NET.
Yesterday
Those of us who were building n-tier systems with VB6 didn't have many choices for data containers. The main problem was that we couldn't write classes that were marshaled by value. All classes written in VB6 were marshaled by reference. Therefore, if we used custom object-oriented formats for the "carrier" functionality, we were asking for bad performance. Instead, we had to use, for example, tagged strings, arrays of simple values, or disconnected ADO Recordsets. (Disconnected ADO Recordsets used custom marshaling and, therefore, were marshaled by value.)
NOTE
VB programmers weren't the only ones suffering from the problem of marshaling by reference in COM. C++ programmers suffered, too. It wasn't exactly trivial to write COM components that used custom marshaling.
A problem arose with the representation of custom object-oriented formats as well, because all classes written with VB became COM components. For some of our classes, that was just great; for other classes, though, we didn't need that. As a result, we paid with high overhead for each instance of those classes. It was even worse if we made the mistake of configuring the "wrong" classes in COM+ without also setting them to "activate in the context of the caller." Then the overhead of each instance was as much as 3KB. For that reason, arrays, XML DOMs, and disconnected ADO Recordsets were often used for the representation, too.
As I said, ADO Recordsets weren't the only option, but for most situations, it was considered a best practice to use them, especially for carrier functionality.
Today and Tomorrow
Okay, it wasn't a single choice "yesterday," but in a way it was easy to choose data containers because there weren't too many options. With .NET, the situation has changed. I don't think we can say that there is a clear best practice yet, but, as I've said already, Microsoft most often recommends ADO.NET DataSets as the way to go. I have changed my mind and now prefer to use custom classes and collections, that is, a custom object-oriented format. Who is right, Microsoft or me?
Before you answer that question and call me crazy, let me make it clear that this question is not as simple as it first sounds. There is never a single answer to all questions, and that goes for data containers, too. Different situations require different solutions. I have talked to Microsoft about this because of their pretty clear recommendations, and they've said that they are actually pretty fond of a custom object-oriented format. In their view, the problem is that it's much more complicated and more risky to go my route. After all, DataSets come with a lot of built-in functionality for free. A lot of the literature (for example, Fowler's book) supports Microsoft's view that building a custom object-oriented format requires work and is risky.
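To make the term "custom object-oriented format" a bit more concrete, here is a minimal sketch of what such a container might look like. It is written in Java purely for illustration; a .NET version in C# or Visual Basic .NET would have much the same shape. The names Customer, CustomerCollection, and findById are hypothetical and not taken from any particular framework. Implementing Serializable is assumed here as the rough counterpart of marking a class so that it can be marshaled by value.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical entity class: a plain, serializable object that can carry
// data between tiers by value instead of by reference.
final class Customer implements Serializable {
    private static final long serialVersionUID = 1L;
    private final int id;
    private final String name;

    Customer(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }
}

// Hypothetical strongly typed collection of Customer objects.
final class CustomerCollection implements Serializable {
    private static final long serialVersionUID = 1L;
    private final ArrayList<Customer> items = new ArrayList<>();

    void add(Customer customer) { items.add(customer); }

    int size() { return items.size(); }

    // Returns the first customer with the given id, or null if none matches.
    Customer findById(int id) {
        for (Customer c : items) {
            if (c.getId() == id) {
                return c;
            }
        }
        return null;
    }

    // Exposes the contents without letting callers modify the collection.
    List<Customer> asReadOnlyList() {
        return Collections.unmodifiableList(items);
    }
}
```

Even this small sketch hints at the trade-off: everything a DataSet gives you for free (finding rows, tracking changes, serializing) has to be written and maintained by hand, but in return the container says exactly what it holds.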
The choice of which data container to use isn't obvious. Another question to ask is, does it matter at all? As you probably have guessed by now, I think it's very important; I would go so far as to say that the choice is crucial. You and your application will live with the choice for a long time, and the implications are enormous. This decision will affect several qualities:
Performance. How much time does it take from request until complete response?
Scalability. How is the performance affected when the size of the problem increases? For example, what happens when more users are added or larger amounts of data must be transferred?
Productivity. How much effort is required to write the initial version of an application?
Maintainability. How much effort is required to extend and change the application?
Interoperability. How easily can heterogeneous clients work with the data?
As you probably know, you can't have all of these qualities at 100%; they somewhat compete with each other, so to speak.
Which Is the Most Important Quality?
Which qualities are most important depends on the situation. Assuming that you have to choose the one that is most important for most situations, which one would you choose?
I would probably choose maintainability. By focusing on that, you might get a little less productivity when you write the initial version, but you can quite easily get performance, scalability, and interoperability if you later find out that you don't completely fulfill those qualities. The cost of system ownership increasingly depends on maintainability. My guess is that this won't change in the near future.
I use these qualities when comparing the different options discussed next.