Hierarchical
Complex relationships can be expressed in a relational database, but the results of a SQL query can take only one shape: a rectangular grid. LINQ has no such restrictions. Built into its very foundation is the idea that data is hierarchical (see Figure 3.2). If you want to, you can write LINQ queries that return flat, SQL-like datasets, but this is an option, not a necessity.
Figure 3.2 Both object-oriented languages and the developers who use them have a natural tendency to think in terms of hierarchies. SQL data is arranged in a simple grid.
Consider a simple relational database that has tables called Customers, Orders, and OrderDetails. It is possible to capture the relationship between these tables in a SQL database, but you cannot directly depict the relationship in the results of a single query. Instead, you are forced to show the result as a join that binds the tables into a single array of columns and rows.
LINQ, on the other hand, can return a set of Customer objects, each of which owns a set of 0-to-n Orders. Each Order can be associated with a set of OrderDetails. This is a classic hierarchical relationship that can be perfectly expressed with a set of objects:
Customer
    |
    Orders
        |
        OrderDetails
Consider the following simple hierarchical query that captures the relationship between two objects:
var query = from c in db.Customers
            select new
            {
                City = c.City,
                orders = from o in c.Orders
                         select new { o.OrderID }
            };
This query asks for the city in which each customer lives and a list of the orders that customer has placed. Rather than returning a rectangular dataset as a SQL query would, this query returns hierarchical data that lists the city associated with each customer and the ID associated with each order:
City=Helsinki   orders=...
  orders: OrderID=10615
  orders: OrderID=10673
  orders: OrderID=10695
  orders: OrderID=10873
  orders: OrderID=10879
  orders: OrderID=10910
  orders: OrderID=11005
City=Warszawa   orders=...
  orders: OrderID=10374
  orders: OrderID=10611
  orders: OrderID=10792
  orders: OrderID=10870
  orders: OrderID=10906
  orders: OrderID=10998
This result set is multidimensional, nesting one set of columns and rows inside another set of columns and rows.
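To see the nesting for yourself without a database, you can run the same query shape against in-memory objects. The following is only a sketch using LINQ to Objects: the Customer and Order classes here are hypothetical stand-ins for the generated classes, and the sample data borrows a few order IDs from the output above. It also includes a flat, SQL-like query for comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the generated entity classes.
public class Order
{
    public int OrderID { get; set; }
}

public class Customer
{
    public string City { get; set; }
    public List<Order> Orders { get; set; }
}

public static class HierarchyDemo
{
    // A small sample, not real query results.
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer
            {
                City = "Helsinki",
                Orders = new List<Order>
                {
                    new Order { OrderID = 10615 },
                    new Order { OrderID = 10673 }
                }
            },
            new Customer
            {
                City = "Warszawa",
                Orders = new List<Order>
                {
                    new Order { OrderID = 10374 }
                }
            }
        };
    }

    public static void Main()
    {
        List<Customer> customers = SampleCustomers();

        // Same shape as the query in the text: one result per customer,
        // each carrying its own nested sequence of orders.
        var query = from c in customers
                    select new
                    {
                        City = c.City,
                        orders = from o in c.Orders
                                 select new { o.OrderID }
                    };

        // The outer loop walks customers; the inner loop walks that customer's orders.
        foreach (var customer in query)
        {
            Console.WriteLine("City={0}", customer.City);
            foreach (var order in customer.orders)
            {
                Console.WriteLine("  OrderID={0}", order.OrderID);
            }
        }

        // For contrast, a flat result: one row per (city, order) pair.
        var flat = from c in customers
                   from o in c.Orders
                   select new { c.City, o.OrderID };

        foreach (var row in flat)
        {
            Console.WriteLine("{0}\t{1}", row.City, row.OrderID);
        }
    }
}
```

The nested foreach mirrors the nesting of the result. The flat version repeats the city on every row, just as a SQL join would.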
Look again at the query, and notice how we gain access to the Orders table:
orders = from o in c.Orders
The identifier c is an instance of a Customer object. As you will learn later in the book, LINQ to SQL has tools for automatically generating Customer objects given the presence of the Customer table in the database. Here you can see that the Customer object is not flat; instead, it contains a set of nested Order objects.
Listing 3.1 shows a simplified version of the Customer class that the LINQ to SQL designer generates automatically. Notice how the class wraps the fields of the Customer table.
Listing 3.1. A Simplified Version of the Customer Object That the LINQ to SQL Designer Generates Automatically
public partial class Customer
{
    ... // Code omitted here
    private string _CustomerID;
    private string _CompanyName;
    private string _ContactName;
    private string _ContactTitle;
    private string _Address;
    private string _City;
    private string _Region;
    private string _PostalCode;
    private string _Country;
    private string _Phone;
    private string _Fax;
    private EntitySet<Order> _Orders;
    ... // Code omitted here
}
The first 11 private fields of the Customer object simply mirror the fields of the Customer table in the database. Taken together, they provide a location to store the data from a single row of the Customer table. Notice, however, the last item, which is a collection of Order objects. Because the Customer table is bound to the Orders table in a one-to-many relationship, each customer has 0-to-n orders associated with it, and LINQ to SQL stores those orders in this field. This automatically gives you a hierarchical view of your data.
The Order class follows the same pattern, but it sits at the other end of the relationship. Where a customer holds a set of orders, each order refers back to exactly one customer:
public partial class Order
{
    ... // Code omitted here
    private int _OrderID;
    private string _CustomerID;
    private System.Nullable<int> _EmployeeID;
    private System.Nullable<System.DateTime> _OrderDate;
    private System.Nullable<System.DateTime> _RequiredDate;
    private System.Nullable<System.DateTime> _ShippedDate;
    private System.Nullable<int> _ShipVia;
    private System.Nullable<decimal> _Freight;
    private string _ShipName;
    private string _ShipAddress;
    private string _ShipCity;
    private string _ShipRegion;
    private string _ShipPostalCode;
    private string _ShipCountry;
    private EntityRef<Customer> _Customer;
    ... // Code omitted here
}
Again we see all the fields of the Orders table, their types, and whether they can be set to null. The difference here is that the last field points back to the Customer table not with an EntitySet<T>, but with an EntityRef<T>. This is not the proper place to delve into the EntitySet and EntityRef classes. However, the names suggest their roles: an EntitySet refers to a set of objects, and an EntityRef references a single object. Thus, an EntitySet captures the "many" end of a one-to-many relationship, and an EntityRef captures the end that points to a single object.
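The two ends of the relationship can be sketched with ordinary .NET types. The following is not the generated code, and it does not use the real EntitySet or EntityRef classes: a List<Order> stands in for EntitySet<Order>, a plain Customer reference stands in for EntityRef<Customer>, and the AddOrder helper, the BuildSample method, and the "DEMO1" ID are hypothetical names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public string CustomerID { get; set; }

    // Stand-in for EntitySet<Order>: the "many" end of the relationship.
    public List<Order> Orders { get; private set; }

    public Customer()
    {
        Orders = new List<Order>();
    }

    // Hypothetical helper that keeps both ends of the association in sync.
    public void AddOrder(Order order)
    {
        order.Customer = this;
        Orders.Add(order);
    }
}

public class Order
{
    public int OrderID { get; set; }

    // Stand-in for EntityRef<Customer>: a reference to a single object.
    public Customer Customer { get; set; }
}

public static class AssociationDemo
{
    public static Customer BuildSample()
    {
        var customer = new Customer { CustomerID = "DEMO1" };
        customer.AddOrder(new Order { OrderID = 10615 });
        customer.AddOrder(new Order { OrderID = 10673 });
        return customer;
    }

    public static void Main()
    {
        Customer customer = BuildSample();

        // Walk down the hierarchy: from one customer to its many orders.
        foreach (Order order in customer.Orders)
        {
            // Walk back up: from each order to its single owning customer.
            Console.WriteLine("Order {0} belongs to {1}",
                order.OrderID, order.Customer.CustomerID);
        }
    }
}
```

Navigating in either direction is just a property access, which is what makes queries such as c.Orders possible.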
The point to take away from this discussion is that LINQ to SQL captures not a flat view of your data, but a hierarchical view. A Customer class is connected to a set of orders in a clearly defined hierarchical relationship, and each order is related to the customer who owns it.
In a simple case like this, such a hierarchical relationship has obvious utility, but it is possible to imagine getting along without it. More complex queries, however, are greatly simplified by this architecture. Consider the following LINQ to SQL query:
var query = from c in db.Customers
            where c.CompanyName == companyName
            from o in c.Orders
            from x in o.Order_Details
            where x.Product.Category.CategoryName == "Confections"
            orderby x.Product.ProductName
            group x by x.Product.ProductName into g
            orderby g.Count()
            select new { Count = g.Count(), Product = g.Key };
Here we use LINQ’s hierarchical structure to move from the Customers table to the Orders table to the Order_Details table without breaking a sweat:
var query = from c in db.Customers
            from o in c.Orders
            from x in o.Order_Details
The next line really helps show the power of LINQ hierarchies:
where x.Product.Category.CategoryName == "Confections"
The identifier x represents an instance of a class containing the data from a row of the Order_Details table. Order_Details has a relationship with the Product table, which has a relationship with the Category table, which has a field called CategoryName. We can slice right through that complex relationship by simply writing this:
x.Product.Category.CategoryName
LINQ’s hierarchical structure shines a clarifying light on the relational data in your programs. Even complex relational models become intuitive and easy to manipulate.
We can then order and group the results of our query with a few simple LINQ operators:
orderby x.Product.ProductName
group x by x.Product.ProductName into g
orderby g.Count()
Trying to write the equivalent code using a more conventional C# style of programming is an exercise that might take two or three pages of convoluted code and involve a number of nested loops and if statements. Even writing the same query in standard SQL would be a challenge for many developers. Here we perform the whole operation in nine easy-to-read lines of code.
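To get a feel for the loop-based alternative, here is a rough sketch that computes the same counts with nested loops. It works only against in-memory objects and leaves out all the data access code that a database-backed version would also need. The classes are hypothetical stand-ins for the generated ones, the sample data is made up, and breaking ties in the count by product name is an assumption about how the results should be ordered.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-ins for the generated entity classes.
public class Category { public string CategoryName { get; set; } }

public class Product
{
    public string ProductName { get; set; }
    public Category Category { get; set; }
}

public class OrderDetail { public Product Product { get; set; } }

public class Order
{
    public List<OrderDetail> Order_Details = new List<OrderDetail>();
}

public class Customer
{
    public string CompanyName { get; set; }
    public List<Order> Orders = new List<Order>();
}

public static class LoopVersion
{
    // Counts how often each product in the given category appears in the
    // named customer's order details, sorted by ascending count.
    public static List<KeyValuePair<string, int>> CountProducts(
        IEnumerable<Customer> customers, string companyName, string categoryName)
    {
        var counts = new Dictionary<string, int>();
        foreach (Customer c in customers)
        {
            if (c.CompanyName != companyName) continue;
            foreach (Order o in c.Orders)
            {
                foreach (OrderDetail x in o.Order_Details)
                {
                    if (x.Product.Category.CategoryName != categoryName) continue;
                    int n;
                    counts.TryGetValue(x.Product.ProductName, out n); // n is 0 if absent
                    counts[x.Product.ProductName] = n + 1;
                }
            }
        }

        var result = new List<KeyValuePair<string, int>>(counts);
        result.Sort((a, b) =>
        {
            int byCount = a.Value.CompareTo(b.Value);
            // Assumption: ties in the count are broken by product name.
            return byCount != 0
                ? byCount
                : string.Compare(a.Key, b.Key, StringComparison.Ordinal);
        });
        return result;
    }

    // Made-up sample data, for illustration only.
    public static List<Customer> SampleData()
    {
        var confections = new Category { CategoryName = "Confections" };
        var beverages = new Category { CategoryName = "Beverages" };
        var pavlova = new Product { ProductName = "Pavlova", Category = confections };
        var chocolade = new Product { ProductName = "Chocolade", Category = confections };
        var chai = new Product { ProductName = "Chai", Category = beverages };

        var first = new Order();
        first.Order_Details.Add(new OrderDetail { Product = pavlova });
        first.Order_Details.Add(new OrderDetail { Product = chocolade });
        first.Order_Details.Add(new OrderDetail { Product = chai });

        var second = new Order();
        second.Order_Details.Add(new OrderDetail { Product = pavlova });

        var customer = new Customer { CompanyName = "Demo Co" };
        customer.Orders.Add(first);
        customer.Orders.Add(second);
        return new List<Customer> { customer };
    }

    public static void Main()
    {
        foreach (var pair in CountProducts(SampleData(), "Demo Co", "Confections"))
        {
            Console.WriteLine("Count={0} Product={1}", pair.Value, pair.Key);
        }
    }
}
```

Even in this stripped-down form, the filtering, grouping, and sorting logic is spread across three loops, a dictionary, and a custom comparison, where the query expresses each step in a single clause.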
In this section, I have introduced you to the power of LINQ’s hierarchical style of programming without delving into the details of how such queries work. Later in this book you will learn how easy it is to compose your own hierarchical queries. For now you only need to understand two simple points:
- There is a big difference between LINQ’s hierarchical structure and the flat, rectangular columns and rows returned by a SQL query.
- Many benefits arise from this more powerful structure. These include the intuitive structure of the data and the ease with which you can write queries against this model.