7.15 Query Expressions
Query expressions provide a language-integrated syntax for queries that is similar to relational and hierarchical query languages such as SQL and XQuery.
query-expression:
   from-clause query-body
from-clause:
   from type_opt identifier in expression
query-body:
   query-body-clauses_opt select-or-group-clause query-continuation_opt
query-body-clauses:
   query-body-clause
   query-body-clauses query-body-clause
query-body-clause:
   from-clause
   let-clause
   where-clause
   join-clause
   join-into-clause
   orderby-clause
let-clause:
   let identifier = expression
where-clause:
   where boolean-expression
join-clause:
   join type_opt identifier in expression on expression equals expression
join-into-clause:
   join type_opt identifier in expression on expression equals expression into identifier
orderby-clause:
   orderby orderings
orderings:
   ordering
   orderings , ordering
ordering:
   expression ordering-direction_opt
ordering-direction:
   ascending
   descending
select-or-group-clause:
   select-clause
   group-clause
select-clause:
   select expression
group-clause:
   group expression by expression
query-continuation:
   into identifier query-body
A query expression begins with a from clause and ends with either a select or group clause. The initial from clause can be followed by zero or more from, let, where, join, or orderby clauses. Each from clause is a generator introducing a range variable, which ranges over the elements of a sequence. Each let clause introduces a range variable representing a value computed by means of previous range variables. Each where clause is a filter that excludes items from the result. Each join clause compares specified keys of the source sequence with keys of another sequence, yielding matching pairs. Each orderby clause reorders items according to specified criteria. The final select or group clause specifies the shape of the result in terms of the range variables. Finally, an into clause can be used to “splice” queries by treating the results of one query as a generator in a subsequent query.
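As an illustration of how the clauses combine, the following sketch uses one clause of each of several kinds over LINQ to Objects. The class name QueryOverview and the sample data are hypothetical and are not part of this specification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class QueryOverview
{
    // Hypothetical sample data; not taken from the specification.
    public static readonly int[] Numbers = { 5, 12, 8, 3, 20, 15 };

    // from:    generator introducing the range variable n
    // let:     range variable computed from earlier range variables
    // where:   filter that excludes items
    // orderby: reorders the remaining items
    // select:  shape of each result item
    public static List<string> Run() =>
        (from n in Numbers
         let square = n * n
         where n > 4
         orderby n descending
         select n + " squared is " + square).ToList();

    static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```

With this data the where clause drops 3, and the remaining five items appear in descending order of n.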
7.15.1 Ambiguities in Query Expressions
Query expressions contain a number of “contextual keywords”—that is, identifiers that have special meaning in a given context. Specifically, these contextual keywords are from, where, join, on, equals, into, let, orderby, ascending, descending, select, group, and by. To avoid ambiguities in query expressions caused by mixed use of these identifiers as keywords and simple names, the identifiers are always considered keywords when they occur anywhere within a query expression.
For this purpose, a query expression is any expression that starts with “from identifier” followed by any token except “;”, “=”, or “,”.
To use these words as identifiers within a query expression, prefix them with “@” (§2.4.2).
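A minimal sketch of the “@” prefix, assuming a local variable that happens to be named select (the class name ContextualKeywords is hypothetical):

```csharp
using System;
using System.Linq;

static class ContextualKeywords
{
    public static int[] Run()
    {
        // Outside a query expression, "select" is an ordinary identifier.
        int[] select = { 1, 2, 3, 4 };

        // Inside the query expression the same variable must be written
        // as @select, because select is treated as a keyword there.
        return (from x in @select
                where x % 2 == 0
                select x).ToArray();
    }

    static void Main() => Console.WriteLine(string.Join(", ", Run()));
}
```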
7.15.2 Query Expression Translation
The C# language does not directly specify the execution semantics of query expressions. Rather, query expressions are translated into invocations of methods that adhere to the query expression pattern (§7.15.3). Specifically, query expressions are translated into invocations of methods named Where, Select, SelectMany, Join, GroupJoin, OrderBy, OrderByDescending, ThenBy, ThenByDescending, GroupBy, and Cast. These methods are expected to have particular signatures and result types, as described in §7.15.3. They can be instance methods of the object being queried or extension methods that are external to the object, and they implement the actual execution of the query.
The translation from query expressions to method invocations is a syntactic mapping that occurs before any type binding or overload resolution has been performed. The translation is guaranteed to be syntactically correct, but it is not guaranteed to produce semantically correct C# code. Following translation of query expressions, the resulting method invocations are processed as regular method invocations. This processing may, in turn, uncover errors—for example, if the methods do not exist, if arguments have wrong types, or if the methods are generic and type inference fails.
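Because the mapping is purely syntactic, a query expression can be applied to any type for which the resulting method invocation binds, even a type that is not a sequence. The following sketch assumes a hypothetical single-value container Box<T> that supplies only a Select method:

```csharp
using System;

// Hypothetical type: holds one value and is not enumerable.
sealed class Box<T>
{
    public T Value { get; }
    public Box(T value) { Value = value; }

    // The only query method this type supplies.
    public Box<U> Select<U>(Func<T, U> selector) =>
        new Box<U>(selector(Value));
}

static class SyntacticMappingDemo
{
    public static int Run()
    {
        // Translated to new Box<int>(20).Select(x => x + 1) and then
        // bound like any other method invocation.
        Box<int> result = from x in new Box<int>(20) select x + 1;
        return result.Value;
    }

    static void Main() => Console.WriteLine(Run());
}
```

Adding a where clause to this query would produce a compile-time error after translation, because Box<T> declares no Where method.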
A query expression is processed by repeatedly applying the following translations until no further reductions are possible. The translations are listed in order of application. Each section assumes that the translations in the preceding sections have been performed exhaustively, and once exhausted, a section will not be revisited later in the processing of the same query expression.
Assignment to range variables is not allowed in query expressions. However, a C# implementation is permitted to not always enforce this restriction, because satisfying this constraint may sometimes not be possible with the syntactic translation scheme presented here.
Certain translations inject range variables with transparent identifiers denoted by *. The special properties of transparent identifiers are discussed further in §7.15.2.7.
7.15.2.1 select and groupby Clauses with Continuations
A query expression with a continuation
from ... into x ...
is translated into
from x in ( from ... ) ...
The translations in the following sections assume that queries have no into continuations.
The example
from c in customers group c by c.Country into g select new { Country = g.Key, CustCount = g.Count() }
is translated into
from g in
   from c in customers
   group c by c.Country
select new { Country = g.Key, CustCount = g.Count() }
Its final translation is
customers.
   GroupBy(c => c.Country).
   Select(g => new { Country = g.Key, CustCount = g.Count() })
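The equivalence of the continuation form and its final translation can be checked over LINQ to Objects. This is a sketch only; the Customer class, the sample data, and the string formatting of the results are assumptions made for the illustration.

```csharp
using System;
using System.Linq;

static class ContinuationDemo
{
    // Hypothetical element type and data.
    sealed class Customer
    {
        public string Name = "";
        public string Country = "";
    }

    static readonly Customer[] customers =
    {
        new Customer { Name = "Ann", Country = "UK" },
        new Customer { Name = "Bob", Country = "US" },
        new Customer { Name = "Cid", Country = "UK" },
    };

    // Query with an into continuation.
    public static string[] QueryForm() =>
        (from c in customers
         group c by c.Country into g
         select g.Key + ":" + g.Count()).ToArray();

    // Hand-written final translation.
    public static string[] MethodForm() =>
        customers
            .GroupBy(c => c.Country)
            .Select(g => g.Key + ":" + g.Count())
            .ToArray();

    static void Main()
    {
        Console.WriteLine(string.Join(", ", QueryForm()));
        Console.WriteLine(string.Join(", ", MethodForm()));
    }
}
```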
7.15.2.2 Explicit Range Variable Types
A from clause that explicitly specifies a range variable type
from T x in e
is translated into
from x in ( e ) . Cast < T > ( )
A join clause that explicitly specifies a range variable type
join T x in e on k1 equals k2
is translated into
join x in ( e ) . Cast < T > ( ) on k1 equals k2
The translations in the following sections assume that queries have no explicit range variable types.
The example
from Customer c in customers where c.City == "London" select c
is translated into
from c in customers.Cast<Customer>() where c.City == "London" select c
The final translation is
customers.
   Cast<Customer>().
   Where(c => c.City == "London")
Explicit range variable types are useful for querying collections that implement the non-generic IEnumerable interface, but not the generic IEnumerable<T> interface. In the preceding example, this would be the case if customers were of type ArrayList.
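The ArrayList case can be sketched as follows. The Customer class and its data are hypothetical; Cast<T> here is the System.Linq extension method on the non-generic IEnumerable interface.

```csharp
using System;
using System.Collections;
using System.Linq;

static class ExplicitTypeDemo
{
    // Hypothetical element type, used only for this illustration.
    sealed class Customer
    {
        public string Name = "";
        public string City = "";
    }

    public static string[] Run()
    {
        // ArrayList implements only the non-generic IEnumerable.
        ArrayList customers = new ArrayList
        {
            new Customer { Name = "Ann", City = "London" },
            new Customer { Name = "Bob", City = "Paris" },
        };

        // "from Customer c in customers" becomes
        // "from c in customers.Cast<Customer>()".
        return (from Customer c in customers
                where c.City == "London"
                select c.Name).ToArray();
    }

    static void Main() => Console.WriteLine(string.Join(", ", Run()));
}
```

Without the explicit type, the range variable c would have no usable element type here, and the member access c.City would fail to bind.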
7.15.2.3 Degenerate Query Expressions
A query expression of the form
from x in e select x
is translated into
( e ) . Select ( x => x )
The example
from c in customers select c
is translated into
customers.Select(c => c)
A degenerate query expression is one that trivially selects the elements of the source. A later phase of the translation removes degenerate queries introduced by other translation steps by replacing those queries with their source. In this situation, it is important to ensure that the result of a query expression is never the source object itself, as that would reveal the type and identity of the source to the client of the query. As a consequence, this step protects degenerate queries written directly in source code by explicitly calling Select on the source. It is then up to the implementers of Select and other query operators to ensure that these methods never return the source object itself.
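The effect of the explicit Select call can be observed with LINQ to Objects, whose Select method returns a new iterator object rather than the source. The class name DegenerateQueryDemo is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DegenerateQueryDemo
{
    static readonly List<int> source = new List<int> { 1, 2, 3 };

    // Translated to source.Select(x => x).
    static IEnumerable<int> Query() => from x in source select x;

    // True only if the query handed back the source list itself.
    public static bool ResultIsSource() =>
        object.ReferenceEquals(Query(), source);

    // The elements themselves are passed through unchanged.
    public static bool SameElements() => Query().SequenceEqual(source);

    static void Main()
    {
        Console.WriteLine(ResultIsSource());
        Console.WriteLine(SameElements());
    }
}
```

A client of the query therefore cannot cast the result back to List<int> and modify the underlying collection.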
7.15.2.4 from, let, where, join, and orderby Clauses
A query expression with a second from clause followed by a select clause
from x1 in e1 from x2 in e2 select v
is translated into
( e1 ) . SelectMany( x1 => e2 , ( x1 , x2 ) => v )
A query expression with a second from clause followed by something other than a select clause
from x1 in e1 from x2 in e2 ...
is translated into
from * in ( e1 ) . SelectMany( x1 => e2 , ( x1 , x2 ) => new { x1 , x2 } ) ...
A query expression with a let clause
from x in e let y = f ...
is translated into
from * in ( e ) . Select ( x => new { x , y = f } ) ...
A query expression with a where clause
from x in e where f ...
is translated into
from x in ( e ) . Where ( x => f ) ...
A query expression with a join clause without an into followed by a select clause
from x1 in e1 join x2 in e2 on k1 equals k2 select v
is translated into
( e1 ) . Join( e2 , x1 => k1 , x2 => k2 , ( x1 , x2 ) => v )
A query expression with a join clause without an into followed by something other than a select clause
from x1 in e1 join x2 in e2 on k1 equals k2 ...
is translated into
from * in ( e1 ) . Join( e2 , x1 => k1 , x2 => k2 , ( x1 , x2 ) => new { x1 , x2 }) ...
A query expression with a join clause with an into followed by a select clause
from x1 in e1 join x2 in e2 on k1 equals k2 into g select v
is translated into
( e1 ) . GroupJoin( e2 , x1 => k1 , x2 => k2 , ( x1 , g ) => v )
A query expression with a join clause with an into followed by something other than a select clause
from x1 in e1 join x2 in e2 on k1 equals k2 into g ...
is translated into
from * in ( e1 ) . GroupJoin( e2 , x1 => k1 , x2 => k2 , ( x1 , g ) => new { x1 , g }) ...
A query expression with an orderby clause
from x in e orderby k1 , k2 , ... , kn ...
is translated into
from x in ( e ) . OrderBy ( x => k1 ) . ThenBy ( x => k2 ) . ... . ThenBy ( x => kn ) ...
If an ordering clause specifies a descending direction indicator, an invocation of OrderByDescending or ThenByDescending is produced instead.
The following translations assume that there are no let, where, join, or orderby clauses, and no more than the one initial from clause in each query expression.
The example
from c in customers from o in c.Orders select new { c.Name, o.OrderID, o.Total }
is translated into
customers.
   SelectMany(c => c.Orders, (c,o) => new { c.Name, o.OrderID, o.Total } )
The example
from c in customers from o in c.Orders orderby o.Total descending select new { c.Name, o.OrderID, o.Total }
is translated into
from * in customers.
   SelectMany(c => c.Orders, (c,o) => new { c, o })
orderby o.Total descending
select new { c.Name, o.OrderID, o.Total }
The final translation is
customers.
   SelectMany(c => c.Orders, (c,o) => new { c, o }).
   OrderByDescending(x => x.o.Total).
   Select(x => new { x.c.Name, x.o.OrderID, x.o.Total })
where x is a compiler-generated identifier that is otherwise invisible and inaccessible.
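The following sketch compares this query with a hand-written version of its final translation over LINQ to Objects. The Customer and Order classes, the data, and the string formatting are assumptions; the name x in the method form stands in for the compiler-generated identifier.

```csharp
using System;
using System.Linq;

static class TwoFromOrderByDemo
{
    // Hypothetical element types and data.
    sealed class Order { public int OrderID; public decimal Total; }
    sealed class Customer
    {
        public string Name = "";
        public Order[] Orders = new Order[0];
    }

    static readonly Customer[] customers =
    {
        new Customer { Name = "Ann", Orders = new[]
        {
            new Order { OrderID = 1, Total = 50m },
            new Order { OrderID = 2, Total = 300m },
        } },
        new Customer { Name = "Bob", Orders = new[]
        {
            new Order { OrderID = 3, Total = 120m },
        } },
    };

    public static string[] QueryForm() =>
        (from c in customers
         from o in c.Orders
         orderby o.Total descending
         select c.Name + "#" + o.OrderID).ToArray();

    // c and o are carried in an anonymous type between the operators.
    public static string[] MethodForm() =>
        customers
            .SelectMany(c => c.Orders, (c, o) => new { c, o })
            .OrderByDescending(x => x.o.Total)
            .Select(x => x.c.Name + "#" + x.o.OrderID)
            .ToArray();

    static void Main()
    {
        Console.WriteLine(string.Join(", ", QueryForm()));
        Console.WriteLine(string.Join(", ", MethodForm()));
    }
}
```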
The example
from o in orders let t = o.Details.Sum(d => d.UnitPrice * d.Quantity) where t >= 1000 select new { o.OrderID, Total = t }
is translated into
from * in orders.
   Select(o => new { o, t = o.Details.Sum(d => d.UnitPrice * d.Quantity) })
where t >= 1000
select new { o.OrderID, Total = t }
The final translation is
orders.
   Select(o => new { o, t = o.Details.Sum(d => d.UnitPrice * d.Quantity) }).
   Where(x => x.t >= 1000).
   Select(x => new { x.o.OrderID, Total = x.t })
where x is a compiler-generated identifier that is otherwise invisible and inaccessible.
The example
from c in customers join o in orders on c.CustomerID equals o.CustomerID select new { c.Name, o.OrderDate, o.Total }
is translated into
customers.Join(orders, c => c.CustomerID, o => o.CustomerID, (c, o) => new { c.Name, o.OrderDate, o.Total })
The example
from c in customers join o in orders on c.CustomerID equals o.CustomerID into co let n = co.Count() where n >= 10 select new { c.Name, OrderCount = n }
is translated into
from * in customers.
   GroupJoin(orders, c => c.CustomerID, o => o.CustomerID, (c, co) => new { c, co })
let n = co.Count()
where n >= 10
select new { c.Name, OrderCount = n }
The final translation is
customers.
   GroupJoin(orders, c => c.CustomerID, o => o.CustomerID, (c, co) => new { c, co }).
   Select(x => new { x, n = x.co.Count() }).
   Where(y => y.n >= 10).
   Select(y => new { y.x.c.Name, OrderCount = y.n })
where x and y are compiler-generated identifiers that are otherwise invisible and inaccessible.
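A runnable sketch of the same shape of query is shown below. The element types and data are hypothetical, and the threshold is lowered from 10 to 2 so that the small sample produces a result.

```csharp
using System;
using System.Linq;

static class GroupJoinLetDemo
{
    // Hypothetical element types and data.
    sealed class Customer { public int CustomerID; public string Name = ""; }
    sealed class Order { public int CustomerID; }

    static readonly Customer[] customers =
    {
        new Customer { CustomerID = 1, Name = "Ann" },
        new Customer { CustomerID = 2, Name = "Bob" },
    };

    static readonly Order[] orders =
    {
        new Order { CustomerID = 1 },
        new Order { CustomerID = 1 },
        new Order { CustomerID = 2 },
    };

    public static string[] QueryForm() =>
        (from c in customers
         join o in orders on c.CustomerID equals o.CustomerID into co
         let n = co.Count()
         where n >= 2
         select c.Name + ":" + n).ToArray();

    // x and y stand in for the compiler-generated identifiers.
    public static string[] MethodForm() =>
        customers
            .GroupJoin(orders, c => c.CustomerID, o => o.CustomerID,
                       (c, co) => new { c, co })
            .Select(x => new { x, n = x.co.Count() })
            .Where(y => y.n >= 2)
            .Select(y => y.x.c.Name + ":" + y.n)
            .ToArray();

    static void Main()
    {
        Console.WriteLine(string.Join(", ", QueryForm()));
        Console.WriteLine(string.Join(", ", MethodForm()));
    }
}
```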
The example
from o in orders orderby o.Customer.Name, o.Total descending select o
has the final translation
orders.
   OrderBy(o => o.Customer.Name).
   ThenByDescending(o => o.Total)
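A sketch of a two-key ordering over LINQ to Objects follows. The tuple-typed orders array is a simplification assumed for the illustration: it stores the customer name directly instead of going through o.Customer.Name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OrderByDemo
{
    // Hypothetical data: (customer name, order total).
    static readonly (string Customer, int Total)[] orders =
    {
        ("Bob", 10), ("Ann", 5), ("Ann", 7),
    };

    static string[] Format(IEnumerable<(string Customer, int Total)> rows) =>
        rows.Select(o => o.Customer + ":" + o.Total).ToArray();

    // Primary key ascending, secondary key descending.
    public static string[] QueryForm() =>
        Format(from o in orders
               orderby o.Customer, o.Total descending
               select o);

    // The first key maps to OrderBy, the second to ThenByDescending.
    public static string[] MethodForm() =>
        Format(orders.OrderBy(o => o.Customer)
                     .ThenByDescending(o => o.Total));

    static void Main()
    {
        Console.WriteLine(string.Join(", ", QueryForm()));
        Console.WriteLine(string.Join(", ", MethodForm()));
    }
}
```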
7.15.2.5 select Clauses
A query expression of the form
from x in e select v
is translated into
( e ) . Select ( x => v )
except when v is the identifier x. In the latter case, the translation is simply
( e )
For example,
from c in customers.Where(c => c.City == "London") select c
is simply translated into
customers.Where(c => c.City == "London")
7.15.2.6 groupby Clauses
A query expression of the form
from x in e group v by k
is translated into
( e ) . GroupBy ( x => k , x => v )
except when v is the identifier x. In the latter case, the translation is
( e ) . GroupBy ( x => k )
The example
from c in customers group c.Name by c.Country
is translated into
customers.
   GroupBy(c => c.Country, c => c.Name)
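The two-argument GroupBy form can be exercised as follows. This is a sketch over LINQ to Objects; the tuple-typed sample data and the string formatting of each group are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class GroupByDemo
{
    // Hypothetical data: (name, country).
    static readonly (string Name, string Country)[] customers =
    {
        ("Ann", "UK"), ("Bob", "US"), ("Cid", "UK"),
    };

    static string[] Format(IEnumerable<IGrouping<string, string>> groups) =>
        groups.Select(g => g.Key + "=" + string.Join(",", g)).ToArray();

    // The grouped element (c.Name) differs from the range variable,
    // so an element selector is passed to GroupBy.
    public static string[] QueryForm() =>
        Format(from c in customers group c.Name by c.Country);

    public static string[] MethodForm() =>
        Format(customers.GroupBy(c => c.Country, c => c.Name));

    static void Main()
    {
        Console.WriteLine(string.Join("; ", QueryForm()));
        Console.WriteLine(string.Join("; ", MethodForm()));
    }
}
```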
7.15.2.7 Transparent Identifiers
Certain translations inject range variables with transparent identifiers denoted by *. Transparent identifiers are not a proper language feature; they exist only as an intermediate step in the query expression translation process.
When a query translation injects a transparent identifier, further translation steps propagate the transparent identifier into anonymous functions and anonymous object initializers. In those contexts, transparent identifiers have the following behavior:
- When a transparent identifier occurs as a parameter in an anonymous function, the members of the associated anonymous type are automatically in scope in the body of the anonymous function.
- When a member with a transparent identifier is in scope, the members of that member are in scope as well.
- When a transparent identifier occurs as a member declarator in an anonymous object initializer, it introduces a member with a transparent identifier.
In the translation steps described earlier, transparent identifiers are always introduced together with anonymous types, with the intent of capturing multiple range variables as members of a single object. An implementation of C# is permitted to use a different mechanism than anonymous types to group together multiple range variables. The following translation examples assume that anonymous types are used, and show how transparent identifiers can be translated away.
The example
from c in customers from o in c.Orders orderby o.Total descending select new { c.Name, o.Total }
is translated into
from * in customers.
   SelectMany(c => c.Orders, (c,o) => new { c, o })
orderby o.Total descending
select new { c.Name, o.Total }
which is further translated into
customers.
   SelectMany(c => c.Orders, (c,o) => new { c, o }).
   OrderByDescending(* => o.Total).
   Select(* => new { c.Name, o.Total })
When transparent identifiers are erased, the final translation is equivalent to
customers.
   SelectMany(c => c.Orders, (c,o) => new { c, o }).
   OrderByDescending(x => x.o.Total).
   Select(x => new { x.c.Name, x.o.Total })
where x is a compiler-generated identifier that is otherwise invisible and inaccessible.
The example
from c in customers join o in orders on c.CustomerID equals o.CustomerID join d in details on o.OrderID equals d.OrderID join p in products on d.ProductID equals p.ProductID select new { c.Name, o.OrderDate, p.ProductName }
is translated into
from * in customers.
   Join(orders, c => c.CustomerID, o => o.CustomerID, (c, o) => new { c, o })
join d in details on o.OrderID equals d.OrderID
join p in products on d.ProductID equals p.ProductID
select new { c.Name, o.OrderDate, p.ProductName }
which is further reduced to
customers.
   Join(orders, c => c.CustomerID, o => o.CustomerID, (c, o) => new { c, o }).
   Join(details, * => o.OrderID, d => d.OrderID, (*, d) => new { *, d }).
   Join(products, * => d.ProductID, p => p.ProductID, (*, p) => new { *, p }).
   Select(* => new { c.Name, o.OrderDate, p.ProductName })
The final translation is
customers.
   Join(orders, c => c.CustomerID, o => o.CustomerID, (c, o) => new { c, o }).
   Join(details, x => x.o.OrderID, d => d.OrderID, (x, d) => new { x, d }).
   Join(products, y => y.d.ProductID, p => p.ProductID, (y, p) => new { y, p }).
   Select(z => new { z.y.x.c.Name, z.y.x.o.OrderDate, z.p.ProductName })
where x, y, and z are compiler-generated identifiers that are otherwise invisible and inaccessible.
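The nesting of the anonymous types in the final translation can be checked against the query form. In this sketch the four tuple-typed tables and their contents are hypothetical, order dates are plain strings, and x, y, and z stand in for the compiler-generated identifiers.

```csharp
using System;
using System.Linq;

static class MultiJoinDemo
{
    // Hypothetical sample tables.
    static readonly (int CustomerID, string Name)[] customers =
        { (1, "Ann"), (2, "Bob") };
    static readonly (int OrderID, int CustomerID, string OrderDate)[] orders =
        { (10, 1, "2024-01-05"), (11, 2, "2024-02-01") };
    static readonly (int OrderID, int ProductID)[] details =
        { (10, 100), (10, 101), (11, 100) };
    static readonly (int ProductID, string ProductName)[] products =
        { (100, "Tea"), (101, "Jam") };

    public static string[] QueryForm() =>
        (from c in customers
         join o in orders on c.CustomerID equals o.CustomerID
         join d in details on o.OrderID equals d.OrderID
         join p in products on d.ProductID equals p.ProductID
         select c.Name + "|" + o.OrderDate + "|" + p.ProductName).ToArray();

    // Each Join wraps the previous result in a new anonymous type,
    // so c ends up three levels deep (z.y.x.c).
    public static string[] MethodForm() =>
        customers
            .Join(orders, c => c.CustomerID, o => o.CustomerID,
                  (c, o) => new { c, o })
            .Join(details, x => x.o.OrderID, d => d.OrderID,
                  (x, d) => new { x, d })
            .Join(products, y => y.d.ProductID, p => p.ProductID,
                  (y, p) => new { y, p })
            .Select(z => z.y.x.c.Name + "|" + z.y.x.o.OrderDate
                         + "|" + z.p.ProductName)
            .ToArray();

    static void Main()
    {
        foreach (string row in QueryForm()) Console.WriteLine(row);
        foreach (string row in MethodForm()) Console.WriteLine(row);
    }
}
```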
7.15.3 The Query Expression Pattern
The query expression pattern establishes a pattern of methods that types can implement to support query expressions. Because query expressions are translated to method invocations by means of a syntactic mapping, types have considerable flexibility in how they implement the query expression pattern. For example, the methods of the pattern can be implemented as instance methods or as extension methods because both kinds of methods have the same invocation syntax. Likewise, the methods can request delegates or expression trees because anonymous functions are convertible to both.
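As a sketch of the last point, a Where method may declare its parameter as an expression tree, in which case the lambda produced by the translation is converted to a tree instead of a delegate. The type TreeSource<T> is hypothetical, and returning the tree directly from Where is a simplification made so that it can be inspected.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical queryable type whose Where method takes an expression tree.
sealed class TreeSource<T>
{
    public Expression<Func<T, bool>> Where(
        Expression<Func<T, bool>> predicate) => predicate;
}

static class ExpressionTreeDemo
{
    // Translated to new TreeSource<int>().Where(x => x > 3); the trailing
    // "select x" adds no Select call because it follows a where clause.
    public static Expression<Func<int, bool>> Run() =>
        from x in new TreeSource<int>() where x > 3 select x;

    static void Main()
    {
        Expression<Func<int, bool>> tree = Run();
        Console.WriteLine(tree.Body.NodeType);  // kind of the root node
        Console.WriteLine(tree.Compile()(5));   // the tree can still be executed
    }
}
```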
The recommended shape of a generic type C<T> that supports the query expression pattern is shown below. A generic type is used to illustrate the proper relationships between parameter and result types, but it is possible to implement the pattern for nongeneric types as well.
delegate R Func<T1,R>(T1 arg1);
delegate R Func<T1,T2,R>(T1 arg1, T2 arg2);
class C
{
   public C<T> Cast<T>();
}
class C<T> : C
{
   public C<T> Where(Func<T,bool> predicate);
   public C<U> Select<U>(Func<T,U> selector);
   public C<V> SelectMany<U,V>(Func<T,C<U>> selector,
      Func<T,U,V> resultSelector);
   public C<V> Join<U,K,V>(C<U> inner, Func<T,K> outerKeySelector,
      Func<U,K> innerKeySelector, Func<T,U,V> resultSelector);
   public C<V> GroupJoin<U,K,V>(C<U> inner, Func<T,K> outerKeySelector,
      Func<U,K> innerKeySelector, Func<T,C<U>,V> resultSelector);
   public O<T> OrderBy<K>(Func<T,K> keySelector);
   public O<T> OrderByDescending<K>(Func<T,K> keySelector);
   public C<G<K,T>> GroupBy<K>(Func<T,K> keySelector);
   public C<G<K,E>> GroupBy<K,E>(Func<T,K> keySelector,
      Func<T,E> elementSelector);
}
class O<T> : C<T>
{
   public O<T> ThenBy<K>(Func<T,K> keySelector);
   public O<T> ThenByDescending<K>(Func<T,K> keySelector);
}
class G<K,T> : C<T>
{
   public K Key { get; }
}
These methods use the generic delegate types Func<T1, R> and Func<T1, T2, R>, but they could equally well have used other delegate or expression tree types with the same relationships in parameter and result types.
Notice the recommended relationship between C<T> and O<T>, which ensures that the ThenBy and ThenByDescending methods are available only on the result of an OrderBy or OrderByDescending. Also notice the recommended shape of the result of GroupBy—a sequence of sequences, where each inner sequence has an additional Key property.
The System.Linq namespace provides an implementation of the query expression pattern for any type that implements the System.Collections.Generic.IEnumerable<T> interface.
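In that implementation, IOrderedEnumerable<T> plays the role of O<T> and IGrouping<K,T> plays the role of G<K,T>. The following sketch shows both; the class name LinqPatternDemo and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LinqPatternDemo
{
    static readonly int[] numbers = { 3, 1, 2 };

    public static int[] Sorted()
    {
        // OrderBy returns IOrderedEnumerable<int>, the only type on which
        // ThenBy and ThenByDescending are available.
        IOrderedEnumerable<int> byParity = numbers.OrderBy(n => n % 2);
        return byParity.ThenBy(n => n).ToArray();
    }

    public static string[] Groups()
    {
        // GroupBy yields a sequence of sequences; each inner sequence
        // additionally exposes its Key.
        IEnumerable<IGrouping<bool, int>> groups =
            numbers.GroupBy(n => n % 2 == 0);
        return groups.Select(g => g.Key + ":" + string.Join(",", g)).ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Sorted()));
        Console.WriteLine(string.Join("; ", Groups()));
    }
}
```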