Basic MDX Concepts
The Multidimensional Expression (MDX) language was created with the primary purpose of allowing the user to manipulate multidimensional data in Microsoft SQL Server 2000 Analysis Services. MDX is defined in the OLAP extensions and OLE DB. It's used with Analysis Services as a query language, and is used by various client components and applications to return data from OLAP cubes. A second important characteristic of MDX is its role as an expression language. Its functions can be used in Analysis Services to define calculated members, as well as to build local cubes and to query cube data using the Pivot Table Service with OLE DB and Microsoft ActiveX Data Objects (ADO). MDX also allows you to create and register flexible user-defined functions that accept arguments and return values in the MDX syntax.
Contrasting MDX with SQL
If you're familiar with the Structured Query Language (SQL), the MDX syntax will appear similar in many facets. As we proceed with our introduction to MDX, it may seem that many of the actions we can perform with MDX could be handled (although perhaps not as well) with SQL. MDX exists to make this functionality more efficient and intuitive, with specific respect to multidimensional data and objects.
The SELECT clause (representing a data request), the FROM clause (representing a "starting point" for data selection), and the WHILE clause (in essence a filter) are requirements for both MDX and SQL. MDX uses other keywords to query cubes and return data that we can analyze, as well as providing functions that serve as tools to manipulate returned data. MDX is further enhanced with the support of user-defined functions that allow you to extend and customize the tool to fit specific business needs that might not lend themselves to "off the shelf" solutions.
MDX also resembles SQL in its provision of Data Definition Language (DDL) syntax for managing data structures. For example, commands exist for creating, modifying, and deleting cubes, dimensions, measures, and other structural objects in the OLAP environment.
Most data definition/manipulation languages, including SQL, are constructed to return and manipulate data in two dimensions, commonly seen as a row dimension and a column dimension. A partial picture of a two-dimensional (or relational) database may appear as shown in Figure 1.
Figure 1 Database schema for the sample budget cube.
The tables in Figure 1 contain two-dimensional data fields, or single data elements that exist at the intersection of each row and column. As you may already know, a SQL query would specify columns with a SELECT statement, with a starting point defined in a FROM clause. The specific rows to be returned by the query would be specified by a WHERE clause, which acts in the capacity of a filter.
Fundamental MDX Terminology
The domain of MDX is multidimensional data, which by its nature is housed within structures (called cubes) of greater than two dimensions. A dimension is an organized hierarchy of categories (levels) that typically describes data in a fact table; I liken dimensions to "perspectives" or "views" of data. Instead of fields, the element(s) of data that are stored at the intersection of the cube's dimensions are called measures. There can be more than one measure stored at the intersections.
Figure 2 shows a sample cube based loosely on the more elaborate "Budget" cube supplied as a sample with the installation of Analysis Services. The cube is composed of three dimensions: Stores, Accounts, and Time. It also stores two measures: Actual and Budget amounts. Each dimension is broken down into different, hierarchical levels, each of which is broken down further into members. For example, the Stores dimension is divided into State levels (seen here as Georgia and New York), which are composed of City levels (here Atlanta, Augusta, New York City, and Albany). The cities are subdivided into the individual stores themselves, which are the members of the Stores dimension.
Figure 2 Sample cube, showing various hierarchical relationships.
The intersect points of the local Stores, together with the General Ledger Accounts (the members composing the Income Statement levels of the Accounts dimension) and the operating months (the members making up the Quarter levels of the Time dimension), represent the point (cells) at which the measures are collected.
To identify and extract the data, whether for a single cell or multiple cells, MDX uses a reference system based on the concept of tuples. Tuples list dimensions and members to identify individual cells as well as groups of cells in the cube. Because each cell is an intersection of all the dimensions of the cube, tuples can uniquely identify every cell in the cube.
Tuples identify sections of the cube called slices, which are composed of more than one cell. An ordered collection of tuples constitutes a set. In an MDX query, axis and slicer dimensions are composed of these sets of tuples. In addition, it's possible to create a named set. A named set is a set with an alias, used to make MDX queries easier to understand and, if especially complex, more efficient to process.
When we use SQL, it's usually necessary to filter the potentially large amounts of data returned from a query on a table or tables. We can apply filters by specifying columns in the SELECT statement and restricting the rows that are returned based on specific criteria with a WHERE statement.
The concept of filtering is equally important in MDX, where a SELECT statement is used to select the axis dimensions (the dimensions and members to be returned in MDXexpected to return data for multiple members). The WHERE statement is used to restrict the returned data to a slicer dimension, which is composed of specific dimension and member criteria, and is expected to return data for a single member. The terms axis dimension and slicer dimension distinguish dimensions of the source cube cells of the query, indicated in the FROM clause, from the dimensions of the result cube cells, which can be composed of multiple cube dimensions.
Two more terms to consider in our review of fundamental MDX terminology are calculated members and user-defined functions. Calculated members are derived members; that is, they're members that have no basis in the data itself, but are created via evaluated expressions in MDX. They're returned by the same processes as a standard member, and can be created via a rich set of functions that are present in MDX. Calculated members extend our capabilities to manipulate multidimensional data. User-defined functions extend those capabilities by allowing us to create custom functions to manipulate multidimensional data, and to register those functions, giving us the flexibility of calling these functions from within the Calculated Member Builder (introduced later in this session), MDX queries, and data definition language (DDL) statements that support MDX. User-defined functions can be programmed in any language that supports Component Object Model (COM) interfaces, and are capable of accepting arguments and returning values in the MDX syntax.