- About the Series
- Introduction to Performance Tuning
- Understanding How Queries Are Processed
- Types of Intervention
- Large Level Threshold Property
- Location of Query Processing
- Optimization of Set Operations
- Importance of Optimal Arrangement
- Expression Arrangement
- Caching and Optimization
- Other Performance Enhancement Options
- A Parting Word
Expression Arrangement
As many of us have learned in working with SQL and with MDX, the greatest strides in performance tuning can be gained by tuning the expressions themselves. The simple rearrangement of a query can produce dramatic performance gains, as we will see in this section. We have discussed the control of resource use to some extent, through the direction of a query's location of execution. We also exposed a scenario in the previous section in which a situation exists that client processing can be more expensive than is apparent, even though the query might contain level populations that fall below the threshold for server processing.
Let's return to the preceding example and see what we can do in the way of arranging expressions to make our query less resource-intensive. To do so, we will resume where we left off within the sample application:
Create a new query to rephrase the last (saved as Step2-1), as follows:
-- Step 2-2: Optimized Set Operation SELECT {[Measures]. [Units Shipped]} ON COLUMNS, TopCount ( Generate ( [Warehouse].[City].Members, TopCount ( CrossJoin ({ [Warehouse].CurrentMember}, [Product].[Product Name].Members ), 7, [Measures].[Units Shipped] ) ), 7, [Measures].[Units Shipped] ) ON ROWS FROM Warehouse
In our rearrangement of query 2-1, we are exploiting the GENERATE function to rephrase our query to ask the following: "Determine the top seven tuples that we obtain from crossjoining the warehouse city and named products for the current warehouse city. Then, marry the results with the rest of the members of the warehouse city level." The number of combinations that result within this query is 1,721 (1 x 1,560 named products, plus 7 x 23 cities). This represents a significant reduction over the number of combinations in the pre-optimized query, and results in a corresponding reduction of more than 95 percent.
As we will see from the next step, the results will be identical to those in the pre-optimized version.
Execute the query using the Run Query button.
The results set appears, as shown in Figure 9.
Figure 9 Results set, optimized crossjoin query.
Through this small exercise we have illustrated the dramatic performance gains that can be obtained by simply rephrasing a query. To summarize the approach, we might keep in mind that we can replace complex CROSSJOIN() statements with a GENERATE() and CROSSJOIN(), as demonstrated previously. The following general rule summarizes the approach:
CrossJoin (X,Y) = GENERATE(X, CrossJoin(X.CurrentMember, Y)
Or
CrossJoin (X,Y) = GENERATE (CrossJoin(X,Y.CurrentMember), Y)
Let's take a look at another scenario, in which the objective of our query is the summation of two measures (units shipped and warehouse sales) for a given set of products, stores, and warehouses (all products in all non-U.S. stores and warehouses, in our example). We can approach the query in a couple of ways, using calculated measures to affect the summations.
Let's create a straightforward query first, in which we create calculated members that we combine with the measures we seek to obtain the correct calculations.
Create the following new query:
-- Step 3-1: Pre-optimized Set Operation # 2 WITH MEMBER [Product].[Non-US Total] AS 'SUM ({[Product].[All Products].[Drink], 9 [Product].[All Products].[Food]})' MEMBER [Store].[Non-US Total] AS 'SUM ({[Store].[All Stores].[Canada], 9 [Store].[All Stores].[Mexico]})' MEMBER [Warehouse].[Non-US Total] AS 'SUM ({[Warehouse].[All Warehouses].[Canada], 9 [Warehouse].[All Warehouses].[Mexico]})' SELECT {[Measures].[Units Shipped], 9 [Measures].[Warehouse Sales]} ON COLUMNS, {[Time].[1998]} ON ROWS FROM Warehouse WHERE ( [Product].[Non-US Total], 9 [Store].[Non-US Total], [Warehouse].[NON-US Total] )
Now, let's run the query and note the results.
Execute the query using the Run Query button.
The results set appears, as shown in Figure 10.
Figure 10 Results set, pre-optimized summation query.
Save the query as Step3-1.
Now, let's consider another approach to obtaining the same objective. This time, we will place each set of calculation logic into a respective calculated measure.
Create the following new query:
-- Step 3-2: Optimized Set Operation # 2 WITH MEMBER [Measures].[Total Units Shipped] AS 'SUM ( {[Warehouse].[All Warehouses].[Canada], 9 [Warehouse].[All Warehouses].[Mexico]}, SUM ( {[Store].[All Stores].[Canada], 9 [Store].[All Stores].[Mexico]}, SUM ( {[Product].[All Products].[Drink], 9 [Product].[All Products].[Food]}, [Measures].[Units Shipped] )))' MEMBER [Measures].[Total Warehouse Sales] AS 'SUM ( {[Warehouse].[All Warehouses].[Canada], 9 [Warehouse].[All Warehouses].[Mexico]}, SUM ( {[Store].[All Stores].[Canada],[Store].[All Stores].[Mexico]}, SUM ( {[Product].[All Products].[Drink],[Product].[All Products].[Food]}, [Measures].[Warehouse Sales] )))' SELECT { [Measures].[Total Units Shipped], 9 [Measures].[Total Warehouse Sales] } ON COLUMNS, {[Time].[1998]} ON ROWS FROM Warehouse
Again, we will run the query and note the results.
Execute the query using the Run Query button.
The results set appears, as shown in Figure 11.
Figure 11 Results set, optimized summation query.
Save the query as Step3-2.
We note that the results are identical for the two approaches to the query (with the exception of the rounding of the Warehouse Sales amounts and the measure names, either of which could be adjusted if they were significant concerns). The second approach, however, performs appreciably better, as would be evident were the query extracting a much larger amount of data from a real-world data source. The following general rule summarizes the use of the commutative nature of the SUM function (and applies equally to the MAX, MIN, and TopCount functions):
SUM (CrossJoin (X,Y), b) = SUM(X, SUM(Y, b))
In both the examples of performance enhancements we have explored and in any other potential performance enhancement scenario we might contemplate, we need to consider the cost/benefit tradeoffs and actual performance metrics against different sources, and decide whether the differences in performance are too negligible to devote too much time to attempting to increase performance further. The point is that tuning of MDX expressions can be accomplished, both in the manner suggested previously or through other approaches, to maximize performance; and that beginning with the MDX statements themselves is usually the best approach to any optimization effort. Once the base query is tuned to an optimal state, we can always consider client/server balancing, hardware upgrades, and perhaps further measures to increase performance.