- Dimensional Modeling
- Changing Dimensions
- Populating the Fact Tables
- Summary
Populating the Fact Tables
Populating fact tables can present some interesting challenges due to inherently large number of records that need to be imported. Although dimension tables usually contain few dozen to few thousand members, tables are likely to contain millions or even billions of rows. Therefore, populating the fact that tables tend to be resource-intensive and time-consuming.
Perhaps the biggest challenge is managing indexes on the fact table. SQL Server maintains indexes under the covers as you execute INSERT, UPDATE, and DELETE statements against the indexed table. The performance benefits you get from indexes with SELECT statements outweigh the cost of maintaining indexes during INSERT, UPDATE, and DELETE. However, if you have to load millions of rows using the INSERT statement into a table that already has many millions of rows, the performance of the fact table population might become unacceptable. In most cases, the only application that uses fact table indexes is Analysis Services while the cube is being processed. If you are using the MOLAP storage mode, the fact table will never be queried except during cube processing. Therefore, after the cube is processed, dropping indexes on the fact table will impose no harm. If you encounter performance issues with fact table loads, consider altering the fact table population routine as follows:
Drop indexes from the fact table.
Load data into the fact table.
Rebuild the clustered index on the fact table.
Rebuild all indexes on the fact table.
Note that when you build a clustered index on a table, all non-clustered indexes are rebuilt. That's why you need to build the clustered index first, prior to building any non-clustered indexes.
Another technique that you might consider for speeding up fact table loads is splitting the fact table into partitionsmultiple smaller tables. In Analysis Services you can build each cube partition by using a different underlying database object (table or view). Therefore, if you accumulate millions of fact rows each week, you might consider building a different table for each month and populating this smaller table instead of keeping all facts in a single gigantic table. Doing so will speed up the incremental cube processing as well as fact table population.
Of course, by partitioning the fact table you improve the performance, but you also create a maintenance problem for yourselfeach month you'll have to create a new table and a new partition within your cube. (A savvy DBA could easily come up with code that will automate such maintenance tasks.)