- Data Warehouse Lifecycle
- The Sample Scenario
- Dimensional Modeling
- Summary
Dimensional Modeling
Dimensional modeling is somewhat different from its relational counterpart. I won’t go into details of dimensional modeling here because such concepts have fine coverage in several books that each DW architect should read. Most commonly referenced dimensional modeling authors are Bill Inmon and Ralph Kimball.
For the purposes of this article, I’ll suffice to say that dimensional models consist of the fact and dimension tables. Typical fact tables contain numerous foreign keys referencing dimension tables. Dimension tables, on the other hand, usually contain very few columns—dimension key, value, create, and update date, and perhaps an obsolete date. Fact tables record occurrences of a measurable fact, such as customer orders. Dimension tables provide a way to slice business data across various diagonals of company’s operations; for example, we can examine orders by customer or by product.
You can use the "obsolete_date" column within dimension tables to track the history of values that change over time. This concept is known as slowly changing dimension. For example, consumers of your products might change their last names due to marriage, divorce, or for another personal reason. Similarly multiple departments within your organization can be combined into one, or one department can be divided. In some cases, you care to keep just the current value. If so, consider yourself lucky—you can simply override the existing value with the new value in the dimension table. In other cases, you must keep track of the old value as well as the new value. This is when you use the record obsolete date to track the timeframe during which the record was valid.
Northwind trader’s dimensional model will be very simple, consisting of four-dimension tables and a fact table. Notice that because this is just a static database and I won’t have any new data to populate it regularly, I won’t add the obsolete_date column to the dimensions. You can create fact and dimension tables using the following script:
CREATE TABLE dbo.dim_supplier( supplier_ident INT IDENTITY(1, 1), supplier_id INT NOT NULL, supplier_name VARCHAR(255) NOT NULL, supplier_city VARCHAR (255) NULL, country VARCHAR(255) NULL ) CREATE TABLE dbo.dim_product ( product_ident INT IDENTITY(1, 1), product_id INT NOT NULL, product_name VARCHAR(255) NOT NULL, discontinued BIT NOT NULL ) CREATE TABLE dbo.dim_customer ( customer_ident INT IDENTITY(1, 1), customer_id VARCHAR(20) NOT NULL, customer_name VARCHAR(255) NOT NULL, customer_city VARCHAR(255) NULL, customer_country VARCHAR(255) NULL ) CREATE TABLE dbo.dim_employee ( employee_ident INT IDENTITY(1, 1), employee_id INT NOT NULL, employee_name VARCHAR(85) NOT NULL, employee_city VARCHAR(255) NULL, employee_country VARCHAR(255) NULL ) CREATE TABLE dbo.dim_time ( time_member_key INT NOT NULL , calendar_date_dt DATETIME NOT NULL , calendar_day_of_week_num INT NOT NULL , calendar_day_of_week_name VARCHAR(15) NOT NULL , calendar_day_of_month_num INT NOT NULL , calendar_day_of_year_num INT NOT NULL , calendar_week_num INT NOT NULL , calendar_month_num INT NOT NULL , calendar_month_name VARCHAR (15) NOT NULL , calendar_quarter_num INT NOT NULL , calendar_year_num INT NOT NULL ) CREATE TABLE fact_sales ( customer_ident INT NOT NULL, product_ident INT NOT NULL, employee_ident INT NOT NULL, supplier_ident INT NOT NULL, total_sale SMALLMONEY NOT NULL, time_member_key INT NOT NULL )
Next let’s populate these tables using the following queries:
-- supplier dimension: INSERT dim_supplier ( supplier_id , supplier_name , supplier_city , country ) SELECT supplierid, companyname, city, country FROM suppliers -- product dimension: INSERT dim_product ( product_id, product_name, discontinued) SELECT productid, productname, discontinued FROM products -- customer dimension: INSERT dim_customer ( customer_id, customer_name, customer_city, customer_country) SELECT customerid, companyname, city, country FROM customers -- employee dimension: INSERT dim_employee ( employee_id, employee_name, employee_city, employee_country) SELECT employeeid, TItleOfCourtesy + ’ ’ + FirstName + ’ ’ + LastName AS employee_name, city, country FROM employees
Notice that dim_time is a special dimension. It isn’t populated by data that is already in the warehouse. Instead we populate it with calendar dates and date parts (day, month, quarter, year, and so forth) so that we can aggregate warehouse data as needed. You can come up with a routine that populates your own time dimension; here is a sample store procedure that I use to populate the time dimension:
CREATE PROCEDURE load_dim_time ( @dim_table_name VARCHAR(255), @start_date_dt SMALLDATETIME, @end_date_dt SMALLDATETIME ) AS SET NOCOUNT ON DECLARE @sql_string NVARCHAR(1024) , @time_member_key INT , @calendar_date_dt SMALLDATETIME , @calendar_day_of_week_num INT , @calendar_day_of_week_name VARCHAR(10) , @calendar_day_of_month_num INT , @calendar_day_of_year_num INT , @calendar_week_num INT , @calendar_month_num INT , @calendar_month_name VARCHAR(10) , @calendar_quarter_num INT , @calendar_year_num INT SET @calendar_date_dt = @start_date_dt WHILE (@calendar_date_dt <= @end_date_dt) BEGIN IF NOT EXISTS ( SELECT time_member_key FROM dim_time WHERE calendar_date_dt = @calendar_date_dt ) BEGIN SELECT @calendar_day_of_week_num = DATEPART(dw, @calendar_date_dt) , @calendar_day_of_week_name = DATENAME(WEEKDAY, @calendar_date_dt) , @calendar_day_of_month_num = DATEPART(DD, @calendar_date_dt) , @calendar_day_of_year_num = DATEPART(DY, @calendar_date_dt) , @calendar_week_num = DATEPART(WK, @calendar_date_dt) , @calendar_month_num = DATEPART(M, @calendar_date_dt) , @calendar_month_name = DATENAME(MONTH, @calendar_date_dt) , @calendar_quarter_num = DATEPART(QQ, @calendar_date_dt) , @calendar_year_num = DATEPART(YYYY, @calendar_date_dt) , @time_member_key = CAST( CAST(@calendar_year_num AS VARCHAR) + RIGHT(’00’ + CAST(@calendar_day_of_year_num AS VARCHAR), 3) AS INT) SELECT @sql_string = ’INSERT INTO ’ + @dim_table_name + ’ (’ + ’time_member_key, ’ + ’calendar_date_dt, ’ + ’calendar_day_of_week_num,’ + ’calendar_day_of_week_name,’ + ’calendar_day_of_month_num,’ + ’calendar_day_of_year_num,’ + ’calendar_week_num,’ + ’calendar_month_num,’ + ’calendar_month_name,’ + ’calendar_quarter_num, ’ + ’calendar_year_num’ + ’) ’ + ’VALUES ’ + ’(’ + CHAR(39) + CAST(@time_member_key AS VARCHAR) + CHAR(39) + ’,’ + CHAR(39) + CAST(@calendar_date_dt AS VARCHAR) + CHAR(39) + ’,’ + CAST(@calendar_day_of_week_num AS VARCHAR) + ’,’ + CHAR(39) + @calendar_day_of_week_name + CHAR(39) + ’,’ + CAST(@calendar_day_of_month_num AS VARCHAR) + ’,’ + CAST(@calendar_day_of_year_num AS VARCHAR) + ’,’ + CAST(@calendar_week_num AS VARCHAR) + ’,’ + CAST(@calendar_month_num AS VARCHAR) + ’,’ + CHAR(39) + @calendar_month_name + CHAR(39) + ’,’ + CAST(@calendar_quarter_num AS VARCHAR) + ’,’ + CAST(@calendar_year_num AS VARCHAR) + ’)’ EXEC sp_executesql @sql_string END SET @calendar_date_dt = @calendar_date_dt + 1 END /* now use load_dim_time procedure to populate dim_time table with needed dates */ EXEC load_dim_time dim_time, ’1/1/96’, ’1/1/99’
SQL Server has a fine ETL tool—Data Transformation Services (DTS)—which you can leverage to execute and schedule DW population routines. A typical DTS package determines which data rows need to be extracted from their source and inserts such rows into appropriate dimension and fact tables. Because this is a sample application, I won’t need to create any DTS packages, but keep in mind that real-world ETL routines can get quite complicated and might take several weeks to develop.