Data Compression in SQL Server 2012
With data explosion occurring in the industry and enterprise, more and more data is being stored within SQL Server. However, the databases are no longer averaging 100GB in size as they were 10 years ago. Now databases are becoming larger than ever, with sizes from 10 to 50TB now being common. In addition, it is often only a select few tables within the database that are growing to these record-breaking sizes. To combat this, even though storage is getting cheaper, Microsoft has introduced data compression and tools, namely the Data Compression Wizard and specific Transact-SQL statements, to facilitate data compression. Therefore, a DBA can compress tables and indexes to conserve storage space at a slight CPU cost. One of the main design goals of data compression was to shrink data warehouse fact tables. Fact tables are typically the largest tables within a data warehouse because they contain the majority of the data associated with a database.
SQL Server provides two methods, Page and Row compression, to reduce data storage on disk and speed I/O performance by reducing the amount of I/O required for transaction processing. Page and row compression work in different, yet complementary, ways and are worth further discussion.
Page compression uses an algorithm called “deduplication.” When deduplicating, as the name implies, SQL Server looks for duplicate values that appear again and again within the data page. For example, the table HumanResources.Employee in the AdventureWorks2012 database contains an nvarchar(50) column called JobTitle. This column contains many duplicate values—things like “Design Engineer” and Research and Development Manager.” These values might appear many times on a single data page. Using page compression, SQL Server can remove such duplicate values within a data page by replacing each duplicate value with a tiny pointer to a single appearance of the full value. (Incidentally, this is known as “dictionary-based compression.”)
By comparison, row compression does not actually use a compression algorithm per se. Instead, when row compression is enabled, SQL Server simply removes any extra, unused bytes in a fixed data type column, such as a CHAR(50) column.
Page and row compression are not compatible, but by enabling page compression SQL Server automatically includes row compression. You can also specify compression per partition, so partitioned tables could have multiple compression specifications at once.
Using the Data Compression Wizard to Compress Space
The Data Compression Wizard can be used to analyze and compress space associated with a table stored in a heap or clustered index structure. In addition, it can also support compression tables using nonclustered index and index views. Finally, if the table is partitioned, a DBA can compress portions of the tables, and various partitions do not have to maintain the same compressed settings.
The Data Compression Wizard can be invoked by right-clicking any table or index and selecting Storage and then Manage Compression. The first step is to select the compression type based on a partition number, or you can choose to use the same compression for all partitions. The compression options include Row, Page, or None, indicating no compression.
The next step is to click the Calculate button to review the disk cost savings. The cost savings will be displayed in the Requested Compressed Space column based on a partition number. This is displayed in Figure 3.14. The final step includes selecting an output option. It is possible to create a script, run immediately, or schedule the task to commence at a later time.
Figure 3.14. Specifying the compression type with the Data Compression Wizard.
Data Compression with Transact SQL
For those of you who do not like using a GUI, the following examples outline the default syntax for compressing data with Transact-SQL.
Creating a Table with Row Compression Enabled
CREATE TABLE <Table Name> (<Column 1. int, <Column 2> nvarchar(50) ) WITH (DATA_COMPRESSION = ROW); GO
Creating a Table with Page Compression Enabled
CREATE TABLE <Table Name> (<Column 1. int, <Column 2> nvarchar(50) ) WITH (DATA_COMPRESSION = PAGE); GO
The following Transact-SQL syntax illustrates compressing the Sales Order Detail table in the AdventureWorks2012 database by the page compression setting:
USE [AdventureWorks2012] ALTER TABLE [Sales].[SalesOrderDetail] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE )
You can monitor compression activity with the performance counters Page Compression Attempts/sec and Page Compressed/sec found in the SQL Server Access Methods Object. For example, this Transact-SQL code will query both counters:
SELECT cntr_value FROM sys.dm_os_performance_counters WHERE counter_name LIKE 'Page Compression%'
As mentioned earlier, compression is included only in the Enterprise Edition of SQL Server. Compression technologies dramatically cut I/O requests and storage; however, the storage gained is typically at the cost of added CPU activity. Therefore, it is strongly advised that you test before implementing data compression in a production workload scenario. In most transaction processing scenarios, SQL Server has CPU to spare and sorely needs more I/O throughput. However, occasionally the improvement to I/O performance is not big enough to warrant the lost CPU. Test to ensure that performance is not negatively impacted.