Designing and Administrating Partitions in SQL Server 2012
A popular method of better managing large and active tables and indexes is the use of partitioning. Partitioning is a feature for segregating I/O workload within a SQL Server database so that I/O can be better balanced against available I/O subsystems while providing better user response time, lower I/O latency, and faster backups and recovery. By partitioning tables and indexes across multiple filegroups, data retrieval and management are much quicker because only subsets of the data are used, while the integrity of the database as a whole remains intact.
After a table or index is partitioned, data is stored horizontally across multiple filegroups, so groups of data are mapped to individual partitions. Typical scenarios for partitioning include large tables that become very difficult to manage, tables that are suffering performance degradation because of excessive I/O or blocking locks, table-centric maintenance processes that exceed the available time for maintenance, and moving historical data from the active portion of a table to a partition with less activity.
Partitioning tables and indexes warrants a bit of planning before putting them into production. The usual approach to partitioning a table or index follows these steps:
- Create the filegroup(s) and file(s) used to hold the partitions defined by the partitioning scheme.
- Create a partition function to map the rows of the table or index to specific partitions based on the values in a specified column. A very common partitioning function is based on the creation date of the record.
- Create a partitioning scheme to map the partitions of the partitioned table to the specified filegroup(s) and, thereby, to specific locations on the Windows file system.
- Create the table or index (or ALTER an existing table or index) by specifying the partition scheme as the storage location for the partitioned object.
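The four steps above can be sketched in Transact-SQL roughly as follows. This is a minimal illustration, not a production design: the database name (SalesDemo), the filegroup and file names, the file paths, the table, and the single boundary date are all assumptions made up for the example.

```sql
-- All database, filegroup, file, path, and object names are hypothetical.

-- Step 1: create the filegroups and files that will hold the partitions.
ALTER DATABASE SalesDemo ADD FILEGROUP FG_Archive;
ALTER DATABASE SalesDemo ADD FILEGROUP FG_Current;
GO
ALTER DATABASE SalesDemo
ADD FILE (NAME = N'SalesDemo_Archive',
          FILENAME = N'D:\SQLData\SalesDemo_Archive.ndf',  -- assumed path
          SIZE = 100MB)
TO FILEGROUP FG_Archive;
ALTER DATABASE SalesDemo
ADD FILE (NAME = N'SalesDemo_Current',
          FILENAME = N'E:\SQLData\SalesDemo_Current.ndf',  -- assumed path
          SIZE = 100MB)
TO FILEGROUP FG_Current;
GO
USE SalesDemo;
GO

-- Step 2: create a partition function on the record creation date.
-- One boundary value yields two partitions: rows before 2012 and rows from 2012 on.
CREATE PARTITION FUNCTION pf_CreatedDate (datetime)
AS RANGE RIGHT FOR VALUES ('20120101');
GO

-- Step 3: create a partition scheme that maps each partition to a filegroup.
CREATE PARTITION SCHEME ps_CreatedDate
AS PARTITION pf_CreatedDate TO (FG_Archive, FG_Current);
GO

-- Step 4: create the table on the partition scheme, naming the partitioning column.
CREATE TABLE dbo.OrderLog (
    OrderLogID  int      NOT NULL,
    CreatedDate datetime NOT NULL
) ON ps_CreatedDate (CreatedDate);
GO
```

Placing the two filegroups on separate drives, as the assumed paths suggest, is what lets the I/O for older and newer rows land on different parts of the I/O subsystem.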
Although Transact-SQL commands are available to perform every step described earlier, the Create Partition Wizard makes the entire process quick and easy through an intuitive point-and-click interface. The next section provides an overview of using the Create Partition Wizard in SQL Server 2012, and an example later in this section shows the Transact-SQL commands.
Leveraging the Create Partition Wizard to Create Table and Index Partitions
The Create Partition Wizard can be used to divide data in large tables across multiple filegroups to increase performance and can be invoked by right-clicking any table or index, selecting Storage, and then selecting Create Partition. The first step is to identify which columns to partition by reviewing all the columns available in the Available Partitioning Columns section located on the Select a Partitioning Column dialog box, as displayed in Figure 3.13. This screen also includes additional options such as the following:
- Collocate to an Available Partitioned Table—Displays related data to join with the column being partitioned.
- Storage Align Non Unique Indexes and Unique Indexes with an Indexed Partition Column—Aligns all indexes of the table being partitioned with the same partition scheme. If you do not select this option, you may place indexes independently of the columns they point to.
Figure 3.13. Selecting a partitioning column.
The next screen is called Select a Partition Function. This page is used for specifying the partition function by which the data will be partitioned. The options include using an existing partition function or creating a new one. The subsequent page is called New Partition Scheme. Here a DBA maps the rows of the table being partitioned to the desired filegroups. Either an existing partition scheme can be used or a new one must be created. The final screen is used for doing the actual mapping. On the Map Partitions page, specify the filegroup to be used for each partition and then enter a range for the values of the partitions. The ranges and settings on the grid include the following:
- Filegroup—Enter the desired filegroup for the partition.
- Left and Right Boundary—Used for entering range values up to a specified value. Left boundary is based on Value <= Boundary and Right boundary is based on Value < Boundary.
- RowCount—Read-only column that displays the number of rows in the partition and is determined only when the Estimate Storage button is clicked.
- Required Space—Read-only columns that display required space and are determined only when the Estimate Storage button is clicked.
- Available Space—Read-only columns that display available space and are determined only when the Estimate Storage button is clicked.
- Estimate Storage—When selected, this option determines the rowcount, required, and available space.
Designing table and index partitions is a DBA task that typically requires a joint effort with the database development team. The DBA must have a strong understanding of the database, tables, and columns to make the correct choices for partitioning. For more information on partitioning, review Books Online.
Enhancements to Partitioning in SQL Server 2012
SQL Server 2012 now supports as many as 15,000 partitions. When using more than 1,000 partitions, Microsoft recommends that the instance of SQL Server have at least 16GB of available memory. This recommendation particularly applies to partitioned indexes, especially those that are not aligned with the base table or with the clustered index of the table. Other Data Manipulation Language (DML) and Data Definition Language (DDL) statements may also run short of memory when processing a large number of partitions.
Certain DBCC commands may take longer to execute when processing a large number of partitions. On the other hand, a few DBCC commands can be scoped to the partition level and, if so, can be used to perform their function on a subset of data in the partitioned table.
Queries may also benefit from a query engine optimization called partition elimination. SQL Server uses partition elimination automatically if it is available. Here's how it works. Assume a table has four partitions, with all the data for customers whose names begin with R, S, or T in the third partition. If a query's WHERE clause filters on customer name looking for 'System%', the query engine knows that it needs to read only partition three to answer the request. Thus, it might greatly reduce I/O for that query. On the other hand, some queries might take longer if there are more than 1,000 partitions and the query is not able to perform partition elimination.
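The customer-name scenario can be reproduced with a small sketch. The partition function, scheme, table, and boundary values below are hypothetical, chosen only so that names beginning with R, S, or T fall into the third of four partitions; everything is placed on the PRIMARY filegroup to keep the example short.

```sql
-- Hypothetical objects; boundaries chosen so that R, S, and T names land in partition 3.
CREATE PARTITION FUNCTION pf_CustomerName (varchar(50))
AS RANGE RIGHT FOR VALUES ('I', 'R', 'U');
GO
CREATE PARTITION SCHEME ps_CustomerName
AS PARTITION pf_CustomerName ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.CustomerDemo (
    CustomerID   int         NOT NULL,
    CustomerName varchar(50) NOT NULL
) ON ps_CustomerName (CustomerName);
GO

-- Ask the partition function which partition a given value maps to.
SELECT $PARTITION.pf_CustomerName('System') AS PartitionNumber;

-- A filter on the partitioning column gives the optimizer the chance
-- to skip the partitions that cannot contain matching rows.
SELECT CustomerID, CustomerName
FROM dbo.CustomerDemo
WHERE CustomerName LIKE 'System%';
GO
```

With these boundaries, the $PARTITION call should report partition 3 under typical collations. Whether the second query actually touches only that partition is best confirmed in the actual execution plan, which reports the partitions accessed.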
Finally, SQL Server 2012 introduces some changes and improvements to the algorithms used to calculate partitioned index statistics. Primarily, SQL Server 2012 samples rows in a partitioned index when it is created or rebuilt, rather than scanning all available rows. This may sometimes result in somewhat different query behavior compared to the same queries running on earlier versions of SQL Server.
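If sampled statistics lead to poorer plans on a partitioned index, full-scan statistics can still be requested explicitly. The following is a minimal sketch; the table name dbo.OrderLog is a hypothetical placeholder for any partitioned table.

```sql
-- dbo.OrderLog is a hypothetical partitioned table.
-- Rebuild all statistics on the table by reading every row instead of a sample.
UPDATE STATISTICS dbo.OrderLog WITH FULLSCAN;
GO
```

A full scan reads the entire table, so on very large partitioned tables it is usually scheduled inside a maintenance window.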
Administrating Data Using Partition Switching
Partitioning is useful to access and manage a subset of data while losing none of the integrity of the entire data set. There is one limitation, though. When a partition is created on an existing table, new data is added to a specific partition or to the default partition if none is specified. That means the default partition might grow unwieldy if it is left unmanaged. (This concept is similar to how a clustered index needs to be rebuilt from time to time to reestablish its fill factor setting.)
Switching partitions is a fast operation because no physical movement of data takes place. Instead, only the metadata pointers to the physical data are altered.
You can alter partitions using SQL Server Management Studio or with the ALTER TABLE...SWITCH Transact-SQL statement. Both options enable you to ensure partitions are well maintained. For example, you can transfer subsets of data between partitions, move tables between partitions, or combine partitions together. Because the ALTER TABLE...SWITCH statement does not actually move the data, a few prerequisites must be in place:
- Partitions must use the same column when switching between two partitions.
- The source and target table must exist prior to the switch and must be on the same filegroup, along with their corresponding indexes, index partitions, and indexed view partitions.
- The target partition must exist prior to the switch, and it must be empty, whether adding a table to an existing partitioned table or moving a partition from one table to another. The same holds true when moving a partitioned table to a nonpartitioned table structure.
- The source and target tables must have the same columns in identical order with the same names, data types, and data type attributes (length, precision, scale, and nullability). Computed columns must have identical syntax, as well as primary key constraints. The tables must also have the same settings for ANSI_NULLS and QUOTED_IDENTIFIER properties. Clustered and nonclustered indexes must be identical. ROWGUID properties and XML schemas must match. Finally, settings for in-row data storage must also be the same.
- The source and target tables must have matching nullability on the partitioning column. Although both NULL and NOT NULL are supported, NOT NULL is strongly recommended.
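One of these prerequisites, the empty target, is easy to check from the catalog before attempting a switch. The query below is a sketch that assumes the target is the TransactionHistory_No_Partn table used later in this section and that it lives in the dbo schema; the row counts in sys.partitions are approximate.

```sql
-- List the approximate row count of each partition of the intended target.
-- index_id 0 is a heap and 1 is a clustered index, so each partition appears once.
SELECT p.partition_number,
       p.rows AS approximate_rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.TransactionHistory_No_Partn')
  AND p.index_id IN (0, 1)
ORDER BY p.partition_number;
```

A nonpartitioned table shows a single row for partition 1. Any nonzero count on the target partition means the switch will fail until that partition is emptied.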
Likewise, the ALTER TABLE...SWITCH statement will not work under certain circumstances:
- Full-text indexes, XML indexes, and old-fashioned SQL Server rules are not allowed (though CHECK constraints are allowed).
- Tables in a merge replication scheme are not allowed. Tables in a transactional replication scheme are allowed with special caveats. Triggers are allowed on tables but must not fire during the switch.
- Indexes on the source and target table must reside on the same partition as the tables themselves.
- Indexed views make partition switching difficult and have a lot of extra rules about how and when they can be switched. Refer to the SQL Server Books Online if you want to perform partition switching on tables containing indexed views.
- Referential integrity can impact the use of partition switching. First, foreign keys on other tables cannot reference the source table. If the source table holds the primary key, it cannot have a primary or foreign key relationship with the target table. If the target table holds the foreign key, it cannot have a primary or foreign key relationship with the source table.
In summary, simple tables can easily accommodate partition switching. The more complexity a source or target table exhibits, the more likely that careful planning and extra work will be required to even make partition switching possible, let alone efficient.
Here’s an example where we create a partitioned table using a previously created partition scheme, called Date_Range_PartScheme1. We then create a new, nonpartitioned table identical to the partitioned table residing on the same filegroup. We finish up switching the data from the partitioned table into the nonpartitioned table:
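-- Partitioned table created on the existing partition scheme.
CREATE TABLE TransactionHistory_Partn1 (Xn_Hst_ID int, Xn_Type char(10))
ON Date_Range_PartScheme1 (Xn_Hst_ID);
GO
-- Identical nonpartitioned table; main_filegroup must be the filegroup that holds partition 1.
CREATE TABLE TransactionHistory_No_Partn (Xn_Hst_ID int, Xn_Type char(10))
ON main_filegroup;
GO
-- Switch partition 1 of the partitioned table into the empty nonpartitioned table.
ALTER TABLE TransactionHistory_Partn1 SWITCH PARTITION 1 TO TransactionHistory_No_Partn;
GO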
The next section shows how to use a more sophisticated, but very popular, approach to partition switching called a sliding window partition.
Example and Best Practices for Managing Sliding Window Partitions
Assume that our AdventureWorks business is booming. The sales staff, and by extension the AdventureWorks2012 database, is very busy. We noticed over time that the TransactionHistory table is very active as sales transactions are first entered and are still very active over their first month in the database. But the older the transactions are, the less activity they see. Consequently, we’d like to automatically group transactions into four partitions per year, basically containing one quarter of the year’s data each, in a rolling partitioning. Any transaction older than one year will be purged or archived.
The answer to a scenario like the preceding one is called a sliding window partition because we are constantly loading new data in and sliding old data over, eventually to be purged or archived. Before you begin, you must choose either a LEFT partition function window or a RIGHT partition function window:
- How data is handled varies according to the choice of LEFT or RIGHT partition function window:
- With a LEFT strategy, partition1 holds the oldest data (Q4 data), partition2 holds data that is 6- to 9-months old (Q3), partition3 holds data that is 3- to 6-months old (Q2), and partition4 holds recent data less than 3-months old.
- With a RIGHT strategy, partition4 holds the oldest data (Q4), partition3 holds Q3 data, partition2 holds Q2 data, and partition1 holds recent data.
- Following the best practice, make sure there are empty partitions on both the leading edge (partition0) and trailing edge (partition5) of the partition.
- RIGHT range functions usually make more sense to most people because it is natural to start ranges at their lowest value and work upward from there.
- Assuming that a RIGHT partition function window is used, we first use the SPLIT subclause of the ALTER PARTITION FUNCTION statement to split empty partition5 into two empty partitions, 5 and 6.
- We use the SWITCH subclause of ALTER TABLE to switch out partition4 to a staging table for archiving or simply to drop and purge the data. Partition4 is now empty.
- We can then use MERGE to combine the empty partitions 4 and 5, so that we’re back to the same number of partitions as when we started. This way, partition3 becomes the new partition4, partition2 becomes the new partition3, and partition1 becomes the new partition2.
- We can use SWITCH to push the new quarter’s data into the spot of partition1.
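One cycle of a sliding window (split, switch out, merge, switch in) can be sketched in Transact-SQL as follows. This is a generic illustration under stated assumptions, not the exact layout described above: every object name is hypothetical, all objects sit on the PRIMARY filegroup for brevity, and the partition numbers follow SQL Server's own numbering, in which partition 1 holds the lowest range of values, so they do not line up one-to-one with the partition labels used in the steps.

```sql
-- Hypothetical objects. Boundaries give four quarterly partitions for 2011
-- plus an empty partition at each end of the range.
CREATE PARTITION FUNCTION pf_TxnQuarter (datetime)
AS RANGE RIGHT FOR VALUES ('20110101', '20110401', '20110701', '20111001', '20120101');
GO
CREATE PARTITION SCHEME ps_TxnQuarter
AS PARTITION pf_TxnQuarter ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.TxnHistory (
    TxnID   int      NOT NULL,
    TxnDate datetime NOT NULL
) ON ps_TxnQuarter (TxnDate);

-- Empty staging table that receives the oldest quarter.
CREATE TABLE dbo.TxnHistory_Unload (
    TxnID   int      NOT NULL,
    TxnDate datetime NOT NULL
) ON [PRIMARY];

-- Staging table for the new quarter; the CHECK constraint proves to SQL Server
-- that every row fits the range of the partition it will be switched into.
CREATE TABLE dbo.TxnHistory_Load (
    TxnID   int      NOT NULL,
    TxnDate datetime NOT NULL,
    CONSTRAINT CK_TxnHistory_Load_Range
        CHECK (TxnDate >= '20120101' AND TxnDate < '20120401')
) ON [PRIMARY];
GO

-- 1. Split the empty trailing partition to make room for the next quarter.
ALTER PARTITION SCHEME ps_TxnQuarter NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_TxnQuarter() SPLIT RANGE ('20120401');
GO

-- 2. Switch the oldest quarter (Q1 2011, partition 2) out to the unload table.
ALTER TABLE dbo.TxnHistory SWITCH PARTITION 2 TO dbo.TxnHistory_Unload;
GO

-- 3. Merge the two empty partitions at the leading edge.
ALTER PARTITION FUNCTION pf_TxnQuarter() MERGE RANGE ('20110101');
GO

-- 4. Switch the new quarter in. After the merge, Q1 2012 is partition 5.
ALTER TABLE dbo.TxnHistory_Load SWITCH TO dbo.TxnHistory PARTITION 5;
GO
```

Because the split and the merge both operate on empty partitions, they change only metadata. The unload table can then be archived or truncated without touching the live table.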
Some best practices to consider for using a sliding window partition include the following:
- Load newest data into a heap, and then add indexes after the load is finished. Delete oldest data or, when working with very large data sets, drop the partition with the oldest data.
- Keep an empty staging partition at the leftmost and rightmost ends of the partition range to ensure that partition splits (when loading new data) and partition merges (after unloading old data) do not cause data movement.
- Do not split or merge a partition already populated with data because this can cause severe locking and explosive log growth.
- Create the load staging table in the same filegroup as the partition you are loading.
- Create the unload staging table in the same filegroup as the partition you are deleting.
- Don't load a partition until its range boundary is met. For example, don't create and load a partition meant to hold data that is one to two months old before the current data has aged one month. Instead, continue to allow the latest partition to accumulate data until the data is ready for a new, full partition.
- Unload one partition at a time.
- The ALTER TABLE...SWITCH statement issues a schema lock on the entire table. Keep this in mind if regular transactional activity is still going on while a table is being partitioned.