A Page Split in SQL Server: the Good, the Nasty and the Smart
In SQL Server, a page split occurs on an index page when a row must be stored on a specific page (because of the index key) and there isn’t enough room on that page for it. This can happen during either an insert or an update operation. When it occurs, the page is split into two pages, with roughly half of the rows from the original page on each, and the row is then placed on the appropriate page. A page split on a leaf page can also force splits in the intermediate (non-leaf) levels above it. Furthermore, all of the page allocations and data movement are logged in the transaction log.
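Because every split is logged, the transaction log itself is one place to count them. The sketch below uses fn_dblog, an undocumented function, so treat it as a diagnostic technique rather than something to run in production; LOP_DELETE_SPLIT is the log operation SQL Server records when a page is split.

```sql
-- Count the page splits recorded in the current database's transaction log
-- (covers only the portion of the log not yet truncated).
-- fn_dblog is undocumented; this is a diagnostic sketch only.
SELECT COUNT(*) AS PageSplitCount
FROM fn_dblog(NULL, NULL)
WHERE [Operation] = N'LOP_DELETE_SPLIT';
```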
Paul Randal has defined two different types of page splits. Paul calls the first type a “good” page split, where the storage engine simply adds a new page on the right-hand side of the index’s leaf level. Think of a table with an identity column: when the last page of the index on that column is too full to hold a new row, adding the row allocates a new page for it. Paul calls the second type a “nasty” page split, which occurs when a row expands and its page doesn’t have enough space to hold the changed data, or when a new row must go on a particular page and there isn’t room for it.
In my book, the “good” page split isn’t really a page split at all; it’s just a new page allocation. However, SQL Server deems it a page split, and it is the type of page split that we want to have happening.

Identity Columns
A recent SQLSkills newsletter has a discussion about a table that uses an integer identity column and is running out of values. Under normal usage (and the default unless otherwise specified), an identity column starts at 1 and increments by 1. If you accumulate enough rows to exhaust the positive values, you need to do something so that your application will continue to work. Obviously, the best fix is to change the integer column to a bigint column; however, with over 2 billion rows in the table, that conversion requires a long maintenance window. What if you need to do something now, before that maintenance window? There are another 2+ billion negative values available in the integer data type, so we will use those.
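Before reaching for the negative range, it helps to know how close the column actually is to the limit. A minimal sketch, assuming a hypothetical table name, using the built-in IDENT_CURRENT function:

```sql
-- How much headroom is left before an int identity column hits the
-- positive limit of 2,147,483,647? (dbo.MyBigTable is a hypothetical name.)
SELECT IDENT_CURRENT('dbo.MyBigTable')              AS CurrentIdentity,
       2147483647 - IDENT_CURRENT('dbo.MyBigTable') AS PositiveValuesLeft;
```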
In order to do this, the identity column needs to be changed. It can either be changed to start at the most negative value and increment by one, or start at -1 and be decremented by one. In other words, it can either be set to (-2147483648, 1), or be set to (-1, -1).

Page Splits on Identity Columns
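For an existing table, the first option can be applied by reseeding. A sketch, again with a hypothetical table name; note that DBCC CHECKIDENT can change only the current seed value, not the increment, which stays as originally defined:

```sql
-- Reseed a hypothetical table so the next generated values come from
-- the negative range; the increment (+1 here) cannot be changed this way.
DBCC CHECKIDENT('dbo.MyBigTable', RESEED, -2147483648);
```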
In considering which of these methods is preferred, we need to consider how page splits, especially nasty page splits, affect each of them, and how index maintenance will affect each choice. So let’s think this through.
When there are negative values in this column and the index is rebuilt, there will be a page containing both negative and positive values. If the identity column is set to (-1, -1), there won’t be a gap in the values (excluding 0), and newly added rows will get a new page allocated: a good page split. If the identity column is set to (-2147483648, 1), then there will be a full page holding the most recently used identity values, followed by a rather large gap before the values starting at 1.
When a new row is added, it will need to go into the gap on this page, which will have to be split so that the new row fits in before the value of 1. As more rows are added, the page holding the value of 1 will fill again and need to be split again. As long as rows keep being added, this cycle will continue. Therefore, considering the page splits, this seems to be a bad choice.
Well, that is my reasoning for how things will work. However, it’s best to test out your theory, especially if you are disagreeing with Paul. So, I will create a table with an identity column set to (1, 1) and insert some rows. I will then change the identity column to the values being tested and add some more rows. I’ll throw in an index rebuild along the way so that there will be a page with both positive and negative values, and then insert more rows and see how this affects the page splits. With this in mind, the code to set up this test environment is as follows:

```sql
USE master;
IF DB_ID('PageSplits') IS NOT NULL
    DROP DATABASE PageSplits;
GO
CREATE DATABASE PageSplits;
GO
USE PageSplits;
GO
CREATE TABLE dbo.LotsaSplits(
    RowID INTEGER IDENTITY (1, 1) PRIMARY KEY CLUSTERED,
    Col01 CHAR(1000)
);
GO
-- put some positive numbers in there
WITH Tens(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
     Hundreds(N) AS (SELECT 1 FROM Tens t1, Tens t2),
     Millions(N) AS (SELECT 1 FROM Hundreds t1, Hundreds t2, Hundreds t3),
     Tally(N)    AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions)
INSERT INTO dbo.LotsaSplits(Col01)
SELECT TOP (995) CONVERT(CHAR(1000), N)
FROM Tally;
```

Reset the Identity Column
Next, the identity column needs to be reset to use the new seed / increment value, and this is where I ran into a problem with the test. While I can use DBCC CHECKIDENT to change the seed value, there is no way to change the increment value. Therefore, I can’t test my preferred method by changing the existing identity column to (-1, -1). However, I can simulate it by creating a table whose identity column starts at (-1, -1) and then using SET IDENTITY_INSERT to put in the positive values. Furthermore, I’ll need a second table to test the positive increment, so I’ll just create it with the most negative seed value and insert the positive values into it. The new code to set up the environment is:

```sql
USE master;
IF DB_ID('PageSplits') IS NOT NULL
    DROP DATABASE PageSplits;
GO
CREATE DATABASE PageSplits;
GO
USE PageSplits;
GO
-- since you can't change the increment, set it this way and use
-- IDENTITY_INSERT to put positive values in there.
CREATE TABLE dbo.LotsaSplits1(
    RowID INTEGER IDENTITY (-1, -1) PRIMARY KEY CLUSTERED,
    Col01 CHAR(1000)
);
GO
CREATE TABLE dbo.LotsaSplits2(
    RowID INTEGER IDENTITY (-2147483648, 1) PRIMARY KEY CLUSTERED,
    Col01 CHAR(1000)
);
GO
-- put some positive numbers in there
SET IDENTITY_INSERT dbo.LotsaSplits1 ON;
WITH Tens(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
     Hundreds(N) AS (SELECT 1 FROM Tens t1, Tens t2),
     Millions(N) AS (SELECT 1 FROM Hundreds t1, Hundreds t2, Hundreds t3),
     Tally(N)    AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions)
INSERT INTO dbo.LotsaSplits1(RowID, Col01)
SELECT TOP (995) N, CONVERT(CHAR(1000), N)
FROM Tally;
SET IDENTITY_INSERT dbo.LotsaSplits1 OFF;
-- copy these into the other table
SET IDENTITY_INSERT dbo.LotsaSplits2 ON;
INSERT INTO dbo.LotsaSplits2(RowID, Col01)
SELECT RowID, Col01 FROM dbo.LotsaSplits1;
SET IDENTITY_INSERT dbo.LotsaSplits2 OFF;
GO
```
The next part of this test is to create a page that has both positive and negative values in it. Since this test is for the second option, I’ll do this just for the second table. This code will insert a few rows and then rebuild the index on the table. Finally, it will show how the data on the page looks.

```sql
-- reseed the identity value
DBCC CHECKIDENT(LotsaSplits2, RE
```
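One way to see which page each row lands on is the %%physloc%% virtual column, formatted with sys.fn_PhysLocFormatter. Both are undocumented, so this is a sketch for inspection only; it prints file:page:slot for every row, which makes it easy to spot a page holding both negative and positive RowID values.

```sql
-- Show the physical location (file:page:slot) of each row in the table,
-- using the undocumented %%physloc%% virtual column.
SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS PageLocation,
       RowID
FROM dbo.LotsaSplits2
ORDER BY RowID;
```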