Columnstore Indexes part 94 (“Use Partitioning Wisely”)
Continuation from the previous 93 parts, the whole series can be found at http://www.nikoport.com/columnstore/ .
This blog post will talk about some of the common problematic practices for the Columnstore Indexes and how to avoid or solve them.
It is not intended as a bashing of the Columnstore Indexes but as a guide of how to detect and resolve them. This is the first blog post in the series of the blog posts about the common problems and solutions.
The focus of this particular blog post is the Partitioning. An enterprise-feature only before the SQL Server 2016 with Service Pack 1, after which it became all-edition one, it has been one of the most beloved feature for any BI & Data Warehousing professionals working with big amounts of data, for managing and loading the data.
Some of the recent investments (specifically for the SQL Server 2016 the TRUNCATE statement supporting on the partition level and all the online rebuilds in the previous editions of the SQL Server) has enabled even wider range of the solutions with my personal hopes lying on the partition level statistics one fine day.
A lot of people are expecting the partitioning to bring some performance improvements by eliminating the partitions, and surely it happens a number of times but there are more quirks about Columnstore Indexes partitioning than one might initially think.
Let’s us start wit the setup first and for the tests, I will use my own generated copy of the TPCH database (1GB version), that I have done with the help of the HammerDB .
As usually, the location of my backup file is C:\Install and I will use the following script to restore and reconfigure the database on SQL Server 2014 & 2016 instances (with an obvious difference that the compatibility level will be set to 120 & 130 accordingly)/* * This script restores backup of the TPC-H Database from the C:\Install */ USE [master] if exists(select * fromsys.databaseswherename = 'tpch') begin alterdatabase [tpch] setSINGLE_USERWITHROLLBACKIMMEDIATE; end RESTOREDATABASE [tpch] FROMDISK = N'C:\Install\tpch_1gb_new.bak' WITHFILE = 1, NOUNLOAD,STATS = 1 alterdatabase [tpch] setMULTI_USER; GO GO ALTERDATABASE [tpch] SETCOMPATIBILITY_LEVEL = 130 GO USE [tpch] GO EXECdbo.sp_changedbowner @loginame = N'sa', @map = false GO USE [master] GO ALTERDATABASE [tpch] MODIFYFILE ( NAME = N'tpch',FILEGROWTH = 256152KB ) GO ALTERDATABASE [tpch] MODIFYFILE ( NAME = N'tpch_log', SIZE = 1200152KB , FILEGROWTH = 256000KB )
Let’s convert 2 biggest tables of the TPCH (lineitem & orders) to the columnstore, by creating the copies of the respective tables with Clustered Columnstore Indexes:USE [tpch] GO droptableif existsdbo.lineitem_cci; -- DataLoding SELECT [l_shipdate] ,[l_orderkey] ,[l_discount] ,[l_extendedprice] ,[l_suppkey] ,[l_quantity] ,[l_returnflag] ,[l_partkey] ,[l_linestatus] ,[l_tax] ,[l_commitdate] ,[l_receiptdate] ,[l_shipmode] ,[l_linenumber] ,[l_shipinstruct] ,[l_comment] intodbo.lineitem_cci FROM [dbo].[lineitem]; GO -- CreateClusteredColumnstoreIndex createclusteredcolumnstoreindexcci_lineitem_cci ondbo.lineitem_cci; USE [tpch] GO DROPTABLEIF EXISTSdbo.orders_cci; SELECT [o_orderdate] ,[o_orderkey] ,[o_custkey] ,[o_orderpriority] ,[o_shippriority] ,[o_clerk] ,[o_orderstatus] ,[o_totalprice] ,[o_comment] intodbo.orders_cci FROM [dbo].[orders]; createclusteredcolumnstoreindexcci_orders_cci ondbo.orders_cci; GO Partitioning
Remember kids Partitioning is NOT a PERFORMANCE IMPROVEMENT !
It is a data management improvement that might or might not deliver some improvements.
Partitioning your data wrongly with Columnstore Indexes will be a huge killer for your performance.
The issue here is when you are dealing with partitioning in Columnstore Indexes (especially with the Clustered Columnstore ones), you need to watch out for the size of the row groups and if they are too small (times less than 1048576 rows), your performance will suffer greatly.
For showing the result, I will create a copy the dbo.lineitem table where I shall partition it by day, making on average the size of each of the row groups to be equal to 2375 rows .
The following script will define the partition for each of the dates between 1st of January 1992 and 1st of January of 1999:DECLARE @bigStringNVARCHAR(MAX) = '', @partFunctionNVARCHAR(MAX); ;WITHcteAS ( SELECTCAST( '1 Jan 1992' AS DATE ) testDate UNIONALL SELECTDATEADD( day, 1, testDate ) FROMcte WHEREtestDate < '31 Dec 1998' ) SELECT @bigString += ',' + QUOTENAME( CONVERT ( VARCHAR, testDate, 106 ), '''' ) FROMcte OPTION ( MAXRECURSION 5000 ) SELECT @partFunction = 'CREATE PARTITION FUNCTION fn_DailyPartition (DATE) AS RANGE RIGHT FOR VALUES ( ' + cast(STUFF( @bigString, 1, 1, '' )as nvarchar(max)) + ' )' EXECsp_Executesql @partFunction CREATEPARTITIONSCHEMEps_DailyPartScheme AS PARTITIONfn_DailyPartition ALLTO ( [PRIMARY] );
Now let us create a partitioned copy of the dbo.lineitem table, for having less space on my disk drive I went with loading data into an existing Columnstore Index and compressing it with ALTER INDEX REORGANIZE (COMPRESS_ALL_ROW_GROUPS = ON):-- DataLoding SELECT [l_shipdate] ,[l_orderkey] ,[l_discount] ,[l_extendedprice] ,[l_suppkey] ,[l_quantity] ,[l_returnflag] ,[l_partkey] ,[l_linestatus] ,[l_tax] ,[l_commitdate] ,[l_receiptdate] ,[l_shipmode] ,[l_linenumber] ,[l_shipinstruct] ,[l_comment] intodbo.lineitem_cci_part FROM [dbo].[lineitem] where 1 = 0; GO -- CreateClusteredIndex createclusteredindexcci_lineitem_cci_part ondbo.lineitem_cci_part ( [l_shipdate] ) WITH (DATA_COMPRESSION = PAGE) ONps_DailyPartScheme( [l_shipdate] ); -- CreateClusteredColumnstoreIndex createclusteredcolumnstoreindexcci_lineitem_cci_part ondbo.lineitem_cci_part WITH (DROP_EXISTING = ON) ONps_DailyPartScheme( [l_shipdate] ); insertintodbo.lineitem_cci_part (l_shipdate, l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_returnflag, l_partkey, l_linestatus, l_tax, l_commitdate, l_receiptdate, l_shipmode, l_linenumber, l_shipinstruct, l_comment) SELECT [l_shipdate] ,[l_orderkey] ,[l_discount] ,[l_extendedprice] ,[l_suppkey] ,[l_quantity] ,[l_returnflag] ,[l_partkey] ,[l_linestatus] ,[l_tax] ,[l_commitdate] ,[l_receiptdate] ,[l_shipmode] ,[l_linenumber] ,[l_shipinstruct] ,[l_comment] FROM [dbo].[lineitem] alterindexcci_lineitem_cci_part ondbo.lineitem_cci_part reorganizewith (COMPRESS_ALL_ROW_GROUPS = ON);
Now I am ready to test a couple of simple queries, let’s start by simply showing the total discounts from our sales:setstatisticstime, ioon selectSUM(l_discount) fromdbo.lineitem_cci selectSUM(l_discount) fromdbo.lineitem_cci_part
Needless to say that looking at the execution plans you notice that the actual execution plan shows 10 times difference between them, even though both tables contain the very same data!
The query cost for the partitioned table is staggering it is around 10 times bigger ( ~8.8 ) vs ( ~0.81 ) for the first query.The execution times reflect in part this situation: 12 ms vs 91 ms . Non-partitioned table perform
本文数据库（mssql）相关术语:熊片数据库 mssql数据库 oracle数据库 pubmed数据库 access数据库 万方数据库
本文标题：Columnstore Indexes part 94 (“Use Partitioning Wisely”)