未加星标

Columnstore Indexes part 94 (“Use Partitioning Wisely”)

字体大小 | |
[数据库(mssql) 所属分类 数据库(mssql) | 发布者 店小二05 | 时间 2017 | 作者 红领巾 ] 0人收藏点击收藏

Continuation from the previous 93 parts, the whole series can be found at http://www.nikoport.com/columnstore/ .

This blog post will talk about some of the common problematic practices for the Columnstore Indexes and how to avoid or solve them.

It is not intended as a bashing of the Columnstore Indexes but as a guide of how to detect and resolve them. This is the first blog post in the series of the blog posts about the common problems and solutions.

The focus of this particular blog post is the Partitioning. An enterprise-feature only before the SQL Server 2016 with Service Pack 1, after which it became all-edition one, it has been one of the most beloved feature for any BI & Data Warehousing professionals working with big amounts of data, for managing and loading the data.

Some of the recent investments (specifically for the SQL Server 2016 the TRUNCATE statement supporting on the partition level and all the online rebuilds in the previous editions of the SQL Server) has enabled even wider range of the solutions with my personal hopes lying on the partition level statistics one fine day.

A lot of people are expecting the partitioning to bring some performance improvements by eliminating the partitions, and surely it happens a number of times but there are more quirks about Columnstore Indexes partitioning than one might initially think.

Let’s us start wit the setup first and for the tests, I will use my own generated copy of the TPCH database (1GB version), that I have done with the help of the HammerDB .

As usually, the location of my backup file is C:\Install and I will use the following script to restore and reconfigure the database on SQL Server 2014 & 2016 instances (with an obvious difference that the compatibility level will be set to 120 & 130 accordingly)

/* * This script restores backup of the TPC-H Database from the C:\Install */ USE [master] if exists(select * fromsys.databaseswherename = 'tpch') begin alterdatabase [tpch] setSINGLE_USERWITHROLLBACKIMMEDIATE; end RESTOREDATABASE [tpch] FROMDISK = N'C:\Install\tpch_1gb_new.bak' WITHFILE = 1, NOUNLOAD,STATS = 1 alterdatabase [tpch] setMULTI_USER; GO GO ALTERDATABASE [tpch] SETCOMPATIBILITY_LEVEL = 130 GO USE [tpch] GO EXECdbo.sp_changedbowner @loginame = N'sa', @map = false GO USE [master] GO ALTERDATABASE [tpch] MODIFYFILE ( NAME = N'tpch',FILEGROWTH = 256152KB ) GO ALTERDATABASE [tpch] MODIFYFILE ( NAME = N'tpch_log', SIZE = 1200152KB , FILEGROWTH = 256000KB )

Let’s convert 2 biggest tables of the TPCH (lineitem & orders) to the columnstore, by creating the copies of the respective tables with Clustered Columnstore Indexes:

USE [tpch] GO droptableif existsdbo.lineitem_cci; -- DataLoding SELECT [l_shipdate] ,[l_orderkey] ,[l_discount] ,[l_extendedprice] ,[l_suppkey] ,[l_quantity] ,[l_returnflag] ,[l_partkey] ,[l_linestatus] ,[l_tax] ,[l_commitdate] ,[l_receiptdate] ,[l_shipmode] ,[l_linenumber] ,[l_shipinstruct] ,[l_comment] intodbo.lineitem_cci FROM [dbo].[lineitem]; GO -- CreateClusteredColumnstoreIndex createclusteredcolumnstoreindexcci_lineitem_cci ondbo.lineitem_cci; USE [tpch] GO DROPTABLEIF EXISTSdbo.orders_cci; SELECT [o_orderdate] ,[o_orderkey] ,[o_custkey] ,[o_orderpriority] ,[o_shippriority] ,[o_clerk] ,[o_orderstatus] ,[o_totalprice] ,[o_comment] intodbo.orders_cci FROM [dbo].[orders]; createclusteredcolumnstoreindexcci_orders_cci ondbo.orders_cci; GO Partitioning

Remember kids Partitioning is NOT a PERFORMANCE IMPROVEMENT !

It is a data management improvement that might or might not deliver some improvements.

Partitioning your data wrongly with Columnstore Indexes will be a huge killer for your performance.

The issue here is when you are dealing with partitioning in Columnstore Indexes (especially with the Clustered Columnstore ones), you need to watch out for the size of the row groups and if they are too small (times less than 1048576 rows), your performance will suffer greatly.

For showing the result, I will create a copy the dbo.lineitem table where I shall partition it by day, making on average the size of each of the row groups to be equal to 2375 rows .

The following script will define the partition for each of the dates between 1st of January 1992 and 1st of January of 1999:

DECLARE @bigStringNVARCHAR(MAX) = '', @partFunctionNVARCHAR(MAX); ;WITHcteAS ( SELECTCAST( '1 Jan 1992' AS DATE ) testDate UNIONALL SELECTDATEADD( day, 1, testDate ) FROMcte WHEREtestDate < '31 Dec 1998' ) SELECT @bigString += ',' + QUOTENAME( CONVERT ( VARCHAR, testDate, 106 ), '''' ) FROMcte OPTION ( MAXRECURSION 5000 ) SELECT @partFunction = 'CREATE PARTITION FUNCTION fn_DailyPartition (DATE) AS RANGE RIGHT FOR VALUES ( ' + cast(STUFF( @bigString, 1, 1, '' )as nvarchar(max)) + ' )' EXECsp_Executesql @partFunction CREATEPARTITIONSCHEMEps_DailyPartScheme AS PARTITIONfn_DailyPartition ALLTO ( [PRIMARY] );

Now let us create a partitioned copy of the dbo.lineitem table, for having less space on my disk drive I went with loading data into an existing Columnstore Index and compressing it with ALTER INDEX REORGANIZE (COMPRESS_ALL_ROW_GROUPS = ON):

-- DataLoding SELECT [l_shipdate] ,[l_orderkey] ,[l_discount] ,[l_extendedprice] ,[l_suppkey] ,[l_quantity] ,[l_returnflag] ,[l_partkey] ,[l_linestatus] ,[l_tax] ,[l_commitdate] ,[l_receiptdate] ,[l_shipmode] ,[l_linenumber] ,[l_shipinstruct] ,[l_comment] intodbo.lineitem_cci_part FROM [dbo].[lineitem] where 1 = 0; GO -- CreateClusteredIndex createclusteredindexcci_lineitem_cci_part ondbo.lineitem_cci_part ( [l_shipdate] ) WITH (DATA_COMPRESSION = PAGE) ONps_DailyPartScheme( [l_shipdate] ); -- CreateClusteredColumnstoreIndex createclusteredcolumnstoreindexcci_lineitem_cci_part ondbo.lineitem_cci_part WITH (DROP_EXISTING = ON) ONps_DailyPartScheme( [l_shipdate] ); insertintodbo.lineitem_cci_part (l_shipdate, l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_returnflag, l_partkey, l_linestatus, l_tax, l_commitdate, l_receiptdate, l_shipmode, l_linenumber, l_shipinstruct, l_comment) SELECT [l_shipdate] ,[l_orderkey] ,[l_discount] ,[l_extendedprice] ,[l_suppkey] ,[l_quantity] ,[l_returnflag] ,[l_partkey] ,[l_linestatus] ,[l_tax] ,[l_commitdate] ,[l_receiptdate] ,[l_shipmode] ,[l_linenumber] ,[l_shipinstruct] ,[l_comment] FROM [dbo].[lineitem] alterindexcci_lineitem_cci_part ondbo.lineitem_cci_part reorganizewith (COMPRESS_ALL_ROW_GROUPS = ON);

Now I am ready to test a couple of simple queries, let’s start by simply showing the total discounts from our sales:

setstatisticstime, ioon selectSUM(l_discount) fromdbo.lineitem_cci selectSUM(l_discount) fromdbo.lineitem_cci_part
Columnstore Indexes   part 94 (“Use Partitioning Wisely”)

Needless to say that looking at the execution plans you notice that the actual execution plan shows 10 times difference between them, even though both tables contain the very same data!

The query cost for the partitioned table is staggering it is around 10 times bigger ( ~8.8 ) vs ( ~0.81 ) for the first query.

The execution times reflect in part this situation: 12 ms vs 91 ms . Non-partitioned table perform

本文数据库(mssql)相关术语:熊片数据库 mssql数据库 oracle数据库 pubmed数据库 access数据库 万方数据库

主题: SQLTIRESTSQL ServerSUMPAOPTUFFUCU
分页:12
转载请注明
本文标题:Columnstore Indexes part 94 (“Use Partitioning Wisely”)
本站链接:http://www.codesec.net/view/522075.html
分享请点击:


1.凡CodeSecTeam转载的文章,均出自其它媒体或其他官网介绍,目的在于传递更多的信息,并不代表本站赞同其观点和其真实性负责;
2.转载的文章仅代表原创作者观点,与本站无关。其原创性以及文中陈述文字和内容未经本站证实,本站对该文以及其中全部或者部分内容、文字的真实性、完整性、及时性,不作出任何保证或承若;
3.如本站转载稿涉及版权等问题,请作者及时联系本站,我们会及时处理。
登录后可拥有收藏文章、关注作者等权限...
技术大类 技术大类 | 数据库(mssql) | 评论(0) | 阅读(48)