未加星标

Columnstore Indexes part 95 (“Basic Query Patterns”)

字体大小 | |
[数据库(mssql) 所属分类 数据库(mssql) | 发布者 店小二04 | 时间 2017 | 作者 红领巾 ] 0人收藏点击收藏

Continuation from the previous 94 parts, the whole series can be found at http://www.nikoport.com/columnstore/ .

In this blog post I am continuing showing some of the solutions for the common problematic places for the Columnstore Indexes.

Here I will focus on 2 important aspects of the query execution plans that are using the Columnstore indexes:

Non-Hash Joins

Query construction (focus on the engine improvements)

This is not a complete list of the patterns that I am pretending to publish, more stuff will be added as soon, but these 2 are quite common in my experience.

Before advancing I will use my own generated copy of the TPCH database (1GB version), that I have done with the help of the HammerDB .

As usually, the location of my backup file is C:\Install and I will use the following script to restore and reconfigure the database on SQL Server 2016 instance by making sure that the compatibility level is set to 130:

/* * This script restores backup of the TPC-H Database from the C:\Install */ USE [master] if exists(select * fromsys.databaseswherename = 'tpch') begin alterdatabase [tpch] setSINGLE_USERWITHROLLBACKIMMEDIATE; end RESTOREDATABASE [tpch] FROMDISK = N'C:\Install\tpch_1gb_new.bak' WITHFILE = 1, NOUNLOAD,STATS = 1 alterdatabase [tpch] setMULTI_USER; GO GO ALTERDATABASE [tpch] SETCOMPATIBILITY_LEVEL = 130 GO USE [tpch] GO EXECdbo.sp_changedbowner @loginame = N'sa', @map = false GO USE [master] GO ALTERDATABASE [tpch] MODIFYFILE ( NAME = N'tpch',FILEGROWTH = 256152KB ) GO ALTERDATABASE [tpch] MODIFYFILE ( NAME = N'tpch_log', SIZE = 1200152KB , FILEGROWTH = 256000KB )

Let’s convert 2 biggest tables of the TPCH (lineitem & orders) to the columnstore, by creating the copies of the respective tables with Clustered Columnstore Indexes:

USE [tpch] GO droptableif existsdbo.lineitem_cci; -- DataLoding SELECT [l_shipdate] ,[l_orderkey] ,[l_discount] ,[l_extendedprice] ,[l_suppkey] ,[l_quantity] ,[l_returnflag] ,[l_partkey] ,[l_linestatus] ,[l_tax] ,[l_commitdate] ,[l_receiptdate] ,[l_shipmode] ,[l_linenumber] ,[l_shipinstruct] ,[l_comment] intodbo.lineitem_cci FROM [dbo].[lineitem]; GO -- CreateClusteredColumnstoreIndex createclusteredcolumnstoreindexcci_lineitem_cci ondbo.lineitem_cci; USE [tpch] GO DROPTABLEIF EXISTSdbo.orders_cci; SELECT [o_orderdate] ,[o_orderkey] ,[o_custkey] ,[o_orderpriority] ,[o_shippriority] ,[o_clerk] ,[o_orderstatus] ,[o_totalprice] ,[o_comment] intodbo.orders_cci FROM [dbo].[orders]; createclusteredcolumnstoreindexcci_orders_cci ondbo.orders_cci; GO Non-Hash Joins

The importance of having Hash Joins in the SQL Server with the Columnstore Indexes can not be overstated. Unless you are processing very small amounts of data out of the Columnstore Indexes (under 1000), the only join type you will want to see is the HASH JOIN.

The difference can be abysmal in the terms of the used resources and overall performance that it will take you down not only on the level of the rowstore indexes, but because you might be dealing with the

Consider running the following query which is returning top 100 orders and the sum of the top 10 line item volume sales for each of the orders:

setstatisticstime, ioon selectTOP 100 o_orderkey, o.o_orderdate, MAX(o.o_totalprice) as orderprice, SUM(Volume) as Top10VolumeSales fromdbo.orders_cci o crossapply ( selecttop 10 (l_extendedprice * (1 - l_discount)) as volume fromlineitem_ccili whereo_orderkey = li.l_orderkey orderby(l_extendedprice * (1 - l_discount)) desc ) f where o.o_orderstatus <> 'F' groupby o.o_orderdate, o_orderkey ORDERBYSUM(o.o_totalprice) DESC

On my VM with 4 cores it takes 33 seconds to execute this query on SQL Server 2016 with Service Pack 1, while it burns almost 48 seconds of the CPU Time .

The relevant part of the execution plan can be found below,


Columnstore Indexes   part 95 (“Basic Query Patterns”)

showing so many performance problems that this query is suffering, such as INNER LOOP JOIN, INDEX SPOOL, besides even worse part that is actually hidden and is identifiable only once you open the properties of any of the lower tree (left side of the LOOP JOIN), seeing that it all runs with the Row Execution Mode actually.


Columnstore Indexes   part 95 (“Basic Query Patterns”)
To show you the problem, on the left side you will find the properties of the sort iterator that is to be found in the lower (left) part of the LOOP Join that was executed around 770.000 times in the Row Execution Mode, effectively taking any chances away from this query to be executed in a fast way. One might argue that it might that it might be more effective to do the loop part in Row Mode, but given that we are sorting around 3.1 Million Rows there for me there is no doubt that it would be faster to do it within a Batch Execution Mode. Consulting the last sort iterator in the execution plan (TOP N SORT), you will find that it is running with the help of the Batch Execution Mode, even though it is processing around 770.000 rows.

One of the possible solutions that I see in such cases is to rewrite the query where possible without the CROSS APPLY part, thus enabling the query to be executed with the Batch Execution Mode completely by using HASH JOIN for the connection between the 2 tables:

setstatisticstime, ioon selectTOP 100 o.o_orderkey, o.o_orderdate, MAX(o.o_totalprice) as orderprice, SUM(Volume) as Top10Volume fromdbo.orders_cci o innerjoin ( selectROW_NUMBER() OVER (PARTITIONBYl_orderkeyORDERBYsum(l_extendedprice * (1 - l_discount)) desc) AS rowNumber, sum(l_extendedprice * (1 - l_discount)) as volume, li.l_orderkey fromlineitem_ccili groupbyl_orderkey ) f ono_orderkey = f.l_orderkey where o.o_orderstatus <> 'F' and rowNumber < 10 groupby o.o_orderdate, o.o_orderkey ORDERBYSUM(o.o_totalprice) DESC

Take a look at the relevant part of the execution plan below:


Columnstore Indexes   part 95 (“Basic Query Patterns”)

here we have every single iterator running in the Batch Execution Mode, making this query lightning fast to be executed under 1 seconds with just 2.8 seconds spent off the CPU time . There is no need to compare the performance of these 2 queries even though both of them are using Columnstore Indexes and the first one is using Batch Execution Mode in around 40% of the existing iterators.

Naturally it is not always possible to rewrite a complex DWH query in such a light way as in this example, but the idea here is to provide the tip that the LOOP join is typically not a good thing to be found within your execution plan with Columnstore Indexes when dealing with hundreds of thousands or millions of rows. There are times when Query Optimiser will select the LOOP JOIN because of the statistics distribution or any other possible reasons, and if there is nothing that you can do about it, then your natural option should be forcing the HASH JOIN with the OPTION (HASH JOIN) or HASH JOIN hints.

Another important aspect to notice here is that the CROSS APPLY does not automatically mean that the execution plan will use NESTED LOOP JOIN, please consider the notice the following query below which is using HASH JOINS and runs fast as it should:

selectTOP 100 o_orderkey, o.o_orderdate, MAX(o.o_totalprice) as orderprice, SUM(Volume) as

本文数据库(mssql)相关术语:熊片数据库 mssql数据库 oracle数据库 pubmed数据库 access数据库 万方数据库

主题: SQLCPURESTSUTISQL ServerMPAOPTEDI
分页:12
转载请注明
本文标题:Columnstore Indexes part 95 (“Basic Query Patterns”)
本站链接:http://www.codesec.net/view/522081.html
分享请点击:


1.凡CodeSecTeam转载的文章,均出自其它媒体或其他官网介绍,目的在于传递更多的信息,并不代表本站赞同其观点和其真实性负责;
2.转载的文章仅代表原创作者观点,与本站无关。其原创性以及文中陈述文字和内容未经本站证实,本站对该文以及其中全部或者部分内容、文字的真实性、完整性、及时性,不作出任何保证或承若;
3.如本站转载稿涉及版权等问题,请作者及时联系本站,我们会及时处理。
登录后可拥有收藏文章、关注作者等权限...
技术大类 技术大类 | 数据库(mssql) | 评论(0) | 阅读(42)