SQL Server v.Next : STRING_AGG Performance, Part 2

Category: Database (mssql) | Posted by: 店小二03 | Date: 2017 | Author: 红领巾

Last week, I made a couple of quick performance comparisons, pitting the new STRING_AGG() function against the traditional FOR XML PATH approach I've used for ages. I tested both undefined/arbitrary order as well as explicit order, and STRING_AGG() came out on top in both cases:

SQL Server v.Next : STRING_AGG() Performance, Part 1

For those tests, I left out several things (not all intentionally):

- Mikael Eriksson and Grzegorz Łyp both pointed out that I was not using the absolute most efficient FOR XML PATH construct (and to be clear, I never have).
- I did not perform any tests on Linux; only on Windows. I don't expect those to be vastly different, but since Grzegorz saw very different durations, this is worth further investigation.
- I also only tested when output would be a finite, non-LOB string, which I believe is the most common use case (I don't think people will commonly be concatenating every row in a table into a single comma-separated string, but this is why I asked in my previous post for your use case(s)).
- For the ordering tests, I did not create an index that might be helpful (or try anything where all the data came from a single table).

In this post, I'm going to deal with a couple of these items, but not all of them.

FOR XML PATH

I had been using the following:

... FOR XML PATH, TYPE).value(N'.[1]', ...

After this comment from Mikael, I have updated my code to use this slightly different construct instead:

... FOR XML PATH(''), TYPE).value(N'text()[1]', ...

Linux vs. Windows

Initially, I had only bothered to run tests on Windows:

Microsoft SQL Server vNext (CTP1.1) - 14.0.100.187 (X64)
Dec 10 2016 02:51:11
Copyright (C) 2016 Microsoft Corporation. All rights reserved.
Developer Edition (64-bit) on Windows Server 2016 Datacenter 6.3 (Build 14393: ) (Hypervisor)

But Grzegorz made a fair point that he (and presumably many others) only had access to the Linux flavor of CTP 1.1. So I added Linux to my test matrix:

Microsoft SQL Server vNext (CTP1.1) - 14.0.100.187 (X64)
Dec 10 2016 02:51:11
Copyright (C) 2016 Microsoft Corporation. All rights reserved.
on Linux (Ubuntu 16.04.1 LTS)
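For anyone reproducing this test matrix, the banners above are the output of @@VERSION. A minimal sketch of pulling the same details as separate columns, using the documented SERVERPROPERTY function, might look like this (the column aliases are just illustrative names):

```sql
-- Full version banner, in the same format as the output shown above.
SELECT @@VERSION;

-- Individual properties; each SERVERPROPERTY call returns sql_variant.
SELECT Edition        = SERVERPROPERTY('Edition'),
       ProductVersion = SERVERPROPERTY('ProductVersion'),
       ProductLevel   = SERVERPROPERTY('ProductLevel');
```

Running both on each platform makes it easy to confirm that the build number matches before comparing timings.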

Some interesting but completely tangential observations:

- @@VERSION doesn't show edition in this build, but SERVERPROPERTY('Edition') returns the expected Developer Edition (64-bit).
- Based on the build times encoded into the binaries, the Windows and Linux versions seem to now be compiled at the same time and from the same source. Or this was one crazy coincidence.

Unordered tests

I started by testing the arbitrarily ordered output (where there is no explicitly defined ordering for the concatenated values). Following Grzegorz, I used WideWorldImporters (Standard), but performed a join between Sales.Orders and Sales.OrderLines. The fictional requirement here is to output a list of all orders, and along with each order, a comma-separated list of each StockItemID.

Since StockItemID is an integer, we can use a defined varchar, which means the string can be 8000 characters before we have to worry about needing MAX. Since an int can be a max length of 11 (really 10, if unsigned), plus a comma, this means an order would have to support about 8,000/12 (666) stock items in the worst case scenario (e.g. all StockItemID values have 11 digits). In our case, the longest ID is 3 digits, so until data gets added, we would actually need 8,000/4 (2,000) unique stock items in any single order to justify MAX. In our case, there are only 227 stock items in total, so MAX isn't necessary, but you should keep an eye on that. If such a large string is possible in your scenario, you'll need to use varchar(max) instead of the default (STRING_AGG() returns nvarchar(max), but truncates to 8,000 bytes unless the input is a MAX type).
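The sizing estimate above can be checked against the data rather than taken on faith. The following is a rough sketch, assuming the same WideWorldImporters tables; the aliases MaxDigits and ListLength are illustrative names, not anything from the original tests:

```sql
-- Longest StockItemID, in characters, currently in the data.
SELECT MaxDigits = MAX(LEN(CONVERT(varchar(11), StockItemID)))
  FROM Sales.OrderLines;

-- Length of the longest comma-separated list any single order would produce:
-- each ID contributes its digits plus one comma, minus the one comma
-- that is not needed at the end of the list.
SELECT TOP (1) OrderID,
       ListLength = SUM(LEN(CONVERT(varchar(11), StockItemID)) + 1) - 1
  FROM Sales.OrderLines
  GROUP BY OrderID
  ORDER BY ListLength DESC;

-- If that length could approach 8,000, convert the input to a MAX type
-- so that the aggregate works with a LOB string.
SELECT o.OrderID,
       StockItemIDs = STRING_AGG(CONVERT(varchar(max), ol.StockItemID), ',')
  FROM Sales.Orders AS o
  INNER JOIN Sales.OrderLines AS ol
    ON o.OrderID = ol.OrderID
  GROUP BY o.OrderID;
```

The first two queries are cheap enough to re-run whenever the data changes, which is one way to "keep an eye on" whether MAX has become necessary.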

The initial queries (to show sample output, and to observe durations for single executions):

SET STATISTICS TIME ON;
GO

SELECT o.OrderID, StockItemIDs = STRING_AGG(ol.StockItemID, ',')
  FROM Sales.Orders AS o
  INNER JOIN Sales.OrderLines AS ol
    ON o.OrderID = ol.OrderID
  GROUP BY o.OrderID;
GO

SELECT o.OrderID,
  StockItemIDs = STUFF((SELECT ',' + CONVERT(varchar(11),ol.StockItemID)
       FROM Sales.OrderLines AS ol
       WHERE ol.OrderID = o.OrderID
       FOR XML PATH(''), TYPE).value(N'text()[1]',N'varchar(8000)'),1,1,'')
  FROM Sales.Orders AS o
  GROUP BY o.OrderID;
GO

SET STATISTICS TIME OFF;

/*
  Sample output:

    OrderID    StockItemIDs
    =======    ============
    1          67
    2          50,10
    3          114
    4          206,130,50
    5          128,121,155

  Important SET STATISTICS TIME metrics (SQL Server Execution Times):

  Windows:
    STRING_AGG:    CPU time =  217 ms,  elapsed time =  405 ms.
    FOR XML PATH:  CPU time = 1954 ms,  elapsed time = 2097 ms.

  Linux:
    STRING_AGG:    CPU time =  627 ms,  elapsed time =  472 ms.
    FOR XML PATH:  CPU time = 2188 ms,  elapsed time = 2223 ms.
*/

I ignored the parse and compile time data completely, as they were always exactly zero or close enough to be irrelevant. There were minor variances in the execution times for each run, but not much; the comments above reflect the typical delta in runtime (STRING_AGG seemed to take a little advantage of parallelism there, but only on Linux, while FOR XML PATH did not on either platform). Both machines had a single socket, quad-core CPU allocated, 8 GB of memory, out-of-the-box configuration, and no other activity.

Then I wanted to test at scale (simply a single session executing the same query 500 times). I didn't want to return all of the output, as in the above query, 500 times, since that would have overwhelmed SSMS and hopefully doesn't represent real-world query scenarios anyway. So I assigned the output to variables and just measured the overall time for each batch:

SELECT sysdatetime();
GO

DECLARE @i int, @x varchar(8000);
SELECT @i = o.OrderID, @x = STRING_AGG(ol.StockItemID, ',')
  FROM Sales.Orders AS o
  INNER JOIN Sales.OrderLines AS ol
    ON o.OrderID = ol.OrderID
  GROUP BY o.OrderID;
GO 500

SELECT sysdatetime();
GO

DECLARE @i int, @x varchar(8000);
SELECT @i = o.OrderID,
  @x = STUFF((SELECT ',' + CONVERT(varchar(11),ol.StockItemID)
       FROM Sales.OrderLines AS ol
       WHERE ol.OrderID = o.OrderID
       FOR XML PATH(''), TYPE).value(N'text()[1]',N'varchar(8000)'),1,1,'')
  FROM Sales.Orders AS o
  GROUP BY o.OrderID;
GO 500

SELECT sysdatetime();
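The script above prints timestamps that then have to be subtracted by hand. As a variation (not how the numbers in this post were collected), one way to have the elapsed time computed directly is to stash the start time in a temp table, since variables do not survive across GO batches but local temp tables do for the life of the session. The table #timing and its columns are hypothetical names used only for this sketch:

```sql
-- Hypothetical helper table; it persists across batches in this session.
CREATE TABLE #timing
(
  label   varchar(32)  NOT NULL,
  started datetime2(7) NOT NULL
);
GO

-- Record the start time just before the repeated batch.
INSERT #timing (label, started) VALUES ('STRING_AGG', SYSDATETIME());
GO

DECLARE @i int, @x varchar(8000);
SELECT @i = o.OrderID, @x = STRING_AGG(ol.StockItemID, ',')
  FROM Sales.Orders AS o
  INNER JOIN Sales.OrderLines AS ol
    ON o.OrderID = ol.OrderID
  GROUP BY o.OrderID;
GO 500

-- Elapsed wall-clock time for all 500 executions, in milliseconds.
SELECT label,
       elapsed_ms = DATEDIFF(MILLISECOND, started, SYSDATETIME())
  FROM #timing
  WHERE label = 'STRING_AGG';
GO

DROP TABLE #timing;
```

The same pattern can be repeated with a different label for the FOR XML PATH batch. Note that GO and GO 500 are client-side batch separators (SSMS or sqlcmd), so the measured time includes any client overhead between executions.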

I ran those tests three times, and the difference was profound: nearly an order of magnitude. Here is the average duration across the three tests:


[Figure: Average duration, in milliseconds, for 500 executions of variable assignment]

I tested a variety of other things this way as well, mostly to make sure I was covering the types of tests Grzegorz was running (without the LOB part).

Select
