未加星标

Lightning fast query performance with Azure SQL Data Warehouse

字体大小 | |
[系统(windows) 所属分类 系统(windows) | 发布者 店小二04 | 时间 2018 | 作者 红领巾 ] 0人收藏点击收藏

Azure SQL Data Warehouse is a fast, flexible and secure analytics platform for enterprises of all sizes. Today we announced significant query performance improvements for Azure SQL Data Warehouse (SQL DW) customers enabled through enhancements in the distributed query execution layer.

Analytics workload performance is determined by two major factors, I/O bandwidth to storage and repartitioning speed, also known as shuffle speed. In this previousblog post, we described how SQL DW caches relevant data to take advantage of NVMe based local storage. In this blog post, we will go under the hood of SQL DW, to see how the shuffling speed has improved.

Data movement is an operation where parts of the distributed tables are moved to different nodes during query execution. This operation is required where the data is not available on the target node, most commonly when the tables do not share the distribution key. The most common data movement operation is shuffle. During shuffle, for each input row, SQL DW computes a hash value using the join columns and then sends that row to the node that owns that hash value. Either one or both sides of join can participate in the shuffle. The diagram below displays shuffle to implement join between tables T1 and T2 where neither of the tables is distributed on the join column col2.


Lightning fast query performance with Azure SQL Data Warehouse

Another conventional data movement operation is the broadcast where table parts are copied from the source node to all the other SQL DW nodes, for example when joining a dimension and a fact table, the dimension table is commonly broadcast. SQL DW query optimizer chooses the appropriate data movement type to minimize the number of rows transferred.

Until now the data movement operations were done by SQL DW Data Movement Service (DMS) component, seen in the diagram below.


Lightning fast query performance with Azure SQL Data Warehouse

To implement shuffle, Data Movement Service copies rows out of SQL Server Engine, hashes them and sends them to the appropriate instance of DMS on other nodes, where DMS copies rows to the local temporary tables using SQL Server BulkCopy API. Reading rows out of SQL Server is a single threaded operation and can be a bottleneck.

SQL DW now integrates data movement directly into SQL Server engine. This integration allows SQL DW data movement to benefit from full multi-core parallelism available and use batch mode execution for data movement operations as well.

Combined with the use of Azure Accelerated Networking , SQL DW can move data up to one gigabyte a sec per node, significantly improving queries that join data from tables not aligned on their distribution keys. The diagram below shows the SQL DW operating shuffle using SQL DW instant data movement mode:


Lightning fast query performance with Azure SQL Data Warehouse

When SQL DW moves data in the instant mode, the intermediate results get produced and sent to all required nodes using all available CPU cores, thus taking advantage of the multi-core trend. Data is written in a compact batch-oriented row format directly to the remote node’s temporary database, with minimal per row overhead, as the data does not cross SQL Server Engine front-door which performs data validations which add a significant cost.

This capability is available now for all existing and new customers of Compute Optimized Gen2 tier of SQL DW. SQL DW instant data movement is used for queries that do not use external data; PolyBase queries remain using Data Movement Service. As data volumes and the need for faster insights grow, we remain committed to innovating to deliver the best possible query performance for all our customers.

Read more about how Azure SQL Data Warehouse offers industry-leading query performance in this GigaOm report . Get started with Azure SQL Data Warehouse forfree today.

本文系统(windows)相关术语:三级网络技术 计算机三级网络技术 网络技术基础 计算机网络技术

主题: SQLCPUSQL Server
tags: SQL,data,DW,Data,movement,shuffle,query,Azure
分页:12
转载请注明
本文标题:Lightning fast query performance with Azure SQL Data Warehouse
本站链接:https://www.codesec.net/view/577695.html


1.凡CodeSecTeam转载的文章,均出自其它媒体或其他官网介绍,目的在于传递更多的信息,并不代表本站赞同其观点和其真实性负责;
2.转载的文章仅代表原创作者观点,与本站无关。其原创性以及文中陈述文字和内容未经本站证实,本站对该文以及其中全部或者部分内容、文字的真实性、完整性、及时性,不作出任何保证或承若;
3.如本站转载稿涉及版权等问题,请作者及时联系本站,我们会及时处理。
登录后可拥有收藏文章、关注作者等权限...
技术大类 技术大类 | 系统(windows) | 评论(0) | 阅读(25)