By: Nai Biao Zhou || Related Tips:More >SQL Server 2017

Problem We live in a world of data. Data are the facts and figures that are collected to make a decision [1]. Companies who use Microsoft technologies usually store their data in SQL Server databases. To extract business values from these data, we usually apply statistical techniques. Statistics involves collecting, classifying, summarizing, organizing, analyzing, and interpreting data [2]. R, which has become the worldwide language for statistics [3], can bridge the gap between statistics and business intelligence development. Furthermore, The Microsoft platform enable us to work with SQL Server databases and R together [4]. With powerful statistical tools, how can we get started to use statistical methods to extract meaningful information from voluminous amount of data? Solution We are going to use data from the AdventureWorks sample database "AdventureWorks2017.bak" [5]. We should always start with asking research questions when we analyze data. Here are research questions needed to be addressed through this study: Did an employee have a different sales performance in 2013 from 2012? Was an employee sales performance impacted by the seasonal factors? Were postal codes of customer addresses in the database valid? We will use " R Tools for Visual Studio sample projects " [6] as a starting point. While investigating employee sales performance, we will go through procedures to create and publish a stored procedure by using R Tools for Visual Studio (RTVS) and use line graphs to interpret data. Then, we will compute mean and median of the all employee sales to measure the "central tendency", which are indicators of typical middle value of all employees’ performance [1]. In the end, we will use regular expression to test Canadian postal codes in the database.

The solution was tested with SQL Server Management Studio V17.4 and Microsoft Visual Studio Community 2017 on windows 10 Home 10.0 <X64>. The DBMS is Microsoft SQL Server 2017 Enterprise Edition (64-bit).

A First Look at RTVS Line graphs are commonly used to present changes in data over a period [1]. We are going to look at employee monthly sales and reveal the employee performance changes through a line graph. In the meanwhile, we will create a stored procedure by using RTVS. 1 - Work on the first R project 1.1 Download the sample project and Open the solution file "Examples.sln".

For using RTVS effectively, we use "Data Science Setting" as the Visual Studio setting. Before switching on this setting, I recommend that we save the current window layout. We can revert to the previous window layout after we have done data analysis. Figure 1 shows the menu item that we can use to save the window layout.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 1 - Save current window layout

To change the setting for using RTVS, click on the menu item shown in the Figure 2.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 2 - Switch to the data science settings Figure 3 shows the window layout for using RTVS. It is noteworthy that we should verify if a correct version of R is used when we have multiple versions installed. We can find the version number in the "Workspaces" panel or in the bottom right corner of the IDE. For the best experience, we should follow the instructions in [6] to run the R codes of the "1-Getting_Started_with_R.R" file line-by-line.
Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 3 - IDE window layout for data analysis 1.2 Add a database connection to the project

Click on the menu item "Add Database Connection…", as shown in Figure 4.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 4 - Add database connection to the project

Configure the connection properties, as shown in the Figure 5.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 5 - Configure the connection properties

Click on the "OK" button. An R script file "Settings.R" is automatically added to the project, as shown in Figure 6. To access the connection string, we should run the codes immediately, consequently save the connection string in the "setting" variable.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 6 - Application settings file 2 - Create a new stored procedure 2.1 Add a new stored procedure

Right-click on the "A first look at R" folder and select "Add > New Item" from the context menu. A pop-up window shows up as illustrated in Figure 7. Select "SQL Stored Procedure with R" template and name the procedure as "sp_employee_sales_monthly".


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 7 - Create a new stored procedure

In the "Solution Explore" panel shown in Figure 8, we find three files have been created. This allows us to work on R scripts and SQL scripts, separately.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 8 - R Tool for Visual Studio creates three files for one stored procedure 2.2 Write a SQL query to retrieve data from the database In this tip, we adopt SQL queries used in the SQL Server Reporting Services Product Samples [7]. Open the file "sp_employee_sales_monthly.Query.sql" and replace the content with the following SQL query: -- Place SQL query retrieving data for the R stored procedure here
-- Employee sales
DECLARE @EmployeeID int
SET @EmployeeID = 283
SELECT P.FirstName + SPACE(1) + P.LastName AS Employee,
DATEPART(Year, SOH.OrderDate) AS [Year],
DATEPART(Month, SOH.OrderDate) AS MonthNumber,
DATENAME(Month, SOH.OrderDate) AS [Month],
SUM(DET.LineTotal) AS Sales
FROM [Sales].[SalesPerson] SP
INNER JOIN [Sales].[SalesOrderHeader] SOH ON SP.[BusinessEntityID] = SOH.[SalesPersonID]
INNER JOIN Sales.SalesOrderDetail DET ON SOH.SalesOrderID = DET.SalesOrderID
INNER JOIN [Person].[Person] P ON P.[BusinessEntityID] = SP.[BusinessEntityID]
WHERE SOH.SalesPersonID = @EmployeeID
and DATEPART(Year, SOH.OrderDate) in (2012, 2013)
GROUP BY P.FirstName + SPACE(1) + P.LastName,
SOH.SalesPersonID,
DATEPART(Year, SOH.OrderDate), DATEPART(Month, SOH.OrderDate),
DATENAME(Month, SOH.OrderDate)

Click on the arrow button, as shown in Figure 9, to run the query. If we run the query first time, a pop-up window may show up and ask us to establish a database connection.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 9 - Run the SQL query in Visual Studio 2.3 Write R script to plot a multiple line graph for an employee

Open the file "sp_employee_sales_monthly.R". The RTVS included some testing codes in the file, as shown in the Figure 10. These testing codes provide us a method to load data from a SQL Server database.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 10 - Initial R codes in the new file Unc

本文数据库(mssql)相关术语:熊片数据库 mssql数据库 oracle数据库 pubmed数据库 access数据库 万方数据库

代码区博客精选文章
分页:12
转载请注明
本文标题:Getting Started with Data Analysis on the Microsoft Platform - Examining Data
本站链接:https://www.codesec.net/view/610642.html


1.凡CodeSecTeam转载的文章,均出自其它媒体或其他官网介绍,目的在于传递更多的信息,并不代表本站赞同其观点和其真实性负责;
2.转载的文章仅代表原创作者观点,与本站无关。其原创性以及文中陈述文字和内容未经本站证实,本站对该文以及其中全部或者部分内容、文字的真实性、完整性、及时性,不作出任何保证或承若;
3.如本站转载稿涉及版权等问题,请作者及时联系本站,我们会及时处理。
登录后可拥有收藏文章、关注作者等权限...
技术大类 技术大类 | 数据库(mssql) | 评论(0) | 阅读(101)