未加星标

Delete Files with SQL Server 2016 R Logic in T-SQL Code

字体大小 | |
[数据库(mssql) 所属分类 数据库(mssql) | 发布者 店小二04 | 时间 2017 | 作者 红领巾 ] 0人收藏点击收藏
Problem How can I delete files older than [X] days in a folder using T-SQL?

This is an old requirement, as old as SQL Server 7 (or even earlier). The typical T-SQL solution is to use xp_cmdshell or an undocumented stored procedure xp_delete_file.

These options work, but there are a number of items to consider:

xp_cmdshell may not be allowed as per company security policy or auditing requirements. Undocumented stored procedures are never recommended for use in production environment as you never know whether your next patch will break it. For complex deletion logic such as deleting files older than 3 days, but only when file size is bigger than 1MB and the filename contains the string 'temp' are very difficult to program using the xp_cmdshell or xp_delete_file options.

Another question is how to list files in a folder?

The typical solution is to use xp_cmdshell or write a CLR stored procedure/function, but CLR integration is considered relatively more difficult to develop and maintain than a T-SQL script.

Solution

Starting with SQL Server 2016, we are able to use the R language directly inside SQL Server via sp_execute_external_script.

The assumption here is that you have installed R service (in-database). For detailed installation steps, please check this MSDN document.

Before plugging R code into a SQL script, it is best to test the R script in a dedicated R IDE. I personally prefer RStudio Desktop IDE , and it is free.

Once the R service is installed and configured, we can prepare the demo case.

Let's say I have a folder with a few files in it as shown below.


Delete Files with SQL Server 2016 R Logic in T-SQL Code

Figure 1 - Sample Directory

Native R Script Demonstration

All the R code below is run in the RStudio IDE.

List Files with R

I will run the following script in RStudio to do the following tasks:

List files in a folder List files whose names match a condition List files whose property, such as size, creation time, matches a condition List files with multiple conditions

All the script functions are detailed in the comments below:

#1. list files in a folder
dir(path="c:/MyTest", full.names = TRUE) #R is case-sensitive, so True <> TRUE
#2.1. list files with names containing "temp" literal value, pattern parm accepts RegEx
dir(path="c:/MyTest", pattern = "temp") #R is case-sensitive, so True <> TRUE
#2.2. list files with names containing number(s) by using regular expression \d
dir(path="c:/MyTest", pattern = "\\d", full.names = T) # T means TRUE
#3. list files created more than 7 days ago.
fi <- file.info(dir(path="c:/Mytest", full.names = T));
fi[difftime(Sys.time(), fi[,"ctime"], units="days") > 7, c( 'ctime', 'size')]
#4. list files created within 7 days and size bigger than 1000 bytes
fi <- file.info(dir(path="c:/Mytest", full.names = T));
fi[difftime(Sys.time(), fi[,"ctime"], units="days") <= 7 & fi['size'] > 1000, c('ctime', 'size')]

The execution result is as follows in the RStudio IDE, you can check the result against the info in Figure 1.


Delete Files with SQL Server 2016 R Logic in T-SQL Code

Figure 2 - Results of List Files R Code

Delete Files with R

Now we will look at how to delete files. The following script tasks are:

Delete a file Delete multiple files based on a criteria Delete a whole directory

Here is the script:

#1. del some specified files
file.remove("c:/Mytest/result.jpg", "c:/mytest/mytext_1.txt")
# or you can do
unlink(c("c:/Mytest/result.jpg", "c:/mytest/mytext_1.txt"));
#2.1. del files whose names contain "temp"
file.remove(dir(path="c:/MyTest/", pattern = "temp", full.names = T))
# or you can do this way
unlink("c:/Mytest/*temp*.*");
#2.2. del files older than 7 days
# first find files that are older than 7 days
fi <- file.info(dir(path="c:/Mytest", full.names = T));
fi <- fi[difftime(Sys.time(), fi[,"mtime"], units="days") > 7, 1:3];
file.remove(rownames(fi));
#or you can do this way to replace the file.remove()
unlink(rownames(fi));
#3. to delete the whole directory c:/Mytest, including its sub-folders if any
unlink("c:/mytest", recursive = T, force = T)

In summary, if we want to find or delete a file based on its properties, i.e. size, creation time or modification time or others, we need to use file.info function to filter out what we need, then we can work on the result.

Using R Code in T-SQL

Now we will embed the above R scripts into T-SQL via sp_exec_external_script. For more examples, please see MSDN document .

List files with R in T-SQL

Let's look at the t-sql code first

--1. list files in a folder
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'file_list <- as.data.frame(dir(path="c:/MyTest", full.names = TRUE));'
, @output_data_1_name = N'file_list'
with result sets (([AllFiles] varchar(256)))
go
--2. list files with names containing "temp" literal value, pattern parm accepts RegEx
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'file_list <- as.data.frame(dir(path="c:/MyTest", pattern = "temp"))'
, @output_data_1_name = N'file_list'
with result sets (([FileName_Has_Temp] varchar(256)))
go
--3. list files with names containing number(s)
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'file_list <- as.data.frame(dir(path="c:/MyTest", pattern = "\\d", full.names = T))'
, @output_data_1_name = N'file_list'
with result sets (([FileName_Has_Num] varchar(256)))
go
--4. list files created more than 7 days ago.
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'fi <- file.info(dir(path="c:/Mytest/", full.names = T));
fi <- fi[difftime(Sys.time(), fi[,"ctime"], units="days") > 7, c("ctime", "size")];
fi["Name"] <- rownames(fi);
file_list <- as.data.frame(fi[c("Name", "ctime")]);'
, @output_data_1_name = N'file_list'
with result sets (([File_7+Days_Old] varchar(256), Create_Time datetime))
go
--5. list files created within 7 days and size bigger than 1000 bytes
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'fi <- file.info(dir(path="c:/Mytest2", full.names = T));fi <- fi[difftime(Sys.time(), fi[,"ctime"], units="days") <= 7 & fi["size"] > 1000, c("ctime", "size")];
fi["name"] <- rownames(fi)
file_list <- as.data.frame(fi[c("name", "ctime","size")])'
, @output_data_1_name = N'file_list'
with result sets (([File < 7Days with Size > 1000] varchar(256), ctime datetime, size int))
go

If you run the script in a SSMS window that connects to a SQL Server 2016 instance with R service installed, you will get the following results:

]
Delete Files with SQL Server 2016 R Logic in T-SQL Code

If we compare the R code embedded in T-SQL with the native R code examples we listed before, we can see there are some differences. In T-SQL, we need to use as.data.frame function to convert the result into a Data Frame structure before the R result can be displayed by T-SQL, otherwise, an error like the following will occur:

Msg 39004, Level 16, State 20, Line 1
A 'R' script error occurred during execution of 'sp_execute_external_script' with HRESULT 0x80004004.
Msg 39019, Level 16, State 1, Line 1
An external script error occurred:
Error: The output dataset must be of type data frame; consider using data.frame() to convert it. Delete files with R in T-SQL

Deleting files does not need to return any result set, so we do not need as.data.frame in R code

--1. delete specific files
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'file.remove("c:/Mytest/result.jpg", "c:/mytest/mytext_1.txt");';
-- or
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'unlink(c("c:/Mytest/result.jpg", "c:/mytest/mytext_1.txt"))'
go
-- 2. delete files with names containing "temp"
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'file.remove(dir(path="c:/MyTest/", pattern = "temp", full.names = T))'
-- or
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'unlink("c:/Mytest/*temp*.*")'
go
-- 3. delete files older than 7 days
-- first find files that are older than 7 days
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'
fi <- file.info(dir(path="c:/Mytest", full.names = T));
fi <- fi[difftime(Sys.time(), fi[,"ctime"], units="days") > 7, c("ctime", "size")];
file.remove(rownames(fi)) # or use this: unlink(rownames(fi));'
go

After running all the deletion statements in SSMS, all files will be gone under C:\MyTest folder.

A tip here is that when I refer to a folder, I use / (forward slash) instead of \ (back slash), because \ is an escape letter in R, so if we refer to c:\Mytest in R code, we should write c:\\Mytest. To save time, we can simply code it as c:/Mytest

Summary

In this tip, we have discussed how to use the R language to do file listing and deletion, this actually is a good replacement for a CLR stored procedure.

All the code in this tip was tested in an environment using SQL Server 2016 Developer edition on a windows 10 computer.

R language is mainly for data science, but it is powerful enough for DBAs to use it innovatively for daily work. To allow T-SQL to embed R code directly, Microsoft starts to treat T-SQL more like a first class citizen in the MS language portfolio.

Sp_execute_external_script has a parameter @language, currently the only valid value is 'R', but it is reasonable to guess that in the future this stored procedure will support other languages, such as python, javascript, both of which are popular, cross-platform languages, just like R. Coincidently, SQL Server vNext will run across multi-platforms as well.

With PowerShell to be run in linux/Mac soon, I hope T-SQL can also embed PowerShell code someday, if all this become true, TSQL itself will be both a language and a platform for administration and development tasks, and CLR integration may be minimized due to its complexity and cost for maintenance and development.

Next Steps

R language opens a door for both DBAs and database developers, and we may do lots of things differently yet more efficiently with R, so it is really worthwhile to dig deeper into the R language.

Please check the following for further information:

SQL Server 2016 R Services: Executing R code in SQL Server SQL Server 2016 R Services: Guide for Client Configuration Unable to communicate with the runtime for 'R' script in SQL Server SQL Server 2016 R Services: Executing R code in Revolution R Enterprise SQL Server R Services Tutorials

Last Update: 2/8/2017


Delete Files with SQL Server 2016 R Logic in T-SQL Code
Delete Files with SQL Server 2016 R Logic in T-SQL Code
About the author
Delete Files with SQL Server 2016 R Logic in T-SQL Code
Jeffrey Yao is a senior SQL Server consultant, striving to automate DBA work as much as possible to have more time for family, life and more automation. View all my tips

Related Resources

More Database Developer Tips...

本文数据库(mssql)相关术语:熊片数据库 mssql数据库 oracle数据库 pubmed数据库 access数据库 万方数据库

主题: SQLSQL ServerPowerShellSDNUTCUJavaScriptWindowsJavaLinux
分页:12
转载请注明
本文标题:Delete Files with SQL Server 2016 R Logic in T-SQL Code
本站链接:http://www.codesec.net/view/532158.html
分享请点击:


1.凡CodeSecTeam转载的文章,均出自其它媒体或其他官网介绍,目的在于传递更多的信息,并不代表本站赞同其观点和其真实性负责;
2.转载的文章仅代表原创作者观点,与本站无关。其原创性以及文中陈述文字和内容未经本站证实,本站对该文以及其中全部或者部分内容、文字的真实性、完整性、及时性,不作出任何保证或承若;
3.如本站转载稿涉及版权等问题,请作者及时联系本站,我们会及时处理。
登录后可拥有收藏文章、关注作者等权限...
技术大类 技术大类 | 数据库(mssql) | 评论(0) | 阅读(14)