
Export from Hadoop File System to a SQL Server Database Table

Problem

I need to export data from the Hadoop Distributed File System (HDFS) to a SQL Server database table. How can I do this?

Solution

Apache Sqoop supports importing data from a database such as SQL Server into the HDFS, as well as exporting data from the HDFS to a database table. In this tip we will use Apache Sqoop's export functionality to move data stored on the HDFS to a SQL Server database table. As with previous tips in this series, this tip was written using SQL Server 2014 and a Cloudera virtual machine running on a laptop.

The contents of the file on the HDFS are shown below. There are three columns in this comma-separated value file. The first column is a unique integer identifier. The second column is a string used for a description. The third column is the unit cost. The destination table will have columns to accommodate these three columns and their data types.

[[email protected]:/mssqltips]$ hdfs dfs -cat mssqlTips/linuxDataFile.csv
1,Product A,1.01
2,Product B,2.02
3,Product C,3.03
4,Product D,4.04
5,Product E,5.05
6,Product F,6.06
7,Product G,7.07
8,Product H,8.08
9,Product I,9.09
10,Product J,10.10
11,Product K,11.11
12,Product L,12.12
13,Product M,13.13
14,Product N,14.14
15,Product O,15.15
16,Product P,16.16
17,Product Q,17.17
18,Product R,18.18
19,Product S,19.19
20,Product T,20.20
21,Product U,21.21
22,Product V,22.22
23,Product W,23.23
24,Product X,24.24
25,Product Y,25.25
26,Product Z,26.26
[[email protected]:/mssqltips]$
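The tip does not show how this file was created or loaded into the HDFS. If you want to reproduce the input, a sketch along the following lines would generate the same 26 rows and copy them into the mssqlTips directory; the generation loop is our own illustration, not part of the original tip:

# generate the 26 sample rows (Product A through Product Z)
i=0
for letter in {A..Z}; do
    i=$((i + 1))
    printf '%d,Product %s,%d.%02d\n' "$i" "$letter" "$i" "$i"
done > linuxDataFile.csv

# copy the file into the mssqlTips directory on the HDFS
hdfs dfs -mkdir -p mssqlTips
hdfs dfs -put linuxDataFile.csv mssqlTips/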

The T-SQL below was used to create the destination table for this tip.

create table dbo.tblProductData
(
    ProductKey int not null PRIMARY KEY,
    ProductName varchar(50),
    UnitCost money
)
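The DDL above can be run from SQL Server Management Studio. Alternatively, from a machine with the SQL Server command-line tools installed, an invocation like the sketch below would work; the server address and port are the same placeholders used later in the Sqoop connect string, and adminLogin is a hypothetical login with DDL rights (the sqoopUser account used for the export is only a member of db_datawriter):

# create the destination table from the command line; adminLogin is hypothetical
sqlcmd -S aaa.bbb.ccc.ddd,pppp -d AdventureWorks2014 -U adminLogin \
    -Q "create table dbo.tblProductData (ProductKey int not null PRIMARY KEY, ProductName varchar(50), UnitCost money)"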

The image below shows the destination table in the AdventureWorks2014 database as displayed in the Object Explorer.

[Image: the dbo.tblProductData table displayed in the Object Explorer]

The Sqoop command we will examine is listed below, and we will break down each argument in the bullet points that follow. Please note that the command must be entered on one complete line, or with a backslash (the Linux command-line continuation character) at the end of each line except the last.

sqoop export --connect 'jdbc:sqlserver://aaa.bbb.ccc.ddd:pppp;databasename=AdventureWorks2014' --username 'sqoopLogin' -P --export-dir 'mssqlTips' --table 'tblProductData' -- --schema dbo

- sqoop export: The executable is named sqoop, and we are instructing it to export the data from an HDFS file to a database table.
- --connect: With the --connect argument, we pass in the JDBC connect string for SQL Server. In this case, we use the IP address, port number, and database name.
- --username: In this example, the user name is a SQL Server login, not a Windows login, because our database is set up to authenticate in mixed mode. We have a server login named sqoopLogin, and we have a database user named sqoopUser, which is a member of the db_datawriter role and has a default schema of dbo.
- -P: This prompts the command-line user for the password. If Sqoop is rarely executed, this might be a good option. There are multiple other ways to pass the password to the command automatically, but we are trying to keep it simple for this tip.
- --export-dir: The export directory tells Sqoop in which directory on the HDFS the file is stored. This argument is required by Sqoop when using the export option.
- --table: This argument provides Sqoop with the destination table in the SQL Server database. It is also required when using the export option.
- --: This is not a typographical error. The lone double dash tells Sqoop that the arguments that follow are database specific.
- --schema: This argument specifies the schema in which our table resides.
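In the output that follows, the command is executed via a shell script named sqoopExportCommand.sh. Only the script's name appears in this tip, so the body below is a sketch of what such a wrapper might contain, using backslash continuations and the same placeholder IP address and port:

#!/bin/bash
# sqoopExportCommand.sh: assumed wrapper body; only the script name
# is given in the tip, and aaa.bbb.ccc.ddd:pppp is a placeholder
sqoop export \
    --connect 'jdbc:sqlserver://aaa.bbb.ccc.ddd:pppp;databasename=AdventureWorks2014' \
    --username 'sqoopLogin' \
    -P \
    --export-dir 'mssqlTips' \
    --table 'tblProductData' \
    -- --schema dbo

If you later need unattended execution, Sqoop releases from 1.4.4 onward also accept a --password-file argument in place of -P; check the documentation for your Sqoop version before relying on it.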

The output from the Sqoop command is shown below.

[[email protected]:/mssqltips]$ ./sqoopExportCommand.sh
16/09/19 12:03:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0
Enter password:
16/09/19 12:03:25 INFO manager.SqlManager: Using default fetchSize of 1000
16/09/19 12:03:25 INFO manager.SQLServerManager: We will use schema dbo
16/09/19 12:03:25 INFO tool.CodeGenTool: Beginning code generation
16/09/19 12:03:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [dbo].[tblProductData] AS t WHERE 1=0
16/09/19 12:03:26 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
Note: /tmp/sqoop-training/compile/3eaf9ad50f5ebf5ef893fba0e41859b1/tblProductData.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/09/19 12:03:29 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-training/compile/3eaf9ad50f5ebf5ef893fba0e41859b1/tblProductData.jar
16/09/19 12:03:29 INFO mapreduce.ExportJobBase: Beginning export of tblProductData
16/09/19 12:03:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/09/19 12:03:32 INFO input.FileInputFormat: Total input paths to process : 1
16/09/19 12:03:32 INFO input.FileInputFormat: Total input paths to process : 1
16/09/19 12:03:33 INFO mapred.JobClient: Running job: job_201609071028_0005
16/09/19 12:03:34 INFO mapred.JobClient: map 0% reduce 0%
16/09/19 12:03:48 INFO mapred.JobClient: map 25% reduce 0%
16/09/19 12:03:49 INFO mapred.JobClient: map 50% reduce 0%
16/09/19 12:04:00 INFO mapred.JobClient: map 100% reduce 0%
16/09/19 12:04:02 INFO mapred.JobClient: Job complete: job_201609071028_0005
16/09/19 12:04:02 INFO mapred.JobClient: Counters: 24
16/09/19 12:04:02 INFO mapred.JobClient: File System Counters
16/09/19 12:04:02 INFO mapred.JobClient: FILE: Number of bytes read=0
16/09/19 12:04:02 INFO mapred.JobClient: FILE: Number of bytes written=1185776
16/09/19 12:04:02 INFO mapred.JobClient: FILE: Number of read operations=0
16/09/19 12:04:02 INFO mapred.JobClient: FILE: Number of large read operations=0
16/09/19 12:04:02 INFO mapred.JobClient: FILE: Number of write operations=0
16/09/19 12:04:02 INFO mapred.JobClient: HDFS: Number of bytes read=1778
16/09/19 12:04:02 INFO mapred.JobClient: HDFS: Number of bytes written=0
16/09/19 12:04:02 INFO mapred.JobClient: HDFS: Number of read operations=16
16/09/19 12:04:02 INFO mapred.JobClient: HDFS: Number of large read operations=0
16/09/19 12:04:02 INFO mapred.JobClient: HDFS: Number of write operations=0
16/09/19 12:04:02 INFO mapred.JobClient: Job Counters
16/09/19 12:04:02 INFO mapred.JobClient: Launched map tasks=4
16/09/19 12:04:02 INFO mapred.JobClient: Data-local map tasks=4
16/09/19 12:04:02 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=47822
16/09/19 12:04:02 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
16/09/19 12:04:02 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
16/09/19 12:04:02 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
16/09/19 12:04:02 INFO mapred.JobClient: Map-Reduce Framework
16/09/19 12:04:02 INFO mapred.JobClient: Map input records=26
16/09/19 12:04:02 INFO mapred.JobClient: Map output records=26
16/09/19 12:04:02 INFO mapred.JobClient: Input split bytes=576
16/09/19 12:04:02 INFO mapred.JobClient: Spilled Records=0
16/09/19 12:04:02 INFO mapred.JobClient: CPU time spent (ms)=4580
16/09/19 12:04:02 INFO mapred.JobClient: Physical memory (bytes) snapshot=452489216
16/09/19 12:04:02 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2948849664
16/09/19 12:04:02 INFO mapred.JobClient: Total committed heap usage (bytes)=127401984
16/09/19 12:04:02 INFO mapreduce.ExportJobBase: Transferred 1.7363 KB in 31.5282 seconds (56.394 bytes/sec)
16/09/19 12:04:02 INFO mapreduce.ExportJobBase: Exported 26 records.
[[email protected]:/mssqltips]$

Notice that the last line of output above shows 26 records were exported, which corresponds to the 26 records in the file on the HDFS. When we query the count of rows in our database table, we see that 26 rows are present, as displayed in the image below along with the first 17 records returned by the query.


[Image: query results showing the row count of 26 and the first 17 rows of dbo.tblProductData]
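The same check can be reproduced from the command line. A query along these lines returns the row count and the first 17 rows; as above, the address and port are placeholders, and a login with read access is required (sqoopUser is only a member of db_datawriter, so adminLogin here is again hypothetical):

# verify the export: row count plus the first 17 rows
sqlcmd -S aaa.bbb.ccc.ddd,pppp -d AdventureWorks2014 -U adminLogin \
    -Q "SELECT COUNT(*) AS RowCnt FROM dbo.tblProductData; SELECT TOP 17 ProductKey, ProductName, UnitCost FROM dbo.tblProductData ORDER BY ProductKey;"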
Next Steps

Please experiment with different data types. Also, check out these other tips and tutorials on big data and Sqoop on MSSQLTips.com:

- SQL Server Big Data Tips
- Sqoop Runtime Exception: Cannot Load SQL Server Driver
- Use Sqoop to Load Data from a SQL Server Table to a Hadoop Distributed File System
- Using Sqoop WHERE Argument to Filter Data from a SQL Server
- Our complete tutorial list

Last Update: 10/10/2016


About the author
Dr. Dallas Snider is an Assistant Professor in the Computer Science Department at the University of West Florida and has 18+ years of SQL experience. View all my tips

Related Resources

More SQL Server DBA Tips...
