Hadoop Submarine Adds Deep Learning To Hadoop


There's a new Hadoop project for building deep learning frameworks, like TensorFlow, on Apache Hadoop. Hadoop Submarine has integrations with Zeppelin and Azkaban for running jobs.

Hadoop is a framework that can be used to process large data sets across clusters of computers using simple programming models. The new project aims to improve the support for using deep learning to analyze Hadoop data.


The aim of Hadoop Submarine is to make it easier to launch, manage and monitor distributed deep learning/machine learning applications created in frameworks such as TensorFlow. Related improvements arriving alongside Submarine include better GPU support, Docker container support, container-DNS support, and improved scheduling.

The developers say these improvements make it as easy to run distributed deep learning/machine learning applications on Apache Hadoop YARN as it is to run them locally. Users will be able to run deep learning workloads alongside other ETL and streaming jobs on the same cluster.

The Submarine project has two parts: the Submarine computation engine, and a set of Submarine ecosystem integration plugins and tools.


The computation engine submits customized deep learning applications (written with frameworks like TensorFlow, PyTorch, etc.) to YARN from the command line. These applications run side by side with other applications on YARN, such as Apache Spark and Hadoop MapReduce.
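
As a rough illustration of that command-line style, a submission looks something like the sketch below, which follows the shape of the project's early quick-start documentation; the jar name, flag names, Docker image, and HDFS paths are placeholders and may differ between Hadoop releases.

    # Hedged sketch: submit a distributed TensorFlow job to YARN via Submarine.
    # Image name, paths, and launch commands are placeholders.
    yarn jar hadoop-yarn-applications-submarine-<version>.jar job run \
      --name tf-job-001 \
      --docker_image <your-tf-docker-image> \
      --input_path hdfs://default/dataset/cifar-10-data \
      --checkpoint_path hdfs://default/tmp/cifar-10-jobdir \
      --num_workers 2 \
      --worker_resources memory=8G,vcores=2,gpu=1 \
      --worker_launch_cmd "python train.py ..." \
      --num_ps 1 \
      --ps_resources memory=4G,vcores=2 \
      --ps_launch_cmd "python train.py --job_name=ps ..." \
      --tensorboard

YARN then schedules the worker and parameter-server containers alongside whatever Spark or MapReduce jobs are already running on the cluster.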

A set of integrations sits on top of the computation engine. The current list covers integration between Submarine and Zeppelin, and between Submarine and Azkaban.

The Zeppelin integration means data scientists can code inside Zeppelin notebooks and submit and manage training jobs directly from the notebook. Zeppelin is a web-based notebook that supports interactive data analysis via SQL, Scala, and Python, and is designed for creating data-driven, interactive, collaborative documents. Zeppelin ships more than 20 interpreters, covering products such as Spark, Hive, Cassandra, Elasticsearch, Kylin, and HBase, which can be used for collecting data, cleaning data, feature extraction and so on; these steps run first, so that the machine learning model training works on clean data.
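
For example, a data-preparation paragraph in a notebook might look like the following sketch, using Zeppelin's standard %spark.pyspark interpreter; the dataset paths and the cleanup steps are hypothetical.

    %spark.pyspark
    # Hypothetical cleanup step: read raw events, drop incomplete rows,
    # and write a clean training set back to HDFS for the training job to consume.
    raw = spark.read.csv("hdfs:///dataset/raw/events.csv", header=True)
    clean = raw.dropna()
    clean.write.mode("overwrite").parquet("hdfs:///dataset/clean/events")

A later paragraph in the same notebook can then submit the Submarine training job against the cleaned data.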

Azkaban is a batch workflow scheduling service, developed at LinkedIn to run Hadoop jobs. Azkaban resolves the ordering through job dependencies and provides an easy-to-use web user interface to maintain and track workflows. The integration with Submarine means a data scientist can submit a set of tasks with dependencies directly to Azkaban from notebooks.
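
To illustrate the dependency model, Azkaban's classic .job file format expresses ordering declaratively; the job names and commands below are hypothetical.

    # prepare_data.job -- runs first
    type=command
    command=python prepare_data.py

    # train_model.job -- runs only after prepare_data succeeds
    type=command
    dependencies=prepare_data
    command=yarn jar hadoop-yarn-applications-submarine-<version>.jar job run ...

Azkaban walks this dependency graph and surfaces the progress of each step in its web UI.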

The developers say that the overall goal of the Hadoop Submarine project is to provide full service support for deep learning algorithms: data (acquisition, processing, and cleaning), algorithms (interactive, visual programming and tuning), resource scheduling, algorithm model publishing, and job scheduling.

The use of Zeppelin takes care of the data and the algorithms, while adding Azkaban handles the job scheduling. The plan is that the three-piece toolset of Zeppelin, Hadoop Submarine, and Azkaban will provide an open, ready-to-use deep learning development platform.

