
Hadoop Weekly Issue #189


02 October 2016

Strata + Hadoop World was this week, so this issue is full of news and releases. Highlights include a new version of the ODPi runtime, a new R interface for Spark, and a new version of the Confluent Platform with more enterprise features. In technical articles, there's a great overview of best practices for long-running Spark Streaming jobs on YARN, and an introduction to a new graph computation framework from the folks at Berkeley AMPLab.

Technical

GraphTau is a new programming model for graph computation on changing graphs. Developed by the folks at Berkeley AMPLab, it's built on Spark's GraphX. Using a "pause-shift-resume" pattern that takes advantage of graph snapshots, it can greatly reduce the amount of computation needed when a graph changes.

https://blog.acolyer.org/2016/09/26/time-evolving-graph-processing-at-scale/
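To make the "pause-shift-resume" idea concrete, here is a toy sketch in plain Python. It is not GraphTau's API and does not use GraphX; the function name, the example graph, and the choice of PageRank are all assumptions made for illustration. The idea shown is that when a new snapshot arrives, the computation resumes from the previous snapshot's ranks instead of restarting from a uniform guess, which can converge in fewer iterations when the change is small.

```python
# Toy illustration only: NOT GraphTau's API. All names here are hypothetical.

def pagerank(edges, nodes, ranks=None, damping=0.85, tol=1e-8, max_iter=1000):
    """Iterate PageRank to convergence and return (ranks, iterations_used).

    Passing `ranks` resumes from a previous snapshot's result.
    """
    n = len(nodes)
    if ranks is None:
        # Cold start: uniform initial ranks.
        ranks = {v: 1.0 / n for v in nodes}
    else:
        # "Shift": carry old ranks over, default for new vertices, renormalize.
        ranks = {v: ranks.get(v, 1.0 / n) for v in nodes}
        total = sum(ranks.values())
        ranks = {v: r / total for v, r in ranks.items()}

    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)

    for iteration in range(1, max_iter + 1):
        # Rank held by vertices with no out-edges is spread evenly.
        dangling = sum(ranks[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out[src]:
                new[dst] += damping * ranks[src] / len(out[src])
        delta = sum(abs(new[v] - ranks[v]) for v in nodes)
        ranks = new
        if delta < tol:
            return ranks, iteration
    return ranks, max_iter


nodes = ["a", "b", "c", "d"]
snapshot1 = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
snapshot2 = snapshot1 + [("a", "d")]  # the graph changes: one edge is added

ranks1, _ = pagerank(snapshot1, nodes)                          # run on snapshot 1
cold, cold_iters = pagerank(snapshot2, nodes)                   # restart from scratch
warm, warm_iters = pagerank(snapshot2, nodes, ranks=ranks1)     # resume from snapshot 1
print(cold_iters, warm_iters)
```

Both runs on the second snapshot reach the same ranks; only the starting point differs.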

Cloudera has written about the current state of using Apache Kudu as a backend to Apache Impala (incubating), which includes support for CREATE, DROP, INSERT, UPDATE, and more.

http://blog.cloudera.com/blog/2016/09/apache-kudu-and-apache-impala-incubating-the-integration-roadmap/

This guide provides a walkthrough of building a Kafka cluster on AWS using CloudFormation and writing a Spark Streaming job that runs on Amazon EMR to analyze the data in Kafka.

http://blogs.aws.amazon.com/bigdata/post/Tx2CDD4Y46WIWOV/Real-time-Stream-Processing-Using-Apache-Spark-Streaming-and-Apache-Kafka-on-AWS

Apache MADlib (incubating) is a library for SQL-based machine learning that supports Apache HAWQ (incubating), PostgreSQL, and others. Version 1.9.1 was recently released, with support for pivot, sessionization, and prediction metrics. The Pivotal blog has details on how to use these three new features.

https://blog.pivotal.io/big-data-pivotal/products/new-tools-to-shape-data-in-apache-madlib
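For readers unfamiliar with sessionization, the sketch below shows the general idea in plain Python. It is not MADlib's SQL interface; the function name, the sample events, and the 30-minute gap are assumptions chosen for illustration. Events are grouped per user, and a new session starts whenever the gap since that user's previous event exceeds a threshold.

```python
from datetime import datetime, timedelta


def sessionize(events, max_gap=timedelta(minutes=30)):
    """Assign a session id to each (user, timestamp) event.

    A new session starts for a user when the gap since that user's
    previous event exceeds `max_gap`. Returns (user, timestamp, session_id)
    tuples sorted by user and time; session ids start at 1 for each user.
    """
    result = []
    last_seen = {}  # user -> timestamp of that user's previous event
    session = {}    # user -> current session id
    for user, ts in sorted(events):
        if user not in last_seen or ts - last_seen[user] > max_gap:
            session[user] = session.get(user, 0) + 1
        last_seen[user] = ts
        result.append((user, ts, session[user]))
    return result


start = datetime(2016, 10, 2, 9, 0)
events = [
    ("alice", start),
    ("alice", start + timedelta(minutes=10)),
    ("alice", start + timedelta(minutes=55)),  # 45-minute gap: new session
    ("bob", start + timedelta(minutes=5)),
]
sessions = sessionize(events)
for user, ts, session_id in sessions:
    print(user, ts.isoformat(), session_id)
```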

The IBM developer blog has distilled the process of enabling security for Hadoop web interfaces to a few steps. This post summarizes them and also discusses a couple of other configuration options for this setup.

https://developer.ibm.com/hadoop/2016/09/28/securing-hadoop-user-interfaces-kerberos-delegation-token-apache-knox/

This post provides a fantastic overview of the practical considerations of using YARN for long-running Spark Streaming jobs. It covers the spark-submit command-line options needed to keep a long-running job alive, suggestions for YARN queue configuration, details on configuring Kerberos ticket refresh, logging and monitoring suggestions (with example configs for ELK and Graphite), and details on implementing graceful shutdown.

http://mkuthan.github.io/blog/2016/09/30/spark-streaming-on-yarn/
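As a rough sketch of what such a submission might look like, the Python helper below assembles a spark-submit command line for a long-running job on YARN. The property names are standard Spark configuration keys and the flags are standard spark-submit options, but every value, the jar name, the class name, the queue name, and the principal and keytab path are placeholders assumed for illustration rather than settings taken from the post.

```python
import shlex


def build_spark_submit(app_jar, main_class, queue, principal=None, keytab=None):
    """Return a spark-submit argument list for a long-running YARN job.

    All values below are illustrative assumptions; tune them per cluster.
    """
    conf = {
        # Allow the application master to be restarted several times...
        "spark.yarn.maxAppAttempts": "4",
        # ...and only count attempt failures within a sliding window.
        "spark.yarn.am.attemptFailuresValidityInterval": "1h",
        # Tolerate more executor failures before the application fails.
        "spark.yarn.max.executor.failures": "16",
        # Retry an individual task more times before giving up on the job.
        "spark.task.maxFailures": "8",
    }
    cmd = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--queue", queue,
        "--class", main_class,
    ]
    if principal and keytab:
        # Lets Spark renew Kerberos tickets for the lifetime of the job.
        cmd += ["--principal", principal, "--keytab", keytab]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_jar)
    return cmd


command = build_spark_submit(
    "streaming-app.jar",          # hypothetical jar
    "com.example.StreamingApp",   # hypothetical main class
    "realtime_queue",             # hypothetical YARN queue
    principal="user@EXAMPLE.COM",
    keytab="/path/to/user.keytab",
)
print(" ".join(shlex.quote(part) for part in command))
```

Building the command as a list rather than a single string keeps each argument separate, so it can be passed to a process launcher without extra quoting.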

News

The SAP acquisition of big data-as-a-service vendor Altiscale has officially been announced.

https://www.altiscale.com/blog/altiscale-is-now-part-of-sap/

ODPi announced version 2.0 of its runtime specification for Hadoop distributions. Major changes include the addition of a Hadoop Compatible File System spec (the article lists a number of compatible implementations) and the addition of Apache Hive 1.2.

https://www.datanami.com/2016/09/27/odpi-tackles-hive-latest-hadoop-runtime-spec/

Strata + Hadoop World was this week in New York. ZDNet has a good summary of the announcements from the event, including those from MapR, Cazena, BlueData, and more.

http://www.zdnet.com/article/strata-nyc-brings-announcements-from-mapr-pentaho-zoomdata-and-more/

The Cloudera blog has an overview of a new Apache Incubator project, Spot, which comes from the Open Network Insight (ONI) project. The project is a collection of security tools originally developed by Intel.

http://blog.cloudera.com/blog/2016/09/spot-fighting-cyber-threats-via-an-open-data-model/

Akamai has acquired Concord, maker of the Concord stream processing framework built on Apache Mesos.

http://siliconangle.com/blog/2016/09/28/akamai-picks-up-stream-processing-startup-concord/

Among the vendor announcements from this week, Confluent has announced a new release of Confluent Enterprise that's shipping later this month. The highlights of the release are multi-datacenter replication and automatic data balancing. The introductory blog post describes these features in more detail.

http://www.confluent.io/blog/introducing-apache-kafka-for-the-enterprise/

Releases

Version 2.3.2 of Luigi, the workflow engine written in Python, was released.

https://github.com/spotify/luigi/releases/tag/2.3.2

StreamSets has announced version 2.0 of the StreamSets Data Collector. Highlights include support for Oracle CDC, MapR 5.2.0 (and MapR Streams), and integration with StreamSets Dataflow Performance Manager.

https://streamsets.com/blog/announcing-streamsets-data-collector-version-2-0/

Apache Phoenix 4.8.1 was released. It resolves 43 issues, most of them bug fixes.

https://lists.apache.org/thread.html/ee156cbb83cd234e1a36fa6a7454e299812da7ca407a77624ce913f7@%3Cuser.phoenix.apache.org%3E

IBM has announced that Big SQL can now run on Hortonworks HDP in addition to its own distribution, IOP.

https://developer.ibm.com/hadoop/2016/09/29/ibm-big-sql-hdp-support-is-here/

RStudio has announced a new open-source project, sparklyr, which is an R interface to Spark. It supports dplyr verbs against Spark tables, SQL queries, Spark MLlib and H2O Sparkling Water integration for machine learning, and additional extensions.

https://blog.rstudio.org/2016/09/27/sparklyr-r-interface-for-apache-spark/

Microsoft has announced that Hortonworks HDP 2.5 with Spark 2.0 and Hive LLAP (Live Long and Process) is now generally available on Azure HDInsight. The release also includes security enhancements: integration with Azure Active Directory and support for transparent encryption at rest.

http://www.zdnet.com/article/microsoft-hdinsight-gets-spark-2-0-faster-hive-and-better-security/

Events

Curated by Datadog ( http://www.datadog.com )

UNITED STATES

California

Robust Stream Processing with Apache Flink (San Francisco) - Wednesday, October 5

http://www.meetup.com/SF-Big-Analytics/events/234209595/

Using Spark to Accelerate Big Data at Dollar Shave Club (Marina Del Rey) - Thursday, October 6

http://www.meetup.com/Los-Angeles-Big-Data-Users-Group/events/233972195/

Texas

Apache Kafka, Stream Processing, and Microservices (Austin) - Tuesday, October 4

http://www.meetup.com/Austin-Apache-Kafka-Meetup-Stream-Data-Platform/events/234177319/

Dean Wampler: Why Scala Is Great for Data Science and Engineering (Austin) - Wednesday, October 5

http://www.meetup.com/Austin-Scala-Enthusiasts/events/233795204/

Illinois

Free Spark Training Session (Chicago) - Tuesday, October 4

http://www.meetup.com/TDWI-Chicago-Chapter-Business-Intelligence-Mobility/events/234096288/

Wisconsin

Data Science with Apache Spark (Milwaukee) - Tuesday, October 4

http://www.meetup.com/MKE-Big-Data/events/233051343/

Maryland

Scale Out and Optimize Spark 2.0 (Laurel) - Monday, October 3

http://www.meetup.com/Apache-Spark-Maryland/events/234225029/

New Jersey

Introduction to Alluxio (Formerly Tachyon) (Princeton) - Thursday, October 6

http://www.meetup.com/futureofdata-princeton/events/232927731/

IRELAND

Machine Learning and the Serving Layer + Successful Big Data Architecture (Dublin) - Monday, October 3

http://www.meetup.com/hadoop-user-group-ireland/events/234240469/

ROMANIA

Spark v2.0 Workshop (Bucharest) - Friday, October 7

http://www.meetup.com/The-Bucharest-Agile-Software-Meetup-Group/events/234094683/

UKRAINE

Introduction to Machine Learning with Apache Spark & Apache Zeppelin (L'viv) - Friday, October 7

http://www.meetup.com/futureofdata-lviv/events/233502261/

SINGAPORE

Learn Distributed Tracing and Data Applications at Twitter (Singapore) - Friday, October 7

http://www.meetup.com/singasug/events/234470503/
