This is the first installment of a two-part blog series focused on governing Big Data and Hadoop.

So, you’re ready to embark on your data-driven journey, huh? The business case and project blueprint are well defined, and you’ve already secured executive sponsorship for your digital transformation. You’re ready to run a modern data platform based on Hadoop, and your team is set up on the starting blocks to deliver the promises of Big Data to the wider organization.

But then you feel some hesitation as you envision a whole new set of challenges. Are you ready to operate at the fast pace of Big Data? To control the risks that will inevitably arise as data proliferates in your data lake? To scale a data lab that is currently accessible to only a few data scientists into a broadly shared, self-service center of excellence that anyone can access and that connects seamlessly with your critical business processes?

Like it or not, you’re not equipped for success until you address the legacy enterprise challenges related to security, documentation, auditing and traceability. But the good news is that there is a modern way to harness the power of your Hadoop initiative with data governance in order to bring you significant business benefits.

Tackling the Six Most Pressing Issues in Governing New Types of Big Data

To get a full understanding of the potential benefits and best practices related to Data Governance on Hadoop, Talend commissioned a report by TDWI, which outlines six pillars to ensure the success of your Big Data project:

1. Deliver Big Data accessibility to a wide audience, without putting data at risk. Self-service approaches and tools allow IT leaders to empower data workers and analysts to provision their own data autonomously. But you cannot simply put data preparation tools into the hands of business users without first having a governance framework that delivers this service in a managed, scalable way.
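To make the idea of a governance framework behind self-service concrete, here is a minimal, hypothetical sketch in Python: before a business user is handed a dataset, a policy check runs, and governed fields are masked. The roles, dataset names, and policy structure are illustrative assumptions, not the API of any real governance tool.

```python
# Toy governance gate for self-service data access. All names here
# (roles, datasets, policy fields) are illustrative assumptions.
POLICIES = {
    "sales_figures": {
        "allowed_roles": {"analyst", "data_steward"},
        "mask": ["customer_email"],  # fields hidden from self-service users
    },
}

def provision(user_role, dataset, record):
    """Return a governed copy of a record, or refuse access entirely."""
    policy = POLICIES.get(dataset)
    if policy is None or user_role not in policy["allowed_roles"]:
        raise PermissionError(f"{user_role} may not access {dataset}")
    # Mask governed fields instead of handing out raw values.
    return {k: ("***" if k in policy["mask"] else v) for k, v in record.items()}

row = {"customer_email": "a@b.com", "amount": 120}
print(provision("analyst", "sales_figures", row))
# {'customer_email': '***', 'amount': 120}
```

The point of the sketch is that self-service does not mean ungoverned: the user still gets data on demand, but only through a policy-aware access path.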

2. Accelerate data ingestion with smart discovery and exploration. It takes weeks, sometimes months, to onboard new datasets and publish them to the right audiences using traditional data platforms. Now, with new “schema-on-read” approaches, IT and data experts can onboard data as it arrives. As soon as that is done, data is accessible on tap to a whole community of data workers, who gain the flexibility to further discover, model, connect and refine it in an ad-hoc way, at any time.
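The contrast with schema-on-write is easy to show in a few lines. In this library-free Python sketch (the records and field names are made up for illustration), raw records land as-is and a schema is inferred only at read time, so a new field appearing mid-stream needs no upfront migration:

```python
import json

# Schema-on-read in miniature: raw JSON lines land untouched, and the
# schema is discovered when the data is read, not when it is written.
raw_records = [
    '{"user": "ana", "clicks": 12}',
    '{"user": "bo", "clicks": 7, "country": "FR"}',  # new field shows up later
]

def infer_schema(lines):
    """Union the fields (and their value types) seen across all records."""
    schema = {}
    for line in lines:
        for key, value in json.loads(line).items():
            schema.setdefault(key, type(value).__name__)
    return schema

print(infer_schema(raw_records))
# {'user': 'str', 'clicks': 'int', 'country': 'str'}
```

With a traditional schema-on-write platform, the late-arriving `country` field would have required a schema change before a single record could be loaded.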

3. Capture metadata for the fullest use and governance. Metadata is the crown jewel of data-driven applications. It increases data accessibility by embedding documentation, brings context on top of raw data for better interpretation and draws the connection between disparate data points to turn data into meaning and insights. Last but not least, it brings control and traceability over the information supply chain. Modern data platforms provide new ways to capture, stitch, crowdsource and curate metadata.
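To illustrate why captured metadata "brings control and traceability over the information supply chain," here is a deliberately tiny catalog sketch in Python. The entry fields and dataset names are hypothetical, not the schema of any real metadata platform; the point is that documentation, ownership, lineage and curated tags live alongside each dataset:

```python
from dataclasses import dataclass, field

# Hypothetical catalog entry: an illustrative sketch, not any tool's model.
@dataclass
class DatasetEntry:
    name: str
    description: str                                  # embedded documentation
    owner: str                                        # accountability
    upstream: list = field(default_factory=list)      # lineage links
    tags: list = field(default_factory=list)          # curated/crowdsourced labels

catalog = {}

def register(entry):
    catalog[entry.name] = entry

def lineage(name):
    """Walk upstream links to trace a dataset back to its sources."""
    sources = []
    for parent in catalog[name].upstream:
        sources.append(parent)
        sources.extend(lineage(parent))
    return sources

register(DatasetEntry("raw_clicks", "Raw clickstream landings", "ingest-team"))
register(DatasetEntry("daily_sessions", "Sessionized clicks", "analytics",
                      upstream=["raw_clicks"], tags=["pii-reviewed"]))
print(lineage("daily_sessions"))  # ['raw_clicks']
```

Even this toy version shows the governance payoff: given any dataset name, you can answer "where did this come from, who owns it, and has it been reviewed?" without touching the data itself.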

4. Unify the disciplines of data management into a common platform. Silos destroy the value of enterprise data and introduce both quality and security risks. There’s a need to establish a single point of control and access to data across integration styles, while decentralizing responsibilities across data citizens.

5. Consider Hadoop for its flexibility, but beware of its governance challenge. Hadoop can process bigger and more diverse data faster, and delivers it to a wider audience in a more agile way. But, now that you can operate at extreme scale, speed and reach, there’s a mandate to master data traceability and auditability, protection, documentation, policy enforcement, etc. Consider environments like Apache Atlas or Cloudera Navigator, together with metadata driven platforms, to fully address those challenges.

6. Get ready for change, continuous innovation and diversity. IT systems are evolving from monolithic to multi-platform. SQL databases are no longer a one-size-fits-all environment where data is modeled, stored, linked, processed and accessed. Metadata-driven approaches help simplify data access across disparate data stores, provide data lineage and traceability, and accelerate data migration and movement.
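The metadata-driven access idea in the last point can be sketched in a few lines of Python. In this illustrative example (store names and locations are assumptions, not a real product's registry), consumers resolve a logical dataset name through metadata instead of hard-coding which store holds the data, so moving a dataset is a metadata update rather than a code change:

```python
# Metadata-driven data access in miniature: a logical name resolves
# through a registry, so readers never hard-code the backing store.
# Store names and locations below are illustrative assumptions.
registry = {
    "customers":   {"store": "postgres", "location": "crm.public.customers"},
    "clickstream": {"store": "s3",       "location": "s3://lake/raw/clicks/"},
}

def resolve(logical_name):
    """Return (store, location) for a logical dataset name."""
    entry = registry[logical_name]
    return entry["store"], entry["location"]

# Migrating a dataset only means updating its metadata entry; readers
# that resolve by logical name are unaffected.
registry["customers"]["store"] = "snowflake"
print(resolve("customers")[0])  # snowflake
```

This indirection is what lets a multi-platform landscape evolve (new stores added, old ones retired) without breaking every consumer along the way.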

In part 2 of this series, we will see how Talend can guide you through addressing these challenges with Talend Big Data, Metadata Manager, Talend Data Preparation, and Talend Data Fabric.

Original title: 6 Steps that will Pave the way for your Hadoop Journey with Data Governance and ...