Ingestion and Processing of Data For Big Data and IoT Solutions

[Category: Databases (General) | Publisher: 店小二04 | Date: 2017 | Author: 红领巾]


In the era of the Internet of Things and mobility, huge volumes of data are becoming available at high velocity, which creates the need for an efficient analytics system.

The data also comes from a variety of sources in a variety of formats, such as sensors, logs, and structured data from an RDBMS. In the past few years, the generation of new data has increased drastically: more applications are being built, and they are generating more data at a faster rate.

Earlier, data storage was costly, and there was no technology that could process the data efficiently. Now storage has become cheaper, and technology to process Big Data is a reality.

What is Big Data

According to the author Dr. Kirk Borne, Principal Data Scientist, the definition of Big Data is “Everything, Quantified and Tracked”. Let’s pick that apart:

Everything: Every aspect of life, work, consumerism, entertainment, and play is now recognized as a source of digital information about you, your world, and anything else we may encounter.

Quantified: We are storing that “everything” somewhere, mostly in digital form, often as numbers, but not always in such formats. The quantification of features, characteristics, patterns, and trends in all things is enabling data mining, machine learning, statistics, and discovery at an unprecedented scale on an unprecedented number of things. The Internet of Things is just one example, but the Internet of Everything is even more awesome.

Tracked: We don’t simply quantify and measure everything just once; we do so continuously. This includes tracking your sentiment, your web clicks, your purchase logs, your geo-location, your social media history, etc., or tracking every car on the road, every motor in a manufacturing plant, or every moving part on an airplane. Consequently, we are seeing the emergence of smart cities, smart highways, personalized medicine, personalized education, precision farming, and so much more.

All of these quantified and tracked data streams will enable:

Smarter Decisions
Better Products
Deeper Insights
Greater Knowledge
Optimal Solutions
Customer-Centric Products
Increased Customer Loyalty
More Automated Processes, more accurate Predictive and Prescriptive Analytics
Better models of future behaviors and outcomes in Business, Government, Security, Science, Healthcare, Education, and more.

Big Data defines three D2D’s:

Data-to-Decisions
Data-to-Discovery
Data-to-Dollars

The 10 V’s of Big Data
Big Data Framework

The best way to approach a solution is to “split the problem”. A Big Data solution can be well understood using a layered architecture, in which each layer performs a particular function.

This architecture helps in designing a data pipeline for the requirements of either a batch processing system or a stream processing system. It consists of 6 layers which ensure a secure flow of data.
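As a rough illustration of the layered idea, the sketch below chains six placeholder stages, one per layer described in this section. Everything in it is an assumption made for the example: the function names, the record fields (`reading`, `category`, `alert`), the threshold of 30, and the in-memory list standing in for real storage. It only shows how each layer can hand its output to the next, not how any particular product implements these layers.

```python
from collections import Counter
from typing import Dict, List


def ingestion_layer(raw: List[dict]) -> List[dict]:
    """Accept raw records and tag each with a (hypothetical) category."""
    return [{**r, "category": "sensor" if "reading" in r else "log"} for r in raw]


def collector_layer(records: List[dict]) -> List[dict]:
    """Transport records onward; a real system would use a queue or message bus."""
    return list(records)


def processing_layer(records: List[dict]) -> List[dict]:
    """Classify records: flag sensor readings above a made-up threshold."""
    return [{**r, "alert": r.get("reading", 0) > 30} for r in records]


def storage_layer(records: List[dict], store: List[dict]) -> List[dict]:
    """Persist records; here the 'store' is just an in-memory list."""
    store.extend(records)
    return store


def query_layer(store: List[dict]) -> Dict[str, int]:
    """Aggregate stored records into a per-category count."""
    return dict(Counter(r["category"] for r in store))


def visualization_layer(summary: Dict[str, int]) -> str:
    """Render the summary as a tiny text bar chart."""
    return "\n".join(f"{name}: {'#' * count}" for name, count in sorted(summary.items()))


def run_pipeline(raw: List[dict]) -> str:
    """Pass data through the six layers in order."""
    store: List[dict] = []
    records = processing_layer(collector_layer(ingestion_layer(raw)))
    return visualization_layer(query_layer(storage_layer(records, store)))
```

Because each layer is a separate function with a narrow input and output, any one of them could be swapped for a batch or a streaming implementation without touching the others, which is the point of splitting the problem.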

Data Ingestion Layer: This layer is the first step in the journey of data coming from various sources. Data is prioritized and categorized here, which makes it flow smoothly through the further layers.

Data Collector Layer: In this layer, the focus is on transporting data from the ingestion layer to the rest of the data pipeline. This is the layer where components are decoupled so that analytic capabilities may begin.

Data Processing Layer: In this layer, the main focus is the pipeline’s processing system: the data collected in the previous layer is processed here. We route the data to different destinations and classify the data flows, and it is the first point where analytics may take place.

Data Storage Layer: Storage becomes a challenge when the size of the data you are dealing with becomes large, so finding a suitable storage solution is very important. Several possible solutions exist. This layer focuses on “where to store such large data efficiently”.

Data Query Layer: This is the layer where strong analytic processing takes place. The main focus here is to gather the value from the data so that it is more helpful for the next layer.

Data Visualization Layer: The visualization, or presentation, tier is probably the most important tier: it is where the users of the data pipeline may feel the VALUE of DATA. We need something that will grab people’s attention, pull them in, and make your findings well understood.

1. Data Ingestion Layer

Data ingestion is the first step in building a data pipeline and also the toughest task in a Big Data system. In this layer we plan how to ingest data flows from hundreds or thousands of sources into the data center, as the data comes from multiple sources at variable speeds and in different formats.

That is why we should ingest the data properly for successful business decision making. It is rightly said that “if the start goes well, then half of the work is already done”.

1.1 What is Big Data Ingestion?

Big Data ingestion involves connecting to various data sources, extracting the data, and detecting the changed data. It is about moving data, especially unstructured data, from where it originated into a system where it can be stored and analyzed.
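One common way to “detect the changed data” is to compare a fingerprint of each row against the fingerprint recorded on the previous run. The sketch below is a minimal illustration of that idea, not a method named by this article: the helper names `fingerprint` and `detect_changes`, the use of SHA-256 over JSON, and the assumption that every row has a string key are all choices made for the example.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def fingerprint(row: dict) -> str:
    """Stable hash of a row; sort_keys makes it independent of key order."""
    encoded = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def detect_changes(
    previous: Dict[str, str], current_rows: Dict[str, dict]
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Compare the current extract with fingerprints saved from the last run.

    previous:     row key -> fingerprint from the previous run
    current_rows: row key -> row extracted from the source now
    Returns the changes and the new fingerprints to save for the next run.
    """
    current = {key: fingerprint(row) for key, row in current_rows.items()}
    changes = {
        "inserted": sorted(k for k in current if k not in previous),
        "updated": sorted(k for k in current if k in previous and current[k] != previous[k]),
        "deleted": sorted(k for k in previous if k not in current),
    }
    return changes, current
```

Only the keys listed under `inserted` and `updated` would need to be moved downstream, which keeps repeated extracts from re-ingesting unchanged data.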

We can also say that data ingestion means taking data coming from multiple sources and putting it somewhere it can be accessed. It is the beginning of the data pipeline, where data is obtained or imported for immediate use.

Data can be streamed in real time or ingested in batches. When data is ingested in real time, each item is ingested as soon as it arrives. When data is ingested in batches, data items are ingested in chunks at periodic intervals. Ingestion is the process of bringing data into the data processing system.
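The difference between the two modes can be sketched in a few lines. The functions below are hypothetical and simplified: `sink` stands for whatever receives the data, and the batched version cuts chunks by record count rather than by a timer, which is an assumption made to keep the example deterministic. A time-based batcher would follow the same shape, flushing the chunk when the interval elapses.

```python
from itertools import islice
from typing import Callable, Iterable, List

Sink = Callable[[List[dict]], None]


def ingest_streaming(source: Iterable[dict], sink: Sink) -> int:
    """Hand each record to the sink as soon as it arrives. Returns sink calls."""
    calls = 0
    for record in source:
        sink([record])  # one record per delivery
        calls += 1
    return calls


def ingest_batched(source: Iterable[dict], sink: Sink, batch_size: int) -> int:
    """Accumulate records into chunks of batch_size before delivering them."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(source)
    calls = 0
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            break  # source exhausted
        sink(chunk)
        calls += 1
    return calls
```

With five records and `batch_size=2`, the streaming function makes five deliveries of one record each, while the batched function makes three deliveries of two, two, and one record.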

An effective data ingestion process begins by prioritizing data sources, validating individual files, and routing data items to the correct destination.
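These three steps (prioritize, validate, route) can be sketched as follows. The source names, the priority numbers, the validation rule (a string `device_id` plus a `value` field), and the destination names `metrics`, `logs`, and `rejected` are all invented for illustration; a real pipeline would take them from its own configuration.

```python
import heapq
from typing import Dict, List, Tuple


def is_valid(payload: dict) -> bool:
    """Hypothetical rule: a record needs a string device_id and a value."""
    return isinstance(payload.get("device_id"), str) and "value" in payload


def route(payload: dict) -> str:
    """Hypothetical rule: numeric values are metrics, everything else is a log."""
    return "metrics" if isinstance(payload["value"], (int, float)) else "logs"


def ingest(
    items: List[Tuple[str, dict]], source_priority: Dict[str, int]
) -> Dict[str, List[dict]]:
    """Process (source, payload) pairs in priority order, then validate and route.

    Lower priority numbers are handled first; unknown sources go last.
    """
    heap: list = []
    for seq, (source, payload) in enumerate(items):
        # seq is unique, so ties never fall through to comparing the dicts
        heapq.heappush(heap, (source_priority.get(source, 99), seq, source, payload))

    destinations: Dict[str, List[dict]] = {"metrics": [], "logs": [], "rejected": []}
    while heap:
        _, _, _, payload = heapq.heappop(heap)
        if not is_valid(payload):
            destinations["rejected"].append(payload)
        else:
            destinations[route(payload)].append(payload)
    return destinations
```

Keeping rejected records in their own destination, rather than dropping them, makes it possible to inspect what failed validation later.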

1.2 Challenges Faced with Data Ingestion

As the number of IoT devices increases, both the volume and variance of data sources are expanding rapidly. Extracting the data so that it can be used by the destination system is therefore a significant challenge in terms of time and resources. Some of the other challenges faced by data ingestion are:

When numerous Big Data sources exist in different formats, the biggest challenge for the business is to ingest data at a reasonable speed and process it efficiently, so that the data can be prioritized and improve business decisions.
Modern data sources and consuming applications evolve rapidly.
Data produced changes without notice, independent of the consuming application.
Data semantics change over time as the same data powers new cases.
Detection and cap

