
What is big data? Everything you need to know

[Category: Big Data News | Publisher: 店小二05 | Date: 2018 | Author: 红领巾]

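  The importance of big data hardly needs restating, yet many people still find its definition and related concepts hard to grasp. In this article we try to explain, as simply as possible, what big data is and everything connected to it.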


  1. Big data defined

  2. Big data and analytics

  3. IT infrastructure to support big data

  4. Big-data-specific technologies

  5. Big data skills

  6. Big data use cases

  Original title: What is big data? Everything you need to know

  There’s data, and then there’s big data. So, what’s the difference?

  Big data defined

  Big data generally refers to data sets so large in volume and so complex that traditional data processing software cannot capture, manage, and process them within a reasonable amount of time.

  These big data sets can include structured, unstructured, and semistructured data, each of which can be mined for insights.
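  To make the three categories concrete, the following Python sketch (standard library only) reads one small sample of each kind. The records, field names, and variable names are invented for illustration; a real pipeline would read such data from files, message queues, or storage systems rather than from string literals.

```python
import csv
import io
import json

# Structured: fixed columns, every record has the same fields (e.g. a CSV export of a table).
structured_csv = "order_id,customer,amount\n1001,alice,25.50\n1002,bob,12.00\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semistructured: self-describing records whose fields can vary (e.g. JSON from a web API).
semistructured = '{"user": "alice", "tags": ["sale", "mobile"], "device": {"os": "android"}}'
event = json.loads(semistructured)

# Unstructured: no predefined fields at all (e.g. free-form review text).
unstructured = "Delivery was fast, but the packaging was damaged."
words = unstructured.lower().replace(",", "").replace(".", "").split()

print(rows[0]["amount"])      # look up a column by its known schema
print(event["device"]["os"])  # navigate a nested, optional structure
print(len(words))             # raw text must be tokenized before analysis
```

The CSV rows can be queried by column name because every record shares one schema, the JSON record has to be navigated field by field, and the free text yields nothing analyzable until it is tokenized.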

  How much data actually constitutes “big” is open to debate, but it is typically measured in multiples of petabytes, and in exabytes for the largest projects.

  Often, big data is characterized by the three Vs:

  an extreme volume of data

  a broad variety of types of data

  the velocity at which the data needs to be processed and analyzed
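  Velocity often means processing events as they arrive rather than in a nightly batch. A common building block is a sliding-window aggregate. The sketch below is plain Python, not the API of any streaming engine; the class name `SlidingWindowCounter` is hypothetical, and it assumes events arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds`, evicting older ones as new events arrive."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.timestamps = deque()  # timestamps of events currently inside the window

    def add(self, timestamp: float) -> int:
        # Assumes events arrive in timestamp order, as from a single ordered source.
        self.timestamps.append(timestamp)
        # Evict events that have fallen out of the window (t - window, t].
        while self.timestamps and self.timestamps[0] <= timestamp - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps)


counter = SlidingWindowCounter(window_seconds=60)
for t in [0, 10, 30, 65, 70]:
    print(t, counter.add(t))
```

Because old events are evicted as new ones arrive, memory use is bounded by the number of events in one window rather than by the length of the stream.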

  The data that constitutes big data stores can come from sources that include web sites, social media, desktop and mobile apps, scientific experiments, and—increasingly—sensors and other devices in the internet of things (IoT).

  The concept of big data comes with a set of related components that enable organizations to put the data to practical use and solve a number of business problems. These include the IT infrastructure needed to support big data; the analytics applied to the data; technologies needed for big data projects; related skill sets; and the actual use cases that make sense for big data.

  Big data and analytics

  What really delivers value from all the big data that organizations are gathering is the analytics applied to it. Without analytics, it’s just a bunch of data with limited business use.

  By applying analytics to big data, companies can see benefits such as increased sales, improved customer service, greater efficiency, and an overall boost in competitiveness.

  Data analytics involves examining data sets to gain insights or draw conclusions about what they contain, such as trends and predictions about future activity.

  By analyzing data, organizations can make better-informed business decisions such as when and where to run a marketing campaign or introduce a new product or service.

  Analytics can refer to basic business intelligence applications or to more advanced predictive analytics, such as those used by scientific organizations. Among the most advanced types of data analytics is data mining, in which analysts evaluate large data sets to identify relationships, patterns, and trends.
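  One classic data mining task is finding items that frequently occur together (market basket analysis). The sketch below counts co-occurring pairs and keeps those that reach a minimum support, a simplified first step of association-rule mining in the style of the Apriori algorithm. The baskets and the `frequent_pairs` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each set is one transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs appearing together in at least `min_support` transactions."""
    counts = Counter()
    for items in transactions:
        # sorted() gives each pair a canonical order, so (milk, bread) and (bread, milk) count as one pair.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


print(frequent_pairs(baskets, min_support=3))
```

With `min_support=3`, only pairs that appear in at least three of the five baskets survive, for example bread with milk and beer with diapers.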

  Data analytics can include exploratory data analysis (to identify patterns and relationships in data) and confirmatory data analysis (applying statistical techniques to determine whether an assumption about a particular data set is true).
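  The two modes can be contrasted in a short Python sketch. The exploratory step simply summarizes two samples; the confirmatory step tests the assumption that they differ, here with an exact permutation test (one of many possible statistical tests, chosen because it needs only the standard library). The order values and function name are invented for illustration.

```python
import statistics
from itertools import combinations

# Hypothetical daily average order values under two checkout designs.
design_a = [12.1, 11.8, 12.5, 12.0, 11.9]
design_b = [13.2, 13.5, 12.9, 13.8, 13.1]

# Exploratory: summarize the data to spot patterns worth testing.
for name, sample in (("A", design_a), ("B", design_b)):
    print(name, round(statistics.mean(sample), 2), round(statistics.stdev(sample), 2))


def permutation_p_value(x, y):
    """Exact two-sided permutation test for a difference in means."""
    observed = abs(statistics.mean(x) - statistics.mean(y))
    pooled = x + y
    extreme = total = 0
    # Try every way of relabeling the pooled values into groups of the original sizes.
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(statistics.mean(group_x) - statistics.mean(group_y)) >= observed:
            extreme += 1
    return extreme / total


# Confirmatory: test the assumption that design B changes the mean order value.
p = permutation_p_value(design_a, design_b)
print(round(p, 4))
```

With these numbers every value in B exceeds every value in A, so only the observed split and its mirror image are as extreme as the data, and the p-value is 2/252, about 0.008. Enumerating all 252 splits is only feasible for tiny samples; for large data sets one would sample random permutations instead.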

  Another distinction is quantitative data analysis (or analysis of numerical data that has quantifiable variables that can be compared statistically) vs. qualitative data analysis (which focuses on nonnumerical data such as video, images, and text).

  IT infrastructure to support big data

  For the concept of big data to work, organizations need to have the infrastructure in place to gather and house the data, provide access to it, and secure the information while it’s in storage and in transit.

  At a high level, these include storage systems and servers designed for big data, data management and integration software, business intelligence and data analytics software, and big data applications.

  Much of this infrastructure will likely be on-premises, as companies look to continue leveraging their datacenter investments. But increasingly organizations rely on cloud computing services to handle much of their big data requirements.

  Data collection requires having sources to gather the data. Many of these—such as web applications, social media channels, mobile apps, and email archives—are already in place. But as IoT becomes entrenched, companies might need to deploy sensors on all sorts of devices, vehicles, and products to gather data, as well as new applications that generate user data. (IoT-oriented big data analytics has its own specialized techniques and tools.)

  To store all the incoming data, organizations need to have adequate data storage in place. Among the storage options are traditional data warehouses, data lakes, and cloud-based storage.

  Security infrastructure tools might include data encryption, user authentication and other access controls, monitoring systems, firewalls, enterprise mobility management, and other products to protect systems and data.

  Big-data-specific technologies

  In addition to the foregoing IT infrastructure used for data in general, there are several technologies specific to big data that your IT infrastructure should support.

  Hadoop ecosystem

  Hadoop is one of the technologies most closely associated with big data. The Apache Hadoop project develops open source software for scalable, distributed computing.

  The Hadoop software library is a framework that enables the distributed processing of large data sets across clusters of computers using simple programming models. It’s designed to scale up from a single server to thousands, each offering local computation and storage.

  The project includes several modules:

  Hadoop Common, the common utilities that support other Hadoop modules

  Hadoop Distributed File System, which provides high-throughput access to application data

  Hadoop YARN, a framework for job scheduling and cluster resource management

  Hadoop MapReduce, a YARN-based system for parallel processing of large data sets.
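  The MapReduce model is easiest to see with the classic word-count example. The sketch below runs the map, shuffle, and reduce phases in a single Python process; it is not Hadoop’s Java API, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all values by key, as the framework does between map and reduce.
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values collected for one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["big data big clusters", "data lakes hold data"]))
```

In a real cluster, mappers run in parallel over blocks stored in HDFS, YARN schedules the tasks, and the shuffle moves intermediate pairs across the network to the reducers; the per-phase logic is the same as in this single-process version.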

  Apache Spark

  Part of the Hadoop ecosystem, Apache Spark is an open source cluster-computing framework that serves as an engine for processing big data within Hadoop. Spark has become one of the key big data distributed processing frameworks and can be deployed in a variety of ways. It provides native bindings for the Java, Scala, Python (especially the Anaconda Python distribution), and R programming languages (R is especially well suited for big data), and it supports SQL, streaming data, machine learning, and graph processing.

  Data lakes

  Data lakes are storage repositories that hold extremely large volumes of raw data in its native format until the data is needed by business users. Helping to fuel the growth of data lakes are digital transformation initiatives and the growth of the IoT. Data lakes are designed to make it easier for users to access vast amounts of data when the need arises.
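  Keeping raw data in its native format means structure is imposed only when someone reads it, an approach often called schema-on-read. The Python sketch below mimics this with a list of JSON lines standing in for files in a lake’s raw zone; the `raw_zone` name, the record fields, and the `read_purchases` helper are hypothetical.

```python
import json

# Raw events land in the lake as-is (here: JSON lines); no schema is enforced on write.
raw_zone = [
    '{"event": "click", "user": "u1", "ts": 1}',
    '{"event": "purchase", "user": "u2", "ts": 2, "amount": 30.0}',
    '{"event": "purchase", "user": "u1", "ts": 3, "amount": 12.5, "coupon": "SPRING"}',
    "not valid json",
]


def read_purchases(lines):
    """Apply a schema only at read time, keeping just the fields this analysis needs."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real pipeline would quarantine bad records rather than silently drop them
        if record.get("event") == "purchase":
            yield {"user": record["user"], "amount": float(record["amount"])}


purchases = list(read_purchases(raw_zone))
print(sum(p["amount"] for p in purchases))
```

The extra `coupon` field and the malformed line do not block ingestion; each reader decides which fields matter to its own analysis.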

  NoSQL databases

  Conventional SQL databases are designed for reliable transactions and ad hoc queries, but they come with restrictions such as rigid schemas that make them less suitable for some types of applications. NoSQL databases address those limitations and store and manage data in ways that allow for high operational speed and great flexibility. Many were developed by companies that sought better ways to store content or process data for massive websites. Unlike SQL databases, many NoSQL databases can be scaled horizontally across hundreds or thousands of servers.
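  Horizontal scaling usually rests on partitioning: each record’s key is hashed to decide which server stores it. One scheme used by many distributed stores is consistent hashing, sketched below in Python. The class and server names are hypothetical, and production systems add replication, failure handling, and rebalancing that are omitted here.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable hash across processes (unlike built-in hash(), which is salted per process for str).
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Maps keys to servers so that adding a server relocates only a fraction of keys."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual nodes per server, to spread load more evenly
        self._ring = []  # sorted list of (hash, server) points on the ring
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def server_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["db-1", "db-2", "db-3"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.server_for(k) for k in keys}
ring.add_server("db-4")
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), all(after[k] == "db-4" for k in moved))
```

Adding `db-4` moves only the keys that now fall closest to its virtual nodes (roughly a quarter of them), and every moved key lands on the new server. A naive `hash(key) % number_of_servers` scheme would instead reassign most keys whenever the server count changes.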

  In-memory databases

  An in-memory database (IMDB) is a database management system that primarily relies on main memory, rather than disk, for data storage. In-memory databases are faster than disk-optimized databases, an important consideration for big data analytics uses and the creation of data warehouses and data marts.
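  SQLite is an embedded database rather than a big data platform, but its `:memory:` mode shows the core idea of keeping the entire database in main memory instead of on disk. Below is a minimal sketch using Python’s standard `sqlite3` module; the table and its values are invented for illustration.

```python
import sqlite3

# ":memory:" tells SQLite to keep the whole database in RAM; nothing is written to disk,
# and the data disappears when the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 45.5), ("west", 60.0)],
)

# The aggregation reads only memory-resident data; no disk I/O is involved.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
conn.close()
```

The trade-off is durability: an in-memory database needs snapshots, logs, or replication if its contents must survive a restart.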

  Big data skills

  Big data and big data analytics endeavors require specific skills, whether they come from inside the organization or through outside experts.

  Many of these skills are related to the key big data technology components, such as Hadoop, Spark, NoSQL databases, in-memory databases, and analytics software.

  Others are specific to disciplines such as data science, data mining, statistical and quantitative analysis, data visualization, general-purpose programming, and data structures and algorithms. There is also a need for people with overall management skills to see big data projects through to completion.

  Given how common big data analytics projects have become and the shortage of people with these types of skills, finding experienced professionals might be one of the biggest challenges for organizations.

  Big data use cases

  Big data and analytics can be applied to many business problems and use cases. Here are a few examples:

  Customer analytics. Companies can examine customer data to enhance customer experience, improve conversion rates, and increase retention.

  Operational analytics. Improving operational performance and making better use of corporate assets are the goals of many companies. Big data analytics can help businesses find ways to operate more efficiently and improve performance.

  Fraud prevention. Data analysis can help organizations identify suspicious activity and patterns that might indicate fraudulent behavior and help mitigate risks.

  Price optimization. Companies can use big data analytics to optimize the prices they charge for products and services, helping to boost revenue.
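  As a toy illustration of the fraud prevention case, the sketch below flags transaction amounts that sit far from the typical value, using the modified z-score (median and median absolute deviation, with the conventional 0.6745 constant and 3.5 cutoff). Real fraud systems combine many signals and models; the function name and sample amounts here are hypothetical.

```python
import statistics


def flag_suspicious(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds `threshold`.

    Uses the median and the median absolute deviation (MAD), which, unlike the
    mean and standard deviation, are not dragged upward by the outliers being sought.
    """
    median = statistics.median(amounts)
    mad = statistics.median(abs(a - median) for a in amounts)
    if mad == 0:
        return []  # no spread to measure against; a real system would need another rule
    return [
        i for i, a in enumerate(amounts)
        if abs(0.6745 * (a - median) / mad) > threshold
    ]


amounts = [42.0, 38.5, 45.0, 40.0, 39.9, 41.2, 2500.0, 43.1]
print(flag_suspicious(amounts))
```

Using robust statistics matters here: with a plain mean-and-standard-deviation z-score, the 2,500 charge inflates the standard deviation so much that its own score stays below 3, whereas the median and MAD barely move.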


Please credit the source when reposting.
Title: What is big data? Everything you need to know
Link: http://www.codesec.net/view/572438.html