Understanding the Big Data World
- The Data Revolution
- Traditional Data Systems
- The Modern Data Architecture
- Industry Transformation
- Summary
- The intelligent TIR infrastructure—the Internet of Things—will connect everyone and everything in a seamless network. People, machines, natural resources, production lines, logistics networks, consumption habits, recycling flows, and virtually every other aspect of economic and social life will be connected via sensors and software to the TIR platform, continually feeding Big Data to every node—businesses, homes, vehicles, etc.—moment to moment in real time. The Big Data, in turn, will be analyzed with advanced analytics, transformed into predictive algorithms, and programmed into automated systems, to improve thermodynamic efficiencies, dramatically increase productivity, and reduce the marginal cost of producing and delivering a full range of goods and services to near zero across the entire economy.
- —Jeremy Rifkin
Big data has become the solution of choice for the data disruption occurring today. Social media, sensors, GPS, radio-frequency identification (RFID), clickstream, server logs, and so on are generating massive volumes of data to be analyzed. Personalization and the omni-channel experience are increasing the need to make business decisions faster. A perfect storm of data is occurring in the business world. Organizations are looking at big data platforms to help them define strategies not only to ride out the storm but also to leverage their knowledge of data to gain a competitive advantage.
The Data Revolution
Some global changes are so significant that they are referred to as revolutions. Everyone categorizes and ranks them differently. One list of top global changes includes the First (industrial), Second (technology/Internet), Third (renewable energy), and Fourth (data). The way organizations collect, use, manage, and leverage data is changing how they make decisions and is changing our lives in ways beyond our imagination.
Hadoop’s capability to store all kinds of data from different sources at extremely large volume, cost effectively, takes predictive analytics, correlation, and business insight to new levels. Organizations want to know how products and services fit into the emotional lives of customers (digital personalization). Understanding human behavior in more detail helps organizations understand the price levels that prompt customers to act, the triggers behind those actions and responses, the products a customer is going to buy in the future, and why the customer will buy them.
Sensors exist in just about everything. Being able to process the large volumes of data coming from sensors in cars, toasters, soda machines, jet engines, and even kids’ clothes is redefining how people look at their products, competition, and customers.
Organizations want more detail and need to understand “thick” and “thin” data. Thick data helps someone understand the triggers, intentions, meanings, context, and development of an action a person or organization may take. Thin data provides details on the action or the facts that occurred, which focuses more on causation. Data warehouses have been doing this for years, so the concept is not new. The catalyst for this new data environment is the dramatically low cost of local disks used in distributed, highly parallel systems, which allows organizations to store extremely large volumes of detailed data.
We’ve had highly parallel systems and distributed platforms for years. Hadoop software makes it easier to store large volumes of data cost effectively, ingest data at incredibly high rates, and work easily with data of all types, including semi-structured and unstructured data. Hadoop is also considered a next-generation Extract, Transform, and Load (ETL) and data retention platform. Hadoop can be used to offload ETL processing from an enterprise data warehouse (EDW) because of its lower cost per terabyte. Data can also be stored much longer in a Hadoop cluster than in an EDW because of the much lower total cost of ownership.
Hadoop enables organizations to look at thick and thin data in great detail and manage extremely large-scale data cost effectively. This lets organizations ask questions they could never ask before. It is important to understand that Hadoop is not a new type of data warehouse. There is overlap in function and objectives; however, Hadoop and data warehouses were designed from the ground up to solve different types of problems. Note that all the skills and expertise of existing data experts can be leveraged in Hadoop because it’s still about solving business problems with data. Hadoop software distributions and NoSQL databases offer new ways of solving today’s data challenges. As a Hadoop environment matures, the data flows between relational databases, data warehouses, and Hadoop will increase. We discuss the differences between Hadoop and EDWs in more detail later.
Customers are sending tremendous amounts of detail through their social media activities, clickstream activity on websites, email, and cell phones. This digital information can provide incredible insights into the patterns and behaviors of individuals and groups. Digital personalization is about gaining a deeper understanding of customers and then providing customized services around value choices that are relevant and dynamic across all digital channels (computer, smartphone, tablet, watch, and so on). Combining this digital information with a history of a customer’s transactions and external data about other customers or groups with similar characteristics provides tremendous clarity into the likely next actions of a customer or group. Someone who understands a customer’s likely next actions can influence not only the next action but also the drivers of that action. This can provide a distinct competitive advantage.
The digital revolution is taking the business world through a tremendous transformation that demands that competitive organizations make accurate business decisions faster than their competition. The digital revolution has flattened out global competition. In the digital space, small organizations can compete against large ones, so organizations face new, aggressive competition. Customers have ever-increasing expectations across the different digital channels they use. The future industry leaders will be the organizations that can adapt and make business decisions faster than their competition, with higher accuracy and confidence and with less risk. This transformation will impact everyone’s personal environment and how they interact in that environment. Every organization must understand its customers better and be forward thinking by understanding its customers’ next best steps. This requires better and faster analytics.
A store that sells expensive scotch or wine might learn through external data that its high-profile customers visit more often when the nearby cigar store or cheese store has sales or introduces new products. Car sensors collect data points on how a car is used, how the car responds to that usage, and where the car is at the time. This information can influence the design of the next generation of cars, highways, traffic lights, neighborhoods, and cities. Combining data from hospitals and social media, as well as historical patterns, can cut the time needed to identify virus outbreaks from a few weeks to a few days or even a few hours. It now becomes very clear why analysts are predicting that as much as 80% of the data an organization needs to look at will be generated external to a business unit or group. The great magnifier is the ability to correlate internal data with external data sources to increase the insight and accuracy of the analytics.
Organizations can have hundreds or thousands of different relational databases, operational data stores, and enterprise data warehouses. Data arrives from clickstream, application servers, machines, social media, GPS, RFID, and the like in ever-increasing volumes. The new enterprise data platform that organizations will use to solve these data challenges is big data. This chapter introduces the driving forces behind big data and explains why big data is the right solution at the right time.