Get Started Scaling Your Database Infrastructure for High-Volume Big Data Applications
“Understanding Big Data Scalability presents the fundamentals of scaling databases from a single node to large clusters. It provides a practical explanation of what ‘Big Data’ systems are, and fundamental issues to consider when optimizing for performance and scalability. Cory draws on many years of experience to explain issues involved in working with data sets that can no longer be handled with single, monolithic relational databases.... His approach is particularly relevant now that relational data models are making a comeback via SQL interfaces to popular NoSQL databases and Hadoop distributions.... This book should be especially useful to database practitioners new to scaling databases beyond traditional single node deployments.”
—Brian O’Krafka, software architect
Understanding Big Data Scalability presents a solid foundation for scaling Big Data infrastructure and helps you address each crucial factor associated with optimizing performance in scalable and dynamic Big Data clusters.
Database expert Cory Isaacson offers practical, actionable insights for every technical professional who must scale a database tier for high-volume applications. Focusing on today’s most common Big Data applications, he introduces proven ways to manage unprecedented data growth from widely diverse sources and to deliver real-time processing at levels that were inconceivable until recently.
Isaacson explains why databases slow down, reviews each major technique for scaling database applications, and identifies the key rules of database scalability that every architect should follow.
You’ll find insights and techniques proven with all types of database engines and environments, including SQL, NoSQL, and Hadoop. Two start-to-finish case studies walk you through planning and implementation, offering specific lessons for formulating your own scalability strategy.
The Big Data Scalability Series is a comprehensive four-part series covering many facets of database performance and scalability. Understanding Big Data Scalability is the first book in the series.
Learn more and join the conversation about Big Data scalability at bigdatascalability.com.
Preface
About the Author
Chapter 1: Introduction
What You Will Learn
The Challenge of Big Data
Today’s Big Data Explosion
Background for This Book
Why the Focus on Database Sharding?
Summary
Chapter 2: Why Databases Slow Down
The Database Slowdown Curve
A Hard-Won Lesson
The Enemies of Database Performance
How to Identify Database Slowdown Issues
Summary
Chapter 3: What Is Big Data?
What Is Big Data Anyhow?
Sources of Big Data
Summary
Chapter 4: Big Data in the Real World
Some Real-World Examples of Big Data
FullContact
Social Point
Summary
Chapter 5: Scaling Your Application
The Goals of a Scalable Application Platform
The Excitement of a High-Growth Success
Application Scalability Fundamentals
A Typical Online Application Architecture
Analytics Application Architectures
Scaling an Analytics Application
How to Scale a Traditional Online Application
Summary
Chapter 6: When to Scale Your Database
The Last Mile of Application Scalability
How Do You Know When to Scale Your Database?
Options for Increasing Database Performance
Indications of the Need for Scale
Summary
Chapter 7: All Data Is Relational
Relational Data Overview
The Meaning of Data
Relationships Matter
Why Data Modelling Is Critical to Success
Summary
Chapter 8: It’s All About Sharding
Sharding: The Ultimate Answer to Database Slowdown
The Laws of Databases
Sharding Defined
Black-Box Sharding
Relational Sharding
Summary
Chapter 9: Scaling Big Data: The Endgame
The Game of Big Data Scalability
Scaling Big Data Theory
The Big Data Endgame
Data Locality
Summary
Index