MySQL Overview and New Performance-Related Features
With an ever-growing market share and a continual stream of new software and functionality, MySQL AB's products have made a dramatic impact on the technology industry. New versions, features, and functions are arriving at an increasing pace, but database designers, developers, and administrators might feel overwhelmed by the sheer number of products, along with how frequently they're updated.
Although the primary purpose of this book is to help you coax the most performance out of your MySQL installation, you should first get the lay of the land of MySQL's entire product suite. To help make things clear, each of the major products can be classified into a small group of categories. In addition, this chapter describes how they are covered within this book, and also mentions some helpful performance-related tools that now ship with the products.
After reviewing the product line, this chapter briefly calls out the versions that are covered in this book, along with some of the platforms that we tested when making our recommendations.
Finally, this chapter lists all of MySQL's major performance enhancements beginning with version 4.0. This list might help you determine when it's time to upgrade (if you're an existing user) or which version to choose (if you're new to MySQL).
MySQL Products
To help make the most sense of MySQL AB’s broad, rapidly growing product suite, these products are classified into the following categories:
MySQL Core Technologies
Database Storage Engines and Table Types
Distributed Computing Technologies
Graphical Tools and Assistants
Connectors
APIs
Utilities
Each of these categories are explained in the following sections. In addition to these products, MySQL’s website features hundreds of partner solutions (commercial, shareware, and freeware) that add value throughout the database design, development, deployment, and management cycles; there are also many applications built using MySQL technology.
MySQL Core Technologies
As the foundation of the entire product line, these technologies span a wide range of functionality, from MySQL’s implementation of SQL to its query optimizer to memory management and communication. This book continually points out ways to improve these components’ performance. Specifically, chapters are dedicated to making the most of your SQL statements, MySQL’s query optimizer, general database server engine settings, and other core technology-related features.
Database Storage Engines and Table Types
Responsible for accumulating and retrieving information, the database storage engine lies at the heart of your MySQL installation. When it comes to picking a specialized storage engine or table type, MySQL offers database designers and administrators a surfeit of choices. This book spends considerable time discussing the following:
MyISAM—Fast, compressible, and FULLTEXT-searchable, this is the default MySQL engine.
InnoDB—Robust, transaction-ready, with strong referential integrity, this storage engine is often used to support complex, high-volume applications, in which transactional guarantees are essential.
MERGE—By creating a single view of multiple identical MyISAM tables, this storage engine is essential to feed reporting or Decision Support System (DSS)/Online Analytical Processing (OLAP) tools.
MEMORY—Previously known as HEAP, its tables are memory-based, extremely fast and easy to configure, letting developers leverage the benefits of in-memory processing via a standard SQL interface.
ARCHIVE—As its name indicates, this storage engine is aimed at applications with very large volumes of infrequently-or-never updated information. Its tables are parsimonious in their consumption of disk resources.
CSV—By creating comma-separated files (.csv), this storage engine makes it very easy for developers to feed other applications that consume these kinds of files with MySQL-based data.
FEDERATED—Define and access remote tables as if they were hosted locally.
NDB Cluster—As the underlying storage engine technology of MySQL Cluster, NDB Cluster makes it possible for multiple computers to keep their in-memory data in sync, leading to dramatic scalability and performance improvements.
Of the preceding list, the MyISAM and InnoDB storage engines see the most usage, which is one reason why this book has chapters dedicated to each of them, along with a chapter exploring MySQL Cluster (Chapter 17, "Clustering and Performance").
MySQL offers several additional storage engines that are not covered in this book. These include the following:
ISAM—Although this is the original MySQL storage engine, the MyISAM engine has superseded this product; in fact, it will no longer be distributed from version 5.0. Nevertheless, many of the suggestions for improving MyISAM response might also apply for legacy ISAM tables.
Berkeley Database (BDB)—This was the first MySQL storage engine to offer transactional support, among many other advanced features. However, the InnoDB storage engine has garnered, by far, the higher market share for this kind of storage engine, so this book primarily focuses on InnoDB.
EXAMPLE—This is not a storage engine per se; instead, it can best be thought of as a template that shows MySQL’s worldwide development community how to write a storage engine.
MaxDB—This is not a storage engine, but a separate product, originally developed by Adabas, and then overseen by SAP. It’s used by thousands of SAP customers today. Given the different lineages of the main MySQL product line and MaxDB, it is not covered in this book. However, many of the general-purpose recommendations (for example, designing for speed, indexing, and overhead reduction) made in this book are also applicable to MaxDB.
Distributed Computing Technologies
Replication and MySQL Cluster are the two foremost MySQL distributed computing technologies. Replication refers to the act of keeping multiple "slave" computers in sync with a "master" server. Because this is such a simple yet powerful way to increase throughput, Chapter 16, "Optimal Replication," is dedicated to replication best practices.
MySQL Cluster leverages multiple computers into a single team; this yields impressive performance and reliability gains, and is only limited by the amount of hardware you have at your disposal. This topic also merits its own chapter. Chapter 17 explores scenarios in which clustering makes good performance sense.
Graphical Tools and Assistants
From the beginning, MySQL products have typically been configured, monitored, and managed from the command line. However, several MySQL offerings now provide an easy-to-use, graphical interface:
MySQL Administrator—Makes it possible for administrators to set up, evaluate, and tune their MySQL database server. This is intended as a replacement for mysqladmin.
MySQL Query Browser—Provides database developers and others with a graphical database operation interface. It is especially useful for seeing multiple query plans and result sets in a single user interface.
Configuration Wizard—Makes it easy for administrators to pick and choose from a predefined list of optimal settings, or create their own.
MySQL System Tray—Provides Windows-based administrators a single view of their MySQL instance, including the ability to start and stop their database servers. It is similar to tools offered by other database vendors.
These important capabilities are referred to throughout the book. The Configuration Wizard is examined later in this chapter.
Connectors
Connectors provide database application developers and third-party tools with packaged libraries of standards-based functions to access MySQL. These libraries range from Open Database Connectivity (ODBC) technology through Java and .NET-aware components.
By using the ODBC connector to MySQL, any ODBC-aware client application (for example, Microsoft Office, report writers, Visual Basic) can connect to MySQL without knowing the vagaries of any MySQL-specific keyword restrictions, access syntax, and so on; it’s the connector’s job to abstract this complexity into an easily used, standardized interface.
Chapter 9, "Developing High Speed Applications," coverage of optimizing application logic discusses how to streamline ODBC access to MySQL.
APIs
MySQL AB and several third parties provide application programming interface (API) libraries to let developers write client applications in a wide variety of programming languages, including the following:
C (provided automatically with MySQL)
C++
Eiffel
.NET
Perl
PHP
Python
Ruby
Tcl
Currently, C, PHP, and Perl represent the most widely used APIs from the preceding list, with ODBC connector-using client application development tools also seeing extensive usage. Although this book is not meant to be a detailed programming guide for any particular language, it does discuss the interplay between your chosen API and MySQL performance in Chapter 9.
Utilities
MySQL’s primarily character-based utilities cover a broad range of database management tasks, including the following:
Exporting information (mysqldump)
Importing information (mysqlimport)
Entering SQL statements, either interactively or via script (mysql)
Checking MyISAM table integrity (myisamchk)
Working with the binary log (mysqlbinlog)
Compressing MyISAM tables (myisampack)
Where applicable, this book points out how to use these tools to boost performance. For example, the mysqldump utility is covered in great detail in Chapter 15, "Improving Import and Export Operations."
Performance-Related Tools
MySQL ships a number of tools that can help database administrators configure, test, and tune their MySQL installations. Some of these tools are aimed at people interested in source code, whereas others are aimed at a broader audience. Each of these tools are briefly examined in the following sections.
Benchmark Suite
MySQL’s benchmark suite, available for download from their website, is a useful set of automated tests to help determine overall system performance for a broad collection of common database-oriented tasks. For example, the following is a snippet of Perl code that tests inserting new rows into a table:
... ... for ($i=0 ; $i < $opt_row_count ; $i++) { $query="insert into bench values ( " . ("$i," x ($opt_start_field_count-1)) . "$i)"; $dbh->do($query) or die $DBI::errstr; } if ($opt_fast && $server->{transactions}) { $dbh->commit; $dbh->{AutoCommit} = 1; } $end_time=new Benchmark; print "Time for insert ($opt_row_count)", timestr(timediff($end_time, $loop_time),"all") . "\n\n"; ... ...
Although these tests don’t help you determine the optimal database schema design, query construction, or application logic practices, they are useful for testing the before-and-after impact of changes to your MySQL server configuration settings. Just be certain that you take overall system load into consideration when evaluating the results.
BENCHMARK() Function
The built-in BENCHMARK() function is useful for running raw timing tests on various computational functions within MySQL. The results of these tests can help you:
Compare MySQL’s processing capabilities for disparate operations.
Compare the same operations on different hardware/OS platforms.
For example, you can compare how long it takes MySQL to calculate the MD5 128 bit checksum for a randomly generated number on a modern, multiprocessor Linux machine versus a five-year-old, single-CPU desktop computer. This actually tests two MySQL functions: MD5() and RAND().
You could perform this test by hand, time the results, and write them down on paper:
... mysql> SELECT MD5(RAND()); +----------------------------------+ | MD5(RAND()) | +----------------------------------+ | 165d139c2e6b40a5e476ecbba1981cc3 | +----------------------------------+ 1 row in set (0.00 sec) mysql> SELECT MD5(RAND()); +----------------------------------+ | MD5(RAND()) | +----------------------------------+ | 0774e12a284887041f60223e134d01a1 | +----------------------------------+ 1 row in set (0.00 sec) ...
This might get a little tedious after a while, so it’s best to use the BENCHMARK() function. To make the numbers significant, you can have MySQL perform the operation 500,000 times:
New, expensive Linux server:
mysql> SELECT BENCHMARK(500000,MD5(rand())); +-------------------------------+ | BENCHMARK(500000,MD5(rand())) | +-------------------------------+ | 0 | +-------------------------------+ 1 row in set (2.18 sec)
History museum-ready desktop:
mysql> SELECT BENCHMARK(500000,MD5(rand())); +-------------------------------+ | BENCHMARK(500000,MD5(rand())) | +-------------------------------+ | 0 | +-------------------------------+ 1 row in set (33.27 sec)
Notice the difference in how long it took to return the results: This is the number you should watch.
You can use this function to test the amount of time necessary to complete any expression. Note that BENCHMARK(), although valuable, does not tell you whether a particular query is efficient. For that kind of task, use the EXPLAIN statement, which is reviewed in great detail during Chapter 6, "Understanding the MySQL Optimizer," study of the MySQL query optimizer.
Configuration Wizard
Recent versions of MySQL now offer an optional Configuration Wizard, typically launched upon installation. This section takes a look at the sequence of steps followed by this wizard, along with how these topics are addressed throughout the book.
Note that this wizard is quite dynamic, so your experience might be different from the one presented here (see Figure 3.1).
Figure 3.1 The launch screen for the MySQL Configuration Wizard.
Your first decision is to choose either a boilerplate ("standard") or customized ("detailed") installation process. Don’t underestimate the value of the boilerplate configuration; it has been well thought out, and represents a good catch-all setup (see Figure 3.2).
Figure 3.2 Choose between a customized or general-purpose configuration.
If you choose the customized path, the first decision you must make is to select the type of database server that you are configuring as shown in Figure 3.3.
Figure 3.3 Choose one of three possible server configurations.
There are marked differences in memory caching and other key server settings depending on the server’s role. These distinctions are continually cited throughout the book.
After you’ve chosen a server type, you must then categorize your typical processing profile (see Figure 3.4).
Figure 3.4 Pick the dominant processing profile for this server.
This is an important decision because the workloads experienced by transactional and decision support database servers are quite different, meaning that their respective configurations need to reflect this diversity.
This book keeps this diversity in mind throughout, and makes recommendations accordingly.
The wizard next provides a choice on how to configure the initial InnoDB tablespace (see Figure 3.5).
Figure 3.5 Initial InnoDB configuration.
Enhancing InnoDB performance is explored in Chapter 12, "InnoDB Parameters and Tuning"; disk-specific considerations are covered as part of Chapter 13, "Improving Disk Speed," general-purpose data storage review.
Configuring the correct number of concurrent sessions, network protocols, and character sets are your next assessments, as shown in Figures 3.6, 3.7, and 3.8.
The impact of connectivity and network settings on performance are examined as part of several chapters, including those on general engine tuning, optimal application development, and network configuration. However, character set issues are not part of the subject matter in this book.
The wizard then gives us a choice on how the database server will be started, as well as security alternatives (see Figures 3.9 and 3.10).
Figure 3.6 Specifying the number of server connections.
Figure 3.7 Enabling TCP/IP support along with its port number.
Because a Windows server is running for this example, MySQL provides Windows-specific options. The interplay between MySQL and its host operating system is explored in Chapter 14, "Operating System, Web Server and Connectivity Tuning"; aside from the performance degradation inherent in overly complex permission schemes, security is largely a peripheral topic for this book.
Figure 3.8 Choosing a character set.
Figure 3.9 Setting operating-specific database service launch variables.
After answering the final questions, the wizard automatically generates the configuration file, and starts the server (see Figures 3.11 and 3.12).
Figure 3.10 Implementing security preferences.
Figure 3.11 Preparing to write the site-specific configuration.
Figure 3.12 Configuration written, MySQL service started.