- The DBA: Revered or Reviled?
- Why Learn Database Administration?
- The Management Discipline of Database Administration
- Database, Data, and System Administration
- DBA Tasks
- Types of DBAs
- Staffing Considerations
- Multiplatform DBA Issues
- Test and Production
- New Technology and the DBA
- DBA Certification
- The Rest of the Book
- Review
DBA Tasks
Ensuring that an organization's data and databases are useful, usable, available, and correct requires the DBA to perform a variety of tasks in a variety of areas. These areas include database design, performance monitoring and tuning, database availability, security, backup and recovery, data integrity, and release migration: really, anything that involves the company's databases. Let's examine each of these topics.
Database Design
To properly design and create relational databases, the DBA must understand and adhere to sound relational design practices. The DBA must understand both relational theory and the specific implementation of the relational database management system (RDBMS) he's using to create the database. Database design requires a sound understanding of conceptual and logical data modeling techniques. The ability to create and interpret entity-relationship diagrams is essential to designing a relational database.
The DBA must be able to transform a logical data model into a physical database implementation. The DBA must ensure that the database design and implementation will enable a useful database for the applications and clients that will use it.
Although database design is a significant skill for the DBA to possess, the job of the DBA is often disproportionately associated with it. Designing optimal databases is important, but it is a relatively small portion of the DBA's job; a DBA will most likely spend more time administering and tuning databases than designing and building them.
By no means, though, should you interpret this to mean that database design is not important. A poor relational design can result in poor performance, a database that does not meet the needs of the organization, and potentially inaccurate data.
Performance Monitoring and Tuning
What is meant by database performance? Let's use the familiar concept of supply and demand. Users demand information from the database, and the DBMS supplies this demand for information. The rate at which the DBMS supplies the information can be termed database performance. However, it is not really that simple. Five factors influence database performance: workload, throughput, resources, optimization, and contention.
The workload that is requested of the DBMS defines the demand. It is a combination of online transactions, batch jobs, ad hoc queries, data warehousing, analytical queries, and commands directed through the system at any given time. Workload can fluctuate drastically from day to day, hour to hour, minute to minute, and even second to second. Sometimes workload can be predicted (such as heavy month-end processing of payroll, or very light access after 7:30 p.m., when most users have left for the day), but at other times it is unpredictable. The overall workload has a major impact on database performance.
Throughput defines the overall capability of the computer hardware and software to process data. It is a composite of I/O speed, CPU speed, parallel capabilities of the machine, and the efficiency of the operating system and system software. The hardware and software tools at the disposal of the system are known as the resources of the system. Examples include the database kernel, disk space, cache controllers, and microcode.
Optimization refers to the analysis of database requests with query cost formulas to generate efficient access paths to data. All types of systems can be optimized, but relational queries are unique in that optimization is primarily accomplished internal to the DBMS. However, many other factors need to be optimized (SQL formulation, database parameters, programming efficiently, and so on) to enable the database optimizer to create the most efficient access paths.
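Most relational products expose the optimizer's decisions through some form of EXPLAIN facility, although the exact syntax and the format of the output vary by DBMS. A minimal sketch, using the simple EXPLAIN form found in several products and a hypothetical customer table:

-- Ask the optimizer to describe its intended access path (index lookup, table scan, join method)
EXPLAIN
SELECT cust_id, cust_name
FROM customer
WHERE cust_id = 12345;

Reviewing the resulting access plan is often the DBA's first step when a query performs poorly.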
When the demand (workload) for a particular resource is high, contention can result. Contention is the condition in which two or more components of the workload are attempting to use a single resource in a conflicting way (for example, dual updates to the same piece of data). As contention increases, throughput decreases.
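A simple way to observe contention is to have two sessions update the same row; the second session waits on the first session's lock until that session commits or rolls back. A sketch with a hypothetical account table (transaction-control syntax varies slightly by product):

-- Session 1
START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE account_id = 12345;
-- Session 2, issued now, blocks on the row lock held by session 1:
--   UPDATE account SET balance = balance + 50 WHERE account_id = 12345;
COMMIT;  -- once session 1 commits, session 2's update can proceed

As more units of work queue behind the same locks, throughput declines.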
Therefore, database performance can be defined as the optimization of resource usage to increase throughput and minimize contention, enabling the largest possible workload to be processed.
Whenever performance problems are encountered by an application that uses a database, the DBA is usually the first one called to resolve the problem. Of course, the DBA cannot manage database performance in a vacuum. Applications regularly communicate with other applications, systems, and components of the IT infrastructure. An effective performance monitoring and tuning strategy requires not just DBMS expertise but knowledge outside the scope of database administration. Many performance management tasks must be shared between the DBA and other technicians. In other words, handling performance problems is truly an enterprisewide endeavor.
Many tasks and abilities are required of DBAs to ensure efficient access to databases. Some of these abilities include building appropriate indexes, specifying large enough buffers and caches, aligning the database implementation with the IT infrastructure, monitoring databases and applications, reorganizing databases, and adapting to business changes: more users, more data, additional processing, and changing requirements and regulations.
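For example, an index that matches a common search pattern can turn a scan of an entire table into a direct lookup. A minimal sketch, assuming a hypothetical customer table that is frequently searched by last name:

-- Supports queries that filter or sort on last_name (and, secondarily, first_name)
CREATE INDEX ix_customer_name
    ON customer (last_name, first_name);

SELECT cust_id, last_name, first_name
FROM customer
WHERE last_name = 'Smith';

Of course, every index speeds some queries while adding overhead to inserts, updates, and deletes, so the DBA must weigh each index against the workload it serves.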
Availability
The availability of data and databases is often closely aligned with performance, but it is actually a separate concern. Of course, if the DBMS is offline, performance will be nonexistent because no data can be accessed. However, ensuring database availability is a multifaceted process.
The first component of availability is keeping the DBMS up and running. Vigilant monitoring and automated alerts can be used to warn of DBMS outages and the need for corrective action.
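One common building block is a heartbeat check: a scheduled monitoring job writes a probe row, and an alerting job verifies that the newest probe is recent, raising an alert when the write fails or the probe goes stale. A minimal sketch, with a hypothetical table name:

-- Created once by the DBA
CREATE TABLE dba_heartbeat (probe_time TIMESTAMP NOT NULL);

-- Run on a schedule by the monitoring job; a failure signals a DBMS or connectivity problem
INSERT INTO dba_heartbeat (probe_time) VALUES (CURRENT_TIMESTAMP);

-- Run by the alerting job; a stale result means probes have stopped arriving
SELECT MAX(probe_time) FROM dba_heartbeat;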
Individual databases also must be maintained so that data is available whenever applications and clients require it. The DBA needs to design the database so that it can be maintained with minimal disruptions, but he also helps developers design applications to minimize conflicts when concurrent access is required.
An additional component of availability is minimizing the amount of downtime required to perform administrative tasks. The faster the DBA can perform administrative tasks that require databases to be offline, the more available the data becomes. Increasingly, the DBMS vendors and ISVs are providing nondisruptive utilities that can be run against databases while applications continue to read from and write to them. Additionally, database clustering technologies provide failover techniques that help to reduce downtime. Nevertheless, such technology usually requires more skill and up-front planning to implement.
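For example, a Db2-style utility control statement can request an online reorganization that permits concurrent access; the database and table space names here are hypothetical, and other products use different commands for the same purpose:

-- SHRLEVEL CHANGE asks for a reorganization while applications continue to read and write
REORG TABLESPACE SALESDB.CUSTTS SHRLEVEL CHANGE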
The DBA must understand all of these aspects of availability and ensure that each application is receiving the correct level of availability for its needs.
Database Security and Authorization
Once the database is designed and implemented, programmers and users will need to access and modify the data. However, to prevent security breaches and improper data modification, only authorized programmers and users should have access. It is the responsibility of the DBA to ensure that data is available only to authorized users.
Typically, the DBA works with the internal security features of the DBMS in the form of SQL GRANT and REVOKE statements, as well as with any group-authorization features of the DBMS. Security must be administered for many actions required by the database environment:
Creating database objects, including databases, tables, views, and program structures
Altering the structure of database objects
Accessing the system catalog
Reading and modifying data in tables
Creating and accessing user-defined functions and data types
Running stored procedures
Starting and stopping databases and associated database objects
Setting and modifying DBMS parameters and specifications
Running database utilities such as LOAD, RECOVER, and REORG
Database security can be enforced in other ways as well. For example, views can be created that block access to sensitive data by end users and programmers. In addition, the DBA interfaces frequently with external security methods when they impact database security. In short, the DBA must understand and be capable of implementing any aspect of security that impacts access to databases.
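For instance, the DBA might pair GRANT and REVOKE statements with a view that exposes only non-sensitive columns. A minimal sketch, assuming hypothetical table, view, and authorization IDs:

-- Allow an application ID to read and change customer data
GRANT SELECT, INSERT, UPDATE ON customer TO payroll_app;

-- Take back a privilege that is no longer appropriate
REVOKE UPDATE ON customer FROM payroll_app;

-- Hide sensitive columns behind a view and grant access to the view instead of the base table
CREATE VIEW customer_contact AS
    SELECT cust_id, cust_name, city, phone
    FROM customer;

GRANT SELECT ON customer_contact TO support_staff;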
Backup and Recovery
The DBA must be prepared to recover data in the event of a problem. "Problem" can mean anything from a system glitch or program error to a natural disaster that shuts down an organization. The majority of recoveries today occur as a result of application software error and human error. Hardware failures are not as prevalent as they used to be. In fact, analyst estimates indicate that 80% of application errors are due to software failures and human error. The DBA must be prepared to recover data to a usable point, no matter what the cause, and to do so as quickly as possible.
The first type of data recovery that usually comes to mind is a recover to current, typically needed in the face of a major shutdown. The end result of the recovery is that the database is brought back to its current state at the time of the failure. Applications are completely unavailable until the recovery is complete.
Another type of traditional recovery is a point-in-time recovery. Point-in-time recovery usually deals with an application-level problem. Conventional techniques to perform a point-in-time recovery remove the effects of all transactions since a specified point in time. This can cause problems if valid transactions occurred during that timeframe that still need to be applied.
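As an illustration, SQL Server-style T-SQL can restore a full backup and then roll the log forward only to a chosen moment; the database name, file paths, and timestamp are hypothetical, and other DBMS products use different recovery commands:

RESTORE DATABASE SalesDB FROM DISK = N'E:\backups\salesdb_full.bak' WITH NORECOVERY;

RESTORE LOG SalesDB FROM DISK = N'E:\backups\salesdb_log.trn'
    WITH STOPAT = N'2024-06-01T14:30:00', RECOVERY;  -- stop just before the errant transaction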
Transaction recovery is a third type of recovery; it addresses the shortcomings of the traditional types: downtime and loss of good data. With transaction recovery, the effects of specific transactions during a specified timeframe are removed from the database. For this reason, transaction recovery is sometimes referred to as application recovery.
To be prepared for any type of recovery, the DBA needs to develop a backup strategy to ensure that data is not lost in the event of an error in software, hardware, or a manual process. The strategy must be applicable to database processing, so it must include image copies of database files as well as a backup/recovery plan for database logs. It needs to account for any nondatabase file activity that can impact database applications, as well.
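A minimal sketch of the two halves of such a strategy, in SQL Server-style T-SQL with hypothetical names and paths (most other products offer equivalent image copy and log backup facilities):

-- Periodic full image copy of the database
BACKUP DATABASE SalesDB TO DISK = N'E:\backups\salesdb_full.bak';

-- Frequent log backups preserve the ability to recover to a point in time between image copies
BACKUP LOG SalesDB TO DISK = N'E:\backups\salesdb_log.trn';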
Data Integrity
A database must be designed to store the correct data in the correct way without that data becoming damaged or corrupted. To ensure this process, the DBA implements integrity rules using features of the DBMS. Three aspects of integrity are relevant to our discussion of databases: physical, semantic, and internal.
Physical issues can be handled using DBMS features such as domains and data types. The DBA chooses the appropriate data type for each column of each table. This action ensures that only data of that type is stored in the database. That is, the DBMS enforces the integrity of the data with respect to its type. A column defined as "integer" can only contain integers. Attempts to store non-numeric or non-integer values in a column defined as integer will fail. DBAs can also utilize constraints to further delineate the type of data that can be stored in database columns. Most relational DBMS products provide the following types of constraints (a short example combining them appears after the list):
Referential constraints are used to specify the columns that define any relationships between tables. Referential constraints are used to implement referential integrity, which ensures that all intended references from data in one column (or set of columns) of a table are valid with respect to data in another column of the same or a different table.
Unique constraints ensure that the values for a column or a set of columns occur only once in a table.
Check constraints are used to place more complex integrity rules on a column or set of columns in a table. Check constraints are typically defined using SQL and can be used to define the data values that are permissible for a column or set of columns.
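A minimal sketch that combines all three constraint types with explicit data types; the table, column, and constraint names are hypothetical:

CREATE TABLE customer (
    cust_id   INTEGER      NOT NULL PRIMARY KEY,
    cust_name VARCHAR(100) NOT NULL,
    email     VARCHAR(255),
    CONSTRAINT uq_customer_email UNIQUE (email)            -- unique constraint
);

CREATE TABLE customer_order (
    order_id    INTEGER      NOT NULL PRIMARY KEY,
    cust_id     INTEGER      NOT NULL,
    order_total DECIMAL(9,2) NOT NULL,
    CONSTRAINT fk_order_cust FOREIGN KEY (cust_id)
        REFERENCES customer (cust_id),                     -- referential constraint
    CONSTRAINT ck_order_total CHECK (order_total >= 0)     -- check constraint
);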
Semantic integrity is more difficult to control and less easily defined. An example of semantic integrity is the quality of the data in the database. Simply storing any data that meets the physical integrity definitions specified to the database is not enough. Procedures and practices need to be in place to ensure data quality. For example, a customer database that contains a wrong address or phone number in 25% of the customer records is an example of a database with poor quality. There is no systematic, physical method of ensuring data accuracy. Data quality is encouraged through proper application code, sound business practices, and specific data policies. Redundancy is another semantic issue. If data elements are stored redundantly throughout the database, the DBA should document this fact and work to ensure that procedures are in place to keep redundant data synchronized and accurate.
The final aspect of integrity comprises internal DBMS issues. The DBMS relies on internal structures and code to maintain links, pointers, and identifiers. In most cases, the DBMS will do a good job of maintaining these structures, but the DBA needs to be aware of their existence and how to cope when the DBMS fails. Internal DBMS integrity is essential in the following areas:
Index consistency. An index is really nothing but an ordered list of pointers to data in database tables. If for some reason the index gets out of sync with the data, indexed access can fail to return the proper data. The DBA has tools at his disposal to check for and remedy these types of errors; a sample consistency check appears after this list.
Pointer consistency. Sometimes large multimedia objects are not stored in the same physical files as other data. Therefore, the DBMS requires pointer structures to keep the multimedia data synchronized to the base table data. Once again, these pointers may get out of sync if proper administration procedures are not followed.
Backup consistency. Some DBMS products occasionally take improper backup copies that effectively cannot be used for recovery. It is essential to identify these scenarios and take corrective actions.
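As an example, a SQL Server-style consistency check examines the allocation and logical integrity of every object in a database; the database name is hypothetical, and other products provide comparable utilities (such as CHECK INDEX and CHECK DATA in Db2):

DBCC CHECKDB ('SalesDB');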
Overall, ensuring integrity is an essential DBA skill.
DBMS Release Migration
The DBA is also responsible for managing the migration from release to release of the DBMS. DBMS products change quite frequently; new versions are usually released every year or so. The task of keeping the DBMS running and up-to-date is an ongoing effort that will consume many DBA cycles. Whatever approach is taken must conform to the needs of the organization, while reducing outages and minimizing the need to change applications.
Jack-of-All-Trades
Databases are at the center of modern applications. If the DBMS fails, applications fail, and if applications fail, business can come to a halt. And if business comes to a halt often enough, the entire business can fail. Database administration is therefore critical to the ongoing success of modern business.
Databases interact with almost every component of the IT infrastructure. The IT infrastructure of today comprises many tools:
Programming languages and environments such as COBOL, Microsoft Visual Studio, C/C++, and Java
Database and process design tools such as ERwin and Rational Rose
Transaction processing systems such as CICS and Tuxedo
Message queueing software such as MQSeries and MSMQ
Networking software and protocols such as SNA, VTAM, TCP/IP, and Novell
Networking hardware such as bridges, routers, hubs, and cabling
Multiple operating systems such as Windows, OS/390 and MVS, UNIX, Linux, and perhaps others
Data storage hardware and software such as enterprise storage servers, Microsoft SMS, IBM DFHSM, storage area networks (SANs), and NAS
Operating system security packages such as RACF, ACF2, and Kerberos
Other types of storage hardware such as tape machines, silos, and solid state (memory-based) storage
Non-DBMS data set and file storage techniques such as VSAM and B-tree
Database administration tools
Systems management tools and frameworks such as BMC PATROL and CA Unicenter
Operational control software such as batch scheduling software and job-entry subsystems
Software distribution solutions for implementing new versions of system software across the network
Internet and Web-enabled databases and applications
Client/server development techniques such as multitier, fat server/thin client, thin server/fat client
Object-oriented and component-based development technologies and techniques such as CORBA, COM, OLE DB, ADO, and EJB
PDAs such as Palm Pilots and PocketPCs
Although it is impossible to become an expert in all of these technologies, the DBA should have some knowledge of each of these areas and how they interrelate. Even more importantly, the DBA should have the phone numbers of experts to contact in case any of the associated software and hardware causes database access or performance problems.