SQL Essentials
- Data Definition Language
- Data Manipulation Language
- SQL Functions
- Transactions
SQL is a special-purpose language, sometimes described as a relational language, which can be used with a database for a number of different purposes. SQL can be considered a standard language, with some qualification. There are two significant standards, SQL-92 and SQL-99. Most database vendors, including Oracle, are largely compliant with SQL-92. SQL-99 greatly extends the scope of SQL, introducing new features such as persistent modules and multidimensional analytical capabilities. Most database vendors are largely compliant with the core features of SQL-99, but not the new features. This core compliance allows them to claim that they are compliant with SQL-99, even though this is virtually the same as being compliant with SQL-92. To confuse things further, most vendors, including Oracle, also have extensions to standard SQL, which are sometimes unavoidable. SQL commands can generally be grouped into a number of general categories, according to their purpose:
Data Definition Language (DDL), used for defining tables and the relationships between the tables and the data in them.
Data Manipulation Language (DML), used for adding, retrieving, modifying, and deleting data.
Transaction control commands, used to group sets of DML statements into a single unit of work; used to ensure data integrity.
Database administration commands.
SQL/PSM, used for writing procedural programs using persistent stored modules.
The first two categories of commands, DDL and DML, are the core SQL commands used for defining data models and for storing and querying data. These are the commands we will be concentrating on in this chapter. These commands are generally implemented in a standard way by vendors. We will generally adhere to the features defined in the SQL-92 standard for DML and DDL commands.
Transaction control commands are used to isolate groups of commands so that they can be treated as a unit. SQL-92 defines very basic support for transactions; there is no command to mark the start of a transaction, for example. We will largely use this basic model, but we will also consider SQL-99 features as implemented in Oracle.
Database administration commands are largely vendor-dependent. In Oracle, they are used to maintain the physical structure of the database, create users, grant rights to users, create database policies of various sorts, etc. Database administration is a large topic in its own right and well beyond the scope of this book. We'll cover only a few essential commands incidentally, when we set up a sample database.
The last category is an optional part of the SQL standard to allow procedural programming. It defines persistent stored modules that provide control flow statements and bind variables, similar to Oracle's PL/SQL. This standard was accepted in 1996, but since then, vendors have begun to converge on Java stored procedures. So instead of SQL/PSM, we will cover Java stored procedures and PL/SQL in Chapter 5.
Data Definition Language
SQL has three main commands for data definition: CREATE, for creating tables and databases; ALTER, for changing tables and databases; and DROP, for deleting tables and databases. Because creating, altering, and deleting databases are database administration tasks, we will consider only tables in this section.
Creating Tables
We used the CREATE TABLE command in the previous chapter when we started building our sample database. The most basic form of this command is:
CREATE TABLE table_name ( column_1_name column_1_type, column_2_name column_2_type, column_n_name column_n_type );
which can have as few as one or as many as 1,000 columns. Although it is perfectly legal to omit a primary key in Oracle, it's a good practice to include one, as follows:
CREATE TABLE table_name ( column_1_name column_1_type, column_2_name column_2_type, column_n_name column_n_type, PRIMARY KEY (column_list) );
This example shows only a single column as the key. This is commonly the case, especially when we use an identification number of some kind to identify each record uniquely. In some cases, the items in the tables have a unique number already associated with them that make a good keyfor individuals in the United States, a Social Security number is sometimes used in this way. Sometimes, it is necessary or just convenient to create a number for this purpose. Oracle, in particular, provides a featuresequencesthat can be used to generate numbers for keys.
In some cases, a combination of an item's properties serve to identify it uniquely. In the example we started discussing in the previous chapter, we mentioned that we could use the combination of an album title and an artist as the key.
We created our CD_COLLECTION table in the previous chapter with the following SQL command:
CREATE TABLE CD_COLLECTION ( ALBUM_TITLE VARCHAR2(100), ARTIST VARCHAR2(100), COUNTRY VARCHAR2(25), RELEASE_DATE DATE, LABEL VARCHAR2(25), PRIMARY KEY (ALBUM_TITLE, ARTIST) );
The primary key is treated as a separate object in the database and has a name associated with it. If we don't explicitly name it, Oracle will give it a name automaticallysomething fairly cryptic, such as SYS_C001427.
It's a good idea to name our objects explicitly whenever possible. It makes things easier to understand and simplifies maintaining our database. In this case, the way to do it is to name the PRIMARY KEY constraint when we create it. This is a better way to create a table:
CREATE TABLE table_name ( column_1_name column_1_type, column_2_name column_2_type, column_n_nam column_n_type, CONSTRAINT name PRIMARY KEY (column_list) );
If we were to do it all over again, we could use a command such as this to create our CD table:
CREATE TABLE CD_COLLECTION ( ALBUM_TITLE VARCHAR2(100), ARTIST VARCHAR2(100), COUNTRY VARCHAR2(25), RELEASE_DATE DATE, LABEL VARCHAR2(25), CONSTRAINT CD_COLLECTION_PK PRIMARY KEY (ALBUM_TITLE, ARTIST) );
Another option that we should consider applies to the columns in a table, NOT NULL. We can add this to our column definition after the datatype if we wish to require that a value be provided for a particular column. (This is automatically a requirement for key columns.) We may, for example, want to make the COUNTRY a mandatory field in the CD_COLLECTION table. We can do that by changing the CREATE statement like this:
CREATE TABLE CD_COLLECTION ( ALBUM_TITLE VARCHAR2(100), ARTIST VARCHAR2(100), COUNTRY VARCHAR2(25) NOT NULL, RELEASE_DATE DATE, LABEL VARCHAR2(25), CONSTRAINT CD_COLLECTION_PK PRIMARY KEY (ALBUM_TITLE, ARTIST) );
Throughout this chapter we will be creating, altering, and deleting the CD_COLLECTION and SONGS tables. Rather than typing these commands at the sqlplus prompt, you may wish to type the command into a text file then run it as a script because we will later need to re-create the tables in order to add and change data in them.
Although COUNTRY is the only column we explicitly declare NOT NULL, it is not the only column for which we must provide a value. Because the ALBUM_TITLE and ARTIST columns together are the primary key, they are implicitly NOT NULL also.
Finally, let's consider another type of constraint, a FOREIGN KEY. Before we knew we should name our constraints, we created our SONGS table with this statement:
CREATE TABLE SONGS ( SONG_TITLE VARCHAR2(100), COMPOSER VARCHAR2(100), LENGTH NUMBER, TRACK NUMBER, ARTIST VARCHAR2(100), ALBUM_TITLE VARCHAR2(100), PRIMARY KEY (SONG_TITLE), FOREIGN KEY (ARTIST, ALBUM_TITLE) REFERENCES CD_COLLECTION(ARTIST, ALBUM_TITLE) );
The FOREIGN KEY clause indicates that each record in this table is a child of a record in the CD_COLLECTION table. This means that we cannot enter a song in the SONGS table without first entering an album in the CD_COLLECTION table. This also means that we cannot delete any albums from the CD_COLLECTION table without first deleting the corresponding child records in the SONGS table. Were we to attempt to delete a record from CD_COLLECTION that had child records in SONGS, we would get an error like this:
SQL> delete from cd_collection; delete from cd_collection * ERROR at line 1: ORA-02292: integrity constraint (MYDB.SYS_C002745) violated - child record found
We can add an option to the foreign key constraint that will automatically delete child records when the parent record is deleted, ON DELETE CASCADE.
This is the form our statement for creating the SONGS table takes after we name our constraints and add the ON DELETE CASCADE OPTION to it:
CREATE TABLE SONGS ( SONG_TITLE VARCHAR2(100), COMPOSER VARCHAR2(100), LENGTH NUMBER, TRACK NUMBER, ARTIST VARCHAR2(100), ALBUM_TITLE VARCHAR2(100), CONSTRAINT SONGS_PK PRIMARY KEY (SONG_TITLE), CONSTRAINT SONGS_FK_CD_COLLECTION FOREIGN KEY (ARTIST, ALBUM_TITLE) REFERENCES CD_COLLECTION(ARTIST, ALBUM_TITLE) ON DELETE CASCADE );
Keys and Sequences
In the preceding examples we used an existing column or group of columns that uniquely identify a row as the primary key in a table. This is called a natural key. Sometimes, however, a natural key does not exist or it is not practical to use a natural key. If we have a list of names, for example, we may have many John Smiths. As mentioned earlier, one solution that is common in the United States is to use a person's Social Security number to identify unique individuals, but this is a practice that is oftenand justifiablycriticized.
Using long and complex keys, such as combinations of columns, makes it more difficult to use a table. This is a problem with using the combination of ALBUM and TITLE as the primary key for the CD_COLLECTION table.
The way to avoid these problems with natural keys is to use an artificial key. This is an arbitrary unique number that is assigned to each row in the table. This is such a common requirement for relational databases that it is surprising there is no standard way to generate artificial keys for a table. Every RDBMS has its own special way of doing this.
In Oracle, the way to generate artificial keys is by using a SEQUENCE. This is a database object that generates numbers guaranteed to be unique, even for multiple clients in a distributed environment.
Let's redefine our CD_COLLECTION table to use an artificial primary key, CD_ID. But first, we need to drop the SONGS table because it contains a foreign key that refers to this table, then we can drop the CD_COLLECTION table.
DROP TABLE SONGS; DROP TABLE CD_COLLECTION; CREATE TABLE CD_COLLECTION ( CD_ID NUMBER, ALBUM_TITLE VARCHAR2(100), ARTIST VARCHAR2(100), COUNTRY VARCHAR2(25) NOT NULL, RELEASE_DATE DATE, LABEL VARCHAR2(25), CONSTRAINT CD_COLLECTION_PK PRIMARY KEY (CD_ID) );
Next, we'll create a SEQUENCE to generate CD_IDs:
CREATE SEQUENCE CD_ID_SEQUENCE;
Now, when we want to add an album, in addition to the other information, we'll need to include a CD_ID, which we can obtain from CD_ID_SEQUENCE, using the NEXTVAL function:
INSERT INTO CD_COLLECTION (CD_ID, ...) VALUES(CD_ID_SEQUENCE.NEXTVAL, ...);
(We'll learn more about the INSERT statement later in this chapter; the important thing to note is that calling the CD_ID_SEQUENCE.NEXTVAL function returns a unique integer.)
In the same way, we can recreate the SONGS table to use a sequence for the primary key. Notice that we need to change the foreign key (and the corresponding columns) to reflect the new primary key in the CD_COLLECTION table, from ARTIST and ALBUM_TITLE to CD_ID:
CREATE TABLE SONGS ( SONG_ID NUMBER, CD_ID NUMBER, SONG_TITLE VARCHAR2(100), COMPOSER VARCHAR2(100), LENGTH NUMBER, TRACK NUMBER, CONSTRAINT SONGS_PK PRIMARY KEY (SONG_ID), CONSTRAINT SONGS_FK_CD_COLLECTION FOREIGN KEY (CD_ID) REFERENCES CD_COLLECTION(CD_ID) ON DELETE CASCADE ); CREATE SEQUENCE SONG_ID_SEQUENCE;
We don't need any of the details of the numbers that a sequence generates, other than their uniqueness. In fact, because they are entirely arbitrary, our applications should use them only internally, should not depend on their having any particular properties, and should not expose them to users.
By default, sequences generate consecutive numbers starting at 1, but as an optimization in a distributed environmentto minimize the number of round trips between the client application and the serverthe client session may reserve a set of numbers for later use. If it doesn't use all of these numbers, they are silently discarded. This means that there will be gaps in the numbers that get used as primary keys in the database. This shouldn't matter, but if the number appears in the user interface, as a sales order number, for example, nonconsecutive numbers will be perceived as a bug, not a performance feature!
Indexes and Functional Indexes
When we insert data into a database, the data typically is entered in an unordered fashion. One of the benefits of identifying a primary key for a table is that we are indicating to the database that we will be using this key to locate records in the database. Rather than keeping the data sorted by this key, the database creates an index behind the scenes by keeping a separate table of our primary keys in sorted order, together with a pointer to the record in our table.
Creating Indexes
If we know we will be searching the table often, using a specific set of criteria, we sometimes want to create additional indexes of our own. This is especially the case where we are using an artificial key. Using a CD_ID in our COLLECTION_TABLE will make programming easier, as we will see later, but we still will find ourselves frequently searching for ARTIST and ALBUM_TITLE. We may want to create additional indexes for ARTIST and TITLE or the combination of the two, to speed up queries.
There is a cost for indexes, however; each time we add a record to a table in the database, each index for that table must be updated. Adding indexes to support every likely query can make inserting records, especially into large tables, noticeably slow. This might be acceptable if the table is primarily used for queries (particularly ad hoc queries), but more typically, a balance must be struck, and we must be selective about what columns to index.
Suppose we anticipate that we will most frequently query our CD_COLLECTION by ARTIST and only occasionally by ARTIST and ALBUM_TITLE. We might decide to create a single column index for ARTIST. We do this as follows:
CREATE INDEX CD_COLL_ARTIST_IDX ON CD_COLLECTION(ARTIST);
But now let's suppose that, as we develop our application, we realize that although querying on ARTIST is, in fact, a common query, this returns a list of the ARTIST's albums, from which the user will typically then select a specific album. In other words for every search based on ARTIST, there will be another search based on ARTIST plus TITLE. We may want to create instead an index based on the combination of ARTIST and ALBUM_TITLE:
CREATE INDEX CD_COLL_ART_ALB_IDX ON CD_COLLECTION(ARTIST, ALBUM_TITLE);
Because ARTIST is the first column in this index, the database can use this composite index as an index for the ARTIST column too, so we don't need the single-column index we created above.
If, on the other hand, we found that we were often searching on the combination of ARTIST and ALBUM_TITLE, and never on ARTIST, we may want to reverse the order of the columns in this index. It is more efficient if the column with the most unique values appears first in an index. In this case, we can expect that each artist will have multiple albums and album titles for the most part will be unique.
Creating Function-Based Indexes
In Oracle 9i, it is also possible to create indexes based on functions. To support case-insensitive searches, for example, we may want to create a functional index. A functional index is like a regular index, except that instead of specifying a column name, we specify the function (including, of course, any applicable columns). The following will create an index on the uppercase version of ARTIST:
CREATE INDEX CD_COLL_UC_ARTIST_IDX CD_COLLECTION(UPPER(ARTIST));
As each record is inserted into the database, the UPPER() value of the ARTIST column will be indexed. This can greatly ameliorate the performance problems that using functions as part of a search criterion can cause.
Indexes and Performance
It's important to understand the purpose of indexes and their interaction on the code we writethis understanding often leads us to make better choices in how we design our queries. But as a practical matter, in a development environment, with small test databases, indexes make little or no difference.
Setting up indexes properly in a production environment is a complex task. Typically, indexes are assigned to their own tablespaces, which are usually on a separate disk drive than the data table. (The commands in the examples above default to using the same tablespace as the data tables.) Oracle provides a number of performance tools for verifying that queries are using indexes and other optimizations properly. If you will be deploying to such a system, you likely need to work together with the DBA (or refer to additional resources, such as Oracle's documentation) to build the appropriate indexes and make sure your application's queries are properly tuned to use these indexes.
Altering Existing Tables
Once a table is created, it's not too late to modify it. We can add, alter, or drop columns, and disable or enable the constraints, among other things. Here are some of the most commonly used formats that the ALTER command can take:
ALTER TABLE tablename ADD(column_name column_type); ALTER TABLE tablename ADD(CONSTRAINT constraint_name PRIMARY KEY (column_list)); ALTER TABLE tablename MODIFY(column_name, column_type) ALTER TABLE tablename ENABLE CONSTRAINT constraint_name; ALTER TABLE tablename DISABLE CONSTRAINT constraint_name; ALTER TABLE tablename DROP COLUMN column_name; ALTER TABLE tablename DROP CONSTRAINT constraint_name [CASCADE]; ALTER TABLE tablename DROP CONSTRAINT PRIMARY KEY [CASCADE]; ALTER TABLE tablename RENAME table_name TO new_table_name;
We'll first take a look at how to change the name of the primary key. Because there are no commands to rename columns or constraints, in order to rename our primary key, we first need to drop it, then add a new one with the correct name.
To drop a constraint, we need to know its name. If we've used a reasonable naming convention, we can guess what that name is. Otherwise we can find out from the database's data dictionarya set of views (views are a type of virtual table) that we can query to obtain information about the database. The view USER_CONSTRAINTS contains information about constraints such as primary keys. The following query displays the constraints associated with our SONGS table.
SQL> SELECT CONSTRAINT_NAME 2 FROM USER_CONSTRAINTS 3 WHERE TABLE_NAME='SONGS'; CONSTRAINT_NAME ------------------------------ SONGS_PK SONGS_FK_CD_COLLECTION
Knowing the constraint's name, we can drop it:
ALTER TABLE SONGS DROP CONSTRAINT SONGS_PK;
Now we can add a new primary key:
ALTER TABLE SONGS ADD CONSTRAINT SONGS_NEW_PK PRIMARY KEY(SONG_ID);
Querying again for the constraints for this table, we find:
SQL> SELECT CONSTRAINT_NAME 2 FROM USER_CONSTRAINTS 3 WHERE TABLE_NAME='SONGS'; CONSTRAINT_NAME ------------------------------ SONGS_NEW_PK SONGS_FK_CD_COLLECTION
Probably the most common changes we will make to tables are to add columns or to change the type or size of an existing column. Let's take a look at the SONGS table, using the SQL*Plus DESCRIBE table command.
SQL> DESC SONGS Name Null? Type --------------------------------- -------- ----------- SONG_ID NOT NULL NUMBER CD_ID NUMBER SONG_TITLE VARCHAR2(100) COMPOSER VARCHAR2(100) LENGTH NUMBER TRACK NUMBER
(Notice that we can abbreviate DESCRIBE as DESC and that we don't need to add a semicolon at the end because it is an SQL*Plus command, not an SQL command.)
First we'll add a column:
ALTER TABLE SONGS ADD(PUBLISHER VARCHAR2(50));
We can use DESCRIBE again to see that it has been added:
SQL> DESC SONGS Name Null? Type ---------------------------------- -------- ------------- SONG_ID NOT NULL NUMBER CD_ID NUMBER SONG_TITLE VARCHAR2(100) COMPOSER VARCHAR2(100) LENGTH NUMBER TRACK NUMBER PUBLISHER VARCHAR2(50)
Next, we'll change an existing column. But before we do that, we need to consider that existing data may interfere with our ability to do that. If we try to shorten the length of the SONG_TITLE field, but we have data that would not fit in the shortened column, the following error would occur:
SQL> ALTER TABLE SONGS MODIFY(SONG_TITLE VARCHAR2(20)); ALTER TABLE SONGS MODIFY(SONG_TITLE VARCHAR2(20)) * ERROR at line 1: ORA-01441: cannot decrease column length because some value is too big
As we'll see later, we can use an SQL UPDATE statement to find and change the value that is too long, then run this command again.
Another alternative is to drop the table and start fresh. In fact, this isn't as drastic an alternative as it may seem. It's not generally a good idea to create and alter tables ad hoc at an SQL*Plus prompt. To properly plan, document, and manage a database schema, it's better to write scripts to perform DDL commands. If our database is being used for development, we would modify our scripts for creating tables, rather than adding additional scripts to alter tables. In a development environment, we often create and recreate our tables, as we'll see, so scripts for creating tables often begin with a command to drop the table first. If we were in a production environment, we would need to consider preserving user data; we don't need to worry about that, because we also have scripts for creating test dataalthough if we were to decide to change the column length of the SONG_TITLE column, we'd have to remember to change the data to fit within the length restriction.
Dropping Tables
The command for dropping a table is easy. The format is:
DROP TABLE table_name;
But if we try this with our CD_COLLECTION table, we find that it fails, with a complaint from Oracle about the primary key in this table being referenced as a foreign key by another table:
SQL> drop table cd_collection; drop table cd_collection * ERROR at line 1: ORA-02449: unique/primary keys in table referenced by foreign keys
There are two ways to solve this problem, depending on our intention. If we are going to delete all the tables in our databasein order to rebuild it, perhapswe can do this by deleting the tables in an order that doesn't violate these dependencies. In this case, it means dropping the SONGS table first, which removes the reference to the CD_COLLECTION table's primary key, allowing us to drop CD_COLLECTION next.
If we have a large database, however, determining the right order to drop tables can be nontrivial. (It may even be impossible, given that it is possible to create circular dependencies.) We can find out which constraints reference other constraints by looking at the USER_CONSTRAINTS table using the following query:
SQL> SELECT CONSTRAINT_NAME, R_CONSTRAINT_NAME, TABLE_NAME FROM USER_CONSTRAINTS; CONSTRAINT_NAME R_CONSTRAINT_NAME TABLE_NAME ---------------------- ---------------------- ---------- CD_COLLECTION_PK CD_COLLECTION SONGS_NEW_PK SONGS SONGS_FK_CD_COLLECTION CD_COLLECTION_PK SONGS 3 rows selected.
Notice that the constraint SONGS_FK_CD_COLLECTION belonging to the SONGS table references the constraint CD_COLLECTION_PK, as listed under R_CONSTRAINT_NAME. If it weren't already clear from the name, we could also see that the CD_COLLECTION_PK constraint belongs to the CD_COLLECTION table. If there were more tables with additional dependencies, we could continue going through this listing to determine how to proceed. As it is, we can drop the tables with two commands:
DROP TABLE SONGS; DROP TABLE CD_COLLECTION;
Another way of deleting the CD_COLLECTION table is to remove the dependency between the two tables. We can do that by dropping the constraint in the SONGS table. If there were other tables that referenced CD_COLLECTION, we would need to drop the constraints in those, as well. We can do that in one step, however, by dropping the referenced constraint in CD_COLLECTION with option CASCADE, like this:
ALTER TABLE CD_COLLECTION DROP CONSTRAINT CD_COLLECTION_PK CASCADE;
In one step, this drops the CD_COLLECTION primary key constraint, CD_COLLECTION_PK, as well as any other constraints that reference it. Now we can drop the CD_COLLECTION table without a complaint from Oracle:
DROP TABLE CD_COLLECTION;