The SSL Protocol
SSL stands for Secure Sockets Layer and TLS stands for Transport Layer Security. They are a family of protocols that were originally designed to provide security for HTTP transactions, but that also can be used for a variety of other Internet protocols such as IMAP and NNTP. HTTP running over SSL is referred to as secure HTTP.
Netscape released SSL version 2 in 1994 and SSL version 3 in 1995. TLS is an IETF standard designed to standardize SSL as an Internet protocol. It is just a modification of SSL version 3 with a small number of added features and minor cleanups. The TLS acronym is the result of arguments between Microsoft and Netscape over the naming of the protocol because each company proposed its own name. However, the name has not stuck and most people refer to these protocols simply as SSL. Unless otherwise specified, the rest of this hour refers to SSL/TLS as SSL.
You specify that you want to connect to a server using SSL by replacing http with https in the protocol component of a URI. The default port for HTTP over SSL is 443.
The following sections explain how SSL addresses the confidentiality, integrity, and authentication requirements outlined in the previous section. In doing so, it explains, in a simplified manner, the underlying mathematical and cryptographic principles SSL is based on.
Confidentiality
The SSL protocol protects data from eavesdropping by encrypting it. Encryption is the process of converting a message, the plaintext, into a new encrypted message, the ciphertext. Although the plaintext is readable by everyone, the ciphertext will be completely unintelligible to an eavesdropper. Decryption is the reverse process, which transforms the ciphertext into the original plaintext.
Usually encryption and decryption processes involve an additional piece of information: a key. If both sender and receiver share the same key, the process is referred to as symmetric cryptography. If sender and receiver have different, complementary keys, the process is called asymmetric or public key cryptography.
Symmetric Cryptography
If the key used to both encrypt and decrypt the message is the same, the process is known as symmetric cryptography. DES, Triple-DES, RC4, and RC2 are algorithms used for symmetric key cryptography. Many of these algorithms can have different key sizes, measured in bits. In general, given an algorithm, the greater the number of bits in the key, the more secure the algorithm is and the slower it will run because of the increased computational needs of performing the algorithm.
Symmetric cryptography is relatively fast compared to public key cryptography, which is explained in the next section. Symmetric cryptography has two main drawbacks, however. One drawback is that keys should be changed periodically, to avoid providing an eavesdropper with access to large amounts of material encrypted with the same key. The other drawback is the key distribution problem: How to get the keys to each one of the parties in a safe manner? This was one of the original limiting factors, and before the invention of public key cryptography, the problem was solved by periodically having people traveling around with suitcases full of keys.
Public Key Cryptography
Public key cryptography takes a different approach. Instead of both parties sharing the same key, there is a pair of keys: one public and the other private. The public key can be widely distributed, whereas the owner keeps the private key secret. These two keys are complementary; a message encrypted with one of the keys can be decrypted only by the other key.
Anyone wanting to transmit a secure message to you can encrypt the message using your public key, assured that only the owner of the private keyyoucan decrypt it. Even if the attacker has access to the public key, he cannot decrypt the communication. In fact, you want the public key to be as widely available as possible. Public key cryptography can also be used to provide message integrity and authentication. RSA is the most popular public key algorithm.
The assertion that only the owner of the private key can decrypt it means that with the current knowledge of cryptography and availability of computing power, an attacker will not be able to break the encryption by brute force alone in a reasonable timeframe. If the algorithm or its implementation is flawed, realistic attacks are possible.
NOTE
Public key cryptography is similar to giving away many identical lockpads and retaining the key that opens them all. Anybody who wants to send you a message privately can do so by putting it in a safe and locking it with one of those lockpads (public keys) before sending it to you. Only you have the appropriate key (private key) to open that lockpad (decrypt the message).
The SSL protocol uses public key cryptography in an initial handshake phase to securely exchange symmetric keys that can then be used to encrypt the communication.
Integrity
Data integrity can be preserved by performing a special calculation on the contents of the message and storing the result with the message itself. When the message arrives at its destination, the recipient can perform the same calculation and compare the results. If the contents of the message changed, the results of the calculation will be different.
Digest algorithms perform just that process, creating message digests. A message digest is a method of creating a fixed-length representation of an arbitrary message that uniquely identifies it. You can think of it as the fingerprint of the message. A good message digest algorithm should be irreversible and collision resistant, at least for practical purposes. Irreversible means that the original message cannot be obtained from the digest and collision resistant means that no two different messages should have the same digest. Examples of digest algorithms are MD5 and SHA.
Message digests alone, however, do not guarantee the integrity of the message because an attacker could change the text and the message digest. Message authentication codes, or MACs, are similar to message digests, but incorporate a shared secret key in the process. The result of the algorithm depends both on the message and the key used. Because the attacker has no access to the key, he cannot modify both the message and the digest. HMAC is an example of a message authentication code algorithm.
The SSL protocol uses MAC codes to avoid replay attacks and to assure integrity of the transmitted information.
Authentication
SSL uses certificates to authenticate parties in a communication. Public key cryptography can be used to digitally sign messages. In fact, just by encrypting a message with your secret key, the receiver can guarantee it came from you. Other digital signature algorithms involve first calculating a digest of the message and then signing the digest.
You can tell that the person who created that public and private key pair is the one sending the message. But how can you tie that key to a person or organization that you can trust in the real world? Otherwise, an attacker could impersonate his identity and distribute a different public key, claiming it is the legitimate one. Trust can be achieved by using digital certificates. Digital certificates are electronic documents that contain a public key and information about its owner (name, address, and so on). To be useful, the certificate must be signed by a trusted third party (certification authority, or CA) who certifies that the information is correct. There are many different kinds of CAs, as described later in the hour. Some of them are commercial entities, providing certification services to companies conducting business over the Internet. Other CAs are created by companies providing internal certification services.
The CA guarantees that the information in the certificate is correct and that the key belongs to that individual or organization. Certificates have a period of validity and can expire or be revoked. Certificates can be chained so that the certification process can be delegated. For example, a trusted entity can certify companies, which in turn can take care of certifying its own employees.
If this whole process is to be effective and trusted, the certificate authority must require appropriate proof of identity from individuals and organizations before it issues a certificate.
By default, browsers include a collection of root certificates for trusted certificate authorities.
SSL and Certificates
The main standard defining certificates is X.509, adapted for Internet usage. An X.509 certificate contains the following information:
Issuer: The name of the signer of the certificate
Subject: The person holding the key being certified
Subject public key: The public key of the subject
Control information: Data such as the dates in which the certificate is valid
Signature: The signature that covers the previous data
You can check a real-life certificate by connecting to a secure server with your browser. If the connection has been successful, a little padlock icon or another visual clue will be added to the status bar of your browser. With Internet Explorer, you can click the locked padlock icon to open a page containing information on the SSL connection and the remote server certificate. You can access the same information by selecting Properties, and then Certificates from the File menu. Other browsers, such as Netscape, Mozilla, and Konqueror provide a similar interface.
Open the https://www.ibm.com URL in your browser and analyze the certificate, following the steps outlined in the preceding paragraph. You can see how the issuer of the certificate is the Equifax Secure E-Business Certification Authority-2, which, in turn, has been certified by the Thawte CA. The page downloaded seamlessly because Thawte is a trusted CA that has its own certificates bundled with Internet Explorer and Netscape Navigator.
To check which certificates are bundled with your Internet Explorer browser, select Tools, Internet Options, Content, Certificates, Trusted Root Certification Authorities.
You can see that both issuer and subject are provided as distinguished names (DN), a structured way of providing a unique identifier for every element on the network. In the case of the IBM certificate, the DN is C=US, S=New York, L=Armonk, O=IBM, CN=www.ibm.com.
C stands for country, S for state, L for locality, O for organization, and CN for common name. In the case of a Web site certificate, the common name identifies the fully qualified domain name of the Web site (FQDN). This is the server name part of the URL; in this case, www.ibm.com. If this does not match what you typed in the top bar, the browser will issue an error.
Figure 17.1 shows the certificate information described earlier.
Figure 17.1 Certificate information.
SSL Protocol Summary
You have seen how SSL achieves confidentiality via encryption, integrity via message authentication codes, and authentication via certificates and digital signatures.
The process to establish an SSL connection is the following:
The user uses his browser to connect to the remote Apache server.
The handshake phase starts, and the browser and server exchange keys and certificate information.
The browser checks the validity of the server certificate, including that it has not expired, that it has been issued by a trusted CA, and so on.
Optionally, the server can require the client to present a valid certificate as well.
Server and client use each other's public key to securely agree on a symmetric key.
The handshake phase concludes and transmission continues using symmetric cryptography.