2.2 What Is Cryptography?
Cryptography is the art of "extreme information security." It is extreme in the sense that once treated with a cryptographic algorithm, a message (or a database field) is expected to remain secure even if the adversary has full access to the treated message. The adversary may even know which algorithm was used. If the cryptography is good, the message will remain secure.
This is in contrast to most information security techniques, which are designed to keep adversaries away from the information. Most security mechanisms prevent access and often have complicated procedures to allow access to only authorized users. Cryptography assumes that the adversary has full access to the message and still provides unbroken security. That is extreme security.
A more popular conception of cryptography characterizes it as the science of "scrambling" data. Cryptographers invent algorithms that take input data, called plaintext, and produce scrambled output. Scrambling, used in this sense, is much more than just moving letters around or exchanging some letters for others. After a proper cryptographic scrambling, the output is typically indistinguishable from a random string of data. For instance, a cryptographic function might turn "Hello, whirled!" into 0x397B3AF517B6892C.
While simply turning a message into a random-looking sequence of bits may not seem useful, you’ll soon see that cryptographic hashes, as such functions are known, are very important to modern computer security. Cryptography, though, offers much more.
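To make the "random-looking output" idea concrete, here is a minimal sketch using SHA-256 from Python’s standard `hashlib` module. SHA-256 is our choice for illustration; the hex value in the text above is only an example and will not match, because SHA-256 always produces 256 bits (64 hex digits). The helper name `sha256_hex` is hypothetical. Changing a single character of the input yields an unrelated-looking digest in which, typically, about half of the bits differ.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

original = sha256_hex("Hello, whirled!")
tweaked = sha256_hex("Hello, whirled?")  # only the last character differs

print(original)
print(tweaked)

# Count how many of the 256 output bits differ between the two digests.
diff_bits = bin(int(original, 16) ^ int(tweaked, 16)).count("1")
print(f"{diff_bits} of 256 bits differ")
```

The same input always produces the same digest, so the output is deterministic even though it looks random.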
Many cryptographic algorithms, but not all, are easily reversible if you know a particular secret. Armed with that secret, a recipient could turn 0x397B3AF517B6892C back into "Hello, whirled!" Anyone who did not know the secret would not be able to recover the original data. Such reversible algorithms are known as ciphers, and the scrambled output of a cipher is ciphertext. The secret used to unscramble ciphertext is called a key. Generally, the key is used for both scrambling, called encryption, and unscrambling, called decryption.
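The simplest cipher that shows these terms in action is the one-time pad, which we use here purely as an illustration (it is not a practical database cipher, because the key must be as long as the data and must never be reused). The algorithm is nothing more than XOR; all of the secrecy lives in the key. The function name `xor_bytes` is hypothetical.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the corresponding byte of key."""
    if len(key) < len(data):
        raise ValueError("one-time pad key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"Hello, whirled!"
key = secrets.token_bytes(len(plaintext))  # the secret; use it for one message only

ciphertext = xor_bytes(plaintext, key)     # encryption
recovered = xor_bytes(ciphertext, key)     # decryption uses the same key

wrong_key = secrets.token_bytes(len(plaintext))
garbled = xor_bytes(ciphertext, wrong_key) # without the right key, only noise comes back

print(ciphertext.hex())
print(recovered)  # b'Hello, whirled!'
print(garbled)
```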
A fundamental principle in cryptography, Kerckhoffs’ Principle, states that the security of a cipher should depend only on keeping the key secret. Even if everything else about the cipher is known, so long as the key remains secret, the plaintext should not be recoverable from the ciphertext.
The opposite of Kerckhoffs’ Principle is security through obscurity. Any cryptographic system where the cipher is kept secret depends on security through obscurity.1 Given the difficulty that even professional cryptographers have in designing robust and efficient encryption systems, the likelihood of a secret cipher providing better security than any of the well-known and tested ciphers is vanishingly small. Plus, modern decompilers, disassemblers, debuggers, and other reverse-engineering tools ensure that any secret cipher likely won’t remain secret for long.
Cryptographic algorithms can be broadly grouped into three categories: symmetric cryptography, asymmetric (or public-key) cryptography, and cryptographic hashing. Each of these types has a part to play in most cryptographic systems, and we next consider each of them in turn.
2.2.1 Symmetric Cryptography
Symmetric key cryptography is so named because the cipher uses the same key for both encryption and decryption. Two famous ciphers, Data Encryption Standard (DES) and Advanced Encryption Standard (AES), both use symmetric keys. Because symmetric key ciphers are generally much faster than public-key ciphers, they are suitable for encrypting both small and large data items.
Modern symmetric ciphers come in two flavors. Block ciphers encrypt a fixed-size block of bits all at once (64 bits for DES, 128 bits for AES), while stream ciphers generally encrypt one bit or byte at a time as the data stream flows past. When a block cipher must encrypt data longer than the block size, the data is first broken into blocks of the appropriate size, and then the encryption algorithm is applied to each. Several modes of operation exist that specify how each block is handled, and they enable a block cipher to be used securely in a variety of situations. By selecting an appropriate mode, for instance, a block cipher can even be used as a stream cipher.
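Counter (CTR) mode is one well-known way to turn a block cipher into a stream cipher: the cipher encrypts a nonce plus a block counter, and the result is XORed with the data. The sketch below shows that structure. Because Python’s standard library has no AES, we assume HMAC-SHA256 truncated to 16 bytes as a stand-in for the block cipher; CTR mode only ever runs the block function forward, so a keyed pseudorandom function is enough for illustration. The names `block_function` and `ctr_crypt` are hypothetical, and this is not a substitute for a vetted AES-CTR implementation.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 16  # bytes; AES also uses 128-bit blocks

def block_function(key: bytes, block: bytes) -> bytes:
    """Stand-in for encrypting one 16-byte block (assumption: HMAC-SHA256, not AES)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK_SIZE]

def ctr_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt in CTR mode; the same operation does both."""
    out = bytearray()
    # Split the data into block-sized chunks; the last chunk may be shorter.
    for i in range(0, len(data), BLOCK_SIZE):
        counter = (i // BLOCK_SIZE).to_bytes(8, "big")
        keystream = block_function(key, nonce + counter)  # 8-byte nonce + 8-byte counter
        chunk = data[i:i + BLOCK_SIZE]
        # zip stops at the shorter input, so a short final chunk needs no padding.
        out.extend(c ^ k for c, k in zip(chunk, keystream))
    return bytes(out)

key = secrets.token_bytes(32)
nonce = secrets.token_bytes(8)  # must never repeat under the same key
message = b"A block function used as a stream cipher: no padding needed."

ciphertext = ctr_crypt(key, nonce, message)
print(len(message), len(ciphertext))      # equal lengths
print(ctr_crypt(key, nonce, ciphertext))  # the original message
```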
The chief advantage of a stream cipher for database cryptography is that it avoids the need for padding. Because block ciphers operate on a fixed block size, data that does not fill the last block must be padded out to the full block size. Stream ciphers avoid this: when the data stream ends, the encryption ends. We’ll return to block and stream ciphers in the algorithm discussion in Chapter 4 "Cryptographic Engines and Algorithms."
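To see what padding costs, here is a sketch of PKCS#7 padding (defined in RFC 5652), a common scheme we assume here for illustration: it appends N bytes, each with the value N. A full block of padding is added when the data is already block-aligned, so the padding can always be removed unambiguously. The function names are hypothetical.

```python
BLOCK_SIZE = 16  # bytes, e.g. the AES block size

def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of block_size."""
    n = block_size - (len(data) % block_size)
    return data + bytes([n]) * n

def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Remove PKCS#7 padding, rejecting malformed input."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("padded data must be a non-empty multiple of the block size")
    n = padded[-1]
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]

for value in [b"", b"4111", b"exactly16bytes!!"]:
    padded = pkcs7_pad(value)
    print(len(value), "->", len(padded))  # 0 -> 16, 4 -> 16, 16 -> 32
    assert pkcs7_unpad(padded) == value
```

For a database, this means a 4-byte value grows to 16 bytes once padded for a 128-bit block cipher, which affects column sizing.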
The primary drawback of symmetric key ciphers is key management. Because the same key is used for both encryption and decryption, the key must be distributed to every entity that needs to work with the data. Should an adversary obtain the key, not only is the confidentiality of the data compromised, but integrity is also threatened given that the key can be used to encrypt as well as decrypt.
The risks posed by losing control of the key make distributing and storing the key difficult. How can the key be moved securely to all the entities that need to decrypt the data? Encrypting the key for transmission would make sense, but what key would be used to encrypt the key, and how would you get the key-encrypting key to the destination?
Once the key is at the decryption location, how should it be secured so that an attacker can’t steal it? Again, encryption offers a tempting solution, but then you face the problem of securing the key used to encrypt the original key.
2.2.2 Public-Key Cryptography
Public-key cryptography, also known as asymmetric cryptography, is a relatively recent invention. As you might guess from the name, the decryption key is different from the encryption key. Together, the two keys are called a key pair and consist of a public key, which can be distributed to the public, and a private key, which must remain a secret. Typically the public key is the encryption key and the private key is the decryption key, but this is not always the case. Well-known asymmetric algorithms include RSA, ElGamal, and Diffie-Hellman. Elliptic curve cryptography provides a different mathematical basis for implementing existing public-key algorithms.
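A key pair is easiest to grasp with textbook RSA and tiny primes. The sketch below uses the small numbers often found in textbook RSA examples; it is for illustration only. Real RSA uses moduli of 2048 bits or more and a padding scheme such as OAEP, and unpadded "textbook" RSA as shown here is insecure. The function names are hypothetical; `pow(e, -1, phi)` requires Python 3.8 or later.

```python
# Textbook RSA with tiny primes: for illustration only, never for real data.
p, q = 61, 53
n = p * q                # modulus, shared by both keys: 3233
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent, chosen coprime to phi
d = pow(e, -1, phi)      # private exponent: the modular inverse of e

public_key = (n, e)      # may be published
private_key = (n, d)     # must stay secret

def rsa_encrypt(m: int, key: tuple) -> int:
    n, e = key
    return pow(m, e, n)

def rsa_decrypt(c: int, key: tuple) -> int:
    n, d = key
    return pow(c, d, n)

m = 65                          # the "message" must be an integer smaller than n
c = rsa_encrypt(m, public_key)  # anyone can encrypt with the public key
print(d, c, rsa_decrypt(c, private_key))  # 2753 2790 65
```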
Public-key ciphers are much slower than symmetric-key ciphers and so are typically used to encrypt smaller data items. One common use is to securely distribute a symmetric key. A sender first encrypts a message with a symmetric key and then encrypts that symmetric key with the intended receiver’s public key. He then sends both to the receiver. The receiver uses her private key to decrypt the symmetric key and then uses the recovered symmetric key to decrypt the message. In this manner the speed of the symmetric cipher is still a benefit, and the problem of distributing the symmetric key is removed. Such systems are known as hybrid cryptosystems.
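The hybrid flow just described can be sketched in a few lines. Two loud assumptions: the receiver’s "public key" is toy textbook RSA built from two well-known Mersenne primes (so it offers no security), and the symmetric cipher is a hypothetical XOR-with-SHA-256-keystream function standing in for a real cipher such as AES. The point is only the order of operations: the bulk data goes through the fast symmetric cipher, and only the short symmetric key goes through the slow public-key operation.

```python
import hashlib
import secrets

# Receiver's toy RSA key pair (illustration only; these primes are public knowledge).
p, q = 2**89 - 1, 2**107 - 1       # two known Mersenne primes
n = p * q                          # about 196 bits, enough to hold a 128-bit key
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (assumption, not AES): XOR with a SHA-256 counter keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Sender: fresh symmetric key for this one message, then wrap it with the public key.
message = b"Meet at the usual place at noon."
session_key = secrets.token_bytes(16)
encrypted_message = keystream_xor(session_key, message)
encrypted_key = pow(int.from_bytes(session_key, "big"), e, n)

# Receiver: unwrap the symmetric key with the private key, then decrypt the message.
recovered_key = pow(encrypted_key, d, n).to_bytes(16, "big")
recovered_message = keystream_xor(recovered_key, encrypted_message)
print(recovered_message)
```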
Another important use for public-key cryptography is to create digital signatures. Digital signatures are used much like real signatures to verify who sent a message. The private key is used to sign the message, and the public key is used to verify the signature.
A common, easily understood digital signature scheme works as follows. To sign a message, the sender encrypts the message with the private key. Anyone with the corresponding public key can decrypt the message and know that it could only have been encrypted with the private key, which presumably only the sender possesses. (This description fits RSA; other signature algorithms, such as DSA, do not work by encryption.) Note that this does not protect the confidentiality of the message, since anyone can obtain the sender’s public key. The goal of a digital signature is simply to verify the sender.
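The "encrypt with the private key" scheme can be sketched with the same kind of toy textbook RSA key built from known Mersenne primes. This is an illustration only: real signature schemes such as RSA-PSS sign a padded hash rather than the raw message, and these primes offer no security. The functions `sign` and `verify` are hypothetical names.

```python
# Toy textbook RSA signing (illustration only; the primes are public knowledge).
p, q = 2**89 - 1, 2**107 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))

def sign(message: bytes) -> int:
    """Apply the private exponent to the message interpreted as an integer."""
    m = int.from_bytes(message, "big")
    assert m < n, "message too long for this toy modulus"
    return pow(m, d, n)

def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public key (n, e) can check the signature."""
    return pow(signature, e, n) == int.from_bytes(message, "big")

msg = b"I owe you $10."
sig = sign(msg)
print(verify(msg, sig))                  # True
print(verify(b"I owe you $1000.", sig))  # False: the signature does not match
```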
Because the public key can be distributed to anyone, we don’t have the same key-distribution problem as we do with symmetric cryptography. However, we do have the problem of unambiguously matching a public key with the right person. How do we know that a particular public key truly belongs to the person or entity we think it does? This is the problem that public key infrastructure (PKI) has tried to solve. Unfortunately, PKI hasn’t lived up to its promise, and the jury is still out on what the long-term accepted solution will be.
Public-key cryptography is mentioned here to help readers new to cryptography understand how it is different from symmetric algorithms. We do not use public-key cryptography in this book, and we do not cover particular algorithms or implementation details. As is discussed in section 2.3, "Applying Cryptography," public-key schemes aren’t necessary for solving the problems in which we’re interested.
2.2.3 Cryptographic Hashing
The last type of cryptographic algorithm we’ll look at is cryptographic hashing. A cryptographic hash, also known as a message digest, is like the fingerprint of some data. A cryptographic hash algorithm reduces even very large data to a small, fixed-size value (256 bits for SHA-256) that, for practical purposes, identifies the data uniquely. What separates cryptographic hashes from other hashes is that it is computationally infeasible either to compute the original data from the hash value or to find other data that hashes to the same value.
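Why is finding two inputs with the same hash infeasible? One way to build intuition is to shrink the hash. The sketch below keeps only the top 24 bits of SHA-256 (a deliberate weakening, for illustration) and searches for two different inputs with the same truncated value. By the birthday bound, such a collision typically appears after roughly 2^12 (a few thousand) attempts; the same search against the full 256-bit output would need on the order of 2^128 hashes. The helper names are hypothetical.

```python
import hashlib

def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256, so collisions become findable."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int):
    """Hash "0", "1", "2", ... until two different inputs share a truncated digest."""
    seen = {}
    i = 0
    while True:
        data = str(i).encode()
        h = truncated_hash(data, bits)
        if h in seen:
            return seen[h], data, i + 1  # the two inputs and the number of hashes computed
        seen[h] = data
        i += 1

a, b, tries = find_collision(24)
print(a, b, tries)

# However large the input, the full digest is always 32 bytes (256 bits).
print(len(hashlib.sha256(b"a" * 1_000_000).digest()))  # 32
```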
A common role played by hashing in modern cryptosystems is improving the efficiency of digital signatures. Because public-key ciphers are much slower than symmetric ciphers, signing large blocks of data is very time-consuming. Instead, most digital signature protocols specify that the signature be applied to a hash of the data. Given that computing a hash is generally fast and the resulting value is typically much smaller than the data, the signing time is drastically reduced.
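A minimal hash-then-sign sketch, again with toy textbook RSA built from two known Mersenne primes (no security, illustration only), shows that the slow private-key operation touches only a 32-byte SHA-256 digest, regardless of how large the document is. Standardized schemes such as RSA-PSS also pad the hash before signing; that step is omitted here, and the function names are hypothetical.

```python
import hashlib

# Toy RSA key pair from two known Mersenne primes (illustration only).
p, q = 2**127 - 1, 2**521 - 1
n = p * q                        # about 648 bits, larger than a 256-bit digest
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))

def digest_int(data: bytes) -> int:
    """SHA-256 of the data, as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def sign(data: bytes) -> int:
    # Only the 32-byte hash goes through the private-key operation.
    return pow(digest_int(data), d, n)

def verify(data: bytes, signature: int) -> bool:
    return pow(signature, e, n) == digest_int(data)

document = b"x" * 10_000_000          # a 10 MB document
sig = sign(document)
print(verify(document, sig))          # True
print(verify(document + b"!", sig))   # False: any change alters the hash
```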
Other common uses of cryptographic hashes include protecting passwords, time-stamping data to securely track creation and modification dates and times, and assuring data integrity. The well-known SHA-2 family of Secure Hash Algorithms includes SHA-224, SHA-256, SHA-384, and SHA-512. The older SHA-1 and MD5 algorithms are still in wider use, but serious flaws, including practical collision attacks, have been found in both, and both should be retired in favor of a more secure hash.
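For password protection, a single fast hash is usually not enough on its own, because attackers can test guesses quickly. A common approach, sketched here as one option rather than the method this book prescribes, is PBKDF2 (available as `hashlib.pbkdf2_hmac`), which applies HMAC-SHA256 many times with a random per-password salt. The iteration count below is an assumption to be tuned to your hardware and current guidance, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune to your hardware and current guidance

def hash_password(password: str) -> tuple:
    """Return (salt, derived_key) to store in place of the password itself."""
    salt = secrets.token_bytes(16)  # unique random salt for each password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived

def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the derived key and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)

salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))  # True
print(check_password("Tr0ub4dor&3", salt, stored))                   # False
```

The salt ensures that two users with the same password get different stored values, and the iteration count makes each guess deliberately expensive.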