7.3 Encrypted Viruses
From the very early days, virus writers tried to implement virus code evolution. One of the easiest ways to hide the functionality of the virus code was encryption. The first known virus that implemented encryption was Cascade on DOS4. The virus starts with a constant decryptor, which is followed by the encrypted virus body. Consider the example extracted from Cascade.1701 shown in Listing 7.1.
Listing 7.1 The Decryptor of the Cascade Virus
lea si, Start ; position to decrypt (dynamically set)
mov sp, 0682 ; length of encrypted body (1666 bytes) Decrypt: xor [si],si ; decryption key/counter 1 xor [si],sp ; decryption key/counter 2 inc si ; increment one counter dec sp ; decrement the other jnz Decrypt ; loop until all bytes are decrypted
Start: ; Encrypted/Decrypted Virus Body
Note that this decryptor has antidebug features because the SP (stack pointer) register is used as one of the decryption keys. The direction of the decryption loop is always forward; the SI register is incremented by one.
Because the SI register initially points to the start of the encrypted virus body, its initial value depends on the relative position of the virus body in the file. Cascade appends itself to the files, so SI will result in the same value if two host programs have equivalent sizes. However, the SI (decryption key 1) is changed if the host programs have different sizes. The SP register is a simple counter for the number of bytes to decrypt. Note that the decryption is going forward with word (double-byte) key length. The decryption position, however, is moved forward by one byte each time. This complicates the decryption loop, but it does not change its reversibility. Note that simple XOR is very practical for viruses because XORing with the same value twice results in the initial value.
Consider encrypting letter P (0x50) with the key 0x99. You see, 0x50 XOR 0x99 is 0xC9, and 0xC9 XOR 0x99 will return to 0x50. This is why virus writers like simple encryption so muchthey are lazy! They can avoid implementing two different algorithms, one for the encryption and one for the decryption.
Cryptographically speaking, such encryption is weak, though early antivirus programs had little choice but to pick a detection string from the decryptor itself. This led to a number of problems, however. Several different viruses might have the same decryptor, but they might have completely different functionalities. By detecting the virus based on its decryptor, the product is unable to identify the variant or the virus itself. More importantly, nonviruses, such as antidebug wrappers, might have a similar decryptor in front of their code. As a result, the virus that uses the same code to decrypt itself will confuse them.
Such a simple code evolution method also appeared in 32-bit Windows viruses very early. W95/Mad and W95/Zombie use the same technique as Cascade. The only difference is the 32-bit implementation. Consider the decryptor from the top of W95/Mad.2736, shown in Listing 7.2.
Listing 7.2 The Decryptor of the W95/Mad.2736 Virus
mov edi,00403045h ; Set EDI to Start
add edi,ebp ; Adjust according to base mov ecx,0A6Bh ; length of encrypted virus body mov al,[key] ; pick the key Decrypt: xor [edi],al ; decrypt body inc edi ; increment counter position loop Decrypt ; until all bytes are decrypted jmp Start ; Jump to Start (jump over some data) DB key 86 ; variable one byte key
Start: ; encrypted/decrypted virus body
In fact, this is an even simpler implementation of the simple XOR method. Detection of such viruses is still possible without trying to decrypt the actual virus body. In most cases, the code pattern of the decryptor of these viruses is unique enough for detection. Obviously, such detection is not exact, but the repair code can decrypt the encrypted virus body and easily deal with minor variants.
The attacker can implement some interesting strategies to make encryption and decryption more complicated, further confusing the antivirus program's detection and repair routines:
The direction of the loop can change: forward and backward loops are supported (see all cases in Figure 7.1).
Multiple layers of encryption are used. The first decryptor decrypts the second one, the second decrypts the third, and so on (see Figure 7.1c.). Hare5 by Demon Emperor, W32/Harrier6 by TechnoRat, {W32, W97M}/Coke by Vecna, and W32/Zelly by ValleZ are examples of viruses that use this method.
Several encryption loops take place one after another, with randomly selected directionsforward and backward loops. This technique scrambles the code the most (see Figure 7.1c.).
There is only one decryption loop, but it uses more than two keys to decrypt each encrypted piece of information on the top of the others. Depending on the implementation of the decryptor, such viruses can be much more difficult to detect. The size of the key especially mattersthe bigger the key size (8, 16, 32 -bit, or more), the longer the brute-force decryption might take if the keys cannot be extracted easily.
The start of decryptor is obfuscated. Some random bytes are padded between the decryptor and the encrypted body and/or the encrypted body and the end of the file.
Nonlinear decryption is used. Some viruses, such as W95/Fono, use a simple nonlinear algorithm with a key table. The virus encryption is based on a substitution table. For instance, the virus might decide to swap the letters A and Z, the letters P and L, and so on. Thus the word APPLE would look like ZLLPE after such encryption.
The attacker can decide not to store the key for encryption anywhere in the virus. Instead, the virus uses brute force to decrypt itself, attempting to recover the encryption keys on its own. Viruses like this are much harder to detect and said to use the RDA (random decryption algorithm) technique. The RDA.Fighter virus is an example that uses this method.
-
The attacker can use a strong encryption algorithm to encrypt the virus. The IDEA family of viruses, written by Spanska, utilizes this method. One of several decryptors uses the IDEA cipher.8 Because the virus carries the key for the decryption, the encryption cannot be considered strong, but the repair of such viruses is painful because the antivirus needs to reimplement the encryption algorithm to deal with it. In addition, the second decryption layer of IDEA virus9 uses RDA.
-
The Czech virus W32/Crypto by Prizzy demonstrated the use of Microsoft crypto API in computer viruses. Crypto encrypts DLLs on the system using a secret/public key pair generated on the fly. Other computer worms and backdoor programs also use the Crypto API to decrypt encrypted content. This makes the job of antivirus scanners more difficult. An example of a computer worm using the Crypto API is W32/Qint@mm, which encrypts EXE files.
-
Sometimes the decryptor itself is not part of the virus. Viruses such as W95/Resur10 and W95/Silcer are examples of this method. These viruses force the Windows Loader to relocate the infected program images when they are loaded to memory. The act of relocating the image is responsible for decrypting the virus body because the virus injects special relocations for the purpose of decryption. The image base of the executable functions as the encryption key.
-
The Cheeba virus demonstrated that the encryption key can be external to the virus body. Cheeba was released in 1991. Its payload is encrypted using a filename. Only when the virus accesses the file name will it correctly decrypt its payload11. Virus researchers cannot easily describe the payload of such virus unless the cipher in the virus is weak. Dmitry Gryaznov managed to reduce the key size necessary to attack the cipher in Cheeba to only 2,150,400 possible keys by using frequency cryptanalysis of the encrypted virus body, assuming that the code under the encryption was written in a similar style as the rest of the virus code12. This yielded the result, and the magic filename, "users.bbs" was found. This filename belonged to a popular bulletin board software. It is expected that more, so-called "clueless agents"13 will appear as computer viruses to disallow the defender to gain knowledge about the intentions of the attacker.
-
Encryption keys can be generated in different ways, such as constant, random but fixed, sliding, and shifting.
-
The key itself can be stored in the decryptor, in the host, or nowhere at all. In some cases, the decryptor's code functions as a decryption key, which can cause problems if the code of the decryptor is modified with a debugger. Furthermore, this technique can attack emulators that use code optimization techniques to run decryptors more efficiently. (An example of such as virus is Tequila.)
-
The randomness of the key is also an important factor. Some viruses only generate new keys once per day and are said to use a slow generator. Others prefer to generate keys every single time they infect an object; these are known as fast generators. The attacker can use many different methods to select the seed of randomness. Simple examples include timer ticks, CMOS time and date, and CRC32. A complicated example is the Mersenne Twister14 pseudo-number generator used by W32/Chiton and W32/Beagle.
The attacker can select several locations to decrypt the encrypted content. The most common methods are shown in Figure 7.2.
Figure 7.1 Decryption loop examples.
Because the virus decryption is not linear, the virus body is not decrypted one byte after another. This easily might confuse a junior virus analyst because in some cases, the virus body might not look encrypted at all. Consequently, if a detection string is picked from such a sample, the virus detection will be partial. This technique easily can confuse even advanced detection techniques that use an emulator. Although in normal cases the emulation can continue until linear detection is detected, such as consecutive byte changes in the memory of a virtual machine used by the scanner, a nonlinear algorithm will force the emulation to continue until a hard-to-guess minimum limit.
A variant of the W32/Chiton ("Efish") virus uses a similar approach to Fono's, but Chiton makes sure it always replaces each byte of the virus body with another value using a complete substitution table. In addition, Chiton uses multiple values to correspond to each byte in the code, significantly complicating the decryption.
Viruses such as W95/Drill and {W32, Linux}/Simile.D represent the state of the art in nonlinear encryption, decrypting each piece of the encrypted virus body in a semi-random order, hitting each position in the virus only once.7
Figure 7.2 Possible places of decryption. A) The decryptor decrypts the data at the location of the encrypted virus body. This method is the most common; however, the encrypted data must be writeable in memory, which depends on the actual operating system. B) The decryptor reads the encrypted content and builds the decrypted virus body on the stack. This is very practical for the attacker. The encrypted data does not need to be writeable. C) The virus allocates memory for the decrypted code and data. This can be a serious disadvantage for the attacker because nonencrypted code needs to allocate memory firstbefore the decryptor.
NOTE
Metamorphic viruses such as Simile circumvent this disadvantage because the code that allocates memory is made variable without providing the ability to pick a search string.
The preceding techniques work very effectively when combined with variable decryptors that keep changing in new generations of the virus. Oligomorphic and polymorphic decryption are discussed in the following sections.