7.6 Metamorphic Viruses
Virus writers still must often waste weeks or months to create a new polymorphic virus that does not have chance to appear in the wild because of its bugs. On the other hand, a researcher might be able to deal with the detection of such a virus in a few minutes or few days. One of the reasons for this is that there are a surprisingly low number of efficient external polymorphic engines.
Virus writers try, of course, to implement various new code evolution techniques to make the researcher's job more difficult. The W32/Apparition virus was the first-known 32-bit virus that did not use polymorphic decryptors to evolve itself in new generations. Rather, the virus carries its source and drops it whenever it can find a compiler installed on the machine. The virus inserts and removes junk code to its source and recompiles itself. In this way, a new generation of the virus will look completely different from previous ones. It is fortunate that W32/Apparition did not become a major problem. However, such a method would be more dangerous if implemented in a Win32 worm. Furthermore, these techniques are even more dangerous on platforms such as Linux, where C compilers are commonly installed with the standard system, even if the system is not used for development. In addition, MSIL (Microsoft Intermediate Langauge) viruses already appeared to rebuild themselves using the System.Reflection.Emit namespace and implement a permutation engine. An example of this kind of metamorphic engine is the MSIL/Gastropod virus, authored by the virus writer, Whale.
The technique of W32/Apparition is not surprising. It is much simpler to evolve the code in source format rather than in binary. Not surprisingly, many macro and script viruses use junk insertion and removal techniques to evolve in new generations20.
7.6.1 What Is a Metamorphic Virus?
Igor Muttik explained metamorphic viruses in the shortest possible way: "Metamorphics are body-polymorphics." Metamorphic viruses do not have a decryptor or a constant virus body but are able to create new generations that look different. They do not use a data area filled with string constants but have one single-code body that carries data as code.
Material metamorphosis does exist in real life. For instance, shape memory polymers have the ability to transform back to their parent shape when heated21. Metamorphic computer viruses have the ability to change their shape by themselves from one form to another, but they usually avoid generating instances that are very close to their parent shape.
Figure 7.4 illustrates the problem of metamorphic virus bodies as multiple shapes.
Figure 7.4 The virus body keeps changing in different generations of a metamorphic virus.
Although there are some DOS metamorphic viruses, such as ACG (Amazing Code Generator), these did not become a significant problem for end users. There are already more metamorphic Windows viruses than metamorphic DOS viruses. The only difference between the two is in their respective potentials. The networked enterprise gives metamorphic binary worms the ability to cause major problems. As a result, we will not be able to turn a blind eye to them, assuming that we do not need to deal with them because they are not causing problems. They will.
7.6.2 Simple Metamorphic Viruses
In December of 1998, Vecna (a notorious virus writer) created the W95/Regswap virus. Regswap implements metamorphosis via register usage exchange. Any part of the virus body will use different registers but the same code. The complexity of this, clearly, is not very high. Listing 7.10 shows some sample code fragments selected from two different generations of W95/Regswap that use different registers.
Listing 7.10 Two Different Generations of W95/Regswap
a.)
5A pop edx BF04000000 mov edi,0004h 8BF5 mov esi,ebp B80C000000 mov eax,000Ch 81C288000000 add edx,0088h 8B1A mov ebx,[edx] 899C8618110000 mov [esi+eax*4+00001118],ebx b.) 58 pop eax BB04000000 mov ebx,0004h 8BD5 mov edx,ebp BF0C000000 mov edi,000Ch 81C088000000 add eax,0088h 8B30 mov esi,[eax]
89B4BA18110000 mov [edx+edi*4+00001118],esi
The bold areas show the common areas of the two code generations. Thus a wildcard string could be useful in detecting the virus. Moreover, support for half-byte wildcard (indicated with the ? mark) bytes such as 5? B? (as described by Frans Veldman) could lead to even more accurate detection. Using the 5?B? wildcard pattern we can detect snippets such as 5ABF, 58BB, and so on.
Depending on the actual ability of the scanning engine, however, such a virus might need an algorithmic detection because of the missing support of wildcard search strings. If algorithmic detection is not supported as a single database update, the product update might not come out for several weeksor monthsfor all platforms!
Other virus writers tried to re-create older permutation techniques. For instance, the W32/Ghost virus has the capability to reorder its subroutines similarly to the BadBoy DOS virus family (see Figure 7.5). Badboy always starts in the entry point (EP) of the virus.
Figure 7.5 The Badboy virus uses eight modules.
The order of the subroutines will be different from generation to generation, which leads to n! different virus generations, where n is the number of subroutines. BadBoy had eight subroutines, and 8!=40,320 different generations. W32/Ghost (discovered in May 2000) has 10 functions, so 10!=3,628,800 combinations. Both of them can be detected with search strings, but some scanners need to deal with such a virus algorithmically.
Two different variants of the W95/Zmorph virus appeared in January of 2000. The polymorphic engine of the virus implements a build-and-execute code evolution. The virus rebuilds itself on the stack with push instructions. Blocks of code decrypt the virus instruction-by-instruction and push the decrypted instructions to the stack. The build routine of the virus is already metamorphic. The engine supports jump insertion and removal between any instructions of the build code. Regardless, code emulators can be used to deal easily with such viruses. A constant code area of the virus is useful for identification because the virus body is decrypted on the stack.
7.6.3 More Complex Metamorphic Viruses and Permutation Techniques
The W32/Evol virus appeared in July of 2000. The virus implements a metamorphic engine and can run on any major Win32 platform. In Listing 7.11, section a. shows a sample code fragment, mutated in b. to a new form in a new generation of the same virus. Even the "magic" DWORD values (5500000Fh, 5151EC8Bh) are changed in subsequent generations of the virus, as shown in c. Therefore any wildcard strings based on these values will not detect anything above the third generation of the virus. W32/Evol's engine is capable of inserting garbage between core instructions.
Listing 7.11 Different Generations of the W32/Evol Virus
a. An early generation:
C7060F000055 mov dword ptr [esi],5500000Fh C746048BEC5151 mov dword ptr [esi+0004],5151EC8Bh b. And one of its later generations: BF0F000055 mov edi,5500000Fh 893E mov [esi],edi 5F pop edi 52 push edx B640 mov dh,40 BA8BEC5151 mov edx,5151EC8Bh 53 push ebx 8BDA mov ebx,edx 895E04 mov [esi+0004],ebx c. And yet another generation with recalculated ("encrypted") "constant" data: BB0F000055 mov ebx,5500000Fh 891E mov [esi],ebx 5B pop ebx 51 push ecx B9CB00C05F mov ecx,5FC000CBh 81C1C0EB91F1 add ecx,F191EBC0h ; ecx=5151EC8Bh
894E04 mov [esi+0004],ecx
Variants of the W95/Zperm family appeared in June and September of 2000. The method used is known from the Ply DOS virus. The virus inserts jump instructions into its code. The jumps will be inserted to point to a new instruction of the virus. The virus body is built in a 64K buffer that is originally filled with zeros. The virus does not use any decryption. In fact, it will not regenerate a constant virus body anywhere. Instead, it creates new mutations by the removal and addition of jump instructions and garbage instructions. Thus there is no way to detect the virus with search strings in the files or in the memory.
Most polymorphic viruses decrypt themselves to a single constant virus body in memory. Metamorphic viruses, however, do not. Therefore the detection of the virus code in memory needs to be algorithmic because the virus body does not become constant even there. Figure 7.6 explains the code structure changes of Zperm-like viruses.
Figure 7.6 The Zperm virus.
Sometimes the virus replaces instructions with other, equivalent instructions. For example, the instruction xor eax, eax (which sets the eax register to zero) will be replaced by sub eax, eax (which also zeroes the contents of the eax register). The opcode of these two instructions will be different.
The core instruction set of the virus has the very same execution order; however, the jumps are inserted in random places. The B variant of the virus also uses garbage instruction insertion and removal such as nop (a do-nothing instruction). It is easy to see that the number of generations can be at least n!, where n is the number of core set instructions in the virus body.
Zperm introduced the real permutating engine (RPME). RPME is available for other virus writers to create new metamorphic viruses. We should note here that permutation is only a single item on the list of metamorphic techniques. To make the virus truly metamorphic, instruction opcode changes are introduced. Encryption can be used in combination with antiemulation and polymorphic techniques.
In October 2000, two virus writers created a new permutation virus, W95/Bistro, based on the sources of the Zperm virus and the RPME. To further complicate the matter, the virus uses a random code block insertion engine. A randomly activated routine builds a do-nothing code block at the entry point of the virus body before any active virus instructions. When executed, the code block can generate millions of iterations to challenge a code emulator's speed.
Simple permutating viruses and complex metamorphic viruses can be very different in their complexity of implementation. In any case, both permutating viruses and metamorphic viruses are different from traditional polymorphic techniques.
In the case of polymorphic viruses, there is a particular moment when we can take a snapshot of the completely decrypted virus body, as illustrated Figure 7.7. Typically, antivirus software uses a generic decryption engine (based on code emulation) to abstract this process. It is not a requirement to have a complete snapshot to provide identification in a virus scanner, but it is essential to find a particular moment during the execution of virus code when a complete snapshot can be madeto classify a virus as a traditional polymorphic virus. It is efficient to have a partial result, as long as there is a long-enough decrypted area of each possible generation of the virus.
Figure 7.7 Snapshot of a decrypted "polymorphic virus."
On the contrary, a complex metamorphic virus does not provide this particular moment during its execution cycle. This is true even if the virus uses metamorphic techniques combined with traditional polymorphic techniques.
7.6.4 Mutating Other Applications: The Ultimate Virus Generator?
Not only does W95/Bistro itself mutate in new generations, it also mutates the code of its host by a randomly executed code-morphing routine. In this way, the virus might generate new worms and viruses. Moreover, the virus cannot be perfectly repaired because the entry-point code area of the application could be different. The code sequence at the entry point of the host application is mutated for a range 480 bytes long. The next listing shows an original and a permutated code sequence of a typical entry point code:
Original entry-point code:
55 push ebp 8BEC mov ebp, esp 8B7608 mov esi, dword ptr [ebp + 08] 85F6 test esi, esi 743B je 401045 8B7E0C mov edi, dword ptr [ebp + 0c] 09FF or edi, edi 7434 je 401045 31D2 xor edx, edx
Permutated entry-point code:
55 push ebp 54 push esp 5D pop ebp 8B7608 mov esi, dword ptr [ebp + 08] 09F6 or esi, esi 743B je 401045 8B7E0C mov edi, dword ptr [ebp + 0c] 85FF test edi, edi 7434 je 401045 28D2 sub edx, edx
Thus an instruction such as test esi, esi can be replaced by or esi, esi, its equivalent format. A push ebp; mov ebp, esp sequence (very common in high-level language applications) can be permutated to push ebp; push esp, pop ebp. It would certainly be more complicated to replace the code with different opcode sizes, but it would be possible to shorten longer forms of some of the complex instructions and include do-nothing code as a filler. This is a problem for all scanners.
If a virus or a 32-bit worm were to implement a similar morphing technique, the problem could be major. New mutations of old viruses and worms would be morphed endlessly! Thus, a virtually endless number of not-yet-detectable viruses and worms would appear without any human intervention, leading to the ultimate virus generator.
NOTE
An even more advanced technique was developed in the W95/Zmist virus22, which is described in the following section.
At the end of 1999, the W32/Smorph Trojan was developed. It implements a semimetamorphic technique to install a backdoor in the system. The standalone executable is completely regenerated during the installation of the Trojan. The PE header is re-created and will include new section names and section sizes. The actual code at the entry point is metamorphically generated. The code allocates memory and then decrypts its own resources, which contain a set of other executables. The Trojan uses API calls to its own import address table, which is filled with many nonessential API imports, as well as some essential ones. Thus everything in the standalone Trojan code will be different in new generations.
7.6.5 Advanced Metamorphic Viruses: Zmist
During Virus Bulletin 2000, Dave Chess and Steve White of IBM demonstrated their research results on undetectable viruses. Shortly after, the Russian virus writer, Zombie, released his Total Zombification magazine, with a set of articles and viruses of his own. One of the articles in the magazine was titled "Undetectable Virus Technology."
Zombie has already demonstrated his set of polymorphic and metamorphic virus-writing skills. His viruses have been distributed for years in source format, and other virus writers have modified them to create new variants. Certainly this is the case with Zombie's "masterpiece" creation, W95/Zmist.
Many of us have not seen a virus approach this complexity for some time. We could easily call Zmist one of the most complex binary viruses ever written. W95/SK, One_Half, ACG, and a few other virus names popped into our minds for comparison. Zmist is a little bit of everything: It is an entry-point obscuring (EPO) virus that is metamorphic. Moreover, the virus randomly uses an additional polymorphic decryptor.
The virus supports a unique new technique: code integration. The Mistfall engine contained in the virus is capable of decompiling PE files to their smallest elements, requiring 32MB of memory. Zmist will insert itself into the code; it moves code blocks out of the way, inserts itself, regenerates code and data references (including relocation information), and rebuilds the executable. This is something that has never been seen in any previous virus.
Zmist occasionally inserts jump instructions after every single instruction of the code section, each of which points to the next instruction. Amazingly, these horribly modified applications will still run as before, just as the infected executables do, from generation to generation. In fact, we have not seen a single crash during test replications. Nobody expected this to worknot even the virus's author Zombie. Although it is not foolproof, it seems to be good enough for a virus. It takes some time for a human to find the virus in infected files. Because of this extreme camouflage, Zmist is easily the perfect antiheuristic virus.
They say a good picture is worth a thousand words. The T-1000 model from the film Terminator 2 is the easiest analogy to use. Zmist integrates itself into the code section of the infected application as the T-1000 model could hide itself on the floor.
7.6.5.1 Initialization
Zmist does not alter the entry point of the host. Instead, it merges with the existing code, becoming part of the instruction flow. However, the code's random location means that sometimes the virus will never receive control. If the virus does run, it will immediately launch the host as a separate process and hide the original process (if the RegisterServiceProcess() function is supported on the current platform) until the infection routine completes. Meanwhile, the virus will begin searching for files to infect.
7.6.5.2 Direct Action Infection
After launching the host process, the virus will check whether there are at least 16MB of physical memory installed. The virus also checks that it is not running in console mode. If these checks pass, it will allocate several memory blocks (including a 32MB area for the Mistfall workspace), permutate the virus body, and begin a recursive search for PE files. This search will take place in the Windows directory and all subdirectories, the directories referred to by the PATH environment variable, and then all fixed or remote drives from A: to Z:. This is a rather brute-force approach to spreading.
7.6.5.3 Permutation
The permutation is fairly slow because it is done only once per infection of a machine. It consists of instruction replacement, such as the reversing of branch conditions, register moves replaced by push/pop sequences, alternative opcode encoding, XOR/SUB and OR/TEST interchanging, and garbage instruction generation. It is the same engine, RPME, that is used in several viruses, including W95/Zperm, which also was written by Zombie.
7.6.5.4 Infection of Portable Executable Files
A file is considered infectable if it meets the following conditions:
It is smaller than 448KB.
It begins with MZ. (Windows does not support ZM-format Windows applications.)
It is not infected already. (The infection marker is Z at offset 0x1C in the MZ header, a field generally not used by Windows applications.)
It is a PE file.
The virus will read the entire file into memory and then choose one of three possible infection types. With a one-in-ten chance, only jump instructions will be inserted between every existing instruction (if the instruction was not a jump already), and the file will not be infected. There is also a one in ten chance that the file will be infected by an unencrypted copy of the virus. Otherwise, the file will be infected by a polymorphically encrypted copy of the virus. The infection process is protected by structured exception handling, which prevents crashes if errors occur. After the rebuilding of the executable, the original file is deleted, and the infected file is created in its place. However, if an error occurs during the file creation, the original file is lost, and nothing will replace it.
The polymorphic decryptor consists of "islands" of code that are integrated into random locations throughout the host code section and linked by jumps. The decryptor integration is performed in the same way as for the virus body integrationexisting instructions are moved to either side, and a block of code is placed between them. The polymorphic decryptor uses absolute references to the data section, but the Mistfall engine will update the relocation information for these references, too. An antiheuristic trick is used for decrypting the virus code: Instead of making the section writeable to alter its code directly, the host is required to have, as one of the first three sections, a section containing writeable, initialized data. The virtual size of this section is increased by 32KB, large enough for the decrypted body and all variables used during decryption. This allows the virus to decrypt code directly into the data section and transfer control there.
If such a section cannot be found, the virus will infect the file without using encryption. The decryptor receives control in one of four ways:
Via an absolute indirect call (0xFF 0x15)
Via a relative call (0xE8)
Via a relative jump (0xE9)
As part of the instruction flow itself
If one of the first three methods is used, the transfer of control will appear soon after the entry point. In the case of the last method, though, an island of the decryptor is simply inserted into the middle of a subroutine somewhere in the code (including before the entry point). All used registers are preserved before decryption and restored afterward, so the original code will behave as before. Zombie calls this last method UEP, perhaps an acronym for "unknown entry point" because there is no direct pointer anywhere in the file to the decryptor.
When encryption is used, the code is encrypted with ADD, SUB, or XOR with a random key, which is altered on each iteration by ADD/SUB/XOR with a second random key. Between the decryption instructions are various garbage instructions. They use a random number of registers and a random choice of loop instruction, all produced by the executable trash generator (ETG) engine, which was also written by Zombie. Randomness features heavily in this virus.
7.6.5.5 Code Integration
The integration algorithm requires that the host have fixups to distinguish between offsets and constants. After infection, however, the fixup data is not required by the virus. Therefore, though it is tempting to look for a gap of about 20KB in the fixup area (which would suggest that the virus body is located there), it would be dangerous to rely on this during scanning.
If another application (such as one of an increasing number of viruses) were to remove the fixup data, the infection would be hidden. The algorithm also requires that the name of each section in the host be one of the following: CODE, DATA, AUTO, BSS, TLS, .bss, .tls, .CRT, .INIT, .text, .data, .rsrc, .reloc, .idata, .rdata, .edata, .debug, or DGROUP. These section names are produced by the most common compilers and assemblers in use: those of Microsoft, Borland, and Watcom. The names are not visible in the virus code because the strings are encrypted.
A block of memory is allocated that is equivalent to the size of the host memory image, and each section is loaded into this array at the section's relative virtual address. The location of every interesting virtual address is noted (import and export functions, resources, fixup destinations, and the entry point), and then the instruction parsing begins.
This is used to rebuild the executable. When an instruction is inserted into the code, all following code and data references must be updated. Some of these references might be branch destinations, and in some cases the size of these branches will increase as a result of the modification. When this occurs, more code and data references must be updated, some of which might be branch destinations, and the cycle repeats. Fortunatelyat least from Zombie's point of viewthis regression is not infinite; although a significant number of changes might be required, the number is limited. The instruction parsing consists of identifying the type and length of each instruction. Flags are used to describe the types, such as instruction is an absolute offset requiring a fixup entry, instruction is a code reference, and so on.
In some cases, an instruction cannot be resolved in an unambiguous manner to either code or data. In that case, Zmist will not infect the file. After the parsing stage is completed, the mutation engine is called, which inserts the jump instructions after every instruction or generates a decryptor and inserts the islands into the file. Then the file is rebuilt, the relocation information is updated, the offsets are recalculated, and the file checksum is restored. If overlay data is appended to the original file, then it is copied to the new file too.
7.6.6 {W32, Linux}/Simile: A Metamorphic Engine Across Systems
W32/Simile is the latest "product" of the developments in metamorphic virus code. The virus was released in the 29A #6 issue in early March 2002. It was written by the virus writer who calls himself The Mental Driller. Some of his previous viruses, such as W95/Drill (which used the Tuareg polymorphic engine), were already very challenging to detect.
W32/Simile moves yet another step in complexity. The source code of the virus is approximately 14,000 lines of Assembly code. About 90% of the virus code is spent on the metamorphic engine itself, which is extremely powerful. The virus's author has called it MetaPHOR, which stands for "metamorphic permutating high-obfuscating reassembler."
The first-generation virus code is about 32KB, and there are three known variants of the virus in circulation. Samples of the variant initially released in the 29A issue have been received by certain AV companies from major corporations in Spain, suggesting a minor outbreak.
W32/Simile is very obfuscated and challenging to understand. The virus attacks disassembling, debugging, and emulation techniques, as well as standard evaluating-based virus-analysis techniques. As with many other complex viruses, Simile uses EPO techniques.
7.6.6.1 Replication Routine
Simile contains a fairly basic direct-action replication mechanism that attacks PE files on the local machine and the network. The emphasis is clearly on the metamorphic engine, which is unusually complex.
7.6.6.2 EPO Mechanism
The virus searches for and replaces all possible patterns of certain call instructions (those that reference ExitProcess() API calls) to point to the beginning of the virus code. Thus the main entry point is not altered. The metamorphic virus body is sometimes placed, together with a polymorphic decryptor, into the same location of the file. In other cases, the polymorphic decryptor is placed at the end of the code section, and the virus body is not placed there but in another section. This is to further obfuscate the location of the virus body.
7.6.6.3 Polymorphic Decryptor
During the execution of an infected program, when the instruction flow reaches one of the hooks that the virus places in the code section, control is transferred to a polymorphic decryptor responsible for decoding the virus body (or simply copying it directly, given that the virus body is intentionally not always encrypted).
This decryptor, whose location in the file is variable, first allocates a large chunk of memory (about 3.5MB) and then proceeds to decipher the encrypted body into it. It does this in a most unusual manner: Rather than going linearly through the encrypted data, it processes it in a seemingly random order, thus avoiding the triggering of some decryption-loop recognition heuristics. This "pseudo-random index decryption" (as the virus writer calls it) relies on the use of a family of functions that have interesting arithmetic properties, modulo 2^n. Although the virus writer discovered this by trial and error, it is in fact possible to give a mathematical proof that his algorithm will work in all cases (provided the implementation is correct, of course). This proof was made by Frederic Perriot at Symantec.
The size and appearance of the decryptor varies greatly from one virus sample to the next. To achieve this high variability, the virus writer simply generates a code template and then puts his metamorphic engine to work to transform the template into a working decryptor.
Additionally, in some cases the decryptor might start with a header whose intent is not immediately obvious upon reading it. Further study reveals that its purpose is to generate antiemulation code on the fly. The virus constructs a small oligomorphic code snippet containing the instruction RDTSC (ReaD Time Stamp Counter), which retrieves the current value of an internal processor ticks counter. Then, based on one random bit of this value, the decryptor either decodes and executes the virus body or bypasses the decryption logic altogether and simply exits.
Besides confusing emulators that would not support the somewhat peculiar RDTSC instruction (one of Mental Driller's favorites, already present in W95/Drill), this is also a very strong attack against all algorithms relying on emulation either to decrypt the virus body or to determine viral behavior heuristicallybecause it effectively causes some virus samples to cease infecting completely upon a random time condition.
When the body of the virus is first executed, it retrieves the addresses of 20 APIs that it requires for replication and for displaying the payload. Then it will check the system date to see whether either of its payloads should activate. Both payloads require that the host import functions from USER32.DLL. If this is the case, the virus checks whether or not it should call the payload routine (which is explained further on).
7.6.6.4 Metamorphism
After the payload check has been completed, a new virus body is generated. This code generation is done in several steps:
Disassemble the viral code into an intermediate form, which is independent of the CPU on which the native code executes. This allows for future extensions, such as producing code for different operating systems or even different CPUs.
Shrink the intermediate form by removing redundant and unused instructions. These instructions were added by earlier replications to interfere with disassembly by virus researchers.
Permutate the intermediate form, such as reordering subroutines or separating blocks of code and linking them with jump instructions.
Expand the code by adding redundant and unused instructions.
Reassemble the intermediate form into a final, native form that will be added to infected files.
Not only can Simile expand as most first-generation metamorphic viruses did, it can also shrink (and shrink to different forms). Simile.D is capable of translating itself to different metamorphic forms (V1, V2...Vn) and does so on more than one operating system (O1,O2...On). So far, the virus does not use multiple CPU forms, but it could also introduce code translation and metamorphism for different processors (P1..Pn) in the future, as shown in Figure 7.8.
Figure 7.8 Stages of Simile from Linux to Windows and from one generation to another.
7.6.6.5 Replication
The replication phase begins next. It begins by searching for *.exe in the current directory and then on all fixed and mapped network drives. The infection will scan recursively into directories, but only to a depth of three subdirectories, avoiding completely any directory that begins with the letter W. For each file that is found, there is a 50% chance that it will be skipped. Additionally, files will be skipped if they begin with F-, PA, SC, DR, or NO or contain the letter V anywhere in the name. Due to the nature of the comparison, other character combinations are unintentionally skipped, such as any directory that begins with the number 7, any file that begins with FM, or any file that contains the number 6 anywhere in the name.
The file-infection routine contains many checks to filter files that cannot be infected safely. These checks include such conditions as the file must contain a checksum, it must be an executable for the Intel 386+ platform, and there must exist sections whose names are .text or CODE and .data or DATA. The virus also checks that the host imports some kernel functions, such as ExitProcess.
For any file that is considered infectable, random factors and the file structure will determine where the virus places its decryptor and virus body. If the file contains no relocations the virus body will be appended to the last section in the file. (Apparently, this can happen randomly with a small chance as well, regardless if there are any relocations in the file.)
In this case, the decryptor will be placed either immediately before the virus body or at the end of the code section. Otherwise, if the name of the last section is .reloc, the virus will insert itself at the beginning of the data section, move all of the following data, and update all of the offsets in the file.
7.6.6.6 Payload
The first payload activates only during March, June, September, or December. Variants A and B of W32/Simile display their messages on the 17th of these months. Variant C will display its message on the 18th of these months. Variant A will display "Metaphor v1 by The Mental Driller/29A," and variant B will display "Metaphor 1b by The Mental Driller/29A." Variant C intends to display "Deutsche Telekom by Energy 2002 **g**"; however, the author of that variant had little understanding of the code, and the message rarely appears correctly. In all cases, the cases of the letters are mixed randomly, as shown in Figure 7.9.
The second payload activates on the 14th of May in variants A and B, and on the 14th of July in variant C. Variants A and B will display "Free Palestine!" on computers that use the Hebrew locale. Variant C contains the text "Heavy Good Code!" but due to a bug this is displayed only on systems on which the locale cannot be determined.
Figure 7.9 The "metamorphic" activation routines of the Simile virus.
The first W32/Linux cross-infector, {W32,Linux}/Peelf, uses two separate routines to carry out the infection on PE and ELF files. Simile.D shares a substantial amount of code between the two infection functions, such as the polymorphic and metamorphic engines. The only platform-specific part in the infection routine is the directory traversal code and the API usage.
The virus was confirmed to infect successfully under versions 6.2, 7.0, and 7.2 of Red Hat Linux, and it very likely works on most other common Linux distributions.
Infected files will grow by about 110KB on average, but the size increase is variable due to the shrinking and expansion capability of the metamorphic engine and to the insertion method.
When the .D variant activates on Linux, it simply prints a console message, as shown in Figure 7.10.
Figure 7.10 The payload of the Simile.D virus on Linux.
7.6.7 The Dark FutureMSIL Metamorphic Viruses
As discussed in this chapter, some new MSIL viruses, such as MSIL/Gastropod, already support semi-metamorphic (permutating) code generation under the .NET Framework. Such viruses have a great advantage because they do not need to carry their own source code. Instead, viruses can simply decompile themselves to generate new binaries, for example, by using the System.Reflection.Emit namespace. Listing 7.12 illustrates the MSIL metamorphic engine of the MSIL/Gastropod virus in two generations.
Listing 7.12 Different Generations of the MSIL/Gastropod Virus
a.)
.method private static hidebysig specialname void .cctor() { ldstr "[ .NET.Snail - sample CLR virus (c) whale 2004 ]" stsfld class System.String Ylojnc.lgxmAxA::WaclNvK nop ldc.i4.6 ldc.i4.s 0xF call int32 [mscorlib]System.Environment::get_TickCount() nop newobj void nljvKpqb::.ctor(int32 len1, int32 len2, int32 seed) stsfld class nljvKpqb Ylojnc.lgxmAxA::XxnArefPizsour call int32 [mscorlib]System.Environment::get_TickCount() nop newobj void [mscorlib]System.Random::.ctor(int32) stsfld class [mscorlib]System.Random Ylojnc.lgxmAxA::aajqebjtoBxjf ret } b.) .method private static hidebysig specialname void .cctor() { ldstr "[ .NET.Snail - sample CLR virus (c) whale 2004 ]" stsfld class System.String kivAklozuas.ghqrRrlv::ngnMTzqo ldc.i4.6 ldc.i4.s 0xF call int32 [mscorlib]System.Environment::get_TickCount() newobj void xiWtNaocl::.ctor(int32 len1, int32 len2, int32 seed) stsfld class xiWtNaocl kivAklozuas.ghqrRrlv::yXuzlmssjjp call int32 [mscorlib]System.Environment::get_TickCount() newobj void [mscorlib]System.Random::.ctor(int32) stsfld // line continues on next line class [mscorlib]System.Random kivAklozuas.ghqrRrlv::kaokaufdiehjs nop ret
}
The extracted snippets represent the class constructor (".cctor") function of MSIL/Gastropod in two different generations. The semi-metamorphic engine of the virus obfuscates class names and method names23. In addition, the permutation engine also inserts and removes junk instructions (such as nop) into its body. Indeed, it is not well known to MSIL developers that they can invoke a compiler and generate code and assembly from a running application, but virus writers already use this feature.
MSIL/Gastropod is a code-builder virus: It reconstructs its own host program with itself. This method allows the virus body to be placed to an unpredictable position within the host program. The main entry-point method of the host program is replaced by the entry-point method of the virus. When the infected host is executed, the virus code is invoked, which will run the original main entry-point method eventually.
In addition, some viruses do not rely on the use of .NET Framework namespace to implement parasitic MSIL infection. For example, the MSIL/Impanate virus, written by roy g. biv, is fully aware of both 32-bit and 64-bit MSIL files and infects them using the EPO (Entry-Point Obscuring) technique. Thus next generation MSIL metamorphic viruses might not rely on the use of System.Reflection.Emit and similar namespaces anymore.