Transformations Performed by Obfuscators
No standards exist for obfuscation, so the level of protection varies based on the quality of the obfuscator. The following sections present some of the features commonly found in obfuscators. We will use ChatServer's sendMessage method to illustrate how each transformation affects the decompiled code. The original source code for sendMessage is shown in Listing 3.1.
Listing 3.1 Original Source Code of sendMessage
public void sendMessage(String host, String message) throws Exception {
    if (host == null || host.trim().length() == 0)
        throw new Exception ("Please specify host name");
    System.out.println("Sending message to host " + host + ": " + message);
    String url = "//" + host + ":" + this.registryPort + "/chatserver";
    ChatServerRemote remoteServer = (ChatServerRemote)Naming.lookup(url);
    MessageInfo messageInfo = new MessageInfo(this.hostName, this.userName);
    remoteServer.receiveMessage(message, messageInfo);
    System.out.println("Message sent to host " + host);
}
Stripping Out Debug Information
Java bytecode can contain information inserted by the compiler that helps debug the running code. The information inserted by javac can contain some or all of the following: line numbers, variable names, and source filenames. Debug information is not needed to run the class but is used by debuggers to associate the bytecode with the source code. Decompilers use this information to better reconstruct the source code. With full debug information in the class file, the decompiled code is almost identical to the original source code. When the debug information is stripped out, the names that were stored are lost, so decompilers have to generate their own names. In our case, after the stripping, the sendMessage parameter names would appear as s and s1 instead of host and message.
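To see what is actually removed, the following minimal sketch compiles the same source twice, once with javac's -g flag (full debug information) and once with -g:none, and then checks which debug attribute names appear in the resulting class file. The Sample class is a hypothetical, simplified stand-in for ChatServer, and the sketch assumes it runs on a JDK, because it needs the system Java compiler.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DebugInfoDemo {
    // Hypothetical, simplified stand-in for ChatServer.sendMessage.
    static final String SOURCE =
          "public class Sample {\n"
        + "  public String sendMessage(String host, String message) {\n"
        + "    String url = \"//\" + host + \"/chatserver\";\n"
        + "    return url + message;\n"
        + "  }\n"
        + "}\n";

    // Compiles SOURCE with the given javac debug flag and returns the raw
    // class file decoded byte-for-byte, so attribute names can be searched.
    static String compile(String debugFlag) {
        try {
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No system compiler; run on a JDK");
            }
            Path dir = Files.createTempDirectory("debuginfo");
            Path src = dir.resolve("Sample.java");
            Files.write(src, SOURCE.getBytes(StandardCharsets.UTF_8));
            int status = compiler.run(null, null, null,
                    debugFlag, "-d", dir.toString(), src.toString());
            if (status != 0) {
                throw new IllegalStateException("Compilation failed");
            }
            byte[] bytes = Files.readAllBytes(dir.resolve("Sample.class"));
            return new String(bytes, StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String flag : new String[] {"-g", "-g:none"}) {
            String classFile = compile(flag);
            System.out.println(flag
                + ": LocalVariableTable=" + classFile.contains("LocalVariableTable")
                + ", LineNumberTable=" + classFile.contains("LineNumberTable")
                + ", SourceFile=" + classFile.contains("SourceFile"));
        }
    }
}
```

The LocalVariableTable attribute is where names such as host and message are stored, which is why a decompiler has to invent its own names once that attribute is gone.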
Name Mangling
Developers use meaningful names for packages, classes, and methods. Our sample chat application's server implementation is called ChatServer and the method that sends a message to another user is called sendMessage. Good names are crucial for development and maintenance, but they mean nothing to the JVM. The Java Runtime (JRE) doesn't care whether sendMessage is called goShopping or abcdefg; it still invokes it and executes it. By renaming the meaningful human-readable names to meaningless machine-generated ones, obfuscators make the task of understanding the decompiled code much harder. What used to be ChatServer.sendMessage becomes d.a; when many classes and methods exist with the same names, the decompiled code is extremely hard to follow. A good obfuscator takes advantage of method overloading to make matters worse. Three methods with different names and signatures doing different tasks in the original code can be renamed to the same common name in the obfuscated code. Because their signatures are different, this does not violate the Java language specification, but it adds confusion to the decompiled code. Listing 3.2 shows an example of a decompiled sendMessage after obfuscation that stripped the debugging information and performed name mangling.
Listing 3.2 Decompiled sendMessage After Name Mangling
public void a(String s, String s1)
    throws Exception
{
    if(s == null || s.trim().length() == 0)
    {
        throw new Exception("Please specify host name");
    } else
    {
        System.out.println(String.valueOf(String.valueOf((
            new StringBuffer("Sending message to host ")
            ).append(s).append(": ").append(s1))));
        String s2 = String.valueOf(String.valueOf((
            new StringBuffer("//")).append(s).append(":")
            .append(b).append("/chatserver")));
        b b1 = (b)Naming.lookup(s2);
        MessageInfo messageinfo = new MessageInfo(e, f);
        b1.receiveMessage(s1, messageinfo);
        System.out.println("Message sent to host ".concat(
            String.valueOf(String.valueOf(s))));
        return;
    }
}
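Giving unrelated methods the same name relies on ordinary Java method overloading. The sketch below is an illustration only: the three "original" names in the comment are hypothetical and are not taken from the chat application. It shows three unrelated methods that all ended up being called a and that the compiler still tells apart purely by their parameter types.

```java
class OverloadedNames {
    // Hypothetical names before obfuscation:
    //   String  formatAddress(String host, int port)
    //   boolean isBlank(String text)
    //   int     countWords(String text, char separator)
    // After renaming, all three are called "a".

    static String a(String s, int i) {
        return "//" + s + ":" + i;
    }

    static boolean a(String s) {
        return s == null || s.trim().length() == 0;
    }

    static int a(String s, char c) {
        int n = 1;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(a("localhost", 1099));    // resolves to a(String, int)
        System.out.println(a("   "));                // resolves to a(String)
        System.out.println(a("one two three", ' ')); // resolves to a(String, char)
    }
}
```

A reader of the decompiled code has to work out from the argument types alone which a is being called at each call site.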
Encoding Java Strings
Java strings are stored as plain text inside the bytecode. Most well-written applications include trace statements that produce execution logs for debugging and auditing. Even if class and method names are changed, the strings that methods write to a log file or the console can betray a method's purpose. In our case, ChatServer.sendMessage outputs a trace message using the following:
System.out.println("Sending message to host " + host + ": " + message);
Even if ChatServer.sendMessage is renamed to d.a, when you see a trace like this one in the decompiled method body, it is clear what the method does. However, if the string is encoded in bytecode, the decompiled version of the class looks like this:
System.out.println(String.valueOf(String.valueOf((
    new StringBuffer(a("A\025wV6|\0279_:a\003xU:2\004v\0227}\003m\022"))
    ).append(s).append(a("(P")).append(s1))));
If you look closely, you will see that the encoded string is first passed to the a() method, which decodes it and returns the readable string that System.out.println() ultimately prints. String encoding is a powerful feature that a commercial-strength obfuscator should provide.
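As an illustration of how such a decoding method can work, the encoded literals shown above happen to be consistent with a simple repeating XOR key. In the sketch below, the five-character key was recovered by XOR-ing the encoded literal with its known plaintext. The body of a() is an assumption made for this example; it is not the decoder that the obfuscator actually generated.

```java
class StringEncodingDemo {
    // Key recovered by XOR-ing the encoded literal from the text with its
    // known plaintext. It applies to this one example only.
    private static final char[] KEY = {0x12, 0x70, 0x19, 0x32, 0x5f};

    // XOR is its own inverse, so the same method both encodes and decodes.
    static String a(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] ^= KEY[i % KEY.length];
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String encoded = "A\025wV6|\0279_:a\003xU:2\004v\0227}\003m\022";
        System.out.println("[" + a(encoded) + "]"); // prints "[Sending message to host ]"
        System.out.println("[" + a("(P") + "]");    // prints "[: ]"
        System.out.println(a(a("round trip")));     // prints "round trip"
    }
}
```

Because the decoder has to ship inside the obfuscated class, string encoding slows an attacker down rather than stopping one: anyone who can run or reimplement a() can recover every literal.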
Changing Control Flow
The transformations presented earlier make reverse engineering of the obfuscated code harder, but they do not change the fundamental structure of the Java code. They also do nothing to protect the algorithms and program control flow, which is usually the most important part of the innovation. The decompiled version of ChatServer.sendMessage shown earlier is still fairly understandable. You can see that the code first checks for valid input and throws an exception upon error. Then it looks up the remote server object and invokes a method on it.
The best obfuscators are capable of transforming the execution flow of bytecode by inserting bogus conditional and goto statements. This can slow down the execution somewhat, but it might be a small price to pay for the increased protection of the IP. Listing 3.3 shows what sendMessage has become after all the transformations discussed earlier have been applied.
Listing 3.3 Decompiled sendMessage After All Transformations
public void a(String s, String s1)
    throws Exception
{
    boolean flag = MessageInfo.c;
    s;
    if(flag) goto _L2; else goto _L1
_L1:
    JVM INSTR ifnull 29;
       goto _L3 _L4
_L3:
    s.trim();
_L2:
    if(flag) goto _L6; else goto _L5
_L5:
    length();
    JVM INSTR ifne 42;
       goto _L4 _L7
_L4:
    throw new Exception(a("\002)qUe7egDs1,rM6:*g@6<$yQ"));
_L7:
    System.out.println(String.valueOf(String.valueOf((
        new StringBuffer(a("\001 zP\177<\"4Ys!6uSsr1{\024~=6´\024"))
        ).append(s).append(a("he")).append(s1))));
    String.valueOf(String.valueOf(
        (new StringBuffer(a("}j"))).append(s).append(":")
        .append(b).append(a("}&|Ub! fBs "))));
_L6:
    String s2;
    s2;
    covertjava.chat.b b1 = (covertjava.chat.b)Naming.lookup(s2);
    MessageInfo messageinfo = new MessageInfo(e, f);
    b1.receiveMessage(s1, messageinfo);
    System.out.println(a("\037 gGw5 4Gs<14@yr-{Gbr").concat(String.valueOf(
        String.valueOf(s))));
    if(flag)
        b.c = !b.c;
    return;
}
Now that's a total mess, but a powerful one! sendMessage is a fairly small method with little conditional logic. If control flow obfuscation were applied to a more complex method with for loops, if statements, and local variables, the obfuscation would be even more effective.
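Real obfuscators rewrite control flow at the bytecode level, where goto instructions are available. The source-level sketch below only approximates the idea, under that stated assumption: buildUrl is a hypothetical, simplified piece of sendMessage, and a is the same logic restructured as a state machine guarded by a flag that is always false at run time, much like the flag variable in Listing 3.3.

```java
class ControlFlowDemo {
    // Always false at run time; stands in for the obfuscator's bogus flag.
    static boolean flag = false;

    // Straightforward logic: validate the host, then build the lookup URL.
    static String buildUrl(String host, int port) {
        if (host == null || host.trim().length() == 0) {
            throw new IllegalArgumentException("Please specify host name");
        }
        return "//" + host + ":" + port + "/chatserver";
    }

    // Same behavior, restructured as a state machine with bogus branches.
    static String a(String s, int i) {
        int state = flag ? 3 : 0;
        String result = null;
        while (true) {
            switch (state) {
                case 0:
                    state = (s == null) ? 2 : 1;
                    break;
                case 1:
                    state = (s.trim().length() == 0 || flag) ? 2 : 3;
                    break;
                case 2:
                    throw new IllegalArgumentException("Please specify host name");
                case 3:
                    result = "//" + s + ":" + i + "/chatserver";
                    state = flag ? 0 : 4;
                    break;
                default:
                    return result;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("localhost", 1099));
        System.out.println(a("localhost", 1099));
    }
}
```

Both methods return the same URL and reject the same inputs, but the second one hides the simple validate-then-build structure behind branches that never execute.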
Inserting Corrupt Code
Inserting corrupt code is a somewhat dubious technique used by some obfuscators to prevent obfuscated classes from decompiling. The technique is based on a loose interpretation of the Java bytecode specification by the Java Runtime. The JRE does not strictly enforce all the rules of bytecode format verification, and that allows obfuscators to introduce incorrect bytecode into the class files. The introduced code does not prevent the original code from executing, but an attempt to decompile the class file results in a failure or, at best, in confusing source code full of JVM INSTR keywords. Listing 3.3 shows how a decompiler might handle corrupt code. The risk of using this method is that the corrupted code might not run on a version of the JVM that more closely adheres to the specification. Even if it is not an issue with the majority of JVMs today, it might become a problem later.
Eliminating Unused Code (Shrinking)
As an added benefit, most obfuscators remove unused code, which results in application size reduction. For example, if a class called A has a method called m() that is never called by any class, the code for m() is stripped out of A's bytecode. This feature is especially useful for code that is downloaded via the Internet or installed in unsecured environments.
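Conceptually, shrinking is a reachability analysis: start from the application's entry points, follow every call, and discard whatever is never reached. The sketch below illustrates this on a hand-written call graph. The method names and the graph are hypothetical; a real obfuscator would derive the graph from the bytecode.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ShrinkerDemo {
    // Returns every method reachable from the entry points by following calls.
    static Set<String> reachable(Map<String, List<String>> callGraph,
                                 Collection<String> entryPoints) {
        Set<String> kept = new TreeSet<>();
        Deque<String> worklist = new ArrayDeque<>(entryPoints);
        while (!worklist.isEmpty()) {
            String method = worklist.pop();
            if (kept.add(method)) { // first visit: queue everything it calls
                worklist.addAll(callGraph.getOrDefault(method, Collections.emptyList()));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical call graph: caller -> methods it calls.
        Map<String, List<String>> callGraph = new HashMap<>();
        callGraph.put("Main.main", Arrays.asList("A.run"));
        callGraph.put("A.run", Arrays.asList("A.helper"));
        callGraph.put("A.helper", Collections.emptyList());
        callGraph.put("A.m", Arrays.asList("A.helper")); // never called by anyone

        Set<String> kept = reachable(callGraph, Collections.singletonList("Main.main"));
        Set<String> removed = new TreeSet<>(callGraph.keySet());
        removed.removeAll(kept);

        System.out.println("kept:    " + kept);    // prints "kept:    [A.helper, A.run, Main.main]"
        System.out.println("removed: " + removed); // prints "removed: [A.m]"
    }
}
```

A static analysis like this cannot see methods that are invoked only through reflection, so such methods would have to be listed as additional entry points to survive shrinking.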
Optimizing Bytecode
Another added benefit touted by obfuscators is potential code optimization. The vendors claim that declaring nonfinal methods as final where possible and performing minor code improvements can help speed up execution. It is hard to assess the real performance gains, and most vendors do not publish the metrics. What is worth noting here is that JIT compilers become more powerful with every new release. Therefore, features such as method finalization and dead code elimination are most likely performed by the JIT compiler anyway.