Software Maintenance: File Format Evolution in Java
- Java Serialization Evolution
- Changing Version Numbers
- Conclusion
Joshua Engel examines how code changes require an evolution in file formats and how to deal with those changes. As he points out, it's not uncommon to lose data when new application versions change how some tasks are accomplished. While there's no completely graceful solution, you can make file format upgrades as painless as possible. This article examines how Java serialized files can be made to evolve better.
Adding a new capability to a released program often requires changing the way users save data, which means a change to the file format. Usually you'll have to store additional information. Sometimes you'll drastically alter the way information is organized or represented. The file format evolves to match the new capabilities of the program. However, you can't afford to forget about the old versions. In the animal kingdom, those who don't adapt die out; in software, users may upgrade, or they may not.
No matter how much better your new file format is, however, and no matter how many improvements it includes, it's generally unacceptable to users for their old files to become unusable with the new software. You have a couple of options for dealing with this problem:
- Keep your old code around for reading old files. You'll have to write additional code to convert the old data into the new format (usually most easily done by converting it into your new internal objects, and then using the code you've already written for those objects to write the new file format). As a bonus, you can keep the old writing code and make it compatible with your new objects. Some information is still sometimes lost, but that's better than losing everything.
- Be able to read and write old file formats. This can be a lot of work, since new versions of a program often have capabilities that older ones lack, so there's usually no place in the old format to store the data required to make the new capabilities work.
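As a rough illustration of the first option, the sketch below keeps the old reading code next to the new one and always converts to the current in-memory object. Everything in it is hypothetical: the `Contact` class, the leading version number, and both file layouts are assumptions made up for this example, not part of any real format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical internal object used by the current version of the program.
class Contact {
    final String name;
    final String email; // new in format 2; empty when loaded from a format-1 file

    Contact(String name, String email) {
        this.name = name;
        this.email = email;
    }
}

class ContactFiles {
    static final int FORMAT_1 = 1; // assumed old layout: version, name
    static final int FORMAT_2 = 2; // assumed new layout: version, name, email

    // Old reading code, kept so that format-1 files stay usable.
    private static Contact readFormat1(DataInputStream in) throws IOException {
        return new Contact(in.readUTF(), ""); // no email was stored, so use a default
    }

    private static Contact readFormat2(DataInputStream in) throws IOException {
        return new Contact(in.readUTF(), in.readUTF());
    }

    // Reads either layout and always returns the new internal object.
    static Contact read(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            int version = in.readInt();
            switch (version) {
                case FORMAT_1: return readFormat1(in);
                case FORMAT_2: return readFormat2(in);
                default: throw new IOException("Unknown format version " + version);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Only the new layout is ever written.
    static byte[] write(Contact c) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(FORMAT_2);
            out.writeUTF(c.name);
            out.writeUTF(c.email);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Builds a format-1 file in memory, standing in for a file saved by the old release.
    static byte[] sampleFormat1File(String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(FORMAT_1);
            out.writeUTF(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Upgrading a file is then `ContactFiles.write(ContactFiles.read(oldBytes))`: the old data passes through the new objects and comes out in the new layout, with a default standing in for whatever the old format never stored.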
Data loss is not uncommon when new versions fundamentally change the way some things are done. Old capabilities may no longer be necessary in the new version when the new version achieves the same goal in a different fashion. For example, a program that has changed from a Swing-based interface to a web-oriented interface will lose a lot of information about user preferences that no longer apply. A mail program that changes from a folder-based indexing system to a word-based system will probably lose information in the upgrade between index file formats, which can be especially tragic if one index has saved a lot of user preferences and optimizations that are no longer necessary.
There's no completely graceful solution to these scenarios. However, you can try to make file format upgrades as painless as possible. Java serialization is becoming a popular option for saving files because it's simple and easy to use, so let's examine how Java serialized files can be made to evolve better.
Java Serialization Evolution
There are numerous advantages to using Java serialization:
- It's very easy to do.
- It writes out all the objects that your object links to.
- If an object occurs more than once, it's only written a single time. This is particularly important not only because it saves space in the file, but because you don't have to worry about the potential infinite loops you'd get if you were to write this code in a naïve way. (The naïve way would be to recursively write out each object, but if you don't keep track of what you've already written out, you can find yourself going forever.)
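The last point can be checked with a small round trip. The following is a minimal sketch; the `Link` class and the `roundTrip` helper are hypothetical names invented for the illustration. It serializes two objects that point at each other, then an array that holds the same object twice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class whose instances can refer to one another.
class Link implements Serializable {
    String label;
    Link next;

    Link(String label) {
        this.label = label;
    }
}

class SharedReferenceDemo {
    // Serializes an object graph to memory and reads it straight back.
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(o);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Link a = new Link("a");
        Link b = new Link("b");
        a.next = b;
        b.next = a; // a cycle: a naive recursive writer would never finish

        Link copy = (Link) roundTrip(a);
        System.out.println(copy.next.next == copy); // true: the cycle is restored

        Link[] pair = (Link[]) roundTrip(new Link[] { b, b });
        System.out.println(pair[0] == pair[1]); // true: one object, referenced twice
    }
}
```

Both lines print `true`: the stream records each object once and refers back to it afterward, so the copy has the same shape as the original graph.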
Unfortunately, file formats defined by Java serialization tend to be very fragile; very simple modifications to your class can make old objects unreadable. Even simple extensions are not handled easily. For example, this code has a very simple file format:
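    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    public class Save implements Serializable {
        String name;

        // Writes this object, and everything it links to, to the file "foo".
        public void save() throws IOException {
            // try-with-resources closes the stream even if writeObject fails
            try (ObjectOutputStream oos =
                     new ObjectOutputStream(new FileOutputStream("foo"))) {
                oos.writeObject(this);
            }
        }
    }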
If you add a field, like this:
final int val = 7;
you'll get an exception when you try to read a previously saved object:
java.io.InvalidClassException: Save; local class incompatible: stream classdesc serialVersionUID = -2805274842657356093, local class serialVersionUID = 3419534311899376629
Each big number in the message above is a hash of various properties of the class, including:
- Class name (Save)
- Field names (name)
- Method names (save)
- Implemented interfaces (Serializable)
Change any of those items (by adding or deleting one), and you'll get a different hash code, which triggers that exception. The hash is called the serial version unique identifier (UID). You can get around this problem by forcing the class to keep the old serialVersionUID, which you do by adding a field to the class. The field must be
- static, so that it's a property of the class, not the object
- final, so that it can't change as the code is running
- long, because it's a 64-bit number
So you add the following line:
static final long serialVersionUID=-2805274842657356093L;
The number given is the "stream classdesc" value; that is, the one in the saved stream. The L tacked onto the end marks the literal as a long; this is about the only time I ever use long constants.
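To see which identifier a class will actually use, you can ask `java.io.ObjectStreamClass`. The sketch below is only an illustration: `Unpinned` and `Pinned` are hypothetical stand-ins for the Save class before and after the field is added, and `uidOf` is a helper invented for the example.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// No serialVersionUID declared: the runtime computes the hash from the class.
class Unpinned implements Serializable {
    String name;
}

// serialVersionUID declared: the runtime uses this value instead of hashing.
class Pinned implements Serializable {
    static final long serialVersionUID = -2805274842657356093L;
    String name;
    final int val = 7;
}

class SerialVersionDemo {
    // Returns the identifier that serialization would write for this class.
    static long uidOf(Class<?> c) {
        return ObjectStreamClass.lookup(c).getSerialVersionUID();
    }

    public static void main(String[] args) {
        // Computed hash; the exact value depends on the class's name and members.
        System.out.println(uidOf(Unpinned.class));
        // Declared value: -2805274842657356093
        System.out.println(uidOf(Pinned.class));
    }
}
```

The JDK's `serialver` command-line tool reports the same identifier for a compiled class, which is handy when you need the old value before any error message has told you what it is.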
Of course, not all changes are compatible. If you change the type of a field from a String to an int, the de-serializer won't know what to do with the value, and you'll get an error message like this:
java.io.InvalidClassException: Save; incompatible types for field name
The Java specification for serialization has a lengthy list of incompatible changes and compatible changes. The lists say just what sort of changes you can make to a class and still have older serialized forms be readable. Although the details are tedious, the gist is fairly easy to understand:
| Change | Okay | Not Okay |
|---|---|---|
| Adding fields | X | |
| Changing public/private properties | X | |
| Changing field names or types | | X |
| Deleting field names or types | | X |
| Altering static or transient properties | | X |
| Changing Serializable/Externalizable interfaces | | X |
In short, if you can find a place for all the data in the file, then you can read it, though you may have to play around with the serialization ID.
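The "adding fields" row can be tried inside a single program with a small trick. This is a sketch under loud assumptions: `SaveV1`, `SaveV2`, and `SaveV3` are hypothetical stand-ins for successive versions of one class, and the old file is simulated by serializing a `SaveV1` and then rewriting the class name in the bytes. A real upgrade keeps one class name and needs no such rewriting.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Stand-in for the released class whose objects are already on disk.
class SaveV1 implements Serializable {
    static final long serialVersionUID = 1L;
    String name;
}

// Stand-in for the upgraded class: same ID, one added field.
class SaveV2 implements Serializable {
    static final long serialVersionUID = 1L;
    String name;
    int val = 7;
}

// The same upgrade, but with an ID that no longer matches the saved stream.
class SaveV3 implements Serializable {
    static final long serialVersionUID = 2L;
    String name;
    int val = 7;
}

class AddedFieldDemo {
    // Serializes a SaveV1, then renames the class inside the bytes so the stream
    // looks as if an older version of `newName` had written it. The new name must
    // have the same length, because the stream stores the name's length.
    static byte[] oldFileFor(String newName, String value) {
        if (newName.length() != "SaveV1".length()) {
            throw new IllegalArgumentException("replacement name must keep the length");
        }
        SaveV1 old = new SaveV1();
        old.name = value;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(old);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // ISO-8859-1 maps bytes to chars one-to-one, so this text edit is lossless.
        String raw = new String(bytes.toByteArray(), StandardCharsets.ISO_8859_1);
        return raw.replace("SaveV1", newName).getBytes(StandardCharsets.ISO_8859_1);
    }

    // Tries to read the simulated old file and reports what happened.
    static String tryRead(byte[] file) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(file))) {
            SaveV2 s = (SaveV2) ois.readObject();
            return "name=" + s.name + " val=" + s.val;
        } catch (InvalidClassException e) {
            return "rejected";
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRead(oldFileFor("SaveV2", "foo"))); // name=foo val=0
        System.out.println(tryRead(oldFileFor("SaveV3", "foo"))); // rejected
    }
}
```

Note that the added field comes back as 0 rather than 7. Deserialization does not run the class's field initializers, so a field that is missing from the old stream is left at its default value, and you may need to fill in a sensible value yourself after reading.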