Changing Version Numbers
File formats change as program capabilities change. In a perfect world, programs are both backward compatible (new versions can read, and possibly update, files in old formats) and forward compatible (old versions make what sense they can of future file formats).
Too many programs assume that the current file version is the last one they'll ever need, and handle only the case of out-of-date versions. When they try to read a file with a later version number, they discover halfway through that something doesn't make sense, and crash horribly. Such files are much easier to deal with if they carry a lot of metadata. Java serialization, for example, records each field by name in the stream. As long as the changes are purely additive (new fields added, nothing deleted or seriously altered), and the class declares a fixed serialVersionUID so that the two versions are recognized as compatible, an old version of the program can read a newer file, losing some of the information in the process but recovering at least the gist of it.
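To make the idea of name-tagged fields concrete, here is a minimal sketch of a hypothetical self-describing record format (it is not the actual Java serialization stream format). Each field is stored as a name/value pair, so an "old" reader that knows only some field names can skip the rest. The class and method names (`TaggedRecords`, `write`, `readKnown`) are illustrative assumptions, and `Set.of` requires Java 9 or later.

```java
import java.io.*;
import java.util.*;

/**
 * Hypothetical self-describing format: a field count, then (name, value) pairs.
 * Because every field carries its name, an older reader can skip fields it
 * does not recognize instead of failing.
 */
final class TaggedRecords {

    // Writes the field count, then each field as a (name, value) pair.
    static byte[] write(Map<String, String> fields) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(fields.size());
            for (Map.Entry<String, String> e : fields.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }
        return bytes.toByteArray();
    }

    // An "old" reader that understands only the field names in `known`.
    // Unknown fields are still read (keeping the stream in sync), then discarded.
    static Map<String, String> readKnown(byte[] data, Set<String> known) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                String value = in.readUTF();
                if (known.contains(name)) {
                    result.put(name, value);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // A "newer" writer adds an email field the old reader has never heard of.
        Map<String, String> newer = new LinkedHashMap<>();
        newer.put("name", "Ada");
        newer.put("email", "ada@example.com");
        newer.put("city", "London");

        Map<String, String> seenByOldReader = readKnown(write(newer), Set.of("name", "city"));
        System.out.println(seenByOldReader); // {name=Ada, city=London}
    }
}
```

The old reader loses the `email` field but keeps everything it understands, which is exactly the "gist of the file" behavior described above.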
Microsoft has been particularly guilty of ignoring forward compatibility. Each new version of Word or PowerPoint forces the entire office to upgrade, because one person using the new release saves files in a format unreadable to everybody else, even when none of the new features are used. Worse, such files are often treated as corrupt rather than recognized as coming from upgraded software. The Microsoft solution is to offer explicit backward compatibility: you can make an effort to save the file in an older format. This is feasible, but cumbersome. You can do better.
From the outside, it's hard to tell which version of a format a file uses. Most programs don't change file extensions between versions, and there's no universally accepted way to mark file version numbers. Therefore, always include the version number of the file format in the file itself. If your present file format has no version number, add one as soon as you make an incompatible change, or find a way to add it to the present format without breaking anything.
The version number generally goes at the beginning of the file, because the program needs to check it before doing anything else with the file: you can't begin to read the rest of the file until you know which version it is.
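A minimal sketch of the writing side, assuming a hypothetical format whose version is a 2-byte number (`FORMAT_VERSION = 3` is made up). The version is written before any data, so a reader can inspect it without interpreting anything else in the file.

```java
import java.io.*;

/**
 * Sketch: the format version is the very first thing written.
 * FORMAT_VERSION and the payload layout are hypothetical.
 */
final class VersionedWriter {
    static final int FORMAT_VERSION = 3;

    static byte[] save(String payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(FORMAT_VERSION); // version first, before any data
            out.writeUTF(payload);          // everything after this depends on the version
        }
        return bytes.toByteArray();
    }

    // Reads only the leading version number, ignoring the rest of the file.
    static int peekVersion(byte[] file) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            return in.readUnsignedShort();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] file = save("hello");
        System.out.println(peekVersion(file)); // 3
    }
}
```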
By convention, file format version numbers have two parts: a major version and a minor version. A program should be able to work with any file that has the major version it supports, whatever the minor version; a change in the major version denotes a change to the file format so hard to conquer that converting to the new version requires serious processing.
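The major/minor convention can be expressed as a small policy check. This sketch assumes a program that understands major version 2 up to minor version 4 (both numbers are hypothetical). A newer minor version is treated as readable with possible loss, in line with the forward compatibility discussed earlier; a different major version is rejected.

```java
/**
 * Sketch of a major/minor compatibility policy.
 * SUPPORTED_MAJOR and KNOWN_MINOR are hypothetical values.
 */
final class VersionPolicy {
    static final int SUPPORTED_MAJOR = 2;
    static final int KNOWN_MINOR = 4;

    enum Support { FULL, PARTIAL, UNSUPPORTED }

    static Support check(int major, int minor) {
        if (major != SUPPORTED_MAJOR) {
            return Support.UNSUPPORTED; // different major version: needs conversion
        }
        if (minor <= KNOWN_MINOR) {
            return Support.FULL;        // same or older minor version: fully understood
        }
        return Support.PARTIAL;         // newer minor version: readable, some data may be skipped
    }

    public static void main(String[] args) {
        System.out.println(check(2, 1)); // FULL
        System.out.println(check(2, 7)); // PARTIAL
        System.out.println(check(3, 0)); // UNSUPPORTED
    }
}
```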
There's usually one other thing in the file, even before the major and minor version numbers: the magic number, whose job is to make sure you're dealing with the right kind of file. Java class files, for example, all begin with the bytes CA FE BA BE (in hexadecimal). There is no universal registry for magic numbers, though UNIX systems come with an incomplete list in /etc/magic. Magic numbers are generally at least four bytes long, which allows over four billion possibilities, so there's a good chance yours won't conflict with somebody else's.
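Java class files follow exactly this layout: a 4-byte magic number (`0xCAFEBABE`), then a 2-byte minor version and a 2-byte major version, all big-endian. The sketch below checks the magic number before reading anything else. The class and method names are illustrative, and the sample bytes are the first eight bytes of a class file compiled for Java 8 (major version 52).

```java
import java.io.*;

/**
 * Reads the start of a Java class file: magic number, then
 * minor_version and major_version (each an unsigned 2-byte value).
 */
final class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    // Returns {major, minor}; throws if the magic number is wrong.
    static int[] readVersion(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int magic = in.readInt();
        if (magic != MAGIC) {
            throw new IOException("Not a class file: magic number is 0x" + Integer.toHexString(magic));
        }
        int minor = in.readUnsignedShort();
        int major = in.readUnsignedShort();
        return new int[] {major, minor};
    }

    public static void main(String[] args) throws IOException {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        int[] version = readVersion(new ByteArrayInputStream(header));
        System.out.println(version[0] + "." + version[1]); // 52.0
    }
}
```

Checking the magic number first means a wrong kind of file is rejected with a clear message instead of producing a nonsensical version number.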
It's very helpful to keep forward and backward compatibility in mind when you write or maintain code that reads or writes files. Read the version number first, then pass the rest of the file to the processing method appropriate for that version. If the version is too old (or too new) for the program to support, the program should simply say so rather than fail partway through.
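Putting the pieces together, here is a minimal sketch of version dispatch for a made-up format called "MYFT": a 4-byte magic number, a 2-byte major version, a 2-byte minor version, and then a body whose layout depends on the major version. The magic value, the body layouts, and all names are hypothetical. Unsupported versions produce a plain error message instead of a crash halfway through the file.

```java
import java.io.*;

/**
 * Hypothetical "MYFT" format: magic, major, minor, then a version-specific body.
 */
final class VersionDispatch {
    static final int MAGIC = 0x4D594654; // ASCII "MYFT" (made-up magic number)

    static String read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        if (in.readInt() != MAGIC) {
            throw new IOException("Not a MYFT file");
        }
        int major = in.readUnsignedShort();
        int minor = in.readUnsignedShort();
        // Hand the rest of the stream to the reader for this major version.
        switch (major) {
            case 1:
                return readV1(in);
            case 2:
                return readV2(in);
            default:
                // Too old or too new: say so plainly instead of guessing.
                throw new IOException("Unsupported MYFT version " + major + "." + minor
                        + "; this program reads versions 1.x and 2.x");
        }
    }

    // Version 1 body (hypothetical): one string written with writeUTF.
    private static String readV1(DataInputStream in) throws IOException {
        return in.readUTF();
    }

    // Version 2 body (hypothetical): a title string followed by a text string.
    private static String readV2(DataInputStream in) throws IOException {
        String title = in.readUTF();
        String text = in.readUTF();
        return title + ": " + text;
    }

    // Writes a version 2.0 file, for demonstration.
    static byte[] writeV2(String title, String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(MAGIC);
            out.writeShort(2); // major
            out.writeShort(0); // minor
            out.writeUTF(title);
            out.writeUTF(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(read(new ByteArrayInputStream(writeV2("Notes", "hello")))); // Notes: hello
    }
}
```

Keeping one reader method per major version means support for an old format can be dropped, or a new one added, without touching the others.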