- Service Is King!
- The Role of Logging in Complex Software Systems
- What Do We Do with Log Files?
- Error Categories
- Self-Healing Systems and Standards
- Conclusion
- References
The Role of Logging in Complex Software Systems
I'm often struck by the simplicity of modern software systems now that component-oriented development (COD) is widely used. In COD projects, a major software feature is divided into a set of components: file handling, network messaging, user interface, and so on. This structure is not unlike the way a carpenter divides a project, such as building a table, into its component parts. The components are built, gathered together, and joined to form the finished table. COD software development is similar and is increasingly the norm.
One difficulty with component-oriented development is that a single software component can be very complex in its own right. The individual developer tasked with building the component may not have a great deal of time available for worrying about error conditions that can arise after integration with other components. This is where logging and exception handling come in. Figure 1 illustrates a common approach to these two crucial areas.
Figure 1 Exception handling and logging in Java.
In the first line of Figure 1 we see the start of a Java try block (arrowhead 1). The code in this block has the potential to fail; for example, it might open a file and then read or update it. Once inside the try block, we execute the code that performs the file I/O. If no error occurs, the code runs successfully and we end up at the bottom of Figure 1, which is generally the end of a method or function call.
Now suppose an exception occurs. We end up in the catch clause (arrowhead 2), meaning that some important error has occurred, preventing normal execution. At this point, we log the error to a file for later examination. Our code has determined that a problem has occurred and it has been recorded, giving us the beginning of an audit trail. Some problems are benign in nature; for instance, when an error occurs because a file is locked and in use by another component. After catching such an exception, the operation can be retried later. This is analogous to typing the wrong password into a login screen: an error occurs and you just type the correct password. Problem solved!
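The retry idea for benign errors can be sketched as follows. This is a minimal illustration rather than code from Figure 1: the class name RetryExample, the method withRetry, and the parameters maxAttempts and delayMillis are hypothetical, and it assumes that an IOException signals a condition worth retrying (such as a locked file). Each failed attempt is logged so that the audit trail is preserved.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

public class RetryExample {
    private static final Logger LOG = Logger.getLogger(RetryExample.class.getName());

    // Runs the operation, retrying on IOException up to maxAttempts times.
    // Hypothetical helper: names and retry policy are assumptions for illustration.
    static <T> T withRetry(Callable<T> operation, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (IOException e) {
                // Record every failure, even the ones we recover from.
                last = e;
                LOG.log(Level.WARNING,
                        "Attempt " + attempt + " of " + maxAttempts + " failed", e);
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // wait before trying again
                }
            }
        }
        // Every attempt failed: pass the last error on to the caller.
        throw last;
    }
}
```

Exceptions other than IOException are not retried here; they propagate immediately, on the assumption that they are not benign.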
Next, in the finally clause (arrowhead 3), we can carry out cleanup such as closing files or network connections. This technique can be useful in avoiding situations in which resources remain unnecessarily allocated after an error has occurred.
The key point about the code in Figure 1 is that it "catches" exceptions; the application doesn't just crash out to the operating system. Instead, the exception is handled gracefully.
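The pattern described for Figure 1 can be sketched roughly as follows. This is not the code from the figure; it is a minimal example that assumes the standard java.util.logging package, and the class name FileReadExample and the method readFirstLine are hypothetical. The numbered comments correspond to the three arrowheads discussed above.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class FileReadExample {
    private static final Logger LOG = Logger.getLogger(FileReadExample.class.getName());

    // Returns the first line of the file, or null if the file is empty
    // or could not be read.
    static String readFirstLine(String path) {
        BufferedReader reader = null;
        try {
            // (1) Code that may fail: open the file and read from it.
            reader = new BufferedReader(new FileReader(path));
            return reader.readLine();
        } catch (IOException e) {
            // (2) Log the error for later examination instead of crashing.
            LOG.log(Level.SEVERE, "Could not read " + path, e);
            return null;
        } finally {
            // (3) Cleanup runs whether or not an exception occurred.
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    LOG.log(Level.WARNING, "Could not close " + path, e);
                }
            }
        }
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "missing.txt";
        System.out.println(readFirstLine(path));
    }
}
```

By default java.util.logging writes to the console; sending the records to a file, as the text describes, would mean attaching a java.util.logging.FileHandler to the logger.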
Bear in mind that a typical software solution will consist of more than one product, each of which can generate logged exceptions. Also, the software may not log just exceptions; there can also be informational events that are recorded to help verify correct operation. All this recording may add up to a lot of log file data. Not surprisingly, the combination of putting all the log file data together, interpreting it, and fixing the problem is a highly skilled task, particularly because programmers normally choose the format and content of their log file messages.
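One way to reduce the interpretation burden is for all components to emit records in a single agreed layout. The sketch below assumes java.util.logging and uses a hypothetical class name, SharedLogFormat; the field order (timestamp, level, component name, message) is an illustrative choice, not a standard.

```java
import java.time.Instant;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

// Hypothetical shared formatter: one line per record, fields separated by " | ".
public class SharedLogFormat extends Formatter {
    @Override
    public String format(LogRecord record) {
        return Instant.ofEpochMilli(record.getMillis())   // ISO-8601 timestamp (UTC)
                + " | " + record.getLevel().getName()     // e.g. INFO or SEVERE
                + " | " + record.getLoggerName()          // component that logged it
                + " | " + formatMessage(record)           // the message text
                + System.lineSeparator();
    }
}
```

A component would opt in by calling setFormatter(new SharedLogFormat()) on its log handler. Exceptions and informational events then share one layout, which makes it easier to merge and search the log files from several products.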