Concurrent Programming in Java: State Dependence
Two kinds of enabling conditions are generally needed to perform any action:
External. An object receives a message requesting that an action be performed.
Internal. The object is in an appropriate state to perform the action.
As a non-programming example, suppose you are asked to write down a telephone message. To do this, you need to have a pencil and paper (or some other recording device).
Exclusion techniques are mainly concerned with maintaining invariants. State-dependent concurrency control imposes additional concerns surrounding preconditions and postconditions. Actions may have state-based preconditions that need not always hold when clients invoke methods on the host object. Conversely, actions may have postconditions that are unattainable when the host object is not in a proper state, when the actions of other objects it relies on fail to achieve their own postconditions, or when the actions of other threads have changed the states of other objects being relied on.
Most design issues for classes with state-dependent actions revolve around the considerations necessary to complete a design so that you take into account all possible combinations of messages and states, as in:
               | have pencil     | do not have pencil
 phone rings   | answer phone    | answer phone
 take message  | write message   | ?
As hinted in the table, designs usually need to take into account situations in which the object is not in a state that permits any “normal” action. In an ideal system, all methods would have no state-based preconditions and would always fulfill their postconditions. When sensible, classes and methods should be written in this fashion, thus avoiding nearly all the issues discussed in this chapter. But many activities are intrinsically state-dependent and just cannot be programmed to achieve postconditions in all states.
There are two general approaches to the design and implementation of any state-dependent action, stemming from liveness-first versus safety-first design perspectives:
Optimistic try-and-see methods can always be tried when invoked, but do not always succeed, and thus may have to deal with failure.
Conservative check-and-act methods refuse to proceed unless preconditions hold. When preconditions do hold, the actions always succeed.
If methods check neither their preconditions nor their postconditions, they can be called only in contexts in which the preconditions are somehow known to hold. Reliance on such practices in concurrent systems is problematic at best.
Optimistic and conservative approaches are about equally prevalent, and appropriate forms of them may be equally good or bad with respect to various design forces. But since their general forms are governed by issues that may be outside of your control, the two are not always interchangeable. Optimistic approaches rely on the existence of exceptions and related mechanisms that indicate when postconditions do not hold. Conservative approaches rely on the availability of guard constructions that indicate when preconditions hold and guarantee that they continue to hold across the course of an action relying on them. Mixtures are of course possible and are in fact common. In particular, many conservative designs contain code that may encounter exceptions, and thus must be prepared to deal with failure.
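To make the contrast concrete, here is a minimal sketch (all names hypothetical) of a collection-like class offering both styles of removal method. The optimistic version always attempts the action and reports failure via an exception; the conservative version checks its precondition and balks when it does not hold. Guarded waiting versions of the conservative style are discussed in 3.2.

class StackSketch {                           // Hypothetical sketch
  private final java.util.ArrayList items = new java.util.ArrayList();

  // Optimistic try-and-see: attempt the pop and deal with failure.
  synchronized Object tryPop() {
    try {
      return items.remove(items.size() - 1);  // fails if empty
    }
    catch (IndexOutOfBoundsException ex) {    // postcondition unattainable
      throw new java.util.NoSuchElementException("empty stack");
    }
  }

  // Conservative check-and-act: refuse to proceed unless the precondition holds.
  synchronized Object popIfPossible() {
    if (items.isEmpty()) return null;         // precondition fails; balk
    return items.remove(items.size() - 1);
  }
}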
Concurrency control measures that deal with state-dependent actions can require significant effort and attention in concurrent programming. This chapter divides coverage as follows:
3.1 discusses exceptions and cancellation.
3.2 introduces the guard constructions used in conservative designs, along with the mechanics used to implement them.
3.3 presents structural patterns for classes employing concurrency control.
3.4 shows how utility classes can reduce complexity while improving reliability, performance, and flexibility.
3.5 extends problems and solutions to deal with joint actions — those depending on the states of multiple participants.
3.6 provides a brief overview of transactional concurrency control.
3.7 concludes with some techniques seen in the construction of concurrency control utility classes.
3.1 Dealing with Failure
Pure optimistic control designs originate from optimistic update and transaction protocols. But optimistic approaches of some sort are seen in just about any code making calls to methods that may encounter failures. Try-and-see designs attempt actions without first ensuring that they will succeed, often because the constraints that would ensure success cannot be checked. However, optimistic methods always check postconditions (often by catching failure exceptions) and, if they fail to hold, apply a chosen failure policy.
The need for try-and-see approaches usually stems from inability or unwillingness to check preconditions and related constraints. This can arise in the following ways:
Some conditions cannot be computed using the constructs available in a given language or execution context. For example, it is not possible to check whether a given lock is being held or a given reference is unique (see 2.3).
In concurrent programs, preconditions may have temporal scopes (in which case they are sometimes called activation constraints). If a constraint is not under the control of the host object, then even if it is known to hold momentarily, it need not hold throughout the course of an action relying on it. For example, your pencil may break while you are writing a message. A file system that is known at entry to a method to have enough space to write a file may run out of space (due to the actions of other independent programs) before the method finishes writing the file. Similarly, the fact that a given remote machine is currently available says nothing about whether it will crash or become unreachable in the course of a method relying on it.
Some conditions change due to the signaling actions of other threads. The most common example is cancellation status, which may asynchronously become true while any thread is performing any action (see 3.1.2).
Some constraints are too computationally expensive to check, for example a requirement that a matrix be normalized in upper-triangular form. When actions are simple and easy to undo or the chances of failure are extremely low, it might not be worth computing even simple preconditions, instead relying on fallback strategies upon later detection of failure.
In all these cases, the lack of provisions that would ensure success forces methods to detect and deal with potential failures to achieve postconditions.
3.1.1 Exceptions
Accommodations for failure infiltrate the design of multithreaded programs. Concurrency introduces the possibility that one part of a program will fail while others continue. But without care, a failing action may leave objects in states such that other threads cannot succeed.
Methods may throw exceptions (as well as set status indicators or issue notifications) when they have detected that their intended effects or postconditions cannot be attained. There are six general responses to such failed actions: abrupt termination, continuation (ignoring failures), rollback, roll-forward, retry, and delegation to handlers. Abrupt termination and continuation are the two most extreme responses. Rollback and roll-forward are intermediate options that ensure that objects maintain consistent states. Retries locally contain failure points. Delegation allows cooperative responses to failure across objects and activities.
Choices among these options must be agreed upon and advertised. It is sometimes possible to support multiple policies and let client code decide which one to use — for example via dialogs asking users whether to retry reading from a disk. Additional examples of these options are illustrated throughout this book.
3.1.1.1 Abrupt termination
An extreme response to failure is to let a method die immediately, returning (usually via an exception) regardless of the state of the current object or status of the current activity. This may apply if you are certain that local failure forces failure of the entire activity and that the objects engaged in the activity will never be used again (for example if they are completely confined within a session — see 2.3.1). For example, this might be the case in a file-conversion component that fails to open the file to be converted.
Abrupt termination is also the default strategy for uncaught (and undeclared) RuntimeExceptions, such as NullPointerException, that most often indicate programming errors. When a normally recoverable failure cannot be dealt with, you can force more extreme responses by escalating it to a throw of a RuntimeException or Error.
Short of full program termination (via System.exit), options for further recovery from such errors are often very limited. When objects are intrinsically shared across activities, and there is no way to re-establish consistent object states upon failure, and there is no possible (or practical) way to back out of a failing operation, then the only recourse is to set a broken or corrupted flag in the object encountering the failure and then abruptly terminate. Such a flag should cause all future operations to fail until the object is somehow repaired, perhaps via the actions of an error handler object.
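As a rough illustration (class and method names are hypothetical), such a flag might be maintained as follows:

class StatefulService {                       // Hypothetical sketch
  private boolean broken = false;             // corrupted-state flag

  synchronized void checkNotBroken() {
    if (broken)
      throw new IllegalStateException("object is broken; needs repair");
  }

  synchronized void operation() {
    checkNotBroken();
    try {
      // ... perform state updates that cannot be backed out ...
    }
    catch (RuntimeException ex) {
      broken = true;                          // cause all future operations to fail
      throw ex;                               // then abruptly terminate
    }
  }

  synchronized void repair() {                // e.g., called by an error handler object
    // ... re-establish a consistent state ...
    broken = false;
  }
}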
3.1.1.2 Continuation
If a failed invocation has no bearing on either the state of the caller object or the overall functionality requirements of the current activity, then it may be acceptable just to ignore the exception and continue forward. While it is ordinarily too irresponsible to contemplate, this option may apply in event frameworks and oneway messaging protocols (see 4.1). For example, a failed invocation of a change-notification method on a listener object might at worst cause some parts of an animation sequence to be skipped, without any other long-term consequences.
Continuation policies are also seen within other error handlers (and inside most finally clauses) that ignore other incidental exceptions occurring while they are trying to deal with the failure that triggered them, for example ignoring exceptions while closing files. They may also be used in threads that should never shut down, and thus try their best to continue in the face of exceptions.
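For example, cleanup code in a finally clause commonly takes a form along these lines (a generic fragment; the method name is invented):

class FileUser {                              // Generic fragment
  void readAndProcess(java.io.InputStream in) throws java.io.IOException {
    try {
      // ... read and process data from the stream ...
    }
    finally {
      try { in.close(); }
      catch (java.io.IOException ignore) {}   // continue despite failure to close
    }
  }
}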
3.1.1.3 Rollback
The most desirable semantics in optimistic designs are clean-fail guarantees: Either the operation succeeds completely, or it fails in a way that leaves the object in exactly the same state as before the operation was attempted. The optimistic update techniques in 2.4.4.2 demonstrate one form of this approach in which the success criterion is lack of interference by other threads trying to perform updates.
There are two complementary styles for maintaining state representations that can be used in rollbacks:
Provisional action. Before attempting updates, construct a new representation that will, upon success, be swapped in as the current state. Methods perform updates on the tentative new version of the state representations, but do not commit to the new version until success is assured. This way, nothing needs to be undone upon failure.
Checkpointing. Before attempting updates, record the current state of the object in a history variable, perhaps in the form of a Memento (see the Design Patterns book). Methods directly perform updates on the current representation. But upon failure, fields can be reverted to the old values.
Provisional action is usually necessary when actions are not otherwise fully synchronized. Provisional action eliminates the possibility that other threads will see inconsistent, partially updated representations. It is also more efficient when reads are much more common than writes. Checkpointing is usually simpler to arrange and is thus often preferable in other situations. In either approach, it is not always necessary to create new representation objects to record state: often, a few extra fields in the object, or local variables inside the methods, suffice.
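As a hedged sketch of the checkpointing style (all names hypothetical), an update method can record the old field values in local history variables and revert them upon failure, providing a clean-fail guarantee:

class CheckpointedPoint {                     // Hypothetical sketch
  private int x;
  private int y;

  synchronized void update(int newX, int newY) {
    int oldX = x, oldY = y;                   // checkpoint current state
    try {
      x = newX;
      y = newY;
      recompute();                            // may fail with an exception
    }
    catch (RuntimeException ex) {
      x = oldX;                               // roll back to checkpointed values
      y = oldY;
      throw ex;                               // rethrow after restoring state
    }
  }

  void recompute() { /* ... may throw ... */ }
}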
Situation-specific rollback techniques are needed for actions other than state updates that must be undone upon failure, including actions resulting from sending other messages. Every message sent within such a method should have an inverse antimessage. For example, a credit operation might be undone via debit. This idea can be extended to maintaining undo-lists associated with sequences of actions, in order to allow rollback to any given point.
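For example (in purely hypothetical form), a transfer might undo an already-performed debit by issuing the inverse credit when its second step fails:

interface Account {                           // Hypothetical
  void credit(long amount);
  void debit(long amount);
}

class TransferSketch {                        // Hypothetical sketch
  void transfer(Account from, Account to, long amount) {
    from.debit(amount);                       // message
    try {
      to.credit(amount);
    }
    catch (RuntimeException ex) {
      from.credit(amount);                    // antimessage: undo the debit
      throw ex;
    }
  }
}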
Some kinds of operations can neither be provisionally attempted nor undone via antimessages, and thus cannot employ rollback techniques. This rules out methods with externally visible effects that irrevocably change the real world by performing IO or actuating physical devices, unless it is possible to undo these actions without harm. In the case of IO, conventions can be adopted to allow the conceptual equivalent of rollback. For example, if methods log actions in a log file and the log file supports a “please disregard log entry XYZ” option, then this can be invoked in case of failure.
However, as discussed further in 3.1.2.2, rollback of most IO objects (such as InputStreams) themselves is typically not possible. There are no control methods to revert the internal buffers or other fields of most IO objects back to the values they held at some arbitrary point. Typically, the best you can do is close the IO objects and construct new ones bound to the same files, devices, or network connections.
3.1.1.4 Roll-forward
When rollback is impossible or undesirable but full continuation is also impossible, you may instead push ahead as conservatively as possible to re-establish some guaranteed legal, consistent state that may be different from the one holding upon entry to the method. Roll-forward (sometimes known simply as recovery) is often perfectly acceptable as far as other objects, methods, and threads are concerned; in many cases, they cannot even distinguish it from rollback.
Some such actions may be placed in finally clauses that perform minimal cleanup (for example closing files, cancelling other activities) necessary to reach safe points of program execution. Most roll-forward techniques otherwise take forms similar to rollback techniques. But because they do not require full representations of saved or provisional state, they are usually slightly easier to arrange.
Some methods can be divided into two conceptual parts: a preliminary part that can roll back easily (for example, by either returning or rethrowing the exception immediately), and the part occurring after a point of no return, at which some unrecoverable action has already begun, that must be advanced to a safe point even upon failure. For example, a method may reach a point in a protocol at which an acknowledgment must be sent or received (see 3.4.1.4).
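A hedged sketch of this division, using an invented Connection interface and invented method names:

interface Connection {                        // Hypothetical
  void sendRequest() throws java.io.IOException;
  void awaitAcknowledgment() throws java.io.IOException;
  void close();
}

class ProtocolStep {                          // Hypothetical sketch
  void step(Connection c) throws java.io.IOException {
    validateLocally();                        // preliminary part: on failure, just
                                              // rethrow; nothing to undo yet
    c.sendRequest();                          // point of no return
    try {
      c.awaitAcknowledgment();
    }
    catch (java.io.IOException ex) {
      c.close();                              // roll forward to a known safe point
      throw ex;
    }
  }

  void validateLocally() { /* ... */ }
}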
3.1.1.5 Retry
You can contain local failure to the current method, rather than throwing exceptions back to clients, if you have reason to believe that retrying an action will succeed. Retries are in general only possible when local rollback options can be applied, so that the state of the object and status of the activity remain the same at the beginning of each retry attempt.
Retry-based tactics may be used when failure is due to other independent objects that may have been in temporarily bad or undesired states; for example, when dealing with IO devices and remote machines. As seen in 2.4.4.2, optimistic state update methods also typically rely on retries, since interference patterns are extremely unlikely to persist indefinitely. Retries are also common in polling designs, for example those discussed in 4.1.5. Variants of retries are seen in cascading algorithms that first try the most desirable of several alternative actions, and if that fails, try a series of less desirable alternatives until one succeeds.
Without care, retries can consume unbounded amounts of CPU time (see 3.2.6). You can minimize the likelihood of repeated contention-based failures, as well as reduce CPU wastage, by inserting heuristic delays between attempts. One popular strategy (seen for example in Ethernet protocols) is exponential backoff, in which each delay is proportionally longer than the last one.
For example, you could use the following method to connect to a server that sometimes refuses connections because it is overloaded. The retry loop backs off for a longer time after each failure. However, it fails upon thread interruption (see 3.1.2) since there is no point in continuing if the current thread has been cancelled. (As noted in 3.1.2.2, on some releases of JDK, you may need to modify this to catch InterruptedIOException and rethrow InterruptedException.)
class ClientUsingSocket {                     // Code sketch
  // ...
  Socket retryUntilConnected() throws InterruptedException {
    // first delay is randomly chosen between 5 and 10 seconds
    long delayTime = 5000 + (long)(Math.random() * 5000);
    for (;;) {
      try {
        return new Socket(server, portnumber);
      }
      catch (IOException ex) {
        Thread.sleep(delayTime);
        delayTime = delayTime * 3 / 2 + 1;    // increase 50%
      }
    }
  }
}
3.1.1.6 Handlers
Calls, callbacks, or notifications to error-handling objects can be useful when you need to offload error processing operations to centralized handlers because an exception in one thread or one part of a system requires compensating actions in other threads or other parts of a system that wouldn't otherwise be known to the method catching the exception. They can also be used to make code more extensible and more resilient when used by clients that cannot be expected to know how to respond to failures. However, some care is needed when replacing exceptions with callbacks, events, and related notification techniques. When they escape the stack-based flow-of-control rules of exceptions, their use can make it more difficult to predict and manage responses to failure across different parts of a system.
One way to set up a handler is to create a before/after class (see 1.4) that deals with exceptions as its after-action. For example, suppose you have an interface describing a service that can throw a ServiceException, and an interface describing handlers for the resulting exceptions. Implementations of ServiceExceptionHandler serve here as Strategy objects, as discussed in the Design Patterns book. You can then make a proxy for use by clients that do not handle ServiceException themselves. For example:
interface ServerWithException {
  void service() throws ServiceException;
}

interface ServiceExceptionHandler {
  void handle(ServiceException e);
}

class HandledService implements ServerWithException {
  final ServerWithException server = new ServerImpl();
  final ServiceExceptionHandler handler = new HandlerImpl();

  public void service() {                     // no throw clause
    try {
      server.service();
    }
    catch (ServiceException e) {
      handler.handle(e);
    }
  }
}
Note that while it is legal to declare that HandledService implements ServerWithException, all usages that rely on handlers would need to be statically typed to use HandledService, not the generic ServerWithException type.
A handler object can perform any action that any code in a catch clause can, including shutting down processing in one or more threads or starting up other cleanup threads. The handler call can also somehow communicate the problem to error handling facilities occurring in a different thread, engage in some interactive protocol, rethrow the exception as a RuntimeException or Error, wrap it in an InvocationTargetException to indicate cascaded failures (see 4.3.3.1), and so on.
You can set up services in which clients always use handlers by supplying callback arguments to service methods. Callback- based handling may also apply when the service itself does not even know which exception it should throw upon failure. This can be set up via:
interface ServerUsingCallback {
  void anotherservice(ServiceFailureHandler handler);
}
Here all callers must supply a callback target (which may just be themselves) to be invoked in exceptional situations. Further details, alternatives, and variants are discussed in 4.3.1.
Handlers may also be used when converting one style of messaging protocol to another (see 4.1.1). For example, when using event-based frameworks, a service may generate and issue a new ExceptionEvent that is processed by an ExceptionEventListener. The following ServiceIssuingExceptionEvent class shows one way to set this up. It uses the CopyOnWriteArrayList from 2.4.4 for managing lists of handlers. Alternatively, the events could be issued asynchronously (see 4.1).
class ExceptionEvent extends java.util.EventObject {
  public final Throwable theException;

  public ExceptionEvent(Object src, Throwable ex) {
    super(src);
    theException = ex;
  }
}

class ExceptionEventListener {                // Incomplete
  public void exceptionOccured(ExceptionEvent ee) {
    // ... respond to exception...
  }
}

class ServiceIssuingExceptionEvent {          // Incomplete
  // ...
  private final CopyOnWriteArrayList handlers =
    new CopyOnWriteArrayList();

  public void addHandler(ExceptionEventListener h) {
    handlers.add(h);
  }

  public void service() {
    // ...
    if ( /* failed */ ) {
      Throwable ex = new ServiceException();
      ExceptionEvent ee = new ExceptionEvent(this, ex);
      for (Iterator it = handlers.iterator(); it.hasNext();) {
        ExceptionEventListener l =
          (ExceptionEventListener)(it.next());
        l.exceptionOccured(ee);
      }
    }
  }
}
An inverse style of conversion, of events to exceptions, is used in the java.beans package, as described in 3.6.4.
3.1.2 Cancellation
When activities in one thread fail or change course, it may be necessary or desirable to cancel activities in other threads, regardless of what they are doing. Cancellation requests introduce inherently unforeseeable failure conditions for running threads. The asynchronous nature of cancellation leads to design tactics reminiscent of those in distributed systems where failures may occur at any time due to crashes and disconnections. Concurrent programs have the additional obligation to ensure consistent states of internal objects participating in other threads.
Cancellation is a natural occurrence in most multithreaded programs, seen in:
Nearly any activity associated with a GUI CANCEL button.
Media presentations (for example animation loops) associated with normally terminating activities.
Threads that produce results that are no longer needed. For example, when multiple threads are used to search a database, once one thread returns an answer, the others may be cancelled.
Sets of activities that cannot continue because one or more of them encounter unexpected errors or exceptions.
3.1.2.1 Interruption
The best-supported techniques for approaching cancellation rely on per-thread interruption status that is set by method Thread.interrupt, inspected by Thread.isInterrupted, cleared (and inspected) by Thread.interrupted, and sometimes responded to by throwing InterruptedException.
Thread interrupts serve as requests that activities be cancelled. Nothing stops anyone from using interrupts for other purposes, but this is the intended convention. Interrupt-based cancellation relies on a protocol between cancellers and cancellees to ensure that objects that might be used across multiple threads do not become damaged when cancelled threads terminate. Most (ideally all) classes in the java.* packages conform to this protocol.
In almost all circumstances, cancelling the activity associated with a thread should cause the thread to terminate. But there is nothing about interrupt that forces immediate termination. This gives any interrupted thread a chance to clean up before dying, but also imposes obligations for code to check interruption status and take appropriate action on a timely basis.
This ability to postpone or even ignore cancellation requests provides a mechanism for writing code that is both very responsive and very robust. Lack of interruption may be used as a precondition checked at safe points before doing anything that would be difficult or impossible to undo later. The range of available responses includes most of the options discussed in 3.1.1:
Continuation (ignoring or clearing interruptions) may apply to threads that are intended not to terminate; for example, those that perform database management services essential to a program's basic functionality. Upon interrupt, the particular task being performed by the thread can be aborted, allowing the thread to continue to process other tasks. However, even here, it can be more manageable instead to replace the thread with a fresh one starting off in a known good initial state.
Abrupt termination (for example throwing Error) generally applies to threads that provide isolated services that do not require any cleanup beyond that provided in a finally clause at the base of a run method. However, when threads are performing services relied on by other threads (see 4.3), they should also somehow alert them or set status indicators. (Exceptions themselves are not automatically propagated across threads.)
Rollback or roll-forward techniques must be applied in threads using objects that are also relied on by other threads.
You can control how responsive your code is to interrupts in part by deciding how often to check status via Thread.currentThread().isInterrupted(). Checks need not occur especially frequently to be effective. For example, if it takes on the order of 10,000 instructions to perform all the actions associated with the cancellation and you check for cancellation about every 10,000 instructions, then on average, it would take 15,000 instructions total from cancellation request to shutdown. So long as it is not actually dangerous to continue activities, this order of magnitude suffices for the majority of applications. Typically, such reasoning leads you to place interrupt-detection code only at those program points where it is both most convenient and most important to check cancellation. In performance-critical applications, it may be worthwhile to construct analytic models or collect empirical measurements to determine more accurately the best trade-offs between responsiveness and throughput (see also 4.4.1.7).
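For example, the main loop of a cancellable task might take roughly the following generic form:

class PollingWorker implements Runnable {     // Generic sketch
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      // ... perform one bounded unit of work ...
    }
    // ... roll back or roll forward as needed, then let run() return ...
  }
}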
Checks for interruption are performed automatically within Object.wait, Thread.join, Thread.sleep, and their derivatives. These methods abort upon interrupt by throwing InterruptedException, allowing threads to wake up and apply cancellation code.
By convention, interruption status is cleared when InterruptedException is thrown. This is sometimes necessary to support clean-up efforts, but it can also be the source of error and confusion. When you need to propagate interruption status after handling an InterruptedException, you must either rethrow the exception or reset the status via Thread.currentThread().interrupt(). If code in threads you create calls other code that does not properly preserve interruption status (for example, ignoring InterruptedException without resetting status), you may be able to circumvent problems by maintaining a field that remembers cancellation status, setting it whenever calling interrupt and checking it upon return from these problematic calls.
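For example, code that cannot propagate InterruptedException might re-assert the status along these lines (a generic sketch; the method name is invented):

class StatusPreserver {                       // Generic sketch
  void pause(long millis) {
    try {
      Thread.sleep(millis);
    }
    catch (InterruptedException ie) {
      Thread.currentThread().interrupt();     // re-assert status for callers to see
    }
  }
}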
There are two situations in which threads remain dormant without being able to check interruption status or receive InterruptedException: blocking on synchronized locks and on IO. Threads do not respond to interrupts while waiting for a lock used in a synchronized method or block. However, as discussed in 2.5, lock utility classes can be used when you need to drastically reduce the possibility of getting stuck waiting for locks during cancellation. Code using lock classes dormantly blocks only to access the lock objects themselves, but not the code they protect. These blockages are intrinsically very brief (although times cannot be strictly guaranteed).
3.1.2.2 IO and resource revocation
Some IO support classes (notably java.net.Socket and related classes) provide optional means to time out on blocked reads, in which case you can check for interruption on time-out.
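For example, assuming a one-second time-out is acceptable, a reader could look something like the following sketch (everything except the java.net and java.io calls is invented):

class SocketReader {                          // Generic sketch
  void readWithTimeoutChecks(java.net.Socket s) throws java.io.IOException {
    s.setSoTimeout(1000);                     // time out blocked reads after one second
    java.io.InputStream in = s.getInputStream();
    for (;;) {
      try {
        int c = in.read();
        if (c == -1) break;                   // eof
        process(c);
      }
      catch (java.io.InterruptedIOException timeout) {
        if (Thread.currentThread().isInterrupted())
          break;                              // cancelled; stop reading
        // otherwise go back to waiting for input
      }
    }
  }

  void process(int c) { /* ... */ }
}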
An alternative approach is adopted in other java.io classes — a particular form of resource revocation. If one thread performs s.close() on an IO object (for example, an InputStream) s, then any other thread attempting to use s (for example, s.read()) will receive an IOException. Revocation affects all threads using the closed IO objects and causes the IO objects to be unusable. If necessary, new IO objects can be created to replace them.
This ties in well with other uses of resource revocation (for example, for security purposes). The policy also protects applications from having a possibly shared IO object automatically rendered unusable by the act of cancelling only one of the threads using it. Most classes in java.io do not, and cannot, clean-fail upon IO exceptions. For example, if a low-level IO exception occurs in the midst of a StreamTokenizer or ObjectInputStream operation, there is no sensible recovery action that will preserve the intended guarantees. So, as a matter of policy, JVMs do not automatically interrupt IO operations.
This imposes an additional obligation on code dealing with cancellation. If a thread may be performing IO, any attempt to cancel it in the midst of IO operations must be aware of the IO object being used and must be willing to close the IO object. If this is acceptable, you may instigate cancellation by both closing the IO object and interrupting the thread. For example:
class CancellableReader {                     // Incomplete
  private Thread readerThread;                // only one at a time supported
  private FileInputStream dataFile;

  public synchronized void startReaderThread()
    throws IllegalStateException, FileNotFoundException {
    if (readerThread != null) throw new IllegalStateException();
    dataFile = new FileInputStream("data");
    readerThread = new Thread(new Runnable() {
      public void run() { doRead(); }
    });
    readerThread.start();
  }

  protected synchronized void closeFile() {   // utility method
    if (dataFile != null) {
      try { dataFile.close(); }
      catch (IOException ignore) {}
      dataFile = null;
    }
  }

  protected void doRead() {
    try {
      while (!Thread.interrupted()) {
        try {
          int c = dataFile.read();
          if (c == -1) break;
          else process(c);
        }
        catch (IOException ex) {
          break;                              // perhaps first do other cleanup
        }
      }
    }
    finally {
      closeFile();
      synchronized(this) { readerThread = null; }
    }
  }

  public synchronized void cancelReaderThread() {
    if (readerThread != null) readerThread.interrupt();
    closeFile();
  }
}
Most other cases of cancelled IO arise from the need to interrupt threads waiting for input that you somehow know will not arrive, or will not arrive in time to do anything about. With most socket-based streams, you can manage this by setting socket time-out parameters. With others, you can rely on InputStream.available, and hand-craft your own timed polling loop to avoid blocking in IO during a time-out (see 4.1.5). These constructions can use a timed back-off retry protocol similar to the one described in 3.1.1.5. For example:
class ReaderWithTimeout {                     // Generic code sketch
  // ...
  void attemptRead(InputStream stream, long timeout) throws... {
    long startTime = System.currentTimeMillis();
    try {
      for (;;) {
        if (stream.available() > 0) {
          int c = stream.read();
          if (c != -1) process(c);
          else break;                         // eof
        }
        else {
          try {
            Thread.sleep(100);                // arbitrary fixed back-off time
          }
          catch (InterruptedException ie) {
            /* ... quietly wrap up and return ... */
          }
          long now = System.currentTimeMillis();
          if (now - startTime >= timeout) {
            /* ... fail ... */
          }
        }
      }
    }
    catch (IOException ex) { /* ... fail ... */ }
  }
}
3.1.2.3 Asynchronous termination
The stop method was originally included in class Thread, but its use has since been deprecated. Thread.stop causes a thread to abruptly throw a ThreadDeath exception regardless of what it is doing. (Like interrupt, stop does not abort waits for locks or IO. But, unlike interrupt, it is not strictly guaranteed to abort wait, sleep, or join.)
This can be an arbitrarily dangerous operation. Because Thread.stop generates asynchronous signals, activities can be terminated while they are in the midst of operations or code segments that absolutely must roll back or roll forward for the sake of program safety and object consistency. For a bare generic example, consider:
class C {                                     // Fragments
  private int v;                              // invariant: v >= 0

  synchronized void f() {
    v = -1;                                   // temporarily set to illegal value as flag
    compute();                                // possible stop point (*)
    v = 1;                                    // set to legal value
  }

  synchronized void g() {
    while (v != 0) {
      --v;
      something();
    }
  }
}
If a Thread.stop happens to cause termination at line (*), then the object will be broken: Upon thread termination, it will remain in an inconsistent state because variable v is set to an illegal value. Any calls on the object from other threads might make it perform undesired or dangerous actions. For example, here the loop in method g will spin 2*Integer.MAX_VALUE times as v wraps around the negatives.
The use of stop makes it extremely difficult to apply rollback or roll-forward recovery techniques. At first glance, this problem might not seem so serious — after all, any uncaught exception thrown by the call to compute would also corrupt state. However, the effects of Thread.stop are more insidious since there is nothing you can do in these methods that would eliminate the ThreadDeath exception (thrown by Thread.stop) while still propagating cancellation requests. Further, unless you place a catch(ThreadDeath) after every line of code, you cannot reconstruct the current object state precisely enough to recover, and so you may encounter undetected corruption. In contrast, you can usually bullet-proof code to eliminate or deal with other kinds of run-time exceptions without such heroic efforts.
In other words, the reason for deprecating Thread.stop was not to fix its faulty logic, but to correct for misjudgments about its utility. It is humanly impossible to write all methods in ways that allow a cancellation exception to occur at every bytecode. (This fact is well known to developers of low-level operating system code. Programming even those few, very short routines that must be asynch-cancel-safe can be a major undertaking.)
Note that any executing method is allowed to catch and then ignore the ThreadDeath exception thrown by stop. Thus, stop is no more guaranteed to terminate a thread than is interrupt; it is merely more dangerous. Any use of stop implicitly reflects an assessment that the potential damage of attempting to abruptly terminate an activity is less than the potential damage of not doing so.
3.1.2.4 Resource control
Cancellation may play a part in the design of any system that loads and executes foreign code. Attempts to cancel code that does not conform to standard protocols face a difficult problem. The code may just ignore all interrupts, and even catch and discard ThreadDeath exceptions, in which case invocations of Thread.interrupt and Thread.stop will have no effect.
You cannot control exactly what foreign code does or how long it does it. But you can and should apply standard security measures to limit undesirable effects. One approach is to create and use a SecurityManager and related classes that deny all checked resource requests when a thread has run too long. (Details go beyond the scope of this book; see Further Readings.) This form of resource denial, in conjunction with the resource revocation strategies discussed in 3.1.2.2, can prevent foreign code from taking any actions that might otherwise contend for resources with other threads that should continue. As a byproduct, these measures often eventually cause threads to fail due to exceptions.
Additionally, you can minimize contention for CPU resources by invoking setPriority(Thread.MIN_PRIORITY) for a thread. A SecurityManager may be used to prevent the thread from re-raising its priority.
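Although the details are beyond the scope of this book, a very rough sketch of the denial idea might look like the following; the denyAllChecksFor bookkeeping is invented here for illustration, echoing the made-up call used in the Terminator sketch in 3.1.2.5.

class DenyingSecurityManager extends SecurityManager {
  private final java.util.Set deniedThreads =
    java.util.Collections.synchronizedSet(new java.util.HashSet());

  public void denyAllChecksFor(Thread t) { deniedThreads.add(t); }

  public void checkPermission(java.security.Permission perm) {
    if (deniedThreads.contains(Thread.currentThread()))
      throw new SecurityException("resource access denied");
    super.checkPermission(perm);              // otherwise apply normal policy
  }
}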
3.1.2.5 Multiphase cancellation
Sometimes, even ordinary code must be cancelled with more extreme prejudice than you would ordinarily like. To deal with such possibilities, you can set up a generic multiphase cancellation facility that tries to cancel tasks in the least disruptive manner possible and, if they do not terminate soon, tries a more disruptive technique.
Multiphase cancellation is a pattern seen at the process level in most operating systems. For example, it is used in Unix shutdowns, which first try to terminate tasks using kill -1, followed if necessary by kill -9. An analogous strategy is used by the task managers in most window systems.
Here is a sketch of a sample version. (More details on the use of Thread.join seen here may be found in 4.3.2.)
class Terminator {

  // Try to kill; return true if known to be dead

  static boolean terminate(Thread t, long maxWaitToDie) {

    if (!t.isAlive()) return true;            // already dead

    // phase 1 -- graceful cancellation
    t.interrupt();
    try { t.join(maxWaitToDie); }
    catch (InterruptedException e) {}         // ignore

    if (!t.isAlive()) return true;            // success

    // phase 2 -- trap all security checks
    theSecurityMgr.denyAllChecksFor(t);       // a made-up method
    try { t.join(maxWaitToDie); }
    catch (InterruptedException ex) {}

    if (!t.isAlive()) return true;

    // phase 3 -- minimize damage
    t.setPriority(Thread.MIN_PRIORITY);
    return false;
  }
}
Notice here that the terminate method itself ignores interrupts. This reflects the policy choice that cancellation attempts must continue once they have begun. Cancelling a cancellation otherwise invites problems in dealing with code that has already started termination-related cleanup.
Because of variations in the behavior of Thread.isAlive on different JVM implementations (see 1.1.2), it is possible for this method to return true before all traces of the killed thread have disappeared.
3.1.3 Further Readings
A pattern-based account of exception handling may be found in:
Renzel, Klaus. “Error Detection”, in Frank Buschmann and Dirk Riehle (eds.) Proceedings of the 1997 European Pattern Languages of Programming Conference, Irsee, Germany, Siemens Technical Report 120/SW1/FB, 1997.
Some low-level techniques for protecting code from asynchronous cancellation or interruption (e.g., masking hardware interrupts) are not available or appropriate in the Java programming language. But even many systems-level developers avoid asynchronous cancellation at all costs. See for example Butenhof's book listed in 1.2.5. Similar concerns are expressed about concurrent object-oriented programs in:
Fleiner, Claudio, Jerry Feldman, and David Stoutamire. “Killing Threads Considered Dangerous”, Proceedings of the POOMA '96 Conference, 1996.
Detecting and responding to termination of a group of threads can require more complex protocols when applied in less structured contexts than seen in most concurrent programs. General-purpose termination detection algorithms are discussed in several of the sources on concurrent and distributed programming listed in 1.2.5.
Security management is described in:
Gong, Li. Inside Java™ 2 Platform Security, Addison-Wesley, 1999.
A resource control framework is described in:
Czajkowski, Grzegorz, and Thorsten von Eicken. “JRes: A Resource Accounting Interface for Java”, Proceedings of 1998 ACM OOPSLA Conference, ACM, 1998.