Transaction Processing in Distributed Service-Oriented Applications
Transaction processing has always been at the core of data-driven enterprise applications. However, the prospect of creating transactional applications was daunting and initially viewed as the province of a select group of highly specialized developers. Therefore, various products, APIs, and platforms evolved over the years to shield a developer from the underlying complexity. The Java platform provides a classic case study of this evolution.
Since the release of JDBC, the first Java-based resource management API, the Java platform has provided the facilities for the demarcation of transactional work. Initially, these facilities simply provided the ability to perform transactional work while leveraging a single database. However, with the release of the Java 2 Enterprise Edition (J2EE) platform and the associated JDBC XA specification interfaces, developers were able to specify the boundaries of a distributed transaction programmatically or, in the case of Enterprise JavaBeans (EJB), declaratively.
With the introduction and growing popularity of service-oriented architectures (SOA), however, the underlying transactional mechanisms from which the developer has been shielded have become much more relevant. The loosely coupled nature of distributed service applications in many cases breaks some of the essential assumptions of existing transaction-processing systems and shifts some of the burden of transaction management from the core infrastructure to the actual service developer.
This article explores the two-phase commit protocol, explains its limitations with respect to distributed service architectures, and describes some of the characteristics of the evolving service-oriented transaction standards aimed at providing data consistency across a distributed service-oriented application.
Two-Phase Commit
The fundamental goal of transaction processing platforms is to guarantee that work performed across multiple distributed components within a system can execute atomically ("all or nothing" semantics), in isolation from other elements in the system, and can be recorded permanently on some form of durable media. These characteristics are collectively referred to as the ACID properties of a transaction: atomicity, consistency, isolation, and durability. Achieving this goal requires that the leveraged resource managers (such as databases) reach a consensus about whether it's appropriate to make permanent and visible the work each performed during the execution of the transaction. This goal is most commonly achieved by using the two-phase commit protocol.
Once an application has completed its work across multiple distributed databases, it can issue a commit request that initiates the termination (two-phase commit) protocol.
NOTE
Although other resources such as JMS providers can participate, we'll focus on databases for this discussion.
During the initial (prepare) phase of the protocol, a transaction coordinator sends out a "prepare" message to all databases that have enlisted in the transaction, requesting that each database indicate its readiness to commit or roll back the work managed in the scope of the given transaction. For their part, the databases attempt to checkpoint their work and obtain locks for the affected records and, if successful, respond with a vote to commit. Otherwise, they issue a vote to roll back the transaction. The coordinator decides to commit only if all databases have voted to commit; a single vote to roll back causes it to roll back the entire transaction. During the second (commit) phase, the coordinator issues the resulting command (commit or roll back) to all databases and records the result of the transaction (see Figure 1).
Figure 1 Two-phase commit protocol.
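The exchange described above can be sketched in a few lines of Java. This is a minimal in-memory illustration, not a real transaction manager: the names TwoPhaseCommitSketch, Participant, FakeDatabase, and complete are hypothetical, and the Participant interface only loosely resembles the prepare, commit, and rollback operations of the XA resource interfaces. The sketch also assumes that every message is delivered and ignores timeouts, crashes, and recovery logging.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoPhaseCommitSketch {

    /** A participant's reply to the coordinator's "prepare" message. */
    enum Vote { COMMIT, ROLLBACK }

    /** The result the coordinator records for the transaction. */
    enum Outcome { COMMITTED, ROLLED_BACK }

    /** Hypothetical stand-in for an enlisted resource manager (not the XA API). */
    interface Participant {
        Vote prepare();
        void commit();
        void rollback();
    }

    /** In-memory participant whose vote is fixed up front, for illustration only. */
    static final class FakeDatabase implements Participant {
        final String name;
        final boolean canCommit;
        final List<String> received = new ArrayList<>();

        FakeDatabase(String name, boolean canCommit) {
            this.name = name;
            this.canCommit = canCommit;
        }

        @Override public Vote prepare() {
            received.add("prepare");
            return canCommit ? Vote.COMMIT : Vote.ROLLBACK;
        }

        @Override public void commit() { received.add("commit"); }

        @Override public void rollback() { received.add("rollback"); }
    }

    /** Runs both phases and returns the outcome the coordinator would record. */
    static Outcome complete(List<? extends Participant> participants) {
        // Phase 1 (prepare): ask every enlisted participant for its vote.
        boolean allVotedCommit = true;
        for (Participant p : participants) {
            if (p.prepare() != Vote.COMMIT) {
                allVotedCommit = false;
            }
        }

        // Phase 2: send the same decision to every participant.
        for (Participant p : participants) {
            if (allVotedCommit) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allVotedCommit ? Outcome.COMMITTED : Outcome.ROLLED_BACK;
    }

    public static void main(String[] args) {
        FakeDatabase orders = new FakeDatabase("orders", true);
        FakeDatabase billing = new FakeDatabase("billing", false);

        System.out.println(complete(Arrays.asList(orders, billing)));
        System.out.println(orders.name + " received " + orders.received);
    }
}
```

Because the hypothetical billing database votes to roll back, the orders database is told to roll back as well, even though it voted to commit. That is the "all or nothing" behavior the protocol exists to provide.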
It's important to note that the two-phase commit protocol is a blocking protocol. (Hint: This will be critical to our discussion of service-oriented transactions.) Once a resource manager receives the prepare message and replies with a commit vote, it's obligated to lock the relevant records or data until the coordinator communicates an outcome during the second phase. During this period, a resource manager is said to be "in doubt" or "uncertain." In addition, the protocol message exchanges occur between a transaction coordinator and resource vendor-provided implementations of the XA resource interfaces; the running application is oblivious to the process it has triggered by requesting a commit of the distributed transaction.
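To make the blocking behavior concrete, the following sketch models a single resource manager as a small state machine. The names InDoubtSketch, ResourceManager, write, prepare, and resolve are hypothetical and chosen only for illustration. The sketch assumes, as in the description above, that locks are taken when the resource manager prepares and are released only when the coordinator's decision arrives.

```java
import java.util.HashSet;
import java.util.Set;

public class InDoubtSketch {

    enum State { ACTIVE, IN_DOUBT, COMMITTED, ROLLED_BACK }

    /** Hypothetical resource manager that tracks locks for one transaction. */
    static final class ResourceManager {
        private State state = State.ACTIVE;
        private final Set<String> touched = new HashSet<>();
        private final Set<String> locked = new HashSet<>();

        /** Records that the transaction modified the given record. */
        void write(String recordKey) {
            if (state != State.ACTIVE) {
                throw new IllegalStateException("transaction is " + state);
            }
            touched.add(recordKey);
        }

        /** Phase 1: lock the affected records and vote to commit. */
        boolean prepare() {
            if (state != State.ACTIVE) {
                throw new IllegalStateException("transaction is " + state);
            }
            locked.addAll(touched);
            state = State.IN_DOUBT;
            return true; // this sketch always votes to commit
        }

        /** Phase 2: apply the coordinator's decision and release the locks. */
        void resolve(boolean commit) {
            if (state != State.IN_DOUBT) {
                throw new IllegalStateException("not prepared: " + state);
            }
            locked.clear();
            state = commit ? State.COMMITTED : State.ROLLED_BACK;
        }

        boolean isLocked(String recordKey) { return locked.contains(recordKey); }

        State state() { return state; }
    }

    public static void main(String[] args) {
        ResourceManager rm = new ResourceManager();
        rm.write("account:42");
        rm.prepare();
        // The record stays locked for as long as the outcome is unknown.
        System.out.println(rm.state() + " locked=" + rm.isLocked("account:42"));
        rm.resolve(true);
        System.out.println(rm.state() + " locked=" + rm.isLocked("account:42"));
    }
}
```

Nothing in the sketch lets the resource manager leave the IN_DOUBT state on its own. If the coordinator is slow or unreachable, the locks simply remain held, which is the property that matters once transactions span loosely coupled services.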
Now that you have a basic understanding of the two-phase commit protocol and some of its underlying mechanisms, let's explore its applicability in a service-oriented architecture.