- 13.1 Concurrentgate
- 13.2 A Brief History of Data Sharing
- 13.3 Look, Ma, No (Default) Sharing
- 13.4 Starting a Thread
- 13.5 Exchanging Messages between Threads
- 13.6 Pattern Matching with receive
- 13.7 File Copying with a Twist
- 13.8 Thread Termination
- 13.9 Out-of-Band Communication
- 13.10 Mailbox Crowding
- 13.11 The shared Type Qualifier
- 13.12 Operations with shared Data and Their Effects
- 13.13 Lock-Based Synchronization with synchronized classes
- 13.14 Field Typing in synchronized classes
- 13.15 Deadlocks and the synchronized Statement
- 13.16 Lock-Free Coding with shared classes
- 13.17 Summary
13.13 Lock-Based Synchronization with synchronized classes
A historically popular method of writing multithreaded programs is lock-based synchronization. Under that discipline, access to shared data is protected by mutexes—synchronization objects that serialize execution of portions of the code that temporarily break data coherence, or that might see such a temporary breakage. Such portions of code are called critical sections.
A lock-based program's correctness is ensured by introducing ordered, serial access to shared data. A thread that needs access to a piece of shared data must acquire (lock) a mutex, operate on the data, and then release (unlock) that mutex. Only one thread at a time may acquire a given mutex, which is how serialization is effected: when several threads want to acquire the same mutex, one "wins" and the others wait nicely in line. (The way the line is served—that is, thread priority—is important and may affect applications and the operating system quite visibly.)
Arguably the "Hello, world!" of multithreaded programs is the bank account example—an object accessible from multiple threads that must expose a safe interface for depositing and withdrawing funds. The single-threaded baseline version looks like this:
```d
import std.contracts;

// Single-threaded bank account
class BankAccount
{
    private double _balance;
    void deposit(double amount)
    {
        _balance += amount;
    }
    void withdraw(double amount)
    {
        enforce(_balance >= amount);
        _balance -= amount;
    }
    @property double balance()
    {
        return _balance;
    }
}
```
In a free-threaded world, += and -= are a tad misleading because they "look" atomic but are not—both are read-modify-write operations. Really _balance += amount is encoded as _balance = _balance + amount, which means the processor loads _balance and amount into its own operating memory (registers or an internal stack), adds them, and deposits the result back into _balance.
Unprotected concurrent read-modify-write operations lead to incorrect behavior. Say your account has _balance == 100.0 and one thread triggered by a check deposit calls deposit(50). The call gets interrupted, right after having loaded 100.0 from memory, by another thread calling withdraw(2.5). (That's you at the corner coffee shop getting a latte with your debit card.) Let's say the coffee shop thread finishes the entire call uninterrupted and updates _balance to 97.5, but that event happens unbeknownst to the deposit thread, which has loaded 100 into a CPU register already and still thinks that's the right amount. The call deposit(50) computes a new balance of 150 and writes that number back into _balance. That is a typical race condition. Congratulations—free coffee for you (be warned, though; buggy book examples may be rigged in your favor, but buggy production code isn't).

To introduce proper synchronization, many languages offer a Mutex type that lock-based threaded programs use to protect access to balance:
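The lost update is easier to see when each thread's CPU register is modeled explicitly. The following is a deterministic re-enactment of the interleaving described above—a sketch that replays one possible schedule by hand, not the real nondeterministic execution:

```d
import std.stdio;

void main()
{
    double _balance = 100.0;

    // Each thread's CPU register is modeled as a local variable.
    double depositReg = _balance;   // deposit thread loads 100.0 ...
    double withdrawReg = _balance;  // ... and so does the withdraw thread
    withdrawReg -= 2.5;
    _balance = withdrawReg;         // coffee shop thread stores 97.5
    depositReg += 50;
    _balance = depositReg;          // deposit thread stores 150.0
    writeln(_balance);              // prints 150: the 2.5 withdrawal is lost
}
```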
```d
// This is not D code
// Multithreaded bank account in a language with explicit mutexes
class BankAccount
{
    private double _balance;
    private Mutex _guard;
    void deposit(double amount)
    {
        _guard.lock();
        _balance += amount;
        _guard.unlock();
    }
    void withdraw(double amount)
    {
        _guard.lock();
        try
        {
            enforce(_balance >= amount);
            _balance -= amount;
        }
        finally
        {
            _guard.unlock();
        }
    }
    @property double balance()
    {
        _guard.lock();
        double result = _balance;
        _guard.unlock();
        return result;
    }
}
```
All operations on _balance are now protected by acquiring _guard. It may seem there is no need to protect balance with _guard because a double can be read atomically, but protection must be there for reasons hiding themselves under multiple layers of Maya veils. In brief, because of today's aggressive optimizing compilers and relaxed memory models, all access to shared data must entail some odd secret handshake that has the writing thread, the reading thread, and the optimizing compiler as participants; absolutely any bald read of shared data throws you into a world of pain (so it's great that D disallows such baldness by design). First and most obvious, the optimizing compiler, seeing no attempt at synchronization on your part, feels entitled to optimize access to _balance by holding it in a processor register. Second, in all but the most trivial examples, the compiler and the CPU feel entitled to freely reorder bald, unqualified access to shared data because they consider themselves to be dealing with thread-local data. (Why? Because that's most often the case and yields the fastest code, and besides, why hurt the plebes instead of the few and the virtuous?) This is one of the ways in which modern multithreading defies intuition and confuses programmers versed in classic multithreading. In brief, the balance property must be synchronized to make sure the secret handshake takes place.
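D makes the handshake explicit: a bald read-modify-write on shared data is rejected at compile time, and core.atomic supplies the blessed operations. Here is a minimal sketch using a hypothetical variation that keeps the balance in cents, since atomic operations are most straightforward on integral types:

```d
import core.atomic;

// Hypothetical variation: balance kept in cents so the shared value
// is integral and atomic operations apply directly.
shared long _balanceInCents;

void deposit(long cents)
{
    // _balanceInCents += cents;  // error: read-modify-write on shared data
    atomicOp!"+="(_balanceInCents, cents);  // one indivisible step
}

long balanceInCents()
{
    return atomicLoad(_balanceInCents);  // atomic read: the "secret handshake"
}

void main()
{
    deposit(150);
    assert(balanceInCents() == 150);
}
```

The atomicLoad in the getter is the moral equivalent of locking _guard around the read: it tells both the compiler and the CPU that this datum is shared, so no register caching or reordering is allowed.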
To guarantee proper unlocking of Mutex in the presence of exceptions and early returns, languages with scoped object lifetime and destructors define an ancillary Lock type to acquire the lock in its constructor and release it in the destructor. The ensuing idiom is known as scoped locking [50] and its application to BankAccount looks like this:
```cpp
// C++ version of an interlocked bank account using scoped locking
class BankAccount
{
private:
    double _balance;
    Mutex _guard;
public:
    void deposit(double amount)
    {
        auto lock = Lock(_guard);
        _balance += amount;
    }
    void withdraw(double amount)
    {
        auto lock = Lock(_guard);
        enforce(_balance >= amount);
        _balance -= amount;
    }
    double balance()
    {
        auto lock = Lock(_guard);
        return _balance;
    }
};
```
Lock simplifies code and improves its correctness by automating the pairing of locking and unlocking. Java, C#, and other languages simplify matters further by embedding _guard as a hidden member and hoisting locking logic up to the signature of the method. In Java, the example would look like this:
```java
// Java version of an interlocked bank account using
// automated scoped locking with the synchronized statement
class BankAccount
{
    private double _balance;
    public synchronized void deposit(double amount)
    {
        _balance += amount;
    }
    public synchronized void withdraw(double amount)
    {
        enforce(_balance >= amount);
        _balance -= amount;
    }
    public synchronized double balance()
    {
        return _balance;
    }
}
```
The corresponding C# code looks similar, though synchronized should be replaced with [MethodImpl(MethodImplOptions.Synchronized)].
Well, you've just seen the good news: in the small, lock-based programming is easy to understand and seems to work well. The bad news is that in the large, it is very difficult to pair locks with data appropriately, choose locking scope and granularity, and use locks consistently across several objects (not paying attention to the latter issue leads to threads waiting for each other in a deadlock). Such issues made lock-based coding difficult enough in the good ole days of classic multithreading; modern multithreading (with massive concurrency, relaxed memory models, and expensive data sharing) has put lock-based programming under increasing attack [53]. Nevertheless, lock-based synchronization is still useful in a variety of designs.
D offers limited mechanisms for lock-based synchronization. The limits are deliberate and have the advantage of ensuring strong guarantees. In the particular case of BankAccount, the D version is very simple:
```d
// D interlocked bank account using a synchronized class
synchronized class BankAccount
{
    private double _balance;
    void deposit(double amount)
    {
        _balance += amount;
    }
    void withdraw(double amount)
    {
        enforce(_balance >= amount);
        _balance -= amount;
    }
    double balance()
    {
        return _balance;
    }
}
```
D hoists synchronized one level up, to the entire class. This allows D's BankAccount to provide stronger guarantees: even if you wanted to make a mistake, there is no way to offer back-door unsynchronized access to _balance. If D allowed mixing synchronized and unsynchronized methods in the same class, all bets would be off. In fact, experience with method-level synchronized has shown that it's best to define either all methods or none as synchronized; dual-purpose classes are more trouble than they're worth.
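A usage sketch under these rules: instances of a synchronized class are shared, so creation goes through new shared and every call transparently acquires the hidden mutex. (The sketch spells the import as std.exception, where enforce lives in current Phobos; exact diagnostics and deprecation behavior vary across compiler versions.)

```d
import std.exception : enforce;

synchronized class BankAccount
{
    private double _balance;
    void deposit(double amount) { _balance += amount; }
    void withdraw(double amount)
    {
        enforce(_balance >= amount);
        _balance -= amount;
    }
    double balance() { return _balance; }
}

void main()
{
    // A synchronized class instance is implicitly shared.
    auto account = new shared(BankAccount);
    account.deposit(100);   // acquires the hidden mutex, then releases it
    account.withdraw(25);
    assert(account.balance == 75);
}
```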
The synchronized class-level attribute affects objects of type shared(BankAccount) and automatically serializes calls to any method of the class. Protection checks also get stricter for synchronized classes. Recall from § 11.1 on page 337 that protection checks ordinarily allow access to non-public members for all code within a module. Not so for synchronized classes, which obey the following rules:
- No public data is allowed at all.
- Access to protected members is restricted to methods of the class and its descendants.
- Access to private members is restricted to methods of the class.
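The rules above can be illustrated with a sketch; the commented-out field would be rejected by the compiler, and the error wording is paraphrased:

```d
synchronized class Account
{
    // double _oops;          // error: no public data allowed in a synchronized class
    protected double _prot;   // reachable only from Account and its descendants
    private double _priv;     // reachable only from Account's own methods

    void deposit(double amount) { _priv += amount; }  // fine: a method of the class
    double balance() { return _priv; }
}

void main()
{
    auto a = new shared(Account);
    a.deposit(5);
    assert(a.balance == 5);
    // a._priv = 0;  // error: even same-module code is denied direct access
}
```

Unlike with ordinary classes, other code in the same module cannot touch _priv or _prot directly; the stricter checks close that back door and guarantee that every access to the fields goes through the serialized methods.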