Engineering Reliable Digital Interfaces
Ask a group of engineers what aspects of the job cause the most loss of sleep, and they will probably list lay-offs and non-physical schedules among their chief anxieties. One other item is likely to compete vigorously for this dubious title: deciding whether or not a design is ready to build. Sometimes it seems the more time a design spends in the checking phase, the more problems reveal themselves. Can an engineer ever state with confidence that the checking process has ferreted out the weaknesses in a design? Even if the answer to this question were yes, another more disturbing question presents itself: Is it possible to predict the exact combination of conditions that would deal a fatal blow to a digital interface? These questions represent the elusive Holy Grail of engineering, which is the ability to quantitatively predict operating margins across all future manufacturing and operating conditions. If there were only a handful of independent failure mechanisms active in a digital interface, the answers might not be so elusive. In reality, each digital interface comprises many multi-faceted failure mechanisms that interact with one another; understanding and quantifying their interaction can be extremely challenging. Yet the signal integrity engineer must take a position on the operating margins for every new design.
Operating Margin: the difference between a physical constraint and the sum of all parameters and variations in these parameters that could violate the constraint.
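Imagine a sixteen-foot-tall semi truck that has to pass under a bridge with a seventeen-foot clearance. The distance, D, between the floor and the ceiling of the trailer is static. The radius, R, of the tires varies around some nominal value depending on load, inflation, and temperature, so the overall height of the truck varies with it. The operating margin is whatever remains of the seventeen-foot clearance once the nominal height and the worst-case increase in R are accounted for. Figure 1.1 captures this fundamental relationship between constraints and margin.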
Figure 1.1 Operating margin
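The truck-and-bridge picture can be reduced to a line of arithmetic. The sketch below is only an illustration of the definition of operating margin: the seventeen-foot clearance and sixteen-foot nominal height come from the example, while the 0.25 ft worst-case growth in tire radius and the function name operating_margin are assumptions made up for this example.

```python
def operating_margin(constraint, nominal, variations):
    """Margin = constraint - (nominal value + sum of worst-case variations)."""
    return constraint - (nominal + sum(variations))


CLEARANCE_FT = 17.0              # bridge clearance, from the example
NOMINAL_HEIGHT_FT = 16.0         # nominal truck height, from the example
TIRE_RADIUS_VARIATION_FT = 0.25  # ASSUMED worst-case increase in R (load, inflation, temperature)

margin_ft = operating_margin(
    CLEARANCE_FT,
    NOMINAL_HEIGHT_FT,
    [TIRE_RADIUS_VARIATION_FT],
)
print(f"Worst-case operating margin: {margin_ft:.2f} ft")
```

A positive result means the truck clears the bridge under the assumed worst case; a negative result means the constraint is violated.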
Since digital signal integrity emerged as a discipline in the 1960s, three main signaling paradigms have evolved: common-clock, source-synchronous, and high-speed serial. It would certainly be difficult to identify a unified set of practices for quantifying operating margins that applies equally well to all three paradigms. Even within one paradigm, different applications require different approaches for assigning a number to each effect that encroaches on the design constraints. Nevertheless, all digital interfaces share a common set of limitations:
- The number of picoseconds in a clock cycle (or unit interval) is finite.
- If the input to the primary flip-flop is not in a stable and correct state when the clock samples it, a bit error will occur.
- Because a receiver is really an amplifier, its output will become highly sensitive to unpredictable noise events if the signal amplitude at its input drops low enough.
- If a noise event at a receiver input is strong enough to cross into the threshold region, the output will switch.
These concepts form the basis for developing a strategy for quantifying the operating margins of any digital interface. The exact form a strategy may take is not as important as the exercise of examining all significant detractors from operating margin and assigning a number to each one. In fact, the form may vary widely from design to design, engineer to engineer, and company to company. If an engineer is fortunate enough to encounter a design that is relatively similar to its predecessor, the strategy may not need to change at all. Colleagues may take issue with the numbers or how to combine them, but that healthy exchange cannot take place unless somebody makes an initial estimate.
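As one possible starting point for that exchange, the exercise of assigning a number to each detractor can be sketched as a simple timing budget. Everything in this sketch is hypothetical: the 400 ps unit interval, the detractor names and values, and the two helper functions are placeholders rather than data from any real interface. It shows two common ways of combining the numbers, a straight worst-case sum and a root-sum-square, the latter being defensible only when the terms are independent.

```python
import math


def linear_margin(budget_ps, detractors_ps):
    """Worst-case margin: every detractor is assumed to peak at the same time."""
    return budget_ps - sum(detractors_ps.values())


def rss_margin(budget_ps, detractors_ps):
    """Root-sum-square margin: assumes the detractors are statistically independent."""
    return budget_ps - math.sqrt(sum(v ** 2 for v in detractors_ps.values()))


UNIT_INTERVAL_PS = 400.0  # ASSUMED unit interval (the finite number of picoseconds)

# ASSUMED detractors, in picoseconds; names and values are illustrative only.
detractors_ps = {
    "transmitter_jitter": 60.0,
    "intersymbol_interference": 80.0,
    "crosstalk_induced_jitter": 40.0,
    "receiver_setup_plus_hold": 120.0,
}

worst_case_ps = linear_margin(UNIT_INTERVAL_PS, detractors_ps)
statistical_ps = rss_margin(UNIT_INTERVAL_PS, detractors_ps)

print(f"Linear (worst-case) margin: {worst_case_ps:.1f} ps")
print(f"Root-sum-square margin:     {statistical_ps:.1f} ps")
```

The gap between the two results is exactly the kind of disagreement about how to combine numbers that the initial estimate is meant to provoke.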
Engineers tend to develop their own unique bag of tricks that emerge from the evolutionary process of learning. This book is intended to help fellow signal integrity engineers add to their bag of tricks by challenging them to think deeply about how the fundamental sources of pathological effects in digital interfaces combine to form a fault condition. The maturing discipline of signal integrity demands an increasing level of accuracy and precision from each of us. Although the Holy Grail of failure prediction may never be found, the process of looking for it will produce more reliable digital systems.
A Sadly Familiar Tale
Project Coyote took off like a roadrunner from the very beginning. Conceived in a boardroom behind closed doors at Acme Inc., it was already well-defined before anyone in development engineering heard of it. The senior engineering staff voiced their opinions about the unrealistic schedule, especially during a time when other high-profile projects were consuming most of the company's resources. However, commitments to customers were already made, a marketing plan was in the works, finances were allocated, and the wheels were in motion: the first domino in a sequence of related events.
The second domino was in place before printed circuit (PC) board placement and routing began. The Vice President of Engineering had defined a budget that was consistent with the price point marketing deemed to be competitive, and this budget called for a four-layer PC board: signal-ground-power-signal. Upon seeing the form factor for the first time, the PC board designer expressed concern about routability in certain areas he perceived to be bottlenecks. The signal integrity engineer assigned to this project expressed her concern about high forward crosstalk due to microstrip transmission lines and high edge rates in the PCI Express signals. However, neither she nor the board designer was able to prove that a solution did not exist, so the design progressed as planned.
An exceptionally aggressive schedule became domino number three. In order to meet the schedule, the board design shop had two of their top designers working back-to-back 12-hour shifts for three weeks with one day off on the weekends. While it is true that an auto-router can save many hours of human labor, there is no substitute for an experienced designer when channels are fully allocated. Although the company's design process called for routing constraints to be in place before routing could begin, schedule won the day, and routing began before the signal integrity team was able to assign constraints to the 800 of 1,000 nets on the board that required their attention.
In a remarkable feat of skill and sweat, the design team generated Gerber files on schedule. Much to their credit, they reserved one day for a complete design review with mandatory attendance by everyone on the team. They even invited a few seasoned veterans who had since gone on to jobs in other parts of the company. It was one of these veterans who spotted the problem: a PCI Express differential pair routed next to an Inter-Integrated Circuit (I2C) clock signal. Running at 2.5 Gbps, PCI Express signals swing about 500 mV single-ended, with a 20-80% transition time of about 100 ps, for an average edge rate of 3 V/ns. He correctly guessed that the edge rate of the I2C clock was lower than this, making it vulnerable to crosstalk from aggressors with higher edge rates, but this information was unavailable on the component datasheet. Anne, the signal integrity engineer assigned to this project, took an action item to acquire IBIS models for the buffer driving the I2C clock, but the vendor required the company to sign a non-disclosure agreement before releasing the IBIS models. The legal process took much longer than the one day allocated for reviewing the design, and the team sent out the Gerber files to the PC board manufacturer. Domino number four.
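The veteran's edge-rate figure can be checked with a few lines of arithmetic. This sketch assumes that the 100 ps refers to the 20-80% transition time, which is the reading that reproduces the quoted 3 V/ns; the function name is hypothetical. No figure is computed for the I2C clock because, as in the story, its edge rate is not known.

```python
def edge_rate_v_per_ns(swing_v, transition_time_ps, lower=0.2, upper=0.8):
    """Average edge rate between the lower and upper fractions of the swing."""
    delta_v = (upper - lower) * swing_v        # voltage traversed between thresholds
    delta_t_ns = transition_time_ps / 1000.0   # picoseconds to nanoseconds
    return delta_v / delta_t_ns


# 500 mV single-ended swing; 100 ps ASSUMED to be the 20-80% transition time.
pcie_edge_rate = edge_rate_v_per_ns(swing_v=0.5, transition_time_ps=100.0)
print(f"PCI Express average edge rate: {pcie_edge_rate:.1f} V/ns")
```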