Introduction to a Signal Integrity Engineer's Companion
- 1.1 Life Cycle: The Motivation to Develop a Simulation Strategy
- 1.2 Prototyping: Interconnecting High-Speed Digital Signals
- 1.3 Pre-emphasis
- 1.4 The Need for Real-Time Test and Measurement
- Conclusion
An engineer's companion is like any other companion: It's a fellow traveler and colleague who offers advice and support, sharing experiences with a friend, in what would otherwise be a solitary journey. Our journey will take us through the endeavors of embedded system design, simulation, prototype development, and test. The dangers are signal reflections, attenuation, crosstalk, unwanted ground currents, timing errors, electromagnetic radiation, and a host of other signal integrity (SI) issues. SI engineering is a relatively new branch of electronics engineering. For the most part, it relates to the analog factors that affect both the performance and reliability of modern high-speed digital signals and systems. In general, integrity has to do with truthfulness. When applied to digital electronics, such as communication and computer systems, it is specifically about signal accuracy and system reliability.
Although it is written for SI engineering professionals, this book is intended to support new engineers and students who have an interest in designing, simulating, and developing modern high-performance digital and embedded systems. Along with the central theme of how to think about SI engineering, this book includes practical guidance on how to achieve and interpret a simulation or real-time test and measurement. In many jobs within the SI industry, a technician, engineer, or designer could be anyone who thinks about the reliability or operation of a modern embedded system. This book aims to address the concerns and uncertainties faced by these people. Because this book was written with a wide audience in mind, each topic is presented with a prerequisite theoretical or practical preamble that supports novice engineers and students and that can often be omitted by practicing engineers. This chapter keeps with the format of this book, in that it follows the development of an embedded system, from its simulation to prototyping and real-time test.
We live in the digital age, in which providers of telephony, computing, and broadcast systems are busily facilitating media convergence. Music, video, and information systems must be transparent to the newly integrated communication and computing systems. Consequently, consumers expect their modern high-performance telephony and computing systems to interactively communicate the latest news and entertainment while providing e-business transactions. Driving forces such as media convergence are challenging digital designers to work in an era of reasonably priced, high-performance, highly dependable digital systems that typically are portable and generally are required to have worldwide compliance. Today the issues of interoperability are paramount where modern systems and components from disparate manufacturers are often required to work together seamlessly. Moreover, it's widely documented that communication and computing systems double in computational throughput every eighteen months. This implies that any absolute frequency or data rate quoted in any SI book will be out of date by the time the book is published, and this book is no exception. Nevertheless, it is anticipated that, similar to most of the books in this SI engineering series, the fundamental practical examples, guidance, and underlying theoretical concepts will remain relevant for many years to come. Today's cutting-edge digital interfaces will become the bread and butter of tomorrow's digital systems.
1.1 Life Cycle: The Motivation to Develop a Simulation Strategy
Most signal integrity engineers would agree that the primary motivation for simulating chip-to-chip networks is to maximize the probability that those networks will function flawlessly on first power-up. Another compelling motivation is easy to overlook under the intense pressure of time to market: understanding operating margins (see Figure 1-1). It is tempting to stitch together IO circuit and interconnect models, run the simulations, check the results, and be done with the exercise. This may prove that the network will function under a given set of conditions, but will it continue to function reliably over the range of manufacturing and operating conditions it will encounter during the product's useful life? What are the expected primary failure mechanisms, and how do they interact with one another?
Figure 1-1 Operating margin.
These questions are indeed of primary importance, and it may be possible to answer them given unlimited resources and time. Unfortunately, most signal integrity engineers operate under somewhat different conditions. A contemporary PC board design may have upwards of a thousand nets that belong to two or three dozen different buses. The power spectrum will likely have significant content above 5 GHz. Supplying power and cooling to high-performance processors can place challenging constraints on layout and routing. On top of these technical challenges, the customer may require that the product be ready for manufacturing in a time period that severely stresses the team's ability to carry out the level of analysis required to ensure reliable operation. This is the irony of the business: the relevant physical effects become ever more difficult and expensive to analyze, while the market relentlessly exerts downward pressure on cost and schedule. These two freight trains are running full speed toward each other on the same track.
Given these technical and business challenges, is it still possible to achieve the goal of reliable operation of a system filled with dozens of digital IO buses over the product's lifetime? At times it may appear that the solution to this difficult problem is the empty set. In the heat of battle, the level of complexity can be so overwhelming that it seems impossible to satisfy all the constraints simultaneously. Nevertheless, it is the opinion of the authors that it is possible to successfully manage the signal integrity of a complex contemporary design if the lead engineers keep two important principles in mind. First, the signal integrity engineer must be involved at the very beginning of the design cycle, when the team is making critical architectural decisions and selecting component technology. Second, the team must develop a comprehensive simulation and measurement strategy that applies the appropriate level of analysis to each bus in the system.
1.1.1 The Benefits of Early Teamwork
It must be tempting for those who make weighty architectural decisions during the earliest stages of a new product to avoid addressing implementation details. After all, it is easier to define a product while cruising along at the top of the troposphere. The product must offer innovative, distinguishing features. It must also offer Earth-shattering performance at a lower cost than the competition. Most important, it must be ready for manufacturing before the infamous marketing window closes. If this is a consumer product, this means having the product shrink-wrapped and on the shelves for Christmas shoppers. These are all difficult goals to achieve, even without having to worry about how to manufacture the product and make it reliable.
As any experienced engineer will attest, usually a price must be paid for making architectural decisions without input from those whose job it is to implement the architecture. Probably the worst possible scenario is a product that is marginally functional, because the marginal part does not become apparent until production is in full swing. This is even more treacherous than a product that overruns its budget, misses its milestones, and never makes it to market. Assembly lines come down. More than one company loses large quantities of money each day. There may be recalls. There will certainly be redesigns under intense pressure. At the end of the whole experience lies a painful loss of reputation. This is certainly a scenario that no Vice President of Technology or Chief Financial Officer would choose to put in motion if the choice were made clear. One thing is crystal clear: the company greatly enhances its chances of success by building a cross-disciplinary team in the project's early architectural phases. A minimum team would include a board layout designer, firmware programmer, and engineers from the disciplines of logic, mechanical, thermal, power, manufacturing, electromagnetic compliance, and signal integrity. A large company may have separate engineers to represent each of these disciplines, while in most other companies one person wears several hats. In either case, it is important that each discipline be represented on the team.
As an illustration of the importance of early teamwork, consider the following fictional scenario that is close enough to reality to be disturbing. Project X took off like a rocket from the very beginning. Conceived in a boardroom behind closed doors, it was already well-defined before anyone in development engineering heard of it. The senior engineering staff voiced their opinions about the unrealistic schedule, especially during a time when other high-profile projects were consuming most of the company's resources. However, commitments to customers had already been made, a marketing plan was in the works, finances were allocated, and the wheels were in motion—the first domino in a sequence of related events.
The second domino was in place before PC board placement and routing began. The Vice President of Engineering had defined a budget that was consistent with the price point that Marketing deemed competitive. This budget called for a four-layer PC board: signal-ground-power-signal. Upon seeing the form factor for the first time, the PC board designer expressed concern about routability in certain areas he perceived to be bottlenecks. The lead signal integrity engineer expressed concern about high forward crosstalk due to microstrip transmission lines and high edge rates in the PCI Express (PCIe) signals. However, neither the board designer nor the signal integrity engineer could prove that a solution did not exist, so the design progressed as planned.
An exceptionally aggressive schedule became domino number three. To meet the schedule, the board design shop had two of their top designers work back-to-back twelve-hour shifts for three weeks, with one day off on the weekends. While it is true that an auto-router can save many hours of human labor, there is no substitute for an experienced designer when channels are fully allocated. The company's design process called for routing constraints to be in place before routing could begin. Again, schedule won the day. Routing began before the signal integrity team could assign constraints to the 800 nets on the board that required attention out of a total of 1,000 nets.
In a remarkable feat of skill and sweat, the design team generated Gerber files on schedule. Much to their credit, they reserved one day for a complete design review, with mandatory attendance by everyone on the team. They even invited a few seasoned veterans who had since gone on to jobs in other parts of the company. One of these veterans spotted the problem: a PCIe differential pair routed next to a Gigabit Media Independent Interface (GMII) clock signal. The edge rate of the 2.5 gigabits per second (Gbps) PCIe signal was clearly defined in the specification as 10 V/ns. The edge rate of the 125 MHz GMII clock was only 1 V/ns, making it vulnerable to crosstalk from aggressors with higher edge rates, but this information was unavailable in the component datasheet. The signal integrity engineer took an action item to acquire an IBIS datasheet for both the PHY chip and the IO controller. She was successful for the PHY chip, but the vendor for the IO controller required the company to sign a nondisclosure agreement before releasing the IBIS datasheet. This legal process took much longer than the one day allocated for reviewing the design, and the team sent the Gerber files to the PC board manufacturer. Domino number four.
The first set of boards came back from assembly and performed admirably on the benchtop. The software team did their job of loading new code, debugging, recompiling, and loading again. The hardware team did their job of measuring thermal characteristics, calculating power draw, dumping registers, and capturing traces for bus transactions on the logic analyzer. Knowing that there was exposure to crosstalk problems, the signal integrity engineer spent a lot of time probing the board, looking for them. She found some quiet line noise on the order of 200 to 300 mV, but the nets had enough noise and timing margin to tolerate it. After several weeks of intense work, the development team gave management the green light for production, and each team member spent a few days contributing to the final report before moving on to the next project.
Unbeknownst to the heroes of our story, the fifth and final domino was starting to fall at a semiconductor manufacturing plant on the other side of the planet. Although the IO controller chip had been in production for two years and the process had remained stable for most of this time, a recent drop in yield prompted the process engineers to make some tweaks that ultimately resulted in slightly lower edge rates. The IBIS datasheet for the IO controller contained edge rate information, but the component datasheet did not. The manufacturer did not feel a need to notify its customers, because all specified parameters remained within their limits.
Shortly after these new components hit the assembly floor, the boards began experiencing high levels of fallout in the form of intermittent failures that were associated with traffic on the PCI Express and Gigabit Ethernet buses. When a few of the boards made it to customer installations, the crisis was officially under way. Management told the team to drop what they were doing and focus on resolving the crisis. One sleepless night, the neurons in the brain of the signal integrity engineer rewired themselves to remind her of the comment during the design review about crosstalk between PCI Express data and the GMII clock. The next day she decided to probe these signals, using a logic analyzer to trigger the oscilloscope when both were active at the same time. This did not happen often, because the two buses were asynchronous to each other. After three days of persistent testing she captured a waveform that showed a slope reversal—double clocking—right in the threshold region of the GMII clock receiver on the PHY chip and coincident with a transaction on the PCI Express bus.
Not many pleasant alternatives presented themselves to the team. It was not possible to slow down a PCI Express signal and still expect the bus to function. While an FET switch would sharpen the edge of the GMII clock signal, the IO controller was in a BGA package, and there was no way to install the FET switch between the pin and the net. In the end, it was decided to stop production and rush a new six-layer PC board through design.
During the lull between releasing the new design and shipping the new boards, upper management held an all-day process review. The veteran engineer from the original design review retraced the trail of circumstances that led to the failure. First and most important, the signal integrity team did not have a representative at the table when the architecture was being defined. He pointed out that a cross-disciplinary design team is the cornerstone of a healthy, solid design process. Second, there was no cost-performance analysis of the PC board stack-up. This is admittedly one of the most difficult things any engineer has to do. There was no excuse for the third contribution to the failure; schedule should never trump assigning design constraints unless the risk of nonfunctional hardware is deemed acceptable. Item number four was an unavailable model. This is understandable, because high-quality models are hard to come by, but it is possible to have the necessary models when they are needed by planning ahead. Finally, if a design is sensitive to edge rate, the component specification should call out edge rate as a parameter, because it is not possible to anticipate the evolution of a silicon fabrication process.
The recommendations that came from the design process review were to establish a cross-disciplinary team for all new designs and to develop a comprehensive process for applying the appropriate level of analysis to each net in a PC board or system design.
Most of the time, circumstances do not present themselves to us in such an obvious, logical progression. Only careful retrospection reveals the sequence of events that led to a particular conclusion. To some degree, the engineer's job is to play the role of the prophet who can foresee these circumstances and avoid them without becoming the Jeremiah to whom nobody pays much attention.
1.1.2 Defining the Boundaries of Simulation Space
Avoiding situations like the one just described requires a sound understanding of the physics involved. You also need the insight to know what level of analysis is required for a given net or bus. Although it is certainly possible to acquire models for a thousand nets and simulate every one of them before releasing a design to manufacturing, the company that practices such a philosophy may not be in business very long. It would appear that one of the more critical tasks of designing any digital IO interface is establishing the boundaries of simulation space. These are the criteria you use to decide whether a net needs to be simulated or whether some other method of analysis is more appropriate. Make no mistake: simulation is expensive and should be used only when there are strong economic and technical motivations for doing so. Once this critical question of whether to simulate is answered, you can go about the tasks of actually running the simulations and interpreting their results.
An excellent way to begin the decision-making process is to compile a comprehensive list of all nets in the design and some relevant information associated with each bus. This begs for a definition of a bus, which is a word that is frequently used but seldom defined. For the purposes of this discussion, a bus is defined as a collection of data and control signals that have a common functional purpose and are synchronized to the same clock or strobe signal.
One example of a bus is the traditional 33 MHz PCI bus. It is composed of 32 address-data signals and a set of control signals, each synchronized by a common clock signal that originates from a clock source chip. DDR memory is a source-synchronous bus in which the transmitting chip sends a clock or strobe signal along with the data. The source-synchronous bus facilitates faster data rates by removing two terms from the timing budget: skew between multiple copies of the same clock, and transmitter launch time (see Chapter 4, "DDR2 Case Study"). PCI Express is another example of a bus, although it is not necessary for each chip on a given PCI Express bus to share the same reference clock. PCI Express uses a clock-data recovery circuit. This means the receiving chip takes a low-frequency reference clock at the same nominal frequency as the transmitting chip's, multiplies it up to the data rate, and infers the optimum sample point from the incoming data stream.
The analysis decision matrix shown in Table 1-1 allows the signal integrity engineer to view all the relevant electrical parameters of each bus at the same time. The engineer also can decide what level of analysis is necessary to ensure that each bus functions reliably over the product's lifetime. The simplest case might be a bus that a trusted colleague has analyzed in the past and that others have used successfully time and again. In this case no simulation is required, provided that all of the bus's electrical parameters are identical to the ones that were analyzed in the past. The next, more complicated case is the bus for which a designers' guide or specification exists. If a third party analyzed the bus and published a set of rules that, when followed, guarantee sufficient operating margins, simulation is not necessary. The job of the signal integrity engineer then reduces to describing design constraints to the CAD system and checking that they are met. Of course, the reliability of the source must be beyond reproach! Some buses may not require simulation but do require rudimentary hand calculation, such as the value of termination resistors, stub length as a function of rise time, or RC time constant of a heavily loaded reset net. Finally, if a bus passes through each of these three filters, it is time to assemble the models and fire up the simulator. The closer this process occurs to the beginning of the project, the higher the likelihood of success. The bus parameter spreadsheet should include the following items:
Table 1-1. Analysis Decision Matrix
| Parameter                    | I2C | PCI-X | DDR2 | PCIe | Units |
|------------------------------|-----|-------|------|------|-------|
| Engineer                     |     |       |      |      |       |
| Net count                    |     |       |      |      |       |
| Data rate                    |     |       |      |      | Gbps  |
| IO power supply voltage      |     |       |      |      | V     |
| IO circuit technology        |     |       |      |      |       |
| Input setup time             |     |       |      |      | ps    |
| Input hold time              |     |       |      |      | ps    |
| Input minimum edge rate      |     |       |      |      | V/ns  |
| Input high threshold         |     |       |      |      | V     |
| Input low threshold          |     |       |      |      | V     |
| Output rise time             |     |       |      |      | ps    |
| Output fall time             |     |       |      |      | ps    |
| Output maximum edge rate     |     |       |      |      | V/ns  |
| Output impedance             |     |       |      |      | ohm   |
| Output high level            |     |       |      |      | V     |
| Output low level             |     |       |      |      | V     |
| Pin capacitance              |     |       |      |      | pF    |
| System clock skew and jitter |     |       |      |      | ps    |
| Net characteristic impedance |     |       |      |      | ohm   |
| Termination                  |     |       |      |      | ohm   |
| Maximum net length           |     |       |      |      | in.   |
| Number of loads              |     |       |      |      |       |
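The sequence of filters described ahead of Table 1-1 (reuse a prior analysis, follow a trusted design guide, do a hand calculation, and only then simulate) can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the class name, the field names, and the example flag values assigned to each bus are all hypothetical, and deciding how each flag should be set remains an engineering judgment.

```python
from dataclasses import dataclass
from enum import Enum


class Analysis(Enum):
    REUSE = "reuse prior analysis"
    DESIGN_GUIDE = "apply published design rules"
    HAND_CALC = "hand calculation"
    SIMULATE = "simulate"


@dataclass
class Bus:
    # All field names are hypothetical, chosen only for illustration.
    name: str
    previously_analyzed: bool = False       # a trusted colleague analyzed it before
    parameters_unchanged: bool = False      # electrical parameters identical to that analysis
    has_trusted_design_guide: bool = False  # third-party rules from a reliable source
    hand_calc_sufficient: bool = False      # e.g. termination value or RC time constant


def analysis_level(bus: Bus) -> Analysis:
    """Return the least expensive level of analysis that covers the bus."""
    if bus.previously_analyzed and bus.parameters_unchanged:
        return Analysis.REUSE
    if bus.has_trusted_design_guide:
        return Analysis.DESIGN_GUIDE
    if bus.hand_calc_sufficient:
        return Analysis.HAND_CALC
    return Analysis.SIMULATE


# Flag values below are arbitrary examples, not recommendations for these buses.
buses = [
    Bus("I2C", hand_calc_sufficient=True),
    Bus("PCI-X", previously_analyzed=True, parameters_unchanged=True),
    Bus("DDR2", has_trusted_design_guide=True),
    Bus("PCIe"),
]
plan = {bus.name: analysis_level(bus) for bus in buses}
for name, level in plan.items():
    print(f"{name}: {level.value}")
```

Ordering the checks from cheapest to most expensive mirrors the point that simulation is the last resort: a bus reaches the simulator only after every cheaper option has been ruled out.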
It's helpful if the signal integrity engineer and the person who draws the schematics can agree on a naming convention that involves adding a prefix to the net name of each net in a bus. This will facilitate tracking coverage of all nets in a design. Someone who is good with programming can write a simple script that sorts and counts the nets from the "all nets" file, the best source of which is either the schematic entry or layout CAD tool. The goal is to make a decision about each net in each bus in the system: what level of analysis does it require? You can then keep track of the number of nets simulated and constrained and have an up-to-date measure of how close a project is to completion. Totaling the net count column will give you an excellent rough estimate of the work involved at the beginning of the project. Keeping track of who is analyzing each bus prevents unpleasant revelations toward the end of a project, such as "I thought so-and-so was working on that bus!"
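The net-counting script mentioned above might look like the following sketch. It assumes that the "all nets" file has been reduced to one net name per line and that the bus prefix is whatever precedes the first underscore. Both are assumptions made for illustration, since schematic and layout tools export net lists in their own formats, and the sample net names are invented.

```python
from collections import Counter


def count_nets_by_prefix(net_names, separator="_"):
    """Count nets per bus, treating the text before the first separator as the bus prefix."""
    counts = Counter()
    for raw in net_names:
        name = raw.strip()
        if not name:
            continue  # skip blank lines
        prefix, sep, _ = name.partition(separator)
        # Nets with no separator carry no bus prefix; group them for manual review.
        counts[prefix if sep else "UNASSIGNED"] += 1
    return counts


# Hypothetical net names standing in for the contents of an "all nets" file.
sample_nets = [
    "DDR2_DQ0", "DDR2_DQ1", "DDR2_DQS0",
    "PCIE_TX0_P", "PCIE_TX0_N",
    "I2C_SCL", "I2C_SDA",
    "RESET",
]

counts = count_nets_by_prefix(sample_nets)
for bus, n in sorted(counts.items()):
    print(f"{bus}: {n}")
print("total:", sum(counts.values()))
```

Because the function accepts any iterable of strings, an open text file can be passed to it directly. Nets that land in the `UNASSIGNED` group are the ones that still need a prefix, or an explicit decision that no analysis is required.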