The state of the art in fault-tolerant RAM development and production.
Next-generation electronic devices require advanced new nanofabrication CMOS technologies, and in these environments, today's processing techniques simply will not produce adequate yields. To improve RAM reliability without compromising performance, cost, or space requirements, engineers are turning to advanced fault-tolerant techniques. In this book, Kanad Chakraborty and Pinaki Mazumder survey the latest research and field-proven techniques for every form of memory fault tolerance, including manufacturing, online, and field-related fault tolerance. Coverage includes:
Chakraborty and Mazumder focus on practical circuit and design solutions, presenting extensive illustrations and explaining device physics and circuit design theory in a reader-friendly manner. They also provide a compendium of more than 500 research papers on memory fault tolerance and reliability. Whether you're a design engineer, test engineer, manufacturer, or researcher, this is a comprehensive resource for building next-generation RAM with next-generation reliability.
Modern Semiconductor Design Series
Reliability and Fault Tolerance of RAMs
(NOTE: Each chapter begins with an Introduction and concludes with Concluding Remarks and Problems except for Chapter 1.)
Preface.
Acknowledgments.
1. Reliability and Fault Tolerance of RAMs.
Impact of Scaling on Reliability. Defects, Faults, Errors, and Reliability. Reliability and Quality Testing and Measurement. Reliability Characterization. Reliability Prediction Procedures. Reliability Simulation Tools. Mechanisms for Permanent Device Failure. Safeguarding against Failures.
Diagnosis Algorithms. Repair Algorithms. Reconfiguration Techniques. Repair Using Flash EEPROM Switches. Flexible Redundancy. Built-In Self-Diagnosis and Self-Repair. Built-In Redundancy Analysis. Built-In Self-Repair Architectures.
Particles Causing Single-Event Effects. Some Definitions. Basic Mechanisms for Nondestructive Single-Event Effects. RAM Device Operation. Critical Charge and Soft Error Rate. Techniques Used for Mitigation of Single-Event Upsets. Experiments. Modeling and Simulation of Charge Collection. Basic Mechanisms for Destructive Single-Event Effects.
Theory of Error-Correcting Codes. Fault-Tolerant Design Techniques for RAMs. ECC Implementations. Memory Reliability Evaluation through Error Correction. Simulation of Memory Reliability and Fault Tolerance.
Yield Models. Yield Loss Mechanisms. Importance of Clustering Models. Critical Area Simulation and Yield Calculation. Effect of Redundancy and Error Correction on Yield. Effect of Defect Density on Yield. Effect of Defect Characteristics on Yield. Effect of Device Scaling on Yield. Relationship between Yield and Reliability.
Embedded RAMs. Built-In Self-Repairable Embedded RAM Physical Design. Fault Modeling Based on Inductive Fault Analysis. Circuit Implementation. Characterization of a Custom Design Tool. Multiobjective Optimization Approach for RAM Design. Floorplanning of Parametrized Rectangular Macrocells. BIST/BISR for Other Types of Memories.
This book deals with the study of fault-tolerance and reliability techniques for semiconductor random-access memories. Topics in this book include: reliability testing and prediction; diagnosis, repair and reconfiguration; single-event effects and their mitigation; use of error-correcting codes; yield analysis; and physical design issues for built-in self-repairable embedded RAMs. This book is written primarily for academic researchers and practicing engineers working in design and test of high-density random-access memories (RAMs) of the twenty-first century. It provides useful exposure to readers on state-of-the-art diagnosis, repair, redundancy, hardening, and error correction schemes for RAMs. The book may also be used as a supplementary text for undergraduate and graduate courses on VLSI fault tolerance and reliability.
Presently, application-specific integrated circuits (ASICs) and high-performance microprocessors such as Itanium and Compaq Alpha processors use a total of almost 75% of chip real estate for accommodating various types of embedded memories. For example, the Compaq Alpha EV7 chip shown on the front cover employs 135 million transistors for RAMs alone, while the entire chip has 152 million transistors. As the integration level increases to nearly 1 billion transistors within a decade or so, as projected by the Semiconductor Industry Association (SIA) Roadmap, the relative silicon area occupied by embedded memories is expected to approach 97% or even more. The ever-increasing need for myriad memory blocks within a VLSI chip, with a view to improving system throughput through larger and multilevel caches, indicates that the reliability of a complex VLSI chip will depend largely on the reliability of these embedded memory blocks. With device dimensions moving rapidly toward the ultimate physical limits of device scaling, which is in the regime of feature sizes of 50 nm or so, a host of complex failure modes are expected to occur in memory circuits. The goal of this book is to establish the need for appropriate fault-tolerant and reliable design techniques that cover the entire spectrum of chip design, from system architectures to nanofabrication. We discuss all these techniques in a systematic manner. Future generations of giant VLSI circuits could be manufactured at lower cost and have higher field reliability if these fault-tolerance and reliability techniques were incorporated while building embedded memories.
Readers of the book will discover with us that for the highest levels of reliability and fault tolerance of such memories in field application, soft error correction and scrubbing are not adequate, since leakage currents produced by deep-submicron process technologies and exacerbated by energetic ions in terrestrial and space environments can cause hard errors to accumulate over time. For reliable operation, such errors need to be repaired in the field using built-in self-repair, the importance of which is growing every day.
The book is organized as follows.
Chapter 1 establishes the need for quality and reliability testing and prediction and describes the mechanisms underlying hard and soft failures. It explains the impact of scaling on reliability, describes models for predicting reliability, and discusses techniques for safeguarding against failures and achieving fault tolerance.
Chapter 2 deals with manufacturing fault tolerance and examines the work that has been done for the past two decades on diagnosis, repair and reconfiguration of RAMs. We describe diagnosis algorithms, repair algorithms, reconfiguration techniques, repair using flash EEPROM switches, flexible redundancy, built-in self-diagnosis (BISD) and built-in self-repair (BISR), built-in redundancy analysis (BIRA), and case studies of BISR architectures.
Chapter 3 describes radiation-induced single-event effects and their mitigation techniques geared toward reliability enhancement. The topics examined include particles causing single-event effects, basic mechanisms for nondestructive and destructive single-event effects in RAMs, factors that affect the soft error rate (SER), mitigation and hardening techniques, description of experiments for studying soft error rates and charge collection in memory devices, and modeling and simulation of charge collection. It is shown that radiation can cause not only soft errors but also hard errors, such as single-event gate rupture (SEGR) and single-event burnout (SEB), thereby eventually warranting the need for hard repair and reconfiguration of memory devices.
Chapter 4 introduces the reader to online testing and the techniques used in the implementation of error-correcting codes for RAMs. Such techniques are useful for reliable and fault-tolerant operation during field use. This chapter delves into the theory of error-correction coding (ECC) and describes fault-tolerant design techniques such as bit scattering, sparing, complement/recomplement, consecutive correction and prestorage protection. We also describe ECC implementations (both on-chip and off-chip), and reliability evaluation and simulation of ECC-equipped memory.
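As a minimal illustration of the single-error correction that underlies many memory ECC schemes, the sketch below implements a Hamming(7,4) code. It is a textbook example chosen for brevity, not an implementation taken from the book, and practical memory ECC typically uses wider codes. The function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(word):
    """Return (corrected codeword, error position); position 0 means no error."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    position = s1 + 2 * s2 + 4 * s3   # syndrome read as a 1-based bit position
    if position:
        w[position - 1] ^= 1          # flip the single erroneous bit
    return w, position


def hamming74_decode(word):
    """Correct at most one bit error and return the 4 data bits."""
    w, _ = hamming74_correct(word)
    return [w[2], w[4], w[5], w[6]]


# Usage: store a nibble, inject a single bit flip, and read it back.
stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1                        # simulate a soft error in one cell
print(hamming74_decode(stored))
```

The code corrects any single flipped bit in a codeword; two simultaneous flips would be miscorrected, which is one reason techniques such as bit scattering are used to keep multi-bit upsets out of a single codeword.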
Chapter 5 describes yield modeling and analysis techniques for fabrication processes. We describe simple statistical models for yield estimation, such as cluster models; yield loss mechanisms; the importance of negative binomial cluster models; critical area simulation and yield computation; the effects on yield of hardware redundancy, error-correcting codes, defect density, defect characteristics, and device scaling; and the relationship between yield and reliability. We also describe hardware and software techniques for yield management and improvement.
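To illustrate why clustering models matter, the sketch below compares the Poisson yield model, Y = exp(-A·D), with the negative binomial model, Y = (1 + A·D/α)^(-α), where α is the clustering parameter. These are the standard textbook forms rather than formulas quoted from the book, and the die area, defect density, and α values are arbitrary illustrative inputs.

```python
import math


def poisson_yield(area_cm2, defect_density):
    """Yield assuming defects are spread uniformly at random over the wafer."""
    return math.exp(-area_cm2 * defect_density)


def negative_binomial_yield(area_cm2, defect_density, alpha):
    """Yield when defects cluster; a smaller alpha means stronger clustering."""
    return (1.0 + area_cm2 * defect_density / alpha) ** (-alpha)


# Illustrative inputs (assumed): 1 cm^2 die, 1 defect per cm^2 on average.
area, density = 1.0, 1.0
print(f"Poisson:                   {poisson_yield(area, density):.3f}")
for alpha in (0.5, 2.0, 1000.0):
    y = negative_binomial_yield(area, density, alpha)
    print(f"Negative binomial a={alpha:<6}: {y:.3f}")
```

For the same average defect density, stronger clustering concentrates defects on fewer dice, so the predicted yield is higher than the Poisson estimate; as α grows large, the negative binomial model converges to the Poisson model.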
Chapter 6 describes the issues underlying a structured custom design solution, comprising both circuit design and physical design, for built-in self-testable and self-repairable embedded RAMs. A custom layout generator, BISRAMGEN, has been used to study the characteristics of circuits that would be needed for fast memory access, high bandwidth, and low-overhead (in terms of both area and delay) BIST and BISR. Circuit techniques and BIST/BISR solutions are studied, their usefulness is analyzed, and the ensuing testability, yield, reliability, and cost benefits are investigated. This chapter also includes a new table-driven optimization approach for self-repairable RAM design, and a new algorithm for floorplanning rectangular components of a built-in self-repairable RAM array.
Semiconductor memories, particularly RAMs, have always occupied a very important place in electronic circuits, from memory cards in board-level circuits, and embedded memory modules used in application-specific integrated circuits (ASICs), to microelectronic devices used in spacecraft. Nowadays, large quantities of embedded RAM cores (including SRAM, DRAM, and flash memories) are being used extensively in systems-on-a-chip (SoCs). The importance of reliability and quality testing, fault tolerance, diagnostic fault coverage, self-repair, and online error correction of such memories is paramount, because embedded memories have pins that are difficult to probe externally for test and repair. These topics are described in Chapters 1, 2, 4, and 6. Accurate analysis of processing yields and effective yield management techniques, described in Chapter 5, are very important in reducing the manufacturing cost and in increasing the field reliability of memory devices. A vast majority of field-related problems nowadays are caused by ionizing radiation, for memory devices used in both spacecraft and terrestrial electronics. We describe in Chapter 3 the basic mechanisms for these problems, and the techniques used for mitigating them and hardening memory devices.
An article published last year (September 5, 2001) by Vincent Ratford of Virage Logic Corp., in EE Design (2001 CMP Media Inc.), provides an interesting perspective on BIST and BISR. While BIST has been called the future of SoC technology that will save SoC (also FPGA and ASIC) from the ruin of inferior yields, BISR is being hailed as a substantial cost saver in the near future. Ratford gave a typical example as follows: suppose that a company builds an xDSL modem chip in a 0.18 µm process incorporating 5 Mb of SRAM on an 8 × 8 mm die, and manufactures 1 million units in the first year. Let us further assume an average selling price of $25.00 per unit and a per-unit wafer cost of $2200. The wafer defect density is projected at 0.4 for memory and 0.3 for logic (the greater defect density for memory can be attributed to a higher density of transistors in the memory). Without BIST/BISR, die yield would be approximately 64%, compared to 82% yield with BISR. Also, use of BIST/BISR instead of external testing and repair could produce total cost savings of about $500,000. The yield increase due to BISR alone can create an additional $2.4 million in savings. Such a project, estimated at $25 million, would therefore witness up to 12% cost savings (about $3 million) with BIST and BISR technologies.
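The yield figures in this example can be reproduced with a simple Poisson yield model, Y = exp(-A·D). The sketch below rests on assumptions that the example does not state: that the defect densities are in defects per square centimeter, that both apply over the full 0.64 cm² die, and that BISR repairs every memory defect. The function and variable names are illustrative.

```python
import math


def poisson_yield(area_cm2, defect_density):
    """Poisson yield model: probability that a die has no fatal defect."""
    return math.exp(-area_cm2 * defect_density)


# Figures from the example; units of defects/cm^2 are an assumption.
DIE_AREA_CM2 = 0.8 * 0.8   # 8 x 8 mm die
D_MEMORY = 0.4
D_LOGIC = 0.3

# Without repair, a defect in either memory or logic kills the die.
yield_no_bisr = poisson_yield(DIE_AREA_CM2, D_MEMORY + D_LOGIC)

# Assumption: BISR repairs all memory defects, so only logic defects are fatal.
yield_with_bisr = poisson_yield(DIE_AREA_CM2, D_LOGIC)

# Savings quoted in the example, in dollars.
total_savings = 500_000 + 2_400_000
savings_fraction = total_savings / 25_000_000

print(f"yield without BISR: {yield_no_bisr:.1%}")
print(f"yield with BISR:    {yield_with_bisr:.1%}")
print(f"combined savings:   ${total_savings:,} ({savings_fraction:.1%} of project cost)")
```

Under these assumptions the model gives about 63.9% and 82.5%, in line with the quoted 64% and 82%, and the two savings figures sum to $2.9 million, or 11.6% of the $25 million project, consistent with "up to 12%".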
With deep-submicron CMOS processing technologies, feature sizes are shrinking below 0.1 µm. In such technologies, static and dynamic RAM devices are operating at much lower supply voltages (e.g., 1 V) and have much smaller capacitances (e.g., a few fF) than in the past. As a result, these memory devices are very vulnerable to radiation-induced problems affecting data storage (described in Chapter 3) and low manufacturing yields (described in Chapters 5 and 6) due to even minor process variations. Therefore, a design engineer would want to learn about state-of-the-art processing and circuit techniques for RAMs that would produce fault tolerance, both at the time of manufacture (i.e., high processing yield) and during field use (i.e., high reliability). A test engineer would be interested in learning about fault diagnosis algorithms that would aid in self-repair, and circuit techniques that would produce practical self-test and self-repair solutions. These topics are described in Chapters 2 and 6. The book focuses on design issues and circuit techniques. The style of presentation is simple and is devoid of intricate details of device physics or circuit design theory. Our objective is to provide guidance to design and test engineers, manufacturers, and researchers on practical ways of implementing high-yielding and high-reliability RAM architectures, without overwhelming them with a lot of theoretical issues.
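The vulnerability of low-voltage, low-capacitance cells mentioned above can be made concrete with a rough calculation. The sketch below estimates the charge held on a storage node as Q = C·V and converts it to a number of electrons. The 2 fF value is an assumed stand-in for "a few fF", and treating C·V as the charge that must be disturbed to upset a cell is a first-order approximation, not a model taken from the book.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19   # charge of one electron, in coulombs


def stored_charge_coulombs(capacitance_f, voltage_v):
    """First-order stored charge on a node: Q = C * V."""
    return capacitance_f * voltage_v


def charge_in_electrons(charge_c):
    """Express a charge as an equivalent number of electrons."""
    return charge_c / ELEMENTARY_CHARGE_C


# Assumed operating point: 2 fF node capacitance at a 1 V supply.
q = stored_charge_coulombs(2e-15, 1.0)
print(f"stored charge: {q * 1e15:.1f} fC")
print(f"equivalent to about {charge_in_electrons(q):,.0f} electrons")
```

At this operating point the stored charge is only a few femtocoulombs, on the order of ten thousand electrons, so a comparatively small amount of radiation-induced charge collected at the node can corrupt the stored value.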
Each chapter is provided with a comprehensive set of problems designed to stimulate readers to delve into research papers that go beyond the scope of the book. A sample solution to one problem is provided in each chapter. These problems are intended to provide a reinforcing experience to the reader. Most problems are accompanied by hints in the form of pointers to published articles. Also, this book has a lot of illustrations, most of which have been borrowed from recent publications, some with modifications, for improved clarity.
This book presents a compendium of the state-of-the-art literature on diverse aspects of fault tolerance and reliability of random-access memories, spanning about 500 research papers published in the last few decades. Although considerable effort has been invested to make sure that the book is devoid of glaring errors, we do not claim infallibility. The reader is requested to report any error to either or both of us.
Kanad Chakraborty (kanadc@agere.com), Murray Hill, New Jersey
Pinaki Mazumder (mazum@eecs.umich.edu), Ann Arbor, Michigan