This book is a unique and essential reference that focuses upon the reading and comprehension of existing software code. While code reading is an important task faced by the vast majority of students, it has been virtually ignored as a discipline by existing references. The book fills this need with a practical presentation of all important code concepts, form, structure, and syntax that a student is likely to encounter. The concepts are supported by examples taken from real-world open source software projects. The focus upon reading code (rather than developing and implementing programs from scratch) provides for a vastly increased breadth of coverage.



Downloads

CD Contents

Download the complete source code base (134 MB) for the book: Code Reading: The Open Source Perspective.



Sample Content

Foreword.

Preface.

1. Introduction.

Why and How to Read Code.

Code as Literature.

Code as Exemplar.

Maintenance.

Evolution.

Reuse.

Inspections.

How to Read this Book.

Typographical Conventions.

Diagrams.

Exercises.

Supplementary Material.

Tools.

Outline.

The Great Language Debate.

2. Basic Programming Elements.

A Complete Program.

Functions and Global Variables.

While, Conditions, Blocks.

Switch.

For.

Break, Continue.

Character and Boolean Expressions.

Goto.

Refactoring in the Small.

Do, Integer Expressions.

Control Structures Revisited.

3. Advanced C Data Types.

Pointers.

Linked Data Structures.

Dynamic Allocation of Data Structures.

Call by Reference.

Data Element Access.

Arrays as Arguments and Results.

Function Pointers.

Pointer as an Alias.

Pointers and Strings.

Direct Memory Access.

Structures.

Grouping Together Data Elements.

Returning Multiple Data Elements from a Function.

Mapping the Organization of Data.

Programming in an Object-Oriented Fashion.

Unions.

Efficient Use of Storage.

Implementing Polymorphism.

Accessing Different Internal Representations.

Dynamic Memory Allocation.

Managing Free Memory.

Structures with Dynamically-Allocated Arrays.

Typedef Declarations.

4. C Data Structures.

Vector.

Matrix and Table.

Stack.

Queue.

Map.

Hash Tables.

Set.

Linked List.

Tree.

Graph.

Node Storage.

Edge Representation.

Edge Storage.

Graph Properties.

Hidden Structures.

Other Representations.

5. Advanced Control Flow.

Recursion.

Exceptions.

Parallelism.

Hardware and Software Parallelism.

Control Models.

Thread Implementations.

Signals.

Nonlocal Jumps.

Macro Substitution.

6. Tackling Large Projects.

Design and Implementation Techniques.

Project Organization.

The Build Process and Makefiles.

Configuration.

Revision Control.

Project-Specific Tools.

Testing.

7. Coding Standards and Conventions.

File Names and Organization.

Indentation.

Formatting.

Naming Conventions.

Programming Practices.

Process Standards.

8. Documentation.

Documentation Types.

Reading Documentation.

Documentation Problems.

Additional Documentation Sources.

Common Open-Source Documentation Formats.

9. Architecture.

System Structures.

Centralized Repository and Distributed Approaches.

Data-Flow.

Object-Oriented.

Layered.

Hierarchies.

Slicing.

Control Models.

Event-Driven Systems.

System Manager.

State Transition.

Element Packaging.

Module.

Namespace.

Object.

Generic Implementation.

Abstract Data Type.

Library.

Process and Filter.

Component.

Data Repository.

Architecture Reuse.

Frameworks.

Code Wizards.

Design Patterns.

Domain-Specific Architectures.

10. Code-Reading Tools.

Regular Expressions.

The Editor as a Code Browser.

Code Searching With Grep.

Locating File Differences.

Roll your Own Tool.

The Compiler as a Code-Reading Tool.

Code Browsers and Beautifiers.

Run-Time Tools.

Non-software Tools.

11. A Complete Example.

Overview.

Attack Plan.

Code Reuse.

Testing and Debugging.

Documentation.

Observations.

Appendix A. Outline of the Code Provided.

Appendix B. Source Code Credits.

Appendix C. Referenced Source Files.

Appendix D. Source Code Licenses.

BSD.

ACE.

Apache.

DemoGL.

OpenCL.

ArgoUML.

Perl.

Appendix E. Maxims for Reading Code.

Bibliography.

Index.

Author Index.

Preface

What do we ever get nowadays from reading to equal the excitement and the revelation in those first fourteen years? -- Graham Greene

The reading of code is likely to be one of the most common activities of a computing professional, yet it is seldom taught as a subject or formally used as a method for learning how to design and program. One reason for this sad situation may have been the lack of high-quality code to read. Companies often protect source code as a trade secret and rarely allow others to read, comment on, experiment with, and learn from it. In the few cases where important proprietary code was allowed out of a company's closet, it has spurred enormous interest and creative advancements. As an example, a generation of programmers benefited from John Lions' Commentary on the UNIX Operating System, which listed and annotated the complete source code of the sixth edition UNIX kernel. Although Lions' book was originally written under a grant from AT&T for use in an operating system course and was not available to the general public, copies of it circulated for years as bootleg nth-generation photocopies.

In the last few years, however, the popularity of open-source software has provided us with a large body of code that we can all freely read. Some of the most popular software systems used today, such as the Apache Web server, the Perl language, the GNU/Linux operating system, the BIND domain-name server, and the sendmail mail-transfer agent, are in fact available in open-source form. I was thus fortunate to be able to use open-source software, such as the above, to write this book as a primer and reader for software code. My goal was to provide background knowledge and techniques for reading code written by others. By using real-life examples taken out of working, open-source projects, I tried to cover most concepts related to code that are likely to appear before a software developer's eyes, including programming constructs, data types, data structures, control flow, project organization, coding standards, documentation, and architectures. A companion title to this book will cover interfacing and application-oriented code, including the issues of internationalization and portability, the elements of commonly used libraries and operating systems, low-level code, domain-specific and declarative languages, scripting languages, and mixed language systems.

This book is--as far as I know--the first one to deal exclusively with code reading as a distinct activity, one worthy in its own right. As such I am sure that there will be inevitable shortcomings, better ways some of its contents could have been treated, and important material I have missed. I firmly believe that the reading of code should be both properly taught and used as a method for improving one's programming abilities. I therefore hope that this book will spur interest in including code-reading courses, activities, and exercises in the computing education curriculum so that in a few years our students will learn from existing open-source systems, just as their peers studying a language learn from the great literature.

Supplementary Material

Many of the source code examples provided come from the source distribution of NetBSD. NetBSD is a free, highly portable UNIX-like operating system available for many platforms, from 64-bit AlphaServers to handheld devices. Its clean design and advanced features make it an excellent choice for both production and research environments. I selected NetBSD over other similarly admirable and very popular free UNIX-like systems, such as GNU/Linux, FreeBSD, and OpenBSD, because the primary goal of the NetBSD project is to emphasize correct design and well-written code, thus making it a superb choice for providing example source code. According to its developers, some systems seem to have the philosophy of "if it works, it's right" whereas NetBSD could be described as "it doesn't work unless it's right." In addition, some other NetBSD goals fitted particularly well with the objectives of this book. Specifically, the NetBSD project avoids encumbering licenses, provides a portable system running on many hardware platforms, interoperates well with other systems, and conforms to open systems standards as much as is practical. The code used in this book is a (now historic) export-19980407 snapshot. A few examples refer to errors I found in the code; as the NetBSD code continuously evolves, presenting examples from a more recent version would mean risking that those realistic gems would have been corrected.

I chose the rest of the systems I used in the book's examples for similar reasons: code quality, structure, design, utility, popularity, and a license that would not make my publisher nervous. I strived to balance the selection of languages, actively looking for suitable Java and C++ code. However, where similar concepts could be demonstrated using different languages I chose to use C as the least common denominator.

I sometimes used real code examples to illustrate unsafe, non-portable, unreadable, or otherwise condemnable coding practices. I appreciate that I can be accused of disparaging code that was contributed by its authors in good faith to further the open-source movement and to be improved upon rather than be merely criticized. I sincerely apologize in advance if my comments cause any offence to a source code author. In defense I argue that in most cases the comments do not target the particular code excerpt, but rather use it to illustrate a practice that should be avoided. Often the code I am using as a counterexample is a lame duck, as it was written at a time when technological and other restrictions justified the particular coding practice, or the particular practice is criticized out of context. In any case, I hope that the comments will be received good-humouredly, and openly admit that my own code contains similar, and probably worse, misdeeds.

Foreword

We're programmers. Our job (and in many cases our passion) is to make things happen by writing code. We don't meet our users' requirements with acres of diagrams, with detailed project schedules, with four-foot-high piles of design documentation. These are all wishes--expressions of what we'd like to be true. No, we deliver by writing code: code is reality.

So that's what we're taught. Seems reasonable. Our job is to write code, so we need to learn how to write code. College courses teach us to write programs. Training courses tell us how to code to new libraries and APIs. And that's one of the biggest tragedies in the industry.

Because the way to learn to write great code is by reading code. Lots of code. High-quality code, low-quality code. Code in assembler, code in Haskell. Code written by strangers ten thousand miles away, and code written by ourselves last week. Because unless we do that, we're continually reinventing what has already been done, repeating both the successes and mistakes of the past.

I wonder how many great novelists have never read someone else's work, how many great painters never studied another's brush strokes, how many skilled surgeons never learned by looking over a colleague's shoulder, how many 767 captains didn't first spend time in the copilot's seat watching how it's really done.

And yet that's what we expect programmers to do. "This week's assignment is to write ..." We teach developers the rules of syntax and construction, and then we expect them to be able to write the software equivalent of a great novel.

The irony is that there's never been a better time to read code. Thanks to the huge contributions of the open-source community, we now have gigabytes of source code floating around the 'net just waiting to be read. Choose any language, and you'll be able to find source code. Select a problem domain, and there'll be source code. Pick a level, from microcode up to high-level business functions, and you'll be able to look at a wide body of source code.

Code reading is fun. I love to read others' code. I read it to learn tricks and to study traps. Sometimes I come across small but precious gems. I still remember the pleasure I got when I came across a binary-to-octal conversion routine in PDP-11 assembler that managed to output the six octal digits in a tight loop with no loop counter.

I sometimes read code for the narrative, like a book you'd pick up at an airport before a long flight. I expect to be entertained by clever plotting and unexpected symmetries. James Clark's gpic program (part of his GNU groff package) is a wonderful example of this kind of code. It implements something that's apparently very complex (a declarative, device-independent picture-drawing language) in a compact and elegant structure. I came away feeling inspired to try to structure my own code as tidily.

Sometimes I read code more critically. This is slower going. While I'm reading, I'm asking myself questions such as "Why is this written this way?" or "What in the author's background would lead her to this choice?" Often I'm in this mode because I'm reviewing code for problems. I'm looking for patterns and clues that might give me pointers. If I see that the author failed to take a lock on a shared data structure in one part of the code, I might suspect that the same might hold elsewhere and then wonder if that mistake could account for the problem I'm seeing. I also use the incongruities I find as a double check on my understanding; often I find what I think is a problem, but on closer examination it turns out to be perfectly good code. Thus I learn something.

In fact, code reading is one of the most effective ways to eliminate problems in programs. Robert Glass, one of this book's reviewers, says, "by using (code) inspections properly, more than 90 percent of the errors can be removed from a software product before its first test." In the same article he cites research that shows "Code-focused inspectors were finding 90 percent more errors than process-focused inspectors." Interestingly, while reading the code snippets quoted in this book I came across a couple of bugs and a couple of dubious coding practices. These are problems in code that's running at tens of thousands of sites worldwide. None were critical in nature, but the exercise shows that there's always room to improve the code we write. Code-reading skills clearly have a great practical benefit, something you already know if you've ever been in a code review with folks who clearly don't know how to read code.

And then there's maintenance, the ugly cousin of software development. There are no accurate statistics, but most researchers agree that more than half of the time we spend on software is used looking at existing code: adding new functionality, fixing bugs, integrating it into new environments, and so on. Code-reading skills are crucial. There's a bug in a 100,000-line program, and you've got an hour to find it. How do you start? How do you know what you're looking at? And how can you assess the impact of a change you're thinking of making?

For all these reasons, and many more, I like this book. At its heart it is pragmatic. Rather than taking an abstract, academic approach, it instead focuses on the code itself. It analyzes hundreds of code fragments, pointing out tricks, traps and (as importantly) idioms. It talks about code in its environment and discusses how that environment affects the code. It highlights the important tools of the code reader's trade, from common tools such as grep and find to the more exotic. And it stresses the importance of tool building: write code to help you read code. And, being pragmatic, it comes with all the code it discusses, conveniently cross-referenced on a CD-ROM.

This book should be included in every programming course and should be on every developer's bookshelf. If as a community we pay more attention to the art of code reading we'll save ourselves both time and pain. We'll save our industry money. And we'll have more fun while we're doing it.

Dave Thomas
The Pragmatic Programmers, LLC
http://www.pragmaticprogrammer.com
