- Part I: Introduction
- Part II: Software
- Part III: Intellectual Property
- Part IV: Source Code Differentiation
- Part V: Source Code Correlation
- Part VI: Object and Source/Object Code Correlation
- Part VII: Source Code Cross-Correlation
- Part VIII: Detecting Software IP Theft and Infringement
- Part IX: Miscellaneous Topics
- Part X: Past, Present, and Future
Part V: Source Code Correlation
This part starts by exploring the various methods and algorithms for "software plagiarism detection" that have been developed over the last few decades. I describe the origins of these methods and algorithms, and I explain their limitations. In particular, there have been no standard definitions and no supporting theory for this work, so I introduce the theory of source code correlation and definitions for characterizing source code. This characterization of software source code is practical for determining correlation and, ultimately, for determining whether copying occurred. While the theory and definitions are broad enough to be useful in various areas of computer science, they are particularly valuable in litigation.
In this part, I also describe practical implementations of the theory for programmers who want to understand how to implement the algorithms, as well as applications of the theory in the real world. This part is highly mathematical, though the chapter on source code characterization will be useful for lawyers in understanding how elements of software source code can be categorized, how these elements relate to one another, and how they can affect a software copyright infringement, software trade secret, or software patent case.