What is a software catalog, and why should we have one?
Oliver Goldman explains how a software catalog can improve efficiency, defines four requirements for creating one, and shares a popular open-source tool you may be able to use right out of the box.
Modern software systems are complex. They often contain millions of lines of code and hundreds of applications and services. It's a lot to keep track of, so many teams don't. But when you don't keep track of your software portfolio, you may find yourself wastefully re-creating what you've already built, missing the opportunity for reuse. Such "needless re-creation" is both a drag on development velocity and a waste of limited resources.
You can identify and promote opportunities for reuse by maintaining a software catalog. A software catalog is just what it sounds like: a list of each service, application, library, and other component in your software portfolio. With a software catalog in hand, you can search for the capability you need and quickly see whether it already exists.
Generally, creating a software catalog requires four things:
- A system model that describes the types of things in the catalog, and their relationships. A comprehensive model might include applications, services, libraries, and more. A simpler model might focus on, say, just services. The appropriate scope depends on what's in your portfolio.
- A metadata standard to define the metadata collected for each entry. Basic metadata, such as a name for the entry, the name of the contact, and the code repository, can apply to every entry. Other metadata, such as programming language or API endpoints, might vary based on the entry's type.
- An entry for each item that you want to catalog. The format of these entries will depend on your repository, but could, for example, be a YAML file that's checked in along with the component's source code.
- A repository that stores the entries and supports browsing and search capabilities. Your repository will define how entries are authored, added, updated, and so on.
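To make the four elements concrete, here is a minimal Python sketch of how they might fit together. It is an illustration under stated assumptions, not how any particular tool works: the `Entry` and `Catalog` classes, the field names, the `description` key, and the example.com repository URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One catalog entry. Field names are illustrative, not a standard."""
    name: str
    kind: str        # system model: e.g. "service", "library", "application"
    contact: str     # basic metadata that applies to every entry
    repository: str  # basic metadata that applies to every entry
    extra: dict = field(default_factory=dict)  # metadata that varies by type


class Catalog:
    """A toy in-memory repository that supports adding and searching entries."""

    def __init__(self):
        self._entries = {}

    def add(self, entry):
        self._entries[entry.name] = entry

    def search(self, keyword):
        """Return entries whose name or description contains the keyword."""
        keyword = keyword.lower()
        return [
            e for e in self._entries.values()
            if keyword in e.name.lower()
            or keyword in str(e.extra.get("description", "")).lower()
        ]


# Hypothetical entries, for illustration only.
catalog = Catalog()
catalog.add(Entry("billing-service", "service", "payments-team",
                  "https://example.com/git/billing-service",
                  {"language": "Java",
                   "description": "Creates and sends invoices"}))
catalog.add(Entry("date-utils", "library", "platform-team",
                  "https://example.com/git/date-utils",
                  {"language": "Python",
                   "description": "Date parsing helpers"}))

# "Do we already have a component that handles invoices?"
matches = [e.name for e in catalog.search("invoice")]
print(matches)
```

A real repository would persist entries and offer richer search, but even this small shape shows how the system model (`kind`), the metadata standard (the fields), the entries, and the repository relate to one another.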
A good system model will capture different types of relationships, the most important being "part of" and "depends on". The "part of" relationships form a logical hierarchy of entries that typically corresponds to the structure of the software itself. For example, consider a code library that is compiled into a service; the entry for the library will have a "part of" relationship to the entry for the service.
Whereas the "part of" relationship captures a build-time relationship, "depends on" captures runtime dependencies. Examples include applications that depend on services they invoke, and services that depend on data stores they use for persistence. When you're considering reusing an existing component, understanding and evaluating its dependencies is just as important as knowing what it does.
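One way to picture these relationships is as labeled edges between entries. The Python sketch below is a rough illustration only: the triple format, the helper names `build_index` and `transitive_dependencies`, and every entry name are assumptions made for this example. It collects everything a component depends on at runtime, directly or indirectly, which is the kind of information you would want before reusing it.

```python
from collections import defaultdict

# (source, relationship, target) triples. All entry names are hypothetical.
RELATIONSHIPS = [
    ("date-utils", "part of", "billing-service"),
    ("web-app", "depends on", "billing-service"),
    ("billing-service", "depends on", "invoice-db"),
    ("billing-service", "depends on", "email-service"),
    ("email-service", "depends on", "smtp-relay"),
]


def build_index(relationships, kind):
    """Map each source entry to the set of targets for one relationship type."""
    index = defaultdict(set)
    for source, rel, target in relationships:
        if rel == kind:
            index[source].add(target)
    return index


def transitive_dependencies(name, depends_on):
    """Collect every entry reachable from `name` via "depends on" edges."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for dep in depends_on.get(current, ()):
            if dep not in seen:  # also guards against dependency cycles
                seen.add(dep)
                stack.append(dep)
    return seen


depends_on = build_index(RELATIONSHIPS, "depends on")
deps = sorted(transitive_dependencies("billing-service", depends_on))
print(deps)  # ['email-service', 'invoice-db', 'smtp-relay']
```

The same index built with `"part of"` instead of `"depends on"` would give the build-time hierarchy, so both relationship types can share one representation.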
While putting all four elements in place might sound like a lot of effort, there's no need to start from scratch. Backstage (https://backstage.io) is a popular open-source software catalog implementation that includes all four elements above and is a viable "out of the box" solution for many projects. Teams can also use more generic content repositories, such as Confluence or SharePoint, to manage a software catalog.
The core use case for a software catalog is to answer a simple question: Do we already have a component that does [x]? But once you've built a software catalog, you can use it to gain additional insight into your portfolio. For example, if you record metadata about which programming languages are used by each cataloged item, you can easily report on the use and adoption of those languages throughout your portfolio. Likewise, if you record the data store used by each service (DynamoDB, MySQL, and so on), you can report on the adoption of those data stores by name or type.
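Reports like these can be little more than a count over the recorded metadata. The following Python sketch assumes entries are available as plain dictionaries; the `language` and `data_store` keys, the `adoption` helper, and the sample entries are hypothetical and exist only to illustrate the idea.

```python
from collections import Counter

# Hypothetical catalog entries; only some of them record a data store.
ENTRIES = [
    {"name": "billing-service", "language": "Java", "data_store": "MySQL"},
    {"name": "email-service", "language": "Python", "data_store": "DynamoDB"},
    {"name": "search-service", "language": "Java", "data_store": "DynamoDB"},
    {"name": "date-utils", "language": "Python"},
    {"name": "web-app", "language": "TypeScript"},
]


def adoption(entries, key):
    """Count how many entries record each value for one metadata key."""
    return Counter(e[key] for e in entries if key in e)


languages = adoption(ENTRIES, "language")
data_stores = adoption(ENTRIES, "data_store")

print(languages.most_common())    # languages ranked by number of entries
print(data_stores.most_common())  # data stores ranked by number of entries
```

Entries that lack a given key are skipped rather than counted, so the report reflects only what the catalog actually records.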
Most software development organizations strive to produce more with less. In the pursuit of velocity, teams won't always take the time to evaluate whether the capability they need already exists and, as a result, will sometimes needlessly re-create software that's already part of their portfolio. While establishing a software catalog takes time and effort, it can pay for itself by identifying and facilitating reuse. As an added benefit, it gives the team greater insight into the nature and shape of their software portfolio.