Platforms
A platform, in the context of model-driven DevOps, is a consolidation and simplification point for the devices in a network. Without platforms, you would need to configure each device individually, making IT much more cumbersome and time-consuming. The best examples of a platform are cloud infrastructure providers like AWS, Azure, and Google Cloud. They created platforms that provide abstracted services, such as compute, storage, or networking, to an IT organization. No longer does an organization need to think about acquiring hardware, configuring that hardware into a system to support applications, and maintaining that hardware over time. This work is all abstracted as a service and provided via API. For example, a customer does not care what types of nodes are used for their compute service or how to configure them; that customer just wants the service to work. Platforms come in different forms with different capabilities, but they all aim to simplify IT and generally contain many of the attributes we describe in the remainder of this chapter.
Physical Hardware Provisioning
The idea of a platform also extends to the physical, on-premises network. In the cloud, you don’t have to know or care about physical hardware; with on-premises infrastructure, hardware is a significant concern. To ease the deployment and provisioning of hardware, many platforms support technologies such as “plug and play” or “zero-touch provisioning.” This process is commonly known as Day 0 provisioning: a minimal configuration is automatically applied to a piece of physical hardware on bootup so that it can communicate with the platform to get a more complete, or Day 1, configuration. This book focuses mainly on Day 1 configuration and Day 2 operations, but it is useful to know that most platforms also ease the Day 0 provisioning of physical hardware.
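The Day 0 to Day 1 handoff can be pictured with a minimal sketch. Everything here is hypothetical and chosen only for illustration: the serial numbers, the `platform.example.net` address, the configuration keys, and the function names. Real zero-touch provisioning relies on vendor-specific mechanisms that this sketch does not model.

```python
from typing import Optional

# Hypothetical inventory held by the platform: serial number -> Day 1 config.
DAY1_CONFIGS = {
    "SN-1001": {"hostname": "branch-rtr1", "vlans": [10, 20]},
}


def day0_config(serial: str) -> dict:
    """Minimal bootstrap config: just enough to reach the platform."""
    return {
        "serial": serial,
        "mgmt_address": "dhcp",
        "platform_url": "https://platform.example.net",  # placeholder address
    }


def fetch_day1_config(day0: dict) -> Optional[dict]:
    """Platform side: look up the full config for the device that called in."""
    return DAY1_CONFIGS.get(day0["serial"])


def boot(serial: str) -> dict:
    """Device side: start from Day 0, then layer on Day 1 if the platform has it."""
    config = day0_config(serial)
    day1 = fetch_day1_config(config)
    if day1 is not None:
        config.update(day1)  # the Day 1 config extends the bootstrap config
    return config


known_device = boot("SN-1001")    # receives its hostname and VLANs
unknown_device = boot("SN-9999")  # stays at the minimal Day 0 config
```

A device the platform does not recognize simply remains at its bootstrap configuration, which is one reasonable way to handle unclaimed hardware in a sketch like this.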
Consolidated Control Point
Whether it is cloud or on-premises, the main benefit of a platform is the consolidated control point. The idea of a consolidated control point in a network came to prominence with the advent of software-defined networking (SDN). SDN decouples the control plane (the part that decides where to send packets) of a network from the data plane (the part that does the forwarding of the packet). The devices in a pure SDN network have little to no ability to operate autonomously, rendering them inoperable without a central controller. Over time, however, the industry largely moved away from this pure approach and settled on devices that can operate either autonomously or as part of a controller-based fabric. A more pragmatic approach to SDN evolved whereby devices in the network operate as part of a distributed control plane with a centralized controller managing network configuration and policy. This approach took the best part of pure SDN (a consolidated control point) and married it with the best parts of distributed control planes (scale and resiliency). In the context of networking, this more pragmatic approach to an SDN controller is what we refer to as a platform.
Northbound vs. Southbound APIs
In the IT infrastructure space, it is often useful to think of platform APIs as either northbound or southbound. A typical controller platform exposes a “northbound” API that is intended to provide functionality for other applications. A good example of such an application is the controller’s own UI, which often uses this same northbound API to retrieve data and make configuration changes. When a request is received via the northbound API, and the controller software determines that it needs to make changes to one or more devices, it uses a “southbound” API to talk with the various devices. These southbound APIs are specific to whatever interface the devices in the network expose, and different device vendors often have different APIs for their devices. As shown in Figure 3-3, a controller platform can consolidate many disparate vendor or device APIs into a single, unified, northbound API.
FIGURE 3-3 Northbound vs. Southbound APIs
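The split between northbound and southbound APIs can be sketched as a controller that exposes one call and delegates to per-vendor drivers. This is an illustrative sketch only: the class names, the two vendors, and the request formats are all assumptions, and no real controller or device API is being reproduced.

```python
from abc import ABC, abstractmethod


class SouthboundDriver(ABC):
    """Hypothetical adapter for one vendor's device API."""

    @abstractmethod
    def set_hostname(self, device: str, hostname: str) -> str:
        """Return the vendor-specific request that would be sent to the device."""


class VendorACliDriver(SouthboundDriver):
    # Assumption: vendor A devices are configured with CLI-style commands.
    def set_hostname(self, device: str, hostname: str) -> str:
        return f"{device}: hostname {hostname}"


class VendorBRestDriver(SouthboundDriver):
    # Assumption: vendor B devices expose a REST-style API.
    def set_hostname(self, device: str, hostname: str) -> str:
        return f'{device}: PATCH /system {{"name": "{hostname}"}}'


class Controller:
    """Exposes one northbound call; picks the southbound driver per device."""

    def __init__(self) -> None:
        self._drivers = {}  # device name -> SouthboundDriver

    def register(self, device: str, driver: SouthboundDriver) -> None:
        self._drivers[device] = driver

    def set_hostname(self, device: str, hostname: str) -> str:
        # Northbound entry point: callers never see vendor differences.
        return self._drivers[device].set_hostname(device, hostname)


controller = Controller()
controller.register("edge1", VendorACliDriver())
controller.register("edge2", VendorBRestDriver())

# The caller uses the same northbound call for both devices.
requests_sent = [
    controller.set_hostname("edge1", "nyc-edge1"),
    controller.set_hostname("edge2", "nyc-edge2"),
]
for request in requests_sent:
    print(request)
```

Supporting another vendor in this design means adding one more driver class; callers of the northbound API do not change.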
API and Feature Normalization
One important role that a platform can play is to normalize the API across a set of dissimilar devices. From our previous example, the platform would perform any data transformations internally and transparently while presenting a single API to the user (see Figure 3-4).
FIGURE 3-4 Platform API Normalization
This normalization greatly reduces the complexity of automating the network by allowing the automation tooling to work on a single data model against a single API. Without this normalization, the tooling would have to do the data model conversion and then call the correct API for each type of device available on the network. Recall that when Bob from ACME Corp realized that each type of device on the network had a different command line, requiring its own custom code, it seemed unmanageable. A platform with the ability to normalize to a single data model with a single API solves this issue.
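To make the data model conversion concrete, here is a minimal sketch in which one vendor-neutral interface description is rendered into two device-specific forms. The schema, the vendor names, the CLI syntax, and the JSON field names are all invented for illustration and do not correspond to any particular product.

```python
import json

# One normalized, vendor-neutral data model (hypothetical schema).
interface = {"name": "uplink1", "description": "to core", "enabled": True}


def to_vendor_a_cli(intf: dict) -> list:
    """Render the normalized model as CLI lines (assumed vendor A syntax)."""
    lines = [f"interface {intf['name']}", f" description {intf['description']}"]
    lines.append(" no shutdown" if intf["enabled"] else " shutdown")
    return lines


def to_vendor_b_json(intf: dict) -> str:
    """Render the same model as a JSON body (assumed vendor B schema)."""
    body = {
        "ifName": intf["name"],
        "ifDescr": intf["description"],
        "adminStatus": "up" if intf["enabled"] else "down",
    }
    return json.dumps(body, sort_keys=True)


# The platform chooses the translator; the automation tooling never does.
TRANSLATORS = {"vendor_a": to_vendor_a_cli, "vendor_b": to_vendor_b_json}


def render(device_type: str, intf: dict):
    """Convert the normalized model into the form one device type expects."""
    return TRANSLATORS[device_type](intf)


cli_config = render("vendor_a", interface)
json_config = render("vendor_b", interface)
```

Without a platform, the equivalent of `TRANSLATORS` and every translator function would have to live in the automation tooling itself, which is the situation Bob faced.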
Platforms further help by normalizing features across many dissimilar devices of varying capabilities. For example, many network devices cannot roll back configuration changes to a previous state. If the platform stores state, it can track changes as they occur and return a device to the configuration it had before a change was made. Storing state in the platform also makes it possible to compare what the state of a device should be with what it actually is, which reveals out-of-band changes. If the device is out of sync with the platform, then the local change can either be adopted or overridden.
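A minimal sketch of platform-held state, assuming device configuration can be represented as a flat dictionary, shows how rollback and out-of-sync detection follow from storing it. The `PlatformState` class and its method names are hypothetical.

```python
import copy


class PlatformState:
    """Keeps the intended config and a history of prior versions per device."""

    def __init__(self) -> None:
        self._intended = {}  # device -> current intended config
        self._history = {}   # device -> list of earlier intended configs

    def apply(self, device: str, config: dict) -> None:
        # Save the previous intended config before replacing it.
        if device in self._intended:
            self._history.setdefault(device, []).append(self._intended[device])
        self._intended[device] = copy.deepcopy(config)

    def rollback(self, device: str) -> dict:
        # Restore the most recent saved version; raises if none exists.
        self._intended[device] = self._history[device].pop()
        return copy.deepcopy(self._intended[device])

    def drift(self, device: str, running: dict) -> dict:
        """Return {key: (intended, running)} for every key that differs."""
        intended = self._intended[device]
        keys = intended.keys() | running.keys()
        return {
            k: (intended.get(k), running.get(k))
            for k in sorted(keys)
            if intended.get(k) != running.get(k)
        }


state = PlatformState()
state.apply("sw1", {"hostname": "sw1", "ntp": "10.0.0.1"})
state.apply("sw1", {"hostname": "sw1", "ntp": "10.0.0.2"})

# Roll back to the configuration that existed before the second change.
restored = state.rollback("sw1")

# Someone added an SNMP community directly on the device (out of band).
running_config = {"hostname": "sw1", "ntp": "10.0.0.1", "snmp": "public"}
differences = state.drift("sw1", running_config)
```

In this sketch, adopting the out-of-band change would amount to calling `apply` with the running configuration, while overriding it would mean pushing the intended configuration back to the device.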
Fabricwide Services
In addition to normalization, a platform can provide fabricwide services to the network. One common fabricwide service is Ethernet Virtual Private Network (EVPN). EVPN is used to extend Ethernet Layer 2 services across a large campus or between sites over a Layer 3 routed network. It is considered a fabric technology because it relies on a central control plane based on BGP to distribute MAC addresses and other information that enables connectivity between end nodes. Without the control plane, the fabric does not function even though the individual boxes can function autonomously.
A platform can provide both a fabricwide view of the network and the services necessary to run that fabric (see Figure 3-5). These capabilities result in a substantial simplification of the network and enable services that would not be possible without this central function.
FIGURE 3-5 Fabricwide Services
Scalability
Platforms also enable a greater scale in automating networks. Without a platform, the control node used for automation needs to communicate directly with each device instead of being able to optimize communications by taking advantage of state in the platform (see Figure 3-6). As an example, let’s look at the way that tools like Ansible make changes to a device. Ansible’s goal is to get the devices to a desired end state. To do that, it needs to check the current state of the device, compare that state to the desired end state, and then send the changes to the device. This operation doubles the communication between the control node and the end devices. To further complicate the issue, many operations work only on discrete parts of the configuration, meaning multiple operations need to occur to make one change.
FIGURE 3-6 Scaling Control Communications
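The check, compare, and change cycle described above can be sketched as follows. This is a generic illustration of desired-state enforcement, not Ansible’s actual implementation; the `Device` class is a stand-in that only counts the requests it receives.

```python
class Device:
    """Stand-in for a network device; counts every request it receives."""

    def __init__(self, config: dict) -> None:
        self.config = dict(config)
        self.requests = 0

    def get(self, key: str):
        self.requests += 1
        return self.config.get(key)

    def set(self, key: str, value) -> None:
        self.requests += 1
        self.config[key] = value


def ensure(device: Device, key: str, desired) -> bool:
    """Check, compare, and change only if needed. Returns True if changed."""
    if device.get(key) == desired:  # check the device, then compare
        return False
    device.set(key, desired)        # send the change
    return True


device = Device({"hostname": "old-name", "ntp": "10.0.0.1"})
desired_state = {
    "hostname": "sw1",
    "ntp": "10.0.0.1",
    "banner": "authorized use only",
}

# Each discrete part of the configuration is handled by its own operation.
changed = [key for key, value in desired_state.items() if ensure(device, key, value)]

print(changed, device.requests)
```

Two settings actually change here, yet the device receives five requests: three reads and two writes. Every setting costs a read, and every changed setting costs a read plus a write.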
When we introduce a platform, the communication between the platform and the device can be reduced to a minimized set of consolidated changes (see Figure 3-7).
FIGURE 3-7 Better Scale Through Platforms
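A minimal sketch of this reduction, assuming the platform keeps a synchronized copy of each device’s configuration: the comparison happens against stored state, and only one consolidated change is sent to the device. The `Platform` and `Device` classes are hypothetical, and the sketch assumes no out-of-band changes occur after onboarding.

```python
class Device:
    """Stand-in for a network device; counts every request it receives."""

    def __init__(self, config: dict) -> None:
        self.config = dict(config)
        self.requests = 0

    def apply_changes(self, changes: dict) -> None:
        self.requests += 1
        self.config.update(changes)


class Platform:
    """Holds a copy of each device's config so comparisons happen locally."""

    def __init__(self) -> None:
        self._devices = {}
        self._known = {}

    def onboard(self, name: str, device: Device) -> None:
        self._devices[name] = device
        # Assumption: state is synchronized once, when the device is onboarded.
        # For simplicity, this initial sync is not counted as a request.
        self._known[name] = dict(device.config)

    def ensure(self, name: str, desired: dict) -> dict:
        known = self._known[name]
        # Compare against stored state; no request to the device is needed.
        changes = {k: v for k, v in desired.items() if known.get(k) != v}
        if changes:
            self._devices[name].apply_changes(changes)  # one consolidated push
            known.update(changes)
        return changes


device = Device({"hostname": "old-name", "ntp": "10.0.0.1"})
platform = Platform()
platform.onboard("sw1", device)

desired_state = {
    "hostname": "sw1",
    "ntp": "10.0.0.1",
    "banner": "authorized use only",
}
pushed = platform.ensure("sw1", desired_state)  # one request carrying two changes
repeat = platform.ensure("sw1", desired_state)  # nothing to do, no request sent
```

The control node talks only to the platform, and the device sees a single request regardless of how many individual settings changed.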
Figure 3-8 illustrates how this architecture can also scale geographically. For geographically dispersed networks, intermediary platforms placed in each region can provide regional aggregation and other control plane services.
FIGURE 3-8 Scaling Platforms Geographically