4.4 Amazon's Rules for Teams
As we mentioned in Chapter 1, Amazon has a rule that no team should be larger than can be fed with two pizzas. In the early years of this century, the company adopted an internal microservice architecture. With that adoption came a list of rules about how services are to be used:
- All teams will henceforth expose their data and functionality through service interfaces.
- Teams must communicate with each other through these interfaces.
- There will be no other form of inter-service/team communication allowed: no direct linking, no direct reads of another team's datastore, no shared-memory model, no backdoors whatsoever. The only communication allowed is via service interface calls over the network.
- It doesn't matter what technology they [other services] use.
- All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world.
Each team produces some number of services. Every service is totally encapsulated except for its public interface. If another team wishes to use a service, it must discover the interface. The documentation for the interface must include enough semantic information for the user of a service to determine the appropriate definitions of items such as "customer" or "address," because these concepts can have different meanings in different portions of an organization. The semantic information about an interface can be kept in the registry/load balancer that we described earlier, assuming that the semantic information is machine-interpretable.
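As an illustration of keeping semantic information alongside an interface, the following is a minimal sketch of a registry entry, not a description of Amazon's actual registry. Every name in it (FieldDefinition, ServiceEntry, Registry, the team names, endpoints, and the field meanings) is hypothetical and chosen only to show how two services might attach different meanings to the same term.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class FieldDefinition:
    """Machine-readable meaning of one item exposed by an interface."""
    name: str
    type: str
    meaning: str


@dataclass
class ServiceEntry:
    """What a registry might store for one service (hypothetical layout)."""
    name: str
    owner_team: str
    endpoint: str
    fields: Dict[str, FieldDefinition] = field(default_factory=dict)


class Registry:
    """In-memory stand-in for the registry discussed in the text."""

    def __init__(self) -> None:
        self._entries: Dict[str, ServiceEntry] = {}

    def register(self, entry: ServiceEntry) -> None:
        self._entries[entry.name] = entry

    def discover(self, service_name: str) -> ServiceEntry:
        # Raises KeyError if the service has not been registered.
        return self._entries[service_name]

    def meaning_of(self, service_name: str, field_name: str) -> str:
        return self.discover(service_name).fields[field_name].meaning


registry = Registry()

# Two hypothetical services that both expose something called "customer".
registry.register(ServiceEntry(
    name="billing",
    owner_team="payments",
    endpoint="https://billing.example.internal/v1",
    fields={"customer": FieldDefinition(
        "customer", "string", "the party responsible for paying an invoice")},
))
registry.register(ServiceEntry(
    name="shipping",
    owner_team="logistics",
    endpoint="https://shipping.example.internal/v1",
    fields={"customer": FieldDefinition(
        "customer", "string", "the person who receives the parcel")},
))

for service in ("billing", "shipping"):
    print(service, "->", registry.meaning_of(service, "customer"))
```

A team consulting such a registry can see, before writing any client code, that the two services do not mean the same thing by "customer."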
Because every service is potentially externally available, whether to offer a service globally or keep it local becomes a business decision, not a technical one. External services can be hidden behind an application programming interface (API) bound through a library, and so this requirement does not prejudge the technology used for the interface.
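The idea of hiding a service behind an API bound through a library can be sketched as follows. This is an illustrative assumption rather than a known Amazon design: AddressClient, the "lookup_address" operation, and the fake transport are all hypothetical. The point is only that callers depend on the library's methods, while the technology carrying the call is supplied separately and can change.

```python
import json
from typing import Callable, Dict

# A transport takes an operation name and parameters and returns JSON text.
# In a real deployment it would make a network call; which protocol it uses
# is invisible to code that calls the client library.
Transport = Callable[[str, Dict[str, str]], str]


class AddressClient:
    """Hypothetical client library that wraps a service interface."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def lookup(self, customer_id: str) -> Dict[str, str]:
        raw = self._transport("lookup_address", {"customer_id": customer_id})
        return json.loads(raw)


def fake_transport(operation: str, params: Dict[str, str]) -> str:
    """Stand-in for the network, used here only so the sketch runs offline."""
    if operation != "lookup_address":
        raise ValueError(f"unknown operation: {operation}")
    return json.dumps({"customer_id": params["customer_id"], "city": "Seattle"})


client = AddressClient(fake_transport)
result = client.lookup("c-42")
print(result["city"])
```

Replacing fake_transport with a function that issues a real request would not change any code that calls AddressClient.lookup.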
A consequence of these rules is that Amazon has an extensive collection of services. A web page from their sales business makes use of over 150 services. Scalability is managed by each service individually and is included in its SLA in the form of a guaranteed response time given a particular load. In other words, the contract states what the service promises at specified demand levels. The SLA binds both the client side and the service side. If the client's demand exceeds the load promised in the SLA, then slow response times become the client's problem, not the service's.
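The two-sided nature of such an SLA can be made concrete with a small sketch. The Sla type, its field names, and the numbers below are hypothetical; the text does not say how Amazon encodes or enforces its SLAs. The sketch only captures the rule stated above: a slow response is the service's problem if the client stayed within the agreed load, and the client's problem otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sla:
    """Hypothetical SLA: a response-time guarantee valid up to a given load."""
    max_requests_per_second: float
    max_response_ms: float


def responsible_party(sla: Sla, observed_rps: float, observed_ms: float) -> str:
    """Return who owns a slow response: 'none', 'client', or 'service'."""
    if observed_ms <= sla.max_response_ms:
        return "none"      # the guarantee was met, so nobody is at fault
    if observed_rps > sla.max_requests_per_second:
        return "client"    # the client exceeded the load covered by the SLA
    return "service"       # load was within bounds, yet the response was slow


sla = Sla(max_requests_per_second=100.0, max_response_ms=200.0)

print(responsible_party(sla, observed_rps=80.0, observed_ms=150.0))
print(responsible_party(sla, observed_rps=80.0, observed_ms=350.0))
print(responsible_party(sla, observed_rps=250.0, observed_ms=350.0))
```

The three calls cover a met guarantee, a breach within the agreed load, and a slow response caused by excess demand.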