- Introduction
- Why Do We Need Naming and Addressing?
- How the Problem Arose
- Background on Naming and Addressing
- Conclusions
How the Problem Arose
Naming and addressing had never been a major concern in data communications. The networks were sufficiently simple and of sufficiently limited scope that it wasn't a problem. Most early networks were point-to-point or multidrop lines, for which addressing can be done by simple enumeration. Even for large SNA networks, it was not really an issue. Because SNA is hierarchical, with only a single path from the leaves (terminals) to the root (mainframe), enumerating the leaves of the hierarchy (tree) again suffices.2 In fact, addressing in a decentralized network with multiple paths, like the early ARPANET or even the early Internet, can be accommodated by enumeration, and it was. But everyone knew the addressing problem was lurking out there and would eventually have to be dealt with.
The ARPANET was a research project that wasn't expected by many to succeed. No one expected the ARPANET to ever be large enough for addressing to be a major problem, so why worry about an esoteric problem for which, at the time, we had no answers? As it was, there were an overwhelming number of major technical problems to solve that were far more crucial. Just being able to route packets, let alone do useful work with the network, would be a major achievement. After all, it was research. It was more important to focus on the few specific problems that were central to making the project work. Addressing was distinctly a lesser issue. Of course, to everyone's surprise, the ARPANET was almost immediately useful.
Because the initial design called for no more than a few tens of switches connecting a few hosts each, addressing could be kept simple. Consequently, there were only 8 bits of address on the Interface Message Processors (IMPs). A host address was the IMP number (6 bits) and the IMP port number (2 bits). Each IMP could have a maximum of 4 hosts attached (and four 56-Kbps trunks). IMP numbers were assigned sequentially as the IMPs were deployed.
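To make the arithmetic concrete, here is a minimal sketch (in Python, purely illustrative and not actual IMP code) of how a 6-bit IMP number and a 2-bit port number fit into a single 8-bit host address. The field sizes come from the text; the particular bit layout is an assumption made for illustration.

```python
# Illustrative sketch of the original ARPANET host address: 6 bits of IMP
# number plus 2 bits of IMP port number in one 8-bit field.
# The bit ordering here is assumed, not taken from the 1822 specification.

def pack_host_address(imp_number: int, imp_port: int) -> int:
    """Combine an IMP number (0-63) and an IMP port (0-3) into an 8-bit address."""
    if not 0 <= imp_number < 64:
        raise ValueError("only 6 bits were available for the IMP number")
    if not 0 <= imp_port < 4:
        raise ValueError("only 2 bits for the port: at most 4 hosts per IMP")
    return (imp_number << 2) | imp_port

def unpack_host_address(address: int) -> tuple[int, int]:
    """Recover (IMP number, IMP port) from an 8-bit host address."""
    return address >> 2, address & 0b11

# A host on port 2 of IMP 17:
addr = pack_host_address(17, 2)
assert unpack_host_address(addr) == (17, 2)
```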
Although a maximum of 64 IMPs might seem a severe limitation, it seemed like more than enough for a research network. There was not much reason for concern about addressing. Once the success of the ARPANET was accepted, the address size in NCP was expanded in the late 1970s to 16 bits to accommodate the growth of the network. (The Network Control Program implemented the Host-to-Host Protocol, the early ARPANET equivalent of TCP/IP.)
It was clear that the one aspect of naming and addressing that would be needed was some sort of directory. ARPA was under a lot of pressure to demonstrate that the network could do useful work; there certainly was not time to figure out what a directory was, let alone design and implement such a thing. And for the time being, a directory really wasn't necessary. There were only three applications (Telnet, FTP, and RJE), and only one of each per host. Just kludge something for the short term. The simple expedient was to declare that everyone use the same socket for each application: Telnet on socket 1, FTP on 3, and RJE on 5.3 Every host would have the same application on the same address. This would do until there was an opportunity to design and build a cleaner, more general solution. Hence, well-known sockets were born. (Strangely enough, while many of us saw this as a kludge, discussions among the people involved revealed that others never saw it that way. An unscientific survey suggests the difference may come down to whether one had early imprinting with operating systems.)
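A rough sketch of what the well-known-socket expedient amounts to: rather than consulting a directory, every host hard-codes the same socket number for each application. The socket numbers are those given above; the function and variable names are invented for illustration.

```python
# Illustrative sketch of "well-known sockets": a directory lookup that needs
# no directory, because the mapping is fixed and identical on every host.

WELL_KNOWN_SOCKETS = {
    "telnet": 1,   # Telnet on socket 1
    "ftp": 3,      # FTP on socket 3
    "rje": 5,      # RJE on socket 5
}

def socket_for(application: str) -> int:
    """Return the agreed-upon socket for an application on any host."""
    return WELL_KNOWN_SOCKETS[application]

# Reaching FTP on any host is just (host_address, 3); no lookup service required.
destination = ("some-host-address", socket_for("ftp"))
```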
If there was any interest in naming and addressing during that period, it was more concerned with locating resources in a distributed network. How does a user find an application in the network? By the mid-1970s, several efforts were underway to build sophisticated resource-sharing systems on top of the ARPANET (the original justification) or on smaller networks attached to the ARPANET. David Farber was experimenting with a system at UC Irvine that allowed applications to migrate from host to host (Farber and Larson, 1972); and another ARPA project, the National Software Works, was trying to build an elaborate distributed collaboration system on top of the ARPANET (Millstein, 1977). These projects raised questions about what should be named at the application layer and how those names related to network addresses, but they outstripped the capabilities of the systems of the day.
The problem of naming and addressing had been a factor in the development of operating systems. The complexity of process structure in some operating systems provided a good basis for considering the problem (Saltzer, 1977). Operating system theory at the time drew a distinction between location-independent names and the logical and physical levels of addresses. This distinction was carried into networking and generalized as two levels of names: 1) location-independent names for applications and 2) location-dependent addresses for hosts.
The general concept was that the network should seem like an extension of the user's interface. The user should not have to know where a facility was in order to use it. Also, because some applications might migrate from host to host, their names should not change just because they moved. Thus, applications must have names that are location independent, or, as is commonly said today, portable. The binding of application names to processes would change infrequently. These names would map to location-dependent addresses, a mapping that might change from time to time. Network addresses would map to routes, which could change fairly frequently with changing conditions in the network. That was the general understanding.
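A minimal sketch of that understanding, with invented names, addresses, and routes: the name-to-address binding changes only when an application moves, while the address-to-route binding changes with the state of the network.

```python
# Illustrative sketch of the two-level naming model described above.
# All identifiers here are placeholders; the point is the chain of bindings
# and how often each one changes.

directory = {                       # application name -> host address
    "weather-service": "host-42",   # changes only when the application migrates
}

routing_table = {                   # host address -> current route
    "host-42": ["imp-3", "imp-9"],  # recomputed as network conditions change
}

def resolve(app_name: str) -> list[str]:
    """Location-independent name -> location-dependent address -> current route."""
    address = directory[app_name]    # slow-changing binding
    return routing_table[address]    # fast-changing binding

# If the application migrates, only its directory entry moves; its name,
# and therefore its users, are unaffected.
directory["weather-service"] = "host-7"
routing_table["host-7"] = ["imp-12"]
print(resolve("weather-service"))
```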
Using switch port numbers for addresses was not uncommon. After all, this is basically what the telephone system did (as did nearly all communication equipment at that time). However, although this might have been acceptable for a telephone system, it causes problems in a computer network. It didn't take long to realize that perhaps more investigation might be necessary. Very quickly, the ARPANET became a utility to be relied on as much as, or more than, an object of research. This not only impaired the kind of research that could be done, it also made it harder to make changes. (On the other hand, there is a distinct advantage to having a network with real users as an object of study.) It also led to requirements that hadn't really been considered so early in the development.

When Tinker Air Force Base in Oklahoma joined the Net, it very reasonably wanted two connections to different IMPs for reliability. (A major claim [although not why it was built] for the ARPANET in those days of the Cold War was reliability and survivability.) But it doesn't work quite so easily. In the ARPANET, two lines running to the same host from two different IMPs have two different addresses and appear as two different hosts. (See Figure 5-1.) The routing algorithm in the network has no way of knowing they go to the same place. Clearly, the addressing model needed to be reconsidered. (Because not many hosts had this requirement, it was never fixed, and various workarounds were found for specific situations.) Mostly, the old guard argued that it didn't really happen often enough to be worth solving. But we were operating system guys; we had seen this problem before. We needed a logical address space over the physical address space! The answer was obvious, although it would be another ten years before anyone wrote it down and published it. But military bases were rare on the Net, so it was not seen as a high-priority problem. Also, we all knew that this was a hard, subtle problem, and we needed to understand it better before we tried to solve it. Getting it wrong could be very bad.
Figure 5-1 Because ARPANET host addresses were the port numbers of the IMPs (routers), a host with redundant network connections appears to the network as two separate hosts. Routing can't tell that the two lines go to the same place.
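The following sketch restates the problem in Figure 5-1, and the "logical address space over the physical address space" idea, in code. The addresses and host name are invented for illustration; this is not how any ARPANET software was actually structured.

```python
# Illustrative sketch of the multihoming problem and its logical-address remedy.

# What the ARPANET saw: addresses name attachment points (IMP ports), so the
# base's two lines look like two unrelated destinations to routing.
attachment_points = {"imp5-port2", "imp9-port0"}   # same machine, two addresses

# With a logical (host) address space layered over the physical one, routing
# could know that both attachment points reach the same host.
logical_hosts = {
    "tinker-afb": {"imp5-port2", "imp9-port0"},
}

def reaches_same_host(addr_a: str, addr_b: str) -> bool:
    """True if two physical addresses are attachment points of one logical host."""
    return any(addr_a in points and addr_b in points
               for points in logical_hosts.values())

assert reaches_same_host("imp5-port2", "imp9-port0")
```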