Distributed Computing
When many similar systems are on a network, it is often desirable to share common files and utilities among them. For example, a system administrator might choose to keep a copy of the system documentation on one computer's disk and to make those files available for all remote systems. In this case the system administrator configures the files so that users who need to access the online documentation are not aware that the files are stored on a remote system. This type of setup, which is an example of distributed computing, not only conserves disk space but also allows you to update one central copy of the documentation rather than tracking down and updating copies scattered throughout the network on many different systems.
Figure 9-2 illustrates a fileserver that stores the system manual pages and users' home directories. With this arrangement, a user's files are always available to that user, no matter which system the user is on. Each system's disk might contain a directory to hold temporary files, as well as a copy of the operating system. For more information refer to "exportfs: Stores Permissions to Mount Local Filesystems" on page 1018 and "autofs: Automatically Mounts Filesystems" on page 979.
Figure 9-2. A fileserver
The Client/Server Model
Although there are many ways to distribute computing tasks on hosts attached to a network, the client/server model dominates UNIX and GNU/Linux system networking. A server system offers services to its clients and is usually a central resource. In Figure 9-2 the system that acts as the documentation repository is a server, and all the systems that contact it to display information are clients. Some servers are designed to interact with specific utilities, such as Web servers and browser clients. Other servers, such as those supporting DNS, communicate with one another in addition to answering queries from a variety of clients; in other words, a server can act as a client when it queries another server.
The client/server terminology also applies to processes that may be running on one or more systems. A server process may control a central database while client processes send queries to the server and collect replies. In this case the client and server processes may be running on the same computer. The client/server model underlies most of the network services described in this chapter.
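The following minimal sketch illustrates a server process and a client process on the same computer. It is not taken from any particular service: the DATABASE contents and the function names are hypothetical, and a local socket pair stands in for whatever connection a real client and server would use.

```python
import socket
import threading

# Hypothetical data controlled by the server; not from the original text.
DATABASE = {"alex": "pubs", "jenny": "pubs"}


def serve(conn):
    """Answer one query per line until the client closes its end."""
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:
            reply = DATABASE.get(line.strip(), "unknown")
            conn.sendall((reply + "\n").encode("utf-8"))


def query(conn, key):
    """Send one query to the server and wait for the reply."""
    conn.sendall((key + "\n").encode("utf-8"))
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = conn.recv(1024)
        if not chunk:
            break
        reply += chunk
    return reply.decode("utf-8").strip()


def demo():
    # socketpair() returns two connected endpoints on the local machine,
    # standing in for a client and a server on the same computer.
    client_end, server_end = socket.socketpair()
    worker = threading.Thread(target=serve, args=(server_end,))
    worker.start()
    replies = [query(client_end, name) for name in ("alex", "zach")]
    client_end.close()  # closing the client end stops the server loop
    worker.join()
    return replies
```

Calling demo() sends two queries and collects the replies. The server alone holds the data; the client sees only questions and answers, which is the essence of the model.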
DNS: Domain Name Service
DNS is a distributed service: Name servers on thousands of machines around the world cooperate to keep the database up-to-date. The database itself, which contains the information that maps hundreds of thousands of alphanumeric hostnames into numeric IP addresses, does not exist in one place. That is, no system has a complete copy of the database. Instead each system that runs DNS knows about the hosts that are local to that site and how to contact other name servers to learn about other, nonlocal hosts.
Like the GNU/Linux filesystem, DNS is organized hierarchically. Each country has an ISO (International Organization for Standardization) country code designation as its domain name. (For example, AU represents Australia, IL is Israel, and JP is Japan; see www.iana.org/cctld/cctld.htm for a complete list.) Although the United States is represented in the same way (US) and uses the standard two-letter Postal Service abbreviations to identify the next level of the domain, only governments and a few organizations use these codes. Schools in the US domain are represented by a third- (and sometimes second-) level domain: k12. For example, the domain name for Myschool in New York state could be www.myschool.k12.ny.us.
Following is a list of the six original, common, top-level domains. These domains are used extensively within the United States and, to a lesser degree, by users in other countries:
COM Commercial enterprises
EDU Educational institutions
GOV Nonmilitary government agencies
MIL Military government agencies
NET Networking organizations
ORG Other (often nonprofit) organizations
As this book was being written, the following additional top-level domains had been approved for use:
AERO Air-transport industry
BIZ Business
COOP Cooperatives
INFO Unrestricted use
MUSEUM Museums
NAME Name registries
As with Internet addresses, domain names used to be assigned by the Network Information Center (NIC [page 362]). Now they are assigned by several companies. A system's full name, referred to as its fully qualified domain name (FQDN), is unambiguous in the way that a simple hostname cannot be. The system okeeffe.berkeley.edu at the University of California, Berkeley (Figure 9-3) is not the same as one named okeeffe.moma.org, which might represent a host at the Museum of Modern Art. The domain name not only tells you something about where the system is located but also adds enough diversity to the name space to avoid confusion when different sites choose similar names for their systems.
Figure 9-3. United States top-level domains
Unlike the filesystem hierarchy, the top-level domain name in the United States appears last (reading from left to right). Also, domain names are not case sensitive. The names okeeffe.berkeley.edu, okeeffe.Berkeley.edu, and okeeffe.Berkeley.EDU refer to the same computer. Once a domain has been assigned, the local site is free to extend the hierarchy to meet local needs.
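As an illustration of these two properties, the short sketch below walks a domain name from the top-level domain down and compares names without regard to case. The function names are made up for this example.

```python
def domain_chain(fqdn):
    """List the domains that contain a host, from the top-level domain down."""
    labels = fqdn.rstrip(".").lower().split(".")
    # The top-level domain is the last label, so build the chain right to left.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


def same_host(a, b):
    """Compare two domain names without regard to case."""
    return a.rstrip(".").lower() == b.rstrip(".").lower()
```

For okeeffe.Berkeley.EDU, domain_chain returns edu, then berkeley.edu, then okeeffe.berkeley.edu, and same_host treats all three spellings given above as the same computer.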
With DNS, mail addressed to user@tcorp.com can be delivered to the tcorp.com computer that handles the corporate mail and knows how to forward messages to user mailboxes on individual machines. As the company grows, the site administrator might decide to create organizational or geographical subdomains. The name tcorp.ca.tcorp.com might refer to a system that supports California offices, with alpha.co.tcorp.com dedicated to Colorado. Functional subdomains might be another choice, with tcorp.sales.tcorp.com and alpha.dev.tcorp.com representing the sales and development divisions, respectively.
On GNU/Linux systems the most common interface to the DNS is BIND (Berkeley Internet Name Domain) software. BIND follows the client/server model. On any given local network, one or more systems may be running a name server, supporting all the local hosts as clients. When it wants to send a message to another host, a system queries the nearest name server to learn the remote host's IP address. The client, called a resolver, may be a process running on the same computer as the name server, or it may pass the request over the network to reach a server. To reduce network traffic and accelerate name lookups, the local name server has some knowledge of distant hosts. If the local server has to contact a remote server to pick up an address, when the answer comes back, the local server adds that to its internal table and reuses it for a while. The name server deletes the nonlocal information before it can become outdated. Refer to "TTL" on page 1499.
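The caching behavior described above can be sketched as follows. This is a simplified model, not how BIND is implemented: the class name, the remote_lookup callable, and the addresses used with it are assumptions made for illustration.

```python
import time


class CachingResolver:
    """Answer from a local table until an entry's time to live runs out."""

    def __init__(self, remote_lookup, clock=time.monotonic):
        # remote_lookup is a hypothetical callable: name -> (address, ttl_seconds)
        self._remote_lookup = remote_lookup
        self._clock = clock
        self._cache = {}  # name -> (address, expiry time)
        self.remote_queries = 0

    def resolve(self, name):
        key = name.lower()
        cached = self._cache.get(key)
        if cached is not None and self._clock() < cached[1]:
            return cached[0]  # still fresh: no network traffic needed
        # Missing or expired: ask the remote server and remember the answer.
        self.remote_queries += 1
        address, ttl = self._remote_lookup(key)
        self._cache[key] = (address, self._clock() + ttl)
        return address
```

A second lookup of the same name within the time to live is answered locally; once the entry expires, the next lookup goes back to the remote server.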
How the system translates symbolic hostnames into addresses is transparent to most users; only the system administrator of a networked system needs to be concerned with the details of name resolution. Systems that use DNS for name resolution are generally capable of communicating with the greatest number of hosts, more than would be practical to maintain in an /etc/hosts file or private NIS database.
Four common sources are used for host name resolution: NIS, NIS+, DNS, and system files (such as /etc/hosts). GNU/Linux does not ask you to choose among these sources; rather, the nsswitch.conf file (page 962) allows you to choose any of these sources, in any combination, and in any order.
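The idea of consulting sources in a configured order can be sketched as shown below. The sketch assumes a hosts line of the form "hosts: files nis dns" and ignores any other syntax the file may allow; the lookup tables and addresses are hypothetical.

```python
def parse_hosts_line(line):
    """Return the ordered source names from a line such as 'hosts: files nis dns'."""
    database, _, sources = line.partition(":")
    if database.strip() != "hosts":
        raise ValueError("not a hosts entry")
    return sources.split()


def resolve(name, order, sources):
    """Try each source in the configured order; the first answer wins."""
    for source in order:
        # sources maps a source name to a hypothetical name -> address table.
        address = sources.get(source, {}).get(name)
        if address is not None:
            return source, address
    return None
```

With the order files, nis, dns, a name found in the files table is never looked up in DNS, even if DNS would give a different answer.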
NIS: Network Information Service
NIS is another example of the client/server paradigm. Sun Microsystems developed NIS to simplify the administration of certain common administrative files by maintaining them in a central database and having clients contact the database server to retrieve information. Just as the DNS addresses the problem of keeping multiple copies of the hosts file up-to-date, NIS keeps system-independent configuration files (such as /etc/passwd) current. Most networks today are heterogeneous (page 1470), and even though they run different varieties of UNIX or GNU/Linux, they have certain common attributes (such as the passwd file).
NIS was formerly named the Yellow Pages, and people still refer to it by this name. Sun renamed the service because another corporation holds the trademark to that name. The names of NIS utilities, however, are still reminiscent of the old name: ypcat displays an NIS database, ypmatch searches, and so on.
Consider the /etc/group file, which maps symbolic names to group ID numbers. If NIS is administering this configuration file on your system, you might see the following single entry instead of a list of group names and numbers:
    $ cat /etc/group
    +:*:*
    ...
When it needs to map a number to the corresponding group name, a utility encounters the plus sign (+) and knows to query the NIS server at that point for the answer. You can display the group database with the ypcat utility:
    $ ypcat group
    pubs::141:alex,jenny,scott,hls,barbara
    ...
Or you can search for a particular group name by using ypmatch:
    $ ypmatch pubs group
    pubs::141:alex,jenny,scott,hls,barbara
You can retrieve the same information by filtering the output of ypcat through grep, but ypmatch is more efficient because it searches the database directly, using a single process. The database name is not the full pathname of the file it replaces; the NIS database name is the same as the simple filename (group, not /etc/group). The ypmatch utility works only on the key for the table (the group name in the case of groups). When you want to match members of the group, the group number, or other fields of a map (such as the full name in the passwd map), you need to use ypcat with grep.
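The difference between a keyed lookup and a scan can be modeled in a few lines. This sketch does not call NIS; it uses an in-memory table in group-file format. The pubs entry is the one shown above, and the dev entry is hypothetical.

```python
# Sample entries in group-file format (name:password:gid:members).
GROUP_LINES = [
    "pubs::141:alex,jenny,scott,hls,barbara",
    "dev::142:alex,zach",  # hypothetical second group
]

# A map is indexed by its key, here the group name.
GROUP_MAP = {line.split(":", 1)[0]: line for line in GROUP_LINES}


def match_key(name):
    """Like ypmatch: a direct lookup on the key."""
    return GROUP_MAP.get(name)


def scan_members(user):
    """Like ypcat filtered through grep: examine every entry for a member."""
    hits = []
    for line in GROUP_MAP.values():
        members = line.split(":")[3].split(",")
        if user in members:
            hits.append(line)
    return hits
```

Here match_key("pubs") finds the entry directly, whereas match_key("jenny") finds nothing because jenny is a member, not a key; scan_members("jenny") has to read every entry to find her groups.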
As with DNS, ordinary users need not be aware that NIS is managing system configuration files. Setting up and maintaining the NIS databases is a task for the system administrator; individual users and users on single-user GNU/Linux systems rarely need to work directly with NIS.
NFS: Network Filesystem
NFS lets you work locally with files that are stored on a remote computer's disks. These files appear as if they are present on the local computer. The remote system is the fileserver (server); the local system is the client. The client makes requests of the server.
Unfortunately NFS is based on the trusted-host paradigm (page 373) and therefore has all the security shortcomings that plague services based on this paradigm.
NFS is configured by the person responsible for the system. When you work with a file, you may not be aware of where the file is physically stored. In many computer facilities today, user files are commonly stored on a central fileserver equipped with many large-capacity disk drives and devices that quickly and easily make backup copies of the data. A GNU/Linux system may be diskless, where a floppy disk (or CD-ROM) is used to start GNU/Linux and load system software from another machine on the network. The Linux Terminal Server Project (LTSP.org) Web site says it all: "Linux makes a great platform for deploying diskless workstations that boot from a network server. The LTSP is all about running thin client computers in a GNU/Linux environment." Because a diskless workstation does not require a lot of computing power, you can give older, retired computers a second life by using them as diskless systems.
Another type of GNU/Linux system is the dataless system, in which the client has a disk but stores no user data (only GNU/Linux and the applications are kept on the disk). Setting up this type of system is a matter of choosing which filesystems are mounted remotely.
You can even netboot (page 1481) some machines. Red Hat includes the PXE (Preboot Execution Environment) server package for netbooting Intel machines. Older machines with netcard-mounted boot ROMs sometimes use tftp (trivial file transfer protocol) for netbooting. Non-Intel architectures have historically included netboot capabilities that Red Hat Linux supports. The Linux kernel can be built to mount the root filesystem (/) using NFS.
Of the many ways to set up your system, the one you choose depends on what you want to do. Setting up these specialized boot configurations is not a trivial task. See the Remote-boot mini HOWTO for more information.
The df utility displays a list of the filesystems available on your system, along with the amount of disk space, free and used, on each. Filesystem names that are prepended with hostname: are available to you through NFS.
    [bravo]$ pwd
    /kudos/home/jenny
    [bravo]$ df
    Filesystem          1k-blocks      Used Available Use% Mounted on
    /dev/sda1              311027    189038    105926  64% /
    ...
    /dev/sdc3             1336804        13   1267712   0% /c3
    zach:/c               2096160   1896704    199456  90% /zach_c
    zach:/d               2096450   1865761    230689  89% /zach_d
    panda:/c              1542016    433568   1108448  28% /panda_c
    panda:/d              1542208   1189026    353182  77% /panda_d
    kudos:/home            198275     68408    119612  36% /kudos/home
In this example Jenny's home directory is stored on the remote system kudos. The /home filesystem on kudos is mounted on bravo, using NFS; as a reminder of its physical location, the system administrator has made it available using a pathname that includes the remote server's name. Filesystems on zach and panda are also available on bravo: These are the C: and D: drives on two MS Windows machines. Use the -h (human) option to df to make the output more intelligible. Refer to page 1147 in Part III for more information on df.
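As a small illustration of the hostname: convention, the sketch below picks the remote filesystems out of df-style output. The sample text is a subset of the listing above; the function name is invented for this example.

```python
DF_OUTPUT = """\
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/sda1 311027 189038 105926 64% /
/dev/sdc3 1336804 13 1267712 0% /c3
zach:/c 2096160 1896704 199456 90% /zach_c
kudos:/home 198275 68408 119612 36% /kudos/home
"""


def remote_filesystems(df_text):
    """Return (host, remote path, mount point) for each hostname:-prefixed entry."""
    found = []
    for line in df_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue
        device, mount_point = fields[0], fields[5]
        # Local devices start with "/"; remote ones look like host:/path.
        if ":" in device and not device.startswith("/"):
            host, _, path = device.partition(":")
            found.append((host, path, mount_point))
    return found
```

Applied to the sample, the function reports zach:/c and kudos:/home and skips the two local /dev entries.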
The physical location of your files should not matter to you; all the standard GNU/Linux utilities work with NFS-remote files the same way as they operate with local files. At times, however, you may lose access to your remote files: Your computer may be up and running, but a network problem or a remote system crash may make these files temporarily unavailable. When you try to access a remote file, you get an error message, such as NFS server kudos not responding. When your system can contact the remote server again, you see a message, such as NFS server kudos OK.
automount: Mounts Filesystems Automatically
With distributed computing you can log in on any machine on the network, and all your files, including startup scripts, will be easily available. A distributed computing environment commonly has all machines able to mount all filesystems on all servers: Whichever machine you log in on, your home directory will be waiting for you.
Having all machines mount all servers all the time can be problematic. Suppose that machine A mounts some filesystems from machine B and machine B mounts some from machine A. What happens when you bring one of these machines down for maintenance or it crashes? In what order do you reboot them when they depend on each other to be up? In a large network you can have one machine mounting disks from tens or hundreds of others for software files and home directories.
One way around this problem is to mount filesystems only on demand. On GNU/Linux machines demand mounting is handled by the autofs system (using the automount daemon), which is replacing the older, less efficient amd (automounting daemon). Because autofs runs in kernel space (amd runs in user space), you need to have support for it in the kernel (Filesystems/Kernel automounter support). For example, when you issue the command ls /home/alex, autofs goes to work: It looks in the /etc/auto.home map, finds that alex is a key that says to mount franklin:/export/homes/alex, and mounts the remote filesystem.
Once the filesystem is mounted, ls displays the list of files you want to see. If after this mounting sequence you give the command ls /home, ls shows that alex is present within the /home directory. The df utility shows that alex is mounted from franklin. By default the automount daemon automatically unmounts this filesystem after five minutes of inactivity.
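The mount-on-demand sequence and the idle timeout can be modeled with a toy class. This is only a simulation under simplifying assumptions: it mounts nothing, it assumes each map line holds a key followed by a location, and the class and method names are hypothetical.

```python
class DemandMounter:
    """Toy model of demand mounting: mount on first access, expire when idle."""

    def __init__(self, map_text, timeout=300):
        self.timeout = timeout  # seconds of inactivity (five minutes)
        self.map = {}  # key -> remote location
        for line in map_text.splitlines():
            line = line.split("#", 1)[0].strip()
            if line:
                fields = line.split()
                self.map[fields[0]] = fields[-1]
        self.mounted = {}  # key -> time of last access

    def access(self, key, now):
        """Simulate touching /home/<key> at time now (in seconds)."""
        if key not in self.map:
            raise FileNotFoundError(key)
        self.mounted[key] = now  # mounts if needed and resets the idle timer
        return self.map[key]

    def expire(self, now):
        """Unmount everything that has been idle for at least the timeout."""
        idle = [k for k, last in self.mounted.items() if now - last >= self.timeout]
        for key in idle:
            del self.mounted[key]
        return idle
```

Given the map line alex franklin:/export/homes/alex, a call to access("alex", now=0) returns the remote location and marks it mounted; expire(now=300) then removes it because five minutes have passed without activity.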
Automounting filesystems is similar in concept to MS Windows 9x network neighborhood. When you know there are NFS servers named franklin, adams, and madison, you can see all the filesystems that are exported by each by using ls to display (for example) /net/franklin, /net/adams, and /net/madison. Once these filesystems are mounted, you can browse through them if you have permission.
The GNU/Linux automount facility is flexible and powerful. Refer to "autofs: Automatically Mounts Filesystems" on page 979 and the automount man page for more information.
Optional
Internet Services
GNU/Linux Internet services are provided by daemons that run continuously or by a daemon that is started automatically by the xinetd daemon (page 397) when a service request comes in. The /etc/services file lists network services (for example telnet, ftp, ssh) and their associated numbers. Any service that uses TCP/IP or UDP/IP uses an entry in this file. IANA (Internet Assigned Numbers Authority) maintains a database of all permanent, registered services. The /etc/services file usually lists a small, commonly used subset of services. Go to www.rfc.net/rfc1700.html for more information and a complete list of registered services.
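To make the layout of the file concrete, the sketch below parses a few lines in the name, port/protocol, aliases format. The sample text is a hand-written subset rather than a copy of any system's file, and parse_services is a name invented for this example.

```python
SAMPLE_SERVICES = """\
# name   port/protocol   aliases
ftp      21/tcp
ssh      22/tcp
telnet   23/tcp
domain   53/udp
"""


def parse_services(text):
    """Map (service name, protocol) to a port number."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, port_proto, *aliases = line.split()
        port, protocol = port_proto.split("/")
        for label in (name, *aliases):
            table[(label, protocol)] = int(port)
    return table
```

Programs normally do not parse the file themselves; in Python, for example, socket.getservbyname("ssh", "tcp") asks the system's services database for the same information.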
Most of the daemons (the executable files) are stored in /usr/sbin. By convention the names of many daemons end with the letter d to distinguish them from utilities. The prefix in. or rpc. is often used for daemon names. Look at /usr/sbin/*d to see a list of many of the daemon programs on your system. Refer to "rc Scripts: Start and Stop System Services" on page 944 and "service: Configures Services I" on page 945 for information about starting and stopping these daemons.
For example, when you run ssh, your local system contacts the ssh daemon (sshd) on the remote system to establish the connection. The two systems negotiate the connection according to a fixed protocol. Each system identifies itself to the other, and then they take turns asking each other specific questions and waiting for valid replies. Each network service follows its own protocol.
In addition to the daemons that support the utilities described up to this point, many other daemons support system-level network services that you will not typically interact with. Some of these daemons are listed in Table 9-4.
Table 9-4.
Daemon | Used For or By | Function |
---|---|---|
apmd | Advanced power management | Reports and takes action on specified changes in system power, including shutdowns. Very useful with machines, such as laptops, that run on batteries. |
atd | at | Executes a command once at a specific time and date. See crond for periodic execution of a command. |
automount | Automatic mounting | Automatically mounts filesystems when they are accessed. Automatic mounting is a way of demand-mounting remote directories without having to hard-configure them into /etc/fstab. |
comsat | Notifies users of new mail | Used by biff, a utility that notifies users of incoming mail. If the user is logged on and has run biff y, comsat sends a message to the user's shell, saying that there is new mail (at an appropriate time). Security-conscious sites may want to disable this service, as it has a history of security holes. Launched by xinetd. |
crond | cron | Used for periodic execution of tasks, this daemon looks in the /var/spool/cron/ directory for files that have filenames that correspond to users' login names. It also looks at the /etc/crontab file and at files in the /etc/cron.d directory. When a task comes up for execution, crond executes it as the user who owns the file that describes the task. |
dhcpcd | DHCP | Client daemon. Refer to "DHCP Client" on page 1028. |
dhcpd | DHCP | Assigns Internet address, subnet mask, default gateway, DNS, and other information to hosts. This protocol answers DHCP requests and, optionally, BOOTP requests. See DHCP on page 1465. |
fingerd | finger | Handles requests for user information from the finger utility. Launched by xinetd. |
ftpd | FTP | Handles FTP requests. Refer to "ftp: Transfers Files over a Network" on page 378. Launched by xinetd. |
gpm | General-purpose mouse or GNU paste manager | Allows you to use a mouse to cut and paste text on console applications. |
httpd | HTTP | A Web server daemon. See HTTP on page 1472. |
inetd | Deprecated in favor of xinetd. | |
lpd | line printer spooler daemon | Launched by xinetd when printing requests come to the machine. |
named | DNS | Supports DNS (page 1465), which has replaced the use of the /etc/hosts table for hostname-to-IP address mapping on most networked UNIX/Linux systems. |
nfsd, statd, lockd, mountd, rquotad | NFS | These five daemons operate together to handle NFS (page 1482) operations. The nfsd daemon handles file and directory requests. The statd and lockd daemons implement network file and record locking. The mountd daemon takes care of converting a filesystem name request from the mount utility into an NFS handle and checks access permissions. Finally, if disk quotas are enabled, rquotad handles those. |
ntpd | NTP | Synchronizes time on network computers. Requires a /etc/ntp.conf file. For more information go to www.eecis.udel.edu/~mills/ntp/servers.htm and www.eecis.udel.edu/~ntp. |
portmap | RPC | Maps incoming requests for RPC service numbers to TCP or UDP port numbers on the local machine. Refer to "RPC Network Services" on page 398. |
pppd | PPP | For a modem this protocol controls the pseudointerface represented by the IP connection between your computer and a remote computer. Refer to "PPP: Point-to-Point Protocol" on page 362. |
rexecd | rexec | Allows a remote user with a valid username and password to run programs on a machine. Its use is generally deprecated because of security, but certain programs, such as PC-based X servers, may still have it as an option. Launched by xinetd. |
routed | Routing tables | Manages the routing tables so that your system knows where to send messages that are destined for remote networks. If your system does not have a /etc/defaultrouter file, routed is started automatically to listen to incoming routing messages and to advertise outgoing routes to other systems on your network. A newer daemon, the Gateway daemon (gated), offers enhanced configurability and support for more routing protocols and is proportionally more complex. |
sendmail | Mail programs | The sendmail daemon came from Berkeley and has been available for a long time. The de facto mail transfer program on the Internet, the sendmail daemon always listens on port 25 for incoming mail connections and then calls a local delivery agent, such as /bin/mail. Mail user agents, such as pine and Mozilla mail, typically use sendmail to deliver mail messages. |
smbd, nmbd | Samba | Allow MS Windows PCs to share files and printers with UNIX/Linux computers. |
sshd | ssh, scp | Enables secure logins between remote machines (page 374). |
syslogd | System log | Transcribes important system events and stores them in files and/or forwards them to users or another host running the syslogd daemon. This daemon is configured with /etc/syslog.conf and used with the syslog utility. |
talkd | talk | Allows you to have a conversation with another user on the same or a remote machine. The talkd daemon handles the connections between the machines. The talk utility on each machine contacts the talkd daemon on the other machine for a bidirectional conversation. Launched by xinetd. |
telnetd | TELNET | One of the original Internet remote access protocols (page 376). Launched by xinetd. |
tftpd | TFTP | Used to boot a system or get information from a network. Examples include network computers, routers, and some printers. Launched by xinetd. |
timed | Time server | On a LAN synchronizes time with other computers that are also running timed. |
xinetd | Internet Superserver | Listens for service requests on network connections and starts up the appropriate daemon to respond to any particular request. Because of xinetd, your system does not need to have all the daemons running all the time in order to handle various network requests. The configuration file for xinetd is /etc/xinetd.conf, which frequently includes all the files in the /etc/xinetd.d directory with the line includedir /etc/xinetd.d. Each of the files in xinetd.d is named after a service that it controls. Each file contains a line that starts with disable = and finishes with yes or no. This line determines whether the service can run. |
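The disable line mentioned in the xinetd entry of Table 9-4 can be checked with a few lines of code. The sample file below is hand-written for illustration, and the sketch assumes that a file with no disable line leaves the service enabled.

```python
SAMPLE_SERVICE_FILE = """\
service telnet
{
    disable = yes
    socket_type = stream
    wait = no
}
"""


def is_enabled(text):
    """Return False if the file contains 'disable = yes', True otherwise."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "disable":
            return value.strip().lower() != "yes"
    return True  # assumption: no disable line means the service may run
```

For the sample above is_enabled returns False; changing the line to disable = no makes it return True.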
Proxy Server
A proxy is a network service that is authorized to act for a system while not being part of that system. A proxy server or proxy gateway provides proxy services; it is a transparent intermediary, relaying communications back and forth between an application, such as a browser, and a server, usually outside of your LAN and frequently on the Internet. When more than one process uses the proxy gateway/server, it must keep track of which processes are connecting to which hosts/servers so that it can route the return messages to the proper process. The most common proxies that a user encounters are e-mail and Web proxies.
A proxy server/gateway insulates the local computer from all other computers or from specified domains by using at least two IP addresses: one to communicate with your local computer and one to communicate with a server. The proxy server/gateway examines and changes the header information on all packets it handles so that it can encode, route, and decode them properly. The difference between a proxy gateway and a proxy server is that the proxy server usually includes cache (page 1459) to store frequently used Web pages so that the next request for that page is available locally and quickly whereas a proxy gateway usually does not use cache. The terms proxy server and proxy gateway are frequently interchanged.
Proxy servers/gateways are available for such common Internet services as HTTP, HTTPS, FTP, SMTP, and SNMP. When an HTTP proxy sends queries from local machines, it presents a single organization-wide IP address (the external IP address of the proxy server/gateway) to all servers. It funnels all user requests to servers and keeps track of them. When the responses come back, it fans them out to the appropriate applications, using each machine's unique IP address, protecting local addresses from remote/specified servers. Proxy servers/gateways are generally just one part of an overall firewall strategy to prevent intruders from stealing information or damaging an internal network. Other functions, which can be combined with or be separate from the proxy server/gateway, are packet filtering, which blocks traffic based on origin and type, and user activity reporting, which helps management learn how the Internet is being used.
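The bookkeeping a proxy performs, funneling requests out under one address and fanning responses back, can be sketched as follows. This is a simplified model, not a working proxy: the class, the dictionary-based messages, and the addresses are all assumptions made for illustration.

```python
import itertools


class ProxyGateway:
    """Relay requests under one external address and route replies back."""

    def __init__(self, external_address):
        self.external_address = external_address
        self._pending = {}  # request id -> internal client address
        self._ids = itertools.count(1)

    def forward(self, client_address, request):
        """Replace the source address and remember who asked."""
        request_id = next(self._ids)
        self._pending[request_id] = client_address
        return {"id": request_id, "source": self.external_address, "body": request}

    def deliver(self, response):
        """Find the original client for a response and hand it back."""
        client_address = self._pending.pop(response["id"])
        return client_address, response["body"]
```

Every forwarded request carries the proxy's external address as its source, so the remote server never sees the internal addresses; the pending table is what lets the proxy return each response to the machine that asked for it.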
Refer to "Proxies" on page 1016 for practical information on setting up a proxy.
RPC Network Services
An RPC (remote procedure call) is a call to a procedure (page 1486) that acts transparently across a network. The procedure itself is responsible for accessing and using the network. The RPC libraries make sure that network access is transparent to the application. RPC runs on top of TCP/IP or UDP/IP.
The /etc/rpc file lists servers for RPCs. This file has three columns: the name of the server for the RPC program, the RPC program number, and the names of programs that use the RPC program.
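The three-column layout can be illustrated with a short parser. The sample lines are a hand-written subset in the style of the file, not a copy of any system's /etc/rpc, and parse_rpc is a name invented for this example.

```python
SAMPLE_RPC = """\
portmapper  100000  portmap sunrpc
rstatd      100001  rstat rup perfmeter
nfs         100003  nfsprog
"""


def parse_rpc(text):
    """Return {server name: (program number, [names in the third column])}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, number, *others = line.split()
        table[name] = (int(number), others)
    return table
```

Looking up nfs in the parsed table yields program number 100003 together with the names listed in the third column.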
When an RPC server is initialized, it picks an arbitrary port (page 1485) that it communicates over. The server then registers this port with the RPC portmapper on the same machine, using the portmap utility. The portmap utility always listens on port 111 for both TCP and UDP.
When it wishes to execute an RPC against an RPC server, a client contacts portmap on the remote machine and asks which port the RPC server (for example rpc.rstatd) is listening on. The portmapper looks in its tables and returns a UDP or TCP port number. The client then contacts the server on that port.
The client sends arguments, just as a local function call or procedure would; the RPC libraries take care of transmission; the remote procedure executes with the arguments and generates a result; the RPC libraries encode the result and return it over the network to the client.
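The register, look up, and call sequence can be modeled in memory. Nothing in this sketch touches the network or the real portmap program: the Portmapper class, the servers table, the port number, and the add procedure are hypothetical stand-ins.

```python
class Portmapper:
    """Toy stand-in for the portmapper's table of registered RPC servers."""

    def __init__(self):
        self._ports = {}  # (program number, protocol) -> port

    def register(self, program, protocol, port):
        """Called by a server after it has picked an arbitrary port."""
        self._ports[(program, protocol)] = port

    def lookup(self, program, protocol):
        """Called by a client that wants to know where the server listens."""
        return self._ports.get((program, protocol))


def rpc_call(portmapper, servers, program, protocol, procedure, *args):
    """Ask the portmapper for the port, then run the procedure on that server."""
    port = portmapper.lookup(program, protocol)
    if port is None:
        raise LookupError(f"program {program} is not registered")
    # servers maps a port to the procedures offered there (hypothetical).
    return servers[port][procedure](*args)
```

A server offering an add procedure on port 32771 would register itself with pm.register(100001, "udp", 32771); a client then calls rpc_call(pm, servers, 100001, "udp", "add", 2, 3) without needing to know the port in advance.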