Architectural Models
The three architectural models for p2p considered here are
- Atomistic
- User-centric
- Data-centric
The last two are similar in that they usually rely on centralized servers to mediate connectivity based on directories of users or resources and data.
Each model is examined to show its main and distinctive characteristics.
Atomistic P2P
In the atomistic model (AP2P), all nodes are equally server and client. There is no central administration or connection arbiter, although such might be implemented as distributed services within the network. For purists, this model is the original and only "true" p2p architecture.
Each node in the atomistic model is autonomous and fully manages its own resources and connectivity. Figure 2.2 shows the schematic connectivity model.
FIGURE 2.2
An atomistic p2p network is constructed from an arbitrary number of nodes. Each
typically maintains contact with several others, although virtual direct
connectivity can be established between any two.
The atomistic model contains a fundamental bootstrap problem: how to join. The prospective member must somehow first determine which other nodes in the network it can establish connections with. Without a central server, no easy way of determining resource or user availability is apparent in advance.
Two traditional answers to this peer-discovery problem exist:
- Use a broadcast protocol to send a query to join and await a response.
- Attempt to connect to suitable "known" node addresses that will either accept new nodes or provide further node addresses to try.
When the client has physical connectivity to the p2p network infrastructure, for example in a LAN, it's feasible to use the broadcast method, effectively calling into the dark, "Hello, I'm here. Who's available out there?" The implemented protocol determines how this message is framed to be detected by other client nodes and how these nodes should respond. The query can be a request to join a designated group.
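As a rough illustration of how such a query might be framed and answered, the following sketch uses JSON payloads. The protocol tag, field names, node names, and group names are all assumptions made up for this example and do not come from any real p2p network.

```python
import json

MAGIC = "P2P-HELLO"  # hypothetical protocol tag, invented for this sketch

def make_join_query(node_id, group):
    """Frame a 'who is available out there?' query as one datagram payload."""
    return json.dumps({"magic": MAGIC, "type": "query",
                       "node": node_id, "group": group}).encode("utf-8")

def handle_datagram(payload, my_id, my_groups):
    """Return a reply payload if the query targets a group we belong to, else None."""
    try:
        msg = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None                      # not our protocol; ignore
    if not isinstance(msg, dict) or msg.get("magic") != MAGIC:
        return None
    if msg.get("type") != "query" or msg.get("node") == my_id:
        return None                      # ignore replies and our own broadcast
    if msg.get("group") not in my_groups:
        return None                      # query is for a group we have not joined
    return json.dumps({"magic": MAGIC, "type": "here",
                       "node": my_id, "group": msg["group"]}).encode("utf-8")

# A newcomer asks for the (hypothetical) "chess" group; node-B is a member.
query = make_join_query("node-A", "chess")
reply = handle_datagram(query, "node-B", {"chess", "go"})
```

On a LAN, the query payload would typically be sent as a UDP datagram to the broadcast address from a socket with the SO_BROADCAST option enabled; the sketch leaves the transport out so that the framing logic stands alone.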
Multiplayer games often use this method to establish gaming sessions over a local network. Ad hoc sessions are created when several clients detect each other on the basis of broadcast messages plus a matching selection of game, scenario, and other criteria. In these cases, one system becomes the host server for the session.
While broadcast methods can sometimes be used on a larger, general network such as the Internet, the likelihood of successfully reaching active p2p nodes is then much smaller. More practical is to probe known addresses or address blocks for responding clients. The new client can attempt to connect directly to known addresses and through one such node create a response list of other nodes. Alternatively, it can first connect to a published "service provider" node that maintains a dynamic list of active nodes within its horizon. The client downloads this list and proceeds to work through it, attempting to establish a predetermined number of connections.
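The known-address approach can be sketched as a simple work queue. This is a minimal in-memory simulation under stated assumptions: `ask_for_peers` is a hypothetical stand-in for a real connection attempt, and the addresses and the toy `NETWORK` table are invented for illustration.

```python
from collections import deque

def bootstrap(seed_addresses, ask_for_peers, wanted=3):
    """Work through known addresses until `wanted` connections are made.

    `ask_for_peers(addr)` is a hypothetical stand-in for a connection
    attempt: it returns that node's list of further addresses, or None
    if the node is unreachable or refuses new nodes.
    """
    queue = deque(seed_addresses)
    seen = set(seed_addresses)
    connected = []
    while queue and len(connected) < wanted:
        addr = queue.popleft()
        peers = ask_for_peers(addr)
        if peers is None:
            continue                     # try the next candidate
        connected.append(addr)
        for peer in peers:               # learn further addresses to try
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return connected

# Toy network: address -> node list it hands out (None means offline).
NETWORK = {
    "10.0.0.1": ["10.0.0.2", "10.0.0.3"],
    "10.0.0.2": ["10.0.0.4"],
    "10.0.0.3": None,
    "10.0.0.4": [],
}

def fake_ask(addr):
    return NETWORK.get(addr)             # unknown addresses also yield None

result = bootstrap(["10.0.0.9", "10.0.0.1"], fake_ask, wanted=3)
```

The first seed address does not respond, so the client falls through to the second and then follows the addresses it is handed until the predetermined number of connections is reached.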
Once a successful connection is established to at least one node, a client (or other software) can listen to messages passing through the network and build its own list of active nodes for later connection attempts. Furthermore, if the client is configured to accept incoming connections, the user might find that much of the subsequent connectivity is maintained by "incoming" requests from other nodes.
While the formal lack of central administration in AP2P can cause a significant and frustrating threshold to joining the network, it also means that the network is essentially self-maintaining and resilient within the bounds of its capacity. AP2P is therefore the preferred architecture for systems that wish to ensure maximum availability and persistence of distributed data, despite the acknowledged vulnerability in the common practice of distributing nodelists externally.
One or more trusted nodes (or Web sites, or IRC channels) with guaranteed access might in some networks be effectively declared a service provider, mainly to alleviate peer discovery. A fixed address, perhaps of last resort, is then often part of the client distribution and available at first start-up. The fundamental all-nodes-are-equal paradigm is essentially unchanged, except with regard to node discovery.
Lately, some Internet p2p networks are for various reasons moving towards an extension of the service provider concept: a formal two-tier variation of the atomistic model. Particular nodes are then elevated to "super-peer" status because of their proven capacity, connectivity, and reliability. This process is made easier by automatic history and reputation tracking. Reliable or trusted super-nodes can act as list repositories, primary connection nodes, and sometimes search hubs. They provide a sort of backbone network in much the same way that backbone servers came to do for the Internet in general.
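How a network might promote nodes on the basis of tracked history can be sketched as follows. The record fields, thresholds, and ranking rule are assumptions chosen for illustration; real networks use their own criteria.

```python
from dataclasses import dataclass

@dataclass
class NodeHistory:
    node_id: str
    bandwidth_kbps: int      # measured capacity
    uptime_ratio: float      # fraction of time observed online, 0..1
    ok_requests: int         # requests answered correctly
    failed_requests: int     # requests that failed or timed out

def reliability(h):
    """Share of requests the node answered correctly (0.0 if no history)."""
    total = h.ok_requests + h.failed_requests
    return h.ok_requests / total if total else 0.0

def elect_super_peers(histories, min_bandwidth=512,
                      min_uptime=0.9, min_reliability=0.95):
    """Return ids of nodes that meet all (hypothetical) thresholds, best first."""
    qualified = [h for h in histories
                 if h.bandwidth_kbps >= min_bandwidth
                 and h.uptime_ratio >= min_uptime
                 and reliability(h) >= min_reliability]
    qualified.sort(key=lambda h: (h.uptime_ratio, h.bandwidth_kbps), reverse=True)
    return [h.node_id for h in qualified]

histories = [
    NodeHistory("a", 1024, 0.99, 990, 10),   # qualifies
    NodeHistory("b", 2048, 0.50, 100, 0),    # too often offline
    NodeHistory("c", 768, 0.95, 100, 0),     # qualifies
    NodeHistory("d", 4096, 0.99, 50, 50),    # unreliable answers
]
super_peers = elect_super_peers(histories)
```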
Later implementation chapters detail some of the discovery strategies in relation to the specific design goals and security concerns in each case.
User-Centric P2P
The user-centric p2p (UCP2P) model adds to the basic atomistic one a form of central server mediation. In its simplest form, the server component adds a directory based on (usually permanent) unique user identity to simplify how nodes can find each other. Identities tied to client or other entities are also possible, if sometimes less intuitive. Strictly speaking, one should probably refer to "directory-centric" for proper scope, but the term "user-centric" is the one most commonly used.
The directory provides persistent logical links that uniquely (and possibly within a specific context) identify each node, and a translation mechanism to enable direct connection by the user over the underlying network infrastructure. In an Internet context, this translates to the current IP numbers corresponding to the registered and available active nodes. Figure 2.3 shows a simple geometry of this model. In reality, the ongoing direct connections between individual clients would be a much richer web-like structure than this figure suggests.
FIGURE 2.3
Simple illustration of either user-centric or data-centric p2p. Even though
clients usually connect using the traditional infrastructure, the user sees a
different, peer-oriented namespace with new services unavailable in the
underlying physical network.
Clients register with the directory service to announce their availability to the network. Some form of periodic update to this status is tracked so that the server knows when a node is no longer available: either "pings" sent from the client at regular intervals, which are sometimes referred to as its "heartbeat", or responses to service queries. Users might select a specific transponder status such as "busy" or "extended away" that will be reported to the server, thereby distinguishing between simple node availability and a more nuanced user-selected availability.
A user can scan/search the directory for other users who meet particular criteria, and from the listing they can determine the current connection address. With a known address, the user can then establish a direct client-to-client connection. The target node registered itself as active and known to be online, so it usually responds right away. Depending on the scope of the server mediation, the connection with the directory server might remain, either to exchange supplementary information or to track current p2p transfer status.
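The registration, heartbeat, and lookup cycle described above can be sketched as a small in-memory directory. This is a minimal sketch, not any particular service's design: the class name, the 90-second timeout, and the example users and addresses are assumptions. Time is passed in explicitly so that the behavior is easy to follow.

```python
class Directory:
    """Toy user directory: identity -> current address and status."""

    def __init__(self, timeout=90.0):
        self.timeout = timeout           # seconds without a heartbeat before a node counts as offline
        self._entries = {}

    def register(self, user_id, address, now, status="available"):
        self._entries[user_id] = {"address": address, "status": status,
                                  "last_seen": now}

    def heartbeat(self, user_id, now, status=None):
        """Record a ping; optionally update the user-selected status."""
        entry = self._entries.get(user_id)
        if entry is None:
            return False                 # unknown user must register first
        entry["last_seen"] = now
        if status is not None:
            entry["status"] = status
        return True

    def _online(self, entry, now):
        return now - entry["last_seen"] <= self.timeout

    def lookup(self, user_id, now):
        """Translate an identity to (address, status), or None if offline."""
        entry = self._entries.get(user_id)
        if entry is None or not self._online(entry, now):
            return None
        return entry["address"], entry["status"]

    def search(self, predicate, now):
        """Ids of online users for which predicate(user_id, status) is true."""
        return sorted(uid for uid, e in self._entries.items()
                      if self._online(e, now) and predicate(uid, e["status"]))

directory = Directory(timeout=90)
directory.register("alice", ("192.0.2.10", 5000), now=0)
directory.register("bob", ("192.0.2.20", 5000), now=0)
directory.heartbeat("alice", now=60, status="busy")

alice = directory.lookup("alice", now=100)   # still online, marked busy
bob = directory.lookup("bob", now=100)       # no heartbeat for 100 s
online = directory.search(lambda uid, status: True, now=100)
```

Once the lookup returns an address, the directory's job is done and the clients connect to each other directly.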
User-centric p2p has proven to be the most popular architecture. The most publicized implementation is surely Napster file sharing. This popularity is despite the fact that the UCP2P model has a far greater deployment in the older instant messaging technologies: Mirabilis ICQ ("I seek you", now owned by AOL), AOL Instant Messaging (AIM), and MSN Messenger, to name the best known. Napster as a public representative of UCP2P is doubly ironic, because the primary interest of the users in the system is not to find users, but to find particular content on some remote computer. One would therefore have expected an explicitly data-centric focus, where the MP3 tracks constitute the permanent index of content. After all, to all practical intents, users are mutually anonymous in the system, often away, and their identities (which are arbitrarily user-chosen at sign-up) need not be tracked at all. On the other hand, UCP2P makes the current transition attempts from a free service to a registered subscriber service much easier from the technical point of view.
One issue of concern with UCP2P networks is this reliance on central directories, which introduces specific vulnerabilities and raises privacy issues. A user directory can track user behavior and generate personal profiles. Such profiles can be worth considerable sums of money to the compiler if sold to advertisers. They also invoke the specter of user monitoring by various overt or covert agencies.
If we take the example of instant messaging, these solutions also illustrate another downside of centralized server solutions: closed, proprietary standards deliberately kept incompatible by the companies that control them. Independent services like Jabber attempt to work around the barriers by providing modular support for the different standards and so allow users access from a single client. It remains to be seen whether this strategy will become more than just experimental.
Ownership and control of directory services is perceived as increasingly important in the business community as the aggregate UCP2P user base grows. This view is borne out by how Napster clones were bought up by commercial interests even as they were being closed down for alleged music piracy.
All these issues are discussed later in more detail in both the general (Chapter 4) and the application-particular (Chapter 6) contexts.
Data-Centric P2P
Data-centric p2p (DCP2P) is very similar to the user-centric model, and it can thus be illustrated by the same figure used earlier (Figure 2.3). The distinction made in DCP2P is that the central server maintains an index over available resources, not individual users. The index is a collection of logical links that map the resources to registered nodes in the same way that UCP2P maps identified users.
Again, the term should really be "index-centric" or "resource-centric", but prevailing usage prefers "data-centric". However, just as a UCP2P directory can indirectly access content (think Napster), it's possible to indirectly access individuals from a properly structured DCP2P resource index. In this respect at least, UCP2P and DCP2P can be seen as interchangeable and the choice somewhat arbitrary.
When clients register with the DCP2P server, they mainly provide a list of currently available resources on that node. Although this is usually assumed to be data content of one kind or another (that is, files or documents), nothing requires that it must be. Resources can include more abstract entities, such as client-managed services, storage, gateway interfaces, and other intangibles.
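A resource index of this kind can be sketched as a mapping from resource names to the nodes that registered them. The class, the substring search, and the node and resource names are assumptions made for illustration only.

```python
from collections import defaultdict

class ResourceIndex:
    """Toy central index: resource name -> ids of nodes that offer it."""

    def __init__(self):
        self._holders = defaultdict(set)   # resource -> node ids
        self._by_node = defaultdict(set)   # node id -> resources

    def register(self, node_id, resources):
        """Replace the node's advertised list of available resources."""
        self.unregister(node_id)
        for resource in resources:
            self._holders[resource].add(node_id)
            self._by_node[node_id].add(resource)

    def unregister(self, node_id):
        """Drop every logical link that points at this node."""
        for resource in self._by_node.pop(node_id, set()):
            self._holders[resource].discard(node_id)
            if not self._holders[resource]:
                del self._holders[resource]

    def find(self, substring):
        """Resources whose name contains `substring`, with their holders."""
        needle = substring.lower()
        return {name: sorted(nodes)
                for name, nodes in sorted(self._holders.items())
                if needle in name.lower()}

index = ResourceIndex()
index.register("node-1", ["report.pdf", "print-service"])   # a file and a service
index.register("node-2", ["report.pdf", "budget.xls"])
hits = index.find("report")
```

Note that the index treats a file and a client-managed service identically: both are just names linked to the nodes that offer them.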
Users can in DCP2P architecture search for and access data (or content) held on other nodes. With data in primary focus, it's understandable that different forms of content management solutions turn mainly to DCP2P. It is the area of greatest excitement and promise for p2p in business, as opposed to private use.
Access in DCP2P tends to be more governed by rules than in UCP2P, especially in corporate contexts. Not everyone is allowed to access everything, at any time. Exclusion/admittance policy requires more "intelligence" in server index and client management, so this kind of architecture is still very much under development, targeting enterprise solutions. Furthermore, security issues are paramount because of the deep access into registered nodes, both in terms of data and functionality, and because of the often sensitive nature of the content.
Leveraged P2P
When dealing with the last two entries in Table 2.1, computation-centric and next-generation Web models, I would like to use the term "leveraged p2p" (LP2P), insofar as such implementations are to be considered in purely peer-to-peer contexts. Each can be combined with the other, and also with elements from the previous three p2p architectures. They might also be fully incorporated into these respective p2p contexts to achieve synergies that can dramatically improve network performance. However, it didn't feel appropriate to include deeper discussions of these models in this book.
Take for example distributed processing. Traditional DP implementations tend to be highly server-centric, as shown in Figure 2.4, because the primary interest has been for a single owner of large amounts of raw data to process it using distributed and otherwise idle resources. Owners of idle capacity, say a PC on a network, install client software and register with the server. They then let the client autonomously request and process data for the server owner, but generally have no insight or control of the process, or the data. The best known example is the SETI@home Project, which tries to analyze the vast reams of data available from radio telescopes in order to detect evidence of intelligent life in the universe.
FIGURE 2.4
Simple illustration of traditional distributed processing. Typically little or
no communication occurs between nodes. Data is owned by the central server(s),
and tasks are sent out to nodes that send back results.
The thorny issue of data ownership and potential rights to processed results in a public DP setting arose with a comparable distributed effort to identify genes in human DNA. Although the effort was ostensibly hosted by a university for cancer research, any results gleaned turned out to be patentable and exclusively owned by a private company. When this fact was made public, many users left that DP network in disgust.
Using DP in a proper p2p context is comparatively rare as yet. In fact, the usual definition of "large processing tasks segmented into component blocks, delivered to a number of machines where these blocks are processed, and reassembled by a central machine" is pretty clear about the strict hierarchy model.
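The quoted definition maps directly onto a split, process, and reassemble pattern. In the minimal sketch below, worker threads stand in for remote machines, and the sum-of-squares task is an arbitrary placeholder for real analysis work; both are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, block_size):
    """Segment the task into component blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def process_block(block):
    """Placeholder for the work a client node does: here, a sum of squares."""
    return sum(x * x for x in block)

def run_job(data, block_size=4, workers=3):
    """Central machine: hand out blocks, then reassemble the partial results."""
    blocks = split(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_block, blocks))   # results keep block order
    return sum(partials)

total = run_job(list(range(10)))
```

The workers never communicate with each other, only with the central coordinator, which is exactly the strict hierarchy the definition implies.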
One potential network-specific application of DP based on p2p is distributed indexing, where nodes in a DCP2P network assume responsibility for a portion of the total indexing task and work together in fulfilling search requests. Various distributed search schemes are possible, index-based or instance-compiled on varying scope.
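One simple way to divide the indexing task is to assign each search term to a node by hashing it. This is only one of several possible schemes and is offered as an assumption-laden sketch; the class names, the hash-based assignment, and the example terms and locations are all invented for illustration.

```python
import hashlib

def owner(term, node_ids):
    """Pick the node responsible for a term, using a stable hash."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    return node_ids[int.from_bytes(digest[:8], "big") % len(node_ids)]

class IndexNode:
    """One peer's portion of the total index."""

    def __init__(self):
        self.postings = {}               # term -> set of locations

    def add(self, term, location):
        self.postings.setdefault(term, set()).add(location)

    def query(self, term):
        return self.postings.get(term, set())

class DistributedIndex:
    def __init__(self, node_ids):
        self.node_ids = sorted(node_ids)
        self.nodes = {n: IndexNode() for n in self.node_ids}

    def publish(self, term, location):
        """Store the entry on the single node responsible for this term."""
        self.nodes[owner(term, self.node_ids)].add(term, location)

    def search(self, terms):
        """Locations matching all terms; each term is answered by its owner."""
        results = [self.nodes[owner(t, self.node_ids)].query(t) for t in terms]
        return set.intersection(*results) if results else set()

dindex = DistributedIndex(["n1", "n2", "n3"])
dindex.publish("jazz", "node-7/a.mp3")
dindex.publish("jazz", "node-9/b.mp3")
dindex.publish("live", "node-9/b.mp3")
both = dindex.search(["jazz", "live"])
```

A plain modulo assignment like this reshuffles most terms whenever nodes join or leave, so a real network would need a more stable placement scheme.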
A combination of DP and the new Web, sometimes called "Web Mk2", offers other prospects. In the p2p context, one can envision autonomous, roaming Web agents opportunistically using p2p nodes as temporary hosts as they go about their assigned information-gathering tasks: a kind of benign Internet "worm". These and other visions of future potential are dealt with in Part III.