Simplifying Access Through Interface Classes
While the PVM library is powerful and straightforward to use, its message-passing protocol can be tedious. Before a data element can be sent, it must be packed; after a message is received, each data element must be unpacked. PVM provides pvm_pk and pvm_upk routines for most of the built-in datatypes, as shown in the following table.
Pack Routine | Unpack Routine
pvm_pkbyte() | pvm_upkbyte()
pvm_pkdouble() | pvm_upkdouble()
pvm_pkfloat() | pvm_upkfloat()
pvm_pkint() | pvm_upkint()
pvm_pklong() | pvm_upklong()
pvm_pkshort() | pvm_upkshort()
pvm_pkstr() | pvm_upkstr()
On the sending side, each data element is packed with the appropriate pvm_pk routine and the message is then sent with pvm_send(); on the receiving side, the message is received with pvm_recv() and each element is then unpacked with the matching pvm_upk routine. Before any data is packed, the send buffer must be initialized with pvm_initsend(). The process becomes even more demanding when user-defined types need to be passed between PVM processes.
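The ordering rule described above can be illustrated without a PVM installation. The sketch below is a hypothetical in-memory model: MockBuffer and its methods are our own names, not part of the PVM API. Here initsend() plays the role of pvm_initsend(), and values come back out only in the order, and with the types, in which they were packed. It assumes C++17 for std::variant.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical model of the pack/unpack discipline; these are NOT PVM calls.
class MockBuffer {
public:
    // Stands in for pvm_initsend(): discard anything packed earlier.
    void initsend() {
        items_.clear();
        next_ = 0;
    }

    // Stand-ins for pvm_pkint() and pvm_pkstr(): append in call order.
    void pk_int(int value) { items_.emplace_back(value); }
    void pk_str(const std::string& text) { items_.emplace_back(text); }

    // Stand-ins for pvm_upkint() and pvm_upkstr(): read back in the same
    // order; asking for the wrong type throws std::bad_variant_access.
    int upk_int() { return std::get<int>(take()); }
    std::string upk_str() { return std::get<std::string>(take()); }

private:
    using Item = std::variant<int, std::string>;

    const Item& take() {
        if (next_ >= items_.size()) {
            throw std::out_of_range("nothing left to unpack");
        }
        return items_[next_++];
    }

    std::vector<Item> items_;
    std::size_t next_ = 0;
};
```

In this model a type mismatch throws; the sketch makes no claim about how PVM itself reacts to a mismatched unpack, only that the pack and unpack sequences must agree.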
We want to simplify the PVM send-and-receive process by adapting it to the C++ stream metaphor of I/O. Further, we want to take advantage of the C++ information-hiding and encapsulation facilities to shield us from some of the PVM library syntax and to provide a more familiar interface for message passing and error handling. Molding the interface to the PVM routines will make using the PVM library easier and will help to clarify the logic of our parallel programs. As is often the case, the encapsulation and interface adaptation process requires more work up front, but the payoff is immediate and long-lasting. The supplier of the class does the heavy lifting, and the user of the class reaps the immediate benefits. Let's get started.
Listing 1 has two simple PVM programs: Sending Worker spawns a worker task and sends it an integer and a char array, and Receiving Worker receives the integer and the char array and sends a long back to its parent.
Listing 1 Two simple PVM programs: a sending worker and a receiving worker.
// Sending Worker
#include "pvm3.h"
#include <string.h>
#include <iostream>

int main(int argc,char *argv[])
{
   int NumTasks = 1;
   int Tid,Workers,MTag,Size;
   char TaskName[] = "worker";   // pvm_spawn() takes a non-const char *
   char Value2[100];
   long Result;

   Workers = pvm_spawn(TaskName,NULL,PvmTaskDefault,NULL,NumTasks,&Tid);
   if(Workers == NumTasks){
      MTag = 1;
      strcpy(Value2,"cluster application");
      // first message: the length of the string
      pvm_initsend(PvmDataDefault);
      Size = strlen(Value2);
      pvm_pkint(&Size,1,1);
      pvm_send(Tid,MTag);
      // second message: the string itself
      pvm_initsend(PvmDataDefault);
      pvm_pkstr(Value2);
      pvm_send(Tid,MTag);
      // wait for the worker's reply
      pvm_recv(Tid,MTag);
      pvm_upklong(&Result,1,1);
   }
   else{
      std::cerr << "Some Appropriate Error Message" << std::endl;
   }
   pvm_exit();
   return(Workers);
}

// Receiving Worker
#include "pvm3.h"
#include <string.h>
#include <iostream>

int main(int argc,char *argv[])
{
   int Pid,MTag,Value1;
   char Value2[100];
   long RandomNum;

   Pid = pvm_parent();
   MTag = 1;
   strcpy(Value2,"");
   pvm_recv(Pid,MTag);         // first message: the length
   pvm_upkint(&Value1,1,1);
   pvm_recv(Pid,MTag);         // second message: the string
   pvm_upkstr(Value2);
   // do some stuff
   RandomNum = 981928191;
   pvm_initsend(PvmDataDefault);
   pvm_pklong(&RandomNum,1,1);
   pvm_send(Pid,MTag);
   pvm_exit();
   return(0);
}
Notice the pvm_pk and pvm_upk routines. A matching pair is required for every datatype involved in a send or receive operation, because each datatype that's sent or received by a PVM program has its own pack and unpack functions. Also notice the calls to pvm_initsend(), which clears the send buffer and must be made before the data for each new message is packed. As the number and types of data elements involved in send and receive operations increase, the tedium sets in. Also, we're far removed from our familiar iostream metaphor. We can transform this situation by providing interface classes to the PVM routines. We have several goals:
The interface class hides one interface while providing another (more convenient or appropriate) interface. In this case, we want to adapt the pvm_initsend(), pvm_pk, pvm_upk, pvm_send(), and pvm_recv() interfaces to the more familiar C++ iostream interface. Once these interfaces are adapted, we can concentrate more on the challenges of parallel and distributed programming without being bogged down by the syntax of the PVM library.
Further, if we design and implement an interface class that's consistent with the familiar istream and ostream interfaces, we all but eliminate the learning curve for other developers who are involved in our project or who might use our interface class. To pull off this switcheroo, our interface class must provide inserters and extractors for all of the datatypes that the PVM routines handle.
Finally, we should make the class easy to use with user-defined types.
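One way to meet that last goal, sketched here under stated assumptions: once a stream offers inserters and extractors for the built-in types, the overloads for a user-defined type can be written entirely in terms of them. MockOut, MockIn, Channel, and FileSummary are hypothetical names; the channel is a simple queue of string tokens standing in for PVM messages, so the sketch compiles and runs without PVM.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for opvm_stream/ipvm_stream: a "message" is just a
// queue of string tokens shared by the two ends (no PVM calls are made).
using Channel = std::vector<std::string>;

class MockOut {
public:
    explicit MockOut(Channel& ch) : ch_(ch) {}
    MockOut& operator<<(int v) { ch_.push_back(std::to_string(v)); return *this; }
    MockOut& operator<<(const std::string& s) { ch_.push_back(s); return *this; }
private:
    Channel& ch_;
};

class MockIn {
public:
    explicit MockIn(Channel& ch) : ch_(ch) {}
    MockIn& operator>>(int& v) { v = std::stoi(ch_.at(pos_++)); return *this; }
    MockIn& operator>>(std::string& s) { s = ch_.at(pos_++); return *this; }
private:
    Channel& ch_;
    std::size_t pos_ = 0;
};

// A hypothetical user-defined type. Its inserter and extractor reuse the
// built-in overloads, so fields are sent and received in the same order.
struct FileSummary {
    std::string name;
    int words = 0;
};

MockOut& operator<<(MockOut& out, const FileSummary& f) {
    return out << f.name << f.words;
}

MockIn& operator>>(MockIn& in, FileSummary& f) {
    return in >> f.name >> f.words;
}
```

Because both overloads return the stream by reference, a user-defined object can appear anywhere in a chain such as `out << summary << 7`, just as it would with std::ostream.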
Listing 2 shows skeleton class declarations for ipvm_stream and opvm_stream classes.
Listing 2 Skeleton class declarations for ipvm_stream and opvm_stream.
#include <string>
#include <vector>
#include <list>

class analysis;   // user-defined type, declared elsewhere

class ipvm_stream{
public:
   ipvm_stream(void);
   ipvm_stream(int Tid, int Mid);
   void taskId(int Tid);
   void messageId(int Mid);
   void reset(void);
   ipvm_stream &operator>>(int &Data);
   ipvm_stream &operator>>(long &Data);
   ipvm_stream &operator>>(std::string &Data);
   ipvm_stream &operator>>(std::vector<std::string> &X);
   ipvm_stream &operator>>(std::list<std::string> &X);
   ipvm_stream &operator>>(analysis &X);
   ...
private:
   int TaskId;
   int MessageId;
};

class opvm_stream{
public:
   opvm_stream(void);
   opvm_stream(int Tid, int Mid);
   void taskId(int Tid);
   void messageId(int Mid);
   void reset(void);
   opvm_stream &operator<<(std::string Data);
   opvm_stream &operator<<(int &Data);
   opvm_stream &operator<<(long &Data);
   opvm_stream &operator<<(std::vector<std::string> &X);
   opvm_stream &operator<<(std::list<std::string> &X);
   opvm_stream &operator<<(analysis &X);
private:
   int TaskId;
   int MessageId;
   ...
};
The proper design and implementation of these classes requires more work up front, but the resulting code is considerably simpler, easier to understand, and easier to maintain. The ipvm_stream and opvm_stream classes shown here are scaled-down versions of the ones we use. Keep in mind that these classes wouldn't be complete without an error-handling and exception-handling policy. Also, because we're using them in a parallel programming environment, there are opportunities for data races; therefore, locking and synchronization policies come into play. The ipvm_stream and opvm_stream classes don't inherit from any of the istream or ostream family of classes; instead, they define a similar interface. So, while they are not related to the iostream classes by inheritance, they are related to them by interface. The ipvm_stream class is used for receiving objects from other PVM workers, and the opvm_stream class is used to send objects to other PVM workers.
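Because the relationship is one of interface rather than inheritance, generic code written against that interface can accept either kind of stream. The sketch below assumes a hypothetical RecordingStream that, like opvm_stream, does not derive from std::ostream; the function template writeHeader is likewise our own illustration, not part of the classes above.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical output stream that, like opvm_stream, does not derive from
// std::ostream but offers the same operator<< shape. It records each item.
class RecordingStream {
public:
    RecordingStream& operator<<(int v) { items.push_back(std::to_string(v)); return *this; }
    RecordingStream& operator<<(const std::string& s) { items.push_back(s); return *this; }
    std::vector<std::string> items;
};

// Written once against the interface (an operator<< that returns a reference
// to the stream), this template accepts std::ostringstream and RecordingStream.
template <typename OutStream>
OutStream& writeHeader(OutStream& out, const std::string& name, int size) {
    out << name << size;
    return out;
}
```

The template is instantiated separately for each stream type, so no common base class or virtual functions are needed; the compiler checks only that the expressions inside the template are valid for the argument type.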
Listing 3 shows some skeleton definitions for these classes. Notice that the buffers and other arguments the PVM routines require, such as the task ID and message tag, must be supplied by the ipvm_stream and opvm_stream classes themselves, through their data members.
Listing 3 Definitions of class methods for ipvm_stream and opvm_stream.
void opvm_stream::reset(void)
{
   // clear the send buffer before packing a new message
   pvm_initsend(PvmDataDefault);
}

ipvm_stream::ipvm_stream(int Tid, int Mid)
{
   ...
   TaskId = Tid;
   MessageId = Mid;
}

ipvm_stream &ipvm_stream::operator>>(std::string &Data)
{
   ...
   char Buffer[2048];   // assumes strings shorter than 2048 bytes
   pvm_recv(TaskId,MessageId);
   pvm_upkstr(Buffer);
   Data.assign(Buffer);
   return(*this);
}

ipvm_stream &ipvm_stream::operator>>(int &Data)
{
   pvm_recv(TaskId,MessageId);
   pvm_upkint(&Data,1,1);
   return(*this);
}

ipvm_stream &ipvm_stream::operator>>(std::vector<std::string> &X)
{
   ...
   char Buffer[2048];
   int NumWords;
   // one message: a count followed by that many strings
   pvm_recv(TaskId,MessageId);
   pvm_upkint(&NumWords,1,1);
   for(int N = 0; N < NumWords; N++){
      pvm_upkstr(Buffer);
      X.push_back(std::string(Buffer));
   }
   return(*this);
}

opvm_stream &opvm_stream::operator<<(std::string Data)
{
   reset();
   pvm_pkstr(const_cast<char *>(Data.c_str()));   // pvm_pkstr() takes char *
   pvm_send(TaskId,MessageId);
   return(*this);
}

opvm_stream &opvm_stream::operator<<(int &Data)
{
   reset();
   pvm_pkint(&Data,1,1);
   pvm_send(TaskId,MessageId);
   return(*this);
}

opvm_stream &opvm_stream::operator<<(std::vector<std::string> &X)
{
   ...
   // pack the count and every string into one message so that it
   // matches the vector<string> extractor
   reset();
   int NumWords = static_cast<int>(X.size());
   pvm_pkint(&NumWords,1,1);
   for(std::size_t N = 0; N < X.size(); N++){
      pvm_pkstr(const_cast<char *>(X[N].c_str()));
   }
   pvm_send(TaskId,MessageId);
   return(*this);
}

opvm_stream &opvm_stream::operator<<(std::list<std::string> &X)
{
   ...
   // same framing as the vector inserter; the caller's list is left intact
   reset();
   int NumWords = static_cast<int>(X.size());
   pvm_pkint(&NumWords,1,1);
   for(const std::string &Token : X){
      pvm_pkstr(const_cast<char *>(Token.c_str()));
   }
   pvm_send(TaskId,MessageId);
   ...
   return(*this);
}
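The vector<string> extractor in Listing 3 performs one receive and then expects a count followed by that many strings, so the matching inserter has to pack the count and all of the strings before a single send. The following hypothetical model shows that framing making a round trip in memory; Message, Mailbox, sendVector, and recvVector are illustrative names, and no PVM calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model: a message is a list of packed tokens, and the mailbox
// holds whole messages in arrival order.
using Message = std::vector<std::string>;
using Mailbox = std::deque<Message>;

// Sender side: the count and every element travel in ONE message, mirroring
// an initsend, a pack of the count, one pack per string, then a single send.
void sendVector(Mailbox& box, const std::vector<std::string>& words) {
    Message msg;                                  // "initsend"
    msg.push_back(std::to_string(words.size()));  // pack the count first
    for (const std::string& w : words) {
        msg.push_back(w);                         // then each element
    }
    box.push_back(msg);                           // one "send"
}

// Receiver side: one "recv", then unpack the count and exactly that many strings.
std::vector<std::string> recvVector(Mailbox& box) {
    if (box.empty()) {
        throw std::runtime_error("no message waiting");
    }
    Message msg = box.front();                    // one "recv"
    box.pop_front();
    std::size_t count = std::stoul(msg.at(0));
    std::vector<std::string> words;
    for (std::size_t i = 1; i <= count; ++i) {
        words.push_back(msg.at(i));
    }
    return words;
}
```

If the sender instead issued one send per string with no count, the receiver's single receive would see only the first string and would misread it as the count, which is why both ends must agree on the framing.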
Notice that the ipvm_stream and opvm_stream methods simply wrap the pvm_pk, pvm_upk, pvm_send(), pvm_recv(), and pvm_initsend() routines. This is why interface classes are sometimes referred to as wrapper classes. With these wrappers in place, Listing 4 rewrites the program in Listing 1.
Listing 4 Listing 1 rewritten to use the ipvm_stream and opvm_stream classes.
// Sending Worker
#include "pvm3.h"
#include <string>
#include <iostream>
#include "pvm_stream.h"

int main(int argc,char *argv[])
{
   int NumTasks = 1;
   int Tid,Workers,MTag,Size;
   char TaskName[] = "worker";   // pvm_spawn() takes a non-const char *
   std::string Value2("cluster application");
   long Result;

   Workers = pvm_spawn(TaskName,NULL,PvmTaskDefault,NULL,NumTasks,&Tid);
   if(Workers == NumTasks){
      MTag = 1;
      opvm_stream Destination(Tid,MTag);
      ipvm_stream Source(Tid,MTag);
      Size = static_cast<int>(Value2.size());
      Destination << Size << Value2;   // two messages: the length, then the string
      Source >> Result;
   }
   else{
      std::cerr << "Some Appropriate Error Message" << std::endl;
   }
   pvm_exit();
   return(Workers);
}

// Receiving Worker
#include "pvm3.h"
#include <string>
#include <iostream>
#include "pvm_stream.h"

int main(int argc,char *argv[])
{
   int Pid,MTag,Value1;
   std::string Value2;
   long RandomNum;

   Pid = pvm_parent();
   MTag = 1;
   opvm_stream Destination(Pid,MTag);
   ipvm_stream Source(Pid,MTag);
   Source >> Value1 >> Value2;
   // do some stuff
   RandomNum = 981928191;
   Destination << RandomNum;
   pvm_exit();
   return(0);
}
The PVM program in Listing 4 uses our iostream metaphor and hides the details of the PVM send-and-receive process. Because PVM addresses tasks by task ID rather than by host, the sender and the receiver can each run on any computer in the virtual machine, and this code works unchanged.
We reap even greater benefits when we send complex user-defined datatypes between PVM processes. For instance, at Ctest Labs, we've developed a cluster-based text-file analysis utility that takes advantage of the multiple processors in a cluster environment. The utility performs detailed content analysis of text files in real time. It is given hundreds, possibly thousands, of text files to analyze, and it must return the analysis in a relatively short period of time. Because the analysis involves set manipulations, transformations of some text elements to Horn clause form, Skolemization (another type of transformation), and graph-traversal techniques, this utility is processor-intensive, and the more processors available, the better it performs. In this utility, we routinely use our ipvm_stream and opvm_stream classes to send and receive deques, multisets, lists, and vectors of built-in types and user-defined objects to and from other PVM tasks. Listing 5 shows the implementation of the inserter that's used to insert a vector of strings into an opvm_stream.
Listing 5 Inserts a vector of strings into the opvm_stream.
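opvm_stream &opvm_stream::operator<<(std::vector<std::string> &X)
{
   ...
   // pack the count and every string into one message, matching
   // the vector<string> extractor in Listing 3
   reset();
   int NumWords = static_cast<int>(X.size());
   pvm_pkint(&NumWords,1,1);
   for(std::size_t N = 0; N < X.size(); N++){
      pvm_pkstr(const_cast<char *>(X[N].c_str()));
   }
   pvm_send(TaskId,MessageId);
   ...
   return(*this);
}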
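Writing one inserter per container type, as Listing 5 does for vector<string>, becomes repetitive once deques, multisets, and lists are involved. A function template can cover them all, provided the element type already has an inserter. This is a sketch with a hypothetical TokenStream in place of opvm_stream and a hypothetical insertAll function; it uses the same count-then-elements framing as the vector<string> extractor in Listing 3.

```cpp
#include <cassert>
#include <deque>
#include <list>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for opvm_stream: every inserted item becomes a token
// in a single "message" (PVM calls are not used here).
class TokenStream {
public:
    TokenStream& operator<<(int v) { tokens.push_back(std::to_string(v)); return *this; }
    TokenStream& operator<<(const std::string& s) { tokens.push_back(s); return *this; }
    std::vector<std::string> tokens;
};

// One inserter for any container whose elements TokenStream can already
// insert: the element count goes first, then each element in iteration order.
template <typename Container>
TokenStream& insertAll(TokenStream& out, const Container& c) {
    out << static_cast<int>(c.size());
    for (const auto& element : c) {
        out << element;
    }
    return out;
}
```

A named function is used here rather than a templated operator<< because a template that accepts any argument type would also be selected for arguments such as string literals, which have no size() member. Note that a multiset is sent in its sorted iteration order, not in insertion order.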
Keep in mind that the inserter defined in Listing 5 is also responsible for setting an error state of the opvm_stream, throwing an exception, or both (that code is elided in the listing). Our text-file analysis utility relies on custom-designed error classes in most cases, and on exception-handling techniques in the case of extreme failure. We have developed the utility for both the PVM and the MPI environments, with ipvm_stream, impi_stream, opvm_stream, and ompi_stream interface classes. In the second article of this three-part series, we'll take a closer look at the implementation of the pvm_stream and mpi_stream classes and how they're used in the context of this cluster-based text-file analysis utility.
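The error-state-plus-optional-exception policy described above resembles the exceptions mask of std::ios. A minimal sketch of such a policy follows, assuming a hypothetical SendStatus class; the callable passed to run() stands in for a pack-and-send sequence and, like PVM routines, reports failure with a negative return code. After a failure, later operations are skipped until clear() is called, much as an iostream in a failed state ignores further I/O.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical error policy in the spirit of std::ios: each operation records
// failure in a state flag, and throws only if the caller has opted in.
class SendStatus {
public:
    void throwOnFailure(bool enable) { throws_ = enable; }
    bool fail() const { return failed_; }
    void clear() { failed_ = false; }

    // 'op' stands in for a pack-and-send sequence; a negative result means failure.
    SendStatus& run(const std::function<int()>& op) {
        if (failed_) {
            return *this;  // once failed, later operations are skipped
        }
        int code = op();
        if (code < 0) {
            failed_ = true;
            if (throws_) {
                throw std::runtime_error("send failed, code " + std::to_string(code));
            }
        }
        return *this;
    }

private:
    bool failed_ = false;
    bool throws_ = false;
};
```

Keeping exceptions opt-in lets routine failures be handled by checking fail(), while a caller that treats a failed send as an extreme condition can enable throwing instead.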