Network Programming with Perl: Input/Output Basics
Perl and network programming were made for each other. Perl's strong text-processing abilities combine with a flexible I/O subsystem to create an environment that is ideal for interprocess communication. This, combined with its native support for the Berkeley sockets API, makes Perl an excellent choice for network applications.
In this chapter, you'll get the background information you'll need to write TCP/IP applications in Perl. We review Perl's input/output (I/O) system, first using the language's built-in function calls and then using the object-oriented (OO) extensions of Perl5. Filehandles are the fundamental objects used for Perl input/output operations, and they offer both line-oriented and byte-stream-oriented modes.
Perl and Networking
Why would you want to write networking applications in Perl?
The Internet is based on Transmission Control Protocol/Internet Protocol (TCP/IP), and most networking applications are based on a straightforward application programming interface (API) to the protocol known as Berkeley sockets. The success of TCP/IP is due partly to the ubiquity of the sockets API, which is available for all major languages including C, C++, Java, BASIC, Python, COBOL, Pascal, FORTRAN, and, of course, Perl. The sockets API is similar in all these languages. There may be a lot of work involved in porting a networking application from one computer language to another, but porting the part that does the socket communications is usually the least of your problems.
For dedicated Perl programmers, the answer to the question that starts this chapter is clear—because you can! But for those who are not already members of the choir, one can make a convincing argument that not only is networking good for Perl, but Perl is good for networking.
A Language Built for Interprocess Communication
Perl was built from the ground up to make it easy to do interprocess communication (the thing that happens when one program talks to another). As we shall see later in this chapter, in Perl there is very little difference between opening up a local file for reading and opening up a communications channel to read data from another local program. With only a little more work, you can open up a socket to read data from a program running remotely on another machine somewhere on the Internet. Once the communications channel is open, it matters little whether the thing at the other end is a file, a program running on the same machine, or a program running on a remote machine. Perl's input/output functions work in the same way for all three types of connections.
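To make the point concrete, here is a minimal sketch, assuming a Unix-like system, in which one subroutine reads from a local file, from a pipe to another program, and from a socket. The helper name read_lines is hypothetical, and socketpair() is used only as a local stand-in for a real network connection so that the script runs without a network.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Hypothetical helper: read every line from ANY filehandle.
# It neither knows nor cares what is at the other end.
sub read_lines {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    return @lines;
}

# 1. A local file (a temporary file that we write first).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "alpha\nbeta\n";
close $tmp_fh or die "close: $!";
open my $file, '<', $tmp_name or die "open $tmp_name: $!";
my @from_file = read_lines($file);
close $file;

# 2. A pipe from another local program (here, a child perl process).
open my $pipe, '-|', $^X, '-e', 'print "alpha\nbeta\n"'
    or die "pipe: $!";
my @from_pipe = read_lines($pipe);
close $pipe;

# 3. A socket. socketpair() stands in for a remote connection.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
print {$writer} "alpha\nbeta\n";
close $writer;                      # flush the data and signal end of file
my @from_socket = read_lines($reader);
close $reader;

print "file:   @from_file\n";
print "pipe:   @from_pipe\n";
print "socket: @from_socket\n";
```

Only the code that opens each channel differs; the reading code is shared unchanged.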
A Language Built for Text Processing
Another Perl feature that makes it good for network applications is its powerful integrated regular expression-matching and text-processing facilities. Much of the data on the Internet is text based (the Web, for instance), and a good portion of that is unpredictable, line-oriented data. Perl excels at manipulating this type of data, and is not vulnerable to the type of buffer overflow and memory overrun errors that make networking applications difficult to write (and possibly insecure) in languages like C and C++.
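As an illustration of this kind of line-oriented text handling, the following sketch uses regular expressions to pick apart the header of an HTTP-style response. The sample lines are made up for the example, and the patterns are deliberately simple; they are not a complete parser for the protocol.

```perl
use strict;
use warnings;

# Hypothetical sample input, as it might arrive from a server.
my @lines = (
    "HTTP/1.0 200 OK\r\n",
    "Content-Type: text/html\r\n",
    "Content-Length:   1024\r\n",
    "\r\n",
);

my ($status, %header);
for my $line (@lines) {
    $line =~ s/\r?\n\z//;          # strip the network line ending
    last if $line eq '';           # a blank line ends the header

    if ($line =~ m{^HTTP/(\d+\.\d+)\s+(\d{3})\s+(.*)$}) {
        # Status line: protocol version, numeric code, reason phrase.
        $status = { version => $1, code => $2, reason => $3 };
    }
    elsif ($line =~ /^([\w-]+):\s*(.*)$/) {
        # Header field: the name is case-insensitive, so store it lowercased.
        $header{lc $1} = $2;
    }
}

printf "status %s (%s)\n", $status->{code}, $status->{reason};
printf "%s: %s\n", $_, $header{$_} for sort keys %header;
```

Because the strings grow and shrink as needed, there is no fixed-size buffer to overflow, however long an incoming line turns out to be.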
An Open Source Project
Perl is an Open Source project, one of the earliest. Examining other people's source code is the best way to figure out how to do something. Not only is the source code for all of Perl's networking modules available, but the whole source tree for the interpreter itself is available for your perusal. Another benefit of Perl's openness is that the project is open to any developer who wishes to contribute to the library modules or to the interpreter source code. This means that Perl adds features very rapidly, yet is stable and relatively bug free.
The universe of third-party Perl modules is available via a distributed Web-based archive called CPAN, for Comprehensive Perl Archive Network. You can search CPAN for modules of interest, download and install them, and contribute your own modules to the archive. The preface to this book describes CPAN and how to reach it.
Object-Oriented Networking Extensions
Perl5 has object-oriented extensions, and although OO purists may express dismay over the fast and loose way in which Perl has implemented these features, it is inarguable that the OO syntax can dramatically increase the readability and maintainability of certain applications. Nowhere is this more evident than in the library modules that provide a high-level interface to networking protocols. Among many others, the IO::Socket modules provide a clean and elegant interface to Berkeley sockets; Mail::Internet provides cross-platform access to Internet mail; LWP gives you everything you need to write Web clients; and the Net::FTP and Net::Telnet modules let you write interfaces to these important protocols.
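For a taste of the object-oriented style, here is a minimal sketch using IO::Socket::INET. To stay self-contained it plays both roles in one process: it listens on the loopback address, connects to itself, and passes a single line across the connection. The message text and variable names are illustrative only, and a real server would normally run separately from its clients.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Listening socket on the loopback interface.
# Port 0 asks the operating system to pick any free port.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 1,
) or die "listen: $@";
my $port = $server->sockport;

# Client socket connected to that port.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
) or die "connect: $@";

# The server's end of the connection.
my $conn = $server->accept or die "accept: $!";

# Sockets are objects with ordinary I/O methods.
$client->print("hello over TCP\n");
$client->flush;
my $line = $conn->getline;
chomp $line;
print "server received: $line\n";

$_->close for $client, $conn, $server;
```

Each socket is created with named arguments in a single constructor call, and reading and writing are ordinary method calls on the resulting object.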
Security
Security is an important aspect of network application development, because by definition a network application allows a process running on a remote machine to affect its execution. Perl has some features that increase the security of network applications relative to other languages. Because of its dynamic memory management, Perl avoids the buffer overflows that lead to most of the security holes in C and other compiled languages. Of equal importance, Perl implements a powerful "taint" check system, enabled with the -T command-line switch, that prevents tainted data obtained from the network from being used in potentially dangerous operations such as opening files for writing and executing system commands.
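The usual way to make tainted data usable is to validate it with a regular expression and keep only the captured part. The sketch below shows that idiom. The helper name untaint_filename and its policy (short, simple file names only) are assumptions made for the example; run the script with perl -T to see taint mode in effect. A command-line argument stands in for data read from the network, since both are tainted under -T.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical policy: accept only short, simple file names.
# Under taint mode, a value captured by a regular expression is
# considered untainted, so the pattern must be strict.
sub untaint_filename {
    my ($raw) = @_;
    return $raw =~ /\A(\w[\w.-]{0,63})\z/ ? $1 : undef;
}

# Untrusted input; falls back to a literal if no argument is given.
my $raw = @ARGV ? $ARGV[0] : 'report.txt';

printf "taint mode is %s\n", ${^TAINT} ? 'on' : 'off';
printf "raw input is %s\n", tainted($raw) ? 'tainted' : 'not tainted';

my $name = untaint_filename($raw);
if (defined $name) {
    # $name is now safe to pass to open() for writing, even under -T.
    print "accepted file name: $name\n";
}
else {
    print "rejected suspicious input\n";
}
```

Input such as ../etc/passwd fails the pattern and is rejected rather than cleaned up, which is usually the safer choice.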
Performance
A last issue is performance. Because Perl is an interpreted language, Perl applications run several times more slowly than equivalent programs written in C and other compiled languages, and about on par with Java and Python. In most networking applications, however, raw performance is not the issue; the I/O bottleneck is. On I/O-bound applications Perl runs just as fast (or as slowly) as a compiled program. In fact, it's possible for the performance of a Perl script to exceed that of a compiled program. Benchmark results for a simple Perl-based Web server that we develop in Chapter 12 are several times better than those for the C-based Apache Web server.
If execution speed does become an issue, Perl provides a facility for rewriting time-critical portions of your application in C, using the XS extension system. Or you can treat Perl as a prototyping language, and implement the real application in C or C++ after you've worked out the architectural and protocol details.