3.2 Internet Telephony
One of the application areas attracting the most attention is Internet telephony. The global telephone network is increasingly connected to the Internet; this connectivity provides signaling channels for phone switches, data channels for actual voice calls, and new customer functions, especially ones that involve both the Internet and the phone network.
Two main protocols are used for voice calls: the Session Initiation Protocol (SIP) [Rosenberg et al., 2002] and H.323. Both can do far more than set up simple phone calls. At a minimum, they can set up conferences (Microsoft's NetMeeting can use both protocols); SIP is also the basis for some Internet/telephone network interactions, and for some instant messaging protocols.
3.2.1 H.323
H.323 is the ITU's Internet telephony protocol. In an effort to get things on the air quickly, the ITU based its design on Q.931, the ISDN signaling protocol. But this has added greatly to the complexity, which is only partially offset by the existence of real ISDN stacks.
The actual call traffic is carried over separate UDP ports. In a firewalled world, this means that the firewall has to parse the ASN.1 messages (see Section 3.6) to figure out what port numbers should be allowed in. This isn't an easy task, and we worry about the complexity of any firewall that is trying to perform it.
H.323 calls are not point-to-point. At least one intermediate server (a telephone company?) is needed; depending on the configuration and the options used, many more may be employed.
3.2.2 SIP
SIP, though rather complex, is significantly simpler than H.323. Its messages are ASCII; they resemble HTTP, and even use MIME and S/MIME for transporting data.
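To illustrate how closely a SIP message resembles HTTP, here is a minimal sketch of a parser for a SIP request. The sample INVITE is hand-written for illustration, with hypothetical addresses and tags, and the function name parse_sip_request is our own. The parser is deliberately naive: it assumes one header per line and ignores repeated headers, compact header forms, and line folding, all of which a real implementation would have to handle.

```python
# A hypothetical, hand-written SIP request, for illustration only.
SAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP client.example.org:5060;branch=z9hG4bK74bf9\r\n"
    "From: Alice <sip:alice@example.org>;tag=9fxced76sl\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "Call-ID: 3848276298220188511@client.example.org\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_sip_request(text):
    """Split a SIP request into (method, uri, version, headers, body).

    Simplifying assumption: each header occupies exactly one line and
    appears at most once. Header names are lowercased for lookup.
    """
    # As in HTTP, a blank line separates the headers from the body.
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Request line: METHOD SP Request-URI SP SIP-Version
    method, uri, version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return method, uri, version, headers, body


if __name__ == "__main__":
    method, uri, version, headers, body = parse_sip_request(SAMPLE_INVITE)
    print(method, uri, version)
    print(headers["content-type"])
```

That a few lines of string handling suffice here is the point: compare this with the ASN.1 decoding an H.323-aware device must perform.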
SIP phones can speak peer-to-peer; however, they can also employ the same sorts of proxies as H.323. Generally, in fact, this will be done. Such proxies can simplify the process of passing SIP through a firewall, though the actual data transport is usually direct between the two (or more) endpoints. SIP also has provisions for very strong security; perhaps too strong, in some cases, as it can interfere with attempts by the firewall to rewrite the messages to make it easier to pass the voice traffic via an application-level gateway.
Some data can be carried in the SIP messages themselves, but as a rule, the actual voice traffic uses a separate transport. This can be UDP (probably carrying the Real-Time Transport Protocol, RTP), TCP, or SCTP.
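Because the voice traffic travels separately, a firewall or proxy that wants to admit it must learn the address and port from the signaling messages. In practice, a SIP message body usually describes the media session using the Session Description Protocol (SDP), which the text above does not name; we mention it here only to make the example concrete. The following sketch, with a hand-written sample body and a hypothetical function name media_endpoints, pulls out the fields such a device would need. It handles only the "c=" (connection) and "m=" (media) lines and ignores everything else in the description.

```python
# A hypothetical SDP body, for illustration only.
SAMPLE_SDP = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)


def media_endpoints(sdp):
    """Return one dict per media stream: media, address, port, transport.

    Simplifying assumptions: only "c=" and "m=" lines are examined, and
    a "c=" line that follows an "m=" line applies to that stream only.
    """
    session_addr = None  # address from a session-level "c=" line, if any
    media = []
    for line in sdp.splitlines():
        kind, _, value = line.partition("=")
        if kind == "c":
            # c=<nettype> <addrtype> <connection-address>
            addr = value.split()[2]
            if media:
                media[-1]["address"] = addr  # media-level override
            else:
                session_addr = addr
        elif kind == "m":
            # m=<media> <port>[/<count>] <transport> <formats...>
            name, port, transport = value.split()[:3]
            media.append({
                "media": name,
                "address": session_addr,
                "port": int(port.split("/")[0]),
                "transport": transport,
            })
    return media


if __name__ == "__main__":
    for stream in media_endpoints(SAMPLE_SDP):
        print(stream)
```

If the message body is encrypted end to end, an intermediate device cannot read these fields or rewrite them, which is the tension between strong security and firewall traversal noted above.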
We should note that for both H.323 and SIP, much of the complexity stems from the nature of the problem. For example, telephone users are accustomed to hearing "ringback" when they dial a number and the remote phone is ringing. Internet telephones have to do the same thing, which means that data needs to be transported even before the call is completed. Interconnection to the existing telephone network further complicates the situation.