BGP Peering
What are the mechanics of one BGP speaker peering with another speaker? What substrate protocols does BGP use to transport routing information? This section describes various aspects of BGP peering.
NOTE
While BGP is most often run on routers, which are also responsible for forwarding traffic, other devices may run BGP as well, whether simply to gather information about the routing tables being carried in BGP or to carry routing information between routers. For this reason, we will sometimes refer to devices running BGP rather than to routers specifically. A device running BGP is called a BGP speaker, and two BGP speakers that form a BGP connection for the purpose of exchanging routing information are called BGP peers or neighbors.
BGP Transport
How does BGP carry information about reachable destinations between the devices (routers) running BGP? How is the information encoded when it's transported between peers?
Transporting Data Between Peers
A Transmission Control Protocol (TCP) transport connection is set up between a pair of BGP speakers at the beginning of the peering session, and is maintained throughout the peering session. Using TCP to transport BGP information allows BGP to delegate error control, reliable transport, sequencing, retransmission, and peer aliveness issues to TCP itself, and focus instead on properly processing the routing information exchanged with its peers.
When a BGP speaker first initializes, it uses a local ephemeral TCP port (a random port number greater than 1024) and attempts to contact each configured BGP speaker on TCP port 179 (the well-known BGP port). The speaker initiating the session performs an active open, while the peer performs a passive open. It's possible for two speakers to attempt to connect to one another at the same time; this is known as a connection collision. When two speakers collide, each speaker compares the local router ID to the router ID of the colliding neighbor. The BGP speaker with the higher router ID value drops the session on which it is passive, and the BGP speaker with the lower router ID value drops the session on which it is active (i.e., only the session initiated by the BGP speaker with the larger router ID value is preserved).
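The collision rule above can be sketched in a few lines of Python. This is a minimal illustration rather than any implementation's actual logic; the function name connection_to_keep is hypothetical, and the sketch assumes router IDs are written in dotted-quad form and compared as unsigned 32-bit integers.

```python
import ipaddress


def connection_to_keep(local_router_id: str, peer_router_id: str) -> str:
    """Decide which of two colliding connections the local speaker keeps.

    Returns "active" for the connection the local speaker initiated,
    or "passive" for the connection the peer initiated.
    """
    # Compare router IDs numerically, not as strings.
    local = int(ipaddress.IPv4Address(local_router_id))
    peer = int(ipaddress.IPv4Address(peer_router_id))
    if local == peer:
        raise ValueError("router IDs must differ between two peers")
    # Only the session initiated by the higher router ID is preserved.
    return "active" if local > peer else "passive"
```

Both speakers reach the same conclusion independently: the speaker with the higher router ID keeps its active session, and the speaker with the lower router ID keeps its passive one, which is the same TCP connection.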
BGP Routes and Formatting Data
A BGP route is defined as a unit of information that pairs a set of destinations with the attributes of a path to those destinations. The set of destinations is referred to by BGP as the Network Layer Reachability Information (NLRI), and is a set of systems whose IP addresses are represented by one IP prefix.
BGP uses update messages to advertise new routing information, withdraw previously advertised routes, or both. New routing information includes a set of BGP attributes and one or more prefixes with which those attributes are associated. While multiple routes with a common set of attributes can be advertised in a single BGP update message, new routes with different attributes must be advertised in separate update messages.
There are two mechanisms for withdrawing routing information in BGP. To withdraw routes explicitly, one or more prefixes that are no longer reachable (unfeasible) are included in the withdrawn routes field of an update message (the update message may contain one or more new routes as well). No additional information, such as associated path attributes (e.g., AS Path), is necessary for the routes being withdrawn. Alternatively, because a BGP speaker only advertises a single best route for each reachable destination, a BGP update message that contains a prefix that has already been advertised by the peer, but with a new set of path attributes, serves as an implicit withdraw for earlier advertisements of that prefix.
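The two withdrawal mechanisms can be illustrated with a small Python sketch of how a receiver might apply one update to the routes it has learned from a single peer. The function name apply_update and the dictionary-based table are assumptions made for illustration; real implementations keep considerably more state.

```python
def apply_update(rib_in, withdrawn, attributes, nlri):
    """Apply one update message to the routes learned from one peer.

    rib_in     -- dict mapping prefix -> attributes for this peer
    withdrawn  -- prefixes listed in the withdrawn routes field
    attributes -- the single attribute set carried in the update
    nlri       -- prefixes advertised with that attribute set
    """
    # Explicit withdraw: remove the prefix; no attributes are needed.
    for prefix in withdrawn:
        rib_in.pop(prefix, None)
    # Implicit withdraw: a re-advertised prefix simply replaces the
    # earlier route, since a peer advertises one best route per prefix.
    for prefix in nlri:
        rib_in[prefix] = attributes
    return rib_in
```

A single call can both remove some prefixes and replace others, mirroring an update message that carries withdrawn routes and new routes together.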
A BGP update message is made up of a series of type-length-values (TLVs). Attributes carried within the BGP message provide information about one or more prefixes that follow; attributes are described in the BGP Attributes section later in this chapter.
BGP data, as it's transported between peers, is formatted as shown in Figure 1-5.
Figure 1-5: Encoding Information in a BGP Packet
As previously noted, one interesting aspect of this packet format is that while only a single set of attributes may be carried in each update message, many prefixes sharing that common set of attributes may be carried in a single update. This leads to the concept of update packing, which simply means placing two or more prefixes with the same attributes in a single BGP update message.
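Update packing amounts to grouping prefixes by their attribute set. The following Python sketch shows the idea; the name pack_updates is hypothetical, attributes are represented as hashable tuples purely for convenience, and the sketch ignores the maximum message size, which in practice limits how many prefixes fit in one update.

```python
from collections import defaultdict


def pack_updates(routes):
    """Group prefixes sharing identical attributes into one update each.

    routes -- dict mapping prefix -> attributes (a hashable tuple)
    Returns a list of (attributes, [prefixes]) pairs, one per update.
    """
    packed = defaultdict(list)
    for prefix, attributes in routes.items():
        # Prefixes with the same attribute set share one update message.
        packed[attributes].append(prefix)
    return list(packed.items())
```

Three routes, two of which share their attributes, therefore produce two update messages rather than three.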
Interior and Exterior Peering
Beyond the mechanics of building peering relationships and transporting data between two BGP speakers, there are two types of peering relationships within BGP: interior peering and exterior peering. BGP sessions between peers within a single autonomous system are referred to as interior BGP, or iBGP, sessions, while BGP sessions between peers in different autonomous systems are referred to as exterior BGP, or eBGP, sessions.
There are four primary differences between iBGP and eBGP peering relationships:
Routes learned from an iBGP peer are not (normally) advertised to other iBGP peers. This prevents routing loops within the autonomous system, as discussed in the section BGP Path Vector Implementation, above.
The attributes of paths learned from iBGP peers are not (normally) changed to impact the path selected to reach some outside network. The best path chosen throughout the autonomous system must be consistent to prevent routing loops within the network.
The AS Path is not manipulated when advertising a route to an iBGP peer; the local AS is added to the AS Path only when advertising a route to an eBGP peer.
The BGP next hop is normally not changed when advertising a route to an iBGP peer; it is always changed to the local peer termination IP address when a route is being advertised to an eBGP peer.
These last two points, the handling of the BGP next hop (normally changed when advertising a route to an eBGP peer, left unchanged when advertising a route to an iBGP peer) and the addition of the local autonomous system to the AS Path, are illustrated using Figure 1-6.
Figure 1-6: eBGP and iBGP peering
In Figure 1-6, the 10.1.1.0/24 prefix originates on router A with an empty AS Path list and a BGP next hop of router A. Router A then advertises this prefix to router B. Router B, when advertising the route to router C, adds AS65100 to the AS Path list and sets the BGP next hop to 10.1.3.1, because router C is an exterior peer (a peer outside the autonomous system). Router C then advertises the 10.1.1.0/24 prefix to router D without changing the AS Path or the BGP next hop, since router D is an interior peer (a peer within the same autonomous system). Router D will need a path to router B in order to consider this prefix reachable; generally, the BGP next hop reachability information is provided by advertising the link between B and C through an interior gateway protocol, or through iBGP, with router C originating the link as a prefix into AS65200.
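The walk through Figure 1-6 can be reproduced with a short Python sketch. The Route class and advertise function are hypothetical names, the sketch models only the normal behavior described above (no policy, no next-hop overrides), and router A's address of 10.1.2.1 is assumed from the addressing used on the A to B link.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]
    next_hop: str


def advertise(route, local_as, peer_as, local_address):
    """Return the route as it would be sent to a peer."""
    if peer_as == local_as:
        # iBGP peer: AS Path and next hop are left unchanged.
        return route
    # eBGP peer: prepend the local AS and set the next hop to the
    # local address used for the peering session.
    return replace(
        route,
        as_path=(local_as,) + route.as_path,
        next_hop=local_address,
    )


# Router A originates the prefix with an empty AS Path.
at_a = Route("10.1.1.0/24", (), "10.1.2.1")
at_b = advertise(at_a, 65100, 65100, "10.1.2.1")   # A -> B, iBGP
at_c = advertise(at_b, 65100, 65200, "10.1.3.1")   # B -> C, eBGP
at_d = advertise(at_c, 65200, 65200, "10.1.4.1")   # C -> D, iBGP
```

Router D ends up holding the route with an AS Path of 65100 and a next hop of 10.1.3.1, which is why it needs a path to the link between B and C.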
All BGP peers are connected over a TCP transport session. As such, IP reachability must exist before a pair of BGP speakers can peer with one another. For iBGP sessions, reachability between speakers is typically provided using an interior gateway protocol. eBGP peers are normally directly connected over a single hop (across a single link), with no intervening routers, and therefore require no additional underlying routing information. There are mechanisms for connecting eBGP peers across multiple hops; these are covered in more detail in the Multipath section of Chapter 7.
Converting an understanding of BGP into practical, running configurations isn't always as easy as it seems, so we will often provide sample configurations for networks used as examples. These examples will be shown using Cisco IOS Software as the operating system. For the network in Figure 1-6, the following configurations, along with some explanation of the various parts of the configuration, are provided.
!
hostname router-a
!
router bgp 65100
 ! enables the BGP process and defines the local AS number
 network 10.1.1.0 mask 255.255.255.0
 ! the above line causes router-a to originate the 10.1.1.0/24
 ! prefix in BGP
 neighbor 10.1.2.2 remote-as 65100
 ! configures an iBGP session with router-b
!
hostname router-b
!
router bgp 65100
 ! The number following the router bgp command above is
 ! the local autonomous system number
 neighbor 10.1.2.1 remote-as 65100
 ! configures an iBGP session with router-a
 neighbor 10.1.3.2 remote-as 65200
 ! configures an eBGP session with router-c; note the AS
 ! number in this command does not match the AS number of
 ! the local router
!
hostname router-c
!
router bgp 65200
 neighbor 10.1.3.1 remote-as 65100
 ! configures an eBGP session with router-b; note the AS
 ! number in this command does not match the AS number of
 ! the local router
 neighbor 10.1.4.2 remote-as 65200
 ! configures an iBGP session with router-d
 network 10.1.3.0 mask 255.255.255.0
 ! configures this router to advertise the 10.1.3.0/24
 ! prefix to router-d, so router-d will be able to reach the
 ! BGP nexthop towards 10.1.1.0/24; reachability could also
 ! be provided through an interior gateway protocol or static
 ! routing
!
hostname router-d
!
router bgp 65200
 neighbor 10.1.4.1 remote-as 65200
 ! configures an iBGP session with router-c
With these configurations in place, router D should learn the 10.1.1.0/24 prefix from router C, and install it as a reachable destination within its routing table.
BGP Notifications
Throughout the duration of a BGP session between two BGP speakers it's possible that one of the two peers will send some data in error, or send malformed data, or data the other speaker doesn't understand. The easiest remedy in any of these situations is to simply shut the BGP session down, but a simple session shutdown doesn't provide any diagnostic information to the speaker that transmitted the information that triggered the peering session to shut down, and therefore no corrective action can be taken. To provide the information needed to take corrective action, BGP includes Notifications, which should be sent by the BGP speaker closing the session.
Notifications consist of three parts:
A notification code
A notification subcode
A variable length data field
The notification code indicates what type of error occurred:
An error occurred in a message header, error code 1
An error occurred in the Open message, error code 2
An error occurred in an Update message, error code 3
The hold timer expired, error code 4
An error occurred in the finite state machine, error code 5
Cease, error code 6
The subcode provides more information about the error, for instance, where in the Open message the error occurred. The BGP speaker transmitting the Notification can fill in the data field with information such as the actual part of the Open message causing the error. While the data field is variable in length, there is no length field in the Notification message format, because the length of the data field is implied by the length of the complete message.
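The way the data field's length falls out of the overall message length can be seen in a small Python sketch. It assumes the standard BGP message header from the base specification (a 16-octet marker of all ones, a 2-octet length, and a 1-octet type, with type 3 indicating a Notification); the function names are hypothetical and the sketch performs only minimal validation.

```python
import struct

HEADER_LENGTH = 19            # 16-octet marker + 2-octet length + 1-octet type
MARKER = b"\xff" * 16
NOTIFICATION = 3              # message type for a Notification

ERROR_CODES = {
    1: "Message Header Error",
    2: "Open Message Error",
    3: "Update Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}


def build_notification(code, subcode, data=b""):
    """Encode a Notification: header, code, subcode, then raw data."""
    length = HEADER_LENGTH + 2 + len(data)
    header = MARKER + struct.pack("!HB", length, NOTIFICATION)
    return header + struct.pack("!BB", code, subcode) + data


def parse_notification(message):
    """Decode a Notification into (code, subcode, data)."""
    length, message_type = struct.unpack("!HB", message[16:HEADER_LENGTH])
    if message_type != NOTIFICATION or length != len(message):
        raise ValueError("not a well-formed Notification message")
    code, subcode = message[19], message[20]
    # No explicit data length: everything after the subcode is data.
    return code, subcode, message[21:length]
```

Round-tripping a message through build_notification and parse_notification recovers the data field without it ever carrying its own length.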
Message Header Errors
Message header errors generally indicate problems in the packet format. Since TCP is a reliable transport service, message header errors should be very rare, although it is possible for an implementation of BGP to malform a packet, causing this type of error. Three subcodes are defined in the base BGP specification:
Connection not synchronized
Bad message length
Bad message type
Open Message Errors
Notifications transmitted while two BGP peers are opening a session are generally the result of misconfiguration, rather than packet level errors or problems in a BGP implementation. The subcodes defined for Open message errors are:
Unsupported version number, which means the BGP peer has transmitted a BGP version this speaker does not support
Bad peer autonomous system; the peer has claimed an autonomous system number which isn't valid
Bad BGP Identifier; the peer has transmitted a BGP router ID which is invalid
Unsupported optional parameter; the peer has indicated it wants to use some optional parameter the receiver doesn't support
Authentication failure; the peer is sending packets which are encrypted or authenticated in some way, but the authentication check is failing
Unacceptable hold time
Update Message Errors
As BGP peers exchange updates, a number of errors can occur which make it impossible for one speaker to process an update transmitted by the other speaker. These include:
Malformed attribute list; the list of attributes included in the update packet has some error which makes it unreadable by the receiver
Unrecognized well-known attribute; the sender is including an attribute the receiver must be able to process, but does not recognize
Missing well-known attribute; the sender is not including a required well known attribute
Attribute flags error; the flags included with an attribute are not formed correctly (generally flags carry various options which apply to the attribute)
Attribute length error; an attribute is either too long or too short
Invalid Origin; the origin code attribute is set to an invalid value
Invalid Next Hop; the Next Hop attribute is set to an invalid value
Optional attribute error; an optional attribute is malformed
Invalid network field; a prefix included in the update is invalid
Malformed AS Path; the AS Path included in the update is invalid
Cease
The Cease code indicates to the receiver that the peer has, for some reason, chosen to close the BGP connection. The Cease Notification is not sent if a fatal error occurs; rather, it provides a graceful mechanism to shut down a BGP connection.
BGP Capabilities
Various extensions to BGP require both BGP speakers in a session to support them in order to function correctly. How does a BGP speaker know whether the BGP speaker it's peering with supports these extensions? Through BGP capabilities, which are negotiated when a BGP session is started.
NOTE
The ability for one BGP speaker to advertise capabilities to a peer BGP speaker is described in RFC 3392, Capabilities Advertisement with BGP-4. The Internet draft draft-ietf-idr-dynamic-cap describes a way in which these capabilities can be advertised dynamically, not only on session startup but also after a session is established.
When first initiating a session, a BGP speaker sends an Open message describing various parameters, including a set of capability codes, one for each optional capability it supports. Capability codes are defined for things such as:
Route refresh, capability code 2
Multiprotocol extensions, capability code 1
Cooperative route filtering, capability code 3
Dynamic capability exchange, capability code 6
Graceful restart, capability code 64
Four octet autonomous system numbers
The applicability and value of these and other BGP capabilities and extensions will be discussed in later sections.
If a BGP speaker receives a capability code it does not support when enabling a peering with another BGP speaker, it will send a Notification message to its peer, which shuts the session down, with a notification subcode indicating that the peer requested a capability the local BGP speaker doesn't support. The receiving peer can either break off communications altogether on receipt of a notification code indicating an unsupported capability, or it can attempt to peer again without that capability enabled.
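The exchange described above can be sketched in Python as follows. This is a simplified model of the behavior as described in this section, not a full implementation: the function names are hypothetical, the Unsupported Capability subcode value of 7 is taken from RFC 3392, and the Notification data field is reduced to a bare list of capability codes, whereas a real message carries the complete capability entries.

```python
OPEN_MESSAGE_ERROR = 2        # notification code for Open message errors
UNSUPPORTED_CAPABILITY = 7    # subcode assumed from RFC 3392


def check_capabilities(received, supported):
    """Check the capability codes received in a peer's Open message.

    Returns None if every code is supported, otherwise a
    (code, subcode, data) tuple describing the Notification to send.
    """
    unsupported = [code for code in received if code not in supported]
    if not unsupported:
        return None
    # Simplification: the data field lists only the offending codes.
    return (OPEN_MESSAGE_ERROR, UNSUPPORTED_CAPABILITY, bytes(unsupported))


def retry_offer(offered, notification):
    """Build the capability list for a second attempt, or None to give up."""
    code, subcode, data = notification
    if (code, subcode) != (OPEN_MESSAGE_ERROR, UNSUPPORTED_CAPABILITY):
        return None
    rejected = set(data)
    # Peer again without the capabilities the other side rejected.
    return [capability for capability in offered if capability not in rejected]
```

A speaker offering multiprotocol extensions, route refresh, and graceful restart to a peer that lacks graceful restart would, under this model, receive a Notification naming capability code 64 and could retry with the remaining two.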
The BGP Peering Process
There are a lot of elements to the BGP peering process; when a BGP speaker begins a session with a new peer, it must determine if it is peering with an external neighbor or an internal neighbor, it must negotiate capabilities, and do a number of other things. The BGP session state machine in Figure 1-7 illustrates the process in an attempt to bring all these different actions together in one place.
Figure 1-7: The BGP Peering State Machine