
Unequal-bandwidth EBGP load balancing

EIGRP was always described as the only routing protocol that can do unequal-cost load sharing. As it turns out, BGP is another one (although it's way more limited than EIGRP). For example, if you have two links into a neighbor AS, you can load-share across them proportionally to their bandwidth. You can find all the details, sample configurations and router printouts in the CT3 wiki.

Eternal question: unequal cost load-balancing

It's amazing how many people have load-balancing-related issues. I get asked the same question over and over:

Is it possible to have unequal-cost load balancing with OSPF?

The answer is invariably: NO, you cannot do it with OSPF. However, you can use MPLS Traffic Engineering to establish two tunnels to the same remote OSPF router. Both tunnels will be used for all destinations reachable through the remote OSPF router (tunnel tail-end), even though OSPF selects only a single best path to it. A simple scenario is described in my IP Corner article “Perfect Load-Balancing: How Close Can You Get?”

This article is part of the You've asked for it series.

Running BGP across parallel serial links

Whenever I'm describing the idea of running BGP across parallel serial links with duplicate IP addresses (like I did in the November IP Corner article, Load Balancing in BGP Networks, section External BGP Load Balancing), there's always someone asking “does it really work?” … so I'm enclosing a tested working configuration.

Router in AS 11:
interface Serial1/1
 ip address 10.0.1.9 255.255.255.252
 encapsulation ppp
!
interface Serial1/2
 ip address 10.0.1.9 255.255.255.252
 encapsulation ppp
!
router bgp 11
 bgp log-neighbor-changes
 neighbor 10.0.1.10 remote-as 12

Router in AS 12:
interface Serial1/1
 ip address 10.0.1.10 255.255.255.252
 encapsulation ppp
!
interface Serial1/2
 ip address 10.0.1.10 255.255.255.252
 encapsulation ppp
!
router bgp 12
 bgp log-neighbor-changes
 network 172.16.0.0
 neighbor 10.0.1.9 remote-as 11
!
ip route 172.16.0.0 255.255.0.0 Null0
Here are a few printouts. First the BGP neighbors …
AS11#show ip bgp summary | begin Neighbor
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.0.1.10 4 12 13 12 2 0 0 00:09:02 1
… then the BGP routing table …
R2#show ip bgp | begin Network
   Network Next Hop Metric LocPrf Weight Path
*> 172.16.0.0 10.0.1.10 0 0 12 i
… and finally the internal details of the CEF entry (that's the only way to actually verify that the load balancing is taking place):
AS11#show ip cef 172.16.0.0 internal
172.16.0.0/16, version 35, epoch 0, per-destination sharing
0 packets, 0 bytes
  tag information from 10.0.1.10/32, shared
    local tag: 17
  via 10.0.1.10, 0 dependencies, recursive
    next hop 10.0.1.10, Serial1/1 via 10.0.1.10/32
    valid adjacency
    tag rewrite with Se1/1, point2point, tags imposed: {}
 
  Recursive load sharing using 10.0.1.10/32
  Load distribution: 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 (refcount 2)
 
  Hash OK Interface Address Packets Tags imposed
  1 Y Serial1/1 point2point 0 none
  2 Y Serial1/2 point2point 0 none
  3 Y Serial1/1 point2point 0 none
  4 Y Serial1/2 point2point 0 none
  5 Y Serial1/1 point2point 0 none
  6 Y Serial1/2 point2point 0 none
  7 Y Serial1/1 point2point 0 none
  8 Y Serial1/2 point2point 0 none
  9 Y Serial1/1 point2point 0 none
  10 Y Serial1/2 point2point 0 none
  11 Y Serial1/1 point2point 0 none
  12 Y Serial1/2 point2point 0 none
  13 Y Serial1/1 point2point 0 none
  14 Y Serial1/2 point2point 0 none
  15 Y Serial1/1 point2point 0 none
  16 Y Serial1/2 point2point 0 none
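The hash table rows are the part of this printout worth checking. If you collect such printouts from many routers, a few lines of Python can tally how many of the 16 buckets point to each interface. This is a minimal sketch: the function name `bucket_shares` is hypothetical, and the column layout (bucket number, OK flag, interface, ...) is assumed from the printouts in this post, so other IOS releases might format the table differently.

```python
from collections import Counter
from fractions import Fraction


def bucket_shares(hash_table_text: str) -> dict:
    """Return the fraction of CEF hash buckets that point to each interface.

    Expects the rows printed below the "Hash OK Interface ..." header of
    "show ip cef <prefix> internal", for example
    "1 Y Serial1/1 point2point 0 none" (layout assumed from this post).
    """
    counts = Counter()
    for line in hash_table_text.splitlines():
        fields = line.split()
        # Data rows start with the bucket number; the header and blank lines don't.
        if len(fields) >= 3 and fields[0].isdigit():
            counts[fields[2]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no hash table rows found")
    return {interface: Fraction(n, total) for interface, n in counts.items()}


SAMPLE = """
Hash OK Interface Address Packets Tags imposed
1 Y Serial1/1 point2point 0 none
2 Y Serial1/2 point2point 0 none
3 Y Serial1/1 point2point 0 none
4 Y Serial1/2 point2point 0 none
"""

shares = bucket_shares(SAMPLE)
print({interface: str(share) for interface, share in shares.items()})
# {'Serial1/1': '1/2', 'Serial1/2': '1/2'}
```

Counting rows rather than trusting the "Load distribution" line keeps the check independent of how the path indexes are numbered.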

Load balancing with BGP

A while ago, people believed you cannot do load balancing with BGP (they also believed the Earth was flat a few years before that). While that's no longer true, designing good BGP load balancing is still a complex undertaking. In the November IP Corner article, Load Balancing in BGP Networks, I describe almost all the options you have to implement BGP-based load balancing, both within your autonomous system and across an AS boundary.

Unequal load split with static routes

Unequal load-sharing with static routes is almost impossible, as there is no configuration command to assign a non-default traffic share count to a static route. For example, if you configure two default routes, one pointing to a low-speed interface and another one pointing to a high-speed interface, there is no mechanism to force the majority of the traffic onto the high-speed link (IOS ignores interface bandwidth when calculating load-sharing ratios).

You can, however, use a workaround: if you configure multiple routes for the same prefix pointing to the same interface, that interface will attract proportionally more outbound traffic.

For example, let's assume you have two point-to-point serial subinterfaces, one three times as fast as the other:

interface Serial0/0/0.100 point-to-point
 bandwidth 1000
 ip address 172.16.1.1 255.255.255.252
!
interface Serial0/0/0.200 point-to-point
 ip address 172.16.1.5 255.255.255.252
 bandwidth 3000

To shift more traffic onto Serial0/0/0.200, you can create two default routes pointing to it (in addition to the one pointing to Serial0/0/0.100): one pointing to the interface itself, the other one to the next-hop router:

ip route 0.0.0.0 0.0.0.0 Serial0/0/0.100
ip route 0.0.0.0 0.0.0.0 Serial0/0/0.200
ip route 0.0.0.0 0.0.0.0 172.16.1.6

This setup gives you a 1:2 sharing ratio. To shift even more traffic to the higher-speed interface, you have to get more creative:

  • Create a bogus host route for a bogus next-hop pointing to the actual next-hop router (and make sure you don't advertise the bogus route into your routing protocols).
  • Configure yet another static route pointing to the bogus next-hop. Due to recursive lookup done by Cisco IOS, the bogus next-hop will be resolved into the actual next-hop IP address.

In our example, you could use:

ip route 10.255.255.1 255.255.255.255 172.16.1.6
ip route 0.0.0.0 0.0.0.0 10.255.255.1

The results are as expected: the traffic split is the desired 1:3 ratio.

a1#show ip route 0.0.0.0 0.0.0.0
Routing entry for 0.0.0.0 0.0.0.0, supernet
Known via "static", distance 1, metric 0 (connected), candidate default path
Routing Descriptor Blocks:
172.16.1.6
Route metric is 0, traffic share count is 1
10.255.255.1
Route metric is 0, traffic share count is 1
* directly connected, via Serial0/0/0.100
Route metric is 0, traffic share count is 1
directly connected, via Serial0/0/0.200
Route metric is 0, traffic share count is 1

a1#show ip cef 0.0.0.0 0.0.0.0 internal
0.0.0.0/0, version 43, epoch 0, attached, per-destination sharing
0 packets, 0 bytes
via 172.16.1.6, 0 dependencies, recursive
traffic share 1
valid adjacency
via 10.255.255.1, 0 dependencies, recursive
traffic share 1
valid adjacency
via Serial0/0/0.100, 0 dependencies
traffic share 1
valid adjacency
via Serial0/0/0.200, 0 dependencies
traffic share 1
valid adjacency

0 packets, 0 bytes switched through the prefix
tmstats: external 0 packets, 0 bytes
internal 0 packets, 0 bytes
Load distribution: 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 (refcount 1)

Hash OK Interface Address Packets
1 Y Serial0/0/0.200 point2point 0
2 Y Serial0/0/0.200 point2point 0
3 Y Serial0/0/0.100 point2point 0
4 Y Serial0/0/0.200 point2point 0
5 Y Serial0/0/0.200 point2point 0
6 Y Serial0/0/0.200 point2point 0
7 Y Serial0/0/0.100 point2point 0
8 Y Serial0/0/0.200 point2point 0
9 Y Serial0/0/0.200 point2point 0
10 Y Serial0/0/0.200 point2point 0
11 Y Serial0/0/0.100 point2point 0
12 Y Serial0/0/0.200 point2point 0
13 Y Serial0/0/0.200 point2point 0
14 Y Serial0/0/0.200 point2point 0
15 Y Serial0/0/0.100 point2point 0
16 Y Serial0/0/0.200 point2point 0
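Why 12 of the 16 buckets end up on Serial0/0/0.200 becomes obvious once you resolve every default route to its outgoing interface: each static route is one equal-cost path, and three of the four paths leave through Serial0/0/0.200. The sketch below models that recursive lookup with plain dictionaries. The table and function names are hypothetical, only the addresses come from the configuration above, and it assumes every path gets the same number of hash buckets (which holds here, because four paths divide 16 evenly).

```python
from collections import Counter

# Outgoing interfaces and the next hops reachable over their /30 subnets
# (taken from the interface configuration above).
INTERFACES = {"Serial0/0/0.100", "Serial0/0/0.200"}
CONNECTED_NEXT_HOPS = {"172.16.1.2": "Serial0/0/0.100", "172.16.1.6": "Serial0/0/0.200"}

# Static host route for the bogus next hop:
# ip route 10.255.255.1 255.255.255.255 172.16.1.6
HOST_ROUTES = {"10.255.255.1": "172.16.1.6"}

# Next hops of the four "ip route 0.0.0.0 0.0.0.0 ..." commands.
DEFAULT_ROUTES = ["Serial0/0/0.100", "Serial0/0/0.200", "172.16.1.6", "10.255.255.1"]


def resolve(next_hop: str, max_depth: int = 8) -> str:
    """Follow a static route's next hop down to an outgoing interface (recursive lookup)."""
    for _ in range(max_depth):
        if next_hop in INTERFACES:
            return next_hop
        if next_hop in CONNECTED_NEXT_HOPS:
            return CONNECTED_NEXT_HOPS[next_hop]
        if next_hop in HOST_ROUTES:
            next_hop = HOST_ROUTES[next_hop]
            continue
        raise LookupError(f"no route to next hop {next_hop}")
    raise LookupError(f"recursive lookup too deep for {next_hop}")


def paths_per_interface(default_routes: list) -> Counter:
    """Each static route becomes one equal-cost path; count the paths per interface."""
    return Counter(resolve(next_hop) for next_hop in default_routes)


print(paths_per_interface(DEFAULT_ROUTES[:3]))  # the first three routes: 1:2 setup
print(paths_per_interface(DEFAULT_ROUTES))      # with the bogus next hop: 1:3 setup
```

The bogus host route never shows up as an interface of its own; it only adds one more path that resolves to Serial0/0/0.200, which is exactly what the routing table printout shows.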

Note: this article is part of the You've asked for it series.

Unequal cost load-sharing

One of the most commonly asked load-sharing-related questions is "can I load-share traffic across unequal-cost links". In general, the answer is no. In order to load-share the traffic, you need more than one path to the destination and the only way to get multiple routes toward a destination in the IP routing table is to make them equal-cost (the only notable exception being EIGRP that supports unequal-cost load-sharing with the variance parameter).

There are, however, two cases where you can force unequal traffic split across equal-cost paths toward a destination: when using inter-AS BGP with the link bandwidth parameter and when using unequal-bandwidth traffic-engineering tunnels.

Note: You can read more about load sharing with MPLS TE in my IP Corner article Perfect Load-Balancing: How Close Can You Get?

Due to the way MPLS TE autoroute is implemented in Cisco IOS, all tunnels toward the same destination appear as equal-cost paths, even when their TE bandwidths are not the same. For example, using a simple TE configuration ...

interface Tunnel0
 ip unnumbered Loopback0
 tunnel destination 172.16.0.21
 tunnel mode mpls traffic-eng
 tunnel mpls traffic-eng autoroute announce
 tunnel mpls traffic-eng priority 7 7
 tunnel mpls traffic-eng bandwidth 300
 tunnel mpls traffic-eng path-option 1 explicit identifier 1
 no routing dynamic
!
interface Tunnel1
 ip unnumbered Loopback0
 tunnel destination 172.16.0.21
 tunnel mode mpls traffic-eng
 tunnel mpls traffic-eng autoroute announce
 tunnel mpls traffic-eng priority 7 7
 tunnel mpls traffic-eng bandwidth 500
 tunnel mpls traffic-eng path-option 1 explicit identifier 2
 no routing dynamic
... you get two equal-cost paths in your IP routing table even though the tunnel mpls traffic-eng bandwidths are different:
a1#show ip route ospf
172.16.0.0 255.255.0.0 is variably subnetted, 6 subnets, 2 masks
O 172.16.0.21 255.255.255.255 [110/51] via 0.0.0.0, 00:11:06, Tunnel0
[110/51] via 0.0.0.0, 00:11:06, Tunnel1
O 172.16.0.22 255.255.255.255 [110/52] via 0.0.0.0, 00:11:06, Tunnel0
[110/52] via 0.0.0.0, 00:11:06, Tunnel1
When transferring the IP routing table into the CEF table, the router takes the MPLS TE bandwidth into consideration, resulting in an unequal traffic split proportional to the MPLS TE bandwidth:
a1#show ip cef 172.16.0.21 internal
172.16.0.21/32, version 55, epoch 1, per-destination sharing
0 packets, 0 bytes
tag information set
local tag: tunnel-head
via 0.0.0.0, Tunnel0, 0 dependencies
traffic share 3
next hop 0.0.0.0, Tunnel0
valid adjacency
tag rewrite with Tu0, point2point, tags imposed: {}
via 0.0.0.0, Tunnel1, 0 dependencies
traffic share 5
next hop 0.0.0.0, Tunnel1
valid adjacency
tag rewrite with Tu1, point2point, tags imposed: {}

0 packets, 0 bytes switched through the prefix
tmstats: external 0 packets, 0 bytes
internal 0 packets, 0 bytes
Load distribution: 0 1 0 1 0 1 0 1 0 1 0 1 1 1 1 1 (refcount 1)

Hash OK Interface Address Packets Tags imposed
1 Y Tunnel0 point2point 0 {}
2 Y Tunnel1 point2point 0 {}
3 Y Tunnel0 point2point 0 {}
4 Y Tunnel1 point2point 0 {}
5 Y Tunnel0 point2point 0 {}
6 Y Tunnel1 point2point 0 {}
7 Y Tunnel0 point2point 0 {}
8 Y Tunnel1 point2point 0 {}
9 Y Tunnel0 point2point 0 {}
10 Y Tunnel1 point2point 0 {}
11 Y Tunnel0 point2point 0 {}
12 Y Tunnel1 point2point 0 {}
13 Y Tunnel1 point2point 0 {}
14 Y Tunnel1 point2point 0 {}
15 Y Tunnel1 point2point 0 {}
16 Y Tunnel1 point2point 0 {}
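The interleaved pattern in the Load distribution line (six rows for Tunnel0, ten for Tunnel1, which is 3:5 scaled to 16 rows) can be reproduced with a simple model. The sketch below is a hypothetical reconstruction, not Cisco's documented algorithm: it assumes the traffic shares are the TE bandwidths divided by their greatest common divisor, that the table size is the largest multiple of the share sum that fits into 16 rows, and that rows are dealt round-robin to the paths that still have quota left. Under those assumptions it matches the Load distribution lines shown in these posts (2, 3 and 4 equal-cost paths, and the 300/500 tunnels).

```python
from functools import reduce
from math import gcd


def shares_from_bandwidth(bandwidths: list) -> list:
    """Reduce TE bandwidths to the smallest integer ratio (assumed: 300, 500 -> 3, 5)."""
    divisor = reduce(gcd, bandwidths)
    return [bw // divisor for bw in bandwidths]


def build_load_distribution(shares: list, max_rows: int = 16) -> list:
    """Hypothetical model of a CEF load-distribution table; entry i is a path index."""
    total = sum(shares)
    if not 1 <= total <= max_rows:
        raise ValueError("this model only handles share sums between 1 and max_rows")
    # e.g. 16 rows when the shares add up to 2, 4 or 8; 15 rows when they add up to 3
    scale = max_rows // total
    quota = [share * scale for share in shares]
    table = []
    while any(quota):
        # Deal one row to every path that still has rows left, in path order.
        for path in range(len(quota)):
            if quota[path]:
                table.append(path)
                quota[path] -= 1
    return table


te_table = build_load_distribution(shares_from_bandwidth([300, 500]))
print(" ".join(map(str, te_table)))  # 0 1 0 1 0 1 0 1 0 1 0 1 1 1 1 1
```

The same function with shares [1, 1, 1] yields the 15-row table described in CEF load sharing details, which is why that case loses the 16th row.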
Note: this article is part of the You've asked for it series.

Per-port CEF load sharing

In designs with a very low number of IP hosts, no per-destination load-sharing algorithm will work adequately. Consider, for example, an extranet design where a large number of IP hosts are NAT-ed to a single IP address, which then accesses a single remote server.


In this design, all the traffic flows between a single pair of IP addresses, making per-destination load-sharing unusable.

Cisco has addressed this problem in IOS release 12.4(11)T with per-port CEF load sharing, which extends the CEF hashing function to include the source and/or destination TCP or UDP ports.

The global configuration command that enables per-port CEF load-sharing is ip cef load-sharing algorithm [ include-ports [source] [dest] ] seed. To test it, use the show ip cef exact-route command, which now supports source and destination port numbers. For example:
a1(config)#ip cef load-sharing algorithm include-ports source dest 22

a1#show ip cef exact-route 10.0.0.10 src-port 35 192.168.0.2 dest-port 80
10.0.0.10 -> 192.168.0.2 : Serial0/0/0.100 (next hop 172.16.1.2)
a1#show ip cef exact-route 10.0.0.10 src-port 36 192.168.0.2 dest-port 80
10.0.0.10 -> 192.168.0.2 : Serial0/0/0.200 (next hop 172.16.1.6)
a1#show ip cef exact-route 10.0.0.10 src-port 37 192.168.0.2 dest-port 80
10.0.0.10 -> 192.168.0.2 : Serial0/0/0.100 (next hop 172.16.1.2)
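The effect of include-ports is easy to see in a toy model: when every flow shares the same source and destination address (the NAT scenario above), an address-only hash puts all of them into one bucket, while adding the ports spreads them across both paths. This sketch uses SHA-256 as a stand-in because the real CEF hash function is not documented; the function names, the 64 source ports and the two-path table are illustrative assumptions.

```python
import hashlib

PATHS = ["Serial0/0/0.100", "Serial0/0/0.200"]
TABLE = [0, 1] * 8  # 16-row table for two equal-cost paths


def bucket(src_ip, dst_ip, src_port=None, dst_port=None):
    """4-bit bucket for a flow; SHA-256 is a stand-in for the undocumented CEF hash."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return hashlib.sha256(key).digest()[0] & 0x0F


def pick_path(src_ip, src_port, dst_ip, dst_port, include_ports):
    if not include_ports:
        src_port = dst_port = None  # address-only hashing ignores the ports
    return PATHS[TABLE[bucket(src_ip, dst_ip, src_port, dst_port)]]


# Many NAT-ed sessions: one address pair, different source ports.
flows = [("10.0.0.10", port, "192.168.0.2", 80) for port in range(1024, 1088)]
paths_without_ports = {pick_path(*flow, include_ports=False) for flow in flows}
paths_with_ports = {pick_path(*flow, include_ports=True) for flow in flows}
print(sorted(paths_without_ports))  # always a single path
print(sorted(paths_with_ports))
```

Which individual port lands on which link depends entirely on the hash, so don't expect this model to reproduce the exact-route printouts above; only the spreading effect carries over.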

Per-destination or per-packet CEF load sharing?

Cisco Express Forwarding (CEF) can perform per-packet or per-destination (actually source/destination IP address pair) load-sharing with no performance degradation (without CEF, per-packet load-sharing requires process switching). Even though there is no performance impact on the router, per-packet load sharing will almost always result in out-of-order packets. The packet reordering might degrade TCP throughput in high-speed environments (in low-speed/few-flows scenarios, per-packet load-sharing actually improves the per-flow throughput) or severely impact applications that cannot survive out-of-order packet delivery, such as Fast Sequenced Transport for SNA over IP or voice/video streams.
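A toy timing model shows where the out-of-order packets come from: with per-packet load sharing, consecutive packets of one flow alternate between links, so any latency difference between the links lets later packets overtake earlier ones. The sketch assumes fixed one-way delays of 10 ms and 15 ms, one packet sent every millisecond, and no queuing or serialization delay; per-destination sharing is modeled by keeping the whole flow on one link.

```python
def arrival_order(num_packets, link_delays_ms, per_packet, send_interval_ms=1.0):
    """Return packet sequence numbers in the order the receiver gets them."""
    arrivals = []
    for seq in range(num_packets):
        # Per-packet: round robin across links. Per-destination: the flow stays on link 0.
        link = seq % len(link_delays_ms) if per_packet else 0
        arrival_time = seq * send_interval_ms + link_delays_ms[link]
        arrivals.append((arrival_time, seq))
    return [seq for _, seq in sorted(arrivals)]


print(arrival_order(6, [10.0, 15.0], per_packet=True))   # [0, 2, 4, 1, 3, 5]
print(arrival_order(6, [10.0, 15.0], per_packet=False))  # [0, 1, 2, 3, 4, 5]
```

With equal link delays the per-packet case stays in order, which matches the observation that reordering depends on the environment rather than on the load-sharing mode alone.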

To configure per-packet load-sharing, use the ip load-sharing per-packet interface configuration command (default is per-destination). This command has to be configured on all outgoing interfaces over which the traffic is load-shared.

Note: The switch between the load-sharing modes is not immediate; sometimes you have to wait a few seconds for the ip load-sharing command to take effect, and in the worst case you have to clear the CEF table manually (clear ip cef address).

Fine-tuning CEF load-sharing

In environments with a low number of IP hosts, you have to fine-tune the CEF load-sharing algorithm to ensure that the traffic is spread across all parallel paths. A typical scenario is a primary-backup data center setup with pairs of replicating servers, as shown in the figure below.


In these cases, you have to try different values of the seed parameter of the CEF universal algorithm.
For example, if you have two equal-cost paths between networks 10.0.0.0/24 and 192.168.0.0/24 ...
a1#show ip cef 192.168.0.0 detail
192.168.0.0/24, version 33, epoch 0, per-destination sharing
0 packets, 0 bytes
via 172.16.1.6, Serial0/0/0.200, 0 dependencies
traffic share 1
next hop 172.16.1.6, Serial0/0/0.200
valid adjacency
via 172.16.1.2, Serial0/0/0.100, 0 dependencies
traffic share 1
next hop 172.16.1.2, Serial0/0/0.100
valid adjacency
... you might want the traffic between 10.0.0.1 and 192.168.0.1 to flow over a different link than the traffic between 10.0.0.2 and 192.168.0.2. The command that helps you here is show ip cef exact-route source destination. In our example, both traffic flows would go over the same serial link:
a1#show ip cef exact-route 10.0.0.1 192.168.0.1
10.0.0.1 -> 192.168.0.1 : Serial0/0/0.100 (next hop 172.16.1.2)
a1#show ip cef exact-route 10.0.0.2 192.168.0.2
10.0.0.2 -> 192.168.0.2 : Serial0/0/0.100 (next hop 172.16.1.2)
However, by changing the seed parameter of the ip cef load-sharing algorithm universal command, you can influence the CEF hashing function, eventually reaching a state where the traffic flows are spread between both WAN links:
a1(config)#ip cef load-sharing algorithm universal 1
a1(config)#^Z
a1#show ip cef exact-route 10.0.0.1 192.168.0.1
10.0.0.1 -> 192.168.0.1 : Serial0/0/0.100 (next hop 172.16.1.2)
a1#show ip cef exact-route 10.0.0.2 192.168.0.2
10.0.0.2 -> 192.168.0.2 : Serial0/0/0.200 (next hop 172.16.1.6)
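The trial-and-error process above (change the seed, rerun show ip cef exact-route) is a simple search. The sketch below automates it against a stand-in hash: SHA-256 over the seed and the address pair replaces the real, undocumented CEF function, so the seed it finds means nothing on a real router; only the procedure carries over. The function names are hypothetical.

```python
import hashlib

PATHS = ["Serial0/0/0.100", "Serial0/0/0.200"]
TABLE = [0, 1] * 8  # 16 rows, two equal-cost paths


def path_for(src_ip: str, dst_ip: str, seed: int) -> str:
    """Path chosen for an address pair; seeded SHA-256 stands in for the CEF hash."""
    digest = hashlib.sha256(f"{seed}|{src_ip}|{dst_ip}".encode()).digest()
    return PATHS[TABLE[digest[0] & 0x0F]]


def find_seed(flows, max_seed: int = 1000):
    """Return the first seed that puts the flows on every available path, or None."""
    for seed in range(max_seed):
        if {path_for(src, dst, seed) for src, dst in flows} == set(PATHS):
            return seed
    return None


flows = [("10.0.0.1", "192.168.0.1"), ("10.0.0.2", "192.168.0.2")]
seed = find_seed(flows)
print(seed, [path_for(src, dst, seed) for src, dst in flows])
```

With two flows and two links any seed has roughly even odds, so the search ends quickly. With more replicating server pairs than links, you would probably want to score seeds by how evenly they balance the pairs instead of stopping at the first one that uses every link.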

Perfect load balancing with MPLS Traffic Engineering

In the article Perfect Load-Balancing: How Close Can You Get?, you'll find in-depth information on how you can use MPLS traffic engineering to load-balance traffic in highly redundant designs.

CEF per-destination load sharing algorithms

According to the Cisco IOS documentation, you can select between the original and the universal CEF load sharing algorithm with the ip cef load-sharing algorithm name parameter global configuration command (we'll leave the tunnel algorithm aside for the moment). Of course, they don't tell you what you're actually selecting.

The original algorithm used only the source and destination IP addresses to compute the 4-bit hash value (see CEF Load Sharing Details for more information), which could result in suboptimal network utilization in some corner cases (if anyone wants to know why, leave me a comment). The universal algorithm adds a router-specific value to the hash function, so the same source-destination pair will usually hash into a different 4-bit value on different boxes. If you really want to fine-tune the hash function, you can even specify the value to be added with the last option of the ip cef load-sharing algorithm command.
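One well-known example of such suboptimal utilization is often called CEF polarization: when two routers in a row use the same hash function, the second router receives only the flows that the first one hashed onto that link, and it hashes them all the same way again. The sketch below illustrates the effect with two hypothetical routers (A and B), two equal-cost paths each, and SHA-256 as a stand-in hash; the router-specific value of the universal algorithm is modeled as a per-router string added to the hash input.

```python
import hashlib


def bucket(src_ip: str, dst_ip: str, router_value: str = "") -> int:
    """4-bit hash of an address pair; router_value models the universal algorithm's per-router input."""
    digest = hashlib.sha256(f"{router_value}|{src_ip}|{dst_ip}".encode()).digest()
    return digest[0] & 0x0F


def paths_used_at_second_hop(flows, universal: bool) -> set:
    """Paths router B uses for the flows that router A sent over A's first link."""
    value_a, value_b = ("A", "B") if universal else ("", "")
    # Router A has two equal-cost paths; even buckets map to its first link.
    via_first_link = [flow for flow in flows if bucket(*flow, value_a) % 2 == 0]
    # Router B load-shares those flows again over its own two paths.
    return {bucket(*flow, value_b) % 2 for flow in via_first_link}


flows = [(f"10.0.0.{i}", f"192.168.0.{j}") for i in range(1, 17) for j in range(1, 17)]
print(paths_used_at_second_hop(flows, universal=False))
print(paths_used_at_second_hop(flows, universal=True))
```

Without a per-router value, router B computes exactly the same hash as router A, so every flow it receives on that link lands in an even bucket and one of B's paths stays idle; with different per-router values the flows spread over both of B's paths again.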

CEF load sharing details

I had to investigate the details of CEF load sharing for one of my upcoming articles and found (yet again) that they are mostly missing from the official documentation. So, this is how it works (in case you ever need to know):

  • For every CEF entry (IP route) where there are multiple paths to the destination, the router creates a 16-row hash table, populating the entries with pointers to individual paths. The hash table can be inspected with the show ip cef prefix internal command.
  • The load-balancing ratio is approximated by the number of hash table entries belonging to each path. If you have unequal-cost load balancing (EIGRP based on composite metrics, or MPLS TE tunnels based on requested bandwidth), individual paths will be associated with a different number of rows.
  • If you configure per-destination load balancing, the source and destination IP addresses in the incoming IP packet are hashed into a 4-bit value that selects the outgoing path in the CEF hash table.

If this sounds confusing, here are two examples to make it easier: if you have two equal-cost paths to the same destination, each path will have eight entries in the hash table.

a1#show ip route 192.168.0.0
Routing entry for 192.168.0.0 255.255.255.0
Known via "ospf 1", distance 110, metric 51, type intra area
Last update from 172.16.0.21 on Serial0/0/0.100, 00:00:05 ago
Routing Descriptor Blocks:
* 172.16.0.21, from 172.16.0.22, 00:00:05 ago, via Serial0/0/0.100
Route metric is 51, traffic share count is 1
172.16.0.21, from 172.16.0.22, 00:00:05 ago, via Serial0/0/0.200
Route metric is 51, traffic share count is 1
a1#show ip cef 192.168.0.0 internal
192.168.0.0/24, version 33, epoch 0, per-destination sharing
0 packets, 0 bytes
via 172.16.0.21, Serial0/0/0.100, 0 dependencies
traffic share 1
next hop 172.16.0.21, Serial0/0/0.100
valid adjacency
via 172.16.0.21, Serial0/0/0.200, 0 dependencies
traffic share 1
next hop 172.16.0.21, Serial0/0/0.200
valid adjacency

0 packets, 0 bytes switched through the prefix
tmstats: external 0 packets, 0 bytes
internal 0 packets, 0 bytes
Load distribution: 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 (refcount 1)

Hash OK Interface Address Packets
1 Y Serial0/0/0.100 point2point 0
2 Y Serial0/0/0.200 point2point 0
3 Y Serial0/0/0.100 point2point 0
4 Y Serial0/0/0.200 point2point 0
5 Y Serial0/0/0.100 point2point 0
6 Y Serial0/0/0.200 point2point 0
7 Y Serial0/0/0.100 point2point 0
8 Y Serial0/0/0.200 point2point 0
9 Y Serial0/0/0.100 point2point 0
10 Y Serial0/0/0.200 point2point 0
11 Y Serial0/0/0.100 point2point 0
12 Y Serial0/0/0.200 point2point 0
13 Y Serial0/0/0.100 point2point 0
14 Y Serial0/0/0.200 point2point 0
15 Y Serial0/0/0.100 point2point 0
16 Y Serial0/0/0.200 point2point 0

However, if you have three equal-cost paths to the destination, each path will have only five entries and the hash table will have 15 rows instead of 16.

a1#show ip route 192.168.0.0
Routing entry for 192.168.0.0 255.255.255.0
Known via "ospf 1", distance 110, metric 51, type intra area
Last update from 10.0.0.6 on FastEthernet0/0, 00:00:02 ago
Routing Descriptor Blocks:
* 172.16.0.21, from 172.16.0.22, 00:00:02 ago, via Serial0/0/0.100
Route metric is 51, traffic share count is 1
172.16.0.21, from 172.16.0.22, 00:00:02 ago, via Serial0/0/0.200
Route metric is 51, traffic share count is 1
10.0.0.6, from 172.16.0.22, 00:00:02 ago, via FastEthernet0/0
Route metric is 51, traffic share count is 1
a1#show ip cef 192.168.0.0 internal
192.168.0.0/24, version 44, epoch 0, per-destination sharing
0 packets, 0 bytes
via 172.16.0.21, Serial0/0/0.100, 0 dependencies
traffic share 1
next hop 172.16.0.21, Serial0/0/0.100
valid adjacency
via 172.16.0.21, Serial0/0/0.200, 0 dependencies
traffic share 1
next hop 172.16.0.21, Serial0/0/0.200
valid adjacency
via 10.0.0.6, FastEthernet0/0, 0 dependencies
traffic share 1
next hop 10.0.0.6, FastEthernet0/0
valid adjacency

0 packets, 0 bytes switched through the prefix
tmstats: external 0 packets, 0 bytes
internal 0 packets, 0 bytes
Load distribution: 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 (refcount 1)

Hash OK Interface Address Packets
1 Y Serial0/0/0.100 point2point 0
2 Y Serial0/0/0.200 point2point 0
3 Y FastEthernet0/0 10.0.0.6 0
4 Y Serial0/0/0.100 point2point 0
5 Y Serial0/0/0.200 point2point 0
6 Y FastEthernet0/0 10.0.0.6 0
7 Y Serial0/0/0.100 point2point 0
8 Y Serial0/0/0.200 point2point 0
9 Y FastEthernet0/0 10.0.0.6 0
10 Y Serial0/0/0.100 point2point 0
11 Y Serial0/0/0.200 point2point 0
12 Y FastEthernet0/0 10.0.0.6 0
13 Y Serial0/0/0.100 point2point 0
14 Y Serial0/0/0.200 point2point 0
15 Y FastEthernet0/0 10.0.0.6 0