Showing posts with label MPLS. Show all posts
Showing posts with label MPLS. Show all posts

Is a label imposed in case of Penultimate Hop Popping?

Shivlu Jain sent me an interesting question:

I'm wondering whether a router performing penultimate hop popping (PHP) imposes an IGP label or not.The value of implicit null is 3; does it mean the router imposes this label (and adds four bytes to the packet)?
The penultimate router does not impose the IGP label (that's why this behavior is called penultimate hop popping). However, the egress router has to signal to its upstream neighbor (the penultimate router) that it should NOT impose a label, so it uses "implicit null" label (= 3) in TDP/LDP updates to signal that the top label should be popped, not rewriten.
This article is part of You've asked for it series.

Goodbye fast switching & cell-mode MPLS

After leaving us in the dark for almost a year, Cisco finally released new functionality in IOS release 12.4(20)T. Support for a number of hardware platforms has been removed (dynamips fans are left with the 7200’s, everything else is gone). They also removed two switching features: fast switching and label-controlled ATM (cell-mode MPLS-over-ATM) together with Label Switch Controller (LSC).

I have no problem living without LC-ATM or LSC; this technology was primarily a retrofit for the old boxes by the time MPLS really took off with MPLS VPN. Fast switching is a different beast. Whenever you’d encounter bugs in more creative designs involving NAT, IPSec and GRE on low-end routers, you could turn off CEF (assuming you did not run NBAR) and things would (sometimes) miraculously start to work. Without fast switching, turning off CEF would bring you straight into process switching, resulting in an order-of-magnitude (or more) performance loss. On the other hand, it's obvious it makes no sense to maintain an obsolete switching method … and more bugs will probably get reported and fixed now that the easy escape route is gone.

Use MPLS to scale your Internet backbone

The Scale your backbone with core MPLS, BGP on the edge article I wrote for SearchTelecom describes how you can improve the stability and scalability of a Service Provider core network by replacing BGP on the core routers with MPLS switching. You'll also find a typical design scenario and a few recommendations that you should follow to prevent router reloads from introducing black holes in your MPLS network.

Building Core Networks with BGP, OSPF and MPLS

Do you need advanced knowledge and skills needed in designing and implementing core MPLS networks? We have developed exactly the course you need.

The Building Core Networks with BGP, OSPF and MPLS (NIL_BCMPL) course focuses on MPLS applications such as MPLS VPN (with special attention to Internet access from a VPN), Any Transport over MPLS (AToM), Carrier Supporting Carrier (CsC) and MPLS Traffic Engineering (MPLS TE). As a prerequisite for MPLS deployment the routing protocols are explained as well - the Open Shortest Path First (OSPF) and Border Gateway Protocol (BGP). The scalability issues of the protocols as well as multiprotocol BGP are addressed as well. Several practice labs enable you to gain the necessary experience in deploying the MPLS-based core networks.

Use slow IGP startup in LDP-only MPLS environments

If you use LDP-based MPLS as the only means of transporting data across your network core (for example, in MPLS VPN networks or in BGP-free ISP core), a router startup might disrupt your Label Switched Paths (remember: they are always based on IGP best paths) leading to temporary disruption in service.For example, when the router P1 in the network shown in the following diagram is powered on and its IGP advertises its presence, the IGP-derived path from PE1 to PE2 will go over P1. If the LDP on P1 has not exchanged labels with PE1 and PE2, there will be no LSP on the shortest path between PE1 and PE2, resulting in a loss of traffic until the labels are exchanged and LSP is built.The proper router startup timing in this environment is thus:

  • Start IGP and find neighbors.
  • Receive IGP updates and build the network topology.
  • Start LDP and exchange labels for all prefixes in the network.
  • Advertise router's presence in IGP.
You can configure slow OSPF startup with the max-metric router-lsa on-startup seconds router configuration command. The corresponding IS-IS command is set-overload-bit on-startup seconds.

The initial IGP delay has to be configured manually (you cannot use wait-for-bgp option in this scenario) and should take in account the time needed to:
  • Find IGP neighbors (at least the hello timer);
  • Receive LSA updates;
  • Run SPF (at least the spf delay).
  • Find LDP neighbors (at least the discovery hello interval).
  • Exchange labels once the SPF run has completed.

Unless you're under very rigid time constraints, 30 seconds seems like a reasonable delay in most environments.

BGP without MPLS?

Designing and operating large BGP networks has always been a challenge, as you have to deploy BGP on all core routers and design a hierarchy of internal BGP routers to get around the full-mesh limitation. When MPLS was introduced, it gave us means of deploying BGP only on the network edges, with the core routers carrying just the information about the BGP next hops.

As I know some of you run large networks, could you help me understand what you're using (without giving away too much information, of course):

  • Are you running a BGP network without MPLS or are you using BGP on the edges and MPLS transport in the core?
  • If you have a large number of BGP routers, do you have a nice hierarchy of BGP route reflectors (or confederations) or ad-hoc implementation where every router has all neighbors as RR-clients?

Full disclosure: I might use the information you give me in an upcoming article.

The tale of the three MTUs

An IOS device configured for IP+MPLS routing uses three different Maximum Transmission Unit (MTU) values:

  • The hardware MTU configured with the mtu interface configuration command
  • The IP MTU configured with the ip mtu interface configuration command
  • The MPLS MTU configured with the mpls mtu interface configuration command
The hardware MTU specifies the maximum packet length the interface can support … or at least that's the theory behind it. In reality, longer packets can be sent (assuming the hardware interface chipset doesn't complain); therefore you can configure MPLS MTU to be larger than the interface MTU and still have a working network. Oversized packets might not be received correctly if the interface uses fixed-length buffers; platforms with scatter/gather architecture (also called particle buffers) usually survive incoming oversized packets.

IP MTU is used to determine whether a non-labeled IP packet forwarded through an interface has to be fragmented (the IP MTU has no impact on labeled IP packets). It has to be lower or equal to hardware MTU (and this limitation is enforced). If it equals the HW MTU, its value does not appear in the running configuration and it tracks the changes in HW MTU. For example, if you configure ip mtu 1300 on a Serial interface, it will appear in the running configuration as long as the hardware MTU is not equal to 1300 (and will not change as the HW MTU changes). However, as soon as the mtu 1300 is configured, the ip mtu 1300 command disappears from the configuration and the IP MTU yet again tracks the HW MTU.

The MPLS MTU determines the maximum size of a labeled IP packet (MPLS shim header + IP payload size). If the overall length of the labeled packet (including the shim header) is greater than the MPLS MTU, the packet is fragmented. The MPLS MTU can be greater than the HW MTU assuming the hardware architecture and interface chipset support that (and the router will warn you that you might be getting into trouble). Similar to the ip mtu command, the mpls mtu command will only appear in the running configuration if the MPLS MTU is different from the HW MTU. However, contrary to the behavior of the IP MTU, any change in HW MTU with the mtu configuration command also resets the MPLS MTU to HW MTU.

The behavior as described above was tested on a 3725 router running IOS release 12.4(15)T1. Although the MPLS MTU Command Changes document claims that you cannot set MPLS MTU larger than then interface MTU from IOS release 12.4(11)T, I was still able to do it in 12.4(15)T1.

mturoute: A utility that measures hop-by-hop path MTU

I wanted to get in-depth details on how various MTU parameters interact in GRE/IPSec/MPLS environment. Before going into router configuration details, I wanted to have a tool that would reliably measure actual path MTU between the endpoints. After a while, Google gave me a usable link: supposedly the tracepath program on Linux does what I needed. As I'm a purely Windows user (for me, PCs are just a tool), I needed a Windows equivalent … and found mturoute, the utility that does exactly what I was looking for.Unfortunately, the original mturoute had an interesting bug: ICMP unreachable generated due to DF bit on oversized packet was accepted as a successful ping. The second run of the program always reported the correct MTU size (as Windows caches the maximum MTU per destination host), but I wanted to be more precise. Half a day later, after installing Windows SDK and Visual Studio Express on my PC, rediscovering my C programming skills and reading a lot of Winsock documentation, I've managed to fix the bugs and even add a “retry on ping timeout” feature. You can download the fixed version (source + exe) from the Articles area of my web site. It was compiled and tested on Windows XP, if you can test it on other platforms (2000, Vista), please let me know the results.

Here is a sample run of the program (the reduced path MTU is due to an MPLS-enabled GRE tunnel in the path):

$ mturoute -t 10.0.3.3
mturoute to 10.0.3.3, 30 hops max, variable sized packets
* ICMP Fragmentation is not permitted. *
* Maximum payload is 10000 bytes. *
1 --+---+++-+++-++ host: 10.0.0.1 max: 1500 bytes
2 --+---++---+++ host: 10.2.0.2 max: 1476 bytes
3 --+---+-+++++++ host: 10.0.3.3 max: 1472 bytes

MPLS LDP autoconfiguration

Most MPLS books (mine included) and courses tell you that you have to manually enable MPLS on each interface where you want to run it with the mpls ip interface configuration command. However, this task was significantly simplified in IOS release 12.3(14)T with the introduction of MPLS LDP autoconfiguration. If you use OSPF as the routing protocol in your network, you can use the mpls autoconfig ldp [area number] router configuration command to enable LDP on all interfaces running OSPF (optionally limited to an OSPF area).

As the careful readers of my MPLS books know, it's dangerous to run LDP with your customers; the moment you run LDP with them (Carrier's carrier model is an exception), they can insert any labeled packet into your network, bypassing inbound access lists and sending traffic where it's not supposed to go (even into another VPN). It's therefore vital that you consider security implications before deploying MPLS LDP autoconfiguration.

Using this feature on P routers is absolutely safe, as they have no customer links. You have to be more careful on the PE-routers, more so if you run routing protocols with your customers. The safest configuration method would be to configure LDP autoconfiguration inside a single OSPF area, but even then a configuration error (placing PE-CE interface in a wrong area) could open your network to MPLS-based attacks.

Introduction to Traffic Enginnering

In the TechTarget article Traffic engineering's role in next-generation networks I'm describing why one would need traffic engineering in a Service Provider network, what options were available using the “legacy” WAN technologies and what you can do to deploy traffic engineering in a modern end-to-end IP network.

MPLS ping and traceroute

One of the hardest troubleshooting problems within an MPLS VPN network has always been finding a broken LSP. While you could (in theory) use the IP ping or traceroute (assuming all hops support ICMP extensions for MPLS), the results are not always reliable ... and interpreting them is not so easy. For example, after I've disabled LDP on an interface with the no mpls ip configuration command, the routers in the LSP path still reported outgoing MPLS labels in ICMP replies for a few seconds (until the LDP holddown timer expired on both ends of the link). As a side note, would you deduce from the printout that the break in the LSP path happened on the router with the IP address 192.168.201.1?

ro#trace ip 1.0.0.1
Tracing the route to 1.0.0.1

1 192.168.201.1 [MPLS: Label 22 Exp 0] 204 msec 200 msec 212 msec
2 192.168.201.6 [MPLS: Label 16 Exp 0] 112 msec 112 msec 116 msec
3 192.168.0.6 56 msec * 60 msec
ro#trace ip 1.0.0.1
Tracing the route to 1.0.0.1

1 192.168.201.1 [MPLS: Label 22 Exp 0] 56 msec 60 msec 56 msec
2 192.168.201.6 56 msec 56 msec 56 msec
3 192.168.0.6 56 msec * 56 msec
The MPLS ping and traceroute commands, introduced in IOS release 12.0(27)S and integrated in mainstream IOS release 12.4(6)T (at least five years too late in my humble opinion) address this problem: they both use IP packets that are not capable of being IP-switched and thus report the exact failure spot.In our scenario, MPLS ping and traceroute correctly identified the problem (unlabeled output interface) and the MPLS traceroute also provides the correct IP address of the offending router.
ro#ping mpls ip 1.0.0.1 255.255.255.255
Sending 5, 100-byte MPLS Echos to 1.0.0.1/32,
timeout is 2 seconds, send interval is 0 msec:

Codes: '!' - success, 'Q' - request not sent, '.' - timeout,
'L' - labeled output interface, 'B' - unlabeled output interface,
'D' - DS Map mismatch, 'F' - no FEC mapping, 'f' - FEC mismatch,
'M' - malformed request, 'm' - unsupported tlvs, 'N' - no label entry,
'P' - no rx intf label prot, 'p' - premature termination of LSP,
'R' - transit router, 'I' - unknown upstream index,
'X' - unknown return code, 'x' - return code 0

Type escape sequence to abort.
BBBBB
Success rate is 0 percent (0/5)
ro#trace mpls ip 1.0.0.1 255.255.255.255
Tracing MPLS Label Switched Path to 1.0.0.1/32, timeout is 2 seconds

Codes: '!' - success, 'Q' - request not sent, '.' - timeout,
'L' - labeled output interface, 'B' - unlabeled output interface,
'D' - DS Map mismatch, 'F' - no FEC mapping, 'f' - FEC mismatch,
'M' - malformed request, 'm' - unsupported tlvs, 'N' - no label entry,
'P' - no rx intf label prot, 'p' - premature termination of LSP,
'R' - transit router, 'I' - unknown upstream index,
'X' - unknown return code, 'x' - return code 0

Type escape sequence to abort.
0 192.168.201.2 MRU 1500 [Labels: 22 Exp: 0]
B 1 192.168.201.1 MRU 1504 [No Label] 64 ms
After the problem has been fixed, both commands report successful tests:
ro#trace mpls ip 1.0.0.1 255.255.255.255
Tracing MPLS Label Switched Path to 1.0.0.1/32, timeout is 2 seconds

Codes: '!' - success, 'Q' - request not sent, '.' - timeout,
'L' - labeled output interface, 'B' - unlabeled output interface,
'D' - DS Map mismatch, 'F' - no FEC mapping, 'f' - FEC mismatch,
'M' - malformed request, 'm' - unsupported tlvs, 'N' - no label entry,
'P' - no rx intf label prot, 'p' - premature termination of LSP,
'R' - transit router, 'I' - unknown upstream index,
'X' - unknown return code, 'x' - return code 0

Type escape sequence to abort.
0 192.168.201.2 MRU 1500 [Labels: 22 Exp: 0]
I 1 192.168.201.1 MRU 1500 [Labels: 16 Exp: 0] 68 ms
I 2 192.168.201.6 MRU 1504 [Labels: implicit-null Exp: 0] 140 ms
! 3 192.168.0.6 112 ms
ro#ping mpls ip 1.0.0.1 255.255.255.255
Sending 5, 100-byte MPLS Echos to 1.0.0.1/32,
timeout is 2 seconds, send interval is 0 msec:

Codes: '!' - success, 'Q' - request not sent, '.' - timeout,
'L' - labeled output interface, 'B' - unlabeled output interface,
'D' - DS Map mismatch, 'F' - no FEC mapping, 'f' - FEC mismatch,
'M' - malformed request, 'm' - unsupported tlvs, 'N' - no label entry,
'P' - no rx intf label prot, 'p' - premature termination of LSP,
'R' - transit router, 'I' - unknown upstream index,
'X' - unknown return code, 'x' - return code 0

Type escape sequence to abort.
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 100/103/104 ms