# IP Fast Reroute with Loop Free Alternate
In this exercise, we will implement a mechanism to fast reroute traffic upon a failure of an adjacent link towards a Loop Free Alternate (LFA).
First, we'll introduce the problem, then give you an overview of the basic setup and your tasks.
## Fast rerouting and LFAs
Consider the following topology, which we'll use throughout the exercise.
<img src="images/lfa_topo.png" width="400" alt="centered image" />
The four routers are connected to each other, and each router connects to one host.
The links between routers have different weights, as shown in the image.
We will assume that there is some mechanism to determine the shortest paths and fill the forwarding tables.
In this exercise, we use a central controller, but this could also be done by an IGP like OSPF.
In the leftmost figure below, you can see the forwarding path towards h4, i.e., the destination ``.
Now consider that the link between `S1` and `S2` fails.
The adjacent routers `S1` and `S2` will be able to detect this failure (almost) immediately, yet it still takes some time until the network can recover:
a central controller needs to be notified about the failure and re-compute the shortest paths; a protocol like OSPF needs to exchange messages and converge.
During this time, all traffic from `S2` and `S3` towards `` is lost.
In our small demo network, this is negligible.
In large high-speed networks, however, recovering from a failure can take several hundred milliseconds, during which many gigabytes of traffic can be lost.
*Fast rerouting* aims to close this gap:
In addition to the next hop towards a destination, we can install a (precomputed) *backup* next hop.
As soon as the router detects a local link failure, it can immediately forward traffic via the backup to ensure connectivity, until the central controller/IGP protocol has time to update all forwarding paths optimally.
The backup next hops must however be chosen with care.
Consider, for example, that `S2` uses `S3` as a backup for the failed link towards `S1` (it is the next-shortest path) and starts rerouting all traffic towards ``.
But `S3` does not know about the link failure and consequently just forwards the traffic *back* to `S2`, resulting in a routing loop, as you can see in the middle figure below.
To prevent such loops, operators can use IP Fast Reroute with Loop-Free Alternates (LFAs). LFAs are backup next hops that *do not result in loops*.
In our case, `S4` is an LFA for traffic towards ``.
Indeed, `S2` can start forwarding traffic to `S4` and it will reach its destination.
Eventually, when the controller/IGP protocol has updated all paths in response to the failure, traffic can safely be sent via `S3`.
<img src="images/lfa_example.png" width="1200" alt="centered image" />
LFAs need to be computed per router for each adjacent link and destination.
As explained, LFAs must not forward traffic back to the source.
This condition can be expressed in terms of distances between nodes.
Let `D(X, Y)` be the distance between nodes `X` and `Y`. For router `S`, the next hop `N` is an LFA for destination `D` if:

    D(N, D) < D(N, S) + D(S, D)
Note that this condition only covers single link failures; we will not deal with anything else in this exercise.
For additional considerations such as node failures, shared-risk link groups, and more, please refer to the [IP Fast Reroute RFC](
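To make the condition concrete, here is a minimal Python sketch of the check (our own illustration; it assumes `distances` maps node pairs to shortest-path distances, as the controller's `dijkstra` method described below provides):

    def is_lfa(distances, s, n, d):
        """Return True if next hop n is a loop-free alternate (LFA)
        for source router s and destination d.

        distances[x][y] is the shortest-path distance from x to y.
        """
        return distances[n][d] < distances[n][s] + distances[s][d]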
## Setup
We provide you with a setup that already implements basic forwarding,
as well as the tools to introduce failures. Concretely, you'll find the following files:
* `p4app.json` configures the topology introduced above with the help of mininet and the p4-utils package. Note that we disabled `pcap` logging to reduce disk usage. In case you want to use it, you will have to set the option to `true`.
* `p4src/fast_reroute.p4`: the p4 program to use as a starting point.
It already contains two register arrays: `primaryNH` allows looking up the port
for a next hop, and `linkState` contains the local link information for a
given port (you only need to read it): `0` if there are no issues, `1` if the
link is down.
* `p4src/includes`: headers and parsers.
* ``: the central controller, already capable of installing forwarding rules and reacting to failures.
* ``: a CLI to introduce and reset link failures.
* ``: a script to generate random topologies.
### Startup
First, execute `sudo p4run`, which will start mininet and all p4 switches.
When it's done, you'll be greeted by the mininet CLI.
When mininet is running, open a second terminal or tab (this will be useful in a moment),
and execute `python`.
This will first start the python controller, which computes all shortest paths
in the network and installs the corresponding forwarding rules.
Afterwards, the *reroute CLI* will start.
### Failing links
You can use the mininet CLI to introduce traffic, and the reroute CLI to introduce failures.
For example, run `h2 ping h1` in the mininet CLI.
If your controller has started correctly, you should observe the ping responses.
While the ping is running, switch to the reroute CLI and run `fail s1 s2`.
As the name implies, this will fail the link between `S1` and `S2`.
Have another look at the mininet CLI. The pings should have stopped.
The CLI automatically updates the `linkState` registers of all connected switches
to simulate local failure detection. However, right now our switches do not
leverage this information.
Our controller is already capable of handling failures, but to simulate the delay between a failure and the controller update, the controller *is not automatically notified* of the failure.
In the reroute CLI, run `notify`.
This will call the `failure_notification` method in ``, which updates all forwarding tables.
After you notify the controller, you should observe that the pings start working again.
Finally, you can run `reset` in the reroute CLI, followed by `notify`, to reset both the links and forwarding state.
### Note on IP addresses
For this exercise, we use the IP assignment strategy `l3`, which places each host in a different network.
The IP assigned to host `hX` connected to the switch `SY` is `10.Y.X.2`.
For example, in the topology above, `h1` gets `` and `h2` gets ``.
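As a quick illustration of this scheme (a standalone sketch, not part of the provided code):

    def host_ip(x, y):
        """Return the IP of host hX connected to switch SY (strategy 'l3')."""
        return "10.%d.%d.2" % (y, x)

    host_ip(1, 1)  # -> '' (h1)
    host_ip(4, 4)  # -> '' (h4, the destination in the figures above)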
You can find all the documentation about `p4app.json` in the `p4-utils` [documentation](
## Goals
The goal of this exercise is to enable switches to fast reroute traffic towards an LFA upon the failure of an adjacent link.
You will need to update both the controller and switches to achieve this.
:information_source: You *do not* need to implement failure detection or notification, neither in the controller nor switches. This is out of scope for this exercise and is handled by the CLI ([see above](#failing-links)).
### Control-plane
The controller is already capable of computing the shortest paths, even if some links have failed. You will need to extend it as follows:
- For each switch and link, you must compute an LFA next hop to which the switch can fall back if the link fails.
- You need to install this LFA in the switches.
- While we are working with a fixed topology, your controller should be able to handle other topologies as well. Do not hardcode LFAs into your code.
:information_source: Not all topologies allow finding LFAs for any link and destination. In practice, networks are often *designed* such that this is possible.
### Data-plane
The switches are already capable of forwarding traffic to the (primary) next hop,
and contain a register array with the link state of each port (updated by the CLI).
Your goal is to immediately reroute traffic upon a link failure.
To achieve this, you will need to extend the switch code as follows:
- Similar to the primary next hop, store the alternative next hop on the switch.
- Read the link state for the primary next hop, and select the alternative next hop if the primary is down, i.e., if the `linkState` is `1`, such that all traffic is immediately rerouted.
:information_source: The controller needs to populate the different primary and backup next hops *prior* to the failure.
## Implementing IP Fast Reroute with LFA
In this section we will give you some implementation guidelines.
### General
In this exercise we try to simplify your life as much as possible, so you can focus on the fast reroute operation.
- The controller knows the MAC addresses of all hosts, and all MAC addresses are already configured. We do not use any L2 learning in this exercise.
- You do not have to do load balancing. If two paths have the same cost to reach a destination, just pick either one.
- Your solution does not need to deal with multiple link failures. The LFAs only need to protect against one link failure at a time. We assume that there is sufficient time between failures for the controller to update the primary and backup next hops. Of course, if you fail too many links, at some point there won't be any LFAs left, and even the controller cannot fix this.
- In practice, the network will automatically converge after a failed link. IGP protocols will send out messages automatically, or the controller is automatically notified. In our network, this does not happen, so that you can better observe the effects of link failures. Instead, you can manually `notify` the controller to react to the failed links ([see above](#failing-links)).
### Control-plane
On startup, the provided controller already establishes full connectivity.
It configures the next hop indices per destination ([more about that below](#data-plane)), and fills the register array for primary next hops.
This next hop index allows the switch to look up the relevant next hop ports.
In this exercise, we assign a unique next hop index to each host, and you do not need to change this. In practice, more efficient solutions are used, such as grouping destinations that take the same path through the network.
You can put your full attention towards the `update_nexthops` method.
This method fills the register arrays with the actual next hops for each index.
For example, `h1` has the next hop id `0`. If there are no failures, the next
hop towards `h1` at switch `S2` is `S1`, located at port `2`.
Thus, the controller writes `2` to `primaryNH[0]` on `S2`.
If you fail the link between `S1` and `S2`, and `notify` the controller, it updates this register to `3`, the port towards `S3` (along with the registers in other switches).
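For instance, such an update could look as follows (a sketch; it assumes the controller keeps the per-switch control APIs in `self.controllers` and uses the same `register_write` call that the reroute CLI uses for `linkState`):

    # Next hop index 0 (h1) on s2: forward via port 2 (towards S1).
    self.controllers['s2'].register_write('primaryNH', 0, 2)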
Your task is to extend this function to not only install the primary next hop,
but also a backup next hop.
You will need to coordinate this with your p4 code, that is, you will first need to update your code such that the controller can actually store the backup next hop somewhere.
When computing the backup next hops, keep in mind that they depend on the source router, primary next hop, and the destination.
:information_source: The host itself is also a next hop, although you do not need to compute a backup next hop here, as there is only one link available.
Finally, you need to make sure that you do not install just *any* backup, but an LFA.
To check the LFA condition, you likely need the distances between nodes.
The method `dijkstra` provides you with both the (shortest) distances and paths for each pair of nodes in the network:

    failures = ...  # given as input
    distances, paths = self.dijkstra(failures=failures)

    distances['s1']['h3']  # Distance from s1 to h3.
    paths['s1']['h3']      # Path from s1 to h3.
:information_source: Every time you call `dijkstra`, the shortest paths are recomputed, so make sure not to call it unnecessarily often, and re-use its output.
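Putting this together, the backup computation could be structured as sketched below. `compute_lfa` and its arguments are illustrative names, not part of the provided controller:

    def compute_lfa(self, switch, primary_nh, destination, neighbors, distances):
        """Return the best LFA towards destination at switch, or None.

        neighbors: the switch neighbors of `switch` (illustrative argument).
        distances: the distances returned by self.dijkstra().
        """
        best, best_dist = None, None
        for n in neighbors:
            if n == primary_nh:
                continue  # we need an alternative to the primary
            # Loop-free condition: D(N, D) < D(N, S) + D(S, D)
            if distances[n][destination] < (distances[n][switch] +
                                            distances[switch][destination]):
                if best is None or distances[n][destination] < best_dist:
                    best, best_dist = n, distances[n][destination]
        return best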
### Data-plane
The provided p4 program first applies the table `ipv4_lpm`, which matches on the destination prefix using longest-prefix matching (`lpm`).
However, this table does not immediately map to an egress port, but rather to a next hop index.
These indices are installed once when the controller starts, and need not be modified again.
Using this index, the switch can lookup the corresponding next hop egress port in the `primaryNH` register array.
The controller initially populates these registers, and updates them after failures.
In addition to the primary next hop, you need to implement a way to look up a backup next hop (your LFA).
It might be useful to implement another register array similar to `primaryNH`, but other solutions are also possible.
Finally, you need to put everything together and choose the primary if its link is up, and the backup otherwise.
You can find this information in the `linkState` register array.
The link state of port `X` is stored at index `X` in the register array.
It is `0` if there are no errors, and `1` if the link has failed.
Keep in mind that you *first* need to look up the port of the primary, before you can check whether the link at this port is up.
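To summarize the lookup order, here is the same logic modeled in Python (the actual implementation must be written in P4; the register arrays are modeled as plain lists, and `backupNH` is an assumed name for your backup register):

    def select_egress_port(nh_index, primaryNH, backupNH, linkState):
        """Model of the data-plane lookup: primary port if its link is up."""
        port = primaryNH[nh_index]     # 1. look up the primary next hop port
        if linkState[port] == 1:       # 2. the link at this port has failed
            port = backupNH[nh_index]  # 3. fall back to the LFA
        return port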
## Testing your solution
Below, we'll give you some additional tips to debug your program.
As an example, we will consider a failure of the link between `S1` and `S2`, and focus on how `S2` reroutes the traffic going to ``, as in the introduction above.
1. Start the topology (this will also compile and load the program).

        sudo p4run
2. Verify that you can ping:

        mininet> pingall
3. Let's run the example in the figure above.
We will monitor five links: S4-h4 and the four links of `S2` (towards h2, S1, S3, and S4).
To visualize these five links altogether, we could open separate tcpdumps, or we can use `speedometer`.
First you need to install `speedometer` with:

        sudo apt-get install speedometer
Then you can run the following command.

        speedometer -t s2-eth1 -t s2-eth2 -t s2-eth3 -t s2-eth4 -t s4-eth1
:information_source: To see the interface names for all switches you can write `net` in the mininet CLI:

        mininet> net
        h1 h1-eth0:s1-eth1
        h2 h2-eth0:s2-eth1
        h3 h3-eth0:s3-eth1
        h4 h4-eth0:s4-eth1
        s1 lo: s1-eth1:h1-eth0 s1-eth2:s2-eth2 s1-eth3:s4-eth3 s1-eth4:s3-eth4
        s2 lo: s2-eth1:h2-eth0 s2-eth2:s1-eth2 s2-eth3:s3-eth2 s2-eth4:s4-eth4
        s3 lo: s3-eth1:h3-eth0 s3-eth2:s2-eth3 s3-eth3:s4-eth2 s3-eth4:s1-eth4
        s4 lo: s4-eth1:h4-eth0 s4-eth2:s3-eth3 s4-eth3:s1-eth3 s4-eth4:s2-eth4
4. Ping from h2 to h4 with a short interval to see more traffic:

        mininet> h2 ping h4 -i 0.01
5. Fail the link S1-S2. You can do that from the controller CLI:

        link-menu> fail s1 s2
6. Finally, notify the controller about the failure, such that it recomputes the paths and updates the primary routes in the switches. You can do that from the controller CLI:

        link-menu> notify
7. With the default p4 code and controller code, traffic is lost between the failure and the time when you notify the controller and it updates the switches. See the following screenshot.
<img src="images/speedometer_1.png" width="400" alt="centered image" />
As you can see, the traffic drops to zero between the failure and the notification.
Once you update the p4 code and the controller to fast reroute the traffic to an LFA,
you should see the following output instead.
<img src="images/speedometer_2.png" width="400" alt="centered image" />
Here, you can see that `S2` quickly reroutes the traffic to `S4`, which is the LFA.
After the controller recomputes the shortest paths, `S2` forwards the traffic to `S3`, the new primary next hop.
## Testing with another topology
When you complete this exercise, you should have a controller that is able to populate the routing tables and registers for any topology. To test that your solution works with other topologies, you can use the `` script to generate random ones:

    python --output_name <name.json> --topo random -n <number of switches to use> -d <average switch degree>
This will create a random topology with `n` switches that have on average `d` interfaces (depending on `n`, `d` might not be achievable). In addition, each switch will have one host directly connected to it (so `n` hosts).
For example, you can create a random topology with `10` switches and an average degree of `4`:

    python --output_name 10-switches.json --topo random -n 10 -d 4
Run the random topology:

    sudo p4run --config 10-switches.json
Now run the controller, and check that you can send traffic to all the nodes with `pingall`:

    mininet> pingall
    *** Ping: testing ping reachability
    h1 -> h2 h3 h4 h5 h6 h7 h8 h9 h10
    h2 -> h1 h3 h4 h5 h6 h7 h8 h9 h10
    h3 -> h1 h2 h4 h5 h6 h7 h8 h9 h10
    h4 -> h1 h2 h3 h5 h6 h7 h8 h9 h10
    h5 -> h1 h2 h3 h4 h6 h7 h8 h9 h10
    h6 -> h1 h2 h3 h4 h5 h7 h8 h9 h10
    h7 -> h1 h2 h3 h4 h5 h6 h8 h9 h10
    h8 -> h1 h2 h3 h4 h5 h6 h7 h9 h10
    h9 -> h1 h2 h3 h4 h5 h6 h7 h8 h10
    h10 -> h1 h2 h3 h4 h5 h6 h7 h8 h9
    *** Results: 0% dropped (90/90 received)
Then fail a link, for instance the link between `s2` and `s9` (you can see all the links by typing `links` in the mininet CLI).
Finally, when you run the default p4 and python code, you should see packet loss:

    mininet> pingall
    *** Ping: testing ping reachability
    h1 -> h2 h3 h4 h5 h6 h7 h8 h9 h10
    h2 -> h1 h3 h4 X h6 h7 h8 X h10
    h3 -> h1 h2 h4 h5 h6 h7 h8 h9 h10
    h4 -> h1 h2 h3 h5 h6 h7 h8 h9 h10
    h5 -> h1 X h3 h4 h6 h7 h8 h9 h10
    h6 -> h1 h2 h3 h4 h5 h7 h8 h9 h10
    h7 -> h1 h2 h3 h4 h5 h6 h8 h9 h10
    h8 -> h1 h2 h3 h4 h5 h6 h7 h9 h10
    h9 -> h1 X h3 h4 h5 h6 h7 h8 h10
    h10 -> h1 h2 h3 h4 h5 h6 h7 h8 h9
    *** Results: 4% dropped (86/90 received)
However, with IP Fast Reroute with LFA implemented, you should not see any packet loss.
:information_source: Keep in mind that in a random topology, it might be impossible to find LFAs for some links. If your solution does not work, first check whether it is even possible to find an LFA for the link you failed.
"""CLI to introduce and reset link failures.

Inspired by the Mininet CLI.
"""
# pylint: disable=keyword-arg-before-vararg,invalid-name

import atexit
import os
import subprocess
import sys
from cmd import Cmd
from select import poll
from textwrap import dedent


class CLI(Cmd):
    "Simple command-line interface to talk to nodes."
    prompt = 'link-menu> '
    def __init__(self, controller, stdin=sys.stdin, *args, **kwargs):
        self.controller = controller
        # Local variable bindings for py command
        self.locals = {'controller': controller}
        # Attempt to handle input
        self.inPoller = poll()
        self.inPoller.register(stdin)
        Cmd.__init__(self, *args, stdin=stdin, **kwargs)

        print "Checking links and synchronizing with switches..."
        self.do_synchronize()
        failed_links = self.check_all_links()
        if failed_links:
            formatted = ["%s-%s" % link for link in failed_links]
            print "Currently failed links:", ", ".join(formatted)
            # Notify the controller so the network will work after boot.
            self.do_notify()
        else:
            print "Currently failed links: None."
    readlineInited = False

    helpStr = dedent("""
        Manage linkstate with the following commands:
            fail node1 node2    Fail link between node1 and node2.
            reset               Reset all link failures.

        The switch linkstate registers are automatically updated. The controller
        is only notified on demand. You can use the commands:
            synchronize         Manually synchronize linkstate registers.
            notify              Notify controller about failure.
        """)

    header = dedent("""
        Welcome to the Reroute CLI
        """)

    def hello_msg(self):
        """Greet user."""
        print self.header
        print self.helpStr
    @classmethod
    def initReadline(cls):  # pylint: disable=invalid-name
        "Set up history if readline is available"
        # Only set up readline once to prevent multiplying the history file
        if cls.readlineInited:
            return
        cls.readlineInited = True
        try:
            from readline import (read_history_file, set_history_length,
                                  write_history_file)
        except ImportError:
            return
        history_path = os.path.expanduser('~/.rsvp_controller_history')
        if os.path.isfile(history_path):
            read_history_file(history_path)
            set_history_length(1000)
        atexit.register(lambda: write_history_file(history_path))
    def run(self):
        "Run our cmdloop(), catching KeyboardInterrupt"
        while True:
            try:
                if self.isatty():
          'stty echo sane intr ^C', shell=True)
                self.cmdloop()
                break
            except KeyboardInterrupt:
                # Output a message - unless it's also interrupted
                try:
                    print '\nInterrupt\n'
                except Exception:  # pylint: disable=broad-except
                    pass
    def emptyline(self):
        "Don't repeat last command when you hit return."

    def do_help(self, arg):
        "Describe available CLI commands."
        Cmd.do_help(self, arg)
        if arg == '':
            print self.helpStr

    def do_exit(self, _line):
        assert self  # satisfy pylint and allow override
        return 'exited by user command'

    def do_quit(self, line):
        return self.do_exit(line)

    def do_EOF(self, line):  # pylint: disable=invalid-name
        print '\n'
        return self.do_exit(line)

    def isatty(self):
        "Is our standard input a tty?"
        return os.isatty(self.stdin.fileno())
    # Link management commands.
    # =========================

    def do_fail(self, line=""):
        """Fail a link between two nodes.

        Usage: fail node1 node2
        """
        try:
            node1, node2 = line.split()
            link = (node1, node2)
        except ValueError:
            print "Provide exactly two arguments: node1 node2"
            return
        for node in (node1, node2):
            if node not in self.controller.controllers:
                print "%s is not a valid node!" % node, \
                    "You can only fail links between switches"
                return
        if node2 not in self.controller.topo[node1]:
            print "The link %s-%s does not exist." % link
            return
        failed_links = self.check_all_links()
        for failed_link in failed_links:
            if failed_link in [(node1, node2), (node2, node1)]:
                print "The link %s-%s is already down!" % (node1, node2)
                return
        print "Failing link %s-%s." % link
        self.update_interfaces(link, "down")
        self.update_linkstate(link, "down")
    def do_reset(self, line=""):  # pylint: disable=unused-argument
        """Set all interfaces back up."""
        failed_links = self.check_all_links()
        for link in failed_links:
            print "Resetting failure for link %s-%s." % link
            self.update_interfaces(link, "up")
            self.update_linkstate(link, "up")

    def do_notify(self, line=""):  # pylint: disable=unused-argument
        """Notify controller of failures (or lack thereof)."""
        failed = self.check_all_links()
        self.controller.failure_notification(failed)
    def do_synchronize(self, line=""):  # pylint: disable=unused-argument
        """Ensure that all linkstate registers match the interface state."""
        print "Synchronizing link state registers with link state..."
        # Restrict the graph to switch nodes (assumed: the switches are
        # exactly the keys of the controller's `controllers` dict).
        switchgraph = self.controller.topo.network_graph.subgraph(
            self.controller.controllers.keys())
        for link in switchgraph.edges:
            ifs = self.get_interfaces(link)
            ports = self.get_ports(link)
            for node, intf, port in zip(link, ifs, ports):
                state = "0" if self.if_up(intf) else "1"
                print("%s: set port %s (%s) to %s." %
                      (node, port, intf, state))
                self.update_switch_linkstate(node, port, state)
    # Link management helpers.
    # ========================

    def check_all_links(self):
        """Check the state for all link interfaces."""
        failed_links = []
        switchgraph = self.controller.topo.network_graph.subgraph(
            self.controller.controllers.keys())  # switches only, as above
        for link in switchgraph.edges:
            if1, if2 = self.get_interfaces(link)
            if not (self.if_up(if1) and self.if_up(if2)):
                failed_links.append(link)
        return failed_links

    @staticmethod
    def if_up(interface):
        """Return True if interface is up, else False."""
        cmd = ["ip", "link", "show", "dev", interface]
        return "state UP" in subprocess.check_output(cmd)
    def update_interfaces(self, link, state):
        """Set both interfaces on link to state (up or down)."""
        if1, if2 = self.get_interfaces(link)
        self.update_if(if1, state)
        self.update_if(if2, state)

    @staticmethod
    def update_if(interface, state):
        """Set interface to state (up or down)."""
        print "Set interface '%s' to '%s'." % (interface, state)
        cmd = ["sudo", "ip", "link", "set", "dev", interface, state]
    def get_interfaces(self, link):
        """Return tuple of interfaces on both sides of the link."""
        node1, node2 = link
        if_12 = self.controller.topo[node1][node2]['intf']
        if_21 = self.controller.topo[node2][node1]['intf']
        return if_12, if_21

    def get_ports(self, link):
        """Return tuple of ports on both sides of the link."""
        node1, node2 = link
        if1, if2 = self.get_interfaces(link)
        port1 = self.controller.topo[node1]['interfaces_to_port'][if1]
        port2 = self.controller.topo[node2]['interfaces_to_port'][if2]
        return port1, port2
    def update_linkstate(self, link, state):
        """Update switch linkstate register for both link interfaces.

        The register array is indexed by the port number, e.g., the state for
        port 0 is stored at index 0.
        """
        node1, node2 = link
        port1, port2 = self.get_ports(link)
        _state = "1" if state == "down" else "0"
        print("Set linkstate for %s:%s and %s:%s to %s (%s)." %
              (node1, port1, node2, port2, _state, state))
        self.update_switch_linkstate(node1, port1, _state)
        self.update_switch_linkstate(node2, port2, _state)

    def update_switch_linkstate(self, switch, port, state):
        """Update the link state register on the device."""
        control = self.controller.controllers[switch]
        control.register_write('linkState', port, state)
"""A central controller computing and installing shortest paths.
In case of a link failure, paths are recomputed.