Service chaining with NSH and Open vSwitch

This post originally appeared on

Service chaining is a hot topic in the SDN/NFV world. Service chaining is a simple concept but it requires many moving parts. Most SDN solutions use their own technology for chaining network services. The challenge in service chaining is knowing where in the chain a function is located and where the packet should be sent next without sending all traffic through a central node.

Network Service Headers

Most presentations and articles about service chaining position network service headers (NSH) as the solution for network service chaining. Although NSH is still a draft standard, industry seems to agree that NSH is the way to go.

A service chain uses NSH in conjunction with a encapsulation protocol such as VXLAN. NSH adds additional metadata to the encapsulated packet. The most important metadata is the service graph in which the package was classified and the index of the position in that service graph. The protocol also has room for four 4-byte metadata fields.

Open vSwitch

In the OpenStack ecosystem with SDN controllers Open vSwitch is an important component for virtual switching. All available demos and the OPNVF project use Open vSwitch for chaining. However, NSH support is not available in mainline Open vSwitch and the project has rejected the available patch sets. Currently two patch sets (for a 2.3 dev version and a 2.5 dev version) exist and only the latter has DPDK support.

The lack of decent NSH support makes it rather complex but doable to setup demonstrators. However, it clearly shows that NSH in combination with Open vSwitch is not yet ready for pilot or production use, despite all the fuss at conferences.

Network functions

Network functions with NSH support are also an important step for NSH adoption. Last year we built a demonstrator based on NSH and were unable to find any VNF (virtual network function) with NSH support. It does not mean that it does not exist, but it does raise red flags.

For those who read the NSH RFC might wonder: why not use an NSH proxy and non-NSH aware network functions?

We were not able to find a proxy either. The closest thing to a proxy we could find was this reference on the OpenDaylight mailing list.

Eventually we used the NSH python tools referenced in that list. It is functional, but do not expect any performance.


It is clear that NSH is far from ready for prime time. NSH is promising but lacks real support, even for demo purposes! It makes one wonder if there is chicken or egg problem here: there are no VNFs so SDN vendors are not inclined to add support, and because no SDN tech supports NSH, VNF vendors will not add NSH support.

MLAG with Cumulus

Cumulus Linux supports MLAG for dual-connected hosts. How to set this up is in great detail explained in their documentation. However, the role of the inter switch link and how this needs to be configured was not entirely clear to me. Both in a setup with normal bridges and vlan aware bridges, my initial configuration contained a mistake. This mistake caused the kernel to handle the ISL bond (called peerlink in the examples in the docs) as a normal link instead of the ISL. clagctl did not warn me about it. Actually, the only way to find out is by printing out STP port details using mstpctl.

The docs explain that the setup of the ISL is done by defining a SVI (switch virtual interface) on an available vlan (4094 in their docs) and add additional stanzas to generate the configuration for clagd to work. This is correct, but a crucial piece of information was lost to me. This setup is only used for clagd to function and to sync LACP bonding states between the switch members of a bond. The ISL also needs to be programmed with specific rules into the ASIC, so it handles single connected hosts and STP traffic correct. This only became clear to me after reading this paper. The kernel needs to recognize the ISL bond as the ISL in a specific manner: first is by adding the clag stanzas and second by adding the untagged interface (no subinterface) to an untagged bridge. What this means depends on the bridge mode:

Traditional mode
In traditional mode you have a bridge per VLAN that exist in the network. For the ISL to work correctly, the ISL link (peerlink for example) needs to be added to a bridge without a subinterface. So for example:
auto br0
iface br0
bridge-ports peerlink swp1 swp2 swp3 ...

When this is configured correctly, you should get similar output from mstpctl:

root@cumulus1:~# mstpctl showportdetail br0 peerlink 
br0:peerlink CIST info
  enabled            yes                     role                 Designated
  port id            8.002                   state                forwarding
  external port cost 1382                    admin external cost  0
  internal port cost 1382                    admin internal cost  0
  designated root    1.000.44:38:39:FF:00:01 dsgn external cost   0
  dsgn regional root 1.000.44:38:39:FF:00:01 dsgn internal cost   0
  designated bridge  1.000.44:38:39:FF:00:01 designated port      8.002
  admin edge port    no                      auto edge port       yes
  oper edge port     yes                     topology change ack  no
  point-to-point     yes                     admin point-to-point auto
  restricted role    no                      restricted TCN       no
  port hello time    2                       disputed             no
  bpdu guard port    no                      bpdu guard error     no
  network port       no                      BA inconsistent      no
  Num TX BPDU        3                       Num TX TCN           2
  Num RX BPDU        3                       Num RX TCN           2
  Num Transition FWD 2                       Num Transition BLK   1
  bpdufilter port    no                     
  clag ISL           yes                     clag ISL Oper UP     yes
  clag role          primary                 clag dual conn mac   00:00:00:00:00:00
  clag remote portID F.FFF                   clag system mac      44:38:39:FF:00:01

clag ISL: yes is crucial here.

In traditional bridge mode this mistake is easily made (at least in an isolated setup where the cumulus switches only link to servers) when none of the connected devices their untagged traffic is mapped to untagged traffic on the switches.

VLAN aware bridges
In vlan aware mode, only untagged interfaces are added to a bridge. This makes this mistake harder to make, unless you only add the interfaces of connected devices to the bridge and not peerlink (like I did). Again, no warning anywhere. Except for “clag ISL: no” in mstpctl.