STP and Graph Theory

Spanning Tree Protocol (STP) is one of several methods of avoiding loops in bridged local area networks. For most of us, working with the typical corporate LAN, STP is the default configuration of switches and it just works. But it may not always be the most efficient way to connect switches. This post looks at exactly what a spanning tree is, why it is used to solve the problem of loops in a network, and why it may not always be the most efficient way to do that.

Spanning Tree Protocol

When we build a typical bridged local area network, we add multiple physical paths between different LAN’s. If we did not, we would have a single point of failure. But IEEE 802 frame-based networks (like 802.3 Ethernet) cannot tolerate a switching loop. With a loop, a single frame would be endlessly retransmitted. And if a frame had a broadcast destination, then the retransmission would rise exponentially, since each received instance of the frame would be flooded again to every other port.

So we need a dynamic method or protocol to break any logical loops and ensure that there is only a single logical path between LAN’s at any one moment. If a path breaks, then the protocol needs to reconfigure the logical network to open a previously closed alternate path.

The original Spanning Tree protocol was published in 1984 by Radia Perlman, working at the time for Digital Equipment Corporation (DEC). It was adopted and absorbed into IEEE 802.1D in 1990. It was extended with Rapid Spanning Tree Protocol (RSTP) in 802.1D in 2004, and with Multiple Spanning Tree Protocol (MSTP) in IEEE 802.1Q in 2005. Meanwhile vendors implemented proprietary extensions, such as Cisco Rapid per VLAN Spanning Tree + (Rapid PVSTP+). Or they adopted proprietary alternatives.

The clause covering STP (as well as Rapid STP and Multiple STP) in 802.1Q – 2014 is 86 pages long. You will find plenty of sites on the internet telling you how STP works, although these are inevitably a summary or an interpretation. To know how it really works, you need to read the standard, which is available from the IEEE GET Program.

But what is less clear is why it works this way. For that, we need to look at Graph Theory.

Graph Theory

Graph Theory is the branch of mathematics that studies the abstract relationship between objects and the things connecting any two of them. In Graph terminology, the objects are vertices, and the things connecting them are edges.

200px-Petersen

Illustration: a Petersen graph

The best description, for our purpose, is here in the MIT course Mathematics for Computer Science, Chapter 5. "Graphs are ubiquitous in computer science because they provide a handy way to represent a relationship between pairs of objects. The objects represent items of interest such as programs, people, cities, or web pages, and we place an edge between a pair of nodes if they are related in a certain way. For example, an edge between a pair of people might indicate that they like (or, in alternate scenarios, that they don’t like) each other. An edge between a pair of courses might indicate that one needs to be taken before the other."

Graphs have no shape. They are a representation of objects and their connections. A simple graph just has vertices and edges. A weighted graph has a value to the edge (a cost, a distance, a mass: anything with quantity). A directed graph has a direction to the edge. Some further definitions:

  • two vertices are adjacent if they are connected by an edge
  • a path is a non-repeating sequence of adjacent vertices and edges
  • two vertices are connected if there is a path from one to the other
  • a cycle is a path that takes you back where you started (a closed path) with no repeated edge
  • a tree is a set of vertices and edges such that there is only a single path connecting any two. If you add an edge, you create a cycle. If you remove an edge, you disconnect one or more of the vertices.
  • a spanning tree is a tree that spans all the vertices in a graph. It has all the vertices connected by a single path, but not all the edges. A graph can have more than one spanning tree.

The Petersen graph (illustrated above) has 2,000 possible spanning trees.

Spanning Trees

Spanning trees connect all the vertices in a graph, with the fewest number of edges. There are no loops and no repeats. But a graph can have more than one spanning tree, and any given spanning tree is not necessarily the most efficient way of connecting the vertices. If all edges have the same weight (as in a simple graph), then all spanning trees must have the same total weight. But if the edges have different weights, such as a cost or a distance, then some spanning trees would be a more efficient way to connect the vertices than others:

  • A minimum spanning tree connects all the vertices with the minimum total weight of the edges. So, if you were laying a fiber optic cable between towns, this would be an efficient way to do it with the least amount of digging. But it does not give us the shortest path from one specific town to another.
  • A shortest path spanning tree obtains the lowest weight path from a given vertex (the root of the tree) to any other vertex. If you start a journey from one town, this is the shortest route to every other given town. But the same path is not the shortest route if you start from a different town.

IEEE 802.1 Spanning Tree Protocols

With bridged local area networks, the bridges are the vertices and the LAN’s are the edges connecting them. The bridges are usually connected by more than one physical path, for redundancy; or they may be connected in such a way by accident. To avoid a loop we need to switch ports off so that there is only one active path between any pair of bridges at any one time. A fully connected graph with only one path between vertices is a spanning tree. So we need to implement a spanning tree across all the bridges.

The title of Perlman’s paper in 1984 was "An Algorithm for Distributed Computation of a Spanning Tree in an Extended LAN". From the paper:

"This approach [transparent bridging] assumes that the topology is a tree (loop-free). However, requiring a topology to be loop-free means there are no backup paths in the case of bridge or LAN failures. Also, because the technology allows network growth so easily, it might be difficult to prevent someone from adding a bridge and creating a loop. A loop in the topology might cause severe performance degradation in the entire extended network due to congestion caused by infinitely circulating packets. It is undesirable to have a network that can be brought down so easily, merely by plugging a cable into the wrong place."

"Thus we have designed an algorithm that allows the extended network to consist of an arbitrary topology. The algorithm is run by the bridges, and computes a subset of the topology that connects all LANs yet is loop-free (a spanning tree). The algorithm is self-configuring. The only a priori information necessary in a bridge is its own unique ID (MAC address), which we are assuming can be attained in some manner, for instance with a hardware ROM containing the value."

The algorithm also meets several other design goals, including low memory and bandwidth regardless of the number of bridges; stabilises in a small multiple of the round trip delay across the network; and completely deterministic behaviour.

This design is carried through to the IEEE 802.1Q standard today, over thirty years later.

Spanning Tree Type

I am interested in what type of spanning tree we obtain with the current IEEE 802.1Q algorithm. The textbook and online guide: Algorithms, by Robert Sedgewick and Kevin Wayne, has a good description and illustration of the standard algorithms for deriving a spanning tree of a graph:

The selection of a starting vertex is common to any spanning tree algorithm. After picking a starting point, the next step is to evaluate the edges, to see which to add to the developing tree. The minimum path algorithms add the next lowest weight edge. The shortest path algorithms add the next lowest root cost path. The IEEE 802.1 STP algorithm uses the root path cost (together with user configurable parameters) to evaluate the vectors it receives, which gives us a shortest path spanning tree algorithm.

However, the standard spanning tree algorithms require somewhere to store the data structures while the tree is constructed. A network of bridges, in an arbitrary topology, in which bridges may be added or removed without pre-configuration, does not have this. Instead, the bridges exchange vectors with their adjacent bridges, discarding those that contain worse information, and replacing their own with any that is better.

This is somewhat similar to the Bellman-Ford algorithm (published in about 1958). This algorithm uses repeated passes, and discards paths that are worse, ending up with the least cost (shortest) path. It is not a distributed algorithm, but I can just about see how the algorithm could be implemented in a distributed fashion. Each bridge only needs to know which adjacent bridge provides the best path to the root. If it receives a lower cost path, it can discard the one it had previously.

In summary, STP (as well as RSTP and MSTP) has the effect of creating a shortest path spanning tree, although the protocol itself does not claim to do so. The tree is rooted on a random bridge, unless one is configured. The spanning tree is guaranteed to be acyclic (loop free) but it is most definitely not the shortest path from any given bridge to another, or to a gateway, or to a core switch (unless you configure the core switch to be the root bridge).

In a network of arbitrary topology, or a simple hierarchy with an obvious root, then STP (and RSTP and MSTP) may be the best way to calculate the spanning tree. But if you have a campus, sites across a metropolitan area, a factory floor or a mesh of sensors, then it may not be.

Two alternatives to STP are:

  • IEEE 802.1Q – 2014 Shortest Path Bridging (SPB): one spanning tree for each entry point to the network
  • IEC 62429 – 2010 Part II Media Redundancy Protocol (MRP): a ring protocol with a topology manager and clients

These have to be the subject of a different post.

IEEE 802.1Q Bridges and Bridged Networks

I have been working recently on the design for an industrial network. The network runs through a physical tunnel, and so requires a ring topology to provide resilience against failure. Because the network may incorporate switches from different vendors I decided to read the current standard for this, the IEEE standard 802.1Q – 2014. This blog post aims to break down the idea of a bridged network to the lowest level, to reflect accurately what is in the standard.

IEEE 802.1Q – 2014

IEEE 802.1Q – 2014 is the current industry standard for Bridges and Bridged Networks. It includes the standards for bridge (or switch) operation, spanning tree protocols to prevent loops, and other aspects of bridging. The standard is a dense document. It is 1768 pages long. It defines 266 terms, not including terms defined by other standards. And it employs 247 acronyms or abbreviations, from ACK (acknowledgement) to VTID (VSI Type Identifier), where VSI is the Virtual Station Interface.

The standard specifies the operation of bridges in Clause 3: Introduction "for the purpose of compatible interconnection of information technology equipment using the IEEE 802 MAC Service supported by interconnected IEEE 802 standard LAN’s using different or identical media access control methods".

It does this by defining the protocols for different aspects of operation, so that vendors can implement them and obtain conformance. It is not intended to provide an explanation or to give examples.

So, by my estimation, it would take at least 5,000 pages to provide an accurate and full description of the standard. It would require far more to provide the specifics in terms of linked standards, like 802.3 Ethernet. This is why, when you read vendor documentation, you are reading a summary or a description, rather than an accurate definition of the protocols. You will find a great deal of inaccurate, misleading or incomplete descriptions in various documentation. What follows is a breakdown of the core components of the standard that, I hope, make it easier to understand other documentation accurately.

IEEE 802 Networks

IEEE 802 is a family of standards for frame-based networks. The family is described in IEEE 802 – 2014 Overview and Architecture. From Clause 4.1 Key Concepts:

"IEEE 802 networks use frame-based communications over a variety of media to connect various digital apparatus regardless of computer technology and data type."

"The basic communications capabilities provided by all IEEE 802 standards are frame based with source and destination addressing and asynchronous timing. In a frame-based system, the format is a variable-length sequence of data octets. By contrast, cell-based communication transmits data in fixed-length units in specified time intervals while isochronous communication transmits data as a steady stream of octets, or groups of octets, at equal time intervals."

"An IEEE 802 LAN is a peer-to-peer communication network that enables stations to communicate directly on a point-to-point, or point-to-multipoint, basis without requiring them to communicate with any intermediate stations that perform forwarding or filtering above the PHY [physical layer]."

It is really quite remarkable that the same family of coherent standards has governed local area networking from the early 10 Mbps Ethernet over coax (802.3 – 1983), to 100 Gbps Ethernet over optical fiber (802.3 – 2015), to Wireless (802.11) and Bluetooth (part of Personal Area Networks in 802.15) today. A key point is that the standard for Bridges and Bridged Network operates with all of them.

Media Access Control Service

IEEE local area networks use a shared medium, like a cable, to transmit and receive frames of data. The Media Access Control (MAC) Service is the service that controls access to the medium, so that signals do not collide. If they were to collide, the resulting signal would be garbage.

Clause 6.2 Provision of the MAC Service "The MAC Service provided in end stations attached to MAC Bridged Networks and Virtual Bridged Networks is the (unconfirmed) connectionless mode MAC Service defined in IEEE Std 802.1AC. The MAC Service is defined as an abstraction of the features common to a number of specific MAC Services".

The definition of the MAC Service from IEEE 802.1AC – 2016: Media Access Control (MAC) Service Definition.

Clause 7.2 "The primitives of the MAC Service comprise a data request and a corresponding data indication, each with MAC destination address, MAC source address, a MAC Service Data Unit (MSDU) comprising one or more octets of data, and priority parameters. Taken together these parameters are conveniently referred to as a frame".

A MAC Service User makes a connection to the medium via a Service Access Point (SAP). The implementation of a SAP is what we usually call a port.

Clause 7.4 "The term port is used to refer to the interface stack for a given SAP. Often the interface stack comprises a single protocol entity attached to a single Local Area Network (LAN), and port can be conveniently used to refer to several aspects of the interface stack, including the physical interface connector for example."

The protocol requires no negotiation or set-up between endpoints. The MAC Service User simply transmits the frame over the medium via the port.

Clause 7.8 "The MAC Service supported by an IEEE 802 LAN provides connectionless connectivity, i.e., communication between attached stations occurs without explicit prior agreement between service users."

Clause 14. "An MSDU transmitted using MAC connectionless-mode transmission is not considered by the MAC Service provider to be related in any way to any previously transmitted MSDU…The MAC Service provider is not required to maintain state information for flow control between specific combinations of MSAPs."

Here is an illustration of the MAC service, from 802.1AC – 2016:

802.1AC-2016 Figure 7.1 MAC Service

Figure 7.1 MAC entities, the MAC Service, and MAC Service users (clients).

Local Area Network

Colloquially, the Local Area Network (LAN) usually refers to the whole collection of cables and switches on a site (i.e. a local as distinct from a wide area network). But the exact meaning of LAN in the IEEE 802.1Q standard is a single segment of a shared medium (for example a cable). The definition of a LAN is given here:

Clause 3.94: " The term “Local Area Network” and the abbreviation LAN are used exclusively to refer to an individual LAN specified by a MAC technology, without the inclusion of Bridges. This precise use of terminology within this specification allows a Bridged Network to be distinguished from an individual LAN that has been bridged to other LANs in the network (a bridged LAN). In more general usage, such precise terminology is not required, as it is an explicit goal of this standard that Bridges are transparent to the users of the MAC Service".

The correct term for the collection of cables and switches on a site is a Bridged Network or a Bridged LAN.

This definition also highlights that different LAN’s can use different media access technologies, but only one technology on one LAN. Ethernet has become so dominant that it is easy to forget the other standards, like Token Ring. But the standards for Bridges and Bridged Networks do not depend on using Ethernet, and do not require the same MAC technology on different LAN’s.

A single LAN, in principle, originally supported multiple stations (PC’s, printers, servers) communicating directly between each other. But in practice now we usually have only end station on a dedicated LAN, together with its switch port. We also have a LAN connecting two ports on separate switches. This is the accurate definition, although we commonly use the term differently.

Bridge

A common question seems to be: "what is the difference between a bridge and a switch?". The answer is sometimes given that, while bridge is the technical term, vendors call them switches for marketing purposes. This is not exactly correct.

The definition of the term in IEEE 802.1Q – 2014 is the following:

Clause 3.22 Bridge: "A system that includes Media Access Control (MAC) Bridge or Virtual Local Area Network (VLAN) Bridge component functionality and that supports a claim of conformance to Clause 5 of IEEE Std 802.1Q-2014 for system behavior."

This is a tautology, so not much help. However, the main function of a bridge is to relay or filter frames between two ports.

Clause 8.1 Bridge operation: "The principal elements of Bridge operation are:

  1. ) Relay and filtering of frames (8.1.1).
  2. ) Maintenance of the information required to make frame filtering and relaying decisions (8.1.2).
  3. ) Management of the above (Clause 12)."

So we could say that the best description of a bridge is a network component that relays or filters frames, among other related functions. But relay between what?

Clause 8.1.1 Relay: "A Bridge relays individual MAC user data frames between the separate MACs of the individual LANs connected to its Ports."

So a bridge is the connecting function between two ports. Each port receives all the signals on its LAN, and so the bridge connects two LAN’s. The term "switch" does not occur in 802.1Q. A Layer 2 switch is simply a vendor implementation of multiple bridged ports.

If you have a PC attached to a port on an access switch, and a server attached to a port on a core switch in the datacentre, then:

  • a frame is transmitted by the PC over its LAN
  • received by the port on the access switch and relayed to another port
  • re-transmitted over the LAN between the bridge ports on different switches
  • then, after a few more relays and re-transmissions, re-transmitted over the LAN of the server where it is received.

VLAN

We think of a Virtual LAN (VLAN) as a subset of the whole network, identified by the same VLAN ID (e.g. all the ports allocated to VLAN 20). But, if the LAN is one segment of a MAC medium only, then the Virtual LAN sounds as though is should be a subset of one segment. As there is normally only one end station and one switch port on a LAN, what does that mean?

Clause 3.258 Virtual Local Area Network (VLAN): "The closure of a set of Media Access Control (MAC) Service Access Points (MSAPs) such that a data request in one MSAP in the set is expected to result in a data indication in another MSAP in the set."

This is a strange definition. The VLAN is a closed subset of MAC Service Access Points. A MAC Service Access Point (MSAP) is referred to as a port. But there is no entity defined as the set of all such access points.

A VLAN cannot be a Virtual Bridged Network, because that refers to a network with multiple VLAN’s.

3.254 Virtual Bridged Network: "A concatenation of individual IEEE 802 Local Area Networks (LANs) interconnected by Bridges, including Virtual Local Area Network (VLAN) Bridges."

This means that a VLAN is not a virtual instance of a LAN, as you might expect from the name. Instead it is a subset of ports in a Virtual Bridged Network. This is the way we use the term, but it is not the literal meaning of Virtual LAN.

Relay Function

The principal element of Bridge operation is to relay or filter (not relay) frames. A Bridge relays frames between the individual LAN’s connected to its Ports.

Here is an illustration of the relay function, showing the MAC Service, LAN, Bridge and VLAN components that I described above.

802.1Q-2014 Figure 8.2 VLAN Bridge Architecture

IEEE 802.1Q – 2014 Figure 8.2 VLAN Bridge architecture.

In the figure above:

  • Two separate LAN’s are connected by a bridge component
  • The relay function occurs between two ports
  • The MAC Service (shown as MS) is the tower of functions on a single port
  • At the bottom of the MAC Service the functions are specific to the media access method being employed, like Ethernet or Token Ring
  • At the top of the MAC Service the functions are independent of the specific access method. This is the Internal Sublayer Service (ISS)
  • The Extended ISS (EISS) is where adding and removing VLAN tags takes place. It is an extension because some bridges are VLAN-aware and others not.
  • The MAC Relay Entity is what does the forwarding or filtering of frames between ports
  • Higher layer entities manage functions across more than two ports. Monitoring port status is an obvious example.

Filtering Database

The basic architecture of IEEE 802 networks is a distributed one. There is no overall controller. No configuration of a bridge is required to enable it to operate on the network. The bridge is transparent to the end stations. A bridge operates successfully without configuration because it follows a set of protocols that are automatically compatible with other bridges doing the same.

So how does a switch know where to send the frame? There is no route, or addressing scheme, to use. The unique ID of a MAC Service entity, the MAC address, does not contain addressing information. It is not really an address! It is like having a unique social security number, but no address.

When an end station transmits a frame, it includes in the frame the source MAC address (its own) and the destination MAC address (which it discovered by another process). It has no idea of where the destination is, or how to reach it. It simply drops the frame onto the LAN.

A bridge port attached to the LAN listens to all the frames. The port learns the MAC address(es) of the device(s) on its LAN, and records them in the bridge’s Filtering Database (FDB). As each port does this, the FDB grows to contain the MAC addresses of all the devices on all the LAN’s attached to the Bridge, and which port they are attached to.

This is illustrated here:

802.1Q-2014 Figure 8.5 Observation of network traffic

IEEE 802.1Q – 2014 Figure 8.5 Observation of network traffic

When a station transmits a frame to another station on the same LAN, the bridge port does nothing except to record the source addresses. When a station transmits a frame to a station that is not on the same LAN, then the bridge port will relay the frame. The bridge looks for the destination MAC address in the FDB, sees which port it is associated with, and relays the frame to that port. The port then drops the frame onto the LAN attached to that port. The destination device recognises its own MAC address in the frame, and receive it.

Clause 8.7 The Learning Process "The Learning Process receives the source MAC addresses and VIDs, or only the source MAC addresses in the case of VLAN-unaware MAC Relays, of received frames from the Forwarding Process, subject to active topology enforcement (8.6.1) and the application of ingress filtering (8.6.2)." "When invoked, the Learning Process shall create or update a Dynamic Filtering Entry (8.8.3) that specifies the reception Port for the frame’s source address and, in the case of VLAN Bridge components, the frame’s VID".

Entries in the FDB are aged out, with a default time of 5 minutes.

Clause 8.7.3 Ageing of Dynamic Filtering Entries "Dynamic Filtering Entries shall be automatically removed after a specified time, the Ageing Time, has elapsed since the entry was created or last updated by the Learning Process. The ageing out of Dynamic Filtering Entries ensures that end stations that have been moved to a different part of the network will not be permanently prevented from receiving frames. It also takes account of changes in the active topology of the network that can cause end stations to appear to move from the point of view of the Bridge; i.e., the path to those end stations subsequently lies through a different Bridge Port."

The FDB is then used to determine to which port frames are relayed.

Clause 8.8 The Filtering Database (FDB) "The FDB supports queries by the Forwarding Process to determine whether received frames, with given values of, destination MAC address, and for VLAN Bridge components, VID, are to be forwarded through a given potential transmission Port". "The FDB contains filtering information in the form of filtering entries that are either a) Static, and explicitly configured by management action; or b) Dynamic, and automatically entered into the FDB by the normal operation of the Bridge and the protocols it supports."

Flooding

Since a MAC address, and its association with a port, is only known in the FDB when that station has transmitted a frame, we need a mechanism for finding a station when it has not yet transmitted. This is done by flooding the frame to every port, except the port the frame came from. Flooding is not a specific operation of a bridge. The terms "flood" or "flooding" are used to describe the result of not filtering.

When the destination MAC address is not held in the FDB, the frame is not filtered and so every port drops the frame onto its attached LAN. When the frame eventually reaches the end station with the destination address of the frame, it recognises the address and responds. When it responds, its address is captured by the bridge port to which it is attached, and stored in the FDB.

If one of the bridge ports is connected to another bridge port (for example connecting two switches), then each of those bridge ports will see all the traffic coming from the other. The FDB of each switch will contain all the source MAC addresses for the entire network arriving by that path.

Loop Prevention

We all know that it is undesirable to have an open loop in a local area network. It is not easy to describe exactly why it is undesirable. The term "broadcast storm" is sometimes used, but that is not an accurate description of the problem. In fact, the standard does not define the problem, although it spends a great deal of time solving it.

The standard aims to maintain the Quality of Service (QoS) by preventing frame duplication.

Clause 6.5.4 Frame duplication "The MAC Service (IEEE Std 802.1AC) permits a negligible rate of duplication of frames. The operation of Bridges introduces a negligible rate of duplication of user data frames. The potential for frame duplication in a bridged network arises through the possibility of the following:

  1. ) Repeated transmission, through a given Bridge Port, of a frame received on another Bridge Port;
  2. ) Multiple paths between source and destination end stations;
  3. ) A loop in a path between source and destination stations."

"When Bridges in a network connect individual LANs in such a way that physical topology is capable of providing multiple paths between any source and destination, a protocol is required to ensure that the active topology comprises a single path."

Frame duplication would happen if a frame were sent out from one port, and relayed back to another port. In this case, the frame would again be sent out. If the frame had more than a single destination (such as a multicast or broadcast frame), then it would be relayed back by multiple ports. Each returning frame would be transmitted, causing an exponential increase in traffic.

A large part of the 802.1Q standard is assigned to preventing this from happening, through Spanning Tree Protocols (STP) and Shortest Path Bridging (SPB).