STP and Graph Theory

Spanning Tree Protocol (STP) is one of several methods of avoiding loops in bridged local area networks. For most of us, working with the typical corporate LAN, STP is the default configuration of switches and it just works. But it may not always be the most efficient way to connect switches. This post looks at exactly what a spanning tree is, why it is used to solve the problem of loops in a network, and why it may not always be the most efficient way to do that.

Spanning Tree Protocol

When we build a typical bridged local area network, we add multiple physical paths between different LAN’s. If we did not, we would have a single point of failure. But IEEE 802 frame-based networks (like 802.3 Ethernet) cannot tolerate a switching loop. With a loop, a single frame would be endlessly retransmitted. And if a frame had a broadcast destination, then the retransmission would rise exponentially, since each received instance of the frame would be flooded again to every other port.

So we need a dynamic method or protocol to break any logical loops and ensure that there is only a single logical path between LAN’s at any one moment. If a path breaks, then the protocol needs to reconfigure the logical network to open a previously closed alternate path.

The original Spanning Tree protocol was published in 1984 by Radia Perlman, working at the time for Digital Equipment Corporation (DEC). It was adopted and absorbed into IEEE 802.1D in 1990. It was extended with Rapid Spanning Tree Protocol (RSTP) in 802.1D in 2004, and with Multiple Spanning Tree Protocol (MSTP) in IEEE 802.1Q in 2005. Meanwhile vendors implemented proprietary extensions, such as Cisco Rapid per VLAN Spanning Tree + (Rapid PVSTP+). Or they adopted proprietary alternatives.

The clause covering STP (as well as Rapid STP and Multiple STP) in 802.1Q – 2014 is 86 pages long. You will find plenty of sites on the internet telling you how STP works, although these are inevitably a summary or an interpretation. To know how it really works, you need to read the standard, which is available from the IEEE GET Program.

But what is less clear is why it works this way. For that, we need to look at Graph Theory.

Graph Theory

Graph Theory is the branch of mathematics that studies the abstract relationship between objects and the things connecting any two of them. In Graph terminology, the objects are vertices, and the things connecting them are edges.

200px-Petersen

Illustration: a Petersen graph

The best description, for our purpose, is here in the MIT course Mathematics for Computer Science, Chapter 5. "Graphs are ubiquitous in computer science because they provide a handy way to represent a relationship between pairs of objects. The objects represent items of interest such as programs, people, cities, or web pages, and we place an edge between a pair of nodes if they are related in a certain way. For example, an edge between a pair of people might indicate that they like (or, in alternate scenarios, that they don’t like) each other. An edge between a pair of courses might indicate that one needs to be taken before the other."

Graphs have no shape. They are a representation of objects and their connections. A simple graph just has vertices and edges. A weighted graph has a value to the edge (a cost, a distance, a mass: anything with quantity). A directed graph has a direction to the edge. Some further definitions:

  • two vertices are adjacent if they are connected by an edge
  • a path is a non-repeating sequence of adjacent vertices and edges
  • two vertices are connected if there is a path from one to the other
  • a cycle is a path that takes you back where you started (a closed path) with no repeated edge
  • a tree is a set of vertices and edges such that there is only a single path connecting any two. If you add an edge, you create a cycle. If you remove an edge, you disconnect one or more of the vertices.
  • a spanning tree is a tree that spans all the vertices in a graph. It has all the vertices connected by a single path, but not all the edges. A graph can have more than one spanning tree.

The Petersen graph (illustrated above) has 2,000 possible spanning trees.

Spanning Trees

Spanning trees connect all the vertices in a graph, with the fewest number of edges. There are no loops and no repeats. But a graph can have more than one spanning tree, and any given spanning tree is not necessarily the most efficient way of connecting the vertices. If all edges have the same weight (as in a simple graph), then all spanning trees must have the same total weight. But if the edges have different weights, such as a cost or a distance, then some spanning trees would be a more efficient way to connect the vertices than others:

  • A minimum spanning tree connects all the vertices with the minimum total weight of the edges. So, if you were laying a fiber optic cable between towns, this would be an efficient way to do it with the least amount of digging. But it does not give us the shortest path from one specific town to another.
  • A shortest path spanning tree obtains the lowest weight path from a given vertex (the root of the tree) to any other vertex. If you start a journey from one town, this is the shortest route to every other given town. But the same path is not the shortest route if you start from a different town.

IEEE 802.1 Spanning Tree Protocols

With bridged local area networks, the bridges are the vertices and the LAN’s are the edges connecting them. The bridges are usually connected by more than one physical path, for redundancy; or they may be connected in such a way by accident. To avoid a loop we need to switch ports off so that there is only one active path between any pair of bridges at any one time. A fully connected graph with only one path between vertices is a spanning tree. So we need to implement a spanning tree across all the bridges.

The title of Perlman’s paper in 1984 was "An Algorithm for Distributed Computation of a Spanning Tree in an Extended LAN". From the paper:

"This approach [transparent bridging] assumes that the topology is a tree (loop-free). However, requiring a topology to be loop-free means there are no backup paths in the case of bridge or LAN failures. Also, because the technology allows network growth so easily, it might be difficult to prevent someone from adding a bridge and creating a loop. A loop in the topology might cause severe performance degradation in the entire extended network due to congestion caused by infinitely circulating packets. It is undesirable to have a network that can be brought down so easily, merely by plugging a cable into the wrong place."

"Thus we have designed an algorithm that allows the extended network to consist of an arbitrary topology. The algorithm is run by the bridges, and computes a subset of the topology that connects all LANs yet is loop-free (a spanning tree). The algorithm is self-configuring. The only a priori information necessary in a bridge is its own unique ID (MAC address), which we are assuming can be attained in some manner, for instance with a hardware ROM containing the value."

The algorithm also meets several other design goals, including low memory and bandwidth regardless of the number of bridges; stabilises in a small multiple of the round trip delay across the network; and completely deterministic behaviour.

This design is carried through to the IEEE 802.1Q standard today, over thirty years later.

Spanning Tree Type

I am interested in what type of spanning tree we obtain with the current IEEE 802.1Q algorithm. The textbook and online guide: Algorithms, by Robert Sedgewick and Kevin Wayne, has a good description and illustration of the standard algorithms for deriving a spanning tree of a graph:

The selection of a starting vertex is common to any spanning tree algorithm. After picking a starting point, the next step is to evaluate the edges, to see which to add to the developing tree. The minimum path algorithms add the next lowest weight edge. The shortest path algorithms add the next lowest root cost path. The IEEE 802.1 STP algorithm uses the root path cost (together with user configurable parameters) to evaluate the vectors it receives, which gives us a shortest path spanning tree algorithm.

However, the standard spanning tree algorithms require somewhere to store the data structures while the tree is constructed. A network of bridges, in an arbitrary topology, in which bridges may be added or removed without pre-configuration, does not have this. Instead, the bridges exchange vectors with their adjacent bridges, discarding those that contain worse information, and replacing their own with any that is better.

This is somewhat similar to the Bellman-Ford algorithm (published in about 1958). This algorithm uses repeated passes, and discards paths that are worse, ending up with the least cost (shortest) path. It is not a distributed algorithm, but I can just about see how the algorithm could be implemented in a distributed fashion. Each bridge only needs to know which adjacent bridge provides the best path to the root. If it receives a lower cost path, it can discard the one it had previously.

In summary, STP (as well as RSTP and MSTP) has the effect of creating a shortest path spanning tree, although the protocol itself does not claim to do so. The tree is rooted on a random bridge, unless one is configured. The spanning tree is guaranteed to be acyclic (loop free) but it is most definitely not the shortest path from any given bridge to another, or to a gateway, or to a core switch (unless you configure the core switch to be the root bridge).

In a network of arbitrary topology, or a simple hierarchy with an obvious root, then STP (and RSTP and MSTP) may be the best way to calculate the spanning tree. But if you have a campus, sites across a metropolitan area, a factory floor or a mesh of sensors, then it may not be.

Two alternatives to STP are:

  • IEEE 802.1Q – 2014 Shortest Path Bridging (SPB): one spanning tree for each entry point to the network
  • IEC 62429 – 2010 Part II Media Redundancy Protocol (MRP): a ring protocol with a topology manager and clients

These have to be the subject of a different post.

The Troubleshooting Process

I do a lot of troubleshooting of problems in corporate infrastructure. By that, I mean investigating difficult technical problems in a structured way to find either a solution or a workaround. This post is a few thoughts on the process.

Here is a caricature of the process that often happens. It is imaginary, but I think most people will easily identify real cases that are similar.

  1. A user (or a group of users, for example at one site or using one application) experiences a problem. The help desk tries a few things, not very systematically. Sometimes different people from the help desk try the same thing again. Various updates and changes are made to see if they fix the problem. Sometimes the problem is resolved. Sometimes it goes away. And sometimes it is ignored as "just the way it is until we upgrade xyz".
  2. A user experiences a problem. The user is senior, or the problem affects something that senior people care about. A problem manager takes over the case. The problem manager is someone who co-ordinates actions but does not, themselves, possess the technical skills to resolve the problem. The problem is assigned to whichever team seems most likely e.g. networks, server, storage, application. The team works hard to demonstrate that the fault cannot be with them. The problem continues, with increasing levels of polite acrimony. Eventually a significant change is made which may, or may not, be the cause e.g. a platform component is changed. The problem goes away, but no-one knows what actually caused it.

What is the perfect way? A user experiences a problem. A technical wizard understands every aspect of the problem without needing to be told anything about it, or to do any investigation. The technical wizard knows the solution, whether it is in the network, or server, or storage, or application layer. After some discussion, the fix is implemented and the problem is solved.

This perfect way is absurd, obviously. No-one can know what the problem is until they have seen it, thought about it, asked a few questions, gathered some data. No-one can be expert in enough technologies to know exactly what the fix is, without running some tests, trying a few things.

So we need a process that is not like 1) and 2), but also does not assume there is a perfect way.

First, some context. We already have an awareness of what is normal behaviour and what is not. If a user says that logon is slow, we assess it against a typical logon time. If the behaviour is not normal, then we can assume there is a fault. For the most part (not always!) vendor products do not show obvious faults in isolation. So, if the vendors do not know about this fault, then there must be a fault in our configuration, or an unknown fault in the product. There must be something we can change so that the problem does not occur. Our job is to find what to change.

The way I go about it is divided into five parts. These are not all sequential, but they involve different types of activity:

  1. Incident Handling
  2. Problem Definition
  3. Investigation
  4. Analysis
  5. Resolution.

I am not going to describe each part in detail: just give an idea.

Incident Handling

Being practical, I am going to assume that the great majority of day-to-day help desk calls do not require much rigour. The most important thing is to avoid doing things twice. As soon as we (on the help desk) realise that a problem is not simple, we need to follow a systematic process to gather the incident data in a way that avoids having to do it again.

Problem Definition

As soon as we realise that a fault may be more difficult to resolve than by providing a simple change, we need to prepare a Problem Definition.

Very often I find that, when starting out, it is more difficult to write down an accurate statement of the problem than you might expect. The report might be: "The application crashes". But is that a Not Responding window or an application error? If there is an error, what is the exact message, and what happens when you click on OK? Is there an error shown in the Application Event Log? Does the application log show an error? How often does this occur? How many people are affected? What is the user doing in the moments leading up to the problem? Writing down what is and is not known goes a long way to defining what problem we are trying to solve.

It is not always easy. I have worked on a fault causing Windows to crash with a bug check, where it was difficult to know even whether there was a single problem, or more than one.

Investigation

This is not a distinct step. But I identify it separately because we attempt to gather data systematically to understand the problem better. The more we understand, the more specific the tools we might use to investigate further.

Initially we might configure application debug level logging. Or we might run a network capture. As we narrow it down we might get very specific.

For example, I recently had a problem where Outlook seemed to hang for a few seconds when typing. I first ran a network capture of all traffic to see if, perhaps, a peak was causing a delay in response. There was no significant level of traffic. There was no significant latency. No retransmission. No problems with the network connection in general.

Then I ran a capture only between the Outlook client and the Exchange server (using the relevant ports). I noticed a spike in traffic every 15 minutes, coinciding with the problem. But what was the spike? Was it normal, or a fault?

So then, on the server, I configured Remote Operation (ROP) logging to capture the exact MAPI operations being performed by the client. The problem was caused by a MAPI action to re-read the mail database and refresh the view in Outlook. This occurred exactly when the user experienced the problem.

Analysis

I will mention just two aspects of the analysis.

One is that we must have a mental model of the part of the system we are examining, and all its components. We then need to rule in or out the different components we think could be involved. Often I am told the problem is in Citrix. When I investigate, it is not in Citrix at all. It is in, for example, the configuration of the application on Citrix. Or, in one very difficult case, it was port exhaustion in an unpatched Windows server. Or it might be a configuration error in Citrix after all.

The second is conjecture. I use the term to mean that we need to develop an idea of a single cause of all the different symptoms we have found in the investigation. I don’t mean a guess. For example, sometimes people will say "I think it must be the anti-virus". This is a guess. But a conjecture would be: "A windows driver fault is occurring in the I/O stack. It is a type of fault that does not appear in testing using Driver Verifier". This is exactly what the problem was, but it was extremely difficult to find.

Resolution

An interesting aspect of troubleshooting is that the final part is often really easy. Often, I don’t need to be involved any further, once the exact cause is identified.

You might think that, after investigation and analysis, we may have a good idea but we cannot be certain. Of course, nothing is ever entirely certain. But going back to what I said about the context, the problem must be caused by a specific fault; otherwise the behaviour would be normal. When you find the cause, you just know this is it.