# Theory of the Minimum Spanning Tree Problem [Part 2]

📅May 24, 2021 | ☕️25 minutes read

In this blog post, I will expand the theory developed in the Theory of the Minimum Spanning Tree Problem [Part 1]. I will focus mostly on the relationships between cuts and cycles which turns out to be a powerful way to analyse and even develop MST algorithms. We will analyse a perhaps (in)famous minimum spanning tree algorithm using the theory developed in the post. The last section is about the analysis of the oldest known minimum spanning tree algorithm.

Recently, I have started reading the Algorithms Illuminated Part 3, which is one of DIY books written by Tim Roughgarden, inspired by online courses that are currently running on the Coursera. However, the optional theory problems about minimum spanning trees were quite challenging and I felt the need to explore theory a bit further. For example, Theorem 10 about the similarity of minimum spanning trees is one of them. Both (in)famous minimum spanning tree algorithms mentioned are one of the optional theory problems.

The post is divided in three sections. In the first section, we will explore the relationship between cuts, connectedness and cycles in graphs. We will apply the results developed to prove existence of ‘interesting’ sequence of minimum spanning trees. The second section is about cuts, connectedness and cycles and their application to the analysis of the (in)famous minimum spanning tree algorithm whose correctness is not entirely obvious, in a way fascinating. The algorithm we will analyse constructs the minimum spanning tree by avoiding to make a decision about which edge to include, which is something I find quite remarkable. The last section is devoted to the first (oldest) minimum spanning tree known. We will analyse it using the theoretical results we develop for analysis in sections before.

## Relationship between cuts, connectedness, cycles and sequences of MSTs

To begin, we recall some key ideas from the previous post.

### Definition 1: a spanning subset and a spanning tree

Let $G=(V,E)$ be an undirected connected graph having the vertex set $V$ and edge set $E$. Then a subgraph $G' = (V,A)$ where $A \subseteq E$ is a spanning subgraph of the graph G if the graph $G'=(V, A)$ is connected. If $G'$ is a tree, we say $G'$ is a spanning tree of the graph G.

### Definition 3: a graph cut, crossing edge

Let $G=(V,E)$ be a graph having the vertex set $V$ and edge set $E$. A cut $(S, V-S)$ is a partition of $V$. An edge $(u,v)$ crosses the cut if $u \in S, v \in (V-S)$.

### Theorem 6: The fundamental cut property of the minimum spanning tree problem

Claim: Let $G=(V,E)$ be an undirected connected graph having the vertex set $V$ and edge set $E$. Let $w : E \to \mathbb{R}$ be an edge cost function. Let $(S,V-S)$ be a cut of the graph $G$. Let $A$ be the set of edges which belong to some minimum spanning tree and assume no edge in $A$ crosses the cut $(S, V-S)$. Let $(u,v)$ be the edge crossing the cut $(S, V-S)$ whose cost $w$ is minimal amongst edges crossing the cut $(S, V-S)$. Then an edge $(u,v)$ belongs to some minimum spanning tree of $G$.

However, for the purposes of this post, we need stronger insight into what happens when an edge of a minimum spanning tree is removed.

In this post, we will need a bit more results about the relationship between cuts, cycles and connectivity. Lemma names and proof ideas are taken from Tim Roughgarden’s above mentioned book.

The following lemma gives the characterisation of connectivity of the graph with respect to cuts.

### Lemma 1: Empty cut lemma

Claim Let $G=(V,E)$ be an undirected graph having the vertex set $V$ and edge set $E$. Then $G$ is not connected if and only if there exists a cut $(A,B)$ such that no edge in $E$ crosses the cut $(A,B)$.

Proof:

($\longrightarrow$) Suppose that $G$ is not connected. Then there exist $a, b \in V$ such that there is no path $a \rightsquigarrow b$. Define $A = \{ x \in V, \text{x reachable from a} \}$ and define $B = \{ x \in V, \text{x not reachable from a} \}$. We claim $(A,B)$ is a graph cut. Since $a, b$ are disconnected in $G$, $A \cap B = \emptyset$. Clearly, $a \in A, b \in B$ so $A, B$ are nonempty. Also, $V = A \cup B$. Therefore, $(A,B)$ is indeed a cut. It remains to show no edge in $E$ crosses the cut $(A,B)$. Suppose there were such an edge, say $(u,v) \in E, u \in A, v \in B$. By definition of $A$, there exists a path $a \rightsquigarrow u$. By definition of $B$, there exists no path $v \rightsquigarrow a$. But $a \rightsquigarrow u \rightarrow v$ is a path $a \rightsquigarrow v$, so $v \in A$. This is a contradiction. Therefore, no edge crosses the cut $(A,B)$.

($\longleftarrow$) Suppose that $(A,B)$ is a cut such that no edge crosses it. Then pick $a \in A$, $b \in B$. We claim there is no path $a \rightsquigarrow b$. Suppose there exists such a path. Since no edge crosses the cut $(A,B)$, every edge in the path has both endpoints in $A$m, since $a \in A$. But then, it must be $b \in A$. This is a contradiction, because $b \in B$, and $A \cap B = \emptyset$. Therefore, $a, b$ are disconnected in $G$. $\blacksquare$

In many proofs related to minimum spanning trees, we will perform the ‘exchange argument’. This means we would like to replace some edge with a different edge and we often need to ensure that the replacement will not introduce a cycle. Although obvious, the following lemma is a very convenient theoretical tool to justify the ‘exchange arguments’ we will perform in many following proofs.

### Lemma 2: Double crossing lemma

Claim: Let $G=(V,E)$ be an undirected graph having the vertex set $V$ and edge set $E$. Suppose that the cycle $C \subseteq E$ has an edge $e$ crossing the cut $(A,B)$. Then there exists another edge, $v \in C, v \neq e$ crossing the cut $(A,B)$.

Remark: By drawing a sketch, it is easy to convince yourself this is obviously true.

Suppose that there exists no edge $v$, $v \in C, v \neq e$ crossing the cut $(A,B)$. Let $e = (u, v)$ be the only edge crossing the cut $(A,B)$. Suppose $u \in A, v \in B$. Then write the cycle $C$ as $u \rightarrow v \rightsquigarrow u$, where $v \rightsquigarrow u$ is a path. By assumption, no edge in $v \rightsquigarrow u$ crosses the cut $(A,B)$. Since $v \in B$, all endpoints of edges comprising the path $v \rightsquigarrow u$ belong to $B$. But then, $u \in B$. Therefore $u \in A \cap B$. This is a contradiction since $A \cap B = \emptyset$.

Therefore, there exists another edge, $v \in C, v \neq e$ crossing the cut $(A,B)$. $\blacksquare$

### Lemma 3: Lonely edge corollary

Claim: Let $G=(V,E)$ be an undirected graph having the vertex set $V$ and edge set $E$. Let $(A,B)$ be a cut of $G$. If $e \in E$ is the unique edge crossing the cut $(A,B)$, $e$ cannot participate in any cycle of $G$.

Suppose $e$ does participate in a cycle $C \subseteq E$. By Double crossing lemma, there exists another edge, $v \in C, v \neq e$ crossing the cut $(A,B)$. This is a contradiction, since $e$ was the unique edge crossing the cut $(A,B)$. $\blacksquare$

### Theorem 9: Disconnecting a minimum spanning tree

Claim: Let $G=(V,E)$ be an undirected connected graph having the vertex set $V$ and edge set $E$. Let $w : E \to \mathbb{R}$ be an edge cost function. Let $T$ be a minimum spanning tree of $G$. Let $e=(u,v)$ be an edge of the minimum spanning tree $T$. Then $T - \{ e \}$ is disconnected into two components $T_U = \{ x \in V, \text{ x reachable from u in T - \{e\}} \}$, $T_V = \{ x \in V, \text{ x reachable from v in T - \{e\}} \}$. Moreover, both $(T_U, T -\{ e\})$ and $(T_V, T -\{ e\})$ are subtrees of $T$, and $(T_U, T_V)$ is a cut. In addition to everything above, no edge in $T - \{e\}$ crosses the cut $(T_U, T_V)$.

Proof: Since $T$ is a tree, every path in $T$ is unique, $e$ is the unique path from $u$ to $v$. If there was another path $u \rightsquigarrow v$, entirely in $T - \{ e \}$, then $T$ would have a cycle $u \rightsquigarrow v \rightarrow u$. Therefore $u, v$ are disconnected in $T - \{ e \}$.

Clearly, $u$ is reachable from $u$ in $T - \{ e \}$ and $v$ is reachable from $v$ in $T - \{ e \}$, so $u \in T_U$, $v \in T_V$.

Now we will show that $V= T_U \cup T_V$. Let $a \in V$. Since $T$ is spanning, there exists a path $a \rightsquigarrow v$ in $T$. Either $a \rightsquigarrow v$ contains e or it does not.

If $a \rightsquigarrow v$ does contain $e$, then $a \rightsquigarrow v$ = $a \rightsquigarrow u \rightarrow v$. Since all edges of $a \rightsquigarrow u$ belong to $T - \{ e \}$, $a$ is reachable from $u$ in $T - \{ e \}$. Hence $a \in U$. If $a \rightsquigarrow v$ does not contain $e$, then all edges of $a \rightsquigarrow v$ belong to $T - \{ e \}$, so $a$ is reachable from $v$. Hence $a \in V$.

Therefore, $V \subseteq T_U \cup T_V$. Trivially, $T_U \cup T_V \subseteq V$ so it must be $V= T_U \cup T_V$.

We claim $T_U \cap T_V = \emptyset$. Assume not. Then there exists $t \in T_U$ and $t \in T_V$. So there exists a path $u \rightsquigarrow t$, $t \rightsquigarrow v$ in $T - \{ e\}$. So $u$ and $v$ are connected in $T - \{ e \}$. This is a contradiction. Hence $T_U \cap T_V = \emptyset$.

Therefore $(T_U, T_V)$ is a cut.

Since $(T_U, T -\{ e\})$ and $(T_V, T -\{ e\})$ are subgraphs of a tree $T$, they must be subtrees. Otherwise, $T$ would not be a tree.

To conclude, we claim that no edge in $T - \{e\}$ crosses the cut. Suppose there were such an edge $(x,y)$. Then $x \in T_U$ and $y \in T_V$. So there are paths $u \rightsquigarrow x$, $y \rightsquigarrow v$ in $T - \{e\}$. Hence $u \rightsquigarrow x \rightarrow y \rightsquigarrow v$ connects $u, v$ so $u,v$ are connected in $T - \{e\}$. This is a contradiction. Therefore no edge in $T - \{e\}$ crosses the cut $(T_U, T_V)$. $\blacksquare$

The following theorem, which is a converse of Theorem 6, will strengthen the relationships between cuts and crossing edges belonging to a minimum spanning tree.

### Theorem 10: All edges comprising a minimum spanning tree are light

Claim: Let $G=(V,E)$ be an undirected connected graph having the vertex set $V$ and edge set $E$. Let $w : E \to \mathbb{R}$ be an edge cost function. Let $T$ be a minimum spanning tree of $G$. Then for every edge $e$ in $T$, there exists some cut $(A,B)$ of $G$ such that $e$ is a minimum weight edge crossing (A,B).

Proof: We will prove by contradiction.

Suppose that $T$ is a minimum spanning tree of $G$ and there exists an edge $e$ such that for every cut $(A,B)$, amongst edges of $G$ crossing $(A,B)$, $e$ is not amongst those of minimum weight.

Suppose $e = (u, v)$. Consider $T - \{e \}$. By Theorem 9, $T - \{ e \}$ is disconnected into two components $T_U = \{ x \in V, \text{ x reachable from u in T - \{e\}} \}$, $T_V = \{ x \in V, \text{ x reachable from v in T - \{e\}} \}$. Moreover, both $(T_U, T -\{ e\})$ and $(T_V, T -\{ e\})$ are subtrees of $T$, and $(T_U, T_V)$ is a cut. In addition to everything above, no edge in $T - \{e\}$ crosses the cut $(T_U, T_V)$.

By assumption, $e$ is not a minimum weight edge crossing the cut $(T_U, T_V)$ in $G$. Firstly, observe that $e$ cannot be the only edge crossing the cut $(T_U, T_V)$, because then $e$ would be an edge of minimum weight crossing $(T_U, T_V)$. So there exists another edge $(x, y) \in E, (x, y) \neq e$ with $w(x,y) < w(e)$.

Set $T' = (T - \{e\}) \cup \{ (x,y) \}$. We claim $T'$ is a spanning tree. Since no edge in $T - \{e\}$ crosses the cut $(T_U, T_V)$, $(x,y)$ is the only edge in $(T - \{e\}) \cup \{ (x,y) \}$ crossing the cut $(T_U, T_V)$. By Lonely edge corollary, $T'$ is acyclic.

Since $(x,y)$ is crossing the cut $(T_U, T_V)$, we may assume $x \in T_U$, $y \in T_V$.

Consider $a,b \in V$. If both $a, b \in T_U$ then there exist paths $a \rightsquigarrow u, b \rightsquigarrow u$. If $b$ appears in path $a \rightsquigarrow u$, then $a$ and $b$ are connected. Otherwise, $a \rightsquigarrow u \rightsquigarrow b$ is a path connecting $a$ to $b$ in $T'$.

Similarly, if $a, b \in T_V$, then $a, b$ are connected, by the same argument as above.

Suppose that $a \in T_U, b \in T_V$. Since $T_U$ and $T_V$ are subtrees, there exist paths $a \rightsquigarrow x$ entirely in $T_U$ and $y \rightsquigarrow b$ entirely in $T_V$. Then $a \rightsquigarrow x \rightarrow y \rightsquigarrow b$ is a path in $T'$ connecting $a, b$. Therefore, $T'$ is connected. Therefore, $T'$ is a spanning tree.

Since $T$ is a minimum spanning tree, $w(T) \leq w(T')$.

Observe that $w(T') = w(T) - w(e) + w(x,y) < w(T)$. This is a contradiction to the statement above. Therefore, for every edge $e$ in $T$, there exists some cut $(A,B)$ of $G$ such that $e$ is a minimum weight edge crossing (A,B). This completes the proof. $\blacksquare$

Perhaps surprisingly, all minimum spanning trees are ‘similar’. More precisely, for any two minimum spanning trees given, there is a sequence of edge swaps converting one tree into the other one. What is more surprising is the fact that each ‘intermediate’ step tree is also a minimum spanning tree.

### Theorem 10: similarity of minimum spanning trees

Claim: Let $G=(V,E)$ be an undirected connected graph having the vertex set $V$ and edge set $E$. Let $w : E \to \mathbb{R}$ be an edge cost function. Suppose that $T$ and $T'$ are two minimum spanning trees of $G$. Then there exists a sequence of minimum spanning trees $T=T_0, T_1, ..., T_n = T'$ where each consecutive pair of minimum spanning trees $T_i, T_{i+1}$ differ by only a single edge swap. In other words, there is a sequence of edge swaps converting $T$ into $T'$.

Proof:

To prove this claim, we will firstly introduce a lemma used to set up a proof by induction.

### Lemma 11: similarity of minimum spanning trees

Let $G=(V,E)$ be an undirected connected graph having the vertex set $V$ and edge set $E$. Let $w : E \to \mathbb{R}$ be an edge cost function. Suppose that $T$ and $T'$ are two distinct minimum spanning trees of $G$, differing by $n$ edges, $n \in \mathbb{N}$. Then there exists a minimum spanning tree $T''$ differing from $T$ by a single edge and differing from $T'$ by $n-1$ edges.

Proof:

We will construct such a minimum spanning tree $T''$. Let $e=(u,v)$ be an edge in $T'$ which is not present in $T$. By Theorem 10, there exists a cut $(A,B)$ such that $e$ is a minimum cost edge crossing the cut $(A,B)$.

Since $T$ is a minimum spanning tree, there exists a path $u \rightsquigarrow v$ such that $e$ is not on that path. Consider $T \cup \{ e \}$. Then $u \rightsquigarrow v \rightarrow u$ is a cycle in $T \cup \{ e \}$. Since $e$ crosses the cut $(A,B)$, by Double Crossing lemma, there exists a different edge on the cycle $u \rightsquigarrow v \rightarrow u$ crossing the $(A,B)$, say $e'=(u', v')$. That edge must be an edge of $T$. Without loss of generality, the cycle is of the form $u \rightsquigarrow u' \rightarrow v' \rightsquigarrow v \rightarrow u$. Since $e$ is a minimum cost edge crossing the cut $(A,B)$, we have $w(e) \leq w(e')$.

Consider $T - \{ e' \}$. By Theorem 9, we have that $T - \{ e' \}$ is disconnected into two components $T_U, T_V$, $T_U = \{ x \in V, \text{ x reachable from u' in T - \{e'\}} \}$, $T_V = \{ x \in V, \text{ x reachable from v' in T - \{e'\}} \}$. Moreover, both $(T_U, T -\{ e'\})$ and $(T_V, T -\{ e'\})$ are subtrees of $T$, and $(T_U, T_V)$ is a cut. In addition to everything above, no edge in $T - \{e'\}$ crosses the cut $(T_U, T_V)$.

By the form of the cycle $u \rightsquigarrow u' \rightarrow v' \rightsquigarrow v \rightarrow u$, we may assume $u \in T_U, v \in T_V$. Clearly, $e$ crosses the cut $(T_U, T_V)$.

Now consider $T'' = (T - \{e'\}) \cup \{ e \}$. Since no edge in $T - \{e'\}$ crosses the cut $(T_U, T_V)$, $e$ is the unique edge crossing the cut $(T_U, T_V)$ in $T''$. By Lonely edge corollary, $e$ cannot form a cycle in T”. Since $(T_U, T -\{ e'\})$ and $(T_V, T -\{ e'\})$ are subtrees of $T$, there is no cycle in $T''$.

Consider $a,b \in V$. If both $a, b \in T_U$ then there exist paths $a \rightsquigarrow u', b \rightsquigarrow u'$. If $b$ appears in path $a \rightsquigarrow u'$, then $a$ and $b$ are connected. Otherwise, $a \rightsquigarrow u' \rightsquigarrow b$ is a path connecting $a$ to $b$ in $T'$.

Similarly, if $a, b \in T_V$, then $a, b$ are connected, by the same argument as above.

Suppose that $a \in T_U, b \in T_V$. Since $T_U$ and $T_V$ are subtrees, there exist paths $a \rightsquigarrow u$ entirely in $T_U$ and $v \rightsquigarrow b$ entirely in $T_V$. Then $a \rightsquigarrow u \rightarrow v \rightsquigarrow b$ is a path in $T''$ connecting $a, b$. Therefore, $T''$ is connected. Since $T''$ is acyclic, $T''$ is a spanning tree.

Now $w(T'') = w(T) - w(e') + w(e)$. Since $w(e) \leq w(e')$, $w(T'') \leq w(T)$. Since $T$ is a minimum spanning tree, $w(T'') \geq w(T)$ so $w(T'') = w(T)$. Clearly, $T''$ differs from $T$ by a single edge and includes an edge $e$ of $T'$. Since $T$ differs from $T'$ by $n$ edges and $T''$ consists of the same edge set as $T$ except $e$, $T''$ differs from $T$ by $n-1$ edges.

This completes the proof. $\blacksquare$

Now back to the original claim.

We prove the following statement by induction on the number of edges by which $T$ and $T'$ differ:

Let $T$, $T'$ be two minimum spanning trees, differing by exactly $n$ edges. Then there exists a sequence of minimum spanning trees $T=T_0, T_1, ..., T_n+1 = T'$ where each consecutive pair of minimum spanning trees $T_i, T_{i+1}$ differ by only a single edge.

Basis: Suppose $T = T'$. The claim holds vacously.

Induction Assumption: Let $T$, $T'$ be two minimum spanning trees, differing by exactly $n$ edges. Then there exists a sequence of minimum spanning trees $T=T_0, T_1, ..., T_{n+1} = T'$ where each consecutive pair of minimum spanning trees $T_i, T_{i+1}$ differ by only a single edge.

Induction Step: Now suppose that $T$ differs from $T'$ by exactly $n + 1$ edges. Set $T_0 = T$. By Lemma 11, there exists a minimum spanning tree $T_1$ such that $T_1$ differs from $T$ by a single edge and $T_1$ differs from $T'$ by $n$ edges. By Induction, there exists a sequence of minimum spanning trees $T_1, T_2, \ldots T_{n+1} = T'$ where each consecutive pair of minimum spanning trees $T_i, T_{i+1}$ differ by only a single edge. Now $T_0 = T, T_1, T_2, \ldots T_{n+1}= T'$ is a sequence where each consecutive pair of minimum spanning trees $T_i, T_{i+1}$ differ by only a single edge. This establishes the induction step.

This completes the proof. $\blacksquare$

## “The heaviest on a cycle” algorithm

Consider the following algorithm. The input is a connected undirected graph $G=(V,E)$ with edge costs, not necessarily distinct. The algorithm proceeds in iterations.

The algorithm:

1. Initially, all edges in $T = E$.
2. If the $(V,T)$ is a acyclic, then the algorithm halts and returns the set $T$.
3. Otherwise, it picks an arbitrary cycle $C$ of the current graph and scans all edges of $C$. Let $(u,v)$ be the heaviest edge on $C$. The algorithm removes $(u,v)$ from $T$ and jumps to the step 2.

### Theorem 12: “The heaviest on a cycle” is optimal.

Claim: Let $G=(V,E)$ be an undirected connected graph having the vertex set $V$ and edge set $E$. Let $w : E \to \mathbb{R}$ be an edge cost function. Then “The heaviest on cycle” computes the minimum spanning tree of $G$.

Remark: Observe that this algorithm constructs a minimum spanning tree by avoiding to make a decision about which edge to include. This is quite interesting.

Proof:

We will show that for every edge the algorithm discards, there exists at least one minimum spanning tree which does not contain that edge.

We will state and prove the lemma formalising the idea above.

### Lemma 13: Every edge removed by “The heaviest on a cycle” is safe to remove.

Claim: Let $G=(V,E)$ be an undirected connected graph having the vertex set $V$ and edge set $E$. Let $w : E \to \mathbb{R}$ be an edge cost function. Let $(u,v)$ be an edge discarded by “The heaviest on a cycle”. Then there exists a minimum spanning tree of $G$ which does not include $(u,v)$.

Proof: Let $(u, v)$ be an edge that is discarded by the algorithm. If all minimum spanning trees do not contain $(u,v)$, then trivially, there exists at least one minimum spanning tree which does not contain that edge. We may suppose that $T$ is a minimum spanning tree of $G$ which does contain an edge $(u,v)$. Since $(u,v)$ is discarded by the algorithm, there exists some cycle $C$, of the form $u \rightarrow v \rightsquigarrow u$ in $G$.

Consider $T$. By Disconnecting the minimum spanning tree, $T - \{ (u,v) \}$ is disconnected into two components $T_U = \{ x \in V, \text{ x reachable from u in T - \{(u,v)\}} \}$, $T_V = \{ x \in V, \text{ x reachable from v in T - \{(u,v)\}} \}$. Moreover, both $(T_U, T -\{ (u,v) \})$ and $(T_V, T -\{ (u,v) \})$ are subtrees of $T$, and $(T_U, T_V)$ is a cut. In addition to everything above, no edge in $T - \{(u,v)\}$ crosses the cut $(T_U, T_V)$.

By the Double Crossing Lemma, there is an edge $(x,y) \neq (u,v)$ in $v \rightsquigarrow u$ crossing the cut $(T_U, T_V)$. By the choice of the algorithm, the edge $(u,v)$ was the heaviest edge on the cycle $C$, so we have $c(x,y) \leq c(u,v)$.

Set $T' = (T - \{(u,v)\}) \cup \{ (x,y) \}$. We claim $T'$ is a spanning tree. Since no edge in $T - \{(u,v)\}$ crosses the cut $(T_U, T_V)$, $(x,y)$ is the only edge in $(T - \{(u,v)\}) \cup \{ (x,y) \}$ crossing the cut $(T_U, T_V)$. By Lonely edge corollary, $T'$ is acyclic.

Since $(x,y)$ is crossing the cut $(T_U, T_V)$, we may assume $x \in T_U$, $y \in T_V$.

Consider $a,b \in V$. If both $a, b \in T_U$ then there exist paths $a \rightsquigarrow u, b \rightsquigarrow u$. If $b$ appears in path $a \rightsquigarrow u$, then $a$ and $b$ are connected. Otherwise, $a \rightsquigarrow u \rightsquigarrow b$ is a path connecting $a$ to $b$ in $T'$.

Similarly, if $a, b \in T_V$, then $a, b$ are connected, by the same argument as above.

Suppose that $a \in T_U, b \in T_V$. Since $T_U$ and $T_V$ are subtrees, there exist paths $a \rightsquigarrow x$ entirely in $T_U$ and $y \rightsquigarrow b$ entirely in $T_V$. Then $a \rightsquigarrow x \rightarrow y \rightsquigarrow b$ is a path in $T'$ connecting $a, b$. Therefore, $T'$ is connected. Therefore, $T'$ is a spanning tree.

Since $T$ is a minimum spanning tree, $w(T) \leq w(T')$. Observe that $w(T') = w(T) - c(e) + c(x,y) \leq w(T)$. But then, $w(T)=w(T')$ so $T'$ is a minimum spanning tree of $G$, without an edge $(u,v)$ chosen by the algorithm. $\blacksquare$

We argue correctness of the algorithm above by splitting into two cases.

Suppose that the algorithm discards no edges. Since no edges were discarded, $(V,T)$ is acyclic. Since $T=E$ and $G$ is connected, $(V,T)$ is indeed a spanning tree. But then $G$ is a spanning tree, so it is a minimum spanning tree.

Now suppose that the algorithm discards edges $\{ e_1, e_2, \ldots e_k \}$, $k \in \mathbb{N}$. Applying the lemma above to each $e_i, 1 \leq i \leq k$, we conclude that there exists at least one minimum spanning tree of $G$ not including any of $\{ e_1, e_2, \ldots e_k \}$. The algorithm terminates when $(V,T)$ is acyclic. Since the algorithm removed only edges on cycles of $(V,T)$, the graph $(V_T)$ must be connected. Therefore, $(V,T)$ is a spanning tree. We claim it is a minimum spanning tree. Since there exists at least one minimum spanning tree of $G$ not including any of $\{ e_1, e_2, \ldots e_k \}$, $(V,T)$ must be such a spanning tree. $\blacksquare$

## The oldest minimum spanning tree algorithm - Boruvka’s algorithm

The final section of this post is devoted to the oldest minimum spanning tree algorithm - Boruvka’s algorithm. Similarly to Kruskal’s algorithm, Boruvka’s algorithm initially considers all vertices to be separate connected components and starts with an empty set of edges. For every connected component, the algorithm repeatedly finds the cheapest edge connecting that component to some other component. After finding such an edge for each component, Boruvka includes each such an edge in the partial set of edges defining the connected components of the next iteration. Here is a great animation taken from Wikipedia.

Theoretical results shown above are enough to analyse the oldest known minimum spanning tree algorithm.

## Boruvka’s algorithm

Here is a pseudocode of $\texttt{Boruvka's algorithm}$.

### Theorem 14: Boruvka’s algorithm is correct.

Claim: Let $G=(V,E)$ be an undirected connected graph having the vertex set $V$ and edge set $E$. Let $w : E \to \mathbb{R}$ be an edge cost function. Then $\texttt{Boruvka's algorithm}$ computes the minimum spanning tree of $G$.

Proof: We will prove the following invariant.

Prior to each iteration of $\texttt{Boruvka's algorithm}$, the following three conditions hold:

• Every edge in $T$ belongs to some minimum spanning tree of $G$.
• $(V,T)$ is acyclic.
• The set $C$ consists of all and only connected components of $(V,T)$. For every $C_k \in C$, no edge in $T$ crosses the cut $(C_k, V - C_k)$.

Initialisation: Prior to the first iteration, $T$ is empty so the first two conditions hold vacously. Since $T$ is empty, the third condition holds.

Maintenance: Now consider an iteration of $\texttt{Boruvka's algorithm}$.

Let $C_k$ in $C$. Clearly, $(C_k, V - C_k)$ is a cut of $G$. By the innermost loop, the set $S$ consists only of edges crossing the cut $(C_k, V - C_k)$. Observe that an edge $e$ is the cheapest edge crossing the cut $(C_k, V - C_k)$. Let $e = (u,v)$. Then $u \in C_k$, $v \in C_l, k \neq l$. By the fundamental cut property, $T \cup \{ e \}$ belongs to some minimum spanning tree of $G$. Since no edge in $T$ crosses the cut $(C_k, V - C_k)$, $e$ is the only edge in $T \cup \{ e \}$ crossing the cut $(C_k, V - C_k)$. By Lonely cut corollary, $T \cup \{ e \}$ is acyclic. The algorithm adds $e$ to the set of edges $T'$ to be added to $T$. After the line $T \cup T'$, $C_l$ and $C_k$ become the same connected component which establishes the first condition of the invariant.

Let $C_n$ be a connected component in $C$ after update $T = T \cup T'$. We claim that no edge in $T$ crosses the $(C_n, V - C_n)$. Suppose that there exists an edge $(a,b)$ crossing the cut. Then, since connected components of $(V,T)$ partition $V$, there exists $C_m, m \neq n$ such that $b \in C_m$. But since $(C_n, T)$, $(C_m, T)$ are both connected and $a \in C_n$, $b \in C_m$, $a, b$ are connected in $(V,T)$. This means they belong to the same connected component of $(V,T)$. Hence $C_n = C_m$. This is a contradiction.

Termination: Firstly, we show $\texttt{Boruvka's algorithm}$ terminates. By the proof of invariant maintenance, after the line $T \cup T'$, $C_l$ and $C_k$ become the same connected component, for some connected components $C_l, C_k$ of $(V,T)$. Therefore, each edge addition to $T$ decreases the number of connected components of $(V,T)$ by 1. Hence, $\texttt{Boruvka's algorithm}$ terminates.

By pseudocode, $\texttt{Boruvka's algorithm}$ terminates if and only if $C$ has exactly one connected component. But then $(V,T)$ is connected. By invariant, $(V,T)$ is acyclic. Therefore, $(V,T)$ is a spanning tree of $G$. By invariant, $T$ is a subset of edges of some minimum spanning tree. Since $T$ is a subset of edges of some minimum spanning tree and $(V,T)$ is a spanning tree of $G$, $(V,T)$ must be a minimum spanning tree of $G$. $\blacksquare$

Hi👋! I am a final year Computer Science and Mathematics student at 🎓University of Edinburgh. This is a place where I (will hopefully) write about topics I find interesting. I plan to write about my personal projects and learning experiences, as well as some random tech topics.

Built with a lot of