Theory of the Minimum Spanning Tree Problem [Part 2]
📅May 24, 2021 | ☕️25 minutes read
In this blog post, I will expand the theory developed in the Theory of the Minimum Spanning Tree Problem [Part 1]. I will focus mostly on the relationships between cuts and cycles which turns out to be a powerful way to analyse and even develop MST algorithms. We will analyse a perhaps (in)famous minimum spanning tree algorithm using the theory developed in the post. The last section is about the analysis of the oldest known minimum spanning tree algorithm.
Recently, I have started reading the Algorithms Illuminated Part 3, which is one of DIY books written by Tim Roughgarden, inspired by online courses that are currently running on the Coursera. However, the optional theory problems about minimum spanning trees were quite challenging and I felt the need to explore theory a bit further. For example, Theorem 10 about the similarity of minimum spanning trees is one of them. Both (in)famous minimum spanning tree algorithms mentioned are one of the optional theory problems.
The post is divided in three sections. In the first section, we will explore the relationship between cuts, connectedness and cycles in graphs. We will apply the results developed to prove existence of ‘interesting’ sequence of minimum spanning trees. The second section is about cuts, connectedness and cycles and their application to the analysis of the (in)famous minimum spanning tree algorithm whose correctness is not entirely obvious, in a way fascinating. The algorithm we will analyse constructs the minimum spanning tree by avoiding to make a decision about which edge to include, which is something I find quite remarkable. The last section is devoted to the first (oldest) minimum spanning tree known. We will analyse it using the theoretical results we develop for analysis in sections before.
Relationship between cuts, connectedness, cycles and sequences of MSTs
To begin, we recall some key ideas from the previous post.
Definition 1: a spanning subset and a spanning tree
Let be an undirected connected graph having the vertex set and edge set . Then a subgraph where is a spanning subgraph of the graph G if the graph is connected. If is a tree, we say is a spanning tree of the graph G.
Definition 3: a graph cut, crossing edge
Let be a graph having the vertex set and edge set . A cut is a partition of . An edge crosses the cut if .
Theorem 6: The fundamental cut property of the minimum spanning tree problem
Claim: Let be an undirected connected graph having the vertex set and edge set . Let be an edge cost function. Let be a cut of the graph . Let be the set of edges which belong to some minimum spanning tree and assume no edge in crosses the cut . Let be the edge crossing the cut whose cost is minimal amongst edges crossing the cut . Then an edge belongs to some minimum spanning tree of .
However, for the purposes of this post, we need stronger insight into what happens when an edge of a minimum spanning tree is removed.
In this post, we will need a bit more results about the relationship between cuts, cycles and connectivity. Lemma names and proof ideas are taken from Tim Roughgarden’s above mentioned book.
The following lemma gives the characterisation of connectivity of the graph with respect to cuts.
Lemma 1: Empty cut lemma
Claim Let be an undirected graph having the vertex set and edge set . Then is not connected if and only if there exists a cut such that no edge in crosses the cut .
Proof:
() Suppose that is not connected. Then there exist such that there is no path . Define and define . We claim is a graph cut. Since are disconnected in , . Clearly, so are nonempty. Also, . Therefore, is indeed a cut. It remains to show no edge in crosses the cut . Suppose there were such an edge, say . By definition of , there exists a path . By definition of , there exists no path . But is a path , so . This is a contradiction. Therefore, no edge crosses the cut .
() Suppose that is a cut such that no edge crosses it. Then pick , . We claim there is no path . Suppose there exists such a path. Since no edge crosses the cut , every edge in the path has both endpoints in m, since . But then, it must be . This is a contradiction, because , and . Therefore, are disconnected in .
In many proofs related to minimum spanning trees, we will perform the ‘exchange argument’. This means we would like to replace some edge with a different edge and we often need to ensure that the replacement will not introduce a cycle. Although obvious, the following lemma is a very convenient theoretical tool to justify the ‘exchange arguments’ we will perform in many following proofs.
Lemma 2: Double crossing lemma
Claim: Let be an undirected graph having the vertex set and edge set . Suppose that the cycle has an edge crossing the cut . Then there exists another edge, crossing the cut .
Remark: By drawing a sketch, it is easy to convince yourself this is obviously true.
Proof: (By contradiction)
Suppose that there exists no edge , crossing the cut . Let be the only edge crossing the cut . Suppose . Then write the cycle as , where is a path. By assumption, no edge in crosses the cut . Since , all endpoints of edges comprising the path belong to . But then, . Therefore . This is a contradiction since .
Therefore, there exists another edge, crossing the cut .
Lemma 3: Lonely edge corollary
Claim: Let be an undirected graph having the vertex set and edge set . Let be a cut of . If is the unique edge crossing the cut , cannot participate in any cycle of .
Proof: (By contradiction).
Suppose does participate in a cycle . By Double crossing lemma, there exists another edge, crossing the cut . This is a contradiction, since was the unique edge crossing the cut .
Theorem 9: Disconnecting a minimum spanning tree
Claim: Let be an undirected connected graph having the vertex set and edge set . Let be an edge cost function. Let be a minimum spanning tree of . Let be an edge of the minimum spanning tree . Then is disconnected into two components , . Moreover, both and are subtrees of , and is a cut. In addition to everything above, no edge in crosses the cut .
Proof: Since is a tree, every path in is unique, is the unique path from to . If there was another path , entirely in , then would have a cycle . Therefore are disconnected in .
Clearly, is reachable from in and is reachable from in , so , .
Now we will show that . Let . Since is spanning, there exists a path in . Either contains e or it does not.
If does contain , then = . Since all edges of belong to , is reachable from in . Hence . If does not contain , then all edges of belong to , so is reachable from . Hence .
Therefore, . Trivially, so it must be .
We claim . Assume not. Then there exists and . So there exists a path , in . So and are connected in . This is a contradiction. Hence .
Therefore is a cut.
Since and are subgraphs of a tree , they must be subtrees. Otherwise, would not be a tree.
To conclude, we claim that no edge in crosses the cut. Suppose there were such an edge . Then and . So there are paths , in . Hence connects so are connected in . This is a contradiction. Therefore no edge in crosses the cut .
The following theorem, which is a converse of Theorem 6, will strengthen the relationships between cuts and crossing edges belonging to a minimum spanning tree.
Theorem 10: All edges comprising a minimum spanning tree are light
Claim: Let be an undirected connected graph having the vertex set and edge set . Let be an edge cost function. Let be a minimum spanning tree of . Then for every edge in , there exists some cut of such that is a minimum weight edge crossing (A,B).
Proof: We will prove by contradiction.
Suppose that is a minimum spanning tree of and there exists an edge such that for every cut , amongst edges of crossing , is not amongst those of minimum weight.
Suppose . Consider . By Theorem 9, is disconnected into two components , . Moreover, both and are subtrees of , and is a cut. In addition to everything above, no edge in crosses the cut .
By assumption, is not a minimum weight edge crossing the cut in . Firstly, observe that cannot be the only edge crossing the cut , because then would be an edge of minimum weight crossing . So there exists another edge with .
Set . We claim is a spanning tree. Since no edge in crosses the cut , is the only edge in crossing the cut . By Lonely edge corollary, is acyclic.
Since is crossing the cut , we may assume , .
Consider . If both then there exist paths . If appears in path , then and are connected. Otherwise, is a path connecting to in .
Similarly, if , then are connected, by the same argument as above.
Suppose that . Since and are subtrees, there exist paths entirely in and entirely in . Then is a path in connecting . Therefore, is connected. Therefore, is a spanning tree.
Since is a minimum spanning tree, .
Observe that . This is a contradiction to the statement above. Therefore, for every edge in , there exists some cut of such that is a minimum weight edge crossing (A,B). This completes the proof.
Perhaps surprisingly, all minimum spanning trees are ‘similar’. More precisely, for any two minimum spanning trees given, there is a sequence of edge swaps converting one tree into the other one. What is more surprising is the fact that each ‘intermediate’ step tree is also a minimum spanning tree.
Theorem 10: similarity of minimum spanning trees
Claim: Let be an undirected connected graph having the vertex set and edge set . Let be an edge cost function. Suppose that and are two minimum spanning trees of . Then there exists a sequence of minimum spanning trees where each consecutive pair of minimum spanning trees differ by only a single edge swap. In other words, there is a sequence of edge swaps converting into .
Proof:
To prove this claim, we will firstly introduce a lemma used to set up a proof by induction.
Lemma 11: similarity of minimum spanning trees
Let be an undirected connected graph having the vertex set and edge set . Let be an edge cost function. Suppose that and are two distinct minimum spanning trees of , differing by edges, . Then there exists a minimum spanning tree differing from by a single edge and differing from by edges.
Proof:
We will construct such a minimum spanning tree . Let be an edge in which is not present in . By Theorem 10, there exists a cut such that is a minimum cost edge crossing the cut .
Since is a minimum spanning tree, there exists a path such that is not on that path. Consider . Then is a cycle in . Since crosses the cut , by Double Crossing lemma, there exists a different edge on the cycle crossing the , say . That edge must be an edge of . Without loss of generality, the cycle is of the form . Since is a minimum cost edge crossing the cut , we have .
Consider . By Theorem 9, we have that is disconnected into two components , , . Moreover, both and are subtrees of , and is a cut. In addition to everything above, no edge in crosses the cut .
By the form of the cycle , we may assume . Clearly, crosses the cut .
Now consider . Since no edge in crosses the cut , is the unique edge crossing the cut in . By Lonely edge corollary, cannot form a cycle in T”. Since and are subtrees of , there is no cycle in .
Consider . If both then there exist paths . If appears in path , then and are connected. Otherwise, is a path connecting to in .
Similarly, if , then are connected, by the same argument as above.
Suppose that . Since and are subtrees, there exist paths entirely in and entirely in . Then is a path in connecting . Therefore, is connected. Since is acyclic, is a spanning tree.
Now . Since , . Since is a minimum spanning tree, so . Clearly, differs from by a single edge and includes an edge of . Since differs from by edges and consists of the same edge set as except , differs from by edges.
This completes the proof.
Now back to the original claim.
We prove the following statement by induction on the number of edges by which and differ:
Let , be two minimum spanning trees, differing by exactly edges. Then there exists a sequence of minimum spanning trees where each consecutive pair of minimum spanning trees differ by only a single edge.
Basis: Suppose . The claim holds vacously.
Induction Assumption: Let , be two minimum spanning trees, differing by exactly edges. Then there exists a sequence of minimum spanning trees where each consecutive pair of minimum spanning trees differ by only a single edge.
Induction Step: Now suppose that differs from by exactly edges. Set . By Lemma 11, there exists a minimum spanning tree such that differs from by a single edge and differs from by edges. By Induction, there exists a sequence of minimum spanning trees where each consecutive pair of minimum spanning trees differ by only a single edge. Now is a sequence where each consecutive pair of minimum spanning trees differ by only a single edge. This establishes the induction step.
This completes the proof.
Two (less) famous minimum spanning tree algorithms
“The heaviest on a cycle” algorithm
Consider the following algorithm. The input is a connected undirected graph with edge costs, not necessarily distinct. The algorithm proceeds in iterations.
The algorithm:
- Initially, all edges in .
- If the is a acyclic, then the algorithm halts and returns the set .
- Otherwise, it picks an arbitrary cycle of the current graph and scans all edges of . Let be the heaviest edge on . The algorithm removes from and jumps to the step 2.
Theorem 12: “The heaviest on a cycle” is optimal.
Claim: Let be an undirected connected graph having the vertex set and edge set . Let be an edge cost function. Then “The heaviest on cycle” computes the minimum spanning tree of .
Remark: Observe that this algorithm constructs a minimum spanning tree by avoiding to make a decision about which edge to include. This is quite interesting.
Proof:
We will show that for every edge the algorithm discards, there exists at least one minimum spanning tree which does not contain that edge.
We will state and prove the lemma formalising the idea above.
Lemma 13: Every edge removed by “The heaviest on a cycle” is safe to remove.
Claim: Let be an undirected connected graph having the vertex set and edge set . Let be an edge cost function. Let be an edge discarded by “The heaviest on a cycle”. Then there exists a minimum spanning tree of which does not include .
Proof: Let be an edge that is discarded by the algorithm. If all minimum spanning trees do not contain , then trivially, there exists at least one minimum spanning tree which does not contain that edge. We may suppose that is a minimum spanning tree of which does contain an edge . Since is discarded by the algorithm, there exists some cycle , of the form in .
Consider . By Disconnecting the minimum spanning tree, is disconnected into two components , . Moreover, both and are subtrees of , and is a cut. In addition to everything above, no edge in crosses the cut .
By the Double Crossing Lemma, there is an edge in crossing the cut . By the choice of the algorithm, the edge was the heaviest edge on the cycle , so we have .
Set . We claim is a spanning tree. Since no edge in crosses the cut , is the only edge in crossing the cut . By Lonely edge corollary, is acyclic.
Since is crossing the cut , we may assume , .
Consider . If both then there exist paths . If appears in path , then and are connected. Otherwise, is a path connecting to in .
Similarly, if , then are connected, by the same argument as above.
Suppose that . Since and are subtrees, there exist paths entirely in and entirely in . Then is a path in connecting . Therefore, is connected. Therefore, is a spanning tree.
Since is a minimum spanning tree, . Observe that . But then, so is a minimum spanning tree of , without an edge chosen by the algorithm.
We argue correctness of the algorithm above by splitting into two cases.
Suppose that the algorithm discards no edges. Since no edges were discarded, is acyclic. Since and is connected, is indeed a spanning tree. But then is a spanning tree, so it is a minimum spanning tree.
Now suppose that the algorithm discards edges , . Applying the lemma above to each , we conclude that there exists at least one minimum spanning tree of not including any of . The algorithm terminates when is acyclic. Since the algorithm removed only edges on cycles of , the graph must be connected. Therefore, is a spanning tree. We claim it is a minimum spanning tree. Since there exists at least one minimum spanning tree of not including any of , must be such a spanning tree.
The oldest minimum spanning tree algorithm - Boruvka’s algorithm
The final section of this post is devoted to the oldest minimum spanning tree algorithm - Boruvka’s algorithm. Similarly to Kruskal’s algorithm, Boruvka’s algorithm initially considers all vertices to be separate connected components and starts with an empty set of edges. For every connected component, the algorithm repeatedly finds the cheapest edge connecting that component to some other component. After finding such an edge for each component, Boruvka includes each such an edge in the partial set of edges defining the connected components of the next iteration. Here is a great animation taken from Wikipedia.
Theoretical results shown above are enough to analyse the oldest known minimum spanning tree algorithm.
Boruvka’s algorithm
Here is a pseudocode of .
Theorem 14: Boruvka’s algorithm is correct.
Claim: Let be an undirected connected graph having the vertex set and edge set . Let be an edge cost function. Then computes the minimum spanning tree of .
Proof: We will prove the following invariant.
Prior to each iteration of , the following three conditions hold:
- Every edge in belongs to some minimum spanning tree of .
- is acyclic.
- The set consists of all and only connected components of . For every , no edge in crosses the cut .
Initialisation: Prior to the first iteration, is empty so the first two conditions hold vacously. Since is empty, the third condition holds.
Maintenance: Now consider an iteration of .
Let in . Clearly, is a cut of . By the innermost loop, the set consists only of edges crossing the cut . Observe that an edge is the cheapest edge crossing the cut . Let . Then , . By the fundamental cut property, belongs to some minimum spanning tree of . Since no edge in crosses the cut , is the only edge in crossing the cut . By Lonely cut corollary, is acyclic. The algorithm adds to the set of edges to be added to . After the line , and become the same connected component which establishes the first condition of the invariant.
Let be a connected component in after update . We claim that no edge in crosses the . Suppose that there exists an edge crossing the cut. Then, since connected components of partition , there exists such that . But since , are both connected and , , are connected in . This means they belong to the same connected component of . Hence . This is a contradiction.
Termination: Firstly, we show terminates. By the proof of invariant maintenance, after the line , and become the same connected component, for some connected components of . Therefore, each edge addition to decreases the number of connected components of by 1. Hence, terminates.
By pseudocode, terminates if and only if has exactly one connected component. But then is connected. By invariant, is acyclic. Therefore, is a spanning tree of . By invariant, is a subset of edges of some minimum spanning tree. Since is a subset of edges of some minimum spanning tree and is a spanning tree of , must be a minimum spanning tree of .