Gabrijel Boduljak

Theory of the Minimum Spanning Tree Problem [Part 2]

📅May 24, 2021 | ☕️25 minutes read

In this blog post, I will expand the theory developed in the Theory of the Minimum Spanning Tree Problem [Part 1]. I will focus mostly on the relationships between cuts and cycles which turns out to be a powerful way to analyse and even develop MST algorithms. We will analyse a perhaps (in)famous minimum spanning tree algorithm using the theory developed in the post. The last section is about the analysis of the oldest known minimum spanning tree algorithm.

Recently, I have started reading the Algorithms Illuminated Part 3, which is one of DIY books written by Tim Roughgarden, inspired by online courses that are currently running on the Coursera. However, the optional theory problems about minimum spanning trees were quite challenging and I felt the need to explore theory a bit further. For example, Theorem 10 about the similarity of minimum spanning trees is one of them. Both (in)famous minimum spanning tree algorithms mentioned are one of the optional theory problems.

The post is divided in three sections. In the first section, we will explore the relationship between cuts, connectedness and cycles in graphs. We will apply the results developed to prove existence of ‘interesting’ sequence of minimum spanning trees. The second section is about cuts, connectedness and cycles and their application to the analysis of the (in)famous minimum spanning tree algorithm whose correctness is not entirely obvious, in a way fascinating. The algorithm we will analyse constructs the minimum spanning tree by avoiding to make a decision about which edge to include, which is something I find quite remarkable. The last section is devoted to the first (oldest) minimum spanning tree known. We will analyse it using the theoretical results we develop for analysis in sections before.

Relationship between cuts, connectedness, cycles and sequences of MSTs

To begin, we recall some key ideas from the previous post.

Definition 1: a spanning subset and a spanning tree

Let G=(V,E)G=(V,E) be an undirected connected graph having the vertex set VV and edge set EE. Then a subgraph G=(V,A)G' = (V,A) where AEA \subseteq E is a spanning subgraph of the graph G if the graph G=(V,A)G'=(V, A) is connected. If GG' is a tree, we say GG' is a spanning tree of the graph G.

Definition 3: a graph cut, crossing edge

Let G=(V,E)G=(V,E) be a graph having the vertex set VV and edge set EE. A cut (S,VS)(S, V-S) is a partition of VV. An edge (u,v)(u,v) crosses the cut if uS,v(VS)u \in S, v \in (V-S).

Theorem 6: The fundamental cut property of the minimum spanning tree problem

Claim: Let G=(V,E)G=(V,E) be an undirected connected graph having the vertex set VV and edge set EE. Let w:ERw : E \to \mathbb{R} be an edge cost function. Let (S,VS)(S,V-S) be a cut of the graph GG. Let AA be the set of edges which belong to some minimum spanning tree and assume no edge in AA crosses the cut (S,VS)(S, V-S). Let (u,v)(u,v) be the edge crossing the cut (S,VS)(S, V-S) whose cost ww is minimal amongst edges crossing the cut (S,VS)(S, V-S). Then an edge (u,v)(u,v) belongs to some minimum spanning tree of GG.

However, for the purposes of this post, we need stronger insight into what happens when an edge of a minimum spanning tree is removed.

In this post, we will need a bit more results about the relationship between cuts, cycles and connectivity. Lemma names and proof ideas are taken from Tim Roughgarden’s above mentioned book.

The following lemma gives the characterisation of connectivity of the graph with respect to cuts.

Lemma 1: Empty cut lemma

Claim Let G=(V,E)G=(V,E) be an undirected graph having the vertex set VV and edge set EE. Then GG is not connected if and only if there exists a cut (A,B)(A,B) such that no edge in EE crosses the cut (A,B)(A,B).

Proof:

(\longrightarrow) Suppose that GG is not connected. Then there exist a,bVa, b \in V such that there is no path aba \rightsquigarrow b. Define A={xV,x reachable from a}A = \{ x \in V, \text{x reachable from a} \} and define B={xV,x not reachable from a}B = \{ x \in V, \text{x not reachable from a} \}. We claim (A,B)(A,B) is a graph cut. Since a,ba, b are disconnected in GG, AB=A \cap B = \emptyset. Clearly, aA,bBa \in A, b \in B so A,BA, B are nonempty. Also, V=ABV = A \cup B. Therefore, (A,B)(A,B) is indeed a cut. It remains to show no edge in EE crosses the cut (A,B)(A,B). Suppose there were such an edge, say (u,v)E,uA,vB(u,v) \in E, u \in A, v \in B. By definition of AA, there exists a path aua \rightsquigarrow u. By definition of BB, there exists no path vav \rightsquigarrow a. But auva \rightsquigarrow u \rightarrow v is a path ava \rightsquigarrow v, so vAv \in A. This is a contradiction. Therefore, no edge crosses the cut (A,B)(A,B).

(\longleftarrow) Suppose that (A,B)(A,B) is a cut such that no edge crosses it. Then pick aAa \in A, bBb \in B. We claim there is no path aba \rightsquigarrow b. Suppose there exists such a path. Since no edge crosses the cut (A,B)(A,B), every edge in the path has both endpoints in AAm, since aAa \in A. But then, it must be bAb \in A. This is a contradiction, because bBb \in B, and AB=A \cap B = \emptyset. Therefore, a,ba, b are disconnected in GG. \blacksquare

In many proofs related to minimum spanning trees, we will perform the ‘exchange argument’. This means we would like to replace some edge with a different edge and we often need to ensure that the replacement will not introduce a cycle. Although obvious, the following lemma is a very convenient theoretical tool to justify the ‘exchange arguments’ we will perform in many following proofs.

Lemma 2: Double crossing lemma

Claim: Let G=(V,E)G=(V,E) be an undirected graph having the vertex set VV and edge set EE. Suppose that the cycle CEC \subseteq E has an edge ee crossing the cut (A,B)(A,B). Then there exists another edge, vC,vev \in C, v \neq e crossing the cut (A,B)(A,B).

Remark: By drawing a sketch, it is easy to convince yourself this is obviously true.

Proof: (By contradiction)

Suppose that there exists no edge vv, vC,vev \in C, v \neq e crossing the cut (A,B)(A,B). Let e=(u,v)e = (u, v) be the only edge crossing the cut (A,B)(A,B). Suppose uA,vBu \in A, v \in B. Then write the cycle CC as uvuu \rightarrow v \rightsquigarrow u, where vuv \rightsquigarrow u is a path. By assumption, no edge in vuv \rightsquigarrow u crosses the cut (A,B)(A,B). Since vBv \in B, all endpoints of edges comprising the path vuv \rightsquigarrow u belong to BB. But then, uBu \in B. Therefore uABu \in A \cap B. This is a contradiction since AB=A \cap B = \emptyset.

Therefore, there exists another edge, vC,vev \in C, v \neq e crossing the cut (A,B)(A,B). \blacksquare

Lemma 3: Lonely edge corollary

Claim: Let G=(V,E)G=(V,E) be an undirected graph having the vertex set VV and edge set EE. Let (A,B)(A,B) be a cut of GG. If eEe \in E is the unique edge crossing the cut (A,B)(A,B), ee cannot participate in any cycle of GG.

Proof: (By contradiction).

Suppose ee does participate in a cycle CEC \subseteq E. By Double crossing lemma, there exists another edge, vC,vev \in C, v \neq e crossing the cut (A,B)(A,B). This is a contradiction, since ee was the unique edge crossing the cut (A,B)(A,B). \blacksquare

Theorem 9: Disconnecting a minimum spanning tree

Claim: Let G=(V,E)G=(V,E) be an undirected connected graph having the vertex set VV and edge set EE. Let w:ERw : E \to \mathbb{R} be an edge cost function. Let TT be a minimum spanning tree of GG. Let e=(u,v)e=(u,v) be an edge of the minimum spanning tree TT. Then T{e}T - \{ e \} is disconnected into two components TU={xV, x reachable from u in T - {e}}T_U = \{ x \in V, \text{ x reachable from u in T - \{e\}} \}, TV={xV, x reachable from v in T - {e}}T_V = \{ x \in V, \text{ x reachable from v in T - \{e\}} \}. Moreover, both (TU,T{e})(T_U, T -\{ e\}) and (TV,T{e})(T_V, T -\{ e\}) are subtrees of TT, and (TU,TV)(T_U, T_V) is a cut. In addition to everything above, no edge in T{e}T - \{e\} crosses the cut (TU,TV)(T_U, T_V).

Proof: Since TT is a tree, every path in TT is unique, ee is the unique path from uu to vv. If there was another path uvu \rightsquigarrow v, entirely in T{e}T - \{ e \}, then TT would have a cycle uvuu \rightsquigarrow v \rightarrow u. Therefore u,vu, v are disconnected in T{e}T - \{ e \}.

Clearly, uu is reachable from uu in T{e}T - \{ e \} and vv is reachable from vv in T{e}T - \{ e \}, so uTUu \in T_U, vTVv \in T_V.

Now we will show that V=TUTVV= T_U \cup T_V. Let aVa \in V. Since TT is spanning, there exists a path ava \rightsquigarrow v in TT. Either ava \rightsquigarrow v contains e or it does not.

If ava \rightsquigarrow v does contain ee, then ava \rightsquigarrow v = auva \rightsquigarrow u \rightarrow v. Since all edges of aua \rightsquigarrow u belong to T{e}T - \{ e \}, aa is reachable from uu in T{e}T - \{ e \}. Hence aUa \in U. If ava \rightsquigarrow v does not contain ee, then all edges of ava \rightsquigarrow v belong to T{e}T - \{ e \}, so aa is reachable from vv. Hence aVa \in V.

Therefore, VTUTVV \subseteq T_U \cup T_V. Trivially, TUTVVT_U \cup T_V \subseteq V so it must be V=TUTVV= T_U \cup T_V.

We claim TUTV=T_U \cap T_V = \emptyset. Assume not. Then there exists tTUt \in T_U and tTVt \in T_V. So there exists a path utu \rightsquigarrow t, tvt \rightsquigarrow v in T{e}T - \{ e\}. So uu and vv are connected in T{e}T - \{ e \}. This is a contradiction. Hence TUTV=T_U \cap T_V = \emptyset.

Therefore (TU,TV)(T_U, T_V) is a cut.

Since (TU,T{e})(T_U, T -\{ e\}) and (TV,T{e})(T_V, T -\{ e\}) are subgraphs of a tree TT, they must be subtrees. Otherwise, TT would not be a tree.

To conclude, we claim that no edge in T{e}T - \{e\} crosses the cut. Suppose there were such an edge (x,y)(x,y). Then xTUx \in T_U and yTVy \in T_V. So there are paths uxu \rightsquigarrow x, yvy \rightsquigarrow v in T{e}T - \{e\}. Hence uxyvu \rightsquigarrow x \rightarrow y \rightsquigarrow v connects u,vu, v so u,vu,v are connected in T{e}T - \{e\}. This is a contradiction. Therefore no edge in T{e}T - \{e\} crosses the cut (TU,TV)(T_U, T_V). \blacksquare

The following theorem, which is a converse of Theorem 6, will strengthen the relationships between cuts and crossing edges belonging to a minimum spanning tree.

Theorem 10: All edges comprising a minimum spanning tree are light

Claim: Let G=(V,E)G=(V,E) be an undirected connected graph having the vertex set VV and edge set EE. Let w:ERw : E \to \mathbb{R} be an edge cost function. Let TT be a minimum spanning tree of GG. Then for every edge ee in TT, there exists some cut (A,B)(A,B) of GG such that ee is a minimum weight edge crossing (A,B).

Proof: We will prove by contradiction.

Suppose that TT is a minimum spanning tree of GG and there exists an edge ee such that for every cut (A,B)(A,B), amongst edges of GG crossing (A,B)(A,B), ee is not amongst those of minimum weight.

Suppose e=(u,v)e = (u, v). Consider T{e}T - \{e \}. By Theorem 9, T{e}T - \{ e \} is disconnected into two components TU={xV, x reachable from u in T - {e}}T_U = \{ x \in V, \text{ x reachable from u in T - \{e\}} \}, TV={xV, x reachable from v in T - {e}}T_V = \{ x \in V, \text{ x reachable from v in T - \{e\}} \}. Moreover, both (TU,T{e})(T_U, T -\{ e\}) and (TV,T{e})(T_V, T -\{ e\}) are subtrees of TT, and (TU,TV)(T_U, T_V) is a cut. In addition to everything above, no edge in T{e}T - \{e\} crosses the cut (TU,TV)(T_U, T_V).

By assumption, ee is not a minimum weight edge crossing the cut (TU,TV)(T_U, T_V) in GG. Firstly, observe that ee cannot be the only edge crossing the cut (TU,TV)(T_U, T_V), because then ee would be an edge of minimum weight crossing (TU,TV)(T_U, T_V). So there exists another edge (x,y)E,(x,y)e(x, y) \in E, (x, y) \neq e with w(x,y)<w(e)w(x,y) < w(e).

Set T=(T{e}){(x,y)}T' = (T - \{e\}) \cup \{ (x,y) \}. We claim TT' is a spanning tree. Since no edge in T{e}T - \{e\} crosses the cut (TU,TV)(T_U, T_V), (x,y)(x,y) is the only edge in (T{e}){(x,y)}(T - \{e\}) \cup \{ (x,y) \} crossing the cut (TU,TV)(T_U, T_V). By Lonely edge corollary, TT' is acyclic.

Since (x,y)(x,y) is crossing the cut (TU,TV)(T_U, T_V), we may assume xTUx \in T_U, yTVy \in T_V.

Consider a,bVa,b \in V. If both a,bTUa, b \in T_U then there exist paths au,bua \rightsquigarrow u, b \rightsquigarrow u. If bb appears in path aua \rightsquigarrow u, then aa and bb are connected. Otherwise, auba \rightsquigarrow u \rightsquigarrow b is a path connecting aa to bb in TT'.

Similarly, if a,bTVa, b \in T_V, then a,ba, b are connected, by the same argument as above.

Suppose that aTU,bTVa \in T_U, b \in T_V. Since TUT_U and TVT_V are subtrees, there exist paths axa \rightsquigarrow x entirely in TUT_U and yby \rightsquigarrow b entirely in TVT_V. Then axyba \rightsquigarrow x \rightarrow y \rightsquigarrow b is a path in TT' connecting a,ba, b. Therefore, TT' is connected. Therefore, TT' is a spanning tree.

Since TT is a minimum spanning tree, w(T)w(T)w(T) \leq w(T').

Observe that w(T)=w(T)w(e)+w(x,y)<w(T)w(T') = w(T) - w(e) + w(x,y) < w(T). This is a contradiction to the statement above. Therefore, for every edge ee in TT, there exists some cut (A,B)(A,B) of GG such that ee is a minimum weight edge crossing (A,B). This completes the proof. \blacksquare

Perhaps surprisingly, all minimum spanning trees are ‘similar’. More precisely, for any two minimum spanning trees given, there is a sequence of edge swaps converting one tree into the other one. What is more surprising is the fact that each ‘intermediate’ step tree is also a minimum spanning tree.

Theorem 10: similarity of minimum spanning trees

Claim: Let G=(V,E)G=(V,E) be an undirected connected graph having the vertex set VV and edge set EE. Let w:ERw : E \to \mathbb{R} be an edge cost function. Suppose that TT and TT' are two minimum spanning trees of GG. Then there exists a sequence of minimum spanning trees T=T0,T1,...,Tn=TT=T_0, T_1, ..., T_n = T' where each consecutive pair of minimum spanning trees Ti,Ti+1T_i, T_{i+1} differ by only a single edge swap. In other words, there is a sequence of edge swaps converting TT into TT'.

Proof:

To prove this claim, we will firstly introduce a lemma used to set up a proof by induction.

Lemma 11: similarity of minimum spanning trees

Let G=(V,E)G=(V,E) be an undirected connected graph having the vertex set VV and edge set EE. Let w:ERw : E \to \mathbb{R} be an edge cost function. Suppose that TT and TT' are two distinct minimum spanning trees of GG, differing by nn edges, nNn \in \mathbb{N}. Then there exists a minimum spanning tree TT'' differing from TT by a single edge and differing from TT' by n1n-1 edges.

Proof:

We will construct such a minimum spanning tree TT''. Let e=(u,v)e=(u,v) be an edge in TT' which is not present in TT. By Theorem 10, there exists a cut (A,B)(A,B) such that ee is a minimum cost edge crossing the cut (A,B)(A,B).

Since TT is a minimum spanning tree, there exists a path uvu \rightsquigarrow v such that ee is not on that path. Consider T{e}T \cup \{ e \}. Then uvuu \rightsquigarrow v \rightarrow u is a cycle in T{e}T \cup \{ e \}. Since ee crosses the cut (A,B)(A,B), by Double Crossing lemma, there exists a different edge on the cycle uvuu \rightsquigarrow v \rightarrow u crossing the (A,B)(A,B), say e=(u,v)e'=(u', v'). That edge must be an edge of TT. Without loss of generality, the cycle is of the form uuvvuu \rightsquigarrow u' \rightarrow v' \rightsquigarrow v \rightarrow u. Since ee is a minimum cost edge crossing the cut (A,B)(A,B), we have w(e)w(e)w(e) \leq w(e').

Consider T{e}T - \{ e' \}. By Theorem 9, we have that T{e}T - \{ e' \} is disconnected into two components TU,TVT_U, T_V, TU={xV, x reachable from u’ in T - {e’}}T_U = \{ x \in V, \text{ x reachable from u' in T - \{e'\}} \}, TV={xV, x reachable from v’ in T - {e’}}T_V = \{ x \in V, \text{ x reachable from v' in T - \{e'\}} \}. Moreover, both (TU,T{e})(T_U, T -\{ e'\}) and (TV,T{e})(T_V, T -\{ e'\}) are subtrees of TT, and (TU,TV)(T_U, T_V) is a cut. In addition to everything above, no edge in T{e}T - \{e'\} crosses the cut (TU,TV)(T_U, T_V).

By the form of the cycle uuvvuu \rightsquigarrow u' \rightarrow v' \rightsquigarrow v \rightarrow u, we may assume uTU,vTVu \in T_U, v \in T_V. Clearly, ee crosses the cut (TU,TV)(T_U, T_V).

Now consider T=(T{e}){e}T'' = (T - \{e'\}) \cup \{ e \}. Since no edge in T{e}T - \{e'\} crosses the cut (TU,TV)(T_U, T_V), ee is the unique edge crossing the cut (TU,TV)(T_U, T_V) in TT''. By Lonely edge corollary, ee cannot form a cycle in T”. Since (TU,T{e})(T_U, T -\{ e'\}) and (TV,T{e})(T_V, T -\{ e'\}) are subtrees of TT, there is no cycle in TT''.

Consider a,bVa,b \in V. If both a,bTUa, b \in T_U then there exist paths au,bua \rightsquigarrow u', b \rightsquigarrow u'. If bb appears in path aua \rightsquigarrow u', then aa and bb are connected. Otherwise, auba \rightsquigarrow u' \rightsquigarrow b is a path connecting aa to bb in TT'.

Similarly, if a,bTVa, b \in T_V, then a,ba, b are connected, by the same argument as above.

Suppose that aTU,bTVa \in T_U, b \in T_V. Since TUT_U and TVT_V are subtrees, there exist paths aua \rightsquigarrow u entirely in TUT_U and vbv \rightsquigarrow b entirely in TVT_V. Then auvba \rightsquigarrow u \rightarrow v \rightsquigarrow b is a path in TT'' connecting a,ba, b. Therefore, TT'' is connected. Since TT'' is acyclic, TT'' is a spanning tree.

Now w(T)=w(T)w(e)+w(e)w(T'') = w(T) - w(e') + w(e). Since w(e)w(e)w(e) \leq w(e'), w(T)w(T)w(T'') \leq w(T). Since TT is a minimum spanning tree, w(T)w(T)w(T'') \geq w(T) so w(T)=w(T)w(T'') = w(T). Clearly, TT'' differs from TT by a single edge and includes an edge ee of TT'. Since TT differs from TT' by nn edges and TT'' consists of the same edge set as TT except ee, TT'' differs from TT by n1n-1 edges.

This completes the proof. \blacksquare

Now back to the original claim.

We prove the following statement by induction on the number of edges by which TT and TT' differ:

Let TT, TT' be two minimum spanning trees, differing by exactly nn edges. Then there exists a sequence of minimum spanning trees T=T0,T1,...,Tn+1=TT=T_0, T_1, ..., T_n+1 = T' where each consecutive pair of minimum spanning trees Ti,Ti+1T_i, T_{i+1} differ by only a single edge.

Basis: Suppose T=TT = T'. The claim holds vacously.

Induction Assumption: Let TT, TT' be two minimum spanning trees, differing by exactly nn edges. Then there exists a sequence of minimum spanning trees T=T0,T1,...,Tn+1=TT=T_0, T_1, ..., T_{n+1} = T' where each consecutive pair of minimum spanning trees Ti,Ti+1T_i, T_{i+1} differ by only a single edge.

Induction Step: Now suppose that TT differs from TT' by exactly n+1n + 1 edges. Set T0=TT_0 = T. By Lemma 11, there exists a minimum spanning tree T1T_1 such that T1T_1 differs from TT by a single edge and T1T_1 differs from TT' by nn edges. By Induction, there exists a sequence of minimum spanning trees T1,T2,Tn+1=TT_1, T_2, \ldots T_{n+1} = T' where each consecutive pair of minimum spanning trees Ti,Ti+1T_i, T_{i+1} differ by only a single edge. Now T0=T,T1,T2,Tn+1=TT_0 = T, T_1, T_2, \ldots T_{n+1}= T' is a sequence where each consecutive pair of minimum spanning trees Ti,Ti+1T_i, T_{i+1} differ by only a single edge. This establishes the induction step.

This completes the proof. \blacksquare

Two (less) famous minimum spanning tree algorithms

“The heaviest on a cycle” algorithm

Consider the following algorithm. The input is a connected undirected graph G=(V,E)G=(V,E) with edge costs, not necessarily distinct. The algorithm proceeds in iterations.

The algorithm:

  1. Initially, all edges in T=ET = E.
  2. If the (V,T)(V,T) is a acyclic, then the algorithm halts and returns the set TT.
  3. Otherwise, it picks an arbitrary cycle CC of the current graph and scans all edges of CC. Let (u,v)(u,v) be the heaviest edge on CC. The algorithm removes (u,v)(u,v) from TT and jumps to the step 2.

Theorem 12: “The heaviest on a cycle” is optimal.

Claim: Let G=(V,E)G=(V,E) be an undirected connected graph having the vertex set VV and edge set EE. Let w:ERw : E \to \mathbb{R} be an edge cost function. Then “The heaviest on cycle” computes the minimum spanning tree of GG.

Remark: Observe that this algorithm constructs a minimum spanning tree by avoiding to make a decision about which edge to include. This is quite interesting.

Proof:

We will show that for every edge the algorithm discards, there exists at least one minimum spanning tree which does not contain that edge.

We will state and prove the lemma formalising the idea above.

Lemma 13: Every edge removed by “The heaviest on a cycle” is safe to remove.

Claim: Let G=(V,E)G=(V,E) be an undirected connected graph having the vertex set VV and edge set EE. Let w:ERw : E \to \mathbb{R} be an edge cost function. Let (u,v)(u,v) be an edge discarded by “The heaviest on a cycle”. Then there exists a minimum spanning tree of GG which does not include (u,v)(u,v).

Proof: Let (u,v)(u, v) be an edge that is discarded by the algorithm. If all minimum spanning trees do not contain (u,v)(u,v), then trivially, there exists at least one minimum spanning tree which does not contain that edge. We may suppose that TT is a minimum spanning tree of GG which does contain an edge (u,v)(u,v). Since (u,v)(u,v) is discarded by the algorithm, there exists some cycle CC, of the form uvuu \rightarrow v \rightsquigarrow u in GG.

Consider TT. By Disconnecting the minimum spanning tree, T{(u,v)}T - \{ (u,v) \} is disconnected into two components TU={xV, x reachable from u in T - {(u,v)}}T_U = \{ x \in V, \text{ x reachable from u in T - \{(u,v)\}} \}, TV={xV, x reachable from v in T - {(u,v)}}T_V = \{ x \in V, \text{ x reachable from v in T - \{(u,v)\}} \}. Moreover, both (TU,T{(u,v)})(T_U, T -\{ (u,v) \}) and (TV,T{(u,v)})(T_V, T -\{ (u,v) \}) are subtrees of TT, and (TU,TV)(T_U, T_V) is a cut. In addition to everything above, no edge in T{(u,v)}T - \{(u,v)\} crosses the cut (TU,TV)(T_U, T_V).

By the Double Crossing Lemma, there is an edge (x,y)(u,v)(x,y) \neq (u,v) in vuv \rightsquigarrow u crossing the cut (TU,TV)(T_U, T_V). By the choice of the algorithm, the edge (u,v)(u,v) was the heaviest edge on the cycle CC, so we have c(x,y)c(u,v)c(x,y) \leq c(u,v).

Set T=(T{(u,v)}){(x,y)}T' = (T - \{(u,v)\}) \cup \{ (x,y) \}. We claim TT' is a spanning tree. Since no edge in T{(u,v)}T - \{(u,v)\} crosses the cut (TU,TV)(T_U, T_V), (x,y)(x,y) is the only edge in (T{(u,v)}){(x,y)}(T - \{(u,v)\}) \cup \{ (x,y) \} crossing the cut (TU,TV)(T_U, T_V). By Lonely edge corollary, TT' is acyclic.

Since (x,y)(x,y) is crossing the cut (TU,TV)(T_U, T_V), we may assume xTUx \in T_U, yTVy \in T_V.

Consider a,bVa,b \in V. If both a,bTUa, b \in T_U then there exist paths au,bua \rightsquigarrow u, b \rightsquigarrow u. If bb appears in path aua \rightsquigarrow u, then aa and bb are connected. Otherwise, auba \rightsquigarrow u \rightsquigarrow b is a path connecting aa to bb in TT'.

Similarly, if a,bTVa, b \in T_V, then a,ba, b are connected, by the same argument as above.

Suppose that aTU,bTVa \in T_U, b \in T_V. Since TUT_U and TVT_V are subtrees, there exist paths axa \rightsquigarrow x entirely in TUT_U and yby \rightsquigarrow b entirely in TVT_V. Then axyba \rightsquigarrow x \rightarrow y \rightsquigarrow b is a path in TT' connecting a,ba, b. Therefore, TT' is connected. Therefore, TT' is a spanning tree.

Since TT is a minimum spanning tree, w(T)w(T)w(T) \leq w(T'). Observe that w(T)=w(T)c(e)+c(x,y)w(T)w(T') = w(T) - c(e) + c(x,y) \leq w(T). But then, w(T)=w(T)w(T)=w(T') so TT' is a minimum spanning tree of GG, without an edge (u,v)(u,v) chosen by the algorithm. \blacksquare

We argue correctness of the algorithm above by splitting into two cases.

Suppose that the algorithm discards no edges. Since no edges were discarded, (V,T)(V,T) is acyclic. Since T=ET=E and GG is connected, (V,T)(V,T) is indeed a spanning tree. But then GG is a spanning tree, so it is a minimum spanning tree.

Now suppose that the algorithm discards edges {e1,e2,ek}\{ e_1, e_2, \ldots e_k \}, kNk \in \mathbb{N}. Applying the lemma above to each ei,1ike_i, 1 \leq i \leq k, we conclude that there exists at least one minimum spanning tree of GG not including any of {e1,e2,ek}\{ e_1, e_2, \ldots e_k \}. The algorithm terminates when (V,T)(V,T) is acyclic. Since the algorithm removed only edges on cycles of (V,T)(V,T), the graph (VT)(V_T) must be connected. Therefore, (V,T)(V,T) is a spanning tree. We claim it is a minimum spanning tree. Since there exists at least one minimum spanning tree of GG not including any of {e1,e2,ek}\{ e_1, e_2, \ldots e_k \}, (V,T)(V,T) must be such a spanning tree. \blacksquare

The oldest minimum spanning tree algorithm - Boruvka’s algorithm

The final section of this post is devoted to the oldest minimum spanning tree algorithm - Boruvka’s algorithm. Similarly to Kruskal’s algorithm, Boruvka’s algorithm initially considers all vertices to be separate connected components and starts with an empty set of edges. For every connected component, the algorithm repeatedly finds the cheapest edge connecting that component to some other component. After finding such an edge for each component, Boruvka includes each such an edge in the partial set of edges defining the connected components of the next iteration. Here is a great animation taken from Wikipedia.

Theoretical results shown above are enough to analyse the oldest known minimum spanning tree algorithm.

Boruvka’s algorithm

Here is a pseudocode of Boruvka’s algorithm\texttt{Boruvka's algorithm}.

boruvka pseudocode

Theorem 14: Boruvka’s algorithm is correct.

Claim: Let G=(V,E)G=(V,E) be an undirected connected graph having the vertex set VV and edge set EE. Let w:ERw : E \to \mathbb{R} be an edge cost function. Then Boruvka’s algorithm\texttt{Boruvka's algorithm} computes the minimum spanning tree of GG.

Proof: We will prove the following invariant.

Prior to each iteration of Boruvka’s algorithm\texttt{Boruvka's algorithm}, the following three conditions hold:

  • Every edge in TT belongs to some minimum spanning tree of GG.
  • (V,T)(V,T) is acyclic.
  • The set CC consists of all and only connected components of (V,T)(V,T). For every CkCC_k \in C, no edge in TT crosses the cut (Ck,VCk)(C_k, V - C_k).

Initialisation: Prior to the first iteration, TT is empty so the first two conditions hold vacously. Since TT is empty, the third condition holds.

Maintenance: Now consider an iteration of Boruvka’s algorithm\texttt{Boruvka's algorithm}.

Let CkC_k in CC. Clearly, (Ck,VCk)(C_k, V - C_k) is a cut of GG. By the innermost loop, the set SS consists only of edges crossing the cut (Ck,VCk)(C_k, V - C_k). Observe that an edge ee is the cheapest edge crossing the cut (Ck,VCk)(C_k, V - C_k). Let e=(u,v)e = (u,v). Then uCku \in C_k, vCl,klv \in C_l, k \neq l. By the fundamental cut property, T{e}T \cup \{ e \} belongs to some minimum spanning tree of GG. Since no edge in TT crosses the cut (Ck,VCk)(C_k, V - C_k), ee is the only edge in T{e}T \cup \{ e \} crossing the cut (Ck,VCk)(C_k, V - C_k). By Lonely cut corollary, T{e}T \cup \{ e \} is acyclic. The algorithm adds ee to the set of edges TT' to be added to TT. After the line TTT \cup T', ClC_l and CkC_k become the same connected component which establishes the first condition of the invariant.

Let CnC_n be a connected component in CC after update T=TTT = T \cup T'. We claim that no edge in TT crosses the (Cn,VCn)(C_n, V - C_n). Suppose that there exists an edge (a,b)(a,b) crossing the cut. Then, since connected components of (V,T)(V,T) partition VV, there exists Cm,mnC_m, m \neq n such that bCmb \in C_m. But since (Cn,T)(C_n, T), (Cm,T)(C_m, T) are both connected and aCna \in C_n, bCmb \in C_m, a,ba, b are connected in (V,T)(V,T). This means they belong to the same connected component of (V,T)(V,T). Hence Cn=CmC_n = C_m. This is a contradiction.

Termination: Firstly, we show Boruvka’s algorithm\texttt{Boruvka's algorithm} terminates. By the proof of invariant maintenance, after the line TTT \cup T', ClC_l and CkC_k become the same connected component, for some connected components Cl,CkC_l, C_k of (V,T)(V,T). Therefore, each edge addition to TT decreases the number of connected components of (V,T)(V,T) by 1. Hence, Boruvka’s algorithm\texttt{Boruvka's algorithm} terminates.

By pseudocode, Boruvka’s algorithm\texttt{Boruvka's algorithm} terminates if and only if CC has exactly one connected component. But then (V,T)(V,T) is connected. By invariant, (V,T)(V,T) is acyclic. Therefore, (V,T)(V,T) is a spanning tree of GG. By invariant, TT is a subset of edges of some minimum spanning tree. Since TT is a subset of edges of some minimum spanning tree and (V,T)(V,T) is a spanning tree of GG, (V,T)(V,T) must be a minimum spanning tree of GG. \blacksquare


Hi👋! I am a final year Advanced Computer Science MSc student at 🎓University of Oxford. This is a place where I (will hopefully) write about topics I find interesting. I plan to write about my personal projects and learning experiences, as well as some random tech topics.
💻github | ✉️contact me | 📚my reading list | 👨‍💻my projects

Built with a lot of
© 2023