<< Chapter < Page Chapter >> Page >

Disjoint-set forests

We now turn to disjoint-set forests, a data structure where each set is represented by a tree data structure where each node holds a reference to its parent node. Disjoint-set forests were first described by Bernard A. Galler and Michael J. Fisher in 1964, although their precise analysis took years.

In a disjoint-set forest, the representative of each set is the root of that set's tree. Find simply follows parent nodes until it reaches the root. Union combines two trees into one by attaching the root of one to the root of the other. One way of implementing these might be:

function MakeSet(x)

x.parent := x

function Find(x)

if x.parent == x

return x

else

return Find(x.parent)

function Union(x, y)

xRoot := Find(x)

yRoot := Find(y)

xRoot.parent := yRoot

In this naive form, this approach is no better than the linked-list approach, because the tree it creates can be highly unbalanced, but it can be enhanced in two ways.

The first way, called union by rank, is to always attach the smaller tree to the root of the larger tree, rather than vice versa. To evaluate which tree is larger, we use a simple heuristic called rank: one-element trees have a rank of zero, and whenever two trees of the same rank are unioned together, the result has one greater rank. Just applying this technique alone yields an amortized running-time of O(logn) per MakeSet, Union, or Find operation. Here are the improved MakeSet and Union:

function MakeSet(x)

x.parent := x

x.rank := 0

function Union(x, y)

xRoot := Find(x)

yRoot := Find(y)

if xRoot.rank>yRoot.rank

yRoot.parent := xRoot

else if xRoot.rank<yRoot.rank

xRoot.parent := yRoot

else if xRoot != yRoot

yRoot.parent := xRoot

xRoot.rank := xRoot.rank + 1

The second improvement, called path compression, is a way of flattening the structure of the tree whenever we use Find on it. The idea is that each node we visit on our way to a root node may as well be attached directly to the root node; they all share the same representative. To effect this, we make one traversal up to the root node, to find out what it is, and then make another traversal, making this root node the immediate parent of all nodes along the path. The resulting tree is much flatter, speeding up future operations not only on these elements but on those referencing them, directly or indirectly. Here is the improved Find:

function Find(x)

if x.parent == x

return x

else

x.parent := Find(x.parent)

return x.parent

These two techniques complement each other; applied together, the amortized time per operation is only O(α(n)), where α(n) is the inverse of the function f(n) = A(n,n), and A is the extremely quickly-growing Ackermann function. Since α(n) is its inverse, it's less than 5 for all remotely practical values of n. Thus, the amortized running time per operation is effectively a small constant.

In fact, we can't get better than this: Fredman and Saks showed in 1989 that Ω(α(n)) words must be accessed by any disjoint-set data structure per operation on average.

Applications

Disjoint-set data structures arise naturally in many applications, particularly where some kind of partitioning or equivalence relation is involved, and this section discusses some of them.

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Data structures and algorithms. OpenStax CNX. Jul 29, 2009 Download for free at http://cnx.org/content/col10765/1.1
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Data structures and algorithms' conversation and receive update notifications?

Ask