18.04.2013 Views

The.Algorithm.Design.Manual.Springer-Verlag.1998

The.Algorithm.Design.Manual.Springer-Verlag.1998

The.Algorithm.Design.Manual.Springer-Verlag.1998

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Set Data Structures<br />

profitably thought of as a string, so see Sections and .<br />

When each subset has cardinality exactly two, they form edges in a graph whose vertices are the<br />

universal set. A system of subsets with no restrictions on the cardinality of its members is called a<br />

hypergraph. It often can be profitable to consider whether your problem has a graph-theoretical analogy,<br />

like connected components or shortest path in a hypergraph.<br />

Your primary alternatives for representing arbitrary systems of subsets are:<br />

● Bit vectors - If your universal set U contains n items, an n-bit vector or array can represent any<br />

subset . Bit i will be 1 if , otherwise bit i is 0. Since only one bit is used per element,<br />

bit vectors can be very space efficient for surprisingly large values of |U|. Element insertion and<br />

deletion simply flips the appropriate bit. Intersection and union are done by ``and-ing'' or ``or-ing''<br />

the bits together. <strong>The</strong> only real drawback of a bit vector is that for sparse subsets, it takes O(n)<br />

time to explicitly identify all members of S.<br />

● Containers or dictionaries - A subset can also be represented using a linked list, array, binary<br />

tree, or dictionary containing exactly the elements in the subset. No notion of a fixed universal set<br />

is needed for such a data structure. For sparse subsets, dictionaries can be more space and time<br />

efficient than bit vectors and easier to work with and program. For efficient union and intersection<br />

operations, it pays to keep the elements in each subset sorted, so a linear-time traversal through<br />

both subsets identifies all duplicates.<br />

In many applications, the subsets are all pairwise disjoint, meaning that each element is in exactly one<br />

subset. For example, consider maintaining the connected components of a graph or the party affiliations<br />

of politicians. Each vertex/hack is in exactly one component/party. Such a system of subsets is called a<br />

set partition. <strong>Algorithm</strong>s for constructing partitions of a given set are provided in Section .<br />

For data structures, the primary issue is maintaining a given set partition as things change over time,<br />

perhaps as edges are added or party members defect. <strong>The</strong> queries we are interested in include ``which set<br />

is a particular item in?'' and ``are two items in the same set?'' as we modify the set by (1) changing one<br />

item, (2) merging or unioning two sets, or (3) breaking a set apart. Your primary options are:<br />

● Dictionary with subset attribute - If each item in a binary tree has associated a field recording the<br />

name of the subset it is in, set identification queries and single element modifications can be<br />

performed in the time it takes to search in the dictionary, typically . However, operations<br />

like performing the union of two subsets take time proportional to (at least) the sizes of the<br />

subsets, since each element must have its name changed. <strong>The</strong> need to perform such union<br />

operations quickly is the motivation for the ...<br />

● Union-Find Data Structure - Suppose we represent a subset using a rooted tree, where each node<br />

points to its parent instead of its children. Further, let the name of the subset be the name of the<br />

item that is the root. Finding out which subset we are in is simple, for we keep traversing up the<br />

file:///E|/BOOK/BOOK3/NODE133.HTM (2 of 4) [19/1/2003 1:30:10]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!