Chapter 6. Advanced Data Structures (Search Trees)
6. Elementary Data Structures 6-1<br />
Balanced Binary Search Trees [S, §11.2]<br />
Worst-case running time for BST is $\Theta(h) = \Theta(n)$.<br />
• Can be improved to $\Theta(\log n)$ if we can guarantee that $h = \Theta(\log n)$ (balanced BST).<br />
Idea: restructure tree if necessary (when insert or delete) to balance its height.<br />
• Methods: AVL tree, 2-3 tree, red-black tree, B tree, etc.<br />
Recall that a set $S$ has many different BST representations.<br />
Let $h(T)$ denote the height of a BST $T$, with subtrees $T_l$ and $T_r$, and assume that if $T$ is empty<br />
then $h(T) = -1$.<br />
Definition 1<br />
$T$ is height balanced of degree $k$, denoted HB($k$), if<br />
(1) $|h(T_l) - h(T_r)| \le k$, and<br />
(2) $T_l$ and $T_r$ are HB($k$).<br />
Definition 2<br />
$T$ is height balanced, denoted HB, if it is HB(1).<br />
Definition 3<br />
The balance factor of $T$ is defined as<br />
$b(T) = h(T_l) - h(T_r)$.<br />
• $T$ is HB $\Rightarrow$ $b(T)$ = –1, 0, or +1.<br />
$T$ is HB(0) $\Rightarrow$ $T$ is a CBT.<br />
[Figure: two example trees annotated with balance factors (the values -2, 0, 1, 0, 0, 0 appear at the nodes); one tree is labeled HB and the other HB(2).]<br />
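Definitions 1 to 3 can be checked mechanically. The following is a minimal sketch, assuming a hypothetical node type `struct hbnode` (not the node type used later in these notes) and the convention that an empty tree has height -1. It recomputes heights on every call, so it is meant to illustrate the definitions rather than to be efficient.

```c
#include <stdlib.h>

/* Hypothetical minimal node type for this sketch (no condition code). */
struct hbnode {
    int key;
    struct hbnode *left, *right;
};

/* Height with the convention h(empty) = -1. */
int height(const struct hbnode *t)
{
    if (t == NULL) return -1;
    int hl = height(t->left), hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Balance factor b(T) = h(T_l) - h(T_r); call only on a non-empty tree. */
int balance_factor(const struct hbnode *t)
{
    return height(t->left) - height(t->right);
}

/* Returns 1 if t is HB(k), 0 otherwise. An empty tree is HB(k). */
int is_hb(const struct hbnode *t, int k)
{
    if (t == NULL) return 1;
    if (abs(balance_factor(t)) > k) return 0;
    return is_hb(t->left, k) && is_hb(t->right, k);
}
```

For a chain of three nodes that only branches right, the root has balance factor -2, so the chain is HB(2) but not HB.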
c Cheng-Wen Wu, Lab for Reliable Computing (LaRC), EE, NTHU 00.5
AVL Tree [S, §11.2]<br />
Definition 4 (Adelson-Velskii and Landis)<br />
An empty BST is an AVL tree. If $T$ is a nonempty BST with $T_l$ and $T_r$ as its left and right subtrees,<br />
then $T$ is an AVL tree iff<br />
1) $T_l$ and $T_r$ are AVL trees, and<br />
2) $|h(T_l) - h(T_r)| \le 1$.<br />
HB BST $\Leftrightarrow$ AVL tree<br />
Theorem 1<br />
The height of an AVL tree $T$ with $n$ nodes satisfies<br />
$h(T) \le 1.44 \log_2 n$.<br />
That is, $h(T) = O(\log n)$.<br />
Proof: Let $N_h$ be the minimum number of nodes in an AVL tree $T$ with height $h$. Then<br />
$N_0 = 1$,<br />
$N_1 = 2$, and<br />
$N_h = N_{h-1} + N_{h-2} + 1$, $h \ge 2$.<br />
It can be verified easily (by induction) that, with the Fibonacci numbers $F_1 = F_2 = 1$,<br />
$N_h = F_{h+3} - 1 = \frac{1}{\sqrt{5}} \left[ \left( \frac{1+\sqrt{5}}{2} \right)^{h+3} - \left( \frac{1-\sqrt{5}}{2} \right)^{h+3} \right] - 1 \approx \frac{1}{\sqrt{5}} \left( \frac{1+\sqrt{5}}{2} \right)^{h+3}$ for large $h$.<br />
Therefore, since $n \ge N_h$,<br />
$\log(\sqrt{5}\, n) \gtrsim (h+3) \log \frac{1+\sqrt{5}}{2}$, so<br />
$h \lesssim \frac{\log n + \log \sqrt{5}}{\log 1.618} - 3 \approx 1.44 \log_2 n - 1.33 < 1.44 \log_2 n$.<br />
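The recurrence for the minimum node count can be tabulated directly. The sketch below assumes the Fibonacci indexing $F_1 = F_2 = 1$, under which the recurrence with $N_0 = 1$, $N_1 = 2$ gives $N_h = F_{h+3} - 1$; the function names are illustrative only. `max_avl_height` inverts the table to give the largest height an AVL tree with a given number of nodes can have.

```c
/* Minimum number of nodes in an AVL tree of height h:
   N_0 = 1, N_1 = 2, N_h = N_{h-1} + N_{h-2} + 1. */
long min_avl_nodes(int h)
{
    long prev = 1, cur = 2;            /* N_0, N_1 */
    if (h <= 0) return prev;
    for (int i = 2; i <= h; i++) {
        long next = prev + cur + 1;
        prev = cur;
        cur = next;
    }
    return cur;
}

/* Fibonacci numbers with F_0 = 0 and F_1 = F_2 = 1. */
long fib(int k)
{
    long a = 0, b = 1;                 /* F_0, F_1 */
    for (int i = 0; i < k; i++) {
        long c = a + b;
        a = b;
        b = c;
    }
    return a;
}

/* Largest height h such that an AVL tree with n nodes can have height h,
   i.e. the largest h with N_h <= n. Returns -1 for an empty tree. */
int max_avl_height(long n)
{
    if (n <= 0) return -1;
    int h = 0;
    while (min_avl_nodes(h + 1) <= n) h++;
    return h;
}
```

For one million nodes this gives a worst-case height of 27, in line with $1.44 \log_2 10^6 \approx 28.7$.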
As an example, we insert 10, 6, 2, 3, 11, 7, 9, 8, 1, 5, 4 into an empty BST and keep it HB. The<br />
final tree is an AVL tree:<br />
[Figure: the tree after each insertion. After 10 and 6: HB. After 2: not HB; a right rotation makes 6 the root with children 2 and 10 (HB). After 3, 11, 7, 9: HB. After 8: not HB; rebalancing makes 8 the parent of 7 and 9 (HB). After 1, 5, 4: the final HB tree has root 6, left subtree rooted at 2 (children 1 and 4, where 4 has children 3 and 5) and right subtree rooted at 10 (children 8 and 11, where 8 has children 7 and 9).]<br />
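The example sequence can be replayed with a short recursive insertion routine. This is a sketch under stated assumptions, not the stack-based method of these notes: the hypothetical node type `AVL` caches the subtree height instead of a condition code, rebalancing happens as the recursion unwinds, and duplicate keys are ignored.

```c
#include <stdlib.h>

/* Hypothetical node type for this sketch: caches the subtree height
   instead of the condition code used in the notes. */
typedef struct avl {
    int key, height;
    struct avl *left, *right;
} AVL;

static int h(const AVL *t) { return t ? t->height : -1; }  /* h(empty) = -1 */

static void fix_height(AVL *t)
{
    int hl = h(t->left), hr = h(t->right);
    t->height = 1 + (hl > hr ? hl : hr);
}

/* Single right rotation: the left child becomes the subtree root. */
static AVL *rotate_right(AVL *t)
{
    AVL *l = t->left;
    t->left = l->right;
    l->right = t;
    fix_height(t);
    fix_height(l);
    return l;
}

/* Single left rotation: the right child becomes the subtree root. */
static AVL *rotate_left(AVL *t)
{
    AVL *r = t->right;
    t->right = r->left;
    r->left = t;
    fix_height(t);
    fix_height(r);
    return r;
}

/* Insert key and rebalance on the way back up; duplicates are ignored. */
AVL *avl_insert(AVL *t, int key)
{
    if (t == NULL) {
        AVL *n = malloc(sizeof *n);
        if (n == NULL) abort();
        n->key = key; n->height = 0; n->left = n->right = NULL;
        return n;
    }
    if (key < t->key)      t->left  = avl_insert(t->left, key);
    else if (key > t->key) t->right = avl_insert(t->right, key);
    else return t;

    fix_height(t);
    int b = h(t->left) - h(t->right);       /* balance factor */
    if (b > 1) {                            /* left side too tall */
        if (key > t->left->key)             /* left-right case: double rotation */
            t->left = rotate_left(t->left);
        return rotate_right(t);
    }
    if (b < -1) {                           /* right side too tall */
        if (key < t->right->key)            /* right-left case: double rotation */
            t->right = rotate_right(t->right);
        return rotate_left(t);
    }
    return t;
}

/* Builds the tree for the example sequence 10, 6, 2, 3, 11, 7, 9, 8, 1, 5, 4. */
AVL *build_example(void)
{
    static const int keys[] = {10, 6, 2, 3, 11, 7, 9, 8, 1, 5, 4};
    AVL *root = NULL;
    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
        root = avl_insert(root, keys[i]);
    return root;
}
```

`build_example()` returns a tree of height 3 with root 6, whose children are 2 and 10. The sketch never frees its nodes, which is acceptable for a short demonstration only.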
Definition 5<br />
An AVL tree $T$ is balanced if $h(T_l) = h(T_r)$. $T$ is right heavy if $h(T_r) = h(T_l) + 1$. $T$ is left heavy<br />
if $h(T_l) = h(T_r) + 1$.<br />
Condition codes (CC): '=' balanced, '\' right heavy, '/' left heavy.<br />
typedef struct node{<br />
char CC; /* condition code */<br />
int key;<br />
struct node *left, *right;<br />
} NODETYPE, *NODEPTR;<br />
On the insertion path of an element $x$ from root to leaf:<br />
[Figure: cases for a node on the insertion path. Node balanced ('='): no change in AVL property, but its CC will change. Node heavy on one side and $x$ inserted on the other side: no change. Node left heavy and $x$ inserted on the left, or node right heavy and $x$ inserted on the right: needs balancing.]<br />
Exercise 1<br />
Give the CC of each of the nodes in the following AVL tree.<br />
[Figure: AVL tree with keys 8, 1, 3, 6, 11, 21, 23, 2, 5, 7, 10, 16, 22, 25, 4, 9, 13, 18, 24, 12, 14, 17, 19 (tree layout not reproduced).]<br />
□<br />
Balancing AVL Tree<br />
Assume before insertion of $x$, tree $T$ was HB(1).<br />
(1) Single left rotation (SLR): to change old CCs on the path, we go from leaf up. We find CC =<br />
right-heavy, then rotate at that point.<br />
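[Figure: SLR. Before: A is the subtree root with left subtree T1 and right child B; B has subtrees T2 and T3, and x is inserted in T3. After: B is the subtree root with left child A (subtrees T1, T2) and right subtree T3 (containing x).]<br />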
(2) Single right rotation (SRR): to change old CCs on the path, we go from leaf up. We find CC =<br />
left-heavy, then rotate at that point.<br />
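[Figure: SRR. Before: B is the subtree root with left child A and right subtree T3; A has subtrees T1 and T2, and x is inserted in T1. After: A is the subtree root with left subtree T1 (containing x) and right child B (subtrees T2, T3).]<br />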
Rules: (a) Go up from left till left-heavy found. (b) Go up from right till right-heavy found.<br />
(3) Double right rotation (DRR)<br />
[Figure: DRR on nodes A, B, C with subtrees T1, T2, T3, T4; x is inserted in T2 or T3. Height remains the same.]<br />
(4) Double left rotation (DLR)<br />
[Figure: DLR on nodes A, B, C with subtrees T1, T2, T3, T4; x is inserted in T2 or T3. Afterwards B is the subtree root with children A and C. Height remains the same.]<br />
• Must check that BST properties are maintained.<br />
Stack of CCs: We maintain a stack of CCs for each insert operation.<br />
Exercise 2<br />
Insert 20 into the AVL tree of the previous example. Set up the stack when the inserted item is<br />
being pushed along the inserting path.<br />
□<br />
Going from the insertion point (leaf) up to the root, let current be the node currently being<br />
scanned, with child and gchild.<br />
/* hc(p) = CC (condition code) of node p; the stack holds the insertion path */<br />
child = node inserted; gchild = NULL; /* initial condition */<br />
current = pop(stack);<br />
while(hc(current) == '=')<br />
{ if(child == rchild(current)) hc(current) = '\\'; /* from right */<br />
else hc(current) = '/'; /* coming from left */<br />
if(stack is empty) return; /* reached the root: the tree grew by one level */<br />
gchild = child;<br />
child = current;<br />
current = pop(stack);<br />
}<br />
if((hc(current) == '/' && child == rchild(current)) ||<br />
(hc(current) == '\\' && child == lchild(current)))<br />
{ hc(current) = '='; return; }<br />
else rebalance the tree @ current; /* SLR, SRR, DRR or DLR, chosen using child and gchild */<br />
AVL Tree Deletion<br />
Assume the deleted node is a leaf.<br />
• Use a stack.<br />
I.C.: child is the deleted node, and current is its parent.<br />
We use a bottom-up strategy, repeating steps 1-3 until a return or until the stack is empty:<br />
1. if(hc(current) == '=') /* height of current is unchanged */<br />
{ if(child == lchild(current)) hc(current) = '\\';<br />
else hc(current) = '/';<br />
return;<br />
}<br />
2. if((hc(current) == '/' && child == lchild(current)) ||<br />
(hc(current) == '\\' && child == rchild(current)))<br />
{ hc(current) = '='; /* current became shorter: continue upward */<br />
child = current;<br />
current = pop(stack);<br />
}<br />
3. else<br />
{ rebalance @ current;<br />
child = new root of the rebalanced subtree; /* the rotation replaces current as subtree root */<br />
current = pop(stack);<br />
}<br />
Exercise 3<br />
(a) We showed in Chapter 4 that every comparison based algorithm to sort $n$ elements must take<br />
$\Omega(n \log n)$ time in the worst case. What implication does this result have on the complexity of<br />
initializing an AVL tree of $n$ nodes?<br />
(b) Write an algorithm to list the nodes of an AVL tree $T$ in ascending order of key. Can this be<br />
done in $O(n)$ time if $T$ has $n$ nodes?<br />
(c) How do you combine two AVL trees into a bigger new AVL tree? Can this be done in linear<br />
time?<br />
□<br />
2-3 Tree<br />
Definition 6<br />
A 2-3 tree is a search tree that is either empty or satisfies the following properties:<br />
1. Each internal node is either a 2-node (with 2 children) or a 3-node (with 3 children). A<br />
2-node has one element; a 3-node has two elements.<br />
2. All elements in the 2-3 subtree with root lchild have key less than lkey.<br />
3. All elements in the 2-3 subtree with root mchild have key greater than lkey (and less than<br />
rkey if it is a 3-node).<br />
4. If it is a 3-node, then all elements in the 2-3 subtree with root rchild have key greater than<br />
rkey.<br />
5. All external nodes are at the same level, i.e. $T$ is HB.<br />
6. All the elements appear at the leaves (external nodes).<br />
typedef struct NODE *NODEPTR;<br />
struct NODE {<br />
int lkey, rkey;<br />
NODEPTR lchild, mchild, rchild;<br />
};<br />
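/* Returns the node containing x, or NULL if x is not in the tree. */<br />
NODEPTR search(NODEPTR T, int x)<br />
{<br />
while(T)<br />
switch(compare(x,T)) /* compare is left as Exercise 4 */<br />
{<br />
case 1: T = T->lchild; break; /* x < lkey */<br />
case 2: T = T->mchild; break; /* lkey < x < rkey */<br />
case 3: T = T->rchild; break; /* x > rkey */<br />
case 4: return T; /* x equals a key in T */<br />
}<br />
return NULL;<br />
}<br />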
[Figure: example 2-3 trees; a '+' appears in nodes that hold a single key. Node labels: (40 +), (11 +), (10 20), (80 +), 7, 9, 15; leaf rows: 5, 10, 20, 40, 80 and 2, 7, 9, 11, 15.]<br />
Exercise 4<br />
Write the compare function used in the above procedure.<br />
□<br />
Exercise 5<br />
(a) Show that a 2-3 tree with $n$ elements has height $h$ which satisfies<br />
$\log_3 n \le h \le \log_2 n$.<br />
(b) Show that searching can be done in $O(\log n)$ time.<br />
□<br />
Insertion:<br />
/* cf. HSA, Sec. 10.3.3 */<br />
void insert(NODEPTR *T, int y)<br />
{ NODEPTR p, q;<br />
if(!(*T)) new_root(T, y, NULL); /* T was empty */<br />
else<br />
{ p = find_node(*T, y); /* Is y in T? Returns NULL if it is. */<br />
if(!p) /* y is in T */<br />
{ fprintf(stderr, "The key is currently in T.\n");<br />
exit(1);<br />
} /* else p is the node where y is to be inserted; the path to p is assumed to be on the stack */<br />
q = NULL; /* q will be the newly created node after split */<br />
for(;;)<br />
if(p->rkey == INFINITY) /* p is a 2-node */<br />
{ put_in(&p, y, q); /* insert y into p and place q */<br />
break; /* immediately to the right of y */<br />
}<br />
else /* p is a 3-node */<br />
{ split(p, &y, &q);<br />
if(p == *T) /* split the root; h = h+1 */<br />
{ new_root(T, y, q); /* create new root with children p and q */<br />
break;<br />
}<br />
else p = pop(stack); /* follow the path back up */<br />
}<br />
}<br />
}<br />
split(p, yp, qp) NODEPTR p; int *yp; NODEPTR *qp;<br />
{ take node p with 2 keys in it; let y = *yp;<br />
create a new node q with q.lkey = max(p.rkey, y);<br />
and q.rkey = INFINITY;<br />
temp = median(p.lkey, p.rkey, y);<br />
p.lkey = min(p.lkey, y);<br />
p.rkey = INFINITY; /* p is now a 2-node */<br />
y = temp; /* the median key sent upward */<br />
update the pointers; return y and q through yp and qp;<br />
}<br />
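The key redistribution inside split can be written out on its own. The sketch below covers only the three keys involved (the two keys of the full node and the incoming key y) and leaves the pointer updates aside; the type `struct split_keys` and the function `split_three` are hypothetical names introduced for this illustration, and the three keys are assumed to be distinct.

```c
/* Result of splitting a full node: the smallest key stays in p,
   the median is sent up to the parent, the largest moves to the new node q. */
struct split_keys {
    int stay;    /* new p.lkey */
    int up;      /* median, passed upward as the new y */
    int moved;   /* q.lkey */
};

/* Distributes lkey, rkey (the keys of a 3-node) and the incoming key y. */
struct split_keys split_three(int lkey, int rkey, int y)
{
    struct split_keys s;
    int lo = lkey < rkey ? lkey : rkey;
    int hi = lkey < rkey ? rkey : lkey;
    if (y < lo)      { s.stay = y;  s.up = lo; s.moved = hi; }
    else if (y > hi) { s.stay = lo; s.up = hi; s.moved = y;  }
    else             { s.stay = lo; s.up = y;  s.moved = hi; }
    return s;
}
```

For example, splitting a node holding 10 and 20 with incoming key 15 leaves 10 in p, moves 20 to q, and sends 15 upward.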
[Figure: a 2-3 tree before and after two insertions ('+' appears in nodes holding a single key). Initially the root is (40 +) with children (10 20) and (80 +), over the leaves 5, 10, 20, 40, 80. After insert(T, 70) the right child becomes (70 80), over the leaves 5, 10, 20, 40, 70, 80. After insert(T, 30) the root becomes (20 40) with children (10 +), (30 +), (70 80), over the leaves 5, 10, 20, 30, 40, 70, 80.]<br />
Exercise 6<br />
What is the result of a subsequent insert(T, 60)?<br />
□<br />
Deletion:<br />
modify p as necessary to reflect status after element has been deleted;<br />
while(p has 0 element && p is not root)<br />
{ r = parent of p;<br />
q = left or right sibling of p (as appropriate);<br />
if(q is a 3-node) rotate;<br />
else combine;<br />
p = r;<br />
}<br />
if(p has 0 element) /* then p must be the root */<br />
{ left child of p becomes the new root;<br />
delete p;<br />
}<br />
Exercise 7<br />
(a) We showed in Chapter 4 that every comparison based algorithm to sort $n$ elements must take<br />
$\Omega(n \log n)$ time in the worst case. What implication does this result have on the complexity of<br />
initializing a 2-3 tree of $n$ nodes?<br />
(b) Write an algorithm to list the nodes of a 2-3 tree $T$ in ascending order of key. Can this be<br />
done in $O(n)$ time if $T$ has $n$ nodes?<br />
(c) How do you combine two 2-3 trees into a bigger new 2-3 tree? Can this be done in linear time?<br />
□<br />
2-3-4 Tree<br />
A 2-3-4 tree extends a 2-3 tree so that 4-nodes are also permitted (4-nodes may have up to 4<br />
children).<br />
typedef struct NODE *NODEPTR;<br />
struct NODE {<br />
int lkey, mkey, rkey;<br />
NODEPTR lchild, lmchild, rmchild, rchild;<br />
};<br />
[Figure: example 2-3-4 tree. The root is (50); its children are (10) and (70 80); the leaves are (5 7 9), (30 40), (60), (75), (85 90 92).]<br />
• The height of a 2-3-4 tree with $n$ elements is between $\log_4(n+1)$ and $\log_2(n+1)$.<br />
An advantage 2-3-4 trees have over 2-3 trees is that insertion and deletion can be done by a<br />
single root to leaf pass.<br />
We can efficiently represent a 2-3-4 tree as a BST called red-black tree (see next section), which<br />
utilizes space more efficiently than a 2-3 or 2-3-4 tree.<br />
Insertion: To avoid the backward leaf to root pass, we split 4-nodes on the way down the tree to<br />
the leaf node into which the element is to be inserted (see Figs. 10.21-24, pp. 512-514). The leaf<br />
node into which the insertion is to be made is therefore guaranteed to be a 2- or 3-node. No further<br />
node splitting is required.<br />
Deletion: To avoid the backward leaf to root restructuring path, it is necessary to ensure that at<br />
the time of deletion, the element to be deleted is in a 3- or 4-node. This is accomplished by<br />
restructuring the 2-3-4 tree during the downward root to leaf pass (see Fig. 10.25, p. 517).<br />
Exercise 8<br />
Show that insertion and deletion can be done in $O(\log n)$ time.<br />
□<br />
Red-Black Tree [S, §11.3]<br />
A red-black tree is a BST representation of a 2-3-4 tree, in which every node (pointer) is colored<br />
either red or black.<br />
typedef enum {red,black} color;<br />
typedef struct NODE *NODEPTR;<br />
struct NODE {<br />
int key;<br />
NODEPTR lchild, rchild;<br />
color lcolor, rcolor;<br />
};<br />
☞ If the child pointer was present in the original 2-3-4 tree, it is a black pointer. Otherwise, it<br />
is a red pointer.<br />
☞ An alternative node structure in which each node has a single color field may also be used,<br />
whose value is the color of the pointer from the node’s parent.<br />
☞ The root node is a black node by definition.<br />
☞ All external nodes are black nodes, too.<br />
Transformation from a 2-3-4 tree to a red-black tree:<br />
1. A 2-node p is represented by a node q with both its color fields black, key = lkey,<br />
q->lchild = p->lchild, and q->rchild = p->lmchild.<br />
2. A 3-node is represented by two nodes connected by a red pointer.<br />
3. A 4-node is represented by three nodes one of which is connected to the remaining two by<br />
red pointers.<br />
The above 2-3-4 tree example is transformed into the following red-black tree.<br />
[Figure: the resulting red-black tree, listed by level: 50; 10, 70; 7, 40, 60, 80; 5, 9, 30, 75, 90; 85, 92.]<br />
From the transformation rules we can show that a BT is a red-black tree iff it satisfies the<br />
following properties:<br />
• It is a BST.<br />
• Every root to external node path has the same number of black pointers (since all external<br />
nodes of the original 2-3-4 tree are on the same level).<br />
• No root to external node path has 2 or more consecutive red pointers.<br />
☞ Every red-black tree with $n$ nodes has a height $\le 2\log_2(n+1)$.<br />
☞ Read §11.3 [S] for details of insertion & deletion.<br />
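The three properties listed above can be verified by a single recursive pass. The sketch below assumes the alternative node structure mentioned earlier, in which each node carries one color field holding the color of the pointer from its parent; the type `struct rbnode` and all function names are hypothetical. External (NULL) nodes are treated as black.

```c
#include <stddef.h>

typedef enum { RED, BLACK } rb_color;

/* Hypothetical node for this sketch: one color field per node, holding the
   color of the pointer from the node's parent. */
struct rbnode {
    int key;
    rb_color color;
    struct rbnode *lchild, *rchild;
};

/* Returns the number of black pointers on every path from above t down to
   an external node, or -1 if the subtree violates a red-black property. */
static int black_height(const struct rbnode *t, int parent_red)
{
    if (t == NULL) return 1;                        /* external node: black */
    if (parent_red && t->color == RED) return -1;   /* two consecutive reds */
    int l = black_height(t->lchild, t->color == RED);
    int r = black_height(t->rchild, t->color == RED);
    if (l < 0 || r < 0 || l != r) return -1;        /* unequal black counts */
    return l + (t->color == BLACK ? 1 : 0);
}

/* BST order check: every key in t must lie strictly between *lo and *hi;
   a NULL bound means unbounded on that side. */
static int is_bst(const struct rbnode *t, const int *lo, const int *hi)
{
    if (t == NULL) return 1;
    if (lo && t->key <= *lo) return 0;
    if (hi && t->key >= *hi) return 0;
    return is_bst(t->lchild, lo, &t->key) && is_bst(t->rchild, &t->key, hi);
}

/* Returns 1 if root is a red-black tree, 0 otherwise. */
int is_red_black(const struct rbnode *root)
{
    if (root == NULL) return 1;
    if (root->color != BLACK) return 0;             /* root is black */
    return is_bst(root, NULL, NULL) && black_height(root, 0) > 0;
}
```

Applied to the example tree (with 5, 9, 30, 80, 85 and 92 reached by red pointers, as the transformation rules imply for the 3-nodes and 4-nodes), the check succeeds; recoloring 90 red makes it fail because 80 and 90 would then be consecutive reds.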
Exercise 9<br />
Compare the worst-case height of a red-black tree with $n$ nodes and that of an AVL tree with the<br />
same number of nodes.<br />
□<br />
Multiway Search Tree [S, §11.4]<br />
Definition 7<br />
A multiway search tree of order $n$ (or $n$-way search tree) is a generalization of 2-3 tree, in which<br />
each node has $n$ or fewer subtrees and contains one fewer key than it has subtrees; each of the<br />
subtrees can be empty; and empty subtrees are not necessarily on the right.<br />
1. $T$ is an AVL tree $\Rightarrow$ $T$ is a 2-way search tree.<br />
2. $T$ is a 2-3 tree $\Rightarrow$ $T$ is a 3-way search tree.<br />
3. $T$ is a 2-3-4 tree $\Rightarrow$ $T$ is a 4-way search tree.<br />
Nodes as full as possible $\Rightarrow$ as little storage wasted as possible (keep as many keys as possible in<br />
each node).<br />
Example 1<br />
Let there be 4000 elements (keys). If $n = 5$ then we have 1000 nodes of 4 keys each, so $h = 5$. If<br />
$n = 11$ then we have 400 nodes of 10 keys each, so $h = 3$.<br />
For full nodes, they use about the same amount of storage.<br />
Accessing a node is the most expensive operation in searching external storage, where multiway<br />
trees are used most often.<br />
□<br />
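The arithmetic in Example 1 can be reproduced with two small helpers. This is a sketch under the assumptions that every node is full (an order-n node holds n - 1 keys) and that height is counted with the root at level 0, matching the convention that a single node has height 0; the function names are illustrative.

```c
/* Number of nodes needed to store `keys` keys when every node of an
   order-n tree is full, i.e. holds n - 1 keys (rounded up). */
long full_nodes(long keys, int n)
{
    return (keys + (n - 2)) / (n - 1);
}

/* Smallest height h (root at level 0) such that a complete n-way tree
   of that height has at least `nodes` nodes. */
int min_height(long nodes, int n)
{
    long capacity = 1;   /* nodes in levels 0..h */
    long level = 1;      /* nodes in level h */
    int h = 0;
    while (capacity < nodes) {
        level *= n;
        capacity += level;
        h++;
    }
    return h;
}
```

For 4000 keys this gives 1000 nodes and height 5 when n = 5, and 400 nodes and height 3 when n = 11, so the wider nodes save two node accesses per search.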
B-Tree [S, §11.4]<br />
Definition 8<br />
A B-tree of order $m$ is a balanced order-$m$ multiway search tree in which 1) each non-root internal<br />
node has at least $\lceil m/2 \rceil$ children (and so at least $\lceil m/2 \rceil - 1$ keys); 2) the root node has at least 2 children; and 3) all external nodes are<br />
at the same level.<br />
☞ A.k.a. $(m-1)$-$m$ tree for $m = 3$ (the 2-3 tree); in general node degrees range from $\lceil m/2 \rceil$ to $m$.<br />
☞ Each node has a maximum of $m - 1$ keys and $m$ children.<br />
☞ They are good at minimizing disk I/O operations.<br />
☞ The height of $T$ is $O(\log_{\lceil m/2 \rceil}(n/2))$, where $n$ is the number of keys.<br />
☞ Read §11.4 [S] for details of insertion & deletion.<br />
Trie<br />
A trie is a tree of degree $m \ge 2$ in which the branching at any level is determined not by the entire<br />
key value but by only a portion of it. Its internal nodes are branch nodes (which contain pointers<br />
only) and external nodes are element nodes.<br />
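#include <string.h><br />
#define MAXLETTER 27 /* blank + 26 letters */<br />
#define MAXCHAR 30 /* max length of key */<br />
typedef enum {key_node, pointer_node} nodetype; /* element node or branch node */<br />
typedef struct NODE *NODEPTR;<br />
struct NODE {<br />
nodetype tag;<br />
union {<br />
char *key; /* element node: the stored key */<br />
NODEPTR letter[MAXLETTER]; /* branch node: one pointer per letter */<br />
} u;<br />
};<br />
NODEPTR root;<br />
/* Search for key starting at level i of trie T; get_index(key, i) is assumed<br />
to map the i-th character of key to a branch index in 0..MAXLETTER-1. */<br />
NODEPTR search(NODEPTR T, char *key, int i)<br />
{<br />
if(!T) return NULL; /* not found */<br />
if(T->tag == key_node)<br />
return ((strcmp(T->u.key, key))? NULL : T);<br />
return search(T->u.letter[get_index(key,i)], key, i+1);<br />
}<br />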
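The search routine relies on a get_index helper that these notes do not define. One possible version is sketched below; its behavior is an assumption made for illustration: index 0 stands for the blank and is also returned once the key is exhausted or for any non-letter, and letters map to 1..26 regardless of case. It also assumes the default C locale and a character set in which 'a'..'z' are contiguous (as in ASCII).

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: maps the i-th character of key to a branch index.
   Returns 0 for a blank, a non-letter, or a position past the end of key;
   returns 1..26 for the letters a..z (case-insensitive). */
int get_index(const char *key, int i)
{
    if (i < 0 || (size_t)i >= strlen(key)) return 0;
    unsigned char c = (unsigned char)key[i];
    if (isalpha(c)) return tolower(c) - 'a' + 1;
    return 0;
}
```

With this mapping the key "cat" follows branches 3, 1, 20 and then the blank branch 0, which fits the 27-way branch node (blank plus 26 letters) declared above.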