Chapter 6. Advanced Data Structures (Search Trees)


6. Elementary Data Structures 6-1

Balanced Binary Search Trees [S, §11.2]

The running time of BST operations is $\Theta(h)$, where $h$ is the height of the tree; in the worst case $h = \Theta(n)$.

⇒ This can be improved to $\Theta(\log n)$ if we can guarantee that $h = \Theta(\log n)$, i.e., that the BST is balanced.

Idea: restructure the tree when necessary (on insertion or deletion) to keep its height balanced.

⇒ Methods: AVL tree, 2-3 tree, red-black tree, B-tree, etc.

Recall that a set $S$ has many different BST representations.

Let $h(T)$ denote the height of a BST $T$ with subtrees $T_l$ and $T_r$, and assume that if $T$ is empty then $h(T) = -1$.

Definition 1

$T$ is height balanced of degree $k$, denoted HB($k$), if

(1) $|h(T_l) - h(T_r)| \le k$, and

(2) $T_l$ and $T_r$ are HB($k$).

Definition 2

$T$ is height balanced, denoted HB, if it is HB(1).

Definition 3

The balance factor of $T$ is defined as

$$\beta(T) = h(T_l) - h(T_r)$$

⇒ $T$ is HB ⇔ every node of $T$ has balance factor $-1$, $0$, or $+1$.

$T$ is HB(0) ⇒ $T$ is a CBT.

[Figure: example trees annotated with the balance factor of each node, labeled HB and HB(2).]
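To make Definitions 1 to 3 concrete, here is a small C sketch that computes $h(T)$ with the convention $h(\text{empty}) = -1$ and tests HB($k$) recursively, exactly as Definition 1 states it. The `tnode` type and the helper names `height`, `balance_factor`, `is_hb`, and `mk` are illustrative assumptions, not part of the notes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type for this sketch (the notes' NODETYPE also has a CC field). */
typedef struct tnode {
    int key;
    struct tnode *left, *right;
} tnode;

/* h(T) with the convention h(empty) = -1, so a single node has height 0. */
static int height(const tnode *t) {
    if (t == NULL) return -1;
    int hl = height(t->left), hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Balance factor beta(T) = h(T_l) - h(T_r) of a nonempty tree. */
static int balance_factor(const tnode *t) {
    assert(t != NULL);
    return height(t->left) - height(t->right);
}

/* Definition 1: T is HB(k) if |beta(T)| <= k and both subtrees are HB(k).
   The empty tree is HB(k) for every k >= 0. */
static int is_hb(const tnode *t, int k) {
    if (t == NULL) return 1;
    int b = balance_factor(t);
    return b >= -k && b <= k && is_hb(t->left, k) && is_hb(t->right, k);
}

/* Allocates a node; the assert is only a sketch-level out-of-memory check. */
static tnode *mk(int key, tnode *left, tnode *right) {
    tnode *t = malloc(sizeof *t);
    assert(t != NULL);
    t->key = key;
    t->left = left;
    t->right = right;
    return t;
}
```

For example, the left-leaning chain with root 3, child 2, and grandchild 1 has balance factor 2 at its root, so it is HB(2) but not HB, while the three-node tree with root 2 and children 1 and 3 is HB(0). Recomputing heights at every node makes `is_hb` quadratic in the worst case; it mirrors the definition rather than aiming for efficiency.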

© Cheng-Wen Wu, Lab for Reliable Computing (LaRC), EE, NTHU 00.5

AVL Tree [S, §11.2]

Definition 4 (Adelson-Velskii/Landis)

An empty BST is an AVL tree. If $T$ is a nonempty BST with $T_l$ and $T_r$ as its left and right subtrees, then $T$ is an AVL tree iff

1) $T_l$ and $T_r$ are AVL trees, and

2) $|h(T_l) - h(T_r)| \le 1$.

⇒ An HB BST is exactly an AVL tree.

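Theorem 1

The height of an AVL tree $T$ with $n$ nodes satisfies

$$h(T) < 1.4405 \log_2(n+2) - 1.3277 \approx 1.44 \log_2 n.$$

That is, $h(T) = O(\log n)$.

Proof: Let $N_h$ be the minimum number of nodes in an AVL tree with height $h$. A minimal AVL tree of height $h \ge 2$ consists of a root, a minimal subtree of height $h-1$, and a minimal subtree of height $h-2$, so

$$N_0 = 1, \qquad N_1 = 2, \qquad N_h = N_{h-1} + N_{h-2} + 1 \quad (h \ge 2).$$

It can be verified easily (by induction) that $N_h = F_{h+3} - 1$, where $F_k$ is the $k$-th Fibonacci number ($F_0 = 0$, $F_1 = 1$). With $\phi = (1+\sqrt{5})/2$ this gives

$$N_h = \frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^{h+3} - \left(\frac{1-\sqrt{5}}{2}\right)^{h+3}\right] - 1 \approx \frac{1}{\sqrt{5}}\,\phi^{h+3} - 1 \quad \text{for large } h.$$

Since the second power has absolute value less than 1, $n \ge N_h > \phi^{h+3}/\sqrt{5} - 2$. Therefore

$$(h+3)\log_2\phi < \log_2(n+2) + \log_2\sqrt{5} \;\Longrightarrow\; h < \frac{\log_2(n+2)}{\log_2\phi} + \frac{\log_2\sqrt{5}}{\log_2\phi} - 3 \approx 1.4404\log_2(n+2) - 1.3277.$$

□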
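The recurrence and the closed form in the proof are easy to check numerically. The C sketch below (function names are illustrative, not from the notes) computes $N_h$ from the recurrence, compares it with $F_{h+3} - 1$, and checks the key inequality $n + 2 > \phi^{h+3}/\sqrt{5}$ for $n = N_h$ by repeated multiplication, so it needs no math library. It also derives the largest possible height of an AVL tree with a given number of nodes.

```c
#include <assert.h>

/* N_h: minimum number of nodes in an AVL tree of height h (h >= 0),
   computed from N_0 = 1, N_1 = 2, N_h = N_{h-1} + N_{h-2} + 1. */
static long long min_avl_nodes(int h) {
    assert(h >= 0);
    long long cur = 1, next = 2;             /* cur = N_i, next = N_{i+1} */
    for (int i = 0; i < h; i++) {
        long long after = cur + next + 1;    /* N_{i+2} */
        cur = next;
        next = after;
    }
    return cur;
}

/* Fibonacci numbers with F_0 = 0, F_1 = 1. */
static long long fib(int k) {
    long long a = 0, b = 1;
    for (int i = 0; i < k; i++) {
        long long t = a + b;
        a = b;
        b = t;
    }
    return a;
}

/* Largest possible height of an AVL tree with n >= 1 nodes:
   the largest h with N_h <= n. */
static int max_avl_height(long long n) {
    assert(n >= 1);
    int h = 0;
    while (min_avl_nodes(h + 1) <= n) h++;
    return h;
}

/* Checks the key step of the proof, n + 2 > phi^(h+3) / sqrt(5), for n = N_h. */
static int proof_step_holds(int h) {
    const double phi = 1.6180339887498949, sqrt5 = 2.2360679774997898;
    double p = 1.0;
    for (int i = 0; i < h + 3; i++) p *= phi;
    return (double)min_avl_nodes(h) + 2.0 > p / sqrt5;
}
```

For instance, an AVL tree with 11 nodes can have height at most 3, because $N_3 = 7 \le 11 < N_4 = 12$.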

As an example, we insert 10, 6, 2, 3, 11, 7, 9, 8, 1, 5, 4 into an empty BST, keeping it HB after every insertion. The final tree is an AVL tree:


[Figure: the tree after each insertion of 10, 6, 2, 3, 11, 7, 9, 8 and then 1, 5, 4. Inserting 2 makes the tree not HB, and a right rotation restores HB; inserting 8 again makes it not HB until it is rebalanced. Final tree: root 6 with subtrees 2 (children 1 and 4, where 4 has children 3 and 5) and 10 (children 8 and 11, where 8 has children 7 and 9).]

Definition 5

An AVL tree $T$ is balanced if $h(T_l) = h(T_r)$. $T$ is right heavy if $h(T_r) = h(T_l) + 1$. $T$ is left heavy if $h(T_l) = h(T_r) + 1$.

Condition codes (CC): '=' balanced, '\' right heavy, '/' left heavy.

    typedef struct node {
        char CC;                     /* condition code: '=', '/' or '\' */
        int key;
        struct node *left, *right;
    } NODETYPE, *NODEPTR;
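Definition 5 translates directly into code. The sketch below reuses the notes' NODETYPE layout and, purely for illustration, recomputes each CC from subtree heights; the insertion procedure described next updates CCs incrementally instead. The functions `height`, `condition_code`, and `set_all_cc` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    char CC;                        /* condition code: '=', '/' or '\\' */
    int key;
    struct node *left, *right;
} NODETYPE, *NODEPTR;

/* Height with h(empty) = -1. */
static int height(NODEPTR t) {
    if (t == NULL) return -1;
    int hl = height(t->left), hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* CC of a nonempty AVL node: '=' balanced, '/' left heavy, '\\' right heavy. */
static char condition_code(NODEPTR t) {
    assert(t != NULL);
    int d = height(t->left) - height(t->right);
    assert(d >= -1 && d <= 1);      /* t must satisfy the AVL condition */
    if (d == 0) return '=';
    return d > 0 ? '/' : '\\';
}

/* Sets the CC field of every node of an AVL tree. */
static void set_all_cc(NODEPTR t) {
    if (t == NULL) return;
    t->CC = condition_code(t);
    set_all_cc(t->left);
    set_all_cc(t->right);
}
```

Applied to a hand-built AVL tree, `set_all_cc` gives one way to check answers to exercises that ask for the CC of every node.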


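On the insertion path of an element $x$ from the root to a leaf, consider each node and the subtree that $x$ enters:

• CC = '=': $x$ may enter either subtree. No change in the AVL property (but the CC changes to '/' or '\').

• CC = '/' and $x$ enters the right subtree, or CC = '\' and $x$ enters the left subtree: no change in the AVL property (the node becomes '=').

• CC = '/' and $x$ enters the left subtree: needs balancing.

• CC = '\' and $x$ enters the right subtree: needs balancing.

Going up from $x$, only the lowest node whose CC is not '=' can need balancing; the nodes below it were all '=' and only change their CCs.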
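The case analysis above amounts to a small decision table. This hypothetical helper (`insert_effect` and the `effect` enum are illustrative names, not from the notes) returns what happens at one node of the insertion path, given its old CC and the side that $x$ entered, assuming that the subtree $x$ entered has grown by one level.

```c
#include <assert.h>

/* What happens at one node of the insertion path once the subtree that x
   entered has grown by one level (hypothetical names). */
typedef enum {
    GROWS,      /* old CC '=': CC changes, this subtree also grows, keep going up */
    ABSORBED,   /* x entered the shorter side: CC becomes '=', stop */
    REBALANCE   /* x entered the taller side: rotate at this node */
} effect;

/* cc: old CC of the node ('=', '/' or '\\'); from_right: nonzero if x entered
   the right subtree. *new_cc receives the CC to store (unchanged for REBALANCE,
   where the rotation sets the CCs). */
static effect insert_effect(char cc, int from_right, char *new_cc) {
    assert(cc == '=' || cc == '/' || cc == '\\');
    if (cc == '=') {
        *new_cc = from_right ? '\\' : '/';
        return GROWS;
    }
    if ((cc == '/' && from_right) || (cc == '\\' && !from_right)) {
        *new_cc = '=';
        return ABSORBED;
    }
    *new_cc = cc;
    return REBALANCE;
}
```

Walking up from the leaf, a caller would keep applying `insert_effect` while it returns `GROWS` and stop at the first `ABSORBED` or `REBALANCE`.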

Exercise 1

Give the CC of each node in the following AVL tree.

[Figure: an AVL tree containing the keys 1 to 25 except 15 and 20.]

□


Balancing AVL Tree

Assume that before the insertion of $x$, tree $T$ was HB(1).

(1) Single left rotation (SLR): to update the old CCs on the path, we go from the leaf up. When we find a node whose CC is right heavy, we rotate at that point.

[Figure: SLR involving nodes A and B and subtrees T1, T2, T3; $x$ is inserted into T3.]

(2) Single right rotation (SRR): to update the old CCs on the path, we go from the leaf up. When we find a node whose CC is left heavy, we rotate at that point.

[Figure: SRR involving nodes A and B and subtrees T1, T2, T3; $x$ is inserted into T1.]

Rules: (a) when coming up from the left, go up until a left-heavy node is found; (b) when coming up from the right, go up until a right-heavy node is found.
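A minimal C sketch of SLR and SRR as pointer operations. It assumes a bare node type `tnode` without the CC field and hypothetical function names; the CC updates that the notes perform along the path are omitted. Each function returns the new root of the rotated subtree, which the caller must store back into the parent's child pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct tnode {
    int key;
    struct tnode *left, *right;
} tnode;

/* SLR at a: a's right child b becomes the subtree root, and b's old left
   subtree (T2) becomes a's new right subtree. Returns b. */
static tnode *rotate_left(tnode *a) {
    assert(a != NULL && a->right != NULL);
    tnode *b = a->right;
    a->right = b->left;
    b->left = a;
    return b;
}

/* SRR at b: b's left child a becomes the subtree root, and a's old right
   subtree (T2) becomes b's new left subtree. Returns a. */
static tnode *rotate_right(tnode *b) {
    assert(b != NULL && b->left != NULL);
    tnode *a = b->left;
    b->left = a->right;
    a->right = b;
    return a;
}
```

Both rotations keep the in-order sequence T1, A, T2, B, T3 unchanged, which is why the BST property is maintained. In the insertion example, the chain 10, 6, 2 becomes the tree with root 6 and children 2 and 10 after one SRR at 10.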


(3) Double right rotation (DRR)

[Figure: DRR involving nodes A, B, C and subtrees T1 to T4; $x$ is inserted into T2 or T3.]

The height of the rebalanced subtree remains the same as before the insertion.

(4) Double left rotation (DLR)

[Figure: DLR involving nodes A, B, C and subtrees T1 to T4; $x$ is inserted into T2 or T3.]

The height of the rebalanced subtree remains the same as before the insertion.
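Each double rotation can be expressed as two single rotations. In this sketch (hypothetical names, a bare `tnode` type, CC updates omitted) we assume DRR names the case where $x$ went into the right subtree of the left child, handled by an SLR at that child followed by an SRR at the unbalanced node, and DLR the mirror image. In both cases the middle key of the three nodes ends up as the subtree root.

```c
#include <assert.h>
#include <stddef.h>

typedef struct tnode {
    int key;
    struct tnode *left, *right;
} tnode;

static tnode *rotate_left(tnode *a) {      /* SLR at a */
    assert(a != NULL && a->right != NULL);
    tnode *b = a->right;
    a->right = b->left;
    b->left = a;
    return b;
}

static tnode *rotate_right(tnode *b) {     /* SRR at b */
    assert(b != NULL && b->left != NULL);
    tnode *a = b->left;
    b->left = a->right;
    a->right = b;
    return a;
}

/* DRR at c (assumed meaning: x went into the right subtree of c's left child):
   SLR at the left child, then SRR at c. */
static tnode *double_rotate_right(tnode *c) {
    assert(c != NULL && c->left != NULL && c->left->right != NULL);
    c->left = rotate_left(c->left);
    return rotate_right(c);
}

/* DLR at a (mirror image): SRR at the right child, then SLR at a. */
static tnode *double_rotate_left(tnode *a) {
    assert(a != NULL && a->right != NULL && a->right->left != NULL);
    a->right = rotate_right(a->right);
    return rotate_left(a);
}
```

In the insertion example, adding 8 below 9 unbalances node 7 (its right child 9 has left child 8); `double_rotate_left` at 7 produces the subtree with root 8 and children 7 and 9.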

• Must check that the BST property is maintained.

Stack of CCs: for each insert operation we maintain a stack of the nodes on the insertion path (with their CCs).

Exercise 2

Insert 20 into the AVL tree of the previous example. Show the stack as the inserted item is pushed along the insertion path.

□

Going from the insertion point (leaf) up to the root, let current be the node currently being scanned, with child and gchild (grandchild) below it on the path. Here hc(p) denotes the condition code (CC) of node p, and pop(stack) is assumed to return NULL once the stack is empty.

    child = node inserted; gchild = NULL;       /* initial condition */
    current = pop(stack);                       /* parent of the inserted node */
    while (current != NULL && hc(current) == '=')
    { if (child == rchild(current)) hc(current) = '\\';  /* coming from right */
      else hc(current) = '/';                            /* coming from left */
      gchild = child;
      child = current;
      current = pop(stack);
    }
    if (current == NULL) return;                /* the whole tree grew by one level */
    if ((hc(current) == '/'  && child == rchild(current)) ||
        (hc(current) == '\\' && child == lchild(current)))
    { hc(current) = '='; return; }              /* shorter side grew: now balanced */
    else rebalance the tree @ current;          /* taller side grew: gchild picks single or double rotation */
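The CC-and-stack procedure above is one way to implement insertion. As a cross-check, the sketch below uses a different but equivalent bookkeeping common in textbooks: each node stores its height instead of a CC, and recursion replaces the explicit stack. All names (`avl`, `avl_insert`, `rebalance`, `preorder`, and so on) are hypothetical. Feeding it the example sequence 10, 6, 2, 3, 11, 7, 9, 8, 1, 5, 4 should give a tree of height 3 with root 6.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical AVL node: stores its height instead of the notes' CC field. */
typedef struct avl {
    int key, height;                  /* a leaf has height 0 */
    struct avl *left, *right;
} avl;

static int h(const avl *t) { return t ? t->height : -1; }     /* h(empty) = -1 */
static int max_int(int a, int b) { return a > b ? a : b; }
static void update(avl *t) { t->height = 1 + max_int(h(t->left), h(t->right)); }

static avl *rot_left(avl *a) {        /* SLR at a */
    avl *b = a->right;
    a->right = b->left;
    b->left = a;
    update(a);
    update(b);
    return b;
}

static avl *rot_right(avl *b) {       /* SRR at b */
    avl *a = b->left;
    b->left = a->right;
    a->right = b;
    update(b);
    update(a);
    return a;
}

/* Restores the AVL condition at t after an insertion below it. */
static avl *rebalance(avl *t) {
    update(t);
    int bf = h(t->left) - h(t->right);
    if (bf > 1) {                                   /* left side too tall */
        if (h(t->left->left) < h(t->left->right))   /* x is in the left child's right subtree */
            t->left = rot_left(t->left);            /* first half of a double rotation */
        return rot_right(t);
    }
    if (bf < -1) {                                  /* right side too tall */
        if (h(t->right->right) < h(t->right->left)) /* x is in the right child's left subtree */
            t->right = rot_right(t->right);
        return rot_left(t);
    }
    return t;
}

/* Recursive insertion; the call stack plays the role of the notes' explicit stack.
   Duplicate keys are ignored. Returns the new root of the subtree. */
static avl *avl_insert(avl *t, int key) {
    if (t == NULL) {
        avl *n = malloc(sizeof *n);
        assert(n != NULL);                          /* sketch-level out-of-memory check */
        n->key = key;
        n->height = 0;
        n->left = n->right = NULL;
        return n;
    }
    if (key < t->key)      t->left = avl_insert(t->left, key);
    else if (key > t->key) t->right = avl_insert(t->right, key);
    else                   return t;
    return rebalance(t);
}

/* Stores the keys in preorder into out[] starting at index n; returns the next free index. */
static int preorder(const avl *t, int out[], int n) {
    if (t == NULL) return n;
    out[n++] = t->key;
    n = preorder(t->left, out, n);
    return preorder(t->right, out, n);
}
```

Storing heights costs a little more space than a two-bit CC but makes the rebalancing test a plain comparison of child heights.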

AVL Tree Deletion

Assume the deleted node is a leaf.

⇒ Use a stack.

I.C. (initial condition): child is the deleted node, and current is its parent. Here "child == lchild(current)" means that the height decrease came from the left subtree of current.

We use a bottom-up strategy, repeating the following steps until a return occurs or the root has been processed:

    1. if (hc(current) == '=')                  /* height of current is unchanged */
       { if (child == lchild(current)) hc(current) = '\\';
         else hc(current) = '/';
         return;
       }
    2. if ((hc(current) == '/'  && child == lchild(current)) ||
           (hc(current) == '\\' && child == rchild(current)))
       { hc(current) = '=';                    /* taller side shrank: current lost one level */
         child = current;
         current = pop(stack);
       }
    3. else                                     /* shorter side shrank: current is out of balance */
       { rebalance @ current;                  /* stop if the rotated subtree kept its height */
         child = root of the rebalanced subtree;
         current = pop(stack);
       }

Exercise 3

(a) We showed in Chapter 4 that every comparison-based algorithm to sort $n$ elements must take $\Omega(n \log n)$ time in the worst case. What implication does this result have on the complexity of initializing an AVL tree of $n$ nodes?

(b) Write an algorithm to list the nodes of an AVL tree $T$ in ascending order of key. Can this be done in $O(n)$ time if $T$ has $n$ nodes?

(c) How do you combine two AVL trees into a bigger new AVL tree? Can this be done in linear time?

□

2-3 Tree

Definition 6

A 2-3 tree is a search tree that is either empty or satisfies the following properties:

1. Each internal node is either a 2-node (with 2 children) or a 3-node (with 3 children). A 2-node has one element; a 3-node has two elements.

2. All elements in the 2-3 subtree with root lchild have keys less than lkey.

3. All elements in the 2-3 subtree with root mchild have keys greater than lkey (and less than rkey if it is a 3-node).

4. If it is a 3-node, then all elements in the 2-3 subtree with root rchild have keys greater than rkey.

5. All external nodes are at the same level, i.e., $T$ is HB.

6. All the elements appear at the leaves (external nodes).

    typedef struct NODE *NODEPTR;
    struct NODE {
        int lkey, rkey;                 /* rkey == INFINITY marks a 2-node */
        NODEPTR lchild, mchild, rchild;
    };

    NODEPTR search(NODEPTR T, int x)
    {
        while (T)
            switch (compare(x, T))      /* 1: go left, 2: go middle, 3: go right, 4: found */
            {
            case 1: T = T->lchild; break;
            case 2: T = T->mchild; break;
            case 3: T = T->rchild; break;
            case 4: return T;
            }
        return NULL;                    /* x is not in the tree */
    }
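To run the search procedure, a `compare` function is needed. The sketch below supplies one possible version (so it doubles as a sample answer to Exercise 4), assuming the convention from the insertion code that a 2-node has `rkey == INFINITY`. Because `INFINITY` is already a floating-point macro in `<math.h>`, the sketch uses the hypothetical name `INFINITY_KEY` (set to `INT_MAX`), and all stored keys must be smaller than it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define INFINITY_KEY INT_MAX   /* stands in for the notes' INFINITY: rkey of a 2-node */

typedef struct NODE *NODEPTR;
struct NODE {
    int lkey, rkey;
    NODEPTR lchild, mchild, rchild;
};

/* One possible compare(): 1 = go to lchild, 2 = go to mchild,
   3 = go to rchild, 4 = x is stored in T. Keys must be < INFINITY_KEY. */
static int compare(int x, NODEPTR T) {
    if (x == T->lkey || (T->rkey != INFINITY_KEY && x == T->rkey)) return 4;
    if (x < T->lkey) return 1;
    if (T->rkey == INFINITY_KEY || x < T->rkey) return 2;
    return 3;
}

static NODEPTR search(NODEPTR T, int x) {
    while (T) {
        switch (compare(x, T)) {
        case 1: T = T->lchild; break;
        case 2: T = T->mchild; break;
        case 3: T = T->rchild; break;
        case 4: return T;
        }
    }
    return NULL;               /* x is not in the tree */
}
```

With a root holding only 40 and children (10 20) and (80), searching for 20 goes left once and stops; searching for 30 passes the left child's right pointer and returns NULL.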


[Figure: two example 2-3 trees, one rooted at 40 and one rooted at 11.]

Exercise 4

Write the compare function used in the above procedure.

□

Exercise 5

(a) Show that a 2-3 tree with $n$ elements has a height $h$ that satisfies

$$\log_3 n \le h \le \log_2 n.$$

(b) Show that searching can be done in $O(\log n)$ time.

□

Insertion:

    void insert(NODEPTR *T, int y)             /* [cf. HSA, Sec. 10.3.3] */
    { NODEPTR p, q;
      if (!(*T)) new_root(T, y, NULL);          /* T was empty */
      else
      { p = find_node(*T, y);                  /* Is y in T? NULL if so; otherwise the node where */
                                               /* y belongs (assumed to push the path onto stack) */
        if (!p)                                /* y is in T */
        { fprintf(stderr, "The key is currently in T.\n");
          exit(1);
        }                                      /* else p is the node where y is to be inserted */
        q = NULL;                              /* q will be the newly created node after a split */
        for (;;)
          if (p->rkey == INFINITY)             /* p is a 2-node */
          { put_in(&p, y, q);                  /* insert y into p and place q */
            break;                             /* immediately to the right of y */
          }
          else                                 /* p is a 3-node */
          { split(p, &y, &q);                  /* y: median key sent up; q: new node */
            if (p == *T)                       /* split the root; h = h+1 */
            { new_root(T, y, q);               /* new root with key y and children p, q */
              break;
            }
            else p = pop(stack);               /* follow the path back up */
          }
      }
    }

    void split(NODEPTR p, int *yp, NODEPTR *qp)   /* y denotes *yp; the new node q is returned in *qp */
    { take node p with 2 keys in it;
      create a new node q with q.lkey = max(p.rkey, y)
                           and q.rkey = INFINITY;
      temp = median(p.lkey, p.rkey, y);
      p.lkey = min(p.lkey, y);
      p.rkey = INFINITY;                          /* p becomes a 2-node */
      y = temp;                                   /* the median key sent upward */
      update the pointers;                        /* share p's children and the old q between p and q */
      *qp = q;
    }
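The key shuffling inside `split` can be checked in isolation. The hypothetical helper below takes the two keys of a full 3-node and the key `y` being inserted and returns the three roles: the smallest key stays in `p`, the largest goes to the new node `q`, and the median is sent upward. Pointer updates are left out.

```c
#include <assert.h>

/* Roles of the three keys when a full 3-node (lkey < rkey) receives key y. */
typedef struct {
    int p_key;    /* smallest key: stays in p, i.e. min(p.lkey, y) */
    int median;   /* middle key: sent upward to the parent */
    int q_key;    /* largest key: goes to the new node q, i.e. max(p.rkey, y) */
} split_result;

static split_result split_keys(int lkey, int rkey, int y) {
    assert(lkey < rkey && y != lkey && y != rkey);
    split_result s;
    s.p_key = y < lkey ? y : lkey;
    s.q_key = y > rkey ? y : rkey;
    if (y < lkey)      s.median = lkey;
    else if (y > rkey) s.median = rkey;
    else               s.median = y;
    return s;
}
```

For example, inserting 30 into the full node (10 20) keeps 10 in `p`, puts 30 in `q`, and sends 20 up to the parent.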

[Figure: insert(T, 70) and then insert(T, 30) on the 2-3 tree with root 40 and children (10 20) and (80). Inserting 70 gives children (10 20) and (70 80); inserting 30 splits (10 20), sends 20 up, and the root becomes (20 40) with children (10), (30), and (70 80).]

Exercise 6

What is the result of a subsequent insert(T, 60)?

□

Deletion:

    update p to reflect its status after the element has been deleted;
    while (p has 0 elements && p is not the root)
    { r = parent of p;
      q = left or right sibling of p (as appropriate);
      if (q is a 3-node) rotate;        /* an element of r moves into p, one of q moves up into r */
      else combine;                     /* merge p, q, and the separating element of r */
      p = r;
    }
    if (p has 0 elements)               /* then p must be the root */
    { left child of p becomes the new root;
      delete p;
    }


Exercise 7

(a) We showed in Chapter 4 that every comparison-based algorithm to sort $n$ elements must take $\Omega(n \log n)$ time in the worst case. What implication does this result have on the complexity of initializing a 2-3 tree of $n$ nodes?

(b) Write an algorithm to list the nodes of a 2-3 tree $T$ in ascending order of key. Can this be done in $O(n)$ time if $T$ has $n$ nodes?

(c) How do you combine two 2-3 trees into a bigger new 2-3 tree? Can this be done in linear time?

□

2-3-4 Tree

A 2-3-4 tree extends a 2-3 tree so that 4-nodes are also permitted (a 4-node has three elements and four children).

    typedef struct NODE *NODEPTR;
    struct NODE {
        int lkey, mkey, rkey;
        NODEPTR lchild, lmchild, rmchild, rchild;
    };

[Figure: example 2-3-4 tree. Root (50); second level (10) and (70 80); leaves (5 7 9), (30 40), (60), (75), (85 90 92).]

⇒ The height of a 2-3-4 tree with $n$ elements is between $\log_4(n+1)$ and $\log_2(n+1)$.

An advantage of 2-3-4 trees over 2-3 trees is that insertion and deletion can be done in a single root-to-leaf pass.

A 2-3-4 tree can be represented efficiently as a BST called a red-black tree (see next section), which utilizes space more efficiently than a 2-3 or 2-3-4 tree.


Insertion: To avoid the backward leaf-to-root pass, we split 4-nodes on the way down the tree to the leaf node into which the element is to be inserted (see Figs. 10.21-24, pp. 512-514). The leaf node into which the insertion is made is therefore guaranteed to be a 2- or 3-node, so no further node splitting is required.

Deletion: To avoid the backward leaf-to-root restructuring pass, we must ensure that, at the time of deletion, the element to be deleted is in a 3- or 4-node. This is accomplished by restructuring the 2-3-4 tree during the downward root-to-leaf pass (see Fig. 10.25, p. 517).

Exercise 8

Show that insertion and deletion can be done in $O(\log n)$ time.

□

Red-Black Tree [S, §11.3]

A red-black tree is a BST representation of a 2-3-4 tree in which every node (pointer) is colored either red or black.

    typedef enum {red, black} color;
    typedef struct NODE *NODEPTR;
    struct NODE {
        int key;
        NODEPTR lchild, rchild;
        color lcolor, rcolor;        /* colors of the two child pointers */
    };

☞ If a child pointer was present in the original 2-3-4 tree, it is a black pointer; otherwise, it is a red pointer.

☞ An alternative node structure in which each node has a single color field may also be used; its value is the color of the pointer from the node's parent.

☞ The root node is a black node by definition.

☞ All external nodes are black nodes, too.

Transformation from a 2-3-4 tree to a red-black tree:

1. A 2-node $p$ is represented by a node $q$ with both its color fields black, key = lkey, q->lchild = p->lchild, and q->rchild = p->lmchild.

2. A 3-node is represented by two nodes connected by a red pointer.

3. A 4-node is represented by three nodes, one of which is connected to the remaining two by red pointers.
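The three transformation rules can be written as a function that converts one 2-3-4 node whose children have already been converted. In this sketch the type name `RBNODE` and the functions `rb_new` and `from_234` are hypothetical; the fields follow the notes' red-black node. For a 3-node either key may go on top; here we assume the smaller key is on top with a red right pointer, which is one of the two valid choices.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { red, black } color;

/* Same fields as the notes' red-black NODE; the name RBNODE is hypothetical. */
typedef struct RBNODE {
    int key;
    struct RBNODE *lchild, *rchild;
    color lcolor, rcolor;
} RBNODE;

static RBNODE *rb_new(int key, RBNODE *l, color lc, RBNODE *r, color rc) {
    RBNODE *t = malloc(sizeof *t);
    assert(t != NULL);                      /* sketch-level out-of-memory check */
    t->key = key;
    t->lchild = l;
    t->lcolor = lc;
    t->rchild = r;
    t->rcolor = rc;
    return t;
}

/* Converts one 2-3-4 node holding nkeys sorted keys (1 to 3) whose children have
   already been converted into red-black subtrees c[0..nkeys]. Pointers to c[] are
   black (they existed in the 2-3-4 tree); pointers inside the group are red. */
static RBNODE *from_234(int nkeys, const int keys[], RBNODE *c[]) {
    switch (nkeys) {
    case 1:                                 /* rule 1: 2-node */
        return rb_new(keys[0], c[0], black, c[1], black);
    case 2: {                               /* rule 2: 3-node, smaller key on top here */
        RBNODE *right = rb_new(keys[1], c[1], black, c[2], black);
        return rb_new(keys[0], c[0], black, right, red);
    }
    case 3: {                               /* rule 3: 4-node, middle key on top */
        RBNODE *left = rb_new(keys[0], c[0], black, c[1], black);
        RBNODE *right = rb_new(keys[2], c[2], black, c[3], black);
        return rb_new(keys[1], left, red, right, red);
    }
    default:
        assert(!"a 2-3-4 node holds 1 to 3 keys");
        return NULL;
    }
}
```

Converting bottom-up, the 4-node (5 7 9) becomes 7 with red pointers to 5 and 9, and the 3-node (70 80) becomes 70 with a red right pointer to 80.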

The above 2-3-4 tree example is transformed into the following red-black tree.

[Figure: red-black tree with root 50; 50's children are 10 and 70; 10's children are 7 and 40; 70's children are 60 and 80; 7's children are 5 and 9; 40's left child is 30; 80's children are 75 and 90; 90's children are 85 and 92.]

From the transformation rules we can show that a binary tree is a red-black tree iff it satisfies the following properties:

• It is a BST.

• Every root-to-external-node path has the same number of black pointers (since all external nodes of the original 2-3-4 tree are on the same level).

• No root-to-external-node path has two or more consecutive red pointers.
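The three properties can be checked mechanically. This C sketch (hypothetical names `RBNODE`, `rb_check`, `is_red_black`; the fields follow the notes' red-black node) walks the tree once, carrying open key bounds for the BST test and whether the incoming pointer is red, and returns the common number of black pointers per root-to-external-node path, or -1 on a violation. Following the rule that external nodes are black, it treats a red pointer to an external (NULL) node as invalid.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef enum { red, black } color;

/* Same fields as the notes' red-black NODE; the name RBNODE is hypothetical. */
typedef struct RBNODE {
    int key;
    struct RBNODE *lchild, *rchild;
    color lcolor, rcolor;
} RBNODE;

/* Number of black pointers on every path from t down to an external node,
   or -1 if a property fails. in_red: the pointer into t is red.
   Keys of the subtree must lie strictly between lo and hi (BST property). */
static int rb_check(const RBNODE *t, int in_red, long long lo, long long hi) {
    if (t == NULL) return 0;                                     /* external node */
    if (t->key <= lo || t->key >= hi) return -1;                 /* not a BST */
    if ((t->lcolor == red && t->lchild == NULL) ||
        (t->rcolor == red && t->rchild == NULL)) return -1;      /* external nodes are black */
    if (in_red && (t->lcolor == red || t->rcolor == red)) return -1;  /* two reds in a row */
    int bl = rb_check(t->lchild, t->lcolor == red, lo, t->key);
    int br = rb_check(t->rchild, t->rcolor == red, t->key, hi);
    if (bl < 0 || br < 0) return -1;
    bl += (t->lcolor == black);
    br += (t->rcolor == black);
    return bl == br ? bl : -1;                                   /* equal black counts */
}

/* 1 if the tree rooted at root satisfies all three properties. */
static int is_red_black(const RBNODE *root) {
    return rb_check(root, 0, LLONG_MIN, LLONG_MAX) >= 0;
}
```

Built by hand from the example tree above, with the red pointers 7→5, 7→9, 40→30, 70→80, 90→85, and 90→92 (which we infer from the transformation rules), `rb_check` returns 3, the number of levels of the original 2-3-4 tree.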

☞ Every red-black tree with $n$ nodes has height at most $2\log_2(n+1)$.

☞ Read §11.3 [S] for details of insertion and deletion.

Exercise 9

Compare the worst-case height of a red-black tree with $n$ nodes and that of an AVL tree with the same number of nodes.

□

Multiway Search Tree [S, §11.4]

Definition 7

A multiway search tree of order $n$ (or $n$-way search tree) is a generalization of the 2-3 tree in which each node has $n$ or fewer subtrees and contains one fewer key than it has subtrees; any of the subtrees can be empty, and empty subtrees are not necessarily on the right.

1. $T$ is an AVL tree ⇒ $T$ is a 2-way search tree.

2. $T$ is a 2-3 tree ⇒ $T$ is a 3-way search tree.

3. $T$ is a 2-3-4 tree ⇒ $T$ is a 4-way search tree.

Nodes as full as possible ⇒ as little storage wasted as possible (keep as many keys as possible in each node).

Example 1

Let there be 4000 elements (keys). If $n = 5$, then we have 1000 nodes of 4 keys each, so $h \approx 5$. If $n = 11$, then we have 400 nodes of 10 keys each, so $h \approx 3$.

For full nodes, both trees use about the same amount of storage.
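The heights in Example 1 follow from counting keys level by level. Assuming, as elsewhere in these notes, that a single node has height 0, this C sketch (the function name is hypothetical) computes the smallest height an order-$n$ multiway search tree can have for a given number of keys.

```c
#include <assert.h>

/* Minimum height (single node = height 0) of an order-n multiway search tree
   holding `keys` keys. A full tree of height h has 1 + n + ... + n^h nodes with
   n - 1 keys each, i.e. n^(h+1) - 1 keys, so we look for the smallest such h. */
static int min_multiway_height(long long keys, int n) {
    assert(n >= 2 && keys >= 1);
    int h = 0;
    long long capacity = n - 1;              /* keys in a full tree of height 0 */
    while (capacity < keys) {
        capacity = capacity * n + (n - 1);   /* n^(h+2) - 1 = n(n^(h+1) - 1) + (n - 1) */
        h++;
    }
    return h;
}
```

For 4000 keys it gives 5 for $n = 5$ and 3 for $n = 11$, while a binary search tree ($n = 2$) would need height at least 11, which is why wide nodes pay off when every node access is a disk read.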

Accessing a node is the most expensive operation when searching external storage, which is where multiway trees are used most often.

□

B-Tree [S, §11.4]

Definition 8

A B-tree of order $m$ is a balanced order-$m$ multiway search tree in which 1) each non-root internal node contains at least $\lceil m/2 \rceil - 1$ keys; 2) the root node has at least 2 children; and 3) all external nodes are at the same level.

☞ A.k.a. $(m-1)$-$m$ tree.

☞ Each node has a maximum of $m - 1$ keys and $m$ children.

☞ B-trees are good at minimizing disk I/O operations.

☞ The height of $T$ is $O(\log_{\lceil m/2 \rceil}(n/2))$.

☞ Read §11.4 [S] for details of insertion and deletion.

Trie

A trie is a tree of degree $\ge 2$ in which the branching at any level is determined not by the entire key value but by only a portion of it. Its internal nodes are branch nodes (which contain pointers only), and its external nodes are element nodes.


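    #include <string.h>

    #define MAXLETTER 27   /* blank + 26 letters */
    #define MAXCHAR   30   /* max length of key */

    typedef enum {key, pointer} nodetype;   /* element node or branch node */
    typedef struct NODE *NODEPTR;
    struct NODE {
        nodetype tag;
        union {
            char *key;                      /* element node: the whole key */
            NODEPTR letter[MAXLETTER];      /* branch node: one pointer per symbol */
        } u;
    };
    NODEPTR root;

    /* Search for key k below node T, which branches on the i-th character of k.
       get_index(k, i) maps that character to 0..MAXLETTER-1 (0 for a blank).
       The parameter is named k: a parameter named key would hide the
       enumerator key used in T->tag == key. */
    NODEPTR search(NODEPTR T, char *k, int i)
    {
        if (!T) return NULL;                /* not found */
        if (T->tag == key)                  /* element node: compare the whole key */
            return ((strcmp(T->u.key, k)) ? NULL : T);
        return search(T->u.letter[get_index(k, i)], k, i + 1);
    }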
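To see the trie search in action, the sketch below adds a matching insertion routine. Assumptions (not from the notes): keys are lowercase strings stored by pointer (not copied), `get_index` maps position `i` of a key to 0 for a blank (past the end of the key) and to 1..26 for 'a'..'z', and `new_node` and `insert` are hypothetical helpers. Insertion replaces an element node by a branch node whenever two keys share the symbol at the current level.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXLETTER 27                     /* blank + 26 letters, as in the notes */

typedef enum { key, pointer } nodetype;  /* element node or branch node */
typedef struct NODE *NODEPTR;
struct NODE {
    nodetype tag;
    union {
        const char *key;                 /* element node: the whole key (not copied) */
        NODEPTR letter[MAXLETTER];       /* branch node: one child per symbol */
    } u;
};

/* Hypothetical get_index(): symbol i of k as 0 (blank, past the end) or 1..26. */
static int get_index(const char *k, int i) {
    if (i >= (int)strlen(k)) return 0;
    assert(k[i] >= 'a' && k[i] <= 'z');  /* this sketch accepts lowercase keys only */
    return k[i] - 'a' + 1;
}

static NODEPTR new_node(nodetype tag) {
    NODEPTR t = malloc(sizeof *t);
    assert(t != NULL);                   /* sketch-level out-of-memory check */
    t->tag = tag;
    if (tag == pointer)
        for (int j = 0; j < MAXLETTER; j++) t->u.letter[j] = NULL;
    return t;
}

/* The notes' search, with the key typed as a string. */
static NODEPTR search(NODEPTR T, const char *k, int i) {
    if (!T) return NULL;                           /* not found */
    if (T->tag == key)                             /* element node: compare whole key */
        return strcmp(T->u.key, k) ? NULL : T;
    return search(T->u.letter[get_index(k, i)], k, i + 1);
}

/* Inserts k below T, where T branches on symbol i; returns the (possibly new) subtree. */
static NODEPTR insert(NODEPTR T, const char *k, int i) {
    if (T == NULL) {                               /* empty slot: store an element node */
        NODEPTR e = new_node(key);
        e->u.key = k;
        return e;
    }
    if (T->tag == key) {
        if (strcmp(T->u.key, k) == 0) return T;    /* already present */
        NODEPTR b = new_node(pointer);             /* replace the element by a branch node */
        b->u.letter[get_index(T->u.key, i)] = T;   /* old element moves one level down */
        T = b;
    }
    int j = get_index(k, i);
    T->u.letter[j] = insert(T->u.letter[j], k, i + 1);
    return T;
}
```

With "bat", "bag", "cat", "ba", and "b" inserted, a search for "cats" follows 'c' to the element node for "cat" and fails only at the final full-key comparison, which is why element nodes store the whole key.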
