PDF of color slides - Communication Systems Group

Summary last lecture 

P2P introduction 

• Classification (client-server, hybrid P2P, pure P2P) 

• Napster 

• Gnutella 

• Analysis 

– Network topology follows power law 

– Where does power law come from 

– Small world networks

Chapter 2.3: Distributed Hash Tables

2.3 Distributed Hash Tables – DHT 

• What are Distributed Hash-Tables 

– Applications -- What are DHTs used for 

– Complexity – How efficient are DHTs 

• Different approaches 

– Pastry – Rice, Microsoft 

– CAN – UC Berkeley, ICSI/ICIR 

– Chord – UC Berkeley, MIT 

– Tapestry – UC Berkeley 

– Symphony 

– Viceroy 

– Kademlia 

• Discussion and summary

Information lookup in Peer-to-Peer-Systems
[Figure: one peer asks "Where can I find „Matrix.avi“?", another peer states "I have „Matrix.avi“"]
What is the main problem in peer-to-peer systems?
– Data lookup in distributed systems
• Where to store data
– Publish(“content”, …)
• How does a query look up the data location
– Lookup(“content”)
– Minimal overhead in both communication and storage
– Robustness against failures and frequent network changes

Content lookup – 1st idea: centralized server
Simplistic strategy: Client-Server
– Server stores information about location of content files
[Figure: node X sends Publish(“content”, …) to server S; a client sends Lookup(“content”) to S (search query), S answers "X has it!", and the client fetches the file from X with Get(“content”) (get content)]
Known problems
• scalability (O(N) complexity of storage), freshness of data, single point of failure, implementation cost, etc.
– However:
• Very effective means for simple applications

Content lookup – 2nd idea: flooding / breadth search
[Figure: a Lookup(“content”) search query is flooded through the overlay; peers that hold the file answer "available"; the file is then downloaded with Get(“content”)]
– Breadth search (flooding, as with Gnutella)
• High number of query messages
• Not scalable
– High network load
– High communication overhead

Content lookup
• search overhead vs. storage overhead
– Flooding: search overhead O(N), storage overhead O(1)
• bottleneck: communication
– Centralized server: search overhead O(1), storage overhead O(N)
• bottleneck: storage, CPU, network
• availability
– Distributed Hash Tables: search overhead O(log N), storage overhead O(log N)
• complexity: O(log N)
• Resilient against failures
– Node failures
– attacks
– Network dynamics / temporary users

Principle of Distributed Hash Tables 

Idea of distributed hash tables 

– Distribute data/content on all nodes 

• Or rather: information about location of content 

– On content request, query node that stores information about content 

Challenges:
– Even distribution of content on all available nodes
• Efficient lookup of content
– Constant adaptation to node failures, arrivals, and departures
• Allocation of responsibilities to new nodes
• Transfer and redistribution of responsibilities in case of node failure or node departure

Principle of DHTs – 1st step: mapping
• 1st step: Map content to linear address space
– often: 0, …, 2^m − 1 >> N_max (N_max is the maximum number of stored objects)
– Content mapping realized through hash function
• E.g., H(string) modulo 2^m: H(„/movie/Matrix/divx/en“) → 2313
• Distribute address space over DHT nodes
[Figure: address space split into the ranges 0–1000, 1001–2000, 2001–4000, 4001–7000, 7001–10,000, 10,001–21,000, 21,001–40,000, 40,001–65,535, one range per DHT node]
• Often, the address space is illustrated as a ring (2^m − 1 is followed by 0)
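The mapping step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-1 is used as the hash function H, m = 16 is chosen to match the range 0 to 65,535 in the figure, and the helper names `address` and `responsible_node` are made up for this sketch. SHA-1 will not reproduce the slide's example value 2313.

```python
import bisect
import hashlib

M = 16  # address space 0 .. 2^16 - 1 = 65535, matching the figure

# Upper end of each node's range, taken from the figure (node 0 owns 0..1000, etc.)
RANGE_ENDS = [1000, 2000, 4000, 7000, 10000, 21000, 40000, 65535]


def address(name: str) -> int:
    """Map a content name to the linear address space: H(name) modulo 2^m."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def responsible_node(addr: int) -> int:
    """Return the zero-based index of the node whose range contains addr."""
    return bisect.bisect_left(RANGE_ENDS, addr)


if __name__ == "__main__":
    a = address("/movie/Matrix/divx/en")
    print(a, "is managed by node", responsible_node(a))
    # The slide's example address 2313 falls into the range 2001-4000 (index 2).
    print(responsible_node(2313))
```

Because the ranges are sorted, a binary search over the range ends is enough to find the responsible node.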

Distribution of address space on DHT nodes
• Each node is responsible for at least a part of the address space
– Allows for redundancy
– Permanent adaptation
[Figure: logical view of the distributed hash table (ring of the ranges 0–1000, 1001–2000, 2001–4000, 4001–7000, 7001–10,000, 10,001–21,000, 21,001–40,000, 40,001–65,535) and its mapping onto the real network topology]

Storage of content in DHTs – I
How is content stored in the nodes?
– Assumption: H(“content”) maps into the address space
Direct mapping:
– Content is stored in node H(“content”)
– Not flexible for large content
[Figure: H(“movie”) = 2313, so the content is stored on the node that manages the range 2001–4000]

Storage of content in DHTs – II
• Indirect mapping:
– Nodes in the DHT store tuples (Key, Value)
• Key = Hash(„/movie/Matrix/divx/en“) → 2313
• Value is (very often) the real address where the content is stored: (IP, Port) = (129.13.15.3, 4711)
– More flexible, but one additional step to get to the content
[Figure: the node that manages the range 2001–4000 stores the link to the content, (2313 → X)]

DHTs – Content Localization
2nd step: Location of content (content-based routing)
Goal: reduced complexity and scalability
– O(1) with centralized hash table
• but: administration of a centralized hash table is complex
– Target complexity with distributed hash table
• O(log N): hops to find object / content
• O(log N): number of keys and routing information per node

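Content Localization – II
• Search for keys
– Entry at / query any node
– Routing towards content (Key)
[Figure: Lookup(H(“Content”)) for key 2313 enters at an arbitrary entry node and is routed to the node that manages keys 2001–4000, and therefore also 2313; that node stores Key = H(“Content”) as the tuple (2313, (IP, port)), where the value is a reference to the location of the data]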

DHT-update: Arrival and Departure of Nodes
Arrival of new nodes
– Allocation of a hash range to the new node
– Transfer of the Key/Value tuples of that hash range (mainly redundancy)
– Integration in routing structures
Departure of a node
– Partitioning of its hash range among neighbor nodes
– Transfer of K/V tuples to the responsible nodes
– Leave the routing structure
Node failure
– Use of redundant K/V tuples (on failure of the node holding a tuple)
– Use of redundant/alternative routes
– Key-Value pair reachable if at least one copy is available

Generic interface of DHTs
Interface to DHT
– Insert information / content (e.g., location of content)
• Publish(Key,Value)
– Information lookup (content search)
• Lookup(Key)
– Answer:
• Value
DHT approaches are interchangeable (with regard to the interface)
[Figure: a distributed application calls Publish(Key,Value) and Lookup(Key) on the distributed hash table (CAN, Chord, Pastry, Tapestry, …) and receives Value; the table is realized by node 1, node 2, …, node x]
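The generic interface can be illustrated with a toy, single-process stand-in for a DHT. The class name `ToyDHT`, the equal-sized ranges, and the use of SHA-1 are assumptions of this sketch; a real DHT such as Chord or Pastry would route Publish and Lookup across the network instead of indexing a local list.

```python
import hashlib


class ToyDHT:
    """In-memory stand-in for a DHT: every 'node' is a dict owning one key range."""

    def __init__(self, node_count: int, m: int = 16):
        self.m = m
        self.stores = [{} for _ in range(node_count)]

    def key_of(self, name: str) -> int:
        """Hash a content name into the address space 0 .. 2^m - 1."""
        digest = hashlib.sha1(name.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % (2 ** self.m)

    def _node_for(self, key: int) -> int:
        # Equal-sized ranges for simplicity (the slides use uneven ranges).
        return key * len(self.stores) // (2 ** self.m)

    def publish(self, key: int, value) -> None:
        """Publish(Key, Value): store the tuple on the responsible node."""
        self.stores[self._node_for(key)][key] = value

    def lookup(self, key: int):
        """Lookup(Key): return the Value, or None if the key is unknown."""
        return self.stores[self._node_for(key)].get(key)


if __name__ == "__main__":
    dht = ToyDHT(node_count=8)
    key = dht.key_of("/movie/Matrix/divx/en")
    # Indirect mapping: the value is the location of the content, not the content.
    dht.publish(key, ("129.13.15.3", 4711))
    print(dht.lookup(key))
```

The application only sees `publish` and `lookup`, which is why the concrete DHT behind the interface can be exchanged.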

Summary: Distributed Hash Tables 

• Summary: properties DHTs 

– Distribute keys equally over all DHT nodes (using redundancy) 

• No bottlenecks 

• Allow for constant growth of stored keys 

• Tolerant with regard to node failures 

• Survive concerted attacks 

– Self-organizing systems 

– Simple and efficient realization 

– Support for a large field of applications 

• Key has no semantic meaning

• Value depends on application

Implementations of Distributed Hash Tables
• DHT implementations
– Pastry: Microsoft Research, Rice University [2.60]
– Chord: UC Berkeley, MIT [2.70]
– CAN: UC Berkeley, ICSI [2.80]
– Tapestry (not presented here): UC Berkeley
– And Symphony, Viceroy, Kademlia [2.90]

Pastry
• Goal: efficient search of K in data structure
– K = ID ∈ [0, 2^128 − 1]
– Basic principle: binary tree – O(log N)
• Each node has two sons
• O(log_2 N), but with 128-bit IDs, up to 128 nodes have to be searched
• Better: 2^b (e.g., b = 4) sons per node (like B*-trees in databases)
• Complexity: O(log_{2^b} N) [2.54]
• Pastry views identifiers as strings of digits to the base 2^b
– b = 4: hexadecimal system, b = 2: base-4 system
– Example: 12345678_10 = BC614E_16 = 233012011032_4
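The digit view of identifiers can be checked with a small conversion routine. The function name `to_digits` is made up for this sketch, and it only supports b ≤ 4 because it uses the symbols 0 to F.

```python
def to_digits(value: int, b: int) -> str:
    """Write a non-negative integer as a string of digits to the base 2^b (b <= 4)."""
    base = 2 ** b
    symbols = "0123456789ABCDEF"
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, digit = divmod(value, base)
        digits.append(symbols[digit])
    return "".join(reversed(digits))


if __name__ == "__main__":
    print(to_digits(12345678, 4))  # hexadecimal digits (b = 4)
    print(to_digits(12345678, 2))  # base-4 digits (b = 2)
```

Each digit covers b bits of the identifier, so a larger b means fewer digits and therefore fewer prefix-matching steps.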

Routing in Pastry
• Insert content:
– Determine FileID
– Insert data in the k nodes closest to FileID
• Routing in Pastry:
– In each routing step, the query is routed towards the “numerically” closest node
• That is, the query is routed to a node whose ID shares a prefix with the key that is one digit (= b bits) longer
• O(log_{2^b} N) routing steps
• If that is not possible: route towards a node that is numerically closer to the ID
• Required data structures in a Pastry node
– Pastry routing table
– Leaf-Set
– Neighborhood-Set
[Figure: example route with b = 2 to destination 0123, hops numbered 1 to 5: start at 3213, then via 0222, 0133, 0121 to 0123]

Pastry Routing Table - I
• Data structures of a Pastry node:
– Leaf Set L
• Set of numerically closest (known) nodes to the node's own ID
– “leaves of the routing tree”
– |L| is most often 2^b (|L|/2 IDs larger and |L|/2 IDs smaller than the node ID)
• Checked, and updated if necessary, whenever a new node is numerically closer than existing nodes in the leaf set
– Neighborhood Set M
• Set of topologically closest (known) nodes to the local node
• Metric used: in general, delay or number of network hops
• Not involved in routing itself

Leaf Set
• If the requested key is in the range of the Leaf-Set, answer the request with the closest nodeID
[Figure: Node-ID = 32101; smaller Node-IDs in the Leaf-Set: 32100, 32023, 32012, 32022; higher Node-IDs: 32110, 32121, 32123, 32120]
• Here: range = 32012 to 32123

Pastry Routing Table - II
• ⌈log_{2^b} N⌉ rows with 2^b − 1 entries each
– row i: holds the IDs of nodes whose ID shares an i-digit prefix with the node's own ID
– column j: digit(i+1) = j
– Each entry contains the topologically closest node that meets these criteria
• Example: b=2, N = 32, Node-ID = 32101 (row i: shared prefix length with Node-ID; column j: digit at position i+1)

i \ j   0       1       2       3
0       01234   14320   22222   ––
1       30331   31230   ––      33123
2       32012   ––      32212   323--
3       ––      32110   32121   3213-
4       32100   ––      32102   32103

• Each entry: topologically closest node with prefix length i and digit(i+1) = j
• E.g., row 1, column 3: possible node 33xyz; 33123 is the topologically closest node

Pastry Routing Table - III
• Example for Pastry node routing information
– Neighborhood Table M (2^b entries): 32022, 00100, 12300, 11001, 01213, 21021, 32123, 11213
– Routing table R (O(log N) entries)

i \ j   0       1       2       3
0       01234   14320   22222   ––
1       30331   31230   ––      33123
2       32012   ––      32212   323--
3       ––      32110   32121   3213-
4       32100   ––      32102   32103

– Leaf Set L (2^b entries)
• < Node-ID: 32100, 32023, 32012, 32022
• > Node-ID: 32110, 32121, 32123, 32120

Pastry Routing Algorithm
Routing of a packet with destination K at node N:
1. If K is in the range of the Leaf Set, route the packet directly to that node
2. If not, determine the common prefix (N, K)
3. Search for an entry T in the routing table with prefix (T, K) > prefix (N, K), and route the packet to T
4. If not possible, search for the node T with the longest prefix (T, K) out of the merged set of routing table, leaf set, and neighborhood set, and route to T
– As shown in [2.60], this case is rare
– Access to routing table is O(1), since row and column are known
– Entry might be empty if the corresponding node is unknown
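The four steps can be traced with the example tables of node 32101 from the slides. This is a simplified sketch, not the reference implementation: the function names are made up, the incomplete table entries "323--" and "3213-" are treated as empty, and step 4 follows the slide literally (longest shared prefix), whereas the Pastry paper additionally requires the chosen node to be numerically closer to the key than the current node.

```python
NODE_ID = "32101"

# Leaf set and neighborhood set of node 32101, copied from the slides.
LEAF_SET = ["32012", "32022", "32023", "32100", "32110", "32120", "32121", "32123"]
NEIGHBORHOOD_SET = ["32022", "00100", "12300", "11001", "01213", "21021", "32123", "11213"]

# Routing table: row = shared prefix length, column = next digit of the key.
# None marks an empty entry; the slide's incomplete entries "323--" and "3213-"
# are also stored as None (an assumption of this sketch).
ROUTING_TABLE = [
    ["01234", "14320", "22222", None],
    ["30331", "31230", None, "33123"],
    ["32012", None, "32212", None],
    [None, "32110", "32121", None],
    ["32100", None, "32102", "32103"],
]


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def next_hop(key: str) -> str:
    """Return the ID of the node the packet is handed to (possibly this node)."""
    # Step 1: key inside the leaf-set range -> numerically closest node.
    if min(LEAF_SET) <= key <= max(LEAF_SET):
        return min(LEAF_SET + [NODE_ID], key=lambda n: abs(int(n, 4) - int(key, 4)))
    # Step 2: length of the common prefix of N and K.
    p = shared_prefix_len(NODE_ID, key)
    # Step 3: O(1) table access, since row p and column key[p] are known.
    entry = ROUTING_TABLE[p][int(key[p])]
    if entry is not None:
        return entry
    # Step 4 (rare): longest shared prefix among all known nodes.
    known = [n for row in ROUTING_TABLE for n in row if n is not None]
    known += LEAF_SET + NEIGHBORHOOD_SET
    return max(known, key=lambda n: shared_prefix_len(n, key))


if __name__ == "__main__":
    for k in ["32102", "01200", "32200", "33122"]:
        print(k, "->", next_hop(k))
```

Fixed-length digit strings compare in the same order as the numbers they encode, which is why the leaf-set range check can use plain string comparison.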

Example
Node-ID = 32101
• Key = 32102: key is in range of the Leaf-Set
• Key = 01200: common prefix of 32101 and 01200 is empty; next hop from row 0, column 0 (0----)
• Key = 32200: common prefix of 32101 and 32200 is 32; next hop from row 2, column 2 (322--)
• Key = 33122: common prefix of 32101 and 33122 is 3; next hop from row 1, column 3 (33---)

i \ j   0       1       2       3
0       01234   14320   22222   ––
1       30331   31230   ––      33123
2       32012   ––      32212   323--
3       ––      32110   32121   3213-
4       32100   ––      32102   32103

Leaf-Set
• < Node-ID: 32100, 32023, 32012, 32022
• > Node-ID: 32110, 32121, 32123, 32120

Complexity of Pastry Routing
• Parameters
– Choice of b determines the trade-off between size of routing table and shortest path length
– Example: b = 4
• 1,000,000 nodes: 75 entries in routing table, ∅ number of hops = 5
• 1,000,000,000 nodes: 105 entries in routing table, ∅ number of hops = 7

Arrival of New Node - III
• Initialization of routing table:
– Iteration:
• row i ↔ prefix length = i
• Pastry routing: improve prefix with each routing iteration
– Idea: Use information of JOIN messages
• row 0 of A_0, since A_0 is close to X
• row 1 of A_1 (1st hop), since A_1 is the closest node with prefix (A, *) = 1
• generally: row i of the i-th hop
• then:
– Query the Neighbor table of all nodes in A’s new routing table
– Check for “closer” nodes with the matching prefix

Arrival of New Node - I
• Node X wants to join Pastry DHT
– Determine NodeID of X: 12333 (hash of IP address)
– Initialize tables at node X
– Send JOIN message to any Pastry node
[Figure: X = 12333 sends JOIN X to A_0; nodes on the route: A_0 = 23231, A_1 = 13231, A_2 = 12222, A_3 = 12311, A_4 = Z = 12332; the routing table of X (rows i = 0 to 4, columns j = 0 to 3) is still empty]

Arrival of New Node - II
• Node X wants to join Pastry DHT
– Node X copies Neighbor-Set from node A0
[Figure: Neighbor-Set copied to X: 32022, 12300, 01213, 3212, 00100, 11001, 21021, 1121]

Arrival of New Node - III
• Node X wants to join Pastry DHT
– Node A0 routes message to node Z
– Each node sends row in routing table to X
– Here A0: row 0 = 02231, 13231, -, 3233

Arrival of New Node - IV
– Here A1: row 1 = 10122, 11312, 12222, 1312

Arrival of New Node - V
– Here A2: row 2 = 12033, 12111, -, 1231

Arrival of New Node - VI
– Here A3: row 3 = 12301, -, 12320, 1233

Arrival of New Node - VII
– Here A4: row 4 = 12330, 12331, -, 1233

Arrival of New Node - VIII
• Node X wants to join Pastry DHT
– Node Z copies its Leaf-Set to Node X
[Figure: Leaf-Set copied from Z = A_4 to X: 12311, 12322, 12333, 13000, 12331, 12330, 13001, 13003]

Arrival of New Node - IX
• Node X wants to join Pastry DHT
– Node X sends its routing table to each neighbor
[Figure: routing table of X = 12333 after the JOIN; entries in column 3 are cut off in the source]

i \ j   0       1       2       3
0       02231   13231   -       3233
1       10122   11312   12222   1312
2       12033   12111   -       1231
3       12301   -       12320   1233
4       12330   12331   -       1233

Arrival of New Node - IV
• Efficiency of the initialization procedure
– Quality of routing table (b=4, |L| = 16, |M| = 32, 5k nodes) [2.60]
• SL: transfer only the i-th routing table row of A_i
• WT: transfer of the i-th routing table row of A_i as well as analysis of leaf and neighbor set
• WTF: same as WT, but also query the newly discovered nodes from WT and analyze their data

Failure of Pastry Nodes 

• Detection of failure 

– Periodic verification of nodes in Neighborhood Set 

• “Are you alive” also checks capability of neighbor 

– Route query fails 

• Routing in spite of failure possible 

– Route slightly worse – problems only with |L| consecutive nodes 

• Replacement of corrupted entries 

– Neighborhood-Set: 

• Choose alternative node from Leaf (L) ∪ Leaf (±|L|/2) 

– Entry R_x^y in routing table failed:
• Ask node R_x^i (i≠y) of same row for route to R_x^y
• If not successful, test entry R_{x+1}^i in next row

Using Pastry
• We have seen how Pastry networks are built, but how is the routing structure used?
– Create objectID, using file name and object owner
– Replicas are stored on the k Pastry nodes with nodeIDs numerically closest to the objectID
• By definition, the lookup is guaranteed to reach a node that stores the object as long as one of the k nodes is alive.

Performance Evaluation
Routing Performance [2.60]
– Number of Pastry hops (b=4, |L| = 16, |M| = 32, 2·10^5 queries)
– O(log N) for number of hops in the overlay
– Overhead of overlay (in comparison to the route between two nodes in the IP network)
– But: routing table has only O(log N) entries instead of O(N)

Summary Pastry
• Complexity:
– O(log N) hops to destination
• Often even better through Leaf- and Neighbor-Set: O(log_{2^b} N)
– O(log N) storage overhead per node
• Good support of locality
– Explicit search for nearby nodes (following some metric)
• Use in many applications
– PAST (file system), Squirrel (Web-Cache), …
– Simulator and many publications available (FreePastry)

Chord
Distributed Hash Table
– Routing complexity: O(log N)
– Node complexity: O(log N)
Principle:
– Keys of the DHT are unique M-bit identifiers
– Form a one-dimensional identifier circle: 0 ≤ k < 2^M
– Nodes and Key-Value tuples are in the same value range
• Node: N_x = Hash(IP address)
• Key-Value tuple: K_y = Hash(String)
– E.g. Hash(„Matrix.avi“) = 107
– K107 points to IP address of storage location: (K107, (51.12.3.2, 4711))
• Based on consistent hashing [2.59]
• Recommended hash function: SHA-1 (M = 160)
Each node is responsible for segments of the range 0...2^M − 1
– Key-Value pairs of a segment are managed in the successor node
– E.g., K22...K32 are managed in node N32
[Figure [2.70]: identifier circle 0 … 2^7 − 1 (example: M = 7) with nodes N21, N32, N60, N71, N99 and keys K41, K80, K107]
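The successor rule can be sketched for the example ring with M = 7. The node IDs are taken from the figure; the function name `successor` is an assumption of this sketch, and a real Chord node would not hold the global node list.

```python
import bisect

M = 7                          # identifier circle 0 .. 2^7 - 1
NODES = [21, 32, 60, 71, 99]   # sorted node IDs from the example ring


def successor(key: int) -> int:
    """Return the node that manages key: the first node ID >= key, wrapping at 2^M."""
    index = bisect.bisect_left(NODES, key % (2 ** M))
    return NODES[index % len(NODES)]


if __name__ == "__main__":
    for k in (27, 41, 80, 107):
        print(f"K{k} is managed by N{successor(k)}")
```

Keys beyond the largest node ID (for example K107) wrap around the ring and end up at the smallest node ID.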

Chord-Routing - I
Routing in Chord
1. Start search for K at any node N
2. If N manages key K, reply with tuple (K, V)
3. If not, send query clockwise to the ring neighbor until N ≥ K
Answer is guaranteed: if K exists, it will be found
Question: efficient forwarding
[Figure: example with M=7, identifier circle 0 … 2^7 − 1, nodes N21, N32, N60, N71, N99; N32 stores K23, K27, K32; N60 stores K33, K51; N99 stores K87, K99; each entry has the form (K_y → IP_y)]

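Simple Lookup
• Each node maintains a pointer to its successor
• Route packet(ID, data) to the node responsible for ID using successor pointers
[Figure: ring with nodes N4, N8, N15, N20, N32, N35, N44, N58; "Lookup 37" is forwarded along successor pointers and answered with "Node 44"]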
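A minimal sketch of this successor-only lookup, using the ring from the figure. The start node is not recoverable from the slide, so starting at N58 is an assumption, as are the helper names.

```python
NODES = [4, 8, 15, 20, 32, 35, 44, 58]  # ring from the figure
SUCC = {n: NODES[(i + 1) % len(NODES)] for i, n in enumerate(NODES)}


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the clockwise ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def simple_lookup(start: int, key: int):
    """Follow successor pointers until the responsible node is found.

    Returns (responsible node, number of forwarding hops)."""
    node, hops = start, 0
    while not in_half_open(key, node, SUCC[node]):
        node = SUCC[node]
        hops += 1
    return SUCC[node], hops


if __name__ == "__main__":
    print(simple_lookup(58, 37))
```

The hop count grows linearly with the number of nodes between the start node and the key, which is the O(N) behavior that finger tables later reduce.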

Joining Operation I 

• Each node A periodically sends stabilize() messages to 

its successor B 

• B replies with a notify() message containing the 

predecessor of B (node X) 

• If X is between A and B, A updates its successor to X

Joining Operation II
• Node 50 joins ring
• Node 50 knows at least one other node in ring
• E.g., N15
[Figure: ring with nodes N4, N8, N15, N20, N32, N35, N44, N58; N58: Succ=4, Pred=44; N44: Succ=58, Pred=35; new node N50: Succ=nil, Pred=nil]

Joining Operation III
• N50 sends join to N15
• The join is routed to key 50; N44 returns N58
• N50 updates its successor to N58
[Figure: N58: Succ=4, Pred=44; N50: Succ=58, Pred=nil; N44: Succ=58, Pred=35]

Joining Operation IV
• N50 sends stabilize() to N58
• N58 updates pred.
• N58 sends notify()
[Figure: N50 sends Stabilize() to N58 and receives Notify(50); N58: Succ=4, Pred=50; N50: Succ=58, Pred=nil; N44: Succ=58, Pred=35]

Joining Operation V
• Node 44 sends stabilize to successor N58
• Node 58 replies with notify(N50)
• Node 44 updates its successor to N50
[Figure: N44 sends Stabilize() to N58 and receives Notify(50); N44: Succ=50, Pred=35; N50: Succ=58, Pred=nil; N58: Succ=4, Pred=50]

Joining Operation VI
• Node 44 sends stabilize to new successor N50
• Node 50 sets its predecessor to N44
[Figure: N44 sends Stabilize() to N50; N50: Succ=58, Pred=44; N58: Succ=4, Pred=50; N44: Succ=50, Pred=35]
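The join of N50 can be replayed with a small in-process simulation. This sketch follows the message naming of the slides (A sends stabilize to its successor B, B replies with its predecessor); the class and method names are assumptions, and message passing is replaced by direct method calls.

```python
def between(x: int, a: int, b: int) -> bool:
    """True if x lies strictly inside the clockwise ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


class Node:
    def __init__(self, ident: int):
        self.id = ident
        self.succ = None
        self.pred = None

    def find_successor(self, key: int) -> "Node":
        """Walk the ring along successor pointers to the node managing key."""
        node = self
        while not (key == node.succ.id or between(key, node.id, node.succ.id)):
            node = node.succ
        return node.succ

    def join(self, known: "Node") -> None:
        """Join via any known node: only the successor is set at first."""
        self.succ = known.find_successor(self.id)

    def on_stabilize(self, sender: "Node") -> "Node":
        """Handle stabilize(): adopt a closer predecessor, reply with the predecessor."""
        if self.pred is None or between(sender.id, self.pred.id, self.id):
            self.pred = sender
        return self.pred

    def stabilize(self) -> None:
        """Ask the successor B for its predecessor X; adopt X if it lies between."""
        x = self.succ.on_stabilize(self)
        if between(x.id, self.id, self.succ.id):
            self.succ = x


def build_ring(ids):
    """Create a consistent ring with correct successor and predecessor pointers."""
    nodes = {i: Node(i) for i in ids}
    ordered = sorted(ids)
    for pos, i in enumerate(ordered):
        nodes[i].succ = nodes[ordered[(pos + 1) % len(ordered)]]
        nodes[i].pred = nodes[ordered[pos - 1]]
    return nodes


def demo():
    nodes = build_ring([4, 8, 15, 20, 32, 35, 44, 58])
    n50 = Node(50)
    n50.join(nodes[15])    # Joining Operation III: successor of N50 becomes N58
    n50.stabilize()        # IV: N58 updates its predecessor to N50
    nodes[44].stabilize()  # V: N44 learns about N50 and updates its successor
    nodes[44].stabilize()  # VI: N50 sets its predecessor to N44
    nodes[50] = n50
    return nodes


if __name__ == "__main__":
    ring = demo()
    for i in (44, 50, 58):
        print(f"N{i}: Succ={ring[i].succ.id}, Pred={ring[i].pred.id}")
```

After the three stabilization rounds, the pointers around N50 are consistent again without any node having been informed explicitly about the join.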

Improve Efficiency
• Simple Routing:
– Node must know clockwise successor
– Consecutive forwarding to next node: O(N)
• Improvement:
– Node knows its next R nodes (Neighbor Set)
– Complexity O(N/R)
[Figure: Neighbor-Set of N21 with R entries: N33 → IP_33, N38 → IP_38, N46 → IP_46]
• R := N
– Search complexity: O(1) ☺
– But: storage complexity is O(N), and problems with table updates

Exponential Search – Finger Tables
• Exponential strategy for pointers in hash range
– Node points with M = log N pointers into the value range
– Pointer i in node K points to the node that stores key K + 2^i, i.e., Successor(K + 2^i)
– Routing: search for node in Finger Table that is closest to K
– Complexity:
• O(log N) hops
• O(log N) pointers
• Complexity during arrival and node departure

Exponential Search – Finger Tables
• Exponential strategy for pointers in hash range
– Node points with M = log N pointers into the value range
– Pointer i in node K points to the node that stores key K + 2^i, i.e., Successor(K + 2^i)
– Example K=N32
• Finger 0 = 32 + 2^0 = 33, covers (32,33]
• Finger 1 = 32 + 2^1 = 34, covers (33,34]
• Finger 2 = 32 + 2^2 = 36, covers (34,36]
• Finger 3 = 32 + 2^3 = 40, covers (36,40]
• …
[Figure: ring with nodes N21, N32, N60, N71 and finger targets 33, 34, 36, 40, 48, 64, 96]

Finger-Table of N32
i   Key   Pointer
0   K33   N60
1   K34   N60
2   K36   N60
3   K40   N60
4   K48   N60
5   K64   N71


Example: Routing in Chord
• search for K18 (M=7), Search(K18) starts at N32
[Figure [2.70]: ring with nodes N5, N12, N15, N20, N32, N59, N79, N98, N109, N120]

Step 1, Finger-Table N32: Successor(N32+2^6) = Successor(K96) = N98
i   Key   Pointer
0   K33   N59
1   K34   N59
2   K36   N59
3   K40   N59
4   K48   N59
5   K64   N79
6   K96   N98

Step 2, Finger-Table N98: N98+2^5 → K2, pointer N5
i   Key    Pointer
0   K99    N109
1   K100   N109
2   K102   N109
3   K106   N109
4   K114   N120
5   K2     N5
6   K34    N59

Step 3, Finger-Table N5: N5+2^3 → K13, pointer N15
i   Key   Pointer
0   K6    N12
1   K7    N12
2   K9    N12
3   K13   N15
4   K21   N32
5   K37   N59
6   K69   N79

Step 4, Finger-Table N15: N15+2^1 → K17, pointer N20
i   Key   Pointer
0   K16   N20
1   K17   N20
2   K19   N20
3   K23   N32
4   K31   N32
5   K47   N59
6   K79   N79

Step 5: N20 is in the Neighbor Set of N15 and manages K18: (K18 → IP_18)

• Evolution of path length and network size (simulation)
– O(log N)
[Figure [2.70]: path length (Chord hops) over number of nodes N]
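The K18 search can be reproduced with a compact finger-table lookup. The sketch computes every finger table from a global node list, which a real Chord node cannot do; the function names are assumptions, and the forwarding rule used here is "the farthest finger that still precedes the key".

```python
M = 7
RING = 2 ** M
NODES = sorted([5, 12, 15, 20, 32, 59, 79, 98, 109, 120])  # example ring


def successor(key: int) -> int:
    """First node ID >= key on the ring, wrapping around at 2^M."""
    key %= RING
    for node in NODES:
        if node >= key:
            return node
    return NODES[0]


def finger_table(node: int):
    """Finger i points to Successor(node + 2^i), for i = 0 .. M-1."""
    return [successor(node + 2 ** i) for i in range(M)]


def in_open(x: int, a: int, b: int) -> bool:
    """True if x lies strictly inside the clockwise interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


def lookup(start: int, key: int):
    """Return the list of nodes visited, ending at the node that manages key."""
    path, node = [start], start
    while True:
        fingers = finger_table(node)
        succ = fingers[0]
        if key == succ or in_open(key, node, succ):
            path.append(succ)  # the direct successor manages the key
            return path
        # Forward to the farthest finger that still precedes the key.
        nxt = next((f for f in reversed(fingers) if in_open(f, node, key)), succ)
        path.append(nxt)
        node = nxt


if __name__ == "__main__":
    print(finger_table(32))
    print(lookup(32, 18))
```

Each forwarding step at least halves the remaining clockwise distance to the key, which is where the O(log N) hop count comes from.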

Arrival of new node in Chord
• Objectives for insert/leave of nodes
– Maintenance of correct search queries
• Each node must know its successor
– Secondary objective for efficiency of queries
• Finger table should be correct
• Insert of node N
– Start node N_1 searches successor N_S and predecessor N_P of N
– Building of Finger- and Neighbor Table of N: O(log² N)
– Update of Finger Tables that point to N or N_S, respectively: O(log² N)
– Transfer of Key-Value pairs N_P < K ≤ N: O(log N)
– If nodes are doubly linked, search for predecessor

Example: Insert of new Node
Insert of N38:
• Building of finger table at node N38
• Update of finger tables with link to N38, N32
• Transfer of keys in segment of N38 (K33 – K38)
[Figure: ring with nodes N21, N32, N38, N60, N71, N99 and the finger targets 33, 34, 36, 48, 64, 96 of N32]

Finger table of N32 before the insert
i   dest   pointer
0   K33    N60
1   K34    N60
2   K36    N60
3   K40    N60
4   K48    N60
5   K64    N71
6   K96    N99

Finger table of the new node N38
i   dest   pointer
0   K39    N60
1   K40    N60
2   K42    N60
3   K46    N60
4   K54    N60
5   K70    N71
6   K102   N21

Finger table of N32 after the insert
i   dest   pointer
0   K33    N38
1   K34    N38
2   K36    N38
3   K40    N60
4   K48    N60
5   K64    N71
6   K96    N99

Problems of Node Insert 

• Problems of node insert 

– Case 1: Tables of all participating nodes are correct 

• No problem – Search complexity O(log N) 

– Case 2: Successor table correct, but Finger table inconsistent 

• Query still correct – but increased complexity 

– Case 3: Successor table not correct

• Possibility of wrong query result, i.e., existing key will not be found 

• Periodic stabilization prevents inconsistencies 

– Periodic verification of successor's predecessor 

– Periodic update of finger tables

Chord - Summary 

• Complexity 

– Search complexity: O(log N) 

– Storage complexity : O(log N) 

– Maintenance complexity O(log² N) 

• Advantages
– Theoretical validation of complexity
– In case of failure, still logarithmic complexity
• Disadvantages
– No explicit search for nearby nodes (no proximity)
– No built-in security
– Unsolved problem of network partitioning

CAN – Content Addressable Network
• DHT with complexity of O((D/4) · N^(1/D))
• Hash value corresponds to point in D-dimensional space
– E.g., H(„Matrix.avi“) → (0.7, 0.2)
– DHT handles (key, value)
• Each overlay node manages a partition of the space
– Node 4 manages all hash values in the range: x ∈ [0.5, 1], y ∈ [0, 0.25]
– Adjoining areas are called neighbors
• 6, 2, 4, 3 neighbor of 5 (not 1!)
• „wrap around“ at DHT borders
• Expected number of neighbors: O(2·D)
[Figure: two-dimensional coordinate space partitioned among nodes 1 to 9]


CAN – Content Addressable Network
• DHT with complexity of O((D/4) · N^(1/D))
• Hash value corresponds to point in D-dimensional space
– E.g., H(„Matrix.avi“) → (0.7, 0.2)
– DHT handles (key, value)
• Each overlay node manages a partition of the space
– Node 4 manages all hash values in the range: x ∈ [0.5, 1], y ∈ [0, 0.25]
– Adjoining areas are called neighbors
• 6, 2, 4 neighbor of 5 (not 1!)
• „wrap around“ at DHT borders
• Expected number of neighbors: O(2·D)
[Figure: two-dimensional coordinate space partitioned among nodes 1 to 9]

Routing in CAN
How to get from P8 to Z
– Route along the shortest path in D-dimensional space
– Specifically: the neighbor with the closest distance to the destination is the next hop
• Distance is constantly decreasing
• Complexity: O((D/4) · N^(1/D)) hops in DHT
Example:
– P8: Lookup(Z)
– Which neighbor is closest to Z, starting at P8?
• Route to P1
– Then go to P3
– Then go to Z
[Figure: coordinate space with nodes 1 to 9 and destination point Z; one panel shows the distance from the neighbors of P8 to the destination, the other the route choice using CAN routing, starting with Lookup(Z) at P8]
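Greedy CAN routing can be sketched on a simplified layout. The zones in the figure are irregular and their coordinates are not given, so this sketch assumes a hypothetical 4 x 4 grid of equally sized zones (N = 16, D = 2) with wrap-around; zones are named by their grid index, and distances are measured from zone centers.

```python
import math

SIDE = 4            # hypothetical 4 x 4 grid of equal zones (N = 16, D = 2)
WIDTH = 1.0 / SIDE  # edge length of one zone in the unit square


def owner(point):
    """Grid index of the zone that contains the point."""
    return tuple(int(c / WIDTH) % SIDE for c in point)


def center(zone):
    """Center point of a zone."""
    return tuple((i + 0.5) * WIDTH for i in zone)


def torus_distance(p, q):
    """Euclidean distance on the unit torus (wrap-around at the borders)."""
    return math.sqrt(sum(min(abs(a - b), 1.0 - abs(a - b)) ** 2 for a, b in zip(p, q)))


def neighbors(zone):
    """The 2*D adjoining zones, including wrap-around neighbors."""
    result = []
    for d in range(len(zone)):
        for step in (-1, 1):
            n = list(zone)
            n[d] = (n[d] + step) % SIDE
            result.append(tuple(n))
    return result


def route(start, point):
    """Forward greedily to the neighbor closest to the destination point."""
    path = [start]
    target = owner(point)
    while path[-1] != target:
        if len(path) > SIDE * SIDE:
            raise RuntimeError("no progress, point may lie on a zone border")
        path.append(min(neighbors(path[-1]),
                        key=lambda z: torus_distance(center(z), point)))
    return path


if __name__ == "__main__":
    # Destination point from the slide: H("Matrix.avi") -> (0.7, 0.2)
    print(route((0, 2), (0.7, 0.2)))
```

Each node only needs to know its 2·D neighbors, which is why the state per node does not depend on N, while the path length grows with N^(1/D).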

Building a CAN - I
Insert node X
• Select an arbitrary CAN node as entry to the DHT
• Choose an arbitrary point in the D-dimensional space
• Route JOIN message towards this point, i.e., to the node K that handles this point
• Divide the region of this node in two parts
– Dimension of division is chosen in a fixed cyclic order
– E.g., x, y, z, x, y, z, ... with D=3
• (key, value) pairs in the newly created region are transferred from node K to node X
• New node inherits neighbors from the former node
• Neighbors update neighborhood information
Complexity: O(D) – independent of N (!)
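The region split on a join can be sketched as follows. The `Zone` class, the half-open zone boundaries, and the `depth` counter used to pick the split dimension are assumptions of this sketch; neighbor updates and the key transfer are omitted.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    lo: tuple       # lower corner of the region
    hi: tuple       # upper corner of the region (exclusive)
    depth: int = 0  # number of splits so far; selects the next split dimension

    def contains(self, point) -> bool:
        return all(l <= x < h for l, x, h in zip(self.lo, point, self.hi))

    def split(self):
        """Halve the zone along the next dimension in cyclic order (x, y, x, y, ...)."""
        d = self.depth % len(self.lo)
        mid = (self.lo[d] + self.hi[d]) / 2
        lower = Zone(self.lo, self.hi[:d] + (mid,) + self.hi[d + 1:], self.depth + 1)
        upper = Zone(self.lo[:d] + (mid,) + self.lo[d + 1:], self.hi, self.depth + 1)
        return lower, upper


def join(zones, point):
    """A new node joins at point: the zone that handles the point is split in two."""
    for i, zone in enumerate(zones):
        if zone.contains(point):
            lower, upper = zone.split()
            return zones[:i] + [lower, upper] + zones[i + 1:]
    raise ValueError("point outside the coordinate space")


if __name__ == "__main__":
    zones = [Zone((0.0, 0.0), (1.0, 1.0))]  # the first node owns everything
    for p in [(0.7, 0.2), (0.7, 0.2)]:
        zones = join(zones, p)
    for z in zones:
        print(z)
```

Only the split zone and its direct neighbors are affected by a join, which matches the O(D) cost stated above.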

Building a CAN - II
• Example: Insert P2, ..., P7
[Figure: sequence of panels showing how the coordinate space of node 1 is split step by step as nodes 2 to 7 join]

Remove node from CAN
• Departure of node K from CAN
– Managed region and Key/Value pairs are transferred to a neighbor:
• Ideally: regions merge along their common border
• If not possible: smallest neighbor (neighbor with smallest number of managed keys) manages both regions (no fusion!)
– In case of controlled leave: controlled transfer
– In case of node failure: TAKEOVER procedure
• Triggered if no update messages from a node arrive at its neighbors
• According to region size, neighbors start timers (as with ARP)
• (Smallest) neighbor indicates TAKEOVER to other neighbors and absorbs region
– Network restructuring in the background

Performance Improvements with CAN
• CAN complexity:
– State information per node: O(D) (independent of N)!
– Routing: O(D · N^(1/D)) hops (in overlay!) (linear increase)!
• Problem: overlay hop != hop in IP network topology
• Goal: similar delays between two hops as in IP network
• Improvements:
– Adding more dimensions:
• Increases number of neighbors
– Multiple realities
• Multiple DHTs at the same time
• Point (x,y,z) is stored at many nodes

Multi Dimensions ↔ Parallel Coordinate Systems
• Multiple dimensions
– More neighbors
– More routing choices
– More status information O(2D)
• Multiple coordinate systems (r)
– r possibilities for routing
– Status information O(r·D)
• r-fold redundancy!
[Figure [2.80]: panels "Increasing dimensions", "Increasing realities", and "comparison"]
Conclusion: the more dimensions, the shorter the routing path in the overlay, but multiple coordinate systems increase redundancy

Additional CAN Enhancements 

• Better routing metrics 

– Measure network quality to neighbors 

– Choose neighbor with best increase 

• Overlapping regions 

– Adding redundancy 

– Faster routing due to smaller number of regions 

• Multiple hash functions 

– Redundancy 

• Equitable division in regions 

– Target region verifies before insert if another neighbor is larger or better suited for handling

• Consider topology during growth of CAN

Summary: Distributed Hash Tables
• Low storage and search complexity
– O(log N) with Pastry and Chord
• CAN tends to linear complexity for search
• Only O(D) storage complexity
– Pastry and Chord similar
• Pastry possibly better for locality
• Open issues:
– Security
• DoS attacks against single nodes are absorbed
• Problem: targeted attack on a node with faked/bogus information
• Some DHTs have problems with partitioning

Complexity of Hash Tables
• Comparison: Complexity DHTs

                        CAN                   Chord       Pastry     Tapestry
State per node          O(D)                  O(log N)    O(log N)   O(log N)
Path length (Routing)   O((D/4) · N^(1/D))    O(log N)    O(log N)   O(log N)
Node insert             O(D · N^(1/D))        O(log² N)   O(log N)   O(log N)
Node removal            –                     O(log² N)   –          –

Applications using DHTs 

• File Sharing: CFS, OceanStore, PAST 

• Web Cache: Squirrel 

• Censor-resistant stores: Eternity, FreeNet 

• Instant messaging (IM): Scribe 

• Name spaces: ChordDNS, INS 

• Indexing and search: Kademlia 

• Communication platform: i³ 

• distributed Backup: HiveNet 

• Web Archiving: Herodotus

References
• DHTs in general:
[2.50] R. Albert, A. Barabasi: "Statistical mechanics of complex networks", Reviews of Modern Physics, Vol. 74, 2002
[2.51] L.A. Adamic, R.M. Lukose, A.R. Puniyani, B.A. Huberman: "Search in Power-Law Networks", Physical Review E, Volume 64
[2.52] A. Rao, K. Lakshminarayanan, S. Surana, R. Karp, I. Stoica: "Load Balancing in Structured P2P Systems", 2nd International Workshop on Peer-to-Peer Systems (IPTPS '03), Berkeley, Feb. 2003.
[2.53] H. Balakrishnan, S. Shenker, M. Walfish: "Semantic-Free Referencing in Linked Distributed Systems", 2nd International Workshop on Peer-to-Peer Systems (IPTPS '03), Berkeley, CA, February 2003.
[2.54] T. Ottmann, P. Widmayer: "Algorithmen und Datenstrukturen", Spektrum Akademischer Verlag, 2nd edition, 2002
[2.55] H. Balakrishnan, M.F. Kaashoek, D. Karger, R. Morris, I. Stoica: "Looking up Data in P2P Systems", Communications of ACM, Vol. 43, No.2, Feb. 2003
[2.56] K. Hildrum, J. Kubiatowicz, S. Rao, B. Zhao: "Distributed Object Location in a Dynamic Network", Proc. of 14th ACM Symp. on Parallel Algorithms and Architectures (SPAA), August 2002
[2.57] C. Plaxton, R. Rajaraman, A. Richa: "Accessing nearby copies of replicated objects in a distributed environment", Proc. of ACM Symp. on Parallel Algorithms and Architectures, June 1997
[2.58] S. Ratnasamy, S. Shenker, I. Stoica: "Routing Algorithms for DHTs: Some Open Questions", 1st International Workshop on Peer-to-Peer Systems (IPTPS '02), Cambridge, February 2002
[2.59] D. Karger, E. Lehman, F. Leighton, et al.: "Consistent hashing and random trees", Proc. of 29th Annual ACM Symposium on Theory of Computing, El Paso, May 1997

More literature on Pastry at http://research.microsoft.com/~antr/Pastry/pubs.htm

References 

• Pastry: 

[2.60] A. Rowstron and P. Druschel, "Pastry: Scalable, distributed object location 

and routing for large-scale peer-to-peer systems". IFIP/ACM International 

Conference on Distributed Systems Platforms, Heidelberg, November, 2001 

[2.61] R. Mahajan, M. Castro and A. Rowstron, "Controlling the Cost of Reliability in 

Peer-to-peer Overlays", IPTPS'03, Berkeley, CA, February 2003. 

[2.62] M. Castro, P. Druschel, A-M. Kermarrec and A. Rowstron, "One ring to rule 

them all: Service discovery and binding in structured peer-to-peer overlay 

networks", SIGOPS European Workshop, France, September, 2002. 

[2.63] M. Castro, P. Druschel, Y. C. Hu and A. Rowstron, "Exploiting network 

proximity in peer-to-peer overlay networks", Proc. Of Intern. Workshop on Future 

Directions in Distributed Computing, Bertinoro, Italy, June, 2002. 

[2.64] M. Castro, P. Druschel, A. Ganesh, A. Rowstron, and D. S. Wallach: "Security 

for structured peer-to-peer overlay networks". Proc. of the 5th Symposium on 

Operating Systems Design and Implementation (OSDI'02), Boston, Dezember 

2002

References 

• Chord: 

[2.70] I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, H. Balakrishnan: "Chord: A Scalable Peerto-peer 

Lookup Service for Internet Applications", Proc. of ACM SIGCOMM'01, San Diego, 

September 2001. 

[2.71] F. Dabek, E. Brunskill, M.F. Kaashoek, D. Karger, R. Morris, I. Stoica, H. Balakrishnan: 

„Building Peer-to-Peer Systems With Chord, a Distributed Lookup Service“, Proc. of the 8th 

Workshop on Hot Topics in Operating Systems (HotOS-VIII), Mai 2001 

[2.72] D. Liben-Nowell, H. Balakrishnan, D. Karger: “Observations on the Dynamic Evolution of 

Peer-to-Peer Networks”, Proc. of the 1st Intern. Workshop on Peer-to-Peer Systems (IPTPS 

'02), Cambridge, März, 2002 

[2.73] E. Sit, R.T. Morris: “Security Considerations for Peer-to-Peer Distributed Hash Tables”, 

Proc. of the 1st Intern. Workshop on Peer-to-Peer Systems (IPTPS '02), Cambridge, März, 

2002 

[2.74] D. Liben-Nowell, H. Balakrishnan, D. Karger: “Analysis of the Evolution of Peer-to-Peer 

Systems“, Proc. of ACM Conf. on Principles of Distributed Computing (PODC), Monterey, 

Juli 2002 

[2.75] D.R. Karger, M. Ruhl: „Finding Nearest Neighbors in Growth-restricted Metrics“, ACM 

Symposium on Theory of Computing (STOC '02), Montréal, Mai 2002 

Mehr Literatur zum Chord-Projekt unter http://www.pdos.lcs.mit.edu/chord/

References 

• CAN: 

[2.80] S. Ratnasami, P. Francis, M. Handley, R. Karp, S. Shenker: „A Scalable 

Content-Addressable Network“, Proc. ACM SIGCOMM 2002, San Diego 

[2.81] S. Ratnasami: „A Scalable Content-Addressable Network”, Ph.D. Thesis, UC 

Berkeley, Oktober 2002 

[2.82] S. Ratnasami, M. Handley, R. Karp, S. Shenker: “Application-level Multicast 

using Content-Addressable Networks”, Proc. of NGC 2001. 

• Weitere DHT-Modelle: 

[2.90] B.Y. Zhao, J. Kubiatowicz, A. Joseph: „Tapestry: An Infrastructure for Faulttolerant 

Wide-Area Location and Routing“, Technical Report, UC Berkeley 

[2.91] B. Silaghi, B. Bhattacharjee, P. Keleher: „Query Routing in the TerraDir 

Distributed Directory“, Proc. of SPIE ITCOM’02 

[2.92] I. Clarke, O. Sandberg, B. Wiley, T. Hong: „Freenet: A distributed anonymous 

information storage and retrieval system“, Proc. of ICSI Workshop on Design 

Issues in Anonymity and Unobservability, Berkeley, Juni, 2000
