
S-FTL: An Efficient Address Translation for Flash Memory by Exploiting Spatial Locality

Song Jiang, Lei Zhang, Xinhao Yuan, Hao Hu, and Yu Chen

Cluster and Internet Computing Laboratory
Department of Electrical and Computer Engineering
Wayne State University

Department of Computer Science and Technology
Tsinghua University


Making Solid-State Disk Cost Competitive

• The solid-state disk (SSD) is becoming increasingly popular.
  – Great performance advantage over HDD.
  – Uncompetitive price, as HDD prices keep dropping rapidly.
  – Has to adopt low-end configurations to reduce cost (MLC, small DRAM cache).
• The built-in cache is performance critical.
  – Much faster than the flash, especially for writes.
  – Used as a buffer for data pages (well studied).
  – Used as a buffer for the mapping table of address translation.
• The challenge: how to maintain efficient SSD operations even with a small buffer cache?


Flash Address Translation

[Figure: the Flash Translation Layer (FTL) applies the mapping rule to convert a Logical Page Number (LPN) into a Physical Page Number (PPN); the mapping table resides in the cache, while the data pages reside in the flash.]

What if the table is too large to be fully held in the cache?
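The page-level mapping just pictured reduces to a plain lookup table. A minimal sketch (an illustration, not the authors' code), with an out-of-place write modeled as a remap:

```python
# Minimal sketch of a page-level mapping table: one entry per logical page.
# A flash write goes to a fresh physical page, so the table entry is
# updated rather than the data being overwritten in place.

mapping_table = {}        # LPN -> PPN, held in the cache


def translate(lpn):
    """Return the PPN currently holding this LPN's data (None if unwritten)."""
    return mapping_table.get(lpn)


def write_page(lpn, new_ppn):
    """Out-of-place write: the LPN now points at the freshly written PPN."""
    mapping_table[lpn] = new_ppn


write_page(10, 201)
assert translate(10) == 201
write_page(10, 202)       # rewriting the page just remaps it
assert translate(10) == 202
```

The table needs one entry per page, which is exactly why it may not fit in a small cache.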


Method 1: Move the Table to the Flash (Page-level FTL)

[Figure: the LPN-to-PPN mapping table itself is stored in the flash; a lookup that misses the in-cache portion of the table costs an extra flash read, and updating the table costs flash writebacks.]


Method 2: Make the Table Smaller (Block-level FTL)

The LPN is split into a Logical Block Number (LBN) and an in-block offset. The block-level table maps the LBN to a Physical Block Number (PBN), and the PBN combined with the unchanged in-block offset yields the Physical Page Number (PPN).

For the pages in a block:
(1) the first LPN must be a multiple of the block size; and
(2) all pages' LPNs must be contiguous.

[Figure: a page-level table with one entry per page (LPN 0 … n−1 → PPN 0 … PPN n−1) is replaced by a much smaller block-level table with one entry per block (LBN 0 … k−1 → PBN 0 … PBN k−1).]
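The block-level translation above is simple arithmetic. A sketch (the table contents are made up for illustration) showing how the in-block offset passes through unchanged:

```python
# Block-level FTL sketch: one table entry per block instead of per page.
PAGES_PER_BLOCK = 64                  # e.g. a 128KB block of 2KB pages

block_table = {0: 7, 1: 12}           # LBN -> PBN (hypothetical contents)


def translate(lpn):
    # Split the LPN into a logical block number and an in-block offset,
    # then reuse the same offset inside the mapped physical block.
    lbn, offset = divmod(lpn, PAGES_PER_BLOCK)
    return block_table[lbn] * PAGES_PER_BLOCK + offset


# LPN 65 is page 1 of logical block 1, which lives in physical block 12.
assert translate(65) == 12 * PAGES_PER_BLOCK + 1
```

The table shrinks by a factor of the block size, but only because the two layout rules above are enforced.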


Maintaining a Block-based Page Layout is Expensive!

• The flash requires "erase before write", or has to write "out of place" to already-erased space.
  – The erase unit is the block, which usually contains 32-128 pages.
  – The erase operation is expensive (2 ms, compared with 0.1 ms for a read and 0.4 ms for a write).
• For the block-level FTL, one page write can lead to tens of reads/writes.

[Figure: a write to the page with LPN 3 forces the still-valid pages LPN 0-2 to be copied, together with the new LPN 3, into an erased physical block; the old copies are invalidated and their block must be erased.]
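Using the timings on this slide, the cost of the pictured update can be estimated. This is back-of-envelope arithmetic, not simulator output:

```python
# Cost of one page update under a strict block-level layout: the untouched
# valid pages are copied into an erased block, the new page is written,
# and the old block must eventually be erased.
T_READ, T_WRITE, T_ERASE = 0.1, 0.4, 2.0   # ms, from the slide


def merge_cost(copied_pages):
    # copy each untouched page (read + write), write the updated page,
    # and erase the old block
    return copied_pages * (T_READ + T_WRITE) + T_WRITE + T_ERASE


# The 4-page example in the figure: pages 0-2 copied, page 3 rewritten.
assert abs(merge_cost(3) - 3.9) < 1e-9   # 3.9 ms vs. 0.4 ms for a plain write
```

For a realistic 64-page block the copying alone dominates, which is why one page write "can lead to tens of reads/writes".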


Hybrid FTL Attempts to Address the Issue

• Set aside a small number of log blocks to hold newly written pages, managed by the page-level mapping.
• The majority of pages stay in the data blocks managed by the block-level mapping.
• But garbage collection and block merging can still be expensive!
  – because the rule for the page layout in a block is so strict!

[Figure: a log block holding valid pages LPN 0-3 in order can simply be switched into a data block (switch merge); otherwise its pages must be merged with the data block holding the old copies of LPN 0-3 (merge).]


Only a Block-based Page Layout Can Help

[Figure: the log block holds valid pages LPN 1-4; in the data blocks, LPN 0 and LPN 5-7 are still valid while the copies of LPN 1-4 are invalid. Because the layout rule must hold, the valid pages have to be gathered into new data blocks laid out as LPN 0-3 and LPN 4-7.]


Only a Block-based Page Layout Can Help (cont'd)

[Figure: the log block now holds valid pages LPN 0, 1, 2, and 4; in the data blocks, LPN 3 and LPN 5-7 are still valid. Again the valid pages must be gathered into new data blocks laid out as LPN 0-3 and LPN 4-7.]


S-FTL: Spatial-locality-aware FTL

• S-FTL does not impose mapping regularity to reduce the mapping table.
  – It uses page-level mapping.
  – The table is stored in the flash and cached in memory.
• S-FTL reduces the cached table by exploiting readily available locality in the access streams.
  – Any access sequentiality exhibited in the request stream can be used to shrink the table.
  – The more sequential the stream, the greater the reduction.


Example: Translating LPN = 10

• TransLPN is 1 (10 ÷ 8, assuming 8 mapping entries per translation page), and the in-page offset is 2 (10 mod 8).
• The Global Translation Directory (in the cache) maps TransLPN 1 to TransPPN 50.
• The translation page at TransPPN 50 holds data PPNs 199, 200, 201, 202, 203, 204, 205, 206; the entry at offset 2 gives PPN = 201.

[Figure: translation pages reside in the flash (TransLPN 0 at TransPPN 10, TransLPN 1 at TransPPN 50, …); the cache holds the Global Translation Directory, with a flag marking whether each translation page is cached (TransPPN 10 ✓, TransPPN 50 ✗), plus the cached translation pages themselves (TransLPN 0, 6, 8, 10).]
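The two-step lookup walked through above can be sketched as follows. The toy numbers mirror the slide (8 entries per translation page), and the flash-read stub stands in for an actual device access:

```python
ENTRIES_PER_TRANS_PAGE = 8     # toy value from the slide; real pages hold far more

# Global Translation Directory: where each translation page lives in flash.
gtd = {0: 10, 1: 50}           # TransLPN -> TransPPN

# Translation pages currently cached in RAM.
cached_trans_pages = {
    1: [199, 200, 201, 202, 203, 204, 205, 206],   # TransLPN 1, from the slide
}


def read_trans_page_from_flash(trans_ppn):
    raise NotImplementedError("stub: would issue a flash page read")


def translate(lpn):
    trans_lpn, offset = divmod(lpn, ENTRIES_PER_TRANS_PAGE)
    if trans_lpn not in cached_trans_pages:
        # Cache miss: fetch the translation page from its in-flash location.
        cached_trans_pages[trans_lpn] = read_trans_page_from_flash(gtd[trans_lpn])
    return cached_trans_pages[trans_lpn][offset]


# LPN 10 -> TransLPN 1 (10 // 8), offset 2 (10 % 8) -> PPN 201
assert translate(10) == 201
```

A miss costs a whole flash read for one translation, which is what the compression on the next slide attacks.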


Make the Translation Page Smaller

Translation page example:

  LPN:    8  9  10  11  12  13  14  15
  PPN:    1  2   3   4  10  11  12  14
  bitmap: 1  0   0   0   1   0   0   1

• If we know some PPNs are contiguous, all of them can be computed from the first mapping entry (the head entry).
  – An example: the PPN of LPN 11 is 4 [ 1 + (11 − 8) ].
  – Therefore, only the head entry needs to be stored.
• Use a bitmap to record the contiguousness of the PPNs.
• How much space can be saved?
  – Assume 512 entries per page, 4B entries, and fully contiguous PPNs in the page → reduced by 96.6%.
  – Assume an average contiguous-sequence length of 2 → reduced by nearly 50%.
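A lookup against the compressed form can be sketched as below, using the slide's example page. Only the head-entry PPNs are stored; the bitmap recovers everything else:

```python
base_lpn = 8
bitmap = [1, 0, 0, 0, 1, 0, 0, 1]   # 1 marks a stored head entry
heads = [1, 10, 14]                  # the head-entry PPNs, in order


def lookup(lpn):
    idx = lpn - base_lpn
    # nearest head entry at or to the left of this position
    head_idx = max(i for i in range(idx + 1) if bitmap[i])
    # which stored head entry that is (count the 1s up to it)
    head_rank = sum(bitmap[:head_idx + 1]) - 1
    # PPNs are contiguous after a head, so add the distance from the head
    return heads[head_rank] + (idx - head_idx)


assert lookup(11) == 4     # 1 + (11 - 8), the slide's example
assert lookup(13) == 11    # head entry 10 (at LPN 12) plus offset 1
```

With 512 entries per page and full contiguity, one 4-byte head entry plus a 64-byte bitmap replaces 2KB of entries, which matches the 96.6% figure.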


Use a Multi-bit Bitmap to Retain Long Sequences

With a one-bit bitmap, overwrites in the middle of a long sequence split it into several short ones, each needing its own head entry:

  LPN:    8  9  10  11  12  13  14  15
  PPN:    1  2   3  32  33   6   7   8
  bitmap: 1  0   0   1   0   1   0   0     (three head entries stored: 1, 32, 6)

With a two-bit bitmap, the entries after the overwritten ones can still refer back to the original head:

  LPN:    8   9   10  11  12  13  14  15
  PPN:    1   2    3  32  33   6   7   8
  bitmap: 10  00  00  11  01  00  00  00   (only two head entries stored: 1, 32)

• The first bit: is this entry a head entry?
• The second bit: which head entry is this entry's head (the page's primary head, or the most recent secondary head)?
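One way to decode the two-bit form is sketched below (my reading of the figure's encoding, not code from the paper): the first bit marks a head entry, and the second bit says whether an entry belongs to the page's primary sequence or to the most recent secondary head:

```python
base_lpn = 8
twobit = ["10", "00", "00", "11", "01", "00", "00", "00"]   # from the figure
heads = [1, 32]          # only two head-entry PPNs need to be stored


def lookup(lpn):
    idx = lpn - base_lpn
    if twobit[idx][0] == "1":       # the entry is itself a head
        head_idx = idx
    else:
        kind = twobit[idx][1]       # "0": primary head, "1": secondary head
        head_idx = max(i for i in range(idx)
                       if twobit[i][0] == "1" and twobit[i][1] == kind)
    # which stored head that is (count head bits up to and including it)
    head_rank = sum(1 for i in range(head_idx + 1) if twobit[i][0] == "1") - 1
    return heads[head_rank] + (idx - head_idx)


assert lookup(11) == 32   # the secondary head itself
assert lookup(12) == 33   # continues the secondary head
assert lookup(13) == 6    # reaches back to the primary head: 1 + (13 - 8)
```

The payoff is that LPNs 13-15 still resolve through the original head entry 1, so the overwrite of LPNs 11-12 costs only one extra stored entry instead of breaking the sequence in two.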


Efficient Caching and Batched Writeback

• Exploit sequential access for efficient caching.
  – Two convertible representations of a translation page: the in-flash form and the bitmap form.
  – The size of a translation page in the bitmap form changes.
  – Use the translation page as the caching unit.
  – The more space-efficient form is used for caching.
• Use batched writeback to reduce overhead.
  – Replaced dirty pages need to be written back.
  – If there are only a few dirty entries in a page, keep them in the cache for a longer time.
  – Batch the dirty entries for cost-effective writeback.
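A minimal sketch of the batching idea (the LRU structure and clean-first eviction preference are my assumptions, not the paper's exact policy):

```python
from collections import OrderedDict


class TransPageCache:
    """LRU cache of translation pages that prefers evicting clean pages,
    so pages with few dirty entries stay longer and their dirty entries
    can be written back together in one flash write."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()           # TransLPN -> set of dirty offsets
        self.flushed = []                    # (TransLPN, dirty offsets) log

    def touch(self, trans_lpn, dirty_offset=None):
        dirty = self.pages.pop(trans_lpn, set())
        if dirty_offset is not None:
            dirty.add(dirty_offset)
        self.pages[trans_lpn] = dirty        # move to the MRU end
        if len(self.pages) > self.capacity:
            self._evict()

    def _evict(self):
        # Evict the least recently used *clean* page if one exists; a clean
        # page can simply be dropped, costing no flash write.
        for lpn, dirty in self.pages.items():
            if not dirty:
                del self.pages[lpn]
                return
        # Otherwise write back the LRU dirty page, batching all of its
        # accumulated dirty entries into a single writeback.
        lpn, dirty = self.pages.popitem(last=False)
        self.flushed.append((lpn, dirty))    # one flash write, many updates


cache = TransPageCache(2)
cache.touch(0)                    # clean
cache.touch(1, dirty_offset=3)    # dirty
cache.touch(2)                    # over capacity: clean page 0 goes first
assert 0 not in cache.pages and 1 in cache.pages
```

By delaying the eviction of lightly dirty pages, each eventual writeback amortizes its fixed flash-write cost over more updated entries.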


Experiment Setup

• The SSD to be simulated:
  – 32GB large-block NAND SSD.
  – 2KB pages, 128KB blocks, and a 64KB cache by default.
  – 0.12 ms page read, 0.41 ms page write, and 2.0 ms block erasure.
  – Using the enhanced FlashSim simulator.
• The FTLs to be compared:
  – DFTL: uses page-level mapping and caches recently used mapping entries. [Kim, et al., ASPLOS'09]
  – FAST: uses hybrid mapping. [Sang, et al., ACM TECS 2007]
  – Optimal FTL: uses page-level mapping with an infinitely large cache.
• The traces


Hit Ratios


Response Times


Standard Deviation of Response Time


Distribution of System Response Time (Financial)

S-FTL vs. DFTL:
– Translation-page accesses reduced by 93%
– Translation-block erases reduced by 28%


Distribution of System Response Time (WebSearch)

– Hit ratio for DFTL is 93%
– Hit ratio for S-FTL is 31%


Impact of Cache Size on Hit Ratio


Impact of Cache Size on Response Time


Conclusions

• S-FTL reduces the space demand for caching the mapping table.
• S-FTL exploits only readily available spatial locality.
• S-FTL does not impose a strict mapping rule and does not introduce additional garbage-collection cost.
• Experiments demonstrate that it can consistently improve response times for workloads of diverse access behaviors.
