
Although we have encouraged our users to create larger files, this has not happened, since it would require changes in application behavior. We have added quotas to manage usage and have provided an archive tool; however, these measures do not fundamentally address the scalability problem.

Our near-term solution to scalability is to allow multiple namespaces (and NameNodes) to share the physical storage within a cluster. We are extending our block IDs to be prefixed with block pool identifiers. Block pools are analogous to LUNs in a SAN storage system, and a namespace with its pool of blocks is analogous to a file system volume.
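To make the structure concrete, here is a minimal sketch of a block ID qualified by its block pool identifier. The class and field names are illustrative assumptions, not the paper's or HDFS's actual implementation:

    // Sketch of a block ID prefixed by a block pool identifier.
    // Names are illustrative only; HDFS's real classes differ.
    public final class ExtendedBlockId {
        private final String blockPoolId; // identifies a namespace's pool, e.g. "BP-1"
        private final long blockId;       // unique only within its block pool

        public ExtendedBlockId(String blockPoolId, long blockId) {
            this.blockPoolId = blockPoolId;
            this.blockId = blockId;
        }

        // Two blocks are equal only if both pool and ID match, so
        // independent namespaces may reuse the same numeric block IDs.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ExtendedBlockId)) return false;
            ExtendedBlockId other = (ExtendedBlockId) o;
            return blockPoolId.equals(other.blockPoolId) && blockId == other.blockId;
        }

        @Override
        public int hashCode() {
            return 31 * blockPoolId.hashCode() + Long.hashCode(blockId);
        }

        @Override
        public String toString() {
            return blockPoolId + ":" + blockId;
        }
    }

Qualifying every block by its pool is what would let independent NameNodes hand out numerically identical block IDs without collisions on the shared DataNodes.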

This approach is fairly simple and requires minimal changes to the system. It offers a number of advantages besides scalability: it isolates the namespaces of different sets of applications and improves the overall availability of the cluster. It also generalizes the block storage abstraction, allowing other services to use the block storage service, perhaps with a different namespace structure. In the future we plan to explore other approaches to scaling, such as keeping only a partial namespace in memory and implementing a truly distributed NameNode. In particular, our assumption that applications would create a small number of large files was flawed. As noted earlier, changing application behavior is hard. Furthermore, we are seeing new classes of HDFS applications that need to store a large number of smaller files.
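To see why many small files strain a single NameNode, consider a back-of-the-envelope estimate (the per-object figure is a commonly cited Hadoop rule of thumb, not a number from this paper): if each file, directory, and block costs on the order of 150 bytes of NameNode heap, then 100 million single-block files represent roughly 200 million namespace objects, or about 200,000,000 × 150 B ≈ 30 GB of heap, regardless of how few bytes those files actually hold.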

The main drawback of multiple independent namespaces is the cost of managing them, especially if the number of namespaces is large. We are also planning to use application- or job-centric namespaces rather than cluster-centric namespaces; this is analogous to the per-process namespaces used to deal with remote execution in distributed systems in the late 80s and early 90s [10][11][12].

Currently our clusters are smaller than 4000 nodes. We believe we can scale to much larger clusters with the solutions outlined above. However, we believe it is prudent to run multiple clusters rather than a single large one (say, three 6000-node clusters rather than a single 18,000-node cluster), as this allows much better availability and isolation. To that end we are planning to provide greater cooperation between clusters, for example caching remotely accessed files, or reducing the replication factor of blocks when file sets are replicated across clusters.
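As a sketch of the reduced-replication idea, assuming Hadoop's public FileSystem client API (setReplication is a real client call; the cross-cluster catalog lookup is a hypothetical placeholder), a maintenance job could lower the local replication factor of files known to have copies in another cluster:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Minimal sketch: if a file set already has replicas in a remote
    // cluster, fewer local replicas may suffice for availability.
    public class CrossClusterReplication {
        private static final short LOCAL_ONLY_REPLICATION = 3;
        private static final short CROSS_CLUSTER_REPLICATION = 2;

        public static void adjustReplication(FileSystem fs, Path file)
                throws IOException {
            short target = hasRemoteCopy(file)
                    ? CROSS_CLUSTER_REPLICATION   // remote copies add redundancy
                    : LOCAL_ONLY_REPLICATION;     // rely on local replicas alone
            // setReplication asks the NameNode to schedule replication
            // of missing replicas or deletion of excess ones.
            fs.setReplication(file, target);
        }

        // Hypothetical placeholder for whatever cross-cluster catalog
        // such a scheme would consult; always false in this sketch.
        private static boolean hasRemoteCopy(Path file) {
            return false;
        }

        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            adjustReplication(fs, new Path(args[0]));
        }
    }

The trade-off is the usual one: remote copies preserve durability, while fewer local replicas reduce storage cost at some expense in local read bandwidth and tolerance of local failures.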

VI. ACKNOWLEDGMENT

We would like to thank all members of the HDFS team at Yahoo!, present and past, for their hard work building the file system. We would like to thank all Hadoop committers and collaborators for their valuable contributions. Corinne Chandel drew the illustrations for this paper.


REFERENCES

[1] Apache Hadoop. http://hadoop.apache.org/
[2] P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A parallel file system for Linux clusters," in Proc. of the 4th Annual Linux Showcase and Conference, 2000, pp. 317–327.
[3] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," in Proc. of the 6th Symposium on Operating Systems Design and Implementation, San Francisco, CA, Dec. 2004.
[4] A. Gates, O. Natkovich, S. Chopra, P. Kamath, S. Narayanam, C. Olston, B. Reed, S. Srinivasan, and U. Srivastava, "Building a High-Level Dataflow System on top of MapReduce: The Pig Experience," in Proc. of Very Large Data Bases, vol. 2, no. 2, 2009, pp. 1414–1425.
[5] S. Ghemawat, H. Gobioff, and S. Leung, "The Google file system," in Proc. of the ACM Symposium on Operating Systems Principles, Lake George, NY, Oct. 2003, pp. 29–43.
[6] F. P. Junqueira and B. C. Reed, "The life and times of a zookeeper," in Proc. of the 28th ACM Symposium on Principles of Distributed Computing, Calgary, AB, Canada, August 10–12, 2009.
[7] Lustre File System. http://www.lustre.org
[8] M. K. McKusick and S. Quinlan, "GFS: Evolution on Fast-forward," ACM Queue, vol. 7, no. 7, New York, NY, August 2009.
[9] O. O'Malley and A. C. Murthy, "Hadoop Sorts a Petabyte in 16.25 Hours and a Terabyte in 62 Seconds," May 2009. http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html
[10] R. Pike, D. Presotto, K. Thompson, H. Trickey, and P. Winterbottom, "The Use of Name Spaces in Plan 9," Operating Systems Review, vol. 27, no. 2, April 1993, pp. 72–76.
[11] S. Radia, "Naming Policies in the Spring System," in Proc. of the 1st IEEE Workshop on Services in Distributed and Networked Environments, June 1994, pp. 164–171.
[12] S. Radia and J. Pachl, "The Per-Process View of Naming and Remote Execution," IEEE Parallel and Distributed Technology, vol. 1, no. 3, August 1993, pp. 71–80.
[13] K. V. Shvachko, "HDFS Scalability: The limits to growth," ;login:, April 2010, pp. 6–16.
[14] W. Tantisiriroj, S. Patil, and G. Gibson, "Data-intensive file systems for Internet services: A rose by any other name ...," Technical Report CMU-PDL-08-114, Parallel Data Laboratory, Carnegie Mellon University, Pittsburgh, PA, October 2008.
[15] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy, "Hive – A Warehousing Solution Over a Map-Reduce Framework," in Proc. of Very Large Data Bases, vol. 2, no. 2, August 2009, pp. 1626–1629.
[16] J. Venner, Pro Hadoop. Apress, June 22, 2009.
[17] S. Weil, S. Brandt, E. Miller, D. Long, and C. Maltzahn, "Ceph: A Scalable, High-Performance Distributed File System," in Proc. of the 7th Symposium on Operating Systems Design and Implementation, Seattle, WA, November 2006.
[18] B. Welch, M. Unangst, Z. Abbasi, G. Gibson, B. Mueller, J. Small, J. Zelenka, and B. Zhou, "Scalable Performance of the Panasas Parallel File System," in Proc. of the 6th USENIX Conference on File and Storage Technologies, San Jose, CA, February 2008.
[19] T. White, Hadoop: The Definitive Guide. O'Reilly Media, Yahoo! Press, June 5, 2009.
