<strong>THE</strong> <strong>EXPAND</strong> <strong>PARALLEL</strong> <strong>FILE</strong><br />
<strong>SYSTEM</strong><br />
A <strong>FILE</strong> <strong>SYSTEM</strong> FOR CLUSTER AND<br />
GRID COMPUTING<br />
José Daniel García Sánchez<br />
ARCOS Group – University Carlos III of Madrid
Contents<br />
The ARCOS Group.<br />
Expand motivation.<br />
Expand design.<br />
Expand evaluation.<br />
Conclusions.<br />
Ongoing Work.<br />
The Expand Parallel File System<br />
José Daniel García Sánchez – ARCOS Group – University Carlos III of Madrid<br />
July 2007 - University of Modena
University Carlos III of Madrid<br />
Founded in 1989.<br />
Three faculties:<br />
Faculty of Social Sciences and Law.<br />
Faculty of Humanities, Documentation and Communication.<br />
Higher Technical School.<br />
The ARCOS Group<br />
The Computer Architecture, Communications and Systems Group is part of the Department of Computer Science.<br />
20 full-time members:<br />
9 PhDs (2 full professors + 4 associate professors + 3 visiting professors).<br />
11 PhD students.<br />
Research lines<br />
Data management on Grid environments.<br />
Parallel file systems.<br />
Optimization of irregular applications.<br />
OS for Wireless Sensor Networks.<br />
Real-time systems.<br />
Some products<br />
Expand: a parallel file system for cluster and grid environments.<br />
WinPFS: Windows Parallel File System.<br />
MiMPI: an MPI implementation for heterogeneous cluster environments.<br />
Contents<br />
The ARCOS Group.<br />
Expand motivation.<br />
Expand design.<br />
Expand evaluation.<br />
Conclusions.<br />
Ongoing Work.<br />
Trends in the supercomputing environment<br />
[Chart: number of clusters in the top500.org list, June 1993 – October 2006.]<br />
75% of supercomputers in the top500 are clusters.<br />
Trends in the supercomputing environment<br />
The number of transistors per chip is still doubling every 1.5 years.<br />
That does not mean doubling frequency or performance.<br />
More space means more cores per chip.<br />
Trends in the supercomputing environment<br />
Grid computing: interconnecting supercomputers to aggregate geographically distributed resources.<br />
Applications are deployed somewhere in the grid.<br />
Applications read input data and produce output data.<br />
Trends in the supercomputing environment<br />
Clusters are becoming the preferred option for supercomputing.<br />
Processors with increasing capacity.<br />
Grid computing uses clusters as a building block.<br />
I/O will remain a major bottleneck.<br />
Storage system typical architecture<br />
[Diagram: clients reach an I/O server through a communication network; the server reaches its disks through a storage network.]<br />
Problems with storage architectures<br />
[Chart: aggregated bandwidth (MB/s) of a NAS server for 1, 2, 4, 8 and 16 clients.]<br />
Solution: Parallelism<br />
Exploit parallelism at multiple layers:<br />
Parallel applications.<br />
Parallel computers.<br />
Parallel file systems.<br />
Parallel devices.<br />
Parallel File System Architecture<br />
[Diagram: computing nodes run applications over a PFS client; a communication network connects them to several I/O servers, across which files 1 and 2 are striped.]<br />
Parallel File System Architecture<br />
[Diagram: GPFS-style variant in which the I/O servers share a storage network holding the striped files.]<br />
Parallel File System Architecture<br />
[Diagram: GPFS-style variant in which the computing nodes access the storage network directly, without dedicated I/O servers.]<br />
Expand Parallel File System: Motivation<br />
Provide a high performance storage system by using standard protocols and servers.<br />
Easy integration of heterogeneous systems.<br />
Reuse and aggregation of existing resources.<br />
Parallel data access.<br />
Why Expand?<br />
A standard data server already includes almost all the needed functionality.<br />
Reuse.<br />
Standard protocols and servers make resources universally available.<br />
Easy to deploy.<br />
Independence of the underlying storage infrastructure.<br />
Portability.<br />
Objective<br />
Offer a new approach to building parallel file systems for cluster and grid environments by using standard data servers.<br />
<br />
Advantages:<br />
No server change needed.<br />
• Operations at the client side.<br />
Independence of client and server OSs.<br />
• Operations through standard protocols.<br />
Simplified PFS construction.<br />
• Takes advantage of the high performance mechanisms already implemented in the server.<br />
Allows mixing servers with different platforms and OSs.<br />
Easy installation and configuration.<br />
Contents<br />
The ARCOS Group.<br />
Expand motivation.<br />
Expand design.<br />
Expand evaluation.<br />
Conclusions.<br />
Ongoing Work.<br />
Architecture<br />
[Diagram: on each computing node, applications use POSIX, MPI-IO and other interfaces on top of Expand, which reaches a distributed partition through NFS, GridFTP, RNS-WS or local server protocols, with parallel access to the servers holding files 1 and 2.]<br />
File structure<br />
Expand file:<br />
Metadata sub-file.<br />
Several data sub-files.<br />
Data distributed across several servers.<br />
Flexible file-to-server mapping policy.<br />
[Diagram: an Expand file with blocks 0–8 striped round-robin across three servers: server 1 holds blocks 0, 3, 6; server 2 holds 1, 4, 7; server 3 holds 2, 5, 8.]<br />
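The round-robin striping shown above maps logical block i of a file striped over N servers to server i mod N. A minimal sketch of that mapping (a hypothetical helper for illustration, not Expand's actual code):

```python
def block_location(block, num_servers, block_size):
    """Map a logical file block to (server index, byte offset in that
    server's data sub-file), assuming round-robin striping."""
    server = block % num_servers                   # round-robin placement
    offset = (block // num_servers) * block_size   # position in the sub-file
    return server, offset

# Block 7 of a file striped over 3 servers with 4 KiB blocks lands on
# server 1 (0-based), at offset 8192 of that server's sub-file.
assert block_location(7, 3, 4096) == (1, 8192)
```

With this layout, consecutive blocks land on different servers, which is what allows one large request to be served by all servers in parallel.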
Directory structure<br />
[Diagram: the logical view (/Expand containing Dir1, Dir2, Dir3, with Dir4 and FileA below) is mapped onto the physical view: servers 1–3 export /export1, /export2 and /export3, each replicating the directory tree, so the sub-files of FileA appear under every export.]<br />
Metadata management<br />
Distributed metadata management:<br />
Two levels.<br />
Without locking.<br />
No metadata manager.<br />
Metadata distributed across servers.<br />
Master node.<br />
Hashing on the file name.<br />
Load balancing.<br />
[Diagram: blocks 0–8 of an Expand file striped across three servers, with the file's metadata stored alongside its blocks on one of the servers.]<br />
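Choosing the master node by hashing the file name means every client can locate a file's metadata deterministically, with no central manager and no locking. A sketch of the idea, assuming a simple hash-then-modulo placement (Expand's actual hash function is not specified here):

```python
import hashlib

def master_server(path, num_servers):
    """Pick the server holding this file's metadata by hashing the file
    name. Deterministic, so every client computes the same answer
    without consulting a metadata manager."""
    digest = hashlib.md5(path.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_servers
```

Because the hash spreads names across servers, metadata load is balanced instead of concentrating on a single node.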
Metadata management<br />
[Diagram: the metadata of files A–F distributed across the three servers by hashing on the file name, two files per server.]<br />
Parallel access<br />
[Diagram: a read(fd, buffer, count) call is split by Expand into parallel per-server requests, one thread per server; blocks 0–8 are gathered from servers 1–3 into the user buffer.]<br />
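The split of one read() into per-server requests can be sketched as below. This is illustrative only: the function name and the round-robin layout are assumptions, and the real client issues each server's request list from its own thread.

```python
def split_read(offset, count, num_servers, block_size):
    """Split a contiguous read of `count` bytes at `offset` into
    per-server (subfile_offset, length) requests, assuming round-robin
    striping. Returns {server: [(offset_in_subfile, length), ...]}."""
    requests = {}
    pos = offset
    end = offset + count
    while pos < end:
        block = pos // block_size
        within = pos % block_size                      # offset inside the block
        length = min(block_size - within, end - pos)   # bytes left in this block
        server = block % num_servers
        sub_off = (block // num_servers) * block_size + within
        requests.setdefault(server, []).append((sub_off, length))
        pos += length
    return requests

# Reading three full 4 KiB blocks from offset 0 yields one request per server.
assert split_read(0, 3 * 4096, 3, 4096) == {0: [(0, 4096)],
                                            1: [(0, 4096)],
                                            2: [(0, 4096)]}
```

Each server's list can then be sent concurrently, so the aggregated bandwidth of all servers backs a single read call.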
MPI-IO interface using ROMIO<br />
[Diagram: MPI-IO sits on ROMIO's ADIO layer, which provides drivers for Unix, NFS, PVFS, IBM PIOFS, SGI XFS and Expand; the Expand driver accesses the distributed partition.]<br />
Dynamic partition reconfiguration<br />
[Diagram: a partition of three servers holding blocks 0–8 is extended with a fourth, empty server; hash(file) still points at server 3.]<br />
Two reconstruction modes:<br />
Instantaneous.<br />
Deferred.<br />
Dynamic partition reconfiguration<br />
[Diagram: after reconfiguration, blocks 0–16 of the file are redistributed across the four servers.]<br />
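Which blocks must move when the stripe widens can be sketched with a toy model of the round-robin layout (the names here are illustrative, and the real instantaneous and deferred modes also rewrite sub-file offsets):

```python
def moved_blocks(num_blocks, old_servers, new_servers):
    """Blocks whose server changes when a round-robin stripe is widened
    from old_servers to new_servers. A block stays put only if its
    placement is the same under both widths."""
    return [b for b in range(num_blocks)
            if b % old_servers != b % new_servers]

# Going from 3 to 4 servers, blocks 0-2 stay and blocks 3-8 must move.
assert moved_blocks(9, 3, 4) == [3, 4, 5, 6, 7, 8]
```

The fraction of blocks that move is what the deferred mode amortizes: relocation can be postponed until a block is next accessed.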
Expand cluster versions<br />
[Diagram: two client implementations, Linux/NFS and Java/NFS, each accessing a distributed partition built from standard NFS servers.]<br />
Contents<br />
The ARCOS Group.<br />
Expand motivation.<br />
Expand design.<br />
Expand adaptation for Grid Computing.<br />
Expand evaluation.<br />
Conclusions.<br />
Ongoing Work.<br />
Requirements for a Grid File System<br />
Hierarchical logical name space.<br />
Resource Namespace Service (RNS).<br />
Standard access interface.<br />
POSIX and MPI-IO.<br />
Data access.<br />
GridFTP.<br />
Security.<br />
Grid Security Infrastructure (GSI).<br />
Performance optimization and improvement.<br />
Parallel I/O.<br />
Adapting Expand to Grid environments<br />
[Diagram: computing nodes run Expand under POSIX, MPI-IO and other interfaces; Expand reaches a distributed partition spanning four sites over the Internet with GSI, using GridFTP for data access and RNS for naming, alongside local NFS servers.]<br />
Contents<br />
The ARCOS Group.<br />
Expand motivation.<br />
Expand design.<br />
Expand evaluation.<br />
Conclusions.<br />
Ongoing Work.<br />
Evaluation<br />
How does Expand behave compared to other existing solutions?<br />
Cluster<br />
• PVFS.<br />
• GPFS.<br />
Grid<br />
• Globus Grid services.<br />
Cluster environment<br />
8 dual-processor nodes (Pentium 4, 3.2 GHz).<br />
2 GB RAM per node.<br />
Network: Gigabit Ethernet.<br />
Expand.<br />
PVFS.<br />
GPFS.<br />
Cluster benchmarking<br />
High performance.<br />
Parallel access to a file: IOR benchmark.<br />
FLASH I/O benchmark.<br />
Metadata operations.<br />
High throughput.<br />
Image processing.<br />
Dynamic partition reconfiguration.<br />
High performance: parallel access to a file<br />
Parallel program (IOR) making interleaved writes and reads to a single file with different access sizes.<br />
MPI-IO interface.<br />
[Diagram: two processes accessing interleaved blocks 0–8 of a single file.]<br />
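In the interleaved pattern above, process p touches chunks p, p + P, p + 2P, … of the shared file. The offsets one process accesses can be sketched as follows (an illustration of the access pattern, not IOR itself):

```python
def interleaved_offsets(rank, nprocs, access_size, total_size):
    """File offsets accessed by process `rank` out of `nprocs` in an
    IOR-style interleaved pattern with fixed-size accesses."""
    return list(range(rank * access_size, total_size, nprocs * access_size))

# Two processes, 4 KiB accesses, 16 KiB file: process 0 gets chunks 0 and 2,
# process 1 gets chunks 1 and 3.
assert interleaved_offsets(0, 2, 4096, 4 * 4096) == [0, 8192]
assert interleaved_offsets(1, 2, 4096, 4 * 4096) == [4096, 12288]
```

Varying `access_size` while keeping the pattern fixed is what produces the bandwidth-versus-access-size curves in the following slides.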
High performance: parallel access to a file for writing<br />
[Chart: bandwidth (MB/s) of XPN (Expand), PVFS and GPFS with 8 writing processes, for different access sizes.]<br />
High performance: parallel access to a file for reading<br />
[Chart: bandwidth (MB/s) of XPN, PVFS and GPFS with 8 reading processes, for different access sizes.]<br />
High performance: parallel access to a file for writing<br />
[Chart: write bandwidth (MB/s) of XPN, PVFS and GPFS with 8 KB accesses, for 1 to 16 processes.]<br />
High performance: parallel access to a file for writing<br />
[Chart: write bandwidth (MB/s) of XPN, PVFS and GPFS with 256 KB accesses, for 1 to 16 processes.]<br />
High Performance: FLASH-IO<br />
FLASH is a parallel application simulating thermonuclear flashes.<br />
FLASH-IO simulates the I/O operations performed by FLASH.<br />
Data size is proportional to the number of running processes:<br />
1 process: 73.53 MB.<br />
16 processes: 1.16 GB.<br />
High Performance: FLASH-IO<br />
[Chart: FLASH-IO bandwidth (MB/s) of XPN, PVFS and GPFS, for 1 to 16 processes.]<br />
Metadata: creating empty files<br />
[Chart: empty-file creation rate (files/second) of XPN, PVFS and GPFS, with 1 and 4 processes.]<br />
Metadata: creating small files<br />
[Chart: small-file creation rate (files/second) of XPN, PVFS and GPFS, with 1 and 4 processes.]<br />
High throughput<br />
Parallel application processing a set of 256 images.<br />
Each process works independently on a subset of the images.<br />
No concurrent access to a file.<br />
Sizes:<br />
Image file: 5 MB.<br />
Full dataset: 2.5 GB.<br />
Each process applies a fixed bitmask to each image file to generate a new image file.<br />
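The per-image work is embarrassingly parallel: each process masks its own files and writes new ones, with no shared state between processes. A sketch of the masking step (the benchmark's actual mask value and image format are not specified, so both are assumptions here):

```python
def apply_bitmask(image_bytes, mask):
    """Apply a fixed one-byte mask to every byte of an image buffer,
    producing the new image's contents."""
    return bytes(b & mask for b in image_bytes)

# Masking out the low nibble of each byte:
assert apply_bitmask(bytes([0xFF, 0x0F]), 0xF0) == bytes([0xF0, 0x00])
```

Since the computation is trivial, the benchmark's running time is dominated by reading and writing the 5 MB files, which is what makes it a throughput test of the file system.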
High throughput: image processing in C<br />
[Chart: execution time (s) of the C image-processing application on XPN, PVFS and GPFS, for 1 to 16 processes.]<br />
High throughput: image processing in Java<br />
[Chart: execution time (s) of the Java image-processing application on XPN, PVFS and GPFS, for 1 to 16 processes.]<br />
Dynamic partition reconfiguration: adding new nodes<br />
[Chart: bandwidth (MB/s) and reconstruction time (min) for the reconstruction models 4-5, 4-6, 4-7, 4-8, 5-6, 5-7, 5-8, 6-7, 6-8 and 7-8.]<br />
Grid evaluation environment<br />
Evaluation for high throughput:<br />
500 jobs are performed.<br />
Each job randomly selects one file (out of 200) to access.<br />
File size is 200 MB.<br />
Testbed environment<br />
[Diagram: GridFTP servers on heterogeneous machines: an Intel Pentium 4 at 3.20 GHz, a dual-processor Intel Xeon at 2.4 GHz, an Intel Pentium 4 at 2.40 GHz, an Intel Pentium 4 at 2.80 GHz and an AMD Athlon.]<br />
Evaluated scenarios<br />
Typical Grid:<br />
The file is completely transferred to the local node.<br />
Processing starts after the transfer finishes.<br />
Globus services (globus-url-copy) are used for the transfer.<br />
Expand:<br />
Direct remote access to the file.<br />
No previous transfer to the node is needed.<br />
Model 1 / Scenario 1<br />
1 server.<br />
Complete transfer to the local node.<br />
The application accesses the local copy.<br />
[Diagram: a single GridFTP server holding the files.]<br />
Model 2 / Scenario 1<br />
4 servers with distributed files; each server stores 50 files.<br />
Complete transfer to the local node.<br />
The application accesses the local copy.<br />
[Diagram: four GridFTP servers holding the distributed files.]<br />
Scenario 2 (Expand)<br />
Expand with 1, 2 and 4 servers.<br />
The local node accesses the needed data remotely.<br />
No previous transfer is needed.<br />
[Diagram: the computing node runs Expand under POSIX, MPI-IO and other interfaces, reaching the GridFTP servers of the four sites through RNS over the Internet with GSI, as a single distributed partition.]<br />
Grid Evaluation<br />
[Chart: total time (min) for the 500 jobs under each model: 1 site, 4 sites with distributed files, and Expand with 1, 2 and 4 servers.]<br />
Contents<br />
The ARCOS Group.<br />
Expand motivation.<br />
Expand design.<br />
Expand evaluation.<br />
Conclusions.<br />
Ongoing Work.<br />
Conclusions<br />
It is feasible to build a parallel file system using standard protocols and servers.<br />
Our solution is easily adaptable to different environments and situations (cluster and grid are examples).<br />
Performance results are comparable to those of other solutions (even commercial ones).<br />
Contents<br />
The ARCOS Group.<br />
Expand motivation.<br />
Expand design.<br />
Expand evaluation.<br />
Conclusions.<br />
Ongoing Work.<br />
Ongoing work<br />
Add new protocols (e.g. Web Services).<br />
Evaluation in large clusters and grid environments.<br />
Use Expand to improve performance when accessing replicated data.<br />
Ongoing work<br />
Use Expand as an intermediate file system in large clusters.<br />
[Diagram: applications on the compute nodes access Expand in parallel; Expand runs on top of the cluster file system (PVFS, GPFS, etc.) served by the I/O nodes across the network.]<br />
<strong>THE</strong> <strong>EXPAND</strong> <strong>PARALLEL</strong> <strong>FILE</strong><br />
<strong>SYSTEM</strong><br />
A <strong>FILE</strong> <strong>SYSTEM</strong> FOR CLUSTER AND<br />
GRID COMPUTING<br />
José Daniel García Sánchez<br />
ARCOS Group – University Carlos III of Madrid