Taming Elephants, or How to Embed Parallelism into PostgreSQL
Mikhail Zymbler, Constantin Pan
South Ural State University, Chelyabinsk, Russian Federation
24th International Conference on Database and Expert Systems Applications - DEXA 2013
Prague, Czech Republic, August 26-29, 2013
The reported study was partially supported by the Russian Foundation for Basic Research, research projects No. 12-07-31217 and No. 12-07-00443.
PargreSQL project
+
PARTITIONED PARALLELISM
26-Aug-13 DEXA 2013 2
Partitioned parallelism
Serial vs parallel query plan
SELECT Name
FROM S
WHERE City='Prague'
Serial plan: π_Name(σ_City='Prague'(S))
Parallel plan: π_Name(σ_City='Prague'(S_i)) on each fragment S_0, …, S_{n-1}
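The two plans above can be compared with a minimal Python sketch. It is an illustration only, not PargreSQL code: the sample rows, the round-robin `partition` helper, and the node count `n = 3` are assumptions made for this example. The sketch runs the same select-project plan on every fragment and checks that the combined output matches the serial plan.

```python
# Hypothetical sample relation S with rows (Name, City).
S = [
    ("Anna", "Prague"),
    ("Boris", "Chelyabinsk"),
    ("Clara", "Prague"),
    ("Dan", "Brno"),
    ("Eva", "Prague"),
]

def plan(fragment):
    """pi_Name(sigma_City='Prague'(fragment)): the plan every node runs."""
    return [name for (name, city) in fragment if city == "Prague"]

def partition(rows, n):
    """Split rows into fragments S_0..S_{n-1} (round-robin is an assumption)."""
    return [rows[i::n] for i in range(n)]

# Serial plan: one scan over the whole relation.
serial_result = plan(S)

# Parallel plan: the same plan on each fragment, results concatenated.
n = 3
fragments = partition(S, n)
parallel_result = [name for frag in fragments for name in plan(frag)]
```

The row order of the parallel result depends on how the fragments are laid out, so the two results are equal as sets of rows rather than as ordered lists.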
DBMS processes
[Figure, PostgreSQL: the Frontend connects to the Daemon, the Daemon forks a Backend, and the Frontend interacts with the Backend.]
[Figure, PargreSQL: the Frontend connects to several Daemons, each Daemon forks a Backend, the Frontend interacts with the Backends, and the Backends exchange data with one another.]
Deployment scheme
[Figure with panels: PostgreSQL, PargreSQL]
Query processing
[Figure with panels: PostgreSQL, PargreSQL]
DBMS subsystems
[Figure with panels: PostgreSQL, PargreSQL]
par_libpq-fe
par_Compat
[Figure with panels: PostgreSQL application, PargreSQL application]
EXCHANGE operator
[Figure: operator E with exchange port p and exchange function ψ]
The exchange port p is an ID that distinguishes EXCHANGE operators from one another.
The exchange function ψ returns the number of the node where the tuple should be processed.
Pseudo code
if (ψ(tuple)==mynode())<br />
Put(tuple, this_output_buffer);<br />
else {<br />
Send(tuple, ψ(tuple));<br />
Put(tuple, that_output_buffer);<br />
}<br />
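The routing decision in the pseudo code can be sketched in Python as follows. This is a simplified model, not the PargreSQL implementation: the `exchange` function, the dictionary of outgoing tuples, and the choice ψ(t) = t[0] % N are assumptions for illustration, and the sketch models only where each tuple goes, not the operator's buffers.

```python
from collections import defaultdict

def exchange(tuples, psi, mynode):
    """Route tuples: keep those with psi(t) == mynode, group the rest by destination."""
    local = []
    outgoing = defaultdict(list)
    for t in tuples:
        dest = psi(t)
        if dest == mynode:
            local.append(t)           # tuple is processed on this node
        else:
            outgoing[dest].append(t)  # tuple must be sent to node `dest`
    return local, dict(outgoing)

N = 3                                  # assumed number of nodes
psi = lambda t: t[0] % N               # assumed exchange function
rows = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e")]
local, outgoing = exchange(rows, psi, mynode=0)
```

Running the same function on every node with its own `mynode` gives each tuple exactly one node where it is kept locally.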
Message passing system
[Figure: groups of Backends, each attached through Shared Memory to a Communicator; the Communicators are connected by MPI]
Why not plain MPI?
• Because of a fork() inside the PostgreSQL daemon
MPI-like interface
• ISend()
• IRecv()
• Test()
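To show how a caller might use a non-blocking interface of this shape, here is a toy, single-process Python sketch. Everything in it is hypothetical: the `Communicator` and `Request` classes, the per-rank in-memory mailboxes, and the method names `isend`, `irecv`, and `test` only mimic the ISend()/IRecv()/Test() trio and say nothing about how PargreSQL implements it over shared memory and MPI.

```python
import queue

class Request:
    """Handle for a pending operation; test() polls without blocking."""
    def __init__(self, poll):
        self._poll = poll
        self.done = False
        self.data = None

    def test(self):
        if not self.done:
            self.done, self.data = self._poll()
        return self.done

class Communicator:
    """Toy communicator: one in-memory mailbox per rank (an assumption)."""
    def __init__(self, size):
        self.mailboxes = [queue.Queue() for _ in range(size)]

    def isend(self, data, dest):
        # Delivery is immediate in this toy model, so the request is complete.
        self.mailboxes[dest].put(data)
        return Request(lambda: (True, None))

    def irecv(self, rank):
        def poll():
            try:
                return True, self.mailboxes[rank].get_nowait()
            except queue.Empty:
                return False, None
        return Request(poll)

comm = Communicator(size=2)
recv = comm.irecv(rank=1)
before = recv.test()                      # nothing has been sent yet
send = comm.isend(("tuple", 42), dest=1)
after = recv.test()                       # the message is now available
```

The point of the pattern is that a backend can post a receive, keep working, and call `test()` from time to time instead of blocking.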
par_Storage
create table R (
ID integer primary key,
Attr1 …
…)
with (fragattr = ID);
-- Set ID as the fragmentation attribute
-- with fragmentation function ID % N,
-- where N, the number of nodes,
-- is the number of lines in the config file.
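The fragmentation rule in the comments above can be sketched in Python. The config file contents and host names are made up for this example, and counting only non-empty lines is an assumption; the slide states only that N is the number of lines in the config file.

```python
def node_count(config_text):
    """N = number of lines in the config file (blank lines skipped, by assumption)."""
    return sum(1 for line in config_text.splitlines() if line.strip())

def fragment_of(row_id, n):
    """Fragmentation function from the slide: ID % N."""
    return row_id % n

# Hypothetical config file: one line per node.
config = "node0.example\nnode1.example\nnode2.example\nnode3.example\n"
N = node_count(config)

# Node that stores each row of R, for IDs 1..8.
placement = {row_id: fragment_of(row_id, N) for row_id in range(1, 9)}
```

With four nodes, consecutive IDs are spread round-robin, so rows with IDs 4 and 8 land on node 0.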
Parallelizer: joins
select smth from T1, T2
[Figure: plan trees for merge join (over sort nodes), hash join (over hash nodes), and nested loop join (with a material node), with six EXCHANGE operators (E) inserted in total]
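One way to read the role of the E operators in a join plan is that they redistribute both inputs on the join key, so that matching tuples meet on the same node. The Python sketch below illustrates that reading under stated assumptions: the `repartition` and `local_join` helpers, the sample fragments, and the choice of key % n as exchange function are hypothetical, and the local join is a simple hash join regardless of which join method the plan uses.

```python
def repartition(fragments, key, n):
    """Exchange step: move every row to node key(row) % n (assumed function)."""
    out = [[] for _ in range(n)]
    for frag in fragments:
        for row in frag:
            out[key(row) % n].append(row)
    return out

def local_join(left, right):
    """Equi-join on the first column, done with a hash table on the right input."""
    index = {}
    for r in right:
        index.setdefault(r[0], []).append(r)
    return [(l, r) for l in left for r in index.get(l[0], [])]

n = 2
key = lambda row: row[0]
# Hypothetical initial fragments of T1 and T2, not partitioned on the join key.
T1 = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]]
T2 = [[(4, "w"), (1, "x")], [(2, "y"), (2, "z")]]

T1_parts = repartition(T1, key, n)
T2_parts = repartition(T2, key, n)
parallel = [pair for i in range(n) for pair in local_join(T1_parts[i], T2_parts[i])]

# Reference result: join the whole relations on a single node.
serial = local_join([r for f in T1 for r in f], [r for f in T2 for r in f])
```

After repartitioning, no pair of matching tuples is split across nodes, which is why the per-node joins together produce the full result.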
Parallelizer: merging and aggregation
Merging at the plan root: E with port=0, ψ≡MERGER_NODE
select sum(a) from T
agg with E, ψ≡MERGER_NODE
select a, sum(b) from T
group by a
group agg with E by attr, ψ≡f(attr)
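The grouped case can be sketched in Python as a two-step exchange: first by ψ≡f(attr), so each group ends up on one node, then by ψ≡MERGER_NODE to collect the final answer. This is one possible reading of the slide, not the actual Parallelizer; the sample fragments, f(attr) = attr % N, and MERGER_NODE = 0 are assumptions.

```python
from collections import defaultdict

N = 2
MERGER_NODE = 0  # assumed node number of the merger

def redistribute(fragments, psi):
    """Exchange step: deliver every row to node psi(row)."""
    inbox = [[] for _ in fragments]
    for frag in fragments:
        for row in frag:
            inbox[psi(row)].append(row)
    return inbox

def group_sum(rows):
    """Local 'group agg': sum(b) for each value of a."""
    totals = defaultdict(int)
    for a, b in rows:
        totals[a] += b
    return dict(totals)

# Hypothetical fragments of T with rows (a, b).
fragments = [[(1, 10), (2, 5), (1, 1)], [(2, 7), (3, 4), (1, 2)]]

# psi = f(attr): rows with the same a meet on the same node.
by_group = redistribute(fragments, lambda row: row[0] % N)
per_node = [group_sum(rows) for rows in by_group]

# psi = MERGER_NODE: every node sends its finished groups to the merger.
gathered = redistribute([list(d.items()) for d in per_node], lambda row: MERGER_NODE)
merged = dict(gathered[MERGER_NODE])
```

Because each group is complete on a single node after the first exchange, the merger only has to concatenate the per-node results.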
Parallelizer: INSERT queries
insert into T values (...);
Result
filter(t.fragattr % n = mynode)
Parallelizer: UPDATE queries
update T set …
if (!IsNative(t)) {
    // for the SCATTER
    dup = Duplicate(t);
    dup.SystemFlag = DO_INSERT;
    Send(dup, ψ(t));  // send the copy flagged for insertion
    // for the MERGE
    t.SystemFlag = DO_DELETE;
    return (t);
} else do as usual
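The UPDATE handling above can be modelled in Python. The sketch assumes that a tuple is "native" when its fragmentation attribute still maps to the current node, and that the exchange function is the same value % n rule used for fragmentation; the `Row` class and `route_update` function are hypothetical names, not part of PargreSQL.

```python
from dataclasses import dataclass, replace
from typing import Optional

DO_INSERT = "DO_INSERT"
DO_DELETE = "DO_DELETE"

@dataclass(frozen=True)
class Row:
    frag_value: int                    # value of the fragmentation attribute
    payload: str
    system_flag: Optional[str] = None  # None means "process as usual"

def route_update(t, mynode, n):
    """Return (row to apply locally, list of (destination, row) messages)."""
    dest = t.frag_value % n            # assumed exchange function
    if dest != mynode:                 # the updated row no longer belongs here
        dup = replace(t, system_flag=DO_INSERT)    # copy for the new node
        local = replace(t, system_flag=DO_DELETE)  # local copy gets deleted
        return local, [(dest, dup)]
    return t, []                       # native row: nothing to send

# A row whose fragmentation attribute was updated to 5 while stored on node 0.
moved_local, moved_msgs = route_update(Row(5, "x"), mynode=0, n=4)
# A row that stays on its node.
kept_local, kept_msgs = route_update(Row(8, "y"), mynode=0, n=4)
```

In effect, an update that changes the fragmentation attribute becomes a delete on the old node plus an insert on the new one.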
Current results
[Figure with panels: Implemented, To Do]
Source code size: 5K lines
Experiments
R ⋈ S
|R| = 6∙10^7
|S| = 1.5∙10^6
Related work
ParGRES is a middleware over a cluster of PostgreSQL DBMSes for processing OLAP queries.
Paes M., Lima A.A., Valduriez P., Mattoso M. High-Performance Query Processing of a Real-World OLAP Database with ParGRES // High Performance Computing for Computational Science - VECPAR 2008: 8th International Conference, Toulouse, France, June 24-27, 2008. LNCS. Vol. 5336. P. 188-200.
Conclusion
The design and implementation of PargreSQL, a parallel DBMS for cluster systems, have been presented.
PargreSQL is based on the PostgreSQL open-source DBMS and exploits partitioned parallelism.
This approach is applicable to other open-source relational DBMSes (e.g. MySQL).
Thank you for your attention!
Questions?
• Mikhail Zymbler
zymbler@gmail.com
More info
• http://supercomputer.susu.ac.ru/en/
• http://omega.susu.ru