21.05.2014 Views

On snakes and elephants - Using Python with and in ... - PGCon

On snakes and elephants - Using Python with and in ... - PGCon

On snakes and elephants - Using Python with and in ... - PGCon

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong><br />

<strong>Us<strong>in</strong>g</strong> <strong>Python</strong> <strong>with</strong> <strong>and</strong> <strong>in</strong> PostgreSQL<br />

Jan Urbański<br />

j.urbanski@wulczer.org<br />

Ducksboard<br />

<strong>PGCon</strong> 2012, Ottawa, May 18<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 1 / 64


For those follow<strong>in</strong>g at home<br />

Gett<strong>in</strong>g the slides<br />

$ wget http://wulczer.org/<strong>snakes</strong>-<strong>and</strong>-<strong>elephants</strong>.pdf<br />

Try the code<br />

$ mkvirutalenv pgcon<br />

$ pip <strong>in</strong>stall psycopg2 ipython requests<br />

$ createdb pgcon<br />

$ psql pgcon -c ’create extension plpythonu’<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 2 / 64


1 The language<br />

A quick glance<br />

Choos<strong>in</strong>g the version<br />

2 Drivers<br />

DB API 2.0<br />

Overview of exist<strong>in</strong>g drivers<br />

Psycopg2 features <strong>and</strong> examples<br />

3 ORMs<br />

Why would you even want one?<br />

Django ORM<br />

SQLAlchemy<br />

4 PL/<strong>Python</strong><br />

Use <strong>Python</strong> straight from the database<br />

Best practices<br />

Tricks, skulduggery <strong>and</strong> black magic<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 3 / 64


Outl<strong>in</strong>e<br />

The language<br />

A quick glance<br />

1 The language<br />

A quick glance<br />

Choos<strong>in</strong>g the version<br />

2 Drivers<br />

3 ORMs<br />

4 PL/<strong>Python</strong><br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 4 / 64


The language<br />

The language<br />

A quick glance<br />

What is <strong>Python</strong>?<br />

<strong>Python</strong> is an old, bor<strong>in</strong>g, enterprise technology.<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 5 / 64


The language<br />

The language<br />

A quick glance<br />

What is <strong>Python</strong>?<br />

<strong>Python</strong> is an old, bor<strong>in</strong>g, enterprise technology.<br />

◮ Java (released 1995), PHP (released 1995), Postgres95 (released<br />

1995), <strong>Python</strong> (released 1991)<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 5 / 64


The language<br />

The language<br />

A quick glance<br />

What is <strong>Python</strong>?<br />

<strong>Python</strong> is an old, bor<strong>in</strong>g, enterprise technology.<br />

◮ Java (released 1995), PHP (released 1995), Postgres95 (released<br />

1995), <strong>Python</strong> (released 1991)<br />

◮ there should be one – <strong>and</strong> preferably only one – obvious way to do it<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 5 / 64


The language<br />

The language<br />

A quick glance<br />

What is <strong>Python</strong>?<br />

<strong>Python</strong> is an old, bor<strong>in</strong>g, enterprise technology.<br />

◮ Java (released 1995), PHP (released 1995), Postgres95 (released<br />

1995), <strong>Python</strong> (released 1991)<br />

◮ there should be one – <strong>and</strong> preferably only one – obvious way to do it<br />

◮ used at Google, the NASA <strong>and</strong> everywhere from web development<br />

through scientific software to system adm<strong>in</strong>istration tools<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 5 / 64


Problems <strong>with</strong> <strong>Python</strong><br />

The language<br />

A quick glance<br />

◮ multiple exist<strong>in</strong>g Postgres drivers make it difficult to decide which one<br />

to use <strong>in</strong> a new project<br />

◮ the same goes for ORMs <strong>and</strong> other higher-level libraries<br />

◮ the <strong>Python</strong> 2 vs <strong>Python</strong> 3 mess<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 6 / 64


Outl<strong>in</strong>e<br />

The language<br />

Choos<strong>in</strong>g the version<br />

1 The language<br />

A quick glance<br />

Choos<strong>in</strong>g the version<br />

2 Drivers<br />

3 ORMs<br />

4 PL/<strong>Python</strong><br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 7 / 64


The language<br />

Choos<strong>in</strong>g the version<br />

Use <strong>Python</strong> 2.7<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 8 / 64


<strong>Python</strong> 2 or <strong>Python</strong> 3<br />

The language<br />

Choos<strong>in</strong>g the version<br />

◮ even though <strong>Python</strong> 3 is the future, the future is still not here<br />

◮ many resources, libraries <strong>and</strong> tutorials work <strong>with</strong> <strong>Python</strong> 2 only<br />

◮ follow some basic rules to make your future transition pa<strong>in</strong>less<br />

◮ make sure you know when you’re deal<strong>in</strong>g <strong>with</strong> characters <strong>and</strong> when<br />

you’re deal<strong>in</strong>g <strong>with</strong> bytes<br />

◮ don’t use deprecated syntax - it’s usually much uglier, anyway!<br />

◮ use a recent version of <strong>Python</strong> 2<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 9 / 64


Outl<strong>in</strong>e<br />

Drivers DB API 2.0<br />

1 The language<br />

2 Drivers<br />

DB API 2.0<br />

Overview of exist<strong>in</strong>g drivers<br />

Psycopg2 features <strong>and</strong> examples<br />

3 ORMs<br />

4 PL/<strong>Python</strong><br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 10 / 64


What is DB API 2.0<br />

Drivers DB API 2.0<br />

◮ a common API for <strong>Python</strong> database drivers, also known as PEP 249<br />

◮ been around s<strong>in</strong>ce 2001, <strong>with</strong> small modifications s<strong>in</strong>ce then<br />

◮ a bare bones specification, the lowest common denom<strong>in</strong>ator of<br />

database APIs<br />

◮ many drivers provide their own extensions<br />

◮ even though DB people tend to hate it, it has proven useful over the<br />

years<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 11 / 64


Outl<strong>in</strong>e<br />

Drivers<br />

Overview of exist<strong>in</strong>g drivers<br />

1 The language<br />

2 Drivers<br />

DB API 2.0<br />

Overview of exist<strong>in</strong>g drivers<br />

Psycopg2 features <strong>and</strong> examples<br />

3 ORMs<br />

4 PL/<strong>Python</strong><br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 12 / 64


Quite a few drivers<br />

Drivers<br />

Overview of exist<strong>in</strong>g drivers<br />

◮ pyPgSQL<br />

◮ PyGreSQL<br />

◮ bpgsql<br />

◮ ocpgdb<br />

◮ pgasync<br />

◮ pg8000<br />

◮ py-postgresql<br />

◮ psycopg2<br />

◮ ... <strong>and</strong> the fun cont<strong>in</strong>ues!<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 13 / 64


Drivers<br />

Overview of exist<strong>in</strong>g drivers<br />

Use Psycopg2<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 14 / 64


Driver categories<br />

Drivers<br />

Overview of exist<strong>in</strong>g drivers<br />

◮ C wrappers around libpq<br />

◮ fast<br />

◮ features like PGPORT, .pgpass, PGSERVICE, SSL modes all work out<br />

of the box<br />

◮ one less th<strong>in</strong>g to get wrong<br />

◮ FEBE protocol implementations <strong>in</strong> <strong>Python</strong><br />

◮ work <strong>with</strong> other <strong>Python</strong> <strong>in</strong>terpreters, like PyPy or Jython<br />

◮ can give much tighter control over the communication <strong>with</strong> the<br />

backend<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 15 / 64


pyPgSQL<br />

Drivers<br />

Overview of exist<strong>in</strong>g drivers<br />

◮ libpq wrapper<br />

◮ provides a DB API 2.0 compatible module<br />

◮ last release was <strong>in</strong> 2006<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 16 / 64


PyGreSQL<br />

Drivers<br />

Overview of exist<strong>in</strong>g drivers<br />

◮ libpq wrapper<br />

◮ provides both its own <strong>in</strong>terface <strong>and</strong> a DB API 2.0 compatible module<br />

◮ last release was <strong>in</strong> 2009, but the project seems to be alive<br />

◮ supports most PostgreSQL features when us<strong>in</strong>g its own <strong>in</strong>terface (but<br />

for <strong>in</strong>stance server-side cursors are unsupported)<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 17 / 64


pgsql<br />

Drivers<br />

Overview of exist<strong>in</strong>g drivers<br />

◮ pure <strong>Python</strong> implementation<br />

◮ provides most of the DB API 2.0 <strong>in</strong>terface<br />

◮ last activity was <strong>in</strong> 2009<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 18 / 64


ocpgdb<br />

Drivers<br />

Overview of exist<strong>in</strong>g drivers<br />

◮ a wrapper around ODBC<br />

◮ last project activity was <strong>in</strong> February 2012<br />

◮ use this is you’re married to ODBC somehow<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 19 / 64


pgasync<br />

Drivers<br />

Overview of exist<strong>in</strong>g drivers<br />

◮ pure <strong>Python</strong> implementation as a Twisted protocol<br />

◮ last release was <strong>in</strong> 2005<br />

◮ <strong>in</strong>terest<strong>in</strong>g because it’s one of the few drivers that has well <strong>in</strong>tegrated<br />

asynchronous features<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 20 / 64


pg8000<br />

Drivers<br />

Overview of exist<strong>in</strong>g drivers<br />

◮ pure <strong>Python</strong> implementation<br />

◮ provides a DB API 2.0 <strong>in</strong>terface, <strong>with</strong> some extensions (but for<br />

<strong>in</strong>stance large objects <strong>and</strong> server-side cursors are unsupported)<br />

◮ actively ma<strong>in</strong>ta<strong>in</strong>ed<br />

◮ might be useful if you can’t depend on C <strong>Python</strong> extensions or want<br />

to use an <strong>in</strong>terpreter that does not support them, like PyPy or Jython)<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 21 / 64


py-postgresql<br />

Drivers<br />

Overview of exist<strong>in</strong>g drivers<br />

◮ pure <strong>Python</strong> implementation, <strong>with</strong> optional optimisations <strong>in</strong> C<br />

◮ provides its own <strong>in</strong>terface as well as a DB API 2.0 compatible module<br />

◮ actively ma<strong>in</strong>ta<strong>in</strong>ed<br />

◮ quite featureful, go<strong>in</strong>g beyond simple query<strong>in</strong>g - supports COPY,<br />

LISTEN/NOTIFY, server-side cursors, advisory locks, etc<br />

◮ <strong>Python</strong> 3 only<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 22 / 64


psycopg2<br />

Drivers<br />

Overview of exist<strong>in</strong>g drivers<br />

◮ libpq wrapper<br />

◮ based on DB API 2.0 <strong>with</strong> own extensions for Postgres-specific th<strong>in</strong>gs<br />

◮ actively ma<strong>in</strong>ta<strong>in</strong>ed<br />

◮ big community, widespread usage, quite featureful<br />

◮ runs on <strong>Python</strong> 2 <strong>and</strong> 3<br />

◮ well-def<strong>in</strong>ed thread<strong>in</strong>g behaviour<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 23 / 64


Outl<strong>in</strong>e<br />

Drivers<br />

Psycopg2 features <strong>and</strong> examples<br />

1 The language<br />

2 Drivers<br />

DB API 2.0<br />

Overview of exist<strong>in</strong>g drivers<br />

Psycopg2 features <strong>and</strong> examples<br />

3 ORMs<br />

4 PL/<strong>Python</strong><br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 24 / 64


Drivers<br />

DB API 2.0 usage example<br />

Psycopg2 features <strong>and</strong> examples<br />

Short example<br />

import psycopg2<br />

conn = psycopg2.connect(’dbname=schemaverse sslmode=require’)<br />

cur = conn.cursor()<br />

cur.execute(’select attack, defense from my_ships’)<br />

for row <strong>in</strong> cur.fetchall():<br />

pr<strong>in</strong>t(’total power %s’ % row[0] + row[1])<br />

conn.close()<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 25 / 64


Parameter pass<strong>in</strong>g<br />

Drivers<br />

Psycopg2 features <strong>and</strong> examples<br />

◮ placeholders <strong>in</strong> query str<strong>in</strong>gs are substituted for parameters<br />

◮ the only conversion argument <strong>in</strong> use is %s<br />

◮ supports both positional <strong>and</strong> named parameter pass<strong>in</strong>g, but not mixed<br />

◮ the arguments should always be a sequence, even if there’s only one!<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 26 / 64


Drivers<br />

Parameter pass<strong>in</strong>g examples<br />

Psycopg2 features <strong>and</strong> examples<br />

Parameter pass<strong>in</strong>g<br />

cur.execute("select name from persons where last = %s",<br />

("O’Hara", ))<br />

cur.execute("update persons set last = %(prefix)s "<br />

"|| ’ ’ || %(last)s where surname = %(last)s",<br />

{"last": "O’Hara", "prefix": "Ms"})<br />

cur.mogrify(’select %s + %s’, (1, 2))<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 27 / 64


Drivers<br />

Advanced psycopg2 features<br />

Psycopg2 features <strong>and</strong> examples<br />

◮ <strong>Python</strong> lists are transformed <strong>in</strong>to Postgres arrays, <strong>Python</strong> tuples <strong>in</strong>to<br />

constructs suitable for usage <strong>with</strong> IN<br />

◮ you can register your own typecasters for <strong>Python</strong> classes <strong>and</strong> for<br />

Postgres types, some are provided out of the box:<br />

◮ hstore<br />

◮ composite types<br />

◮ UUID<br />

◮ INET<br />

◮ by provid<strong>in</strong>g custom cursor classes you can hook <strong>in</strong>to the query<strong>in</strong>g<br />

process<br />

◮ provide asynchronous <strong>in</strong>terface <strong>and</strong> <strong>in</strong>tegration <strong>with</strong> many async<br />

frameworks<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 28 / 64


Drivers<br />

Custom cursor class example<br />

Psycopg2 features <strong>and</strong> examples<br />

<strong>Us<strong>in</strong>g</strong> a custom cursor<br />

import logg<strong>in</strong>g<br />

from psycopg2 import extensions<br />

log = logg<strong>in</strong>g.getLogger(’queries’)<br />

class Logg<strong>in</strong>gCursor(extensions.cursor):<br />

def execute(self, query, vars=None):<br />

log.<strong>in</strong>fo(’%s’, self.mogrify(query, vars))<br />

return extensions.cursor.execute(self, query, vars)<br />

cur = conn.cursor(cursor_factory=Logg<strong>in</strong>gCursor)<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 29 / 64


Drivers<br />

Custom typecaster example<br />

Psycopg2 features <strong>and</strong> examples<br />

Automatic hstore typecast<strong>in</strong>g<br />

from psycopg2 import extras<br />

extras.register_hstore(None, globally=True, oid=16829)<br />

cur.execute(’select %s || %s’, ({’key1’: ’val1’},<br />

{’key2’: ’val2’}))<br />

INFO:queries:select hstore(ARRAY[’key1’], ARRAY[’val1’]) ||<br />

hstore(ARRAY[’key2’], ARRAY[’val2’])<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 30 / 64


Drivers<br />

Asynchronous usage example<br />

Psycopg2 features <strong>and</strong> examples<br />

Asynchronous notifies<br />

import psycopg2<br />

from psycopg2.extras import wait_select<br />

conn = psycopg2.connect(’dbname=schemaverse sslmode=require’,<br />

async=1)<br />

wait_select(conn)<br />

cur = conn.cursor()<br />

cur.execute(’LISTEN error_channel’)<br />

wait_select(conn)<br />

while 1:<br />

wait_select(conn)<br />

while conn.notifies:<br />

pr<strong>in</strong>t("notify: %s" % conn.notifies.pop().payload)<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 31 / 64


Drivers<br />

Asynchronous usage example cd.<br />

Asynchronous <strong>in</strong>ternals<br />

Psycopg2 features <strong>and</strong> examples<br />

import select<br />

from psycopg2 import extensions<br />

def wait_select(conn):<br />

while 1:<br />

state = conn.poll()<br />

if state == extensions.POLL_OK:<br />

break<br />

elif state == extensions.POLL_READ:<br />

select.select([conn.fileno()], [], [])<br />

elif state == extensions.POLL_WRITE:<br />

select.select([], [conn.fileno()], [])<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 32 / 64


Transaction management<br />

Drivers<br />

Psycopg2 features <strong>and</strong> examples<br />

◮ transactions are started automatically, but you need to commit them<br />

manually<br />

◮ you can change that by sett<strong>in</strong>g connection.autocommit = True<br />

◮ always use connection.commit() or .rollback(), otherwise you<br />

might get open transactions left around<br />

◮ use a recent version, lots of wr<strong>in</strong>kles have been smoothed lately<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 33 / 64


Miss<strong>in</strong>g features<br />

Drivers<br />

Psycopg2 features <strong>and</strong> examples<br />

◮ PQexecParams support (marooned on discussions about the <strong>in</strong>terface)<br />

◮ easy access to prepared statements (connected <strong>with</strong> the above)<br />

◮ asynchronous support for COPY<br />

◮ quot<strong>in</strong>g of identifiers<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 34 / 64


Outl<strong>in</strong>e<br />

ORMs<br />

Why would you even want one?<br />

1 The language<br />

2 Drivers<br />

3 ORMs<br />

Why would you even want one?<br />

Django ORM<br />

SQLAlchemy<br />

4 PL/<strong>Python</strong><br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 35 / 64


Reasons to use an ORM<br />

ORMs<br />

Why would you even want one?<br />

◮ keep<strong>in</strong>g the whole application <strong>in</strong> one language<br />

◮ application developers don’t know SQL well enough<br />

◮ for simple cases it can speed up development <strong>and</strong> help <strong>with</strong> migrat<strong>in</strong>g<br />

between schema versions<br />

◮ your PM doesn’t ask whether to use one, she asks which one to use<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 36 / 64


ORMs<br />

Why would you even want one?<br />

Use SQLAlchemy<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 37 / 64


Outl<strong>in</strong>e<br />

ORMs<br />

Django ORM<br />

1 The language<br />

2 Drivers<br />

3 ORMs<br />

Why would you even want one?<br />

Django ORM<br />

SQLAlchemy<br />

4 PL/<strong>Python</strong><br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 38 / 64


ORMs<br />

Django ORM<br />

Characteristics of the Django ORM<br />

◮ wildly popular, most people will end up hav<strong>in</strong>g contact <strong>with</strong> it<br />

◮ provides a h<strong>and</strong>y adm<strong>in</strong>istration tool if you use it across the board<br />

◮ <strong>in</strong>tegrates well <strong>with</strong> the rest of the framework, mak<strong>in</strong>g th<strong>in</strong>gs work<br />

seamlessly (until they break)<br />

◮ has a bunch of shortcom<strong>in</strong>gs<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 39 / 64


ORMs<br />

Django ORM<br />

Shortcom<strong>in</strong>gs of the Django ORM<br />

◮ no support for multiple column primary keys<br />

◮ little control over generated queries<br />

◮ creates <strong>and</strong> tears down connections for every request<br />

◮ no support for Postgres-specific datatypes<br />

◮ no support for stored procedures<br />

◮ for a long time some Django <strong>and</strong> Postgres comb<strong>in</strong>ations would leave<br />

open transactions hang<strong>in</strong>g<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 40 / 64


ORMs<br />

Cop<strong>in</strong>g <strong>with</strong> the Django ORM<br />

Django ORM<br />

◮ use a connection pooler<br />

◮ monitor long-runn<strong>in</strong>g transactions, upgrade to the latest version of<br />

Django <strong>and</strong> psycopg2<br />

◮ use psycopg2!<br />

◮ be careful to assess the trade-offs of us<strong>in</strong>g Django <strong>with</strong>out its ORM<br />

◮ the adm<strong>in</strong> won’t work<br />

◮ authentication is tied to the Django User model, so it won’t work out<br />

of the box<br />

◮ expect to jump through some hoops, but it’s doable<br />

◮ for simple CRUD applications, the Django ORM can actually be very<br />

useful (th<strong>in</strong>k 80/20)<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 41 / 64


Outl<strong>in</strong>e<br />

ORMs<br />

SQLAlchemy<br />

1 The language<br />

2 Drivers<br />

3 ORMs<br />

Why would you even want one?<br />

Django ORM<br />

SQLAlchemy<br />

4 PL/<strong>Python</strong><br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 42 / 64


Overview of SQLAlchemy<br />

ORMs<br />

SQLAlchemy<br />

◮ not just an ORM, but a complete SQL toolkit <strong>in</strong> <strong>Python</strong><br />

◮ comprises two ma<strong>in</strong> parts, the ORM <strong>and</strong> the Expression Language<br />

◮ if you dig long enough, you can do almost everyth<strong>in</strong>g<br />

◮ a complex beast, but well worth tam<strong>in</strong>g<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 43 / 64


ORM done right<br />

ORMs<br />

SQLAlchemy<br />

◮ def<strong>in</strong>e your models, tables <strong>and</strong> mapp<strong>in</strong>gs separately<br />

◮ models are Pla<strong>in</strong> Old <strong>Python</strong> objects, you can use them <strong>with</strong>out<br />

mapp<strong>in</strong>g them to database entities<br />

◮ you can also use the Declarative mode, where you def<strong>in</strong>e both the<br />

application objects <strong>and</strong> the database entities <strong>in</strong> one go<br />

◮ supports:<br />

◮<br />

◮<br />

◮<br />

◮<br />

◮<br />

CHECK constra<strong>in</strong>ts<br />

database-specific types <strong>and</strong> operators<br />

cascad<strong>in</strong>g updates <strong>and</strong> deletes<br />

schemas<br />

... are you drool<strong>in</strong>g yet?<br />

◮ it strives to get the “last 20%” right<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 44 / 64


ORMs<br />

Mapper vs declarative style<br />

SQLAlchemy<br />

Declare the object<br />

class Car(object):<br />

def __<strong>in</strong>it__(self, plate_no, make, price):<br />

self.plate_no = plate_no<br />

self.make = make<br />

self.price = price<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 45 / 64


ORMs<br />

Mapper vs declarative style<br />

SQLAlchemy<br />

Declare the mapp<strong>in</strong>g<br />

import sqlalchemy as sa<br />

from sqlalchemy import orm<br />

metadata = sa.MetaData()<br />

car = sa.Table(<br />

’car’, metadata,<br />

sa.Column(’plate_no’, sa.Unicode(length=6),<br />

primary_key=True),<br />

sa.Column(’make’, sa.UnicodeText(),<br />

sa.ForeignKey(’car_makes’, deferrable=True)),<br />

sa.Column(’price’, sa.Numeric, nullable=False))<br />

orm.mapper(Car, car)<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 46 / 64


ORMs<br />

Mapper vs declarative style cd.<br />

Declarative style<br />

SQLAlchemy<br />

import sqlalchemy as sa<br />

from sqlalchemy.ext import declarative<br />

Base = declarative.declarative_base()<br />

class Car(Base):<br />

__tablename__ = ’car’<br />

plate_no = sa.Column(sa.Unicode(length=6),<br />

primary_key=True)<br />

make = sa.Column(sa.UnicodeText(),<br />

sa.ForeignKey(’car_makes’,<br />

deferrable=True))<br />

price = sa.Column(sa.Numeric, nullable=False)<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 47 / 64


It’s all SQL<br />

ORMs<br />

SQLAlchemy<br />

Declarative style<br />

from sqlalchemy import schema<br />

pr<strong>in</strong>t(schema.CreateTable(car).compile())<br />

"""<br />

CREATE TABLE car (<br />

plate_no VARCHAR(6) NOT NULL,<br />

make TEXT,<br />

price NUMERIC NOT NULL,<br />

PRIMARY KEY (plate_no),<br />

FOREIGN KEY(make) REFERENCES car_makes (make)<br />

DEFERRABLE<br />

)<br />

"""<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 48 / 64


Expression language<br />

ORMs<br />

SQLAlchemy<br />

◮ either let the ORM h<strong>and</strong>le the query<strong>in</strong>g or construct expressions by<br />

h<strong>and</strong><br />

◮ all SQL constructs are supported, but s<strong>in</strong>ce it’s <strong>Python</strong> it’s more<br />

composable<br />

◮ allows you to rewrite places where the ORM does it wrong <strong>in</strong><br />

someth<strong>in</strong>g that feels like SQL<br />

◮ if there’s an SQL that can’t be generated, it’s a bug!<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 49 / 64


ORMs<br />

Expression language example<br />

SQLAlchemy<br />

Construct<strong>in</strong>g an expression<br />

exp = (car.update()<br />

.where(sa.sql.func.lower(car.c.make) == ’chevy’)<br />

.values(price=car.c.price + 2))<br />

pr<strong>in</strong>t exp<br />

"""<br />

UPDATE car SET price=(car.price + :price_1)<br />

WHERE lower(car.make) = :lower_1<br />

"""<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 50 / 64


ORMs<br />

Advantages over pla<strong>in</strong> SQL<br />

SQLAlchemy<br />

◮ more modular, allow<strong>in</strong>g for code reuse <strong>and</strong> delegation of concerns<br />

◮ have one part of the code generate the update values <strong>and</strong> other the<br />

clauses<br />

◮ document <strong>and</strong> test us<strong>in</strong>g st<strong>and</strong>ard <strong>Python</strong> approaches<br />

◮ use SQLAlchemy as a query generation backend <strong>and</strong> send literal SQL<br />

somewhere else for execution<br />

◮ put logic <strong>in</strong> mapped objects to save code<br />

◮ transparently encrypt a column us<strong>in</strong>g pg crypto<br />

◮ add generated columns to the objects, like first name || last name<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 51 / 64


Sessions <strong>and</strong> connections<br />

ORMs<br />

SQLAlchemy<br />

◮ all access to the database is done through a Session<br />

◮ a Session keeps track of the objects that got modified until flushed<br />

<strong>and</strong> committed or rolled back<br />

◮ Sessions check out <strong>and</strong> return database connections from a built-<strong>in</strong><br />

pool<br />

◮ you can keep Session objects around to ma<strong>in</strong>ta<strong>in</strong> long-lived<br />

connections, but remember to always close them after you’re done<br />

◮ typically, one Session per WSGI process<br />

◮ use scoped session to have a thread-local Session that can be shared<br />

◮ ensure the thread’s session is closed after request is done, for <strong>in</strong>stance<br />

us<strong>in</strong>g a middleware<br />

◮ clos<strong>in</strong>g a session does not close the underly<strong>in</strong>g connection!<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 52 / 64


Outl<strong>in</strong>e<br />

PL/<strong>Python</strong><br />

Use <strong>Python</strong> straight from the database<br />

1 The language<br />

2 Drivers<br />

3 ORMs<br />

4 PL/<strong>Python</strong><br />

Use <strong>Python</strong> straight from the database<br />

Best practices<br />

Tricks, skulduggery <strong>and</strong> black magic<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 53 / 64


What’s PL/<strong>Python</strong><br />

PL/<strong>Python</strong><br />

Use <strong>Python</strong> straight from the database<br />

◮ the ability to run a <strong>Python</strong> <strong>in</strong>terpreter <strong>in</strong>side the backend<br />

◮ runs as the backend’s OS user, so untrusted<br />

◮ can run arbitrary <strong>Python</strong> code, <strong>in</strong>clud<strong>in</strong>g do<strong>in</strong>g very nasty or really<br />

crazy th<strong>in</strong>gs<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 54 / 64


What’s PL/<strong>Python</strong><br />

PL/<strong>Python</strong><br />

Use <strong>Python</strong> straight from the database<br />

◮ the ability to run a <strong>Python</strong> <strong>in</strong>terpreter <strong>in</strong>side the backend<br />

◮ runs as the backend’s OS user, so untrusted<br />

◮ can run arbitrary <strong>Python</strong> code, <strong>in</strong>clud<strong>in</strong>g do<strong>in</strong>g very nasty or really<br />

crazy th<strong>in</strong>gs<br />

◮ but that’s the fun of it!<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 54 / 64


How does it work<br />

PL/<strong>Python</strong><br />

Use <strong>Python</strong> straight from the database<br />

◮ the first time a PL/<strong>Python</strong> function is run, a <strong>Python</strong> <strong>in</strong>terpreter is<br />

<strong>in</strong>itialised <strong>in</strong>side the backend process<br />

◮ preload plpython.so to avoid the <strong>in</strong>itial slowdown<br />

◮ use long-lived connections to only pay the overhead once<br />

◮ Postgres types are transformed <strong>in</strong>to <strong>Python</strong> types <strong>and</strong> vice versa<br />

◮ only works for built-<strong>in</strong> types, the rest gets passed us<strong>in</strong>g the str<strong>in</strong>g<br />

representation<br />

◮ there’s an extension module to parse hstore’s to <strong>and</strong> from <strong>Python</strong> dicts<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 55 / 64


How does it work cd.<br />

PL/<strong>Python</strong><br />

Use <strong>Python</strong> straight from the database<br />

◮ function arguments are visible as global variables<br />

◮ the function has access to various magic globals that describe the<br />

execution environment<br />

◮ the plpy module conta<strong>in</strong><strong>in</strong>g SPI <strong>and</strong> utility functions<br />

◮ a dictionary <strong>with</strong> the old <strong>and</strong> new tuples if called as a trigger<br />

◮ dictionaries kept <strong>in</strong> memory between queries, useful for caches<br />

◮ the module path depends on the postmaster’s PYTHONPATH<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 56 / 64


Outl<strong>in</strong>e<br />

PL/<strong>Python</strong><br />

Best practices<br />

1 The language<br />

2 Drivers<br />

3 ORMs<br />

4 PL/<strong>Python</strong><br />

Use <strong>Python</strong> straight from the database<br />

Best practices<br />

Tricks, skulduggery <strong>and</strong> black magic<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 57 / 64


PL/<strong>Python</strong><br />

Organis<strong>in</strong>g PL/<strong>Python</strong> code<br />

Best practices<br />

◮ keep your PL/<strong>Python</strong> code <strong>in</strong> a module<br />

◮ make all your SQL functions two-l<strong>in</strong>ers<br />

CREATE FUNCTION the_func(arg1 text, arg2 text)<br />

RETURNS INTEGER as $$<br />

from myapp.plpython import functions<br />

return functions.the_func(locals())<br />

$$ LANGUAGE plpythonu;<br />

◮ test the <strong>Python</strong> code by mock<strong>in</strong>g out magic variables<br />

◮ it’s a sharp tool, be careful<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 58 / 64


Outl<strong>in</strong>e<br />

PL/<strong>Python</strong><br />

Tricks, skulduggery <strong>and</strong> black magic<br />

1 The language<br />

2 Drivers<br />

3 ORMs<br />

4 PL/<strong>Python</strong><br />

Use <strong>Python</strong> straight from the database<br />

Best practices<br />

Tricks, skulduggery <strong>and</strong> black magic<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 59 / 64


PL/<strong>Python</strong><br />

Ideas for us<strong>in</strong>g PL/<strong>Python</strong><br />

Tricks, skulduggery <strong>and</strong> black magic<br />

◮ do<strong>in</strong>g numerical computations <strong>in</strong> the database <strong>with</strong> NumPy<br />

◮ writ<strong>in</strong>g a constra<strong>in</strong>t that checks if a column conta<strong>in</strong>s JSON<br />

◮<br />

◮<br />

or a protobuf stream<br />

or a PNG image<br />

◮ connect<strong>in</strong>g to other Postgres <strong>in</strong>stances <strong>and</strong> do<strong>in</strong>g th<strong>in</strong>gs to them<br />

◮ communicat<strong>in</strong>g <strong>with</strong> external services to <strong>in</strong>validate caches or trigger<br />

actions<br />

◮ check<strong>in</strong>g if an email field’s doma<strong>in</strong> has a valid MX record<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 60 / 64


PL/<strong>Python</strong> examples<br />

PL/<strong>Python</strong><br />

Tricks, skulduggery <strong>and</strong> black magic<br />

<strong>Us<strong>in</strong>g</strong> <strong>Python</strong> modules<br />

CREATE FUNCTION f<strong>in</strong>d_extension(extname TEXT)<br />

RETURNS TEXT[] as $$<br />

import difflib<br />

sql = ’select name from pg_available_extensions’<br />

result = plpy.execute(sql)<br />

names = [extension[’name’] for extension <strong>in</strong> result]<br />

return difflib.get_close_matches(extname, names)<br />

$$ LANGUAGE plpythonu;<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 61 / 64


PL/<strong>Python</strong> examples<br />

PL/<strong>Python</strong><br />

Tricks, skulduggery <strong>and</strong> black magic<br />

<strong>Us<strong>in</strong>g</strong> <strong>Python</strong> modules<br />

CREATE FUNCTION check_mx()<br />

RETURNS TRIGGER as $$<br />

from dns import resolver<br />

doma<strong>in</strong> = TD[’new’][’email’].split(’@’, 1)[1]<br />

try:<br />

resolver.query(doma<strong>in</strong>, ’MX’)<br />

except resolver.NoAnswer:<br />

plpy.error(’no MX record for doma<strong>in</strong> %s’ % doma<strong>in</strong>)<br />

$$ LANGUAGE plpythonu;<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 62 / 64


PL/<strong>Python</strong> examples<br />

PL/<strong>Python</strong><br />

Tricks, skulduggery <strong>and</strong> black magic<br />

<strong>PGCon</strong> schedule<br />

CREATE FUNCTION schedule(summary OUT TEXT,<br />

location OUT TEXT,<br />

start OUT TIMESTAMPTZ)<br />

RETURNS SETOF RECORD as $$<br />

import icalendar, requests<br />

resp = requests.get(GD[’url’])<br />

cal = icalendar.Calendar.from_ical(resp.content)<br />

for event <strong>in</strong> cal.walk(’VEVENT’):<br />

yield (event[’SUMMARY’], event[’LOCATION’],<br />

event[’DTSTART’].dt.isoformat())<br />

$$ LANGUAGE plpythonu;<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 63 / 64


PL/<strong>Python</strong><br />

Questions<br />

Questions?<br />

Jan Urbański (Ducksboard) <strong>On</strong> <strong>snakes</strong> <strong>and</strong> <strong>elephants</strong> <strong>PGCon</strong> 2012 64 / 64

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!