Brandon Konkle
Brandon Konkle

Principal Engineer, type system nerd, Rust enthusiast, supporter of social justice, loving husband & father, avid comic & manga reader, 日本語を勉強してる。

I’m a Software Architect with more than 15 years of experience creating high performance server and front-end applications targeting web and mobile platforms, & today I lead a team at Formidable Labs.



Postgres 9 Streaming Replication and Django-Balancer

Over the past couple of weeks I've had the opportunity to dig in to Postgres 9's streaming replication. What I found was a relatively easy way to configure a system for basic replication with very impressive speed. Postgres's streaming replication is an enhancement to the log-shipping warm standby that was available in previous versions. Instead of replaying write-ahead log (WAL) files as they become available on the slave's filesystem, the slave nodes connect directly to the master and stream the log as it generated. This results in extremely fast replication, typically under a second.

It can't match the cascading features of Slony, but if you're looking for a simple structure with a master database feeding one or more read-only slaves then Postgres 9 is a great choice.

Here's how I got everything working on Ubuntu 10.04.

UPDATE: I've changed the NFS details below to squash the user and group IDs and set the anonymous access user and group to the slave server's user and group IDs. This resolves issues where the postgres use has a different ID on the slave server than the master server, so files received from master are owned by the wrong user.

Postgres 9

Postgres 9 isn't included in the Lucid repos (or Maverick, for that matter), so you'll need to use a PPA. I used Martin Pitt's.

$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:pitti/postgresql
$ sudo apt-get update
$ sudo apt-get install postgresql postgresql-server-dev-9.0 postgresql-contrib-9.0

If you're using GeoDjango, you'll need to re-compile the requirements from source after 9.0 is installed. Make sure to use Postgis 1.5.2, which is the first release to support Postgres 9.

Because the default encoding is ASCII, you'll need to drop the cluster, and re-initialize it as UTF8.

$ sudo -u postgres pg_dropcluster --stop 9.0 main
$ sudo -u postgres pg_createcluster --start -e UTF-8 9.0 main

At this point, you can restore any database backups needed. If you'd like, you can begin accepting live queries - the base backup needed below can run while the database is still available.

NFS Sharing

I'm using log-shipping and streaming replication together, just in case something goes wrong with the direct connection. I use NFS to share a folder on the slave node and mount it on the master, so that the master can send WAL files to the slave.

On the Slave Server

$ sudo apt-get install nfs-kernel-server nfs-common portmap

Note: If you're on Rackspace, you'll have trouble with nfs-kernel-server. See my post here for details on how to work around it.

$ sudo dpkg-reconfigure portmap

Select 'no'.

$ sudo restart portmap
$ sudo emacs /etc/exports

Find out the user and group IDs for postgres. You'll use them in the next step.

$ id -u postgres
$ id -g postgres

Add a line like this:


Replace the CIDR notation with your own. Use a CIDR calculator if needed. Replace the anonuid and anongid with the appropriate values for your server.

Then, restart the server and export your shares. Make sure the path exists on the filesystem before doing so, and adjust the permissions as needed so that the path can be written to.

$ sudo /etc/init.d/nfs-kernel-server restart
$ sudo exportfs -a

On the Master Server

$ sudo apt-get install nfs-common portmap
$ sudo emacs /etc/fstab

Add a line like this:

slavehostname:/path/for/backups /path/for/mount nfs rsize=8192,wsize=8192,timeo=14,intr 0 0

Then mount it:

$ sudo mount /path/for/mount

Test the NFS connection by touching a new file on the primary machine, and making sure it can be seen on the standby machine.


On the Master Server

Edit postgresql.conf to enable WAL archiving:

wal_level = hot_standby

archive_mode = on

archive_command = 'cp -i %p /path/for/mount/%f </dev/null'

max_wal_senders = 1

Adjust max_wal_senders to the number of slave servers you plan on using.

Create a superuser for streaming replication:


Modify pg_hba.conf to allow the replication user to connect:

host    replication     myuser             md5

You'll need to restart Postgres at this point.

Base Backup

Next, a base backup needs to be created. This must be a filesystem-level backup, not a logical backup like the pg_dump command creates. On the master server, run the pg_start_backup command:

SELECT pg_start_backup('base_backup');

Then, tar and bzip the data directory:

$ cd /var/lib/postgresql/9.0/
$ sudo tar -cjf ~/postgres-data.tar.bz2 main

This will take awhile. Once it's done, run the pg_stop_backup command:

SELECT pg_stop_backup();

Then, copy the archive to the standby server. Make sure postgresql is not running on the standby, and unzip the archive:

$ cd /var/lib/postgresql/9.0/

# Make SURE you're on the standby server here
$ sudo rm -rf main
$ sudo tar -xjf ~/postgres-data.tar.bz2
$ sudo chown -R postgres:postgres main

On the Slave Server

Edit postgresql.conf to enable hot standby:

hot_standby = on

Create a recovery.conf in /var/lib/postgresql/9.0/main/ similar to this::

standby_mode = 'on'
primary_conninfo = 'host= port=5432 user=myuser password=mypassword'
restore_command = 'cp /path/for/backups/%f %p'
trigger_file = '/path/for/backups/trigger_file'
archive_cleanup_command = 'pg_archivecleanup /path/for/backups %r'

The postgresql-contrib-9.0 package doesn't link pg_archivecleanup into the path. You can do so manually:

$ sudo ln -s /usr/lib/postgresql/9.0/bin/pg_archivecleanup /usr/bin/

Smoke Test

To make sure everything is working properly, first check and make sure wal sender and receiver processes are live on the master and slave servers. On the primary, run ps -ef | grep "wal sender" and make sure you see a wal sender process. On the standby run ps -ef | grep "wal receiver".

Next, create a test table on the primary:

CREATE TABLE test (test varchar(30));
INSERT INTO test VALUES ('Testing 1 2 3');

On the standby, query the table:


You should see the following output:

 Testing 1 2 3
(1 row)

You can then delete your test table:


Load Balancing

For load balancing, I decided to leverage the excellent multidb capability introduced in Django 1.2. To do this, I created the django-balancer project. It builds on the example of database routers in the Django docs, a post on Eli Bendersky's blog, and Jeff Balogh's django-multidb-router project.

For my project, I decided on the WeightedMasterSlaveRouter. I want to include my master database in the read pool, but to a lesser extent than my slave database

I borrowed master-pinning from Jeff Balogh's project for django-balancer, but I ended up not needing it in my project. In my tests, I never had an issue with the databases being out of sync. The replication was really that fast! Even in situations where a user just created an object and was being rerouted to a rendered view of that object, the replication was speedy enough to keep up and prevent a 404.


Install with pip:

$ pip install django-balancer

Then, add the desired router to your DATABASE_ROUTERS setting. Also add any settings required by that router.

DATABASE_ROUTERS = ['balancer.routers.WeightedMasterSlaveRouter']
    'default': 1,
    'slave1': 2,

Make sure to configure your DATABASES setting to add the new slave database.

And with that, you should be all set!

Create Your Own

You can also use the classes and mixins inside the django-balancer project to create your own routers. If you come up with one that you'd like to share, fork me on Github. I'd be glad to pull it into the project.

I’m a Software Architect with more than 15 years of experience creating high performance server and front-end applications targeting web and mobile platforms, & today I lead a team at Formidable Labs.

View Comments