by James Armes (they/them)
I was recently working on a database migration from
Amazon Web Services
GovCloud to AWS Commercial. We had a production database that was initially launched in GovCloud despite not being a FISMA Federal Information Security Management Act
High workload. Other pieces of the stack had already been migrated, and the database, the most difficult piece to move, was the last piece remaining.
Our options were more limited as we were going across partitions and AWS services can’t communicate across partitions. This includes IAM trust relationships. Both the source and destination were RDS Postgres clusters.
After reviewing a number of options, replication using pglogical was an easy choice. To minimize downtime and allow for easier rollback, we opted to use bidirectional replication. This is where each cluster replicates changes in both directions. In our case, the GovCloud cluster would replicate changes to the Commercial cluster, and vice-versa.
Replication is the process of creating and maintaining one or more copies of a database. These copies are kept in sync with each other. Replication can be used to achieve high availability, improve performance, provide backups, allow for faster disaster recovery, and enable geo-distribution.
There are different types of database replication, including:
Unidirectional replication creates copies of a single, primary database to one or more replicas. These replicas could be used as read-only servers to help distribute database load, or one of them could be promoted if the primary goes down or requires maintenance. Unidirectional replication is the most common replication configuration.
Bidirectional replication creates copies in each direction. Each database is replicated to the others and vice-versa. This can create conflicts, especially if one or more nodes fall behind. Bidirectional replication is useful for geo-distribution, improved write performance by distributing queries, and migrations where you need the ability to quickly roll back without data loss.
The total number of nodes that you’ll need in your database cluster depends on your needs, and this is further complicated by DBaaS Database-as-a-Service providers that often have their own replicas. In fact, if you’re using a DBaaS provider, you may not need to configure replication unless you’re looking to migrate your database.
In general, the recommended number of nodes for ongoing replication is an odd number of three or more. The importance of an odd number is that clusters can participate in elections, where each node gets a vote. An odd number of nodes helps to avoid a tie that could result in data loss. These elections can be initiated in the case of a conflict during replication, when determining if a node is down, and promoting a node to primary.
pglogical is a PostgreSQL extension that provides logical replication capabilities. It supports both unidirectional, and bidirectional replication, as well as replication between different versions of PostgreSQL. This makes it a great option for database migrations.
There are some limitations to replication with pglogical (not an exhaustive list):
To set up replication using pglogical, you first need to install the pglogical extension on all databases that’ll be participating in replication. You should be able to find pglogical (it may be listed as “pg_logical”) in your systems package manager. For Ubuntu, you can use the following command to install pglogical for PostgreSQL 14:
apt install -y postgresql-14-pglogical
Once installed, you’ll need to configure PostgreSQL to load the extension and use logical replication. This can be done by configuring the following settings in your postgresql.conf:
wal_level = logical shared_preload_libraries = 'pglogical'
You’ll also need to allow scram-sha-256 authentication. Add the following to your pg_hba.conf:
host all all all scram-sha-256
After making these changes, you’ll need to restart PostgreSQL before they’ll be applied.
For RDS, you can modify these settings in the database parameter group.
rds.logical_replication=1 shared_preload_libraries = 'pglogical'
Before we jump into bidirectional replication, let’s get unidirectional replication setup. Since we’re trying to migrate a database, we’ll setup this up on two nodes called “source” and “destination.”
You will need to create a replication user on each node and that user will require the superuser privileges. For this example, we’ll connect to each node using basic authentication (username and password) as this is the only option available on AWS without being able to use IAM roles.
Connect to the source database and load the extension:
CREATE EXTENSION pglogical;
Now add this node to pglogical. This will need to match the settings we use when subscribing from the destination.
BEGIN; SET LOCAL log_statement = 'none'; SET LOCAL log_min_duration_statement = -1; SELECT pglogical.create_node( node_name := 'source', dsn := 'host=source.example.com port=5432 sslmode=require dbname=databasename user=replication password=********' ); COMMIT;
When you load pglogical, it creates three replication sets:
We’re going to be using the default replication set, which should be sufficient for most situations. You can also define your own replication sets. See the pglogical documentation for more information on that.
We can add all existing tables to the default replication set with a single command, adding any schemas that you want to replicate to the array:
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);
Our source node is now ready to replicate data to any subscribers! Before we’re done with the source, you’ll need the database schema to exist on the destination. If you already have a SQL file or some other way of building it on your destination, then great! If not, you can grab it from the source using pg_dump (make sure to add any necessary connection flags):
pg_dump --schema-only databasename > schema.sql
Okay, now we’re done with the source (for now). On the destination, start by adding the schema. If you used the pg_dump command above, you can feed that file to the psql client:
psql databasename < schema.sql
Now, let’s repeat the first few steps from source. We’re going to load the extension and add the node:
CREATE EXTENSION pglogical; BEGIN; SET LOCAL log_statement = 'none'; SET LOCAL log_min_duration_statement = -1; SELECT pglogical.create_node( node_name := 'destination', dsn := 'host=destination.example.com port=5432 sslmode=require dbname=databasename user=replication password=********' ); COMMIT;
Since we’re only setting up unidirectional replication at this point, we don’t need to setup any replication sets. We do, however, need to subscribe to the source database:
BEGIN; SET LOCAL log_statement = 'none'; SET LOCAL log_min_duration_statement = -1; SELECT pglogical.create_subscription( subscription_name := 'source', provider_dsn := 'host=source.example.com port=5432 sslmode=require dbname=databasename user=replication password=********' ); COMMIT;
Data should now begin replicating from the source to the destination. You can
monitor progress using the
Now that our destination has caught up, let’s start replicating data the other way. We’re going to be repeating some steps from above, just on different nodes.
To start, let’s add our tables from the destination to the default replication set (I did say “yet”). Once again, you can add all the schemas to be replicated to the array.
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);
Now let’s hop over to the source database and subscribe to the destination:
BEGIN; SET LOCAL log_statement = 'none'; SET LOCAL log_min_duration_statement = -1; SELECT pglogical.create_subscription( subscription_name := 'destination', provider_dsn := 'host=destination.example.com port=5432 sslmode=require dbname=databasename user=replication password=********' ); COMMIT;
And that’s it! We’ve just setup bidirectional replication between two PostgreSQL databases. Like anything else though, it’s not quite that simple.
Sequences, sometimes called autoincrement columns, are commonly used to create unique ids as a primary key. On the backend, the database keeps track of the current value for the field. When a new record is added, it uses the latest available value and increments the counter.
This gets complicated when you have multiple places that writes can happen. Not only do new records need to be synced, but also the current state of the counter. Two records could be written to different nodes in close enough proximity that they end up with the same id.
pglogical handles sequences separate from tables. You need to explicitly add sequences to the replication set in addition to the tables. This can be done by adding all sequences at once:
SELECT pglogical.replication_set_add_all_sequences('default', ARRAY['public']);
When new nodes subscribe to the replication set, it will sync the sequences and create an offset for each node. This way, there won’t be any overlap. The sequences are then re-synced on a periodic basis.
Let’s say we have a table called people with columns id and name. id is a sequence column. This table is empty at the time that we setup replication, so the sequences have been synced before any data has been added.
If we create our first record in the source database, we’ll get something like the following:
If we then create our second record, but this time on the destination database, we get something similar to:
You can see that the record we added to the primary was replicated over and we have a new record with the id 1001. This gives us some breathing room before we would have to deal with a sequence collision.
A sequence is a database object in PostgreSQL that’s used to generate unique numeric identifiers. Sequences are often used to generate primary key values for tables. When you create a new table, you can specify that the primary key column should use a sequence for its default value.
If you want to manually trigger a sync of sequences, you can use the following to synchronize a single sequence:
Or all sequences:
SELECT pglogical.synchronize_sequence(seqoid) FROM pglogical.sequence_state;
DDL, or Data Definition Language, is a subset of SQL for creating and modifying objects in a database schema. This includes statements such as CREATE, ALTER, and DROP. These statements aren’t replicated by pglogical unless it is explicitly told to do so.
If you want to replicate a DDL statement, such as ALTER TABLE, you’ll need to do so using the replicate_ddl_command function. For example:
SELECT pglogical.replicate_ddl_command('ALTER TABLE public.people ADD COLUMN notes TEXT');
This will add the column to the table locally, then add it to the ddl_sql replication set for any subscribers.
In order to test these configurations, I put together a docker compose file. It launches two containers running PostgreSQL with pglogical installed and loaded. A container with pgAdmin, a web-based management interface, is launched and exposed over localhost port 8080.
I have made this available on GitHub for anyone who’d like to give it a try. The default credentials are documented in the README along with other details.tags: databases - docker - postgresql - replication