Lessons Learned from Running Debezium with PostgreSQL on Amazon RDS

https://debezium.io/blog/2020/02/25/lessons-learned-running-debezium-with-postgresql-on-rds/

We work in the logistics landscape and hence most of the software we write is focused around state - status changes of a shipment, tracking location updates, collecting real time data and reacting to it. The most common place where you might find "state" in any software architecture is the database. We maintain all of our transactional data primarily in a document database like MongoDB and in relational database systems (specifically PostgreSQL) for different services within the organisation. There is a need to allow efficient and near-real time analysis of the transactional data across all the different data sources to allow surfacing insights and looking at the big picture of how the organisation is doing and to make data driven decisions.

To solve the above goal, we are using Debezium to perform Change Data Capture on our transactional data to make it available in Kafka, our choice of message broker. Once that data is available in Kafka we can do one or all of the following:

Perform streaming joins or data enrichment across change streams of different relational tables (maybe even from different databases or services altogether. eg. Enriching shipments with trip and vehicle data)
Creating domain events from change streams for consumption by downstream services (eg. Creating an aggregate message with order, shipment and product information from three different change streams)
Moving the change stream data into a data lake to allow for disaster recovery or replaying part of the data
Complex event processing to generate real time metrics and power dashboards (eg. Live count of items in transit, average trip time within each region etc.)

Debezium makes all of those above use-cases possible and very easy to build by providing a common platform and framework to connect our existing data sources like MongoDB, PostgreSQL or MySQL.

This article is going to share our learnings with using Debezium on AWS RDS (AWS’s managed database service) and hopefully help transfer some knowledge we’ve gained in that process and also document how to skip unparseable records from PostgreSQL’s WAL until DBZ-1760 is fixed (already implemented and scheduled for the next Debezium 1.1 preview release).

Here’s a brief architecture overview that shows a few of the use-cases that Debezium is powering and the general data platform.

Figure 1. Current Architecture

But to get to the above was an iterative process which took a lot of experimentation and trial and error.