What is change data capture?
Change data capture (CDC) is a method for keeping systems in sync by streaming only what changed, rather than re-querying everything periodically. When a record is inserted, updated, or deleted in the source database, CDC captures that event and propagates it to downstream systems in near real time.
Traditional batch ETL pulls data on a schedule — nightly, hourly, or whatever cadence the pipeline is configured to run on. CDC is event-driven: the moment a change happens, it moves. The result is downstream systems that are current, not hours behind.
CDC typically works by reading the database transaction log (binlog in MySQL, WAL in Postgres) rather than querying tables directly. That makes it low-impact on the source system and capable of capturing deletes — something polling-based approaches miss entirely.
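To make the mechanism concrete, here is a minimal sketch of applying log-derived change events to a downstream replica. The event shape (`op`, `key`, `row`) is illustrative — loosely modeled on common CDC payloads, not any specific connector's format — but it shows why deletes come through cleanly:

```python
# Minimal sketch: applying change events (as read from a transaction
# log, in commit order) to an in-memory downstream replica.
# The event fields (op, key, row) are illustrative assumptions.

def apply_event(replica: dict, event: dict) -> None:
    """Apply one change event to the downstream replica."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["row"]   # upsert the new row image
    elif op == "delete":
        replica.pop(key, None)        # deletes are captured -- polling would miss this

# Events as they would arrive from the log, in commit order.
events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "plan": "free"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "plan": "pro"}},
    {"op": "delete", "key": 1, "row": None},
]

replica: dict = {}
for e in events:
    apply_event(replica, e)

print(replica)  # {} -- the delete reached the replica
```

A polling approach that re-queries the table would simply stop seeing row 1 and could not tell a delete apart from a row it never matched.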
Why it matters
Stale data is one of the most expensive invisible problems in enterprise operations. Analytics teams waiting on overnight exports, AI agents reasoning on yesterday’s customer records, CRM data that doesn’t reflect this morning’s activity — all of it traces back to batch pipelines that were acceptable when real-time was hard.
Real-time expectations have raised the bar. Customers expect instant updates. Agents need current data to act correctly. RevOps teams can’t route leads on stale scoring. CDC is the foundation that makes real-time data architectures work.
The tradeoff is operational complexity. Managing CDC connectors, handling schema changes, and dealing with at-least-once delivery semantics requires platform support — it’s not something you want to build from scratch.
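At-least-once delivery means the same event can arrive twice, so downstream consumers must apply events idempotently. One common pattern is to track a high-water mark on the log position and skip anything already seen — sketched below with illustrative field names (`lsn`, `op`, `key`, `row`), not any particular platform's schema:

```python
# Sketch: idempotent apply under at-least-once delivery. Each event
# carries its log position (lsn); duplicates at or below the
# high-water mark are ignored. Field names are assumptions.

def apply_once(replica: dict, applied_lsn: int, event: dict) -> int:
    """Apply an event only if its log position is new; return the new high-water mark."""
    if event["lsn"] <= applied_lsn:
        return applied_lsn            # redelivered duplicate: ignore
    if event["op"] == "delete":
        replica.pop(event["key"], None)
    else:
        replica[event["key"]] = event["row"]
    return event["lsn"]

events = [
    {"lsn": 101, "op": "insert", "key": 7, "row": {"stage": "lead"}},
    {"lsn": 102, "op": "update", "key": 7, "row": {"stage": "won"}},
    {"lsn": 102, "op": "update", "key": 7, "row": {"stage": "won"}},  # redelivered
]

replica, hwm = {}, 0
for e in events:
    hwm = apply_once(replica, hwm, e)

print(replica[7]["stage"])  # won -- the duplicate was a no-op
```

In production this bookkeeping, plus connector management and schema-change handling, is exactly the platform work the paragraph above warns against building yourself.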
Change data capture at Tray.ai
Data Integration in Tray.ai includes native CDC support — so pipelines into Snowflake, Databricks, Redshift, and BigQuery stay current without re-pulling full datasets. Pair it with the SQL Transformer to reshape and join data in-flight before it lands in the warehouse.