TiFlow is a data replication and migration platform for TiDB/TiKV clusters. It provides two primary subsystems: TiCDC, which captures and replicates change data from TiDB/TiKV clusters, and DM (Data Migration), which migrates data from MySQL/MariaDB to TiDB.
This document provides a high-level overview of the TiFlow platform architecture, its major components, and design philosophy.
The following diagram illustrates the overall TiFlow platform architecture, showing how TiCDC and DM integrate with external systems:
Sources: go.mod1-128 Makefile1-130 cdc/model/owner.go1-24 dm/master/server.go14-66 dm/worker/server.go14-43
TiCDC is a distributed system for capturing and replicating change data from TiDB/TiKV clusters in real time. It runs as a cluster of Capture nodes: one is elected Owner and handles orchestration, while the others act as Processors that replicate tables.
| Component | Code Location | Purpose |
|---|---|---|
| CDC Server | cmd/cdc/main.go | Entry point for TiCDC service |
| Capture | cdc/capture/ | Node in TiCDC cluster, runs Owner or Processor |
| Owner | cdc/owner/ | Leader that schedules tables and manages changefeeds |
| Processor | cdc/processor/ | Worker that replicates assigned tables |
| KV Client | cdc/kv/client.go | Pulls change events from TiKV |
| Mounter | cdc/entry/mounter.go | Decodes KV events into row changes |
| Sink | cdc/sink/ | Writes data to downstream systems |
TiCDC Architecture Details: TiCDC uses a distributed architecture where the Owner (elected via etcd) manages changefeed lifecycle and schedules table replication across Processor nodes. Each processor pulls KV events from TiKV using gRPC streaming, sorts them by commit timestamp, decodes them using schema information, and writes to configured downstream sinks with at-least-once semantics.
Sources: cdc/model/owner.go14-144 dm/master/server.go102-139 pkg/errors/cdc_errors.go1-333
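The sort-then-write step of the TiCDC processor pipeline described above can be sketched as follows. This is a minimal illustration, not TiCDC's actual code: `RowEvent`, `Sink`, `memorySink`, and `flushSorted` are hypothetical names introduced here, and the real sorter and sink interfaces are considerably more involved.

```go
package main

import (
	"fmt"
	"sort"
)

// RowEvent is a hypothetical, simplified stand-in for a decoded row change.
type RowEvent struct {
	Table    string
	CommitTs uint64
	Payload  string
}

// Sink is a hypothetical minimal downstream interface (not TiCDC's real one).
type Sink interface {
	Write(ev RowEvent) error
}

// memorySink records events in the order they are written.
type memorySink struct{ written []RowEvent }

func (s *memorySink) Write(ev RowEvent) error {
	s.written = append(s.written, ev)
	return nil
}

// flushSorted orders a batch by commit timestamp and writes it to the sink.
// It returns how many events were written before the first error. A caller
// that retries the whole batch after a failure may write some events twice,
// which at-least-once delivery permits.
func flushSorted(events []RowEvent, sink Sink) (int, error) {
	sorted := make([]RowEvent, len(events))
	copy(sorted, events) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].CommitTs < sorted[j].CommitTs
	})
	for i, ev := range sorted {
		if err := sink.Write(ev); err != nil {
			return i, err
		}
	}
	return len(sorted), nil
}

func main() {
	sink := &memorySink{}
	events := []RowEvent{
		{Table: "t1", CommitTs: 30, Payload: "c"},
		{Table: "t1", CommitTs: 10, Payload: "a"},
		{Table: "t1", CommitTs: 20, Payload: "b"},
	}
	n, err := flushSorted(events, sink)
	fmt.Println("written:", n, "err:", err)
	for _, ev := range sink.written {
		fmt.Println(ev.CommitTs, ev.Payload)
	}
}
```

Because a retry can replay events that were already written, a downstream consumer in this model would need to tolerate duplicates.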
DM (Data Migration) uses a master-worker architecture for migrating data from MySQL/MariaDB to TiDB. It supports full data migration (dump + load) and incremental synchronization via binlog replication.
| Component | Code Location | Purpose |
|---|---|---|
| DM-Master | cmd/dm-master/, dm/master/server.go | Cluster coordinator, task scheduler |
| DM-Worker | cmd/dm-worker/, dm/worker/server.go | Executes migration tasks |
| dmctl | cmd/dm-ctl/, dm/ctl/ | Command-line control tool |
| Scheduler | dm/master/scheduler/ | Assigns tasks to workers |
| Syncer | dm/syncer/syncer.go | Binlog replication unit |
| Dumper | dm/dumpling/ | Full data dump unit |
| Loader | dm/loader/ | Data load unit |
DM Architecture Details: DM-Master nodes form a cluster with leader election. The leader master schedules tasks to workers via gRPC. Each worker processes one MySQL source through a pipeline of units: Dumper exports data, Loader imports to TiDB, and Syncer performs continuous binlog replication. The system supports shard DDL coordination for merging multiple MySQL schemas.
Sources: dm/master/server.go68-155 dm/worker/server.go44-95 dm/worker/subtask.go42-125 dm/ctl/ctl.go1-34
TiFlow provides common infrastructure shared by both TiCDC and DM subsystems.
The project is built through a Makefile and uses Go modules for dependency management:
Key Build Configurations:
- go 1.25.5 (go.mod3)
- make cdc → bin/cdc (Makefile160-161)
- make dm → bin/dm-master, bin/dm-worker, bin/dmctl (Makefile401-418)
- kafka_consumer, storage_consumer, pulsar_consumer (Makefile163-171)

Sources: Makefile1-130 go.mod1-128
TiFlow uses a centralized RFC-style error framework with structured error codes:
Error Framework Structure:
- errors.toml defines more than 1000 error codes with descriptions (errors.toml1-1000)
- errors.Normalize() is used for consistent error creation (pkg/errors/cdc_errors.go23-333)
- Error codes follow the format Component:ErrName (e.g., CDC:ErrChangeFeedNotExists) (errors.toml25-28)

Sources: errors.toml1-130 pkg/errors/cdc_errors.go14-333
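To make the Component:ErrName convention concrete, the sketch below splits a code such as CDC:ErrChangeFeedNotExists into its two parts using only the Go standard library. `ErrorCode` and `parseErrorCode` are hypothetical helpers written for illustration. They are not part of pkg/errors, and the validation rule (both parts non-empty) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// ErrorCode is a hypothetical parsed form of a "Component:ErrName" code.
type ErrorCode struct {
	Component string
	Name      string
}

// parseErrorCode splits a code at the first colon and rejects codes that
// lack a separator or have an empty component or name (assumed rule).
func parseErrorCode(code string) (ErrorCode, error) {
	component, name, found := strings.Cut(code, ":")
	if !found || component == "" || name == "" {
		return ErrorCode{}, fmt.Errorf("malformed error code %q", code)
	}
	return ErrorCode{Component: component, Name: name}, nil
}

func main() {
	for _, code := range []string{"CDC:ErrChangeFeedNotExists", "no-separator"} {
		ec, err := parseErrorCode(code)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("component=%s name=%s\n", ec.Component, ec.Name)
	}
}
```

A structured code of this kind lets tooling group or filter errors by component without parsing free-form messages.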
| Configuration Type | Code Location | Description |
|---|---|---|
| ServerConfig | pkg/config/ | Global TiCDC server settings |
| ReplicaConfig | cdc/model/ | Per-changefeed configuration |
| SinkConfig | cdc/sink/ | Downstream sink parameters |
| TaskConfig | dm/config/ | DM task configuration |
| SubTaskConfig | dm/config/ | DM subtask configuration |
Configuration Loading Priority: CLI args → Config files (TOML) → etcd runtime config → Defaults
Sources: dm/master/server.go102-106 dm/worker/server.go86-87
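The loading priority above (CLI args, then config files, then etcd runtime config, then defaults) amounts to a first-match lookup over ordered layers. The following sketch shows that idea only. The `layer` type, the `resolve` function, and the setting keys and values are hypothetical and are not taken from TiFlow's configuration code.

```go
package main

import "fmt"

// layer is a hypothetical named source of settings.
type layer struct {
	name   string
	values map[string]string
}

// resolve returns the value from the highest-priority layer that defines
// the key, along with that layer's name. Layers must be ordered from
// highest to lowest priority.
func resolve(key string, layers []layer) (value, source string, ok bool) {
	for _, l := range layers {
		if v, found := l.values[key]; found {
			return v, l.name, true
		}
	}
	return "", "", false
}

func main() {
	// Keys and values below are illustrative assumptions.
	layers := []layer{
		{name: "cli", values: map[string]string{"log-level": "debug"}},
		{name: "file", values: map[string]string{"log-level": "info", "addr": "127.0.0.1:8300"}},
		{name: "etcd", values: map[string]string{}},
		{name: "defaults", values: map[string]string{"log-level": "warn", "addr": "0.0.0.0:8300", "data-dir": "/tmp/data"}},
	}
	for _, key := range []string{"log-level", "addr", "data-dir"} {
		v, src, _ := resolve(key, layers)
		fmt.Printf("%s=%s (from %s)\n", key, v, src)
	}
}
```

Returning the source layer alongside the value makes it easy to log where each effective setting came from.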
TiFlow provides comprehensive monitoring through Prometheus metrics and Grafana dashboards:
| Component | Location | Purpose |
|---|---|---|
| Metrics Code | Various *_metrics.go files | Prometheus metric definitions |
| Grafana Dashboards | metrics/grafana/ticdc.json | Pre-built visualization panels |
| Alert Rules | metrics/ticdc.rules.yml | Alertmanager configurations |
Key Metrics Categories:
Sources: dm/master/server.go42 Makefile197-198
TiFlow integrates with multiple external systems for coordination, data sources, and data targets.
Both TiCDC and DM rely on etcd for distributed coordination:
etcd Usage:
Sources: dm/master/server.go68-139 dm/worker/server.go70-71
TiFlow supports multiple downstream systems:
| Target Type | TiCDC Support | DM Support | Code Location |
|---|---|---|---|
| MySQL | ✓ | ✓ | cdc/sink/mysql/, dm/syncer/ |
| TiDB | ✓ | ✓ | Same as MySQL |
| Kafka | ✓ | ✗ | cdc/sink/kafka/ |
| Pulsar | ✓ | ✗ | cdc/sink/pulsar/ |
| Cloud Storage | ✓ | ✗ | cdc/sink/cloudstorage/ |
Protocol Support (TiCDC):
Sources: go.mod6-127 pkg/errors/cdc_errors.go172-333
TiFlow's architecture reflects several key design principles:
Both TiCDC and DM are designed as distributed systems that scale horizontally:
TiCDC provides a unified sink interface for extensibility:
- Sink interface abstracts downstream systems

Both systems aim for data consistency:
Centralized error framework enables:
Rich instrumentation throughout the codebase:
Sources: dm/master/server.go68-155 cdc/model/owner.go1-182 errors.toml1-130 Makefile197-198
High-level directory organization of the TiFlow repository:
| Directory | Purpose | Key Entry Points |
|---|---|---|
| cdc/ | TiCDC subsystem | capture/, owner/, processor/, sink/ |
| dm/ | DM subsystem | master/, worker/, syncer/, loader/ |
| cmd/ | Binary entry points | cdc/, dm-master/, dm-worker/, dm-ctl/ |
| pkg/ | Shared utilities | config/, errors/, etcdutil/, log/ |
| engine/ | Dataflow engine (new) | Experimental distributed execution engine |
| metrics/ | Observability | grafana/, alert rules |
| tests/ | Integration tests | integration_tests/ |
| deployments/ | Deployment configs | Docker Compose, Kubernetes manifests |
Binary Outputs (from make build):
- bin/cdc - TiCDC server
- bin/dm-master - DM master server
- bin/dm-worker - DM worker server
- bin/dmctl - DM control CLI

Sources: Makefile141-142 go.mod1-3
To build TiFlow from source:
Build Requirements:
Sources: Makefile1-142 go.mod1-5