Apache SeaTunnel is a multimodal, high-performance, distributed data integration platform designed for synchronizing large volumes of data across diverse systems, and it targets the common challenges of data integration at that scale.
SeaTunnel provides a unified Connector API v2 that enables the same connectors to run unchanged across multiple execution engines (SeaTunnel Zeta, Apache Flink, Apache Spark). It is licensed under the Apache License 2.0 and maintained by the Apache Software Foundation.
This document covers SeaTunnel's architecture, component responsibilities, and distribution model.
Sources: README.md30-66 pom.xml32-34 seatunnel-dist/release-docs/LICENSE1-18
SeaTunnel is organized into seven major architectural layers, each serving distinct responsibilities in the data integration lifecycle.
Diagram: SeaTunnel System Architecture and Component Organization
Sources: README.md67-71 pom.xml36-56 seatunnel-dist/pom.xml38-72
SeaTunnel is organized as a Maven multi-module project with clear separation of concerns. The parent POM pom.xml27-29 defines the project coordinates org.apache.seatunnel:seatunnel and manages dependencies across all modules.
Key Modules:
| Module | Location | Purpose |
|---|---|---|
| seatunnel-api | pom.xml47 | Connector API v2 interfaces: SeaTunnelSource, SeaTunnelSink, SeaTunnelTransform |
| seatunnel-core | pom.xml44 | Engine starters for Flink (1.13/1.15/1.20) and Spark (2.4/3.3) |
| seatunnel-engine | pom.xml51 | Native Zeta engine implementation with master-worker architecture |
| seatunnel-connectors-v2 | pom.xml46 | 100+ connector implementations organized by type |
| seatunnel-transforms-v2 | pom.xml45 | Transform plugins including SQL engine with 100+ functions |
| seatunnel-translation | pom.xml48 | Translation layer converting logical DAGs to engine-specific execution plans |
| seatunnel-plugin-discovery | pom.xml49 | SPI-based plugin discovery with @AutoService support |
| seatunnel-formats | pom.xml50 | Format support: text, CSV, JSON, Parquet, ORC, Avro, Excel, XML |
| seatunnel-dist | seatunnel-dist/pom.xml29 | Distribution assembly and packaging |
| seatunnel-e2e | pom.xml53 | End-to-end test framework using Testcontainers |
The project uses Java 8 pom.xml63 and Scala 2.12.15 pom.xml64 as base language versions. It maintains compatibility with multiple engine versions through separate starter modules and careful dependency scoping.
Sources: pom.xml16-56 seatunnel-connectors-v2/pom.xml28-93
SeaTunnel's defining architectural feature is engine independence: the same connector can execute on three different compute engines without modification. This is achieved through the Translation Layer pom.xml48, which converts the unified logical DAG into engine-specific execution plans.
Diagram: Multi-Engine Architecture with Unified Connector API
Engine-Specific Starters:
Each engine has dedicated starter JARs packaged in the distribution seatunnel-dist/src/main/assembly/assembly-bin.xml149-167:
- Zeta: seatunnel-starter.jar in starter/ directory
- Flink: seatunnel-flink-{13,15,20}-starter.jar with version-specific bindings
- Spark: seatunnel-spark-{2,3}-starter.jar for Spark 2.4 and 3.3

The binary distribution includes all starters simultaneously, allowing users to switch engines by changing command-line parameters without reinstalling.
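The starter naming scheme above can be sketched as a small lookup. This is a minimal illustration, not part of SeaTunnel: the function name `starter_jar` and the `SUPPORTED` table are hypothetical, and only the JAR name patterns come from the list above.

```python
# Hypothetical helper: resolves the starter JAR name for an engine.
# Only the file name patterns are taken from the distribution layout;
# the function and the lookup table are illustrative assumptions.
SUPPORTED = {
    "zeta": {None},               # Zeta has a single, unversioned starter
    "flink": {"13", "15", "20"},  # seatunnel-flink-{13,15,20}-starter.jar
    "spark": {"2", "3"},          # seatunnel-spark-{2,3}-starter.jar
}


def starter_jar(engine, variant=None):
    """Return the starter JAR file name for an engine and variant."""
    engine = engine.lower()
    if engine not in SUPPORTED:
        raise ValueError(f"unknown engine: {engine}")
    if variant not in SUPPORTED[engine]:
        raise ValueError(f"unsupported {engine} variant: {variant}")
    if engine == "zeta":
        return "seatunnel-starter.jar"
    return f"seatunnel-{engine}-{variant}-starter.jar"


print(starter_jar("zeta"))
print(starter_jar("flink", "15"))
print(starter_jar("spark", "3"))
```

Because every starter ships in the same archive, switching engines amounts to picking a different entry from this table rather than installing anything new.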
Sources: pom.xml77-82 seatunnel-dist/pom.xml117-154 seatunnel-dist/src/main/assembly/assembly-bin.xml149-167
SeaTunnel's extensibility is built on a plugin architecture where connectors and transforms are discovered at runtime via Java's Service Provider Interface (SPI) mechanism, enhanced with Google's @AutoService annotation pom.xml506-510.
Diagram: Plugin Architecture with SPI Discovery and Installation Workflow
Plugin Installation Process:
Configuration: Users specify required plugins in config/plugin_config22-99:
--connectors-v2--
connector-jdbc
connector-kafka
connector-fake
connector-console
--end--
Mapping Resolution: plugin-mapping.properties18-157 maps user-friendly names to Maven artifact IDs.
Download: bin/install-plugin.sh19-53 reads the config and downloads the JARs using the Maven wrapper.
Discovery: At runtime, ClassLoaderService scans connectors/*.jar for SPI-registered factories implementing TableSourceFactory, TableSinkFactory, or TableTransformFactory.
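The Configuration step can be illustrated with a small parser for the plugin_config format shown above. This is a sketch only, written in Python rather than the shell that install-plugin.sh actually uses; the function name is hypothetical, and treating `#` as a comment prefix is an assumption.

```python
def parse_plugin_config(text):
    """Return connector names listed between '--connectors-v2--' and '--end--'."""
    names, inside = [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comment lines are ignored
        if line == "--connectors-v2--":
            inside = True
        elif line == "--end--":
            inside = False
        elif inside:
            names.append(line)
    return names


SAMPLE = """\
--connectors-v2--
connector-jdbc
connector-kafka
connector-fake
connector-console
--end--
"""

print(parse_plugin_config(SAMPLE))
```

Each returned name would then be looked up and downloaded by the later steps; those steps are not modeled here.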
The system uses @AutoService annotations pom.xml542-546 to automatically generate META-INF/services/ descriptor files during compilation, eliminating manual SPI configuration.
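To make the descriptor mechanism concrete, the sketch below lists the META-INF/services/ entries inside a JAR, which is the information Java's ServiceLoader reads at runtime. It is an inspection aid written in Python, not SeaTunnel's discovery code; the `com.example` interface and class names and the `connector-demo.jar` file are hypothetical stand-ins.

```python
import os
import tempfile
import zipfile

SERVICES_PREFIX = "META-INF/services/"


def list_spi_providers(jar_path):
    """Map each service interface name to the provider classes a JAR declares."""
    providers = {}
    with zipfile.ZipFile(jar_path) as jar:  # a JAR is a ZIP archive
        for entry in jar.namelist():
            if not entry.startswith(SERVICES_PREFIX) or entry.endswith("/"):
                continue
            interface = entry[len(SERVICES_PREFIX):]
            lines = jar.read(entry).decode("utf-8").splitlines()
            # Descriptor files may contain '#' comments and blank lines.
            classes = [line.split("#", 1)[0].strip() for line in lines]
            providers[interface] = [c for c in classes if c]
    return providers


# Build a tiny stand-in JAR; the interface and class names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    jar_path = os.path.join(tmp, "connector-demo.jar")
    with zipfile.ZipFile(jar_path, "w") as jar:
        jar.writestr(
            SERVICES_PREFIX + "com.example.TableSourceFactory",
            "# generated descriptor\ncom.example.DemoSourceFactory\n",
        )
    found = list_spi_providers(jar_path)
    print(found)
```

Running a function like this against a real connector JAR is one way to check that its factory descriptors were generated during compilation.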
Sources: bin/install-plugin.sh18-53 plugin-mapping.properties18-157 config/plugin_config1-99 seatunnel-e2e/seatunnel-e2e-common/src/test/java/org/apache/seatunnel/e2e/common/util/ContainerUtil.java70-93
SeaTunnel is distributed as a tar.gz archive assembled by Maven's assembly plugin using seatunnel-dist/src/main/assembly/assembly-bin.xml1-225. The distribution follows a minimal core + pluggable connectors model to reduce download size.
Distribution Structure:
apache-seatunnel-${version}-bin/
├── bin/
│ ├── seatunnel.sh # Zeta engine CLI
│ ├── seatunnel-cluster.sh # Zeta cluster mode
│ ├── install-plugin.sh # Connector installer
│ ├── start-seatunnel-flink-*.sh # Flink starters
│ └── start-seatunnel-spark-*.sh # Spark starters
├── config/
│ ├── plugin_config # Plugin selection
│ ├── hazelcast.yaml # Zeta cluster config
│ ├── hazelcast-client.yaml # Zeta client config
│ └── v2.*.conf.template # Example jobs
├── connectors/
│ ├── plugin-mapping.properties # Name-to-artifact mapping
│ ├── connector-fake.jar # Demo connector
│ └── connector-console.jar # Demo connector
├── lib/
│ └── seatunnel-transforms-v2.jar # Transform plugins
├── starter/
│ ├── logging/ # SLF4J + Log4j2 JARs
│ ├── seatunnel-starter.jar # Zeta engine entry
│ ├── seatunnel-flink-*-starter.jar # Flink entries
│ └── seatunnel-spark-*-starter.jar # Spark entries
├── LICENSE
├── NOTICE
└── README.md
Assembly Configuration:
The CI distribution seatunnel-dist/src/main/assembly/assembly-bin-ci.xml1-213 includes all connectors for testing, while the release distribution seatunnel-dist/src/main/assembly/assembly-bin.xml1-225 includes only minimal demo connectors (connector-fake, connector-console) to reduce size.
Docker Image:
The project provides an official Docker image built via seatunnel-dist/src/main/docker/Dockerfile1-18.
The image is published to Docker Hub as apache/seatunnel:${version} pom.xml119-121.
Sources: seatunnel-dist/src/main/assembly/assembly-bin.xml27-224 seatunnel-dist/src/main/assembly/assembly-bin-ci.xml27-213 seatunnel-dist/src/main/docker/Dockerfile1-18 pom.xml119-124
Current Release: SeaTunnel 2.3.13 pom.xml60
Engine Compatibility Matrix:
| Engine | Supported Versions | Starter Module |
|---|---|---|
| SeaTunnel Zeta | Native (built-in) | seatunnel-starter |
| Apache Flink | 1.13.6, 1.15.3, 1.18.x, 1.20.1 | seatunnel-flink-{13,15,20}-starter |
| Apache Spark | 2.4.0, 3.3.0 | seatunnel-spark-{2,3}-starter |
Runtime Requirements:
- JAVA_HOME environment variable configured

Build Requirements:
- Maven (mvnw wrapper)

Connector Count: 100+ README.md51 across several categories.
All connectors implement the same SeaTunnelSource, SeaTunnelSink, or SeaTunnelTransform interfaces and work identically across all three engines.
Sources: pom.xml60-82 README.md51-63 plugin-mapping.properties23-157
SeaTunnel can be deployed in three modes depending on execution requirements:
- Local Mode (Zeta): Single-process execution for development and testing
- Cluster Mode (Zeta): Distributed execution with master-worker architecture
  - SeaTunnelServer nodes with Hazelcast clustering
- External Engine Mode: Submit jobs to existing Flink or Spark clusters
Typical Installation Steps:
1. Download the release: wget https://archive.apache.org/dist/seatunnel/2.3.13/apache-seatunnel-2.3.13-bin.tar.gz
2. Extract the archive: tar -xzvf apache-seatunnel-2.3.13-bin.tar.gz
3. Install connectors: sh bin/install-plugin.sh 2.3.13
4. Create a .conf file with source, transform, and sink definitions
5. Run the job with one of:
   - Zeta local mode: sh bin/seatunnel.sh -m local -c job.conf
   - Zeta cluster mode: sh bin/seatunnel.sh -c job.conf (defaults to cluster mode)
   - Flink: sh bin/start-seatunnel-flink-13-connector-v2.sh -c job.conf
   - Spark: sh bin/start-seatunnel-spark-3-connector-v2.sh -c job.conf

Docker Quick Start: see docs/en/start-v2/docker/docker.md.
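The typical installation steps above can be parameterized by version, sketched here as plain string builders. The function names and the mode keys (`local`, `cluster`, `flink-13`, `spark-3`) are hypothetical; the commands themselves are the ones listed in the steps, and nothing is executed.

```python
ARCHIVE_BASE = "https://archive.apache.org/dist/seatunnel"


def install_commands(version):
    """Return the download, extract, and plugin-install commands for a release."""
    name = f"apache-seatunnel-{version}-bin"
    return [
        f"wget {ARCHIVE_BASE}/{version}/{name}.tar.gz",
        f"tar -xzvf {name}.tar.gz",
        f"sh bin/install-plugin.sh {version}",
    ]


def run_command(mode, config="job.conf"):
    """Return the launch command for a mode; the mode keys are illustrative."""
    commands = {
        "local": f"sh bin/seatunnel.sh -m local -c {config}",
        "cluster": f"sh bin/seatunnel.sh -c {config}",
        "flink-13": f"sh bin/start-seatunnel-flink-13-connector-v2.sh -c {config}",
        "spark-3": f"sh bin/start-seatunnel-spark-3-connector-v2.sh -c {config}",
    }
    if mode not in commands:
        raise ValueError(f"unknown mode: {mode}")
    return commands[mode]


for command in install_commands("2.3.13"):
    print(command)
print(run_command("local"))
```

Only the four launch scripts named in the steps are included; scripts for other Flink or Spark variants are left out rather than guessed.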
Sources: docs/en/start-v2/locally/deployment.md1-91 docs/en/start-v2/docker/docker.md1-144 README.md79-91
Apache SeaTunnel is a top-level project of the Apache Software Foundation README.md1. All code is licensed under the Apache License 2.0 README.md137 seatunnel-dist/release-docs/LICENSE1-201.
Community Resources:
Third-Party Dependencies:
SeaTunnel includes components with separate licenses, which are documented in seatunnel-dist/release-docs/NOTICE1-600.
All dependencies are tracked in tools/dependencies/known-dependencies.txt1-131 for license compliance verification.
Sources: README.md1-166 seatunnel-dist/release-docs/LICENSE1-201 seatunnel-dist/release-docs/NOTICE1-600 tools/dependencies/known-dependencies.txt1-131