
Creating a New Integration

CloudQuery integrations are modular: a source integration fetches data from any third-party API, and a destination integration writes it to any supported target. You can mix and match any source with any destination.

The CloudQuery protocol is language-agnostic, built on gRPC and Apache Arrow. For faster development, we provide SDKs that handle the protocol details so you can focus on your API logic.

The code and SDKs use the term plugin (e.g. plugin-sdk, cq-source-<name>), while CloudQuery documentation uses integration. These terms refer to the same thing.

Choose Your Language

| | Go | Python | JavaScript | Java |
|---|---|---|---|---|
| Source SDK | Yes (Guide) | Yes (Guide) | Yes (Guide) | Yes (Guide) |
| Destination SDK | Yes (Examples) | Yes (Example) | No | No |
| Scaffold / template | cq-scaffold CLI | Template repo | Clone Airtable | Clone Bitbucket |

Go is the most mature option with a dedicated scaffold tool and the widest range of examples. Python, JavaScript, and Java SDKs are all actively maintained and work well for source integrations.

The guides below focus on source integrations. For destination integrations, there is no step-by-step guide yet, but you can reference the Go destination examples (e.g. PostgreSQL, BigQuery) and the Python SQLite destination.

Development Workflow

Regardless of language, building a CloudQuery integration follows the same steps:

  1. Scaffold or clone: Use the scaffold tool (Go) or clone a template/reference integration (other languages)
  2. Define tables: Declare the tables and columns your integration exposes
  3. Implement resolvers: Write functions that fetch data from the API and send it to CloudQuery
  4. Test locally: Run your integration as a gRPC server or local binary and sync with cloudquery sync
  5. Publish: Release your integration to the CloudQuery Hub

Core Concepts

These concepts apply to all CloudQuery integrations regardless of language.

Syncs

A sync is the process triggered by cloudquery sync. It fetches data from a source API and writes it to a destination (database, data lake, stream, etc.). When you build a source integration, you only implement the part that talks to the third-party API. The SDK handles delivering data to the destination.

Tables

A table is CloudQuery’s unit for a collection of related data. In a database destination it maps directly to a database table; in other destinations it could be a file, a stream, or another medium.

A table is defined by:

  • A name: follows the convention <integration>_<service>_<resource>, e.g. xkcd_comics
  • Columns: usually derived automatically from a struct or class via the SDK’s transformer
  • A resolver: the function that fetches data for this table

Tables can be organized into services (logical groupings that mirror the underlying API structure). For small integrations with only a few tables, you can skip services and put tables directly in a resources directory.
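The pieces of a table definition can be sketched in plain Python. The `Comic` model and `xkcd_comics` table mirror the naming example above; deriving one column per class field is a rough stand-in for what the SDK’s transformer does automatically:

```python
from dataclasses import dataclass, fields

# Hypothetical row model; in a real integration the SDK's transformer
# derives the table's columns from a class like this automatically.
@dataclass
class Comic:
    num: int
    title: str
    img: str

# Table name follows <integration>_<service>_<resource>; a single-service
# integration like xkcd collapses this to <integration>_<resource>.
TABLE_NAME = "xkcd_comics"

# Rough stand-in for the transformer's output: one column per field.
columns = [f.name for f in fields(Comic)]
```

This is only an illustration of the convention; the real column derivation (types, primary keys, JSON fields) is handled by each SDK.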

Resolvers

Resolvers are functions that populate table data. There are two types:

  • Table resolvers fetch data from the API and send results to a channel (Go), yield them as a generator (Python), or write them to a stream (JavaScript/Java). For top-level tables, the resolver is called once per multiplexer client. For child tables, it’s called once per parent row.
  • Column resolvers (optional) handle custom column mappings. In most cases, the SDK auto-maps struct/class fields to columns, so you won’t need these.
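In the Python style, a table resolver is a generator that yields rows as it fetches them. This sketch fakes the API with a hypothetical `fetch_page` helper; the loop shape is the important part:

```python
# Sketch of a Python-style table resolver: a generator that fetches pages
# from the API and yields rows one at a time. `client` and `fetch_page`
# are hypothetical stand-ins for your authenticated API client.

def fetch_page(client, page):
    # Pretend API: two pages of comics, then nothing.
    data = {0: [{"num": 1, "title": "Barrel"}], 1: [{"num": 2, "title": "Petit"}]}
    return data.get(page, [])

def comics_resolver(client):
    page = 0
    while True:
        items = fetch_page(client, page)
        if not items:
            return
        # Yield each row as soon as it arrives; the SDK sends it onward.
        yield from items
        page += 1

rows = list(comics_resolver(client=None))
```

In Go the same resolver would send rows to a channel instead of yielding them; in JavaScript/Java it would write to a stream.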

Multiplexers

Multiplexers parallelize data fetching. If your integration supports multiple accounts, organizations, or regions, a multiplexer calls the table resolver once per entity, in parallel. Many integrations don’t need multiplexers.
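Conceptually, a multiplexer is just "run the resolver once per entity, in parallel." A minimal sketch with hypothetical account names, using a thread pool in place of the SDK’s machinery:

```python
# Minimal sketch of what a multiplexer does: call the same table resolver
# once per configured account, in parallel. Account names are hypothetical.
from concurrent.futures import ThreadPoolExecutor

def comics_resolver(account):
    # In a real integration this would fetch from the API scoped to `account`.
    return [{"account": account, "num": 1}]

accounts = ["org-a", "org-b", "org-c"]

with ThreadPoolExecutor() as pool:
    batches = list(pool.map(comics_resolver, accounts))

rows = [row for batch in batches for row in batch]
```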

Incremental Tables

Instead of fetching all data on every sync, incremental tables use a cursor stored in a state backend to resume from where the last sync ended. This is much more efficient for large datasets but adds complexity.

Consider incremental tables when:

  • The API supports filtering by timestamp or cursor
  • Full syncs are too slow or expensive
  • Data volume is large and mostly unchanged between syncs

Learn more in Managing Incremental Tables.
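The cursor mechanics can be sketched with an in-memory dict standing in for the real state backend (record data and the `updated_at` cursor field are invented for illustration):

```python
# Sketch of an incremental table: read a cursor from a state backend,
# fetch only records newer than the cursor, then store the new high-water
# mark. The in-memory `state` dict stands in for the real backend.

state = {}  # hypothetical state backend: table key -> cursor

RECORDS = [  # pretend API data, ordered by updated_at
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-02-01"},
]

def sync_incremental(table_key):
    cursor = state.get(table_key, "")
    new_rows = [r for r in RECORDS if r["updated_at"] > cursor]
    yield from new_rows
    if new_rows:
        state[table_key] = new_rows[-1]["updated_at"]  # advance the cursor

first = list(sync_incremental("xkcd_comics"))   # full fetch
second = list(sync_incremental("xkcd_comics"))  # nothing new since the cursor
```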

Configuration & Authentication

When a user runs cloudquery sync, their YAML configuration is passed to your integration. The spec field under your source configuration contains custom settings like API keys, endpoints, and options:

config.yaml
```yaml
kind: source
spec:
  name: "my-integration"
  registry: "grpc"
  path: "localhost:7777"
  tables: ["*"]
  destinations:
    - "sqlite"
  spec:
    # These fields are defined by YOUR integration
    access_token: "${MY_API_TOKEN}"
    base_url: "https://api.example.com"
    concurrency: 100
```

Your integration receives the inner spec block as raw JSON. You parse it into a typed configuration object (a Go struct, Python dataclass, TypeScript object, or Java class), validate it, and use it to create an authenticated API client.

The typical flow:

  1. SDK passes spec as JSON bytes/string to your initialization function
  2. You deserialize it into a typed Spec struct/class
  3. You validate required fields (e.g. access_token must not be empty)
  4. You create an API client using the spec values
  5. You store the client on your Client struct for resolvers to use

See each language guide for the specific parsing patterns.
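The five-step flow looks roughly like this in Python. The field names mirror the example config above; `ApiClient` is a hypothetical client class:

```python
# Sketch of the spec-parsing flow: deserialize the raw JSON spec into a
# typed object, validate required fields, then build a client from it.
import json
from dataclasses import dataclass

@dataclass
class Spec:
    access_token: str
    base_url: str = "https://api.example.com"
    concurrency: int = 100

    def validate(self):
        if not self.access_token:
            raise ValueError("access_token must not be empty")

raw = b'{"access_token": "secret", "concurrency": 50}'  # as passed by the SDK
spec = Spec(**json.loads(raw))
spec.validate()
# A real integration would now create the client and store it for resolvers:
# client = ApiClient(spec.base_url, spec.access_token)
```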

Never log or expose API tokens or secrets. Use environment variable references like ${MY_API_TOKEN} in configuration files. The CloudQuery CLI resolves these automatically.

Common Patterns

Pagination

Most real-world APIs require pagination. Handle this in your table resolver by looping until all pages are fetched:

  • Cursor-based: Store the cursor from each response, pass it in the next request. Send each page’s results immediately. Don’t accumulate them in memory.
  • Offset-based: Increment the offset by the page size on each iteration.
  • Link-based: Follow next URLs from the response headers or body.

The key principle: stream results as you get them. Send each page’s items to the channel/stream/generator immediately rather than collecting all pages first. This keeps memory usage low and gets data to the destination faster.
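A cursor-based loop that streams as it goes can be sketched like this; `PAGES` and `fetch_page` fake the API responses:

```python
# Sketch of cursor-based pagination that streams results: each page's items
# are yielded immediately, and the loop ends when no cursor is returned.
PAGES = {
    None: {"items": [1, 2], "next_cursor": "abc"},
    "abc": {"items": [3], "next_cursor": None},
}

def fetch_page(cursor):
    return PAGES[cursor]  # stand-in for an HTTP GET with ?cursor=...

def paginate():
    cursor = None
    while True:
        resp = fetch_page(cursor)
        yield from resp["items"]  # stream this page immediately
        cursor = resp["next_cursor"]
        if cursor is None:
            return

items = list(paginate())
```

Offset- and link-based pagination follow the same shape; only how the "next page" token is computed changes.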

Parent-Child Tables

Many APIs are hierarchical: organizations contain repositories, repositories contain commits, etc. CloudQuery supports this via table relations:

  1. Define a parent table (e.g. workspaces) and a child table (e.g. repositories)
  2. Link them via the parent’s relations field
  3. The child resolver receives the parent row and can extract the parent ID to make its API call

This is a common pattern. See the Bitbucket integration (Java) for a clean example of Workspaces → Repositories, or the Kubernetes integration (Go) for a complex hierarchy.
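The mechanics reduce to: the child resolver receives a parent row and scopes its API call by the parent’s ID. A sketch with fake workspace/repository data echoing the Bitbucket example:

```python
# Sketch of the parent-child pattern: the child resolver receives a parent
# row and uses its ID for the scoped API call. Data here is invented.
WORKSPACES = [{"slug": "team-a"}, {"slug": "team-b"}]
REPOS = {
    "team-a": [{"name": "api"}],
    "team-b": [{"name": "web"}, {"name": "docs"}],
}

def workspaces_resolver():
    yield from WORKSPACES

def repositories_resolver(parent_row):
    # Extract the parent ID to scope the child API call.
    yield from REPOS[parent_row["slug"]]

rows = []
for ws in workspaces_resolver():
    for repo in repositories_resolver(ws):
        rows.append({"workspace": ws["slug"], **repo})
```

In the real SDKs, wiring the child under the parent’s `relations` field is what causes the child resolver to run once per parent row.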

Error Handling

When an API call fails in a resolver:

  • Return the error: the SDK will log it and surface it to the user. Don’t silently swallow errors.
  • Don’t retry internally: the SDK and the user’s infrastructure handle retries at a higher level.
  • Partial results are OK: if you’ve already sent some items to the channel/stream before encountering an error, those items are still written to the destination. Return the error after sending what you can.
  • Rate limiting: if the API returns a rate limit response (HTTP 429), most SDK HTTP clients handle backoff automatically. If you’re making raw HTTP calls, consider adding exponential backoff for 429 and 5xx responses.
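If you do make raw HTTP calls, exponential backoff on 429/5xx can be sketched as follows (`do_request` is a hypothetical stand-in returning a status code and body):

```python
# Sketch of exponential backoff for raw HTTP calls: retry on 429/5xx with
# doubling delays, give up after max_attempts. `do_request` is a stand-in.
import time

def with_backoff(do_request, max_attempts=4, base_delay=0.01):
    delay = base_delay
    for attempt in range(max_attempts):
        status, body = do_request()
        if status == 429 or status >= 500:
            if attempt == max_attempts - 1:
                raise RuntimeError(f"giving up after {max_attempts} attempts")
            time.sleep(delay)
            delay *= 2  # exponential backoff
            continue
        return body

# Fake endpoint: rate-limited twice, then succeeds.
responses = iter([(429, None), (503, None), (200, "ok")])
result = with_backoff(lambda: next(responses))
```

Production code would also add jitter and honor a `Retry-After` header when present.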

Testing Locally

There are two ways to test an integration during development. Both are covered in detail in Running Locally.

gRPC Server Mode

Run your integration with the serve command, then point cloudquery sync at it using registry: grpc:

config.yaml
```yaml
kind: source
spec:
  name: "my-integration"
  registry: "grpc"
  path: "localhost:7777"
  tables: ["*"]
  destinations:
    - "sqlite"
---
kind: destination
spec:
  name: sqlite
  path: cloudquery/sqlite
  registry: cloudquery
  version: "v2.14.5"
  spec:
    connection_string: ./db.sql
```

This mode is ideal for debugging. You can attach a debugger to the running process. Errors appear in the integration’s console, not in cloudquery.log.

Local Binary / Docker Mode

Build your integration as a binary or Docker image, then use registry: local or registry: docker in your configuration. This is closest to how users will run your integration in production. See each language guide for specific build commands.

Publishing

When your integration is ready, publish it to the CloudQuery Hub so others can use it. The publishing guide covers all supported languages.
