@cyanheads/socrata-mcp-server

Search and query government open-data portals (Socrata SODA API) via MCP. STDIO or Streamable HTTP.

6 Tools • 2 Resources • 1 Prompt

Public Hosted Server: https://socrata.caseyjhand.com/mcp

Tools

Six tools covering the full Socrata workflow — portal discovery, dataset search, schema inspection, SoQL querying, and DuckDB-powered analytical SQL over large result sets:

Tool	Description
`socrata_list_portals`	List known Socrata-powered government open-data portals with domain, organization name, and dataset count
`socrata_find_datasets`	Search for datasets across all Socrata portals or scope to one portal via the Discovery API
`socrata_get_dataset`	Fetch full metadata and typed column schema for a dataset by ID — required before writing SoQL queries
`socrata_query_dataset`	Execute a SoQL query against any dataset: search, select, where, group, having, order, with DataCanvas spillover
`socrata_dataframe_describe`	List registered tables in a DataCanvas session — schema, row count, column names
`socrata_dataframe_query`	Run SELECT-only SQL against DataCanvas tables populated by `socrata_query_dataset`

`socrata_list_portals`

List known Socrata-powered government open-data portals.

Backed by the Discovery API domains catalog — hundreds of city, county, state, and federal portals
Client-side substring filtering on domain or organization name
Pagination (up to 200 per page) with offset
Returns domain (pass to socrata_find_datasets), organization name, and dataset count
Use this first when you don't know which portal to target

`socrata_find_datasets`

Search for datasets across all Socrata portals or scope to a single portal.

Full-text search across dataset names and descriptions
Scope to a single portal with the domain parameter
Filter by category (e.g. ["Public Safety", "Transportation"]) and tags (e.g. ["covid19"])
Asset type filtering: datasets, maps, files, calendars, stories
Sort by relevance, page views, created date, or updated date
Pagination (up to 100 per page) with offset
Returns dataset IDs, names, abbreviated column previews, domains, and update timestamps
Column names here are preview-only — call socrata_get_dataset for typed schema before writing queries
Recovery hints on empty results — echoes applied filters and suggests how to broaden

`socrata_get_dataset`

Fetch full metadata and column schema for a Socrata dataset by ID.

Returns field names, Socrata data types, descriptions, row count, and licensing
Column data_type determines correct WHERE clause syntax: Number → bare literals (year=2023), Text → single-quoted strings (year='2023')
Excludes computed region columns (:@computed_region_*) to reduce noise
Per-column non-null row counts when available
Always call this before writing a socrata_query_dataset query

`socrata_query_dataset`

Execute a SoQL query against any dataset on any Socrata portal.

search parameter for quick full-text lookup across all text columns ($q)
select, where, group, having, order for full analytical control
SoQL operators: =, !=, >, <, LIKE, IN(...), BETWEEN, IS NULL, starts_with(), contains(), AND, OR, NOT
Aggregation: count(*), sum(), avg(), min(), max() with group and having
Pagination up to 5000 rows per call with offset; total_count returned when result is truncated
assembled_query in the response echoes the SoQL string for learning the syntax
All SODA 2.1 row values are strings — geo/location columns return nested objects
When CANVAS_PROVIDER_TYPE=duckdb and result hits the limit, rows spill to a DataCanvas table for SQL-based analysis

`socrata_dataframe_describe`

List registered tables in a DataCanvas session.

Shows table name, row count, and DuckDB-inferred column types for each registered table
Only meaningful when CANVAS_PROVIDER_TYPE=duckdb is set
Use after socrata_query_dataset spills a large result set
Returns canvas ID for use in socrata_dataframe_query

`socrata_dataframe_query`

Run SELECT-only SQL against DataCanvas tables populated by socrata_query_dataset.

DuckDB infers types from spilled data — numeric columns that SODA returned as strings become queryable with numeric comparisons (year > 2020, amount < 500)
SELECT-only enforcement: DDL, DML, and file-reading functions (read_csv, read_parquet) are rejected
Up to 10,000 rows per call
Only works when CANVAS_PROVIDER_TYPE=duckdb is set

Resources and prompts

Type	Name	Description
Resource	`socrata://datasets/{domain}/{datasetId}`	Fetch full metadata and column schema for a dataset by stable URI — same payload as `socrata_get_dataset`
Resource	`socrata://portals`	Paginated list of known Socrata portals with organization name and dataset count
Prompt	`explore_open_data`	Structured six-step civic data investigation workflow: find portal → discover datasets → inspect schema → query → aggregate → synthesize

All resource data is also reachable via tools. Use the corresponding tool for agent workflows — resources are for clients that support URI-addressable data.

Features

Built on @cyanheads/mcp-ts-core:

Declarative tool, resource, and prompt definitions — single file per primitive, framework handles registration and validation
Unified error handling — handlers throw, framework catches, classifies, and formats
Pluggable auth: none, jwt, oauth
Swappable storage backends: in-memory, filesystem, Supabase, Cloudflare KV/R2/D1
Structured logging with optional OpenTelemetry tracing
STDIO and Streamable HTTP transports
Optional DataCanvas (DuckDB) for analytical SQL over large result sets

Socrata-specific:

Full Socrata SODA 2.1 API integration — SoQL query builder with select, where, group, having, order, search, limit, offset
Discovery API for cross-portal dataset search and portal catalog
App token support (SOCRATA_APP_TOKEN) for higher per-IP rate limits
Configurable default portal domain via SOCRATA_DEFAULT_DOMAIN
Computed region column filtering to reduce noise in wide datasets
DataCanvas spillover — large query results automatically register as DuckDB tables for SQL analysis

Agent-friendly output:

Assembled SoQL string echoed in every socrata_query_dataset response so agents can learn and refine syntax
Recovery hints on empty results — echoes applied filters with specific suggestions for broadening
Column type context embedded in schema output with WHERE-clause quoting rules stated explicitly
Per-item structured error reasons (invalid_id, not_found, soql_error, rate_limited) with actionable recovery text

Getting started

Add the following to your MCP client configuration file.

{
  "mcpServers": {
    "socrata-mcp-server": {
      "type": "stdio",
      "command": "bunx",
      "args": ["@cyanheads/socrata-mcp-server@latest"],
      "env": {
        "MCP_TRANSPORT_TYPE": "stdio",
        "MCP_LOG_LEVEL": "info"
      }
    }
  }
}

Or with npx (no Bun required):

{
  "mcpServers": {
    "socrata-mcp-server": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@cyanheads/socrata-mcp-server@latest"],
      "env": {
        "MCP_TRANSPORT_TYPE": "stdio",
        "MCP_LOG_LEVEL": "info"
      }
    }
  }
}

Or with Docker:

{
  "mcpServers": {
    "socrata-mcp-server": {
      "type": "stdio",
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "MCP_TRANSPORT_TYPE=stdio",
        "ghcr.io/cyanheads/socrata-mcp-server:latest"
      ]
    }
  }
}

For Streamable HTTP, set the transport and start the server:

MCP_TRANSPORT_TYPE=http MCP_HTTP_PORT=3010 bun run start:http
# Server listens at http://localhost:3010/mcp

Prerequisites

Bun v1.3.0 or higher (or Node.js v24+).
Optional: A Socrata app token — register for free at any portal (e.g. data.seattle.gov) to get higher rate limits (10 req/s per token vs. shared throttled pool without one).

Installation

Clone the repository:

git clone https://github.com/cyanheads/socrata-mcp-server.git

Navigate into the directory:

cd socrata-mcp-server

Install dependencies:

bun install

Configure environment:

cp .env.example .env
# edit .env and set SOCRATA_APP_TOKEN if you have one

Configuration

All configuration is validated at startup via Zod schemas in src/config/server-config.ts. Key environment variables:

Variable	Description	Default
`SOCRATA_APP_TOKEN`	Socrata app token (X-App-Token header). Without a token, requests share a throttled pool per source IP.	—
`SOCRATA_DEFAULT_DOMAIN`	Default portal domain when `domain` is omitted from tool calls.	`data.seattle.gov`
`MCP_TRANSPORT_TYPE`	Transport: `stdio` or `http`.	`stdio`
`MCP_HTTP_PORT`	Port for HTTP server.	`3010`
`MCP_AUTH_MODE`	Auth mode: `none`, `jwt`, or `oauth`.	`none`
`MCP_LOG_LEVEL`	Log level (RFC 5424): `debug`, `info`, `notice`, `warning`, `error`.	`info`
`CANVAS_PROVIDER_TYPE`	Set to `duckdb` to enable DataCanvas spillover for large result sets.	—
`LOGS_DIR`	Directory for log files (Node.js only).	`<project-root>/logs`
`STORAGE_PROVIDER_TYPE`	Storage backend: `in-memory`, `filesystem`, `supabase`, `cloudflare-kv/r2/d1`.	`in-memory`
`OTEL_ENABLED`	Enable OpenTelemetry instrumentation.	`false`

See .env.example for the full list of optional overrides.

Running the server

Local development

Build and run:

# One-time build
bun run rebuild

# Run the built server
bun run start:stdio
# or
bun run start:http

Run checks and tests:

bun run devcheck   # Lint, format, typecheck, security audit
bun run test       # Vitest test suite

Docker

docker build -t socrata-mcp-server .
docker run --rm -e MCP_TRANSPORT_TYPE=http -p 3010:3010 socrata-mcp-server

The Dockerfile defaults to HTTP transport, stateless session mode, and logs to /var/log/socrata-mcp-server. OpenTelemetry peer dependencies are installed by default — build with --build-arg OTEL_ENABLED=false to omit them.

Project structure

Directory	Purpose
`src/index.ts`	`createApp()` entry point — registers tools, resources, prompts, and inits the Socrata service.
`src/config`	Server-specific environment variable parsing and validation with Zod.
`src/mcp-server/tools`	Tool definitions (`*.tool.ts`). Six tools covering portal listing, dataset search, schema fetch, SoQL query, and DataCanvas SQL.
`src/mcp-server/resources`	Resource definitions (`*.resource.ts`). Dataset metadata and portal catalog resources.
`src/mcp-server/prompts`	Prompt definitions (`*.prompt.ts`). Civic data investigation workflow prompt.
`src/services/socrata`	Socrata service layer — SODA 2.1 API client, Discovery API, query builder, type normalization.
`tests/`	Unit and integration tests mirroring `src/`.

Development guide

See CLAUDE.md for development guidelines and architectural rules. The short version:

Handlers throw, framework catches — no try/catch in tool logic
Use ctx.log for request-scoped logging, ctx.state for tenant-scoped storage
Call socrata_get_dataset before writing WHERE clauses — column data_type determines quoting
Wrap external API calls: validate raw → normalize to domain type → return output schema; never fabricate missing fields

Contributing

Issues and pull requests are welcome. Run checks and tests before submitting:

bun run devcheck
bun run test

License

Apache-2.0 — see LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.claude-plugin		.claude-plugin
.codex-plugin		.codex-plugin
.github/ISSUE_TEMPLATE		.github/ISSUE_TEMPLATE
.vscode		.vscode
changelog		changelog
docs		docs
scripts		scripts
skills		skills
src		src
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
.mcpbignore		.mcpbignore
CHANGELOG.md		CHANGELOG.md
CITATION.cff		CITATION.cff
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
biome.json		biome.json
bun.lock		bun.lock
bunfig.toml		bunfig.toml
devcheck.config.json		devcheck.config.json
manifest.json		manifest.json
package.json		package.json
server.json		server.json
tsconfig.build.json		tsconfig.build.json
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

@cyanheads/socrata-mcp-server

Tools

`socrata_list_portals`

`socrata_find_datasets`

`socrata_get_dataset`

`socrata_query_dataset`

`socrata_dataframe_describe`

`socrata_dataframe_query`

Resources and prompts

Features

Getting started

Prerequisites

Installation

Configuration

Running the server

Local development

Docker

Project structure

Development guide

Contributing

License

About

Uh oh!

Releases 8

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

@cyanheads/socrata-mcp-server

Tools

socrata_list_portals

socrata_find_datasets

socrata_get_dataset

socrata_query_dataset

socrata_dataframe_describe

socrata_dataframe_query

Resources and prompts

Features

Getting started

Prerequisites

Installation

Configuration

Running the server

Local development

Docker

Project structure

Development guide

Contributing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 8

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

`socrata_list_portals`

`socrata_find_datasets`

`socrata_get_dataset`

`socrata_query_dataset`

`socrata_dataframe_describe`

`socrata_dataframe_query`

Packages