Search and query government open-data portals (Socrata SODA API) via MCP. STDIO or Streamable HTTP.
Six tools covering the full Socrata workflow — portal discovery, dataset search, schema inspection, SoQL querying, and DuckDB-powered analytical SQL over large result sets:
| Tool | Description |
|---|---|
socrata_list_portals |
List known Socrata-powered government open-data portals with domain, organization name, and dataset count |
socrata_find_datasets |
Search for datasets across all Socrata portals or scope to one portal via the Discovery API |
socrata_get_dataset |
Fetch full metadata and typed column schema for a dataset by ID — required before writing SoQL queries |
socrata_query_dataset |
Execute a SoQL query against any dataset: search, select, where, group, having, order, with DataCanvas spillover |
socrata_dataframe_describe |
List registered tables in a DataCanvas session — schema, row count, column names |
socrata_dataframe_query |
Run SELECT-only SQL against DataCanvas tables populated by socrata_query_dataset |
List known Socrata-powered government open-data portals.
- Backed by the Discovery API domains catalog — hundreds of city, county, state, and federal portals
- Client-side substring filtering on domain or organization name
- Pagination (up to 200 per page) with offset
- Returns domain (pass to
socrata_find_datasets), organization name, and dataset count - Use this first when you don't know which portal to target
Search for datasets across all Socrata portals or scope to a single portal.
- Full-text search across dataset names and descriptions
- Scope to a single portal with the
domainparameter - Filter by category (e.g.
["Public Safety", "Transportation"]) and tags (e.g.["covid19"]) - Asset type filtering: datasets, maps, files, calendars, stories
- Sort by relevance, page views, created date, or updated date
- Pagination (up to 100 per page) with offset
- Returns dataset IDs, names, abbreviated column previews, domains, and update timestamps
- Column names here are preview-only — call
socrata_get_datasetfor typed schema before writing queries - Recovery hints on empty results — echoes applied filters and suggests how to broaden
Fetch full metadata and column schema for a Socrata dataset by ID.
- Returns field names, Socrata data types, descriptions, row count, and licensing
- Column
data_typedetermines correct WHERE clause syntax:Number→ bare literals (year=2023),Text→ single-quoted strings (year='2023') - Excludes computed region columns (
:@computed_region_*) to reduce noise - Per-column non-null row counts when available
- Always call this before writing a
socrata_query_datasetquery
Execute a SoQL query against any dataset on any Socrata portal.
searchparameter for quick full-text lookup across all text columns ($q)select,where,group,having,orderfor full analytical control- SoQL operators:
=,!=,>,<,LIKE,IN(...),BETWEEN,IS NULL,starts_with(),contains(),AND,OR,NOT - Aggregation:
count(*),sum(),avg(),min(),max()withgroupandhaving - Pagination up to 5000 rows per call with offset;
total_countreturned when result is truncated assembled_queryin the response echoes the SoQL string for learning the syntax- All SODA 2.1 row values are strings — geo/location columns return nested objects
- When
CANVAS_PROVIDER_TYPE=duckdband result hits the limit, rows spill to a DataCanvas table for SQL-based analysis
List registered tables in a DataCanvas session.
- Shows table name, row count, and DuckDB-inferred column types for each registered table
- Only meaningful when
CANVAS_PROVIDER_TYPE=duckdbis set - Use after
socrata_query_datasetspills a large result set - Returns canvas ID for use in
socrata_dataframe_query
Run SELECT-only SQL against DataCanvas tables populated by socrata_query_dataset.
- DuckDB infers types from spilled data — numeric columns that SODA returned as strings become queryable with numeric comparisons (
year > 2020,amount < 500) - SELECT-only enforcement: DDL, DML, and file-reading functions (
read_csv,read_parquet) are rejected - Up to 10,000 rows per call
- Only works when
CANVAS_PROVIDER_TYPE=duckdbis set
| Type | Name | Description |
|---|---|---|
| Resource | socrata://datasets/{domain}/{datasetId} |
Fetch full metadata and column schema for a dataset by stable URI — same payload as socrata_get_dataset |
| Resource | socrata://portals |
Paginated list of known Socrata portals with organization name and dataset count |
| Prompt | explore_open_data |
Structured six-step civic data investigation workflow: find portal → discover datasets → inspect schema → query → aggregate → synthesize |
All resource data is also reachable via tools. Use the corresponding tool for agent workflows — resources are for clients that support URI-addressable data.
Built on @cyanheads/mcp-ts-core:
- Declarative tool, resource, and prompt definitions — single file per primitive, framework handles registration and validation
- Unified error handling — handlers throw, framework catches, classifies, and formats
- Pluggable auth:
none,jwt,oauth - Swappable storage backends:
in-memory,filesystem,Supabase,Cloudflare KV/R2/D1 - Structured logging with optional OpenTelemetry tracing
- STDIO and Streamable HTTP transports
- Optional DataCanvas (DuckDB) for analytical SQL over large result sets
Socrata-specific:
- Full Socrata SODA 2.1 API integration — SoQL query builder with select, where, group, having, order, search, limit, offset
- Discovery API for cross-portal dataset search and portal catalog
- App token support (
SOCRATA_APP_TOKEN) for higher per-IP rate limits - Configurable default portal domain via
SOCRATA_DEFAULT_DOMAIN - Computed region column filtering to reduce noise in wide datasets
- DataCanvas spillover — large query results automatically register as DuckDB tables for SQL analysis
Agent-friendly output:
- Assembled SoQL string echoed in every
socrata_query_datasetresponse so agents can learn and refine syntax - Recovery hints on empty results — echoes applied filters with specific suggestions for broadening
- Column type context embedded in schema output with WHERE-clause quoting rules stated explicitly
- Per-item structured error reasons (
invalid_id,not_found,soql_error,rate_limited) with actionable recovery text
Add the following to your MCP client configuration file.
{
"mcpServers": {
"socrata-mcp-server": {
"type": "stdio",
"command": "bunx",
"args": ["@cyanheads/socrata-mcp-server@latest"],
"env": {
"MCP_TRANSPORT_TYPE": "stdio",
"MCP_LOG_LEVEL": "info"
}
}
}
}Or with npx (no Bun required):
{
"mcpServers": {
"socrata-mcp-server": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@cyanheads/socrata-mcp-server@latest"],
"env": {
"MCP_TRANSPORT_TYPE": "stdio",
"MCP_LOG_LEVEL": "info"
}
}
}
}Or with Docker:
{
"mcpServers": {
"socrata-mcp-server": {
"type": "stdio",
"command": "docker",
"args": [
"run", "-i", "--rm",
"-e", "MCP_TRANSPORT_TYPE=stdio",
"ghcr.io/cyanheads/socrata-mcp-server:latest"
]
}
}
}For Streamable HTTP, set the transport and start the server:
MCP_TRANSPORT_TYPE=http MCP_HTTP_PORT=3010 bun run start:http
# Server listens at http://localhost:3010/mcp- Bun v1.3.0 or higher (or Node.js v24+).
- Optional: A Socrata app token — register for free at any portal (e.g. data.seattle.gov) to get higher rate limits (10 req/s per token vs. shared throttled pool without one).
- Clone the repository:
git clone https://github.com/cyanheads/socrata-mcp-server.git- Navigate into the directory:
cd socrata-mcp-server- Install dependencies:
bun install- Configure environment:
cp .env.example .env
# edit .env and set SOCRATA_APP_TOKEN if you have oneAll configuration is validated at startup via Zod schemas in src/config/server-config.ts. Key environment variables:
| Variable | Description | Default |
|---|---|---|
SOCRATA_APP_TOKEN |
Socrata app token (X-App-Token header). Without a token, requests share a throttled pool per source IP. | — |
SOCRATA_DEFAULT_DOMAIN |
Default portal domain when domain is omitted from tool calls. |
data.seattle.gov |
MCP_TRANSPORT_TYPE |
Transport: stdio or http. |
stdio |
MCP_HTTP_PORT |
Port for HTTP server. | 3010 |
MCP_AUTH_MODE |
Auth mode: none, jwt, or oauth. |
none |
MCP_LOG_LEVEL |
Log level (RFC 5424): debug, info, notice, warning, error. |
info |
CANVAS_PROVIDER_TYPE |
Set to duckdb to enable DataCanvas spillover for large result sets. |
— |
LOGS_DIR |
Directory for log files (Node.js only). | <project-root>/logs |
STORAGE_PROVIDER_TYPE |
Storage backend: in-memory, filesystem, supabase, cloudflare-kv/r2/d1. |
in-memory |
OTEL_ENABLED |
Enable OpenTelemetry instrumentation. | false |
See .env.example for the full list of optional overrides.
-
Build and run:
# One-time build bun run rebuild # Run the built server bun run start:stdio # or bun run start:http
-
Run checks and tests:
bun run devcheck # Lint, format, typecheck, security audit bun run test # Vitest test suite
docker build -t socrata-mcp-server .
docker run --rm -e MCP_TRANSPORT_TYPE=http -p 3010:3010 socrata-mcp-serverThe Dockerfile defaults to HTTP transport, stateless session mode, and logs to /var/log/socrata-mcp-server. OpenTelemetry peer dependencies are installed by default — build with --build-arg OTEL_ENABLED=false to omit them.
| Directory | Purpose |
|---|---|
src/index.ts |
createApp() entry point — registers tools, resources, prompts, and inits the Socrata service. |
src/config |
Server-specific environment variable parsing and validation with Zod. |
src/mcp-server/tools |
Tool definitions (*.tool.ts). Six tools covering portal listing, dataset search, schema fetch, SoQL query, and DataCanvas SQL. |
src/mcp-server/resources |
Resource definitions (*.resource.ts). Dataset metadata and portal catalog resources. |
src/mcp-server/prompts |
Prompt definitions (*.prompt.ts). Civic data investigation workflow prompt. |
src/services/socrata |
Socrata service layer — SODA 2.1 API client, Discovery API, query builder, type normalization. |
tests/ |
Unit and integration tests mirroring src/. |
See CLAUDE.md for development guidelines and architectural rules. The short version:
- Handlers throw, framework catches — no
try/catchin tool logic - Use
ctx.logfor request-scoped logging,ctx.statefor tenant-scoped storage - Call
socrata_get_datasetbefore writing WHERE clauses — columndata_typedetermines quoting - Wrap external API calls: validate raw → normalize to domain type → return output schema; never fabricate missing fields
Issues and pull requests are welcome. Run checks and tests before submitting:
bun run devcheck
bun run testApache-2.0 — see LICENSE for details.