Skip to content

cyanheads/socrata-mcp-server

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

@cyanheads/socrata-mcp-server

Search and query government open-data portals (Socrata SODA API) via MCP. STDIO or Streamable HTTP.

6 Tools • 2 Resources • 1 Prompt

Version License Docker MCP SDK npm TypeScript Bun

Install in Claude Desktop Install in Cursor Install in VS Code

Framework

Public Hosted Server: https://socrata.caseyjhand.com/mcp


Tools

Six tools covering the full Socrata workflow — portal discovery, dataset search, schema inspection, SoQL querying, and DuckDB-powered analytical SQL over large result sets:

Tool Description
socrata_list_portals List known Socrata-powered government open-data portals with domain, organization name, and dataset count
socrata_find_datasets Search for datasets across all Socrata portals or scope to one portal via the Discovery API
socrata_get_dataset Fetch full metadata and typed column schema for a dataset by ID — required before writing SoQL queries
socrata_query_dataset Execute a SoQL query against any dataset: search, select, where, group, having, order, with DataCanvas spillover
socrata_dataframe_describe List registered tables in a DataCanvas session — schema, row count, column names
socrata_dataframe_query Run SELECT-only SQL against DataCanvas tables populated by socrata_query_dataset

socrata_list_portals

List known Socrata-powered government open-data portals.

  • Backed by the Discovery API domains catalog — hundreds of city, county, state, and federal portals
  • Client-side substring filtering on domain or organization name
  • Pagination (up to 200 per page) with offset
  • Returns domain (pass to socrata_find_datasets), organization name, and dataset count
  • Use this first when you don't know which portal to target

socrata_find_datasets

Search for datasets across all Socrata portals or scope to a single portal.

  • Full-text search across dataset names and descriptions
  • Scope to a single portal with the domain parameter
  • Filter by category (e.g. ["Public Safety", "Transportation"]) and tags (e.g. ["covid19"])
  • Asset type filtering: datasets, maps, files, calendars, stories
  • Sort by relevance, page views, created date, or updated date
  • Pagination (up to 100 per page) with offset
  • Returns dataset IDs, names, abbreviated column previews, domains, and update timestamps
  • Column names here are preview-only — call socrata_get_dataset for typed schema before writing queries
  • Recovery hints on empty results — echoes applied filters and suggests how to broaden

socrata_get_dataset

Fetch full metadata and column schema for a Socrata dataset by ID.

  • Returns field names, Socrata data types, descriptions, row count, and licensing
  • Column data_type determines correct WHERE clause syntax: Number → bare literals (year=2023), Text → single-quoted strings (year='2023')
  • Excludes computed region columns (:@computed_region_*) to reduce noise
  • Per-column non-null row counts when available
  • Always call this before writing a socrata_query_dataset query

socrata_query_dataset

Execute a SoQL query against any dataset on any Socrata portal.

  • search parameter for quick full-text lookup across all text columns ($q)
  • select, where, group, having, order for full analytical control
  • SoQL operators: =, !=, >, <, LIKE, IN(...), BETWEEN, IS NULL, starts_with(), contains(), AND, OR, NOT
  • Aggregation: count(*), sum(), avg(), min(), max() with group and having
  • Pagination up to 5000 rows per call with offset; total_count returned when result is truncated
  • assembled_query in the response echoes the SoQL string for learning the syntax
  • All SODA 2.1 row values are strings — geo/location columns return nested objects
  • When CANVAS_PROVIDER_TYPE=duckdb and result hits the limit, rows spill to a DataCanvas table for SQL-based analysis

socrata_dataframe_describe

List registered tables in a DataCanvas session.

  • Shows table name, row count, and DuckDB-inferred column types for each registered table
  • Only meaningful when CANVAS_PROVIDER_TYPE=duckdb is set
  • Use after socrata_query_dataset spills a large result set
  • Returns canvas ID for use in socrata_dataframe_query

socrata_dataframe_query

Run SELECT-only SQL against DataCanvas tables populated by socrata_query_dataset.

  • DuckDB infers types from spilled data — numeric columns that SODA returned as strings become queryable with numeric comparisons (year > 2020, amount < 500)
  • SELECT-only enforcement: DDL, DML, and file-reading functions (read_csv, read_parquet) are rejected
  • Up to 10,000 rows per call
  • Only works when CANVAS_PROVIDER_TYPE=duckdb is set

Resources and prompts

Type Name Description
Resource socrata://datasets/{domain}/{datasetId} Fetch full metadata and column schema for a dataset by stable URI — same payload as socrata_get_dataset
Resource socrata://portals Paginated list of known Socrata portals with organization name and dataset count
Prompt explore_open_data Structured six-step civic data investigation workflow: find portal → discover datasets → inspect schema → query → aggregate → synthesize

All resource data is also reachable via tools. Use the corresponding tool for agent workflows — resources are for clients that support URI-addressable data.

Features

Built on @cyanheads/mcp-ts-core:

  • Declarative tool, resource, and prompt definitions — single file per primitive, framework handles registration and validation
  • Unified error handling — handlers throw, framework catches, classifies, and formats
  • Pluggable auth: none, jwt, oauth
  • Swappable storage backends: in-memory, filesystem, Supabase, Cloudflare KV/R2/D1
  • Structured logging with optional OpenTelemetry tracing
  • STDIO and Streamable HTTP transports
  • Optional DataCanvas (DuckDB) for analytical SQL over large result sets

Socrata-specific:

  • Full Socrata SODA 2.1 API integration — SoQL query builder with select, where, group, having, order, search, limit, offset
  • Discovery API for cross-portal dataset search and portal catalog
  • App token support (SOCRATA_APP_TOKEN) for higher per-IP rate limits
  • Configurable default portal domain via SOCRATA_DEFAULT_DOMAIN
  • Computed region column filtering to reduce noise in wide datasets
  • DataCanvas spillover — large query results automatically register as DuckDB tables for SQL analysis

Agent-friendly output:

  • Assembled SoQL string echoed in every socrata_query_dataset response so agents can learn and refine syntax
  • Recovery hints on empty results — echoes applied filters with specific suggestions for broadening
  • Column type context embedded in schema output with WHERE-clause quoting rules stated explicitly
  • Per-item structured error reasons (invalid_id, not_found, soql_error, rate_limited) with actionable recovery text

Getting started

Add the following to your MCP client configuration file.

{
  "mcpServers": {
    "socrata-mcp-server": {
      "type": "stdio",
      "command": "bunx",
      "args": ["@cyanheads/socrata-mcp-server@latest"],
      "env": {
        "MCP_TRANSPORT_TYPE": "stdio",
        "MCP_LOG_LEVEL": "info"
      }
    }
  }
}

Or with npx (no Bun required):

{
  "mcpServers": {
    "socrata-mcp-server": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@cyanheads/socrata-mcp-server@latest"],
      "env": {
        "MCP_TRANSPORT_TYPE": "stdio",
        "MCP_LOG_LEVEL": "info"
      }
    }
  }
}

Or with Docker:

{
  "mcpServers": {
    "socrata-mcp-server": {
      "type": "stdio",
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "MCP_TRANSPORT_TYPE=stdio",
        "ghcr.io/cyanheads/socrata-mcp-server:latest"
      ]
    }
  }
}

For Streamable HTTP, set the transport and start the server:

MCP_TRANSPORT_TYPE=http MCP_HTTP_PORT=3010 bun run start:http
# Server listens at http://localhost:3010/mcp

Prerequisites

  • Bun v1.3.0 or higher (or Node.js v24+).
  • Optional: A Socrata app token — register for free at any portal (e.g. data.seattle.gov) to get higher rate limits (10 req/s per token vs. shared throttled pool without one).

Installation

  1. Clone the repository:
git clone https://github.com/cyanheads/socrata-mcp-server.git
  1. Navigate into the directory:
cd socrata-mcp-server
  1. Install dependencies:
bun install
  1. Configure environment:
cp .env.example .env
# edit .env and set SOCRATA_APP_TOKEN if you have one

Configuration

All configuration is validated at startup via Zod schemas in src/config/server-config.ts. Key environment variables:

Variable Description Default
SOCRATA_APP_TOKEN Socrata app token (X-App-Token header). Without a token, requests share a throttled pool per source IP.
SOCRATA_DEFAULT_DOMAIN Default portal domain when domain is omitted from tool calls. data.seattle.gov
MCP_TRANSPORT_TYPE Transport: stdio or http. stdio
MCP_HTTP_PORT Port for HTTP server. 3010
MCP_AUTH_MODE Auth mode: none, jwt, or oauth. none
MCP_LOG_LEVEL Log level (RFC 5424): debug, info, notice, warning, error. info
CANVAS_PROVIDER_TYPE Set to duckdb to enable DataCanvas spillover for large result sets.
LOGS_DIR Directory for log files (Node.js only). <project-root>/logs
STORAGE_PROVIDER_TYPE Storage backend: in-memory, filesystem, supabase, cloudflare-kv/r2/d1. in-memory
OTEL_ENABLED Enable OpenTelemetry instrumentation. false

See .env.example for the full list of optional overrides.

Running the server

Local development

  • Build and run:

    # One-time build
    bun run rebuild
    
    # Run the built server
    bun run start:stdio
    # or
    bun run start:http
  • Run checks and tests:

    bun run devcheck   # Lint, format, typecheck, security audit
    bun run test       # Vitest test suite

Docker

docker build -t socrata-mcp-server .
docker run --rm -e MCP_TRANSPORT_TYPE=http -p 3010:3010 socrata-mcp-server

The Dockerfile defaults to HTTP transport, stateless session mode, and logs to /var/log/socrata-mcp-server. OpenTelemetry peer dependencies are installed by default — build with --build-arg OTEL_ENABLED=false to omit them.

Project structure

Directory Purpose
src/index.ts createApp() entry point — registers tools, resources, prompts, and inits the Socrata service.
src/config Server-specific environment variable parsing and validation with Zod.
src/mcp-server/tools Tool definitions (*.tool.ts). Six tools covering portal listing, dataset search, schema fetch, SoQL query, and DataCanvas SQL.
src/mcp-server/resources Resource definitions (*.resource.ts). Dataset metadata and portal catalog resources.
src/mcp-server/prompts Prompt definitions (*.prompt.ts). Civic data investigation workflow prompt.
src/services/socrata Socrata service layer — SODA 2.1 API client, Discovery API, query builder, type normalization.
tests/ Unit and integration tests mirroring src/.

Development guide

See CLAUDE.md for development guidelines and architectural rules. The short version:

  • Handlers throw, framework catches — no try/catch in tool logic
  • Use ctx.log for request-scoped logging, ctx.state for tenant-scoped storage
  • Call socrata_get_dataset before writing WHERE clauses — column data_type determines quoting
  • Wrap external API calls: validate raw → normalize to domain type → return output schema; never fabricate missing fields

Contributing

Issues and pull requests are welcome. Run checks and tests before submitting:

bun run devcheck
bun run test

License

Apache-2.0 — see LICENSE for details.