cyanheads/gutenberg-mcp-server

@cyanheads/gutenberg-mcp-server

Search, browse, and read 75,000+ public-domain books from Project Gutenberg with full plain-text retrieval and offset/limit chunking via MCP. STDIO or Streamable HTTP.


Tools

Four tools for searching and reading Project Gutenberg's public-domain library:

| Tool | Description |
| --- | --- |
| gutenberg_search_books | Search the Gutenberg catalog by title, author, topic, language, or author lifespan; returns popularity-ordered results with IDs ready for follow-up calls |
| gutenberg_get_book | Fetch complete metadata for a book by ID: full formats map, translators, editors, subjects, bookshelves, copyright status, and the has_plain_text flag |
| gutenberg_get_text | Retrieve the plain-text content of a book, stripped of license boilerplate, with offset/limit chunking for context-budget management |
| gutenberg_browse_popular | Browse the most-downloaded books, optionally filtered by language or topic; useful as a discovery entry point |

gutenberg_search_books

Search the Project Gutenberg catalog of 75,000+ public-domain books.

  • Keyword search against titles and author names (space-separated words, case-insensitive)
  • Topic filter matches subject headings and bookshelf categories
  • Language filter by ISO 639-1 two-character codes (e.g., ["en"], ["fr", "de"])
  • Author lifespan range filter via author_year_start / author_year_end
  • Sort by popularity (download count), or by Gutenberg ID ascending/descending
  • Batch lookup by known ID list via ids parameter
  • Paginated — up to 32 books per page; use totalCount to determine total pages
  • Each result includes has_plain_text to indicate whether gutenberg_get_text will work
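
The pagination and has_plain_text notes above can be sketched as follows. This is a minimal illustration rather than the server's own code: the `SearchPage` shape, the `results` field name, and the second book entry are assumptions made for the example. Only `totalCount`, `has_plain_text`, and the 32-per-page limit come from the description above.

```typescript
// Hypothetical shape of one search page. Only totalCount and has_plain_text
// are named in the tool description; the other fields are assumed.
interface SearchResult {
  id: number;
  title: string;
  has_plain_text: boolean;
}

interface SearchPage {
  totalCount: number;
  results: SearchResult[];
}

const PAGE_SIZE = 32; // maximum books per page, per the description above

// Number of pages needed to walk every match.
function totalPages(totalCount: number, pageSize: number = PAGE_SIZE): number {
  if (totalCount <= 0) return 0;
  return Math.ceil(totalCount / pageSize);
}

// Keep only the books that gutenberg_get_text should be able to read.
function readableIds(page: SearchPage): number[] {
  return page.results.filter((book) => book.has_plain_text).map((book) => book.id);
}

const page: SearchPage = {
  totalCount: 70,
  results: [
    { id: 1342, title: "Pride and Prejudice", has_plain_text: true },
    { id: 99999, title: "Made-up audio entry", has_plain_text: false }, // invented for the example
  ],
};

console.log(totalPages(page.totalCount)); // 3
console.log(readableIds(page)); // [ 1342 ]
```

Filtering on has_plain_text before calling gutenberg_get_text avoids a round trip that would only return an error.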

gutenberg_get_book

Fetch complete metadata for a single Project Gutenberg book.

  • Returns the full formats map (MIME type → download URL) including plain text, HTML, EPUB, and cover image
  • Includes translators and editors alongside authors, each with birth/death years
  • has_plain_text flag confirms whether a UTF-8 or ASCII plain-text format is available
  • media_type distinguishes readable text books from audio recordings
  • Use this before gutenberg_get_text to confirm text availability and inspect the formats map
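
Inspecting the formats map might look like the sketch below. The preference order follows the UTF-8, ASCII, HTML fallback chain this README describes for gutenberg_get_text. The function name, the prefix matching, and the example.org URLs are assumptions for illustration, not the server's implementation.

```typescript
// MIME type -> download URL, as described for the formats map above.
type Formats = Record<string, string>;

// Preference order: UTF-8 plain text, then ASCII plain text, then HTML.
const PREFERRED = [
  "text/plain; charset=utf-8",
  "text/plain; charset=us-ascii",
  "text/html",
] as const;

type SourceFormat = (typeof PREFERRED)[number];

interface TextSource {
  sourceFormat: SourceFormat;
  url: string;
}

// Return the best available text source, or undefined when none exists.
// Keys are matched by prefix so "text/html; charset=utf-8" still counts
// as HTML (an assumption about how keys may vary).
function pickTextSource(formats: Formats): TextSource | undefined {
  for (const wanted of PREFERRED) {
    for (const [mime, url] of Object.entries(formats)) {
      if (mime.toLowerCase().startsWith(wanted)) {
        return { sourceFormat: wanted, url };
      }
    }
  }
  return undefined;
}

// Placeholder URLs; real entries point at Project Gutenberg file servers.
const formats: Formats = {
  "text/html": "https://example.org/book.html",
  "text/plain; charset=us-ascii": "https://example.org/book.txt",
  "application/epub+zip": "https://example.org/book.epub",
};

console.log(pickTextSource(formats)?.sourceFormat); // text/plain; charset=us-ascii
```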

gutenberg_get_text

Retrieve the plain-text content of a Project Gutenberg book, stripped of license boilerplate.

  • Strips the standard Gutenberg license header and footer — response contains only the literary work
  • Offset/limit chunking for long works: novels routinely run 500 KB–2 MB; read in manageable chunks without loading the whole file
  • Response includes totalChars, offset, length, remainingChars, and hasMore for precise pagination
  • Paragraph-boundary trimming: actual returned length may be slightly less than limit — use length (not limit) to compute the next offset
  • Prefers UTF-8 plain text; falls back to ASCII plain text; converts HTML as a last resort
  • Refuses audio books (media_type "Sound") with a clear recovery hint
  • provenance field carries the Gutenberg ID, title, and license URL for attribution
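
A sequential read that follows the chunking contract above could be sketched like this. `readChunk` is a local stand-in for gutenberg_get_text over an in-memory string, and its trimming rule (cut at the last blank line inside the window) is only a guess at what paragraph-boundary trimming means. The `text` field name is also assumed. The point of the sketch is the loop: advance by the returned length, not by the requested limit.

```typescript
// Response fields named in the description above; `text` is an assumed
// name for the chunk body.
interface TextChunk {
  text: string;
  offset: number;
  length: number;
  totalChars: number;
  remainingChars: number;
  hasMore: boolean;
}

// Stand-in for gutenberg_get_text. Trims the window back to the last blank
// line unless the window already reaches the end of the text.
function readChunk(full: string, offset: number, limit: number): TextChunk {
  const end = Math.min(offset + limit, full.length);
  let text = full.slice(offset, end);
  if (end < full.length) {
    const cut = text.lastIndexOf("\n\n");
    if (cut > 0) text = text.slice(0, cut + 2); // keep the blank line with this chunk
  }
  const length = text.length;
  const remainingChars = full.length - (offset + length);
  return { text, offset, length, totalChars: full.length, remainingChars, hasMore: remainingChars > 0 };
}

// Sequential read: the next offset is offset + length, never offset + limit.
function readAll(full: string, limit: number): string[] {
  const chunks: string[] = [];
  let offset = 0;
  while (true) {
    const chunk = readChunk(full, offset, limit);
    chunks.push(chunk.text);
    if (!chunk.hasMore || chunk.length === 0) break; // stop at the end or on an empty chunk
    offset += chunk.length;
  }
  return chunks;
}

const book = "First paragraph.\n\nSecond paragraph, a bit longer.\n\nThird.";
const chunks = readAll(book, 30);
console.log(chunks.length, chunks.join("") === book); // 3 true
```

Here the first chunk comes back with length 18 although the limit was 30. Advancing by limit would have skipped 12 characters.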

gutenberg_browse_popular

Browse the most-downloaded Project Gutenberg books.

  • Returns up to 32 titles ordered by download count (most popular first)
  • Optionally filter by language (ISO 639-1 codes) and/or topic keyword
  • Useful as a discovery entry point: "what are the most popular classics in French?"
  • totalInCatalog puts the returned page in context, e.g., "top 20 of 60,000"

Features

Built on @cyanheads/mcp-ts-core:

  • Declarative tool definitions — single file per tool, framework handles registration and validation
  • Unified error handling — handlers throw, framework catches, classifies, and formats with recovery hints
  • Pluggable auth: none, jwt, oauth
  • Swappable storage backends: in-memory, filesystem, Supabase, Cloudflare KV/R2/D1
  • Structured logging with optional OpenTelemetry tracing
  • STDIO and Streamable HTTP transports

Project Gutenberg integration:

  • Catalog search and metadata via Gutendex — an unofficial but stable JSON API over the Gutenberg dataset
  • Full plain-text retrieval directly from Project Gutenberg file servers with transparent UTF-8/ASCII/HTML fallback chain
  • In-session text caching: book text is fetched once per session and served from cache for subsequent chunk reads
  • No API key required — Project Gutenberg data is freely available; no registration needed
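
The in-session caching idea can be illustrated with a small keyed cache. The class and method names below are hypothetical and the loader is a fake; the actual service may be structured differently. Because the cache stores whatever the loader returns, it could also hold a `Promise<string>`, in which case concurrent chunk reads would share one download.

```typescript
// Minimal per-session cache keyed by book ID (names are hypothetical).
class SessionTextCache<T> {
  private readonly entries = new Map<number, T>();

  // Return the cached value, or run the loader once and remember its result.
  getOrLoad(bookId: number, load: () => T): T {
    if (this.entries.has(bookId)) {
      return this.entries.get(bookId) as T;
    }
    const value = load();
    this.entries.set(bookId, value);
    return value;
  }
}

// Fake download that counts how often it runs.
let downloads = 0;
const fakeDownload = (): string => {
  downloads += 1;
  return "full text of the book";
};

const cache = new SessionTextCache<string>();
cache.getOrLoad(1342, fakeDownload); // first chunk read triggers the download
cache.getOrLoad(1342, fakeDownload); // later chunk reads are served from the cache
console.log(downloads); // 1
```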

Agent-friendly output:

  • has_plain_text flag on every search/browse result so agents can pre-filter before attempting text retrieval
  • Precise chunking contract: offset, length, totalChars, remainingChars, hasMore on every gutenberg_get_text response for reliable sequential reads
  • provenance field on every text response for attribution
  • Discriminated sourceFormat field (text/plain; charset=utf-8, text/plain; charset=us-ascii, text/html) so agents know the fidelity of the text

Getting started

No API key required. Add the following to your MCP client configuration file:

{
  "mcpServers": {
    "gutenberg-mcp-server": {
      "type": "stdio",
      "command": "bunx",
      "args": ["@cyanheads/gutenberg-mcp-server@latest"],
      "env": {
        "MCP_TRANSPORT_TYPE": "stdio",
        "MCP_LOG_LEVEL": "info"
      }
    }
  }
}

Or with npx (no Bun required):

{
  "mcpServers": {
    "gutenberg-mcp-server": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@cyanheads/gutenberg-mcp-server@latest"],
      "env": {
        "MCP_TRANSPORT_TYPE": "stdio",
        "MCP_LOG_LEVEL": "info"
      }
    }
  }
}

Or with Docker:

{
  "mcpServers": {
    "gutenberg-mcp-server": {
      "type": "stdio",
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "MCP_TRANSPORT_TYPE=stdio",
        "ghcr.io/cyanheads/gutenberg-mcp-server:latest"
      ]
    }
  }
}

For Streamable HTTP, set the transport and start the server:

MCP_TRANSPORT_TYPE=http MCP_HTTP_PORT=3010 bun run start:http
# Server listens at http://localhost:3010/mcp
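
Once the HTTP server is up, clients talk to the /mcp endpoint using JSON-RPC 2.0 messages. In the MCP specification a tool invocation uses the `tools/call` method. The sketch below only builds the request body. The `id` argument name and the limit value are assumptions, since this README names offset and limit but not the book-ID parameter. A real client also performs the MCP initialize handshake first, which an MCP SDK client normally handles.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool call.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build the request body; sending it over HTTP is left to the client library.
function buildToolCall(
  requestId: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: requestId,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// `id` is an assumed parameter name; offset and limit come from the tool description.
const request = buildToolCall(1, "gutenberg_get_text", { id: 1342, offset: 0, limit: 20000 });
console.log(JSON.stringify(request, null, 2));
```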

Prerequisites

  • Bun v1.3.11 or higher (or Node.js v24+).
  • No API key required — Project Gutenberg data is freely available.

Installation

  1. Clone the repository:
     git clone https://github.com/cyanheads/gutenberg-mcp-server.git
  2. Navigate into the directory:
     cd gutenberg-mcp-server
  3. Install dependencies:
     bun install
  4. Configure environment:
     cp .env.example .env
     # edit .env if you need to override any defaults

Configuration

| Variable | Description | Default |
| --- | --- | --- |
| GUTENDEX_BASE_URL | Base URL for the Gutendex catalog API. Override for self-hosted instances. | https://gutendex.com/books/ |
| GUTENBERG_TEXT_BASE_URL | Base URL for Project Gutenberg file servers. Override for mirrors. | https://www.gutenberg.org |
| MCP_TRANSPORT_TYPE | Transport: stdio or http. | stdio |
| MCP_HTTP_PORT | Port for HTTP server. | 3010 |
| MCP_AUTH_MODE | Auth mode: none, jwt, or oauth. | none |
| MCP_LOG_LEVEL | Log level (RFC 5424). | info |
| LOGS_DIR | Directory for log files (Node.js only). | <project-root>/logs |
| STORAGE_PROVIDER_TYPE | Storage backend. | in-memory |
| OTEL_ENABLED | Enable OpenTelemetry instrumentation. | false |

See .env.example for the full list of optional overrides.
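
How a few of these variables and their defaults could be resolved is sketched below. This is an illustration under assumptions, not the contents of src/config/server-config.ts: the `ServerConfig` shape and the function names are hypothetical, and only the variable names and default values are taken from the table above.

```typescript
// Hypothetical config shape covering four of the documented variables.
interface ServerConfig {
  gutendexBaseUrl: string;
  gutenbergTextBaseUrl: string;
  transport: "stdio" | "http";
  httpPort: number;
}

type Env = Record<string, string | undefined>;

// Accept only the two documented transport values.
function parseTransport(value: string): "stdio" | "http" {
  if (value === "stdio" || value === "http") return value;
  throw new Error(`MCP_TRANSPORT_TYPE must be "stdio" or "http", got "${value}"`);
}

// Accept only integer ports in the valid TCP range.
function parsePort(value: string): number {
  const port = Number(value);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`MCP_HTTP_PORT is not a valid port: "${value}"`);
  }
  return port;
}

// Fall back to the documented default whenever a variable is unset.
function loadConfig(env: Env): ServerConfig {
  return {
    gutendexBaseUrl: env.GUTENDEX_BASE_URL ?? "https://gutendex.com/books/",
    gutenbergTextBaseUrl: env.GUTENBERG_TEXT_BASE_URL ?? "https://www.gutenberg.org",
    transport: parseTransport(env.MCP_TRANSPORT_TYPE ?? "stdio"),
    httpPort: parsePort(env.MCP_HTTP_PORT ?? "3010"),
  };
}

console.log(loadConfig({})); // all defaults
console.log(loadConfig({ MCP_TRANSPORT_TYPE: "http", MCP_HTTP_PORT: "8080" }).httpPort); // 8080
```

In a running process the argument would typically be `process.env`. Taking the environment as a parameter keeps the function easy to test.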


Running the server

Local development

  • Build and run:

    # One-time build
    bun run rebuild
    
    # Run the built server
    bun run start:stdio
    # or
    bun run start:http
  • Run checks and tests:

    bun run devcheck   # Lint, format, typecheck, security
    bun run test       # Vitest test suite
    bun run lint:mcp   # Validate MCP definitions against spec

Docker

docker build -t gutenberg-mcp-server .
docker run --rm -p 3010:3010 gutenberg-mcp-server

The Dockerfile defaults to HTTP transport, stateless session mode, and logs to /var/log/gutenberg-mcp-server. OpenTelemetry peer dependencies are installed by default — build with --build-arg OTEL_ENABLED=false to omit them.


Project structure

| Path | Purpose |
| --- | --- |
| src/index.ts | createApp() entry point; registers tools and inits services. |
| src/config/server-config.ts | Server-specific environment variable parsing (Gutendex and file-server URL overrides). |
| src/mcp-server/tools/definitions/ | Tool definitions (*.tool.ts). |
| src/services/gutendex/ | Gutendex catalog API client; search and book metadata. |
| src/services/gutenberg-text/ | Full plain-text retrieval, boilerplate stripping, in-session caching, and chunking. |
| tests/ | Unit and integration tests mirroring src/. |

Development guide

See CLAUDE.md / AGENTS.md for development guidelines and architectural rules. The short version:

  • Handlers throw, framework catches — no try/catch in tool logic
  • Use ctx.log for request-scoped logging, ctx.state for tenant-scoped storage
  • Register new tools via the entry arrays in src/index.ts
  • Wrap external API calls: validate raw → normalize to domain type → return output schema; never fabricate missing fields
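
The validate, normalize, return pattern from the last rule, combined with throwing instead of catching, might look like the sketch below. It uses plain type guards rather than whatever schema library the project uses, and the `Book` shape is a reduced, hypothetical domain type. The raw field names (id, title, media_type, formats, download_count) follow the Gutendex book object.

```typescript
// Reduced, hypothetical domain type for illustration.
interface Book {
  id: number;
  title: string;
  media_type: string;
  has_plain_text: boolean;
  download_count?: number; // stays undefined when upstream omits it
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// validate raw -> normalize to domain type. Malformed input throws; the
// handler does not catch, so the framework can classify the error.
function normalizeBook(raw: unknown): Book {
  if (!isRecord(raw)) throw new Error("Upstream book is not an object");
  const { id, title, media_type, formats, download_count } = raw;
  if (typeof id !== "number") throw new Error("Upstream book has no numeric id");
  if (typeof title !== "string") throw new Error(`Book ${id} has no title`);
  if (typeof media_type !== "string") throw new Error(`Book ${id} has no media_type`);

  const mimeTypes = isRecord(formats) ? Object.keys(formats) : [];
  const book: Book = {
    id,
    title,
    media_type,
    has_plain_text: mimeTypes.some((mime) => mime.startsWith("text/plain")),
  };
  // Copy optional fields only when present; never invent a default.
  if (typeof download_count === "number") book.download_count = download_count;
  return book;
}

const normalized = normalizeBook({
  id: 1342,
  title: "Pride and Prejudice",
  media_type: "Text",
  formats: { "text/plain; charset=utf-8": "https://example.org/book.txt" }, // placeholder URL
});
console.log(normalized.has_plain_text, "download_count" in normalized); // true false
```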

Contributing

Issues and pull requests are welcome. Run checks and tests before submitting:

bun run devcheck
bun run test

License

Apache-2.0 — see LICENSE for details.

Data from Project Gutenberg is in the public domain. Catalog metadata sourced from Gutendex (MIT license).
