
feat: add PDF analysis tool with native provider support #31319

Merged
tyler6204 merged 2 commits into main from feat/pdf-analysis-tool
Mar 2, 2026

@tyler6204 tyler6204 commented Mar 2, 2026

Summary

Adds a new pdf tool for analyzing PDF documents, similar to the existing image tool.

Native PDF support

  • Anthropic Claude — Native via DocumentBlockParam
  • Google Gemini — Native via inlineData with application/pdf MIME
  • All other providers — Extraction fallback: text via pdfjs-dist, rasterized images via @napi-rs/canvas
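
To make the two native paths concrete, the sketch below builds the content part each provider expects for a base64-encoded PDF. The field names follow the publicly documented Anthropic Messages API and Gemini generateContent request shapes; the helper and interface names (`anthropicPdfBlock`, `geminiPdfPart`, and so on) are hypothetical and are not taken from this PR.

```typescript
// Hypothetical helpers; names are illustrative, not from pdf-native-providers.ts.

interface AnthropicDocumentBlock {
  type: "document";
  source: { type: "base64"; media_type: "application/pdf"; data: string };
}

interface GeminiInlineDataPart {
  inlineData: { mimeType: "application/pdf"; data: string };
}

// Anthropic: a "document" content block carrying base64 PDF bytes.
function anthropicPdfBlock(base64: string): AnthropicDocumentBlock {
  return {
    type: "document",
    source: { type: "base64", media_type: "application/pdf", data: base64 },
  };
}

// Gemini: an inlineData part tagged with the application/pdf MIME type.
function geminiPdfPart(base64: string): GeminiInlineDataPart {
  return { inlineData: { mimeType: "application/pdf", data: base64 } };
}
```

Either part would sit next to a text part holding the prompt in the same user message.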

Tool parameters

  • prompt — what to analyze
  • pdf / pdfs — single or multiple PDF paths/URLs (up to 10)
  • pages — page range (e.g. "1-5", "1,3,5-7")
  • maxBytesMb — max file size
  • model — override model (defaults to image model / session default)
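
As a rough illustration of how the `pdf` and `pdfs` parameters could be combined, here is a minimal sketch. The `PdfToolArgs` interface and `collectPdfInputs` helper are hypothetical, and rejecting (rather than truncating) an empty or oversized list is an assumption of this sketch, not something the PR description states.

```typescript
// Hypothetical argument shape mirroring the parameter list above.
interface PdfToolArgs {
  prompt: string;
  pdf?: string;
  pdfs?: string[];
  pages?: string;
  maxBytesMb?: number;
  model?: string;
}

const MAX_PDFS = 10; // "up to 10" per the parameter description

// Merge `pdf` and `pdfs` into one ordered, de-duplicated list.
// Throwing on an empty or oversized list is an assumption of this sketch.
function collectPdfInputs(args: PdfToolArgs): string[] {
  const candidates = [...(args.pdf ? [args.pdf] : []), ...(args.pdfs ?? [])]
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  const unique = candidates.filter((s, i) => candidates.indexOf(s) === i);
  if (unique.length === 0) throw new Error("pdf or pdfs is required");
  if (unique.length > MAX_PDFS) {
    throw new Error(`Too many PDFs: ${unique.length} (max ${MAX_PDFS})`);
  }
  return unique;
}
```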

Config

  • tools.pdf.model — model override for PDF analysis
  • tools.pdf.maxBytesMb — max file size (default 10)
  • tools.pdf.maxPages — max pages (default 20)

Implementation

  • New files: pdf-tool.ts, pdf-tool.helpers.ts, pdf-native-providers.ts, pdf-tool.test.ts
  • Extended model catalog with document input type for capability detection
  • Registered in tool factory alongside image tool
  • 43 tests, all passing

@openclaw-barnacle bot added the agents (Agent runtime and tooling), size: XL, and maintainer (Maintainer-authored PR) labels Mar 2, 2026
@tyler6204 tyler6204 self-assigned this Mar 2, 2026
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91a866ed15

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +337 to +341
    if (providerSupportsNativePdf(provider)) {
      const pdfs = params.pdfBuffers.map((p) => ({
        base64: p.base64,
        filename: p.filename,
      }));

P1: Apply page range filtering for native PDF requests

The pages argument is parsed and used only for local extraction, but this native-provider path sends the original pdfs buffers directly and never applies that subset. In the common Anthropic/Google path, a call like pages: "1-2" still analyzes the full document, which can materially change results and increase token/cost/latency on large PDFs. Either slice the uploaded PDF to the selected pages or force the extraction fallback whenever pages is provided.
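
The second option suggested above (forcing the extraction fallback whenever `pages` is provided) can be sketched as a small routing helper. `choosePdfPath` and its option names are hypothetical; the tool's real control flow may differ.

```typescript
// Hypothetical decision helper; the real tool's control flow may differ.
type PdfPath = "native" | "extraction";

function choosePdfPath(opts: {
  providerSupportsNativePdf: boolean;
  pageNumbers?: number[];
}): PdfPath {
  // Native providers receive the whole file, so a page filter can only be
  // honored by extracting the selected pages locally. An empty array still
  // counts as a filter here; rejecting empty selections is a separate check.
  if (opts.pageNumbers !== undefined) return "extraction";
  return opts.providerSupportsNativePdf ? "native" : "extraction";
}
```

The trade-off is that filtered requests lose native document understanding; slicing the PDF before upload would keep it, at the cost of a PDF-writing dependency.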

Comment on lines +616 to +620
    if (media.kind !== "document") {
      // Check MIME type more specifically
      const ct = (media.contentType ?? "").toLowerCase();
      if (!ct.includes("pdf") && !ct.includes("application/pdf")) {
        throw new Error(`Expected PDF but got ${media.contentType ?? media.kind}: ${pdfRaw}`);

P2: Validate MIME type even when media kind is document

This check only runs when media.kind !== "document", but loadWebMediaRaw classifies all application/* files as document, so non-PDF inputs (for example DOCX) bypass validation and are processed as if they were PDFs. That causes late, confusing failures in extraction/native calls instead of a clear early rejection. The tool should verify PDF MIME/signature even when the kind is already document.
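
One way to get the early rejection described above is to check the MIME type and the file signature regardless of the media kind. This is a minimal sketch: `looksLikePdf` and `assertPdf` are hypothetical names, and accepting a file when either signal indicates PDF is an assumption rather than the PR's behavior.

```typescript
// Returns true when the buffer starts with the "%PDF-" header.
function looksLikePdf(bytes: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (bytes.length < magic.length) return false;
  return magic.every((b, i) => bytes[i] === b);
}

// Runs for every input, including those already classified as "document".
function assertPdf(bytes: Uint8Array, contentType: string | undefined, source: string): void {
  const ct = (contentType ?? "").toLowerCase();
  const mimeOk = ct.includes("application/pdf");
  if (!mimeOk && !looksLikePdf(bytes)) {
    throw new Error(`Expected PDF but got ${contentType ?? "unknown type"}: ${source}`);
  }
}
```

With this check a DOCX file (a ZIP container with a non-PDF MIME type) fails immediately instead of surfacing as an extraction or provider error later.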

greptile-apps bot commented Mar 2, 2026

Greptile Summary

This PR adds a new pdf tool for analyzing PDF documents with native support for Anthropic Claude and Google Gemini providers, plus extraction fallback for other providers.

Key changes:

  • New PDF tool (src/agents/tools/pdf-tool.ts, 692 lines) - follows existing image tool pattern with sandbox support, file size limits, and page range filtering
  • Native provider integrations (pdf-native-providers.ts) - direct SDK/HTTP calls for Anthropic (Messages API with pdfs-2024-09-25 beta) and Google Gemini (generateContent API) to send raw PDF bytes instead of extracting content first
  • Extraction fallback - for non-native providers, uses pdfjs-dist for text extraction and @napi-rs/canvas for rasterized images when text is insufficient
  • Model catalog extension - added document as a new ModelInputType alongside text and image, plus modelSupportsDocument() helper
  • Configuration options - pdfModel, pdfMaxBytesMb (default 10), pdfMaxPages (default 20) in agent defaults with proper Zod schemas and help text
  • Tool registration - integrated into openclaw-tools.ts factory alongside image tool with consistent sandbox/fsPolicy handling
  • Comprehensive tests - 599-line test file with 43 tests covering page parsing, provider detection, model resolution, input validation, and API calls

Implementation quality:

  • Follows project patterns (mirrors image tool structure)
  • Proper error handling and input validation
  • Security controls (sandbox support, size limits, local roots)
  • Good test coverage across unit and integration scenarios
  • Respects tool schema guardrails (no Type.Union, uses Type.Optional)
  • Clean TypeScript with proper typing

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • Score reflects well-structured implementation following established patterns, comprehensive test coverage (43 tests), proper error handling, security controls, and alignment with project coding standards. The feature is additive (new tool) with no breaking changes to existing functionality.
  • No files require special attention

Last reviewed commit: 91a866e

@tyler6204 tyler6204 force-pushed the feat/pdf-analysis-tool branch from 91a866e to 4179e3a March 2, 2026 06:29
@openclaw-barnacle bot added the docs (Improvements or additions to documentation) and gateway (Gateway runtime) labels Mar 2, 2026
New `pdf` tool for analyzing PDF documents with model-powered analysis.

Architecture:
- Native PDF path: sends raw PDF bytes directly to providers that support
  inline document input (Anthropic via DocumentBlockParam, Google Gemini
  via inlineData with application/pdf MIME type)
- Extraction fallback: for providers without native PDF support, extracts
  text via pdfjs-dist and rasterizes pages to images via @napi-rs/canvas,
  then sends through the standard vision/text completion path

Key features:
- Single PDF (`pdf` param) or multiple PDFs (`pdfs` array, up to 10)
- Page range selection (`pages` param, e.g. "1-5", "1,3,7-9")
- Model override (`model` param) and file size limits (`maxBytesMb`)
- Auto-detects provider capability and falls back gracefully
- Same security patterns as image tool (SSRF guards, sandbox support,
  local path roots, workspace-only policy)

Config (agents.defaults):
- pdfModel: primary/fallbacks (defaults to imageModel, then session model)
- pdfMaxBytesMb: max PDF file size (default: 10)
- pdfMaxPages: max pages to process (default: 20)

Model catalog:
- Extended ModelInputType to include "document" alongside "text"/"image"
- Added modelSupportsDocument() capability check

Files:
- src/agents/tools/pdf-tool.ts - main tool factory
- src/agents/tools/pdf-tool.helpers.ts - helpers (page range, config, etc.)
- src/agents/tools/pdf-native-providers.ts - direct API calls for Anthropic/Google
- src/agents/tools/pdf-tool.test.ts - 43 tests covering all paths
- Modified: model-catalog.ts, openclaw-tools.ts, config schema/types/labels/help

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4179e3a1ed

Comment on lines +50 to +51
    if (num <= maxPages) {
      pages.add(num);

P1: Parse page ranges without capping by page index

The page parser currently drops any explicit page number above maxPages (if (num <= maxPages)), so with the default pdfMaxPages=20 a request like pages: "50" on a long PDF becomes an empty selection. In extraction mode this yields no PDF context, and in native mode the empty array bypasses the pages guard and sends the full document, so accuracy/cost no longer match the caller’s requested pages. maxPages should cap how many selected pages are processed, not the highest page number users can request.
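
The distinction drawn above (cap the number of selected pages, not the highest page number) can be illustrated with a small parser. This is a sketch under assumptions: `parsePageRange` is a hypothetical name, and throwing on malformed input is a choice of this example, not necessarily what pdf-tool.helpers.ts does.

```typescript
// Parse "1-5" or "1,3,5-7" into sorted, unique, 1-based page numbers.
// maxPages limits how many pages are kept, not the highest page index.
function parsePageRange(spec: string, maxPages: number): number[] {
  const pages = new Set<number>();
  for (const rawPart of spec.split(",")) {
    const part = rawPart.trim();
    if (!part) continue;
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part);
    if (!match) throw new Error(`Invalid page range: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) throw new Error(`Invalid page range: "${part}"`);
    // Stop adding once the cap is reached, so huge ranges stay cheap.
    for (let n = start; n <= end && pages.size < maxPages; n++) pages.add(n);
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

With a cap of 20, a request for page 50 keeps page 50, while a request for pages 1-100 keeps only the first 20.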

Comment on lines +199 to +201
    const effectiveCfg: OpenClawConfig | undefined = params.cfg
      ? {
          ...params.cfg,

P2: Preserve inferred PDF model config when cfg is undefined

Auto-detection in createPdfTool can produce a valid pdfModelConfig without a config object (for example from env auth), but runPdfPrompt sets effectiveCfg to undefined whenever params.cfg is missing. That means runWithImageModelFallback receives no imageModel candidates and throws No image model configured, so the tool can be created successfully but fails at execution unless the caller manually provides a model override.
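
A possible shape for the fix is to start from an empty config when none was passed, so the inferred model still reaches the fallback runner. The types below are heavily simplified stand-ins, and the assumption that the override is written to `agents.defaults.imageModel` is a guess based on the error message quoted above; `MiniConfig` and `buildEffectiveConfig` are hypothetical names.

```typescript
// Hypothetical, simplified shapes; the real OpenClawConfig is much larger.
interface ModelConfig {
  primary: string;
  fallbacks?: string[];
}
interface MiniConfig {
  agents?: { defaults?: { imageModel?: ModelConfig } };
}

function buildEffectiveConfig(cfg: MiniConfig | undefined, pdfModel: ModelConfig): MiniConfig {
  // Start from an empty config when none was passed, so an inferred
  // model (for example from env auth) is not dropped.
  const base: MiniConfig = cfg ?? {};
  return {
    ...base,
    agents: {
      ...base.agents,
      defaults: { ...base.agents?.defaults, imageModel: pdfModel },
    },
  };
}
```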

@tyler6204 tyler6204 force-pushed the feat/pdf-analysis-tool branch from 4179e3a to 0b57ca3 March 2, 2026 06:38
@tyler6204 tyler6204 merged commit d0ac1b0 into main Mar 2, 2026
13 of 14 checks passed
@tyler6204 tyler6204 deleted the feat/pdf-analysis-tool branch March 2, 2026 06:39
@tyler6204

Merged via squash.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0b57ca3b5c

Comment on lines +140 to +144
    const baseUrl = (params.baseUrl ?? "https://generativelanguage.googleapis.com").replace(
      /\/+$/,
      "",
    );
    const url = `${baseUrl}/v1beta/models/${encodeURIComponent(params.modelId)}:generateContent?key=${encodeURIComponent(apiKey)}`;

P1: Normalize Gemini PDF base URL before appending v1beta

geminiAnalyzePdf always appends "/v1beta/models/..." to params.baseUrl, so configurations that already provide a Google base URL ending in /v1beta generate a malformed endpoint like .../v1beta/v1beta/models/... and native PDF calls fail. This repo already treats /v1beta as a valid Google base URL in other paths, so this breaks real configured deployments rather than just edge inputs.
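
A minimal sketch of the normalization, assuming the only variants to handle are trailing slashes and a trailing `/v1beta` segment. `geminiGenerateContentUrl` is a hypothetical helper name; the URL template itself is copied from the reviewed lines.

```typescript
function geminiGenerateContentUrl(
  baseUrl: string | undefined,
  modelId: string,
  apiKey: string,
): string {
  const root = (baseUrl ?? "https://generativelanguage.googleapis.com")
    .replace(/\/+$/, "") // drop trailing slashes
    .replace(/\/v1beta$/, ""); // drop an already-present version segment
  return `${root}/v1beta/models/${encodeURIComponent(modelId)}:generateContent?key=${encodeURIComponent(apiKey)}`;
}
```

Both `https://generativelanguage.googleapis.com` and `https://generativelanguage.googleapis.com/v1beta/` then resolve to the same endpoint.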

Comment on lines +54 to +56
    const effectivePages: number[] = pageNumbers
      ? pageNumbers.filter((p) => p >= 1 && p <= pdf.numPages).slice(0, maxPages)
      : Array.from({ length: Math.min(pdf.numPages, maxPages) }, (_, i) => i + 1);

P2: Reject page filters that match zero PDF pages

When explicit pageNumbers are supplied but all values are outside the document bounds, effectivePages becomes empty and extraction proceeds without error; downstream, the tool can send only the user prompt to the model and return an answer with no PDF context. A request like pages: "99" on a short PDF should fail fast instead of silently analyzing nothing.
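
The fail-fast behavior requested above could look like the sketch below, which keeps the filtering logic from the reviewed lines and adds an explicit empty-selection check. `resolveEffectivePages` is a hypothetical name and the error wording is illustrative.

```typescript
function resolveEffectivePages(
  pageNumbers: number[] | undefined,
  numPages: number,
  maxPages: number,
): number[] {
  if (!pageNumbers) {
    // No filter: take the first pages up to the cap.
    return Array.from({ length: Math.min(numPages, maxPages) }, (_, i) => i + 1);
  }
  const inBounds = pageNumbers.filter((p) => p >= 1 && p <= numPages);
  if (inBounds.length === 0) {
    // Fail fast instead of silently analyzing nothing.
    throw new Error(
      `Requested pages [${pageNumbers.join(", ")}] match no pages; the PDF has ${numPages} page(s).`,
    );
  }
  return inBounds.slice(0, maxPages);
}
```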

    ...(webSearchTool ? [webSearchTool] : []),
    ...(webFetchTool ? [webFetchTool] : []),
    ...(imageTool ? [imageTool] : []),
    ...(pdfTool ? [pdfTool] : []),

P2: Register PDF in core tool profiles and group allowlists

This adds pdf to the runtime tool list, but profile/group filtering is driven by src/agents/tool-catalog.ts, which has no pdf entry, so tools.profile users (especially coding) and group:openclaw allowlists will silently filter the new built-in tool out. As shipped, the feature is unavailable in those common policy configurations unless users add manual overrides.

robertchang-ga pushed a commit to robertchang-ga/openclaw that referenced this pull request Mar 2, 2026
* feat: add PDF analysis tool with native provider support

* fix: prepare pdf tool for merge (openclaw#31319) (thanks @tyler6204)
hanqizheng pushed a commit to hanqizheng/openclaw that referenced this pull request Mar 2, 2026
robertchang-ga pushed a commit to robertchang-ga/openclaw that referenced this pull request Mar 2, 2026
execute008 pushed a commit to execute008/openclaw that referenced this pull request Mar 2, 2026
dawi369 pushed a commit to dawi369/davis that referenced this pull request Mar 3, 2026
OWALabuy pushed a commit to kcinzgg/openclaw that referenced this pull request Mar 4, 2026
sachinkundu pushed a commit to sachinkundu/openclaw that referenced this pull request Mar 6, 2026
zooqueen pushed a commit to hanzoai/bot that referenced this pull request Mar 6, 2026
Labels

agents Agent runtime and tooling docs Improvements or additions to documentation gateway Gateway runtime maintainer Maintainer-authored PR size: XL
