pdf-tool adds ~3.5s overhead per agent run from eager pdfjs-dist import in core-plugin-tool stages #76500

@asa1525

Description

OpenClaw version: 2026.5.2 (8b2a6e5)
Node: v22.22.0
OS: Linux 6.8.0-101-generic (x64), 5-core Intel Xeon Gold 6152 @ 2.10GHz, 5.8GB RAM

Every agent run incurs a delay of ~3.5s or more (3.7-4.0s in the runs below) during the core-plugin-tool stages phase. The delay is caused by pdf-tool loading pdfjs-dist/legacy/build/pdf.mjs (~1MB) via dynamic import().

Evidence

core-plugin-tool stages: totalMs=4395 ... openclaw-tools:pdf-tool:3961ms@4395ms ...

Consistently reproducible across multiple runs:

Run   totalMs   pdf-tool
1     4083ms    3655ms
2     4136ms    3736ms
3     4395ms    3961ms

The pdf-tool stage alone accounts for ~90% of total tool initialization time.
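For reference, the ~90% figure can be checked directly from the three runs above. This is only a small arithmetic sketch; the variable names are illustrative and not taken from OpenClaw.

```javascript
// Timings copied from the three runs listed above (milliseconds).
const runs = [
  { run: 1, totalMs: 4083, pdfToolMs: 3655 },
  { run: 2, totalMs: 4136, pdfToolMs: 3736 },
  { run: 3, totalMs: 4395, pdfToolMs: 3961 },
];

// Share of total staging time spent in pdf-tool, as a percentage
// rounded to one decimal place.
const shares = runs.map(({ run, totalMs, pdfToolMs }) => ({
  run,
  sharePct: Math.round((pdfToolMs / totalMs) * 1000) / 10,
}));

console.log(shares);
// -> run 1: 89.5, run 2: 90.3, run 3: 90.1
```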

Root Cause

In dist/extensions/document-extract/document-extractor.js:

const PDFJS_MODULE = "pdfjs-dist/legacy/build/pdf.mjs";

async function loadPdfJsModule() {
  if (!pdfJsModulePromise)
    pdfJsModulePromise = import(PDFJS_MODULE)...
  return pdfJsModulePromise;
}

The pdfjs-dist/legacy/build/pdf.mjs file is ~1MB. Although loadPdfJsModule() caches its promise in pdfJsModulePromise, the import() is triggered during tool registration/staging rather than on first actual PDF use. This means every agent session pays the ~3.5s cost even when no PDF is involved.
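To illustrate the distinction, here is a minimal sketch of a loader that is only invoked from the tool's execution path. It assumes nothing about OpenClaw internals: createLazyLoader, pdfTool and execute are hypothetical names, and a stub importer stands in for the real import("pdfjs-dist/legacy/build/pdf.mjs") so the snippet runs without the dependency.

```javascript
// Hypothetical helper: wraps an importer so it runs at most once,
// and only when load() is first called.
function createLazyLoader(importer) {
  let cached = null;
  return function load() {
    if (!cached) {
      cached = importer().catch((err) => {
        cached = null; // allow a retry after a failed import
        throw err;
      });
    }
    return cached;
  };
}

// Stub standing in for: () => import("pdfjs-dist/legacy/build/pdf.mjs")
let importCount = 0;
const loadPdfJs = createLazyLoader(async () => {
  importCount += 1;
  return { getDocument: (src) => `stub document for ${src}` };
});

// Hypothetical tool definition. Creating this object does not call
// loadPdfJs(); only execute() does.
const pdfTool = {
  name: "pdf",
  async execute(input) {
    const pdfjs = await loadPdfJs(); // first real use pays the import cost
    return pdfjs.getDocument(input);
  },
};
```

With this shape, registering or staging pdfTool leaves importCount at 0, and repeated execute() calls share a single import. Whether the actual staging code can be restructured this way depends on what it needs from the module at registration time.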

Why This Is New

The core-plugin-tool stages trace logging, and possibly the eager tool staging behavior itself, appear to have been introduced in 2026.5.2. Logs from 2026-05-02 (previous version) show zero core-plugin-tool stages entries, suggesting the tool initialization path changed.

Environment Context

Load average: 0.37, 0.23, 0.23
Memory: 5.8GB total, 1.7GB used, 4.1GB available
Swap: 5.0GB total, 1.6GB used

The VPS is not under resource pressure. The delay appears to come purely from pdfjs-dist module parse/compile time.

Suggested Fixes

  1. Truly lazy-load pdfjs-dist: Only import() when a PDF file is actually being processed, not during tool registration/staging
  2. Pre-compile/cache: Pre-compile the pdfjs-dist module during gateway startup (background warmup) so subsequent import() calls are near-instant
  3. Make tools.deny configurable: Currently tools.deny is a protected path that cannot be set via config.patch or config.apply. It can only be set via the CLI, which the gateway ignores at runtime, so users have no way to disable specific expensive tools
  4. Configurable enabled flag: Add a tools.media.pdf.enabled: false option (similar to tools.media.audio.enabled)
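
Options 2 and 4 could look roughly like the sketch below. Everything here is an assumption for illustration: the config shape mirrors the proposed tools.media.pdf.enabled flag, and selectEnabledTools, warmUpInBackground and the configKey field are hypothetical names, not existing OpenClaw APIs.

```javascript
// Hypothetical config mirroring the proposed tools.media.pdf.enabled flag.
const config = {
  tools: { media: { pdf: { enabled: false }, audio: { enabled: true } } },
};

// Hypothetical registry entries; configKey names the section under tools.media.
const candidates = [
  { name: "pdf-tool", configKey: "pdf" },
  { name: "audio-tool", configKey: "audio" },
];

// Option 4: keep a tool unless its config section sets enabled to exactly false,
// so tools without a config section stay enabled by default.
function selectEnabledTools(cfg, tools) {
  const media = cfg?.tools?.media ?? {};
  return tools.filter((tool) => media[tool.configKey]?.enabled !== false);
}

// Option 2: start a load after the current startup work has yielded.
// Errors are swallowed so a failed warmup cannot crash the process;
// a later real call to the loader would surface the failure instead.
function warmUpInBackground(load) {
  setImmediate(() => {
    load().catch(() => {});
  });
}

console.log(selectEnabledTools(config, candidates).map((t) => t.name));
// -> [ 'audio-tool' ]
```

One caveat on the warmup approach: module parse/compile still runs on the main thread, so a background warmup moves the cost off the agent-run path rather than removing it.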

Workaround Attempted

  • Renaming pdf.mjs → pdf.mjs.disabled did not prevent the delay (tool staging still attempted the import via fallback path)
