
fix(mcp): extract_facts entity_hints add items:string for OpenAI strict validator#832

Closed
bautrey wants to merge 1 commit into garrytan:master from bautrey:fix/extract-facts-entity-hints-items

bautrey commented May 10, 2026

Problem

OpenAI's strict tool-schema validator (used by gpt-5.5 / gpt-5-codex and any client that runs tools/list through the official validator) rejects the entire MCP tool list with:

LLM request rejected: Invalid schema for function 'gbrain__extract_facts':
In context=('properties', 'entity_hints'), array schema missing items.

When this fires, every gbrain MCP tool becomes unavailable to that agent, not just extract_facts, because the validator rejects the entire tool list when any single tool def is invalid. Plain JSON Schema treats items as optional, but OpenAI's strict mode requires every array schema to declare items.

Root cause

src/core/operations.ts declares entity_hints as { type: 'array', ... } with no items field. The handler already treats the value as string[]:

entityHints: Array.isArray(p.entity_hints) ? (p.entity_hints as string[]) : undefined,

so items: { type: 'string' } matches the implementation. ParamDef already supports items?: ParamDef, and pages_updated (same file) already declares it correctly:

pages_updated: { type: 'array', required: true, items: { type: 'string' } },

buildToolDefs in src/mcp/tool-defs.ts already passes items through to the emitted JSON schema when present, so the fix is purely at the operation declaration site.

Patch

-    entity_hints: { type: 'array', description: '...' },
+    entity_hints: { type: 'array', items: { type: 'string' }, description: '...' },

Scope check

I scanned src/core/operations.ts and src/mcp/ for type: 'array' props missing items: and found only this one in MCP-exposed tool definitions. The only other hit (candidates: { type: 'array' } in src/core/resolvers/builtin/x-api/handle-to-tweet.ts:102) is an internal resolver outputSchema that doesn't go through buildToolDefs, so it doesn't reach the OpenAI tool validator. Left it out of this PR to keep scope tight — happy to send a follow-up if you'd like a clean sweep.

Verified

I patched our pinned install (commit 9c60b3a, v0.31.3) on a live pod, restarted the MCP server, ran tools/list, and confirmed:

  • The emitted extract_facts.inputSchema.properties.entity_hints now has "items": {"type": "string"}.
  • All 61 gbrain tools pass a recursive "every type: array has items" check.
  • The OpenAI strict-validator rejection no longer reproduces — agents can use gbrain tools again.
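The recursive "every type: array has items" check mentioned above can be sketched as follows. This is a minimal sketch that assumes plain JSON-Schema objects like the ones tools/list emits.

- The `JsonSchema` type and the `findArraysMissingItems` name are hypothetical.
- The walk only follows `properties` and single-schema `items`, not `anyOf`, `$defs` or tuple forms.
- Paths are reported in the same properties.entity_hints shape that the OpenAI error context uses.

```typescript
// Minimal JSON Schema shape for this sketch. Tuple-form items and combinators
// such as anyOf or $defs are deliberately not walked.
type JsonSchema = {
  type?: string | string[];
  items?: JsonSchema;
  properties?: Record<string, JsonSchema>;
  [key: string]: unknown;
};

// Collect the dotted path of every array node that has no `items` keyword.
function findArraysMissingItems(schema: JsonSchema, path: string[] = []): string[] {
  const missing: string[] = [];
  const types: (string | undefined)[] = Array.isArray(schema.type) ? schema.type : [schema.type];
  if (types.includes("array") && schema.items === undefined) {
    missing.push(path.length > 0 ? path.join(".") : "(root)");
  }
  const props: Record<string, JsonSchema> = schema.properties ?? {};
  for (const [name, child] of Object.entries(props)) {
    missing.push(...findArraysMissingItems(child, [...path, "properties", name]));
  }
  if (schema.items !== undefined) {
    missing.push(...findArraysMissingItems(schema.items, [...path, "items"]));
  }
  return missing;
}

// The entity_hints property before and after the one-line patch.
const before: JsonSchema = {
  type: "object",
  properties: { entity_hints: { type: "array", description: "..." } },
};
const after: JsonSchema = {
  type: "object",
  properties: { entity_hints: { type: "array", items: { type: "string" }, description: "..." } },
};

console.log(JSON.stringify(findArraysMissingItems(before))); // ["properties.entity_hints"]
console.log(JSON.stringify(findArraysMissingItems(after))); // []
```

Running a check like this over each tool's inputSchema fails with a property path, rather than a bare boolean. That makes the offending declaration easy to find.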

Closes #831



…ct validator

The MCP `extract_facts` tool declared `entity_hints` as a JSON-schema
array without an `items` field. OpenAI's strict tool-schema validator
(used by gpt-5.5/gpt-5-codex and the official tool-list validator)
rejects array schemas with no `items` declaration:

  Invalid schema for function 'gbrain__extract_facts':
  In context=('properties', 'entity_hints'), array schema missing items.

When this fires, every gbrain MCP tool becomes unavailable to that
agent — not just extract_facts — because the validator rejects the
entire tool list when any single tool def is invalid.

The handler at line 2471 already treats the value as string[]:

  entityHints: Array.isArray(p.entity_hints) ? (p.entity_hints as string[]) : undefined,

so `items: { type: 'string' }` matches the implementation. The
ParamDef type already supports this (items?: ParamDef at line 198),
and pages_updated at line 1738 already uses it correctly.

Closes garrytan#831
garrytan added a commit that referenced this pull request May 17, 2026
… placement (#1053)

* refactor(mcp): centralize ParamDef→JSON Schema via shared paramDefToSchema

Three duplicate inline mappers existed across the MCP surface:
- src/mcp/tool-defs.ts (stdio MCP buildToolDefs)
- src/commands/serve-http.ts:837 (live HTTP MCP tools/list)
- src/core/minions/tools/brain-allowlist.ts:84 (subagent tool registry)

Each had subtly different items propagation. The HTTP MCP variant dropped
items entirely, leaving extract_facts.entity_hints broken for OAuth-
authenticated remote agents even after a buildToolDefs-only patch. The
subagent variant propagated one level of items but did not recurse, so the
inner items of nested arrays would silently drop.

Extract a single recursive paramDefToSchema helper exported from
src/mcp/tool-defs.ts and have all three mappers consume it. Closes the
bug class at the architecture level instead of patching one site at a
time. The helper copies type, description, enum, default, and recursively
rebuilds items so array-of-arrays preserves inner shape.

Key ordering (type, description, enum, default, items) matches the
pre-v0.34 inline mappers so JSON.stringify output stays byte-stable for
every existing operation that does not use nested arrays.
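A recursive paramDefToSchema could look like the sketch below. It is based only on the description above:

- It copies type, description, enum and default in that order.
- It then rebuilds items recursively.

The ParamDef interface is an assumption, since the real type in src/core/operations.ts may carry more fields. This is an illustration, not the helper as merged.

```typescript
// Assumed ParamDef shape for this sketch; the real type may differ.
interface ParamDef {
  type: string;
  description?: string;
  enum?: readonly unknown[];
  default?: unknown;
  // In JSON Schema this belongs in the parent object's `required` array,
  // so it is not copied onto the property schema here.
  required?: boolean;
  items?: ParamDef;
}

type JsonSchema = { [key: string]: unknown };

// Copy keys in a fixed order (type, description, enum, default, items) so
// JSON.stringify output stays stable, and recurse into items so nested
// arrays keep their inner shape.
function paramDefToSchema(def: ParamDef): JsonSchema {
  const schema: JsonSchema = { type: def.type };
  if (def.description !== undefined) schema.description = def.description;
  if (def.enum !== undefined) schema.enum = def.enum;
  if (def.default !== undefined) schema.default = def.default;
  if (def.items !== undefined) schema.items = paramDefToSchema(def.items);
  return schema;
}

const entityHints: ParamDef = { type: "array", items: { type: "string" }, description: "..." };
console.log(JSON.stringify(paramDefToSchema(entityHints)));
// {"type":"array","description":"...","items":{"type":"string"}}

const nested: ParamDef = { type: "array", items: { type: "array", items: { type: "string" } } };
console.log(JSON.stringify(paramDefToSchema(nested)));
// {"type":"array","items":{"type":"array","items":{"type":"string"}}}
```

Because items is built by calling the same function, an array of arrays keeps its inner items.type. A one-level copy would drop it.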

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(schema): add items to extract_facts.entity_hints and handle-to-tweet candidates

Two array fields shipped without an items property. Plain JSON Schema
treats items as optional, but strict-mode validators (Gemini Pro
structured outputs, OpenAI strict tool definitions) reject the entire
schema when any type:'array' lacks items. Downstream agents on those
providers couldn't use extract_facts or the x_handle_to_tweet resolver.

extract_facts.entity_hints — declared items: { type: 'string' } matching
the handler at src/core/operations.ts:2733 which already coerces the
runtime value to string[].

handle_to_tweet outputSchema.candidates — full XTweetCandidate spec
including required + additionalProperties: false. The XTweetCandidate
TypeScript interface declares all five fields as required; without
required in the JSON Schema, a validator would accept {} as a valid
candidate. additionalProperties: false closes the OpenAI strict-mode
contract.
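The strict object shape described for candidates lists every property in required and sets additionalProperties: false. A sketch of it follows.

- The strictObject and objectViolations helpers are hypothetical.
- id and score are placeholder fields, because the source does not name XTweetCandidate's five fields.
- objectViolations checks only those two keywords and is not a JSON Schema validator.

```typescript
type JsonSchema = { [key: string]: unknown };

// Object schema in the shape strict validators expect: every property is
// listed in `required`, and keys outside `properties` are rejected.
function strictObject(properties: Record<string, JsonSchema>): JsonSchema {
  return {
    type: "object",
    properties,
    required: Object.keys(properties),
    additionalProperties: false,
  };
}

// Placeholder fields; XTweetCandidate's five real fields are not listed in the source.
const candidatesSchema: JsonSchema = {
  type: "array",
  items: strictObject({
    id: { type: "string" },
    score: { type: "number" },
  }),
};

// Checks only `required` and `additionalProperties: false`; not a full validator.
function objectViolations(value: Record<string, unknown>, schema: JsonSchema): string[] {
  const out: string[] = [];
  const required = (schema.required as string[] | undefined) ?? [];
  for (const key of required) {
    if (!(key in value)) out.push(`missing: ${key}`);
  }
  if (schema.additionalProperties === false) {
    const allowed = Object.keys((schema.properties as Record<string, unknown> | undefined) ?? {});
    for (const key of Object.keys(value)) {
      if (!allowed.includes(key)) out.push(`unexpected: ${key}`);
    }
  }
  return out;
}

const item = candidatesSchema.items as JsonSchema;
console.log(JSON.stringify(objectViolations({}, item))); // ["missing: id","missing: score"]
console.log(JSON.stringify(objectViolations({ id: "1", score: 0.5, extra: true }, item))); // ["unexpected: extra"]
```

Without the required list, the first call would report nothing. That is the "{} is a valid candidate" gap described above.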

19 community PRs (#1028 #999 #980 #979 #910 #904 #847 #832 #863 #862
#812 for entity_hints; #910 caught candidates) converged on these
locations. This wave cherry-picks the deepest variant (#910 surfaced
both bugs) and centralizes via the paramDefToSchema helper from the
preceding commit so the live HTTP MCP tools/list path is also fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: DmitryBMsk (PR #910)

* fix(git-remote): move --no-recurse-submodules after the subcommand verb

Git CLI accepts two flag positions:
  git [global -c flags] <subcommand> [subcommand flags] [args]

Global -c config flags belong before the verb. Subcommand-specific
flags (like --no-recurse-submodules) belong after. Pre-v0.34
GIT_SSRF_FLAGS spliced both kinds before the verb, so cloneRepo
invoked:
  git -c http.followRedirects=false ... --no-recurse-submodules clone URL DIR

Real git rejects this with exit 129 ("unknown option:
--no-recurse-submodules") because --no-recurse-submodules is a clone
subcommand flag, not a global config flag. Every remote-source clone
broke in production from v0.28 onward. The fake-git harness in
test/git-remote.test.ts exits 0 regardless of argv shape, which is
why CI never caught it.

Split GIT_SSRF_FLAGS (3 -c config flags, spread BEFORE the verb) from
GIT_SSRF_SUBCOMMAND_FLAGS (--no-recurse-submodules, spread AFTER the
verb). cloneRepo and pullRepo both spread the new constant after
their respective verbs. The constant names signal the position rule
so future additions land in the right place.
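The flag-position rule can be sketched as follows, assuming argv arrays passed to git without the leading "git".

- Only one of the three global -c entries (http.followRedirects=false) is named in the source, so GIT_SSRF_FLAGS here holds just that one.
- cloneArgv, pullArgv and subcommandFlagAfterVerb are hypothetical helpers, not the code in src/core/git-remote.ts.

```typescript
// Global `-c` config flags: these must precede the subcommand verb.
// The source describes three entries but names only this one.
const GIT_SSRF_FLAGS: readonly string[] = ["-c", "http.followRedirects=false"];

// Subcommand-scoped flags: these must follow the verb (clone / pull).
const GIT_SSRF_SUBCOMMAND_FLAGS: readonly string[] = ["--no-recurse-submodules"];

// Hypothetical argv builders following the described ordering.
function cloneArgv(url: string, dir: string): string[] {
  return [...GIT_SSRF_FLAGS, "clone", ...GIT_SSRF_SUBCOMMAND_FLAGS, url, dir];
}

function pullArgv(): string[] {
  return [...GIT_SSRF_FLAGS, "pull", ...GIT_SSRF_SUBCOMMAND_FLAGS];
}

// Position guard: every subcommand flag must appear after the verb.
// indexOf returns -1 for a missing flag, which also fails the check.
function subcommandFlagAfterVerb(argv: string[], verb: string): boolean {
  const verbIdx = argv.indexOf(verb);
  return verbIdx >= 0 && GIT_SSRF_SUBCOMMAND_FLAGS.every((f) => argv.indexOf(f) > verbIdx);
}

// The pre-fix ordering spliced the subcommand flag before the verb.
const legacyArgv = [...GIT_SSRF_FLAGS, ...GIT_SSRF_SUBCOMMAND_FLAGS, "clone", "URL", "DIR"];
console.log(subcommandFlagAfterVerb(legacyArgv, "clone")); // false
console.log(subcommandFlagAfterVerb(cloneArgv("URL", "DIR"), "clone")); // true
```

Asserting on positions rather than on exit codes matters here, because a fake git that exits 0 for any argv cannot catch this class of bug.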

7 community PRs converged on this location (#1023 #1020 #985 #963
#846 #842#800 doesn't exist). This wave cherry-picks the semantic-
constant approach from #846's GIT_SSRF_SUBCOMMAND_FLAGS name (the
clearest signal of the position rule).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(mcp+git+resolvers): structural array-items + subcommand-position guards

Three new tests / test groups close the bug classes the wave fixes:

test/mcp-tool-defs.test.ts — recursive structural guard walks every
operation's inputSchema and fails with a property path if any
type:'array' lacks items.type. Explicit fixture assertions for
extract_facts.entity_hints.items.type and a synthetic nested-array
ParamDef pinning items.items.type recursion. Without the explicit
fixtures the legacyInlineMap byte-equality test is mirror-theater —
mirroring both sides of the equality preserves the blind spot.

test/git-remote.test.ts — split snapshot test into GIT_SSRF_FLAGS
(3 global -c entries) and GIT_SSRF_SUBCOMMAND_FLAGS
(--no-recurse-submodules). cloneRepo + pullRepo argv tests now assert
the subcommand flag appears AFTER the verb index. Pre-v0.34 the
pinned argv slice prefix included --no-recurse-submodules, which
baked the bug into the test suite (codex catch).

test/resolvers.test.ts — recursive walk over both inputSchema AND
outputSchema for builtin resolvers (xHandleToTweetResolver,
urlReachableResolver). Explicit imports rather than
getDefaultRegistry(), which starts empty until commands/resolvers.ts
runs — codex catch on a hollow-walk failure mode. Dedicated case
pins candidates items shape including required + additionalProperties.

Reference legacyInlineMap in mcp-tool-defs.test.ts mirrors the new
recursive paramDefToSchema helper. No current op uses nested arrays so
the byte-equality test stays green for every existing operation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): raise rerank timeouts for ZE live cold-start

The first rerank call of a CI run hits ZeroEntropy's cold-start latency
(observed ~5-6s on Tier 2 LLM Skills runners; subsequent calls < 500ms).
Two timeouts fired simultaneously at ~5s:

1. bun:test's default 5000ms per-test timeout fired and marked the test (fail).
2. gateway.rerank's DEFAULT_RERANK_TIMEOUT_MS = 5000 fired right after,
   reported as "Unhandled error between tests".

The next rerank test (top_n=2) ran in 409ms because the API was already
warm. Cold-start is the only issue.

Pass explicit timeoutMs to each rerank() call and a longer per-test
timeout (30s) on both ZE rerank tests. Production DEFAULT_RERANK_TIMEOUT_MS
stays at 5s for the search hot path — these E2E tests bypass it locally
without changing the default that protects user latency.
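A per-call timeout override with a 5s default can be sketched as below. This sketch assumes an options object with a timeoutMs field.

- gateway.rerank's actual signature is not shown here.
- resolveRerankTimeout, RerankOptions and withTimeout are hypothetical names.

withTimeout clears its timer once the race settles, so the timer cannot fire after the caller has moved on.

```typescript
// Value taken from the text: production keeps a 5s rerank timeout.
const DEFAULT_RERANK_TIMEOUT_MS = 5000;

// Hypothetical options shape; not gateway.rerank's real signature.
interface RerankOptions {
  timeoutMs?: number;
}

// An explicit per-call value wins; otherwise fall back to the production default.
function resolveRerankTimeout(opts: RerankOptions = {}): number {
  return opts.timeoutMs ?? DEFAULT_RERANK_TIMEOUT_MS;
}

// Race the work against a timer and clear the timer once either side settles.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

console.log(resolveRerankTimeout()); // 5000
console.log(resolveRerankTimeout({ timeoutMs: 30000 })); // 30000
withTimeout(Promise.resolve("ranked"), resolveRerankTimeout({ timeoutMs: 30000 })).then((r) => console.log(r)); // ranked
```

On the test side, bun:test accepts a per-test timeout in milliseconds as the third argument, as in test(name, fn, 30000).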

Unrelated to the fix-wave in this PR (mcp-tool-defs + git-remote + resolver
guards). Lands here to keep Tier 2 LLM Skills green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.35.2.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: sync for v0.35.2.0

Update CLAUDE.md Key files annotations for the v0.35.2.0 fix wave:

- src/mcp/tool-defs.ts: document new exported recursive paramDefToSchema
  helper and the three-consumer centralization (stdio MCP, HTTP MCP
  tools/list, subagent registry).
- src/core/minions/tools/brain-allowlist.ts: paramsToInputSchema now
  consumes the shared helper.
- src/commands/serve-http.ts: tools/list handler now consumes the shared
  helper (closes the HTTP MCP items-dropped bug class).
- src/core/git-remote.ts: new entry. Documents the GIT_SSRF_FLAGS (global
  config, pre-verb) vs GIT_SSRF_SUBCOMMAND_FLAGS (subcommand-scoped,
  post-verb) split, the 7-month silent regression, and the position-anchored
  regression guard in test/git-remote.test.ts.

Regenerated llms-full.txt to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: rebump version to v0.35.3.0

Queue moved while this PR was open — v0.35.2.0 was claimed by master's
v0.35.1.0 sibling work. Advancing one slot. No code changes; only:
- VERSION + package.json: 0.35.2.0 → 0.35.3.0
- CHANGELOG.md: rewritten header + inline references
- CLAUDE.md: rewritten 4 key-file annotations
- llms-full.txt + llms.txt: regenerated to mirror CLAUDE.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

garrytan (Owner) commented Jun 8, 2026

Thanks for this contribution — and apologies for the slow triage. We did a full pass over the entire PR backlog. gbrain has moved fast, and the maintainer's larger "cathedral" rewrites have superseded a big share of community PRs: the AI gateway + recipes + user_provided_models system replaced almost all individual provider PRs; #1805 fixed the whole Postgres module-singleton class; #1542 unified the type taxonomy; #1657 the retrieval path; #1802 the doctor; and so on.

We're closing this one in that cleanup — either the fix already landed on master, it duplicates another PR or merged change, or it's outside the current merge bar. Where a closed PR carried a genuinely valuable idea, we've recorded it in docs/designs/COMMUNITY_IDEAS.md so nothing good is lost (a few may graduate into TODOs).

Please don't read the close as a judgment of the work — thank you for contributing. If you believe the underlying issue is still live on the latest master, reopen with a quick note and we'll take another look. 🙏

garrytan closed this Jun 8, 2026

Development

Successfully merging this pull request may close these issues.

fix(mcp): extract_facts entity_hints array schema missing items field (rejected by OpenAI strict validator)
