fix(x_search): surface degraded results + validate dates by kshitijk4poor · Pull Request #29484 · NousResearch/hermes-agent

kshitijk4poor · 2026-05-20T21:09:45Z

fix(x_search): surface degraded results + validate dates

Summary

The x_search tool currently returns success=true in two failure modes that callers — and the model invoking the tool — cannot distinguish from a real, citation-backed result:

Filter miss. Any narrowing filter (allowed_x_handles, excluded_x_handles, from_date, to_date) is set, but the X index returns no matching posts. xAI still returns 200 OK, and Grok synthesizes an answer from its own training data. The response looks identical to a real result except that citations and inline_citations are both empty.
Impossible / malformed date range. from_date/to_date is not YYYY-MM-DD, or from_date > to_date, or from_date is in the future. The API accepts the request, burns the quota, and returns a generic answer with no citations.

This PR adds two purely client-side mitigations.

Date validation

from_date / to_date are now validated before the HTTP call:

Both, if provided, must parse as YYYY-MM-DD.
When both are set, from_date <= to_date.
from_date must not be later than today UTC — no posts can exist in a window that hasn't started yet.
to_date in the future is allowed (callers may legitimately request "from yesterday to tomorrow" to catch posts as they arrive).

Validation failures return a structured {"error": "..."} tool result via the existing tool_error() helper — no HTTP call to xAI, no wasted quota.
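The validation order above can be sketched in Python as follows. This is a minimal sketch, not the code in tools/x_search_tool.py. `validate_date_range` and `_parse_date` are hypothetical names. Apart from the future-date message, which is quoted from the Discovery section, the error wording is assumed. `today` is injectable so a test can freeze it. The real tool wraps the message with tool_error(), while this sketch just returns the string.

```python
import re
from datetime import date, datetime, timezone
from typing import Optional

# Strict shape check: strptime("%Y-%m-%d") alone would also accept "2026-5-1".
_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def _parse_date(name: str, value: str) -> date:
    """Parse a strict YYYY-MM-DD string or raise ValueError naming the field."""
    if not _DATE_RE.match(value):
        raise ValueError(f"{name} ({value}) must be YYYY-MM-DD")
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        # Right shape but impossible calendar date, e.g. 2026-02-30.
        raise ValueError(f"{name} ({value}) is not a valid calendar date") from None


def validate_date_range(
    from_date: Optional[str],
    to_date: Optional[str],
    today: Optional[date] = None,
) -> Optional[str]:
    """Return an error message, or None if the range may be sent to xAI.

    Hypothetical helper; the real function in the tool may be named differently.
    """
    today = today or datetime.now(timezone.utc).date()
    try:
        start = _parse_date("from_date", from_date) if from_date else None
        end = _parse_date("to_date", to_date) if to_date else None
    except ValueError as exc:
        return str(exc)
    if start is not None and end is not None and start > end:
        return f"from_date ({from_date}) is after to_date ({to_date})"
    if start is not None and start > today:
        return (f"from_date ({from_date}) is in the future; "
                "X Search only indexes past posts")
    # A future to_date is deliberately allowed ("from yesterday to tomorrow").
    return None
```

A caller would turn a non-None return into {"error": message} and skip the HTTP request entirely. The regex is needed because datetime.strptime with %Y-%m-%d also accepts unpadded values like 2026-5-1, which are not strict YYYY-MM-DD.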

Degraded-result flag

Successful responses now carry two additional fields:

degraded: bool — true iff any narrowing filter was active AND both the top-level citations array and the inline url_citation annotations came back empty.
degraded_reason: str | None — short string naming which filters were active when set, null otherwise.

A broad query with no filters that returns no citations is not flagged degraded. That case is just an unsourced answer — the caller can already tell from inline_citations == [] if they care. The flag specifically targets the "I asked for X under filter Y and got an answer that ignores Y" case, which is the misleading one.

The fields are additive on the success-path response shape; no existing field is removed or changed.
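A minimal sketch of the degraded rule follows. It assumes that the success result already carries `citations` and `inline_citations` lists (both field names appear in this PR), with inline_citations derived from the url_citation annotations. It also assumes that a filter counts as active only when its value is truthy, so an empty handle list is treated as unset. `compute_degraded` and `with_degraded_fields` are illustrative names. Joining several filter names with ", " in degraded_reason is an assumption extrapolated from the single-filter message quoted in the Discovery section.

```python
from typing import Any, Dict, List, Optional, Tuple

# Filters that narrow the search; a miss under any of them is what "degraded" flags.
NARROWING_FILTERS = ("allowed_x_handles", "excluded_x_handles", "from_date", "to_date")


def compute_degraded(
    params: Dict[str, Any],
    citations: List[Any],
    inline_citations: List[Any],
) -> Tuple[bool, Optional[str]]:
    """Return (degraded, degraded_reason) for a successful x_search response.

    degraded is True iff at least one narrowing filter was set AND both the
    top-level citations and the inline url_citation-derived list are empty.
    """
    active = [name for name in NARROWING_FILTERS if params.get(name)]
    if active and not citations and not inline_citations:
        return True, "no citations returned despite filters: " + ", ".join(active)
    # Either a citation came back, or no filter was active (broad query).
    return False, None


def with_degraded_fields(result: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a success result with the two additive fields attached."""
    degraded, reason = compute_degraded(
        params,
        result.get("citations") or [],
        result.get("inline_citations") or [],
    )
    return {**result, "degraded": degraded, "degraded_reason": reason}
```

Because with_degraded_fields copies the existing result and only adds keys, existing fields pass through untouched, which matches the additive contract described above.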

Why this matters for agent loops

The agent calling x_search currently has no way to tell a model-synthesized fluff answer from a real citation-backed result. With degraded, the agent can branch: retry with broader filters, fall back to xurl for direct X API reads, or surface "no real X posts found, here's what the model knows" to the user instead of presenting fluff as fact. For cron jobs and skills that consume x_search non-interactively, this is the difference between silently shipping hallucinated content and explicitly flagging it.
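One way an agent loop could act on the flag is sketched below, under several assumptions. `search_or_flag` is a hypothetical caller-side wrapper. x_search is passed in as a callable that returns the tool's result dict. `filters` holds only narrowing filters. The `broadened` and `unverified` keys are annotations invented for this example. The policy shown here drops the date window once and then marks the answer as unverified. It is only one of the options listed above, and an xurl fallback is not shown.

```python
from typing import Any, Callable, Dict

DATE_FILTERS = ("from_date", "to_date")


def search_or_flag(
    x_search: Callable[..., Dict[str, Any]],
    query: str,
    **filters: Any,
) -> Dict[str, Any]:
    """Hypothetical agent-side policy around x_search's degraded flag.

    1. Validation errors are returned unchanged (nothing was sent to xAI).
    2. A degraded result triggers one retry with the date window dropped.
    3. If that does not help, the answer is marked unverified so it is never
       presented to the user as X-sourced fact.
    """
    result = x_search(query=query, **filters)
    if "error" in result or not result.get("degraded"):
        return result

    broader = {k: v for k, v in filters.items() if k not in DATE_FILTERS}
    # Only retry if a narrowing filter survives, so the retry's degraded flag
    # still means something (an unfiltered query is never flagged degraded).
    if broader and broader != filters:
        retry = x_search(query=query, **broader)
        if "error" not in retry and not retry.get("degraded"):
            return {**retry, "broadened": True}

    return {**result, "unverified": True}
```

The retry is skipped when no handle filter would remain, because a broad query cannot report another filter miss and would silently turn an unsourced answer into an unflagged one.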

Scope

This PR intentionally does one thing: defensive output. It does not:

Touch credential resolution.
Add a warnings field (PR #27416, "feat(x_search): add structured output and response chaining", explicitly listed that as out of scope; we use a typed degraded boolean instead).
Overlap with structured output / response chaining (PR #27416).
Overlap with the x_search vs xurl routing docs (PR #29423, "docs(xai): clarify x_search and xurl routing").

If #27416 lands first, this PR rebases cleanly — both touch tools/x_search_tool.py but in non-overlapping regions (validation helper + success-path return).

Discovery

Reproduced end-to-end while testing the x_search toolset:

from_date=2030-01-01, to_date=2030-01-07, query="anything" → 200 OK with a sassy generic response and zero citations. Now: rejected client-side with the error "from_date (2030-01-01) is in the future; X Search only indexes past posts".
allowed_x_handles=["Teknium1"], query="Nous Research" → confident encyclopedic write-up of Nous Research with zero citations (the @teknium1 handle is intermittently missing from xAI's X index even when the account is actively posting). Now: degraded=true, degraded_reason="no citations returned despite filters: allowed_x_handles".

Tests

tests/tools/test_x_search_tool.py adds 12 tests in two groups:

Date validation (6 tests):

Malformed from_date rejected with clear error
Malformed to_date rejected with clear error
Inverted range rejected (from > to)
Future from_date rejected (today UTC frozen for determinism)
Future to_date allowed when from_date is in the past
from_date == today UTC accepted as edge case

All rejection paths include a _no_post_allowed fence that fails the test if requests.post is called — proving validation happens before HTTP.

Degraded flag (6 tests):

allowed_x_handles + empty citations → degraded=true
excluded_x_handles + empty citations → degraded=true
Date range + empty citations → degraded=true, reason names both fields
Filter + inline url_citation annotation → degraded=false
Filter + top-level citations array → degraded=false
No filters + empty citations → degraded=false (broad-query baseline)

$ bash scripts/run_tests.sh tests/tools/test_x_search_tool.py tests/test_toolsets.py
============================== 52 passed in 1.11s ==============================

The 13 existing tests in test_x_search_tool.py continue to pass unchanged.

Docs

website/docs/user-guide/features/x-search.md:

Adds degraded and degraded_reason to the documented return-value list.
Adds a "Date validation" subsection under "Tool parameters".
Adds a "degraded: true — answer with no citations" troubleshooting entry covering the three real-world causes (typo, narrow window, index gap).

Out of scope (worth flagging for follow-up)

Automatic xurl fallback when degraded=true. Could be done in the agent loop or a dedicated wrapper; doesn't belong in the tool itself. Tracked mentally; will open a separate issue if there's appetite.
Reporting the @teknium1 index gap upstream to xAI. Reproducible: a from:Teknium1 query takes the empty-citations fluff path. Not actionable from our side.

@teknium1

The xAI Responses API for x_search returns 200 OK with a synthesized fluff answer in two failure modes that callers currently cannot distinguish from a real, citation-backed result:

1. Any narrowing filter (allowed_x_handles, excluded_x_handles, from_date, to_date) was active, but the X index returned no matching posts. The model then answers from training data.
2. The date range is malformed, inverted, or pure-future (e.g. from_date=2030-01-01). The API call burns quota and Grok responds with a generic answer.

Mitigations, both client-side:

* Validate from_date / to_date before the HTTP call:
  - Strict YYYY-MM-DD.
  - from_date <= to_date when both set.
  - from_date <= today UTC (no posts in a window that hasn't started). to_date in the future remains allowed so callers can request 'from yesterday to tomorrow'.
* Add 'degraded' + 'degraded_reason' to successful responses. degraded=True iff any narrowing filter was active AND both the top-level 'citations' array and inline 'url_citation' annotations came back empty. A broad query with no filters that returns no citations is *not* flagged degraded; that case is just an unsourced answer, not a filter miss.

Tests cover all four validation paths plus six degraded-flag scenarios (each filter type, inline vs top-level citation recovery, broad query baseline). All existing tests continue to pass; the additions are purely additive on the success-path response shape.

Discovered while testing the x_search toolset end-to-end: queries scoped to @teknium1 returned confident-sounding generic text about Nous Research with zero citations, and from_date in 2030 produced sassy non-answers. Both are now detectable by the caller.

Merged after self-review + local verification of date validation and degraded flag. All tests pass, claims confirmed end-to-end.

alt-glitch added the labels type/bug (Something isn't working), P3 (Low: cosmetic, nice to have), comp/tools (Tool registry, model_tools, toolsets), and tool/web (Web search and extraction) on May 20, 2026.

kshitijk4poor merged commit 3ce1cf2 into NousResearch:main May 20, 2026
16 of 17 checks passed

Haderach-Ram mentioned this pull request May 21, 2026

Ecosystem Digest — 2026-05-21 Haderach-Ram/openclaw-radar#14

Open

BrewTestBot mentioned this pull request May 28, 2026

hermes-agent 2026.5.28 Homebrew/homebrew-core#285115

Merged

alt-glitch pushed a commit that referenced this pull request Jun 14, 2026

Merge pull request #29484 from kshitijk4poor/kp/x-search-degraded-flag

6c7fe70

kshitijk4poor merged 1 commit into NousResearch:main from kshitijk4poor:kp/x-search-degraded-flag
