fix(x_search): surface degraded results + validate dates #29484

Merged: kshitijk4poor merged 1 commit into NousResearch:main from kshitijk4poor:kp/x-search-degraded-flag on May 20, 2026

@kshitijk4poor (Collaborator)

fix(x_search): surface degraded results + validate dates

Summary

The x_search tool currently returns success=true in two failure modes that callers — and the model invoking the tool — cannot distinguish from a real, citation-backed result:

  1. Filter miss. Any narrowing filter (allowed_x_handles, excluded_x_handles, from_date, to_date) is set, but the X index returns no matching posts. xAI still 200s and Grok synthesizes an answer from its own training data. The response looks identical to a real result except citations and inline_citations are both empty.
  2. Impossible / malformed date range. from_date/to_date is not YYYY-MM-DD, or from_date > to_date, or from_date is in the future. The API accepts the request, burns the quota, and returns a generic answer with no citations.

This PR adds two purely client-side mitigations.

Date validation

from_date / to_date are now validated before the HTTP call:

  • Both, if provided, must parse as YYYY-MM-DD.
  • When both are set, from_date <= to_date.
  • from_date must not be later than today UTC — no posts can exist in a window that hasn't started yet.
  • to_date in the future is allowed (callers may legitimately request "from yesterday to tomorrow" to catch posts as they arrive).

Validation failures return a structured {"error": "..."} tool result via the existing tool_error() helper — no HTTP call to xAI, no wasted quota.
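The rules above can be sketched as a small standalone helper. This is only an illustration, not the code in tools/x_search_tool.py: the function name validate_date_range, its today parameter, and every error string except the future-date message quoted under Discovery are assumptions made for the example.

```python
import re
from datetime import date, datetime, timezone
from typing import Optional

# strptime("%Y-%m-%d") also accepts inputs such as "2026-5-1", so a regex
# enforces the strict zero-padded YYYY-MM-DD shape first.
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_date_range(
    from_date: Optional[str],
    to_date: Optional[str],
    today: Optional[date] = None,
) -> Optional[str]:
    """Return an error message, or None when the range is acceptable.

    Hypothetical helper; `today` is injectable so tests can freeze the clock.
    """
    if today is None:
        today = datetime.now(timezone.utc).date()

    parsed = {}
    for name, value in (("from_date", from_date), ("to_date", to_date)):
        if value is None:
            continue
        try:
            if not _DATE_RE.fullmatch(value):
                raise ValueError(value)
            # Rejects impossible calendar dates such as 2026-02-30.
            parsed[name] = datetime.strptime(value, "%Y-%m-%d").date()
        except ValueError:
            return f"{name} must be YYYY-MM-DD, got {value!r}"

    start = parsed.get("from_date")
    end = parsed.get("to_date")
    if start is not None and end is not None and start > end:
        return f"from_date ({start}) is after to_date ({end})"
    if start is not None and start > today:
        return (
            f"from_date ({start}) is in the future; "
            "X Search only indexes past posts."
        )
    # A future to_date is deliberately allowed.
    return None
```

In the real tool a non-None message would be handed to tool_error() before any request is built; that wiring is left out here.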

Degraded-result flag

Successful responses now carry two additional fields:

  • degraded: bool. true iff any narrowing filter was active AND both the top-level citations array and the inline url_citation annotations came back empty.
  • degraded_reason: str | None. Short string naming which filters were active when degraded is true, null otherwise.

A broad query with no filters that returns no citations is not flagged degraded. That case is just an unsourced answer — the caller can already tell from inline_citations == [] if they care. The flag specifically targets the "I asked for X under filter Y and got an answer that ignores Y" case, which is the misleading one.

The fields are additive on the success-path response shape; no existing field is removed or changed.
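As a rough sketch of the rule described in this section, the flag can be computed from the request parameters and the two citation lists. The function name degraded_status, the params dict shape, and the comma-separated join for multiple filters are assumptions; only the single-filter reason string matches the example quoted under Discovery.

```python
from typing import Any, Dict, List, Optional, Tuple

# The four narrowing filters named in the summary.
NARROWING_FILTERS = (
    "allowed_x_handles",
    "excluded_x_handles",
    "from_date",
    "to_date",
)


def degraded_status(
    params: Dict[str, Any],
    citations: List[Any],
    inline_citations: List[Any],
) -> Tuple[bool, Optional[str]]:
    """Return (degraded, degraded_reason) for a successful response.

    Hypothetical helper: a filter counts as active when its value is truthy,
    so an empty handle list is treated the same as an unset one.
    """
    active = [name for name in NARROWING_FILTERS if params.get(name)]
    if active and not citations and not inline_citations:
        reason = "no citations returned despite filters: " + ", ".join(active)
        return True, reason
    # Either no filter was active (an unsourced broad answer) or at least
    # one kind of citation came back.
    return False, None
```

Keeping the check separate from the HTTP code makes each of the six degraded-flag scenarios testable without a network mock.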

Why this matters for agent loops

The agent calling x_search currently has no way to tell a model-synthesized fluff answer from a real citation-backed result. With degraded, the agent can branch: retry with broader filters, fall back to xurl for direct X API reads, or surface "no real X posts found, here's what the model knows" to the user instead of presenting fluff as fact. For cron jobs and skills that consume x_search non-interactively, this is the difference between silently shipping hallucinated content and explicitly flagging it.
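One way a caller might branch on the new fields is sketched below. The keys error, degraded, and degraded_reason come from this PR; the function next_action and the action labels it returns are hypothetical and only illustrate the decision, not any existing agent-loop code.

```python
from typing import Any, Dict


def next_action(result: Dict[str, Any]) -> str:
    """Map an x_search tool result to a coarse follow-up action.

    Hypothetical caller-side helper; the returned labels are illustrative.
    """
    if "error" in result:
        # Validation failed client-side: fix the arguments, no quota was spent.
        return "fix_arguments"
    # .get() with a default keeps the helper working on older responses
    # that predate the degraded field.
    if result.get("degraded", False):
        # Filters were active but nothing was cited: retry with broader
        # filters, fall back to xurl, or tell the user nothing was found.
        return "retry_or_fallback"
    return "present_answer"
```

A non-interactive consumer such as a cron job could treat "retry_or_fallback" as a signal to label the output as unsourced rather than publish it as fact.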

Scope

This PR intentionally does one thing: defensive output. It does not:

If #27416 lands first, this PR rebases cleanly — both touch tools/x_search_tool.py but in non-overlapping regions (validation helper + success-path return).

Discovery

Reproduced end-to-end while testing the x_search toolset:

  • from_date=2030-01-01, to_date=2030-01-07, query="anything" → 200 OK with a sassy generic response, zero citations. Now: rejected client-side with "from_date (2030-01-01) is in the future; X Search only indexes past posts."
  • allowed_x_handles=["Teknium1"], query="Nous Research" → confident encyclopedic write-up of Nous Research with zero citations (the @teknium1 handle is intermittently missing from xAI's X index even when the account is actively posting). Now: degraded=true, degraded_reason="no citations returned despite filters: allowed_x_handles".

Tests

tests/tools/test_x_search_tool.py adds 12 tests in two groups:

Date validation (6 tests):

  • Malformed from_date rejected with clear error
  • Malformed to_date rejected with clear error
  • Inverted range rejected (from > to)
  • Future from_date rejected (today UTC frozen for determinism)
  • Future to_date allowed when from_date is in the past
  • from_date == today UTC accepted as edge case

All rejection paths include a _no_post_allowed fence that fails the test if requests.post is called — proving validation happens before HTTP.

Degraded flag (6 tests):

  • allowed_x_handles + empty citations → degraded=true
  • excluded_x_handles + empty citations → degraded=true
  • Date range + empty citations → degraded=true, reason names both fields
  • Filter + inline url_citation annotation → degraded=false
  • Filter + top-level citations array → degraded=false
  • No filters + empty citations → degraded=false (broad-query baseline)

$ bash scripts/run_tests.sh tests/tools/test_x_search_tool.py tests/test_toolsets.py
============================== 52 passed in 1.11s ==============================

The 13 existing test_x_search_tool.py tests continue to pass unchanged.

Docs

website/docs/user-guide/features/x-search.md:

  • Adds degraded and degraded_reason to the documented return-value list.
  • Adds a "Date validation" subsection under "Tool parameters".
  • Adds a "degraded: true — answer with no citations" troubleshooting entry covering the three real-world causes (typo, narrow window, index gap).

Out of scope (worth flagging for follow-up)

  • Automatic xurl fallback when degraded=true. Could be done in the agent loop or a dedicated wrapper; doesn't belong in the tool itself. Tracked mentally; will open a separate issue if there's appetite.
  • Reporting the @teknium1 index gap upstream to xAI. Reproducible: from:Teknium1 query returns the empty-citations fluff path. Not actionable from our side.

The xAI Responses API for x_search returns 200 OK with a
synthesized fluff answer in two failure modes that callers currently
cannot distinguish from a real, citation-backed result:

1. Any narrowing filter (allowed_x_handles, excluded_x_handles,
   from_date, to_date) was active, but the X index returned no
   matching posts. The model then answers from training data.
2. The date range is malformed, inverted, or pure-future (e.g.
   from_date=2030-01-01). The API call burns quota and Grok
   responds with a generic answer.

Mitigations, both client-side:

* Validate from_date / to_date before the HTTP call:
  - Strict YYYY-MM-DD.
  - from_date <= to_date when both set.
  - from_date <= today UTC (no posts in a window that hasn't
    started). to_date in the future remains allowed so callers
    can request 'from yesterday to tomorrow'.

* Add 'degraded' + 'degraded_reason' to successful responses.
  degraded=True iff any narrowing filter was active AND both the
  top-level 'citations' array and inline 'url_citation'
  annotations came back empty. A broad query with no filters that
  returns no citations is *not* flagged degraded — that case is
  just an unsourced answer, not a filter miss.

Tests cover all four validation paths plus six degraded-flag
scenarios (each filter type, inline vs top-level citation
recovery, broad query baseline). All existing tests continue to
pass; the additions are purely additive on the success-path
response shape.

Discovered while testing the x_search toolset end-to-end:
queries scoped to @teknium1 returned confident-sounding generic
text about Nous Research with zero citations, and from_date in
2030 produced sassy non-answers. Both are now detectable by the
caller.
@alt-glitch added the type/bug, P3, comp/tools, and tool/web labels on May 20, 2026
@kshitijk4poor merged commit 3ce1cf2 into NousResearch:main on May 20, 2026
16 of 17 checks passed
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…degraded-flag

Merged after self-review + local verification of date validation and degraded flag. All tests pass, claims confirmed end-to-end.
Seven74AI pushed a commit to Seven74AI/hermes-agent that referenced this pull request Jun 13, 2026
…degraded-flag

Merged after self-review + local verification of date validation and degraded flag. All tests pass, claims confirmed end-to-end.
alt-glitch pushed a commit that referenced this pull request Jun 14, 2026
Merged after self-review + local verification of date validation and degraded flag. All tests pass, claims confirmed end-to-end.