fix(x_search): surface degraded results + validate dates #29484
Merged
kshitijk4poor merged 1 commit on May 20, 2026
The xAI Responses API for x_search returns 200 OK with a
synthesized fluff answer in two failure modes that callers currently
cannot distinguish from a real, citation-backed result:
1. Any narrowing filter (allowed_x_handles, excluded_x_handles,
from_date, to_date) was active, but the X index returned no
matching posts. The model then answers from training data.
2. The date range is malformed, inverted, or pure-future (e.g.
from_date=2030-01-01). The API call burns quota and Grok
responds with a generic answer.
Mitigations, both client-side:
* Validate from_date / to_date before the HTTP call:
- Strict YYYY-MM-DD.
- from_date <= to_date when both set.
- from_date <= today UTC (no posts in a window that hasn't
started). to_date in the future remains allowed so callers
can request 'from yesterday to tomorrow'.
* Add 'degraded' + 'degraded_reason' to successful responses.
degraded=True iff any narrowing filter was active AND both the
top-level 'citations' array and inline 'url_citation'
annotations came back empty. A broad query with no filters that
returns no citations is *not* flagged degraded — that case is
just an unsourced answer, not a filter miss.
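The date checks in the first mitigation above can be sketched roughly as follows. This is a minimal illustration, not the code from tools/x_search_tool.py: the names `validate_date_range` and `_parse_day` and the injectable `today` argument are assumptions, and only the future-date message is taken from the PR text.

```python
from datetime import date, datetime, timezone
from typing import Optional


def _parse_day(name: str, value: str) -> date:
    """Parse a strict YYYY-MM-DD string or raise ValueError (hypothetical helper)."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        raise ValueError(f"{name} must be YYYY-MM-DD, got {value!r}") from None
    # strptime tolerates unpadded input such as "2026-5-1"; round-trip to reject it.
    if parsed.isoformat() != value:
        raise ValueError(f"{name} must be YYYY-MM-DD, got {value!r}")
    return parsed


def validate_date_range(
    from_date: Optional[str],
    to_date: Optional[str],
    today: Optional[date] = None,
) -> Optional[str]:
    """Return an error message, or None when the range is acceptable."""
    # "today" is injectable so tests can freeze it; it defaults to today in UTC.
    today = today or datetime.now(timezone.utc).date()
    try:
        start = _parse_day("from_date", from_date) if from_date else None
        end = _parse_day("to_date", to_date) if to_date else None
    except ValueError as exc:
        return str(exc)
    if start and end and start > end:
        return f"from_date ({from_date}) is after to_date ({to_date})"
    if start and start > today:
        return (
            f"from_date ({from_date}) is in the future; "
            "X Search only indexes past posts."
        )
    # A future to_date is deliberately allowed.
    return None
```

A caller would run this before building the HTTP request and return the message as a tool error when it is not None.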
Tests cover all four validation paths plus six degraded-flag
scenarios (each filter type, inline vs top-level citation
recovery, broad query baseline). All existing tests continue to
pass; the additions are purely additive on the success-path
response shape.
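One way to check that validation runs before any HTTP call is to hand the tool a "fence" that fails the test the moment it is invoked. The sketch below is an assumption-heavy illustration: `x_search_stub` is a hypothetical stand-in for the tool, and it takes the HTTP function as a parameter so the example runs without any third-party dependency (the real tests may patch the HTTP client instead).

```python
from datetime import date
from typing import Any, Callable, Dict, Optional


def no_post_allowed(*args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Fence: any HTTP attempt fails the test immediately."""
    raise AssertionError("HTTP call attempted before validation finished")


def x_search_stub(
    query: str,
    from_date: Optional[str],
    http_post: Callable[..., Dict[str, Any]],
    today: date,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the tool: validate first, then call out."""
    if from_date is not None and date.fromisoformat(from_date) > today:
        return {"error": f"from_date ({from_date}) is in the future"}
    return http_post("https://example.invalid/x_search", json={"query": query})


def test_future_from_date_never_hits_http() -> None:
    # today is frozen so the test does not depend on the wall clock.
    result = x_search_stub(
        "anything", "2030-01-01", no_post_allowed, today=date(2026, 5, 20)
    )
    assert "error" in result
```

If validation were skipped or reordered, the fence would raise and the test would fail instead of silently passing.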
Discovered while testing the x_search toolset end-to-end:
queries scoped to @teknium1 returned confident-sounding generic
text about Nous Research with zero citations, and from_date in
2030 produced sassy non-answers. Both are now detectable by the
caller.
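A caller could branch on the new fields roughly like this. The sketch assumes the tool result arrives as a JSON string with either an "error" key or the "degraded" / "degraded_reason" fields described above; the function name and the action labels are hypothetical.

```python
import json
from typing import Any, Dict


def interpret_x_search(raw: str) -> Dict[str, Any]:
    """Classify a tool result (JSON string) into a next step for the caller."""
    result = json.loads(raw)
    if "error" in result:
        # Client-side validation rejected the arguments; no HTTP call was made.
        return {"action": "fix_arguments", "detail": result["error"]}
    if result.get("degraded"):
        # Filters were active but nothing was cited: treat the answer as unsourced.
        return {"action": "retry_or_disclose", "detail": result.get("degraded_reason")}
    return {"action": "use_answer", "detail": None}
```

Results that lack the "degraded" key fall through to "use_answer", so older response shapes keep working.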
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request on Jun 2, 2026:
…degraded-flag Merged after self-review + local verification of date validation and degraded flag. All tests pass, claims confirmed end-to-end.
Seven74AI pushed a commit to Seven74AI/hermes-agent that referenced this pull request on Jun 13, 2026:
…degraded-flag Merged after self-review + local verification of date validation and degraded flag. All tests pass, claims confirmed end-to-end.
alt-glitch pushed a commit that referenced this pull request on Jun 14, 2026:
Merged after self-review + local verification of date validation and degraded flag. All tests pass, claims confirmed end-to-end.
fix(x_search): surface degraded results + validate dates
Summary
The x_search tool currently returns success=true in two failure modes that callers, and the model invoking the tool, cannot distinguish from a real, citation-backed result:
1. A narrowing filter (allowed_x_handles, excluded_x_handles, from_date, to_date) is set, but the X index returns no matching posts. xAI still returns 200 OK and Grok synthesizes an answer from its own training data. The response looks identical to a real result except that citations and inline_citations are both empty.
2. from_date / to_date is not YYYY-MM-DD, or from_date > to_date, or from_date is in the future. The API accepts the request, burns the quota, and returns a generic answer with no citations.
This PR adds two purely client-side mitigations.
Date validation
from_date / to_date are now validated before the HTTP call:
* Strict YYYY-MM-DD.
* from_date <= to_date when both are set.
* from_date must not be later than today UTC: no posts can exist in a window that hasn't started yet. to_date in the future is allowed (callers may legitimately request "from yesterday to tomorrow" to catch posts as they arrive).
Validation failures return a structured {"error": "..."} tool result via the existing tool_error() helper: no HTTP call to xAI, no wasted quota.
Degraded-result flag
Successful responses now carry two additional fields:
* degraded: bool - true iff any narrowing filter was active AND both the top-level citations array and the inline url_citation annotations came back empty.
* degraded_reason: str | None - a short string naming which filters were active when degraded is true, null otherwise.
A broad query with no filters that returns no citations is not flagged degraded. That case is just an unsourced answer; the caller can already tell from inline_citations == [] if they care. The flag specifically targets the "I asked for X under filter Y and got an answer that ignores Y" case, which is the misleading one.
The fields are additive on the success-path response shape; no existing field is removed or changed.
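The flag logic described in this section can be sketched as below. This is an illustration under stated assumptions, not the merged code: `compute_degraded` and `FILTER_KEYS` are hypothetical names, the arguments are assumed to arrive as a dict, and the comma separator for multiple filters is a guess (the PR only shows the single-filter reason string).

```python
from typing import Any, Dict, List, Optional, Tuple

# The four narrowing filters named in the PR description.
FILTER_KEYS = ("allowed_x_handles", "excluded_x_handles", "from_date", "to_date")


def compute_degraded(
    args: Dict[str, Any],
    citations: List[Any],
    inline_citations: List[Any],
) -> Tuple[bool, Optional[str]]:
    """Return (degraded, degraded_reason) for a successful response."""
    # A filter counts as active only if it is set to a non-empty value.
    active = [key for key in FILTER_KEYS if args.get(key)]
    if active and not citations and not inline_citations:
        reason = "no citations returned despite filters: " + ", ".join(active)
        return True, reason
    # Broad queries and any response with at least one citation are not degraded.
    return False, None
```

Either source of citations is enough to clear the flag, which matches the "inline vs top-level citation recovery" cases listed in the tests.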
Why this matters for agent loops
The agent calling x_search currently has no way to tell a model-synthesized fluff answer from a real citation-backed result. With degraded, the agent can branch: retry with broader filters, fall back to xurl for direct X API reads, or surface "no real X posts found, here's what the model knows" to the user instead of presenting fluff as fact. For cron jobs and skills that consume x_search non-interactively, this is the difference between silently shipping hallucinated content and explicitly flagging it.
Scope
This PR intentionally does one thing: defensive output. It does not cover:
* A warnings field (PR #27416, "feat(x_search): add structured output and response chaining", explicitly listed that as out-of-scope; we use a typed degraded boolean instead).
* x_search vs xurl routing docs (PR #29423, "docs(xai): clarify x_search and xurl routing").
If #27416 lands first, this PR rebases cleanly: both touch tools/x_search_tool.py but in non-overlapping regions (validation helper + success-path return).
Discovery
Reproduced end-to-end while testing the x_search toolset:
* from_date=2030-01-01, to_date=2030-01-07, query="anything" → 200 OK with a sassy generic response, zero citations. Now: rejected client-side with "from_date (2030-01-01) is in the future; X Search only indexes past posts."
* allowed_x_handles=["Teknium1"], query="Nous Research" → confident encyclopedic write-up of Nous Research with zero citations (the @teknium1 handle is intermittently missing from xAI's X index even when the account is actively posting). Now: degraded=true, degraded_reason="no citations returned despite filters: allowed_x_handles".
Tests
tests/tools/test_x_search_tool.py adds 12 tests in two groups:
Date validation (6 tests):
* malformed from_date rejected with clear error
* malformed to_date rejected with clear error
* inverted range rejected (from > to)
* future from_date rejected (today UTC frozen for determinism)
* future to_date allowed when from_date is in the past
* from_date == today UTC accepted as edge case
All rejection paths include a _no_post_allowed fence that fails the test if requests.post is called, proving validation happens before HTTP.
Degraded flag (6 tests):
* allowed_x_handles + empty citations → degraded=true
* excluded_x_handles + empty citations → degraded=true
* two filters active + empty citations → degraded=true, reason names both fields
* filter active + inline url_citation annotation → degraded=false
* filter active + top-level citations array → degraded=false
* no filters + empty citations → degraded=false (broad-query baseline)
Existing 13 test_x_search_tool.py tests continue to pass unchanged.
Docs
website/docs/user-guide/features/x-search.md:
* Adds degraded and degraded_reason to the documented return-value list.
* Adds a "degraded: true - answer with no citations" troubleshooting entry covering the three real-world causes (typo, narrow window, index gap).
Out of scope (worth flagging for follow-up)
* Automatic handling of degraded=true. Could be done in the agent loop or a dedicated wrapper; doesn't belong in the tool itself. Tracked mentally; will open a separate issue if there's appetite.
* The xAI index gap, where a from:Teknium1 query returns the empty-citations fluff path. Not actionable from our side.