feat(gateway): pre-LLM intent fast-path for weather (~0.4s vs 21-63s)#2
feat(gateway): pre-LLM intent fast-path for weather (~0.4s vs 21-63s)#2davidgut1982 wants to merge 2 commits into
Conversation
Weather questions currently traverse the two-level orchestrator/worker LLM
agent loop, taking 21-63s. This adds a deterministic fast-path that intercepts
weather intents BEFORE any agent/LLM runs, answers directly from the Open-Meteo
HTTP API, and returns in ~300-500ms. Anything that does not cleanly match
falls through untouched to the normal agent pipeline.
New module intent_fast_path.py (repo root, added to pyproject py-modules):
- register_intent()/_intent_fast_path() registry so future intents are one
line. Any matcher/handler exception is swallowed -> fall through.
- End-anchored weather matcher that ignores conversational prose
("weather affects my mood") and slash commands.
- httpx.AsyncClient(timeout=2.0) handler. Default location is hardcoded
Woodstock, IL (no geocoding). Named locations are cleaned of ", ST ZIP"
before geocoding (Open-Meteo geocoder chokes on those). WMO codes mapped
to short text; terse Fahrenheit current + 3-day forecast, Telegram-safe.
- STRICT FALL-THROUGH: returns None on no match, empty geocoding, timeout,
HTTP 4xx/5xx, JSON/parse error, or missing current.temperature_2m. Never
returns an empty/partial/wrong string.
Two insertion points, both behind a guarded import (missing module = safe
no-op stub that always defers to the agent):
- gateway/platforms/api_server.py _handle_chat_completions: after message
validation, honoring the stream flag (OpenAI chat.completion JSON or SSE
chunks + [DONE]); CORS via _cors_headers_for_origin; INFO log on hit.
- gateway/run.py GatewayRunner._handle_message: between the Telegram lobby
block and the session-sentinel claim, so the session lock is NOT taken for
a fast-path reply. Downstream of auth + mention-gating, which are preserved.
Tests: tests/test_intent_fast_path.py (34 cases) cover matcher positives/
negatives, location cleaning, handler fall-through (timeout/500/empty
geocoding/missing temp), success render, and dispatch exception safety. No
network in the unit suite (httpx monkeypatched). 34 passed; ruff clean.
Tradeoff: fast-path replies are not written to the session transcript
(history bypass) — acceptable for ephemeral weather answers.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🔎 Lint report:
|
| Rule | Count |
|---|---|
unresolved-import |
3 |
unresolved-attribute |
1 |
invalid-argument-type |
1 |
First entries
tests/test_intent_fast_path.py:376: [unresolved-import] unresolved-import: Cannot resolve imported module `httpx`
intent_fast_path.py:573: [unresolved-attribute] unresolved-attribute: Object of type `object` has no attribute `get`
intent_fast_path.py:622: [unresolved-import] unresolved-import: Cannot resolve imported module `httpx`
intent_fast_path.py:264: [invalid-argument-type] invalid-argument-type: Argument to constructor `int.__new__` is incorrect: Expected `str | Buffer | SupportsInt | SupportsIndex | SupportsTrunc`, found `object`
tests/test_intent_fast_path.py:18: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
✅ Fixed issues: none
Unchanged: 4957 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
… review
Adversarial review of the weather fast-path found the connector path
("in|for|at|near|around <X>") accepting ANY trailing noun as a location,
so non-weather prose got geocoded into a bogus weather answer (CONFIRMED
LIVE: "forecast for the meeting" -> Nenagh, Ireland; "forecast in the lab"
-> Indiana). It also let a noun-filler bridge to a connector
("weather report for the Q3 sales").
HIGH-1: Add _connector_location_ok() guarding loc1/loc2 connector captures
in both _weather_matcher and _extract_location — reject "the <noun>",
>5 tokens, >60 chars, or any _NON_PLACE_WORDS stopword (meeting, lab,
code, budget, sales, quarter, report, market, stock, project, team, …).
Tolerates ZIP/state tokens so "Woodstock, IL 60098" still passes.
HIGH-2: Restructure _WEATHER_RE into two end-anchored, non-optional
alternations: (a) keyword + optional filler + EOL, or (b) keyword +
optional idiomatic "like" + connector + location + EOL. A noun filler
(report/budget/…) can no longer be followed by connector+location.
("what's the weather like in Denver" preserved via the "like" bridge.)
MEDIUM: Replace flat httpx timeout=2.0 (per-call, ~4s worst case) with
httpx.Timeout(connect=1.0, read=1.5) and wrap the two-call named-location
branch in asyncio.wait_for(..., 3.5s) hard ceiling. Default Woodstock
path stays single-call.
LOW: _is_place_like rejects single tokens >60 chars.
Tests: +20 cases (54 total) — all confirmed-live false positives asserted
no-match AND _extract_location None; true positives preserved; named-branch
hard-ceiling test with a mocked slow httpx. ruff clean; full fast-path suite
and 150 gateway api_server tests green.
SSE (api_server): split the fast-path role+content single delta into
spec-standard chunks (role-only, content, finish_reason:stop, [DONE]) so
strict OpenAI/OpenWebUI stream consumers parse cleanly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Closing — deprioritized. Adversarial review found the named-location matcher has an unbounded false-positive class (non-weather prose geocoding to real places, e.g. 'weather for mom' -> Mauritania); a denylist can't close it. With Lore prefetch + parent init now ~345ms (was ~5s), the normal agent path is no longer catastrophic, so a pre-LLM intercept isn't worth the correctness risk. Branch kept for reference if we revisit a home-only scoped version later. |
…NousResearch#34192) (NousResearch#34382) NousResearch#34192 reports Hostinger's 'Hermes WebUI' catalog crashes on startup with: /usr/bin/tini: No such file or directory The image moved from tini to s6-overlay as PID 1 (/init) earlier in 2026. Orchestration templates that still pin /usr/bin/tini as the entrypoint \u2014 like the Hostinger Hermes WebUI catalog \u2014 have no binary to exec and the container crashes immediately. Hermes has no control over the Hostinger catalog template, but we can make the image backward-compatible by symlinking /usr/bin/tini -> /init during the s6-overlay install step. External wrappers that exec /usr/bin/tini will land on the same s6-overlay reaper they would have landed on if they'd used the canonical /init entrypoint. The image's own ENTRYPOINT continues to be /init verbatim \u2014 the shim is purely for legacy external wrappers, not for the image's own runtime path. Once affected catalogs are updated, the symlink can be removed. Other issues NousResearch#34192 raises that are NOT addressed by this PR: * Problem #2 (UID 1024 vs 10000 mismatch): already fixed by NousResearch#33148 (S6_KEEP_ENV=1) and NousResearch#32412 (with-contenv shebangs). The Hostinger template likely needs to update its env-var propagation. * Problem #3 (incompatible session formats): RFC for pluggable SessionDB is tracked in NousResearch#23717. * Problem #4 (Telegram polling conflict): an operations problem on Hostinger's side, not in this codebase. This PR is scoped to the one issue that can be fixed inside Dockerfile: the missing /usr/bin/tini binary. Tests (3 in test_dockerfile_tini_compat_shim.py): - test_tini_compat_symlink_present Guard: the symlink line must exist in Dockerfile. - test_tini_compat_comment_explains_why The NousResearch#34192 anchor comment must be present so future readers know why the shim is there (avoid accidental removal). - test_entrypoint_still_init_not_tini Sanity check: ENTRYPOINT remains /init (s6-overlay). The shim is only for external wrappers. Refs: NousResearch#34192 Partial fix: addresses the immediate tini-binary crash. Catalog-side fixes still needed by Hostinger for the UID and session-format problems documented in the issue. Co-authored-by: Cursor <cursoragent@cursor.com>
What
Adds a pre-LLM intent fast-path that intercepts weather questions before any agent/LLM runs, answers them directly from the Open-Meteo HTTP API, and returns in ~300-500ms. Anything that doesn't cleanly match falls through, untouched, to the normal agent pipeline.
Why
Weather queries currently traverse the two-level orchestrator/worker LLM agent loop and take 21-63s. A deterministic weather answer needs none of that. This brings the common case down to ~0.4s (measured below).
How it works
New module
intent_fast_path.py(repo root, added topyprojectpy-modulesso it installs):register_intent(matcher, handler)+async _intent_fast_path(text). The first handler returning a non-None string wins; any matcher/handler exception is swallowed and treated as "no match".httpx.AsyncClient(timeout=2.0):42.3147, -88.4487) — no geocoding call., ST ZIPbefore geocoding (Open-Meteo's geocoder chokes on "City, ST ZIP"); display name built fromadmin1/country.Two insertion points
Both are behind a guarded import — if
intent_fast_pathis missing, a stub that always returnsNoneis used, so the framework behaves exactly as before (safe no-op):gateway/platforms/api_server.py→_handle_chat_completions(HTTP/dashboard): inserted right after message validation. Honors thestreamflag — non-streaming returns the standard OpenAIchat.completionJSON, streaming returns SSE chunks (deltathenfinish_reason: stopthendata: [DONE]) withtext/event-stream+ CORS via the existing_cors_headers_for_origin. Logs an INFO line with elapsed ms on a hit.gateway/run.py→GatewayRunner._handle_message(Telegram / all adapters): inserted between the Telegram lobby block and the session-sentinel claim, so the per-session lock is not taken for a fast-path reply.Guarantees
None(deferring to the agent) on any of: no match, empty geocoding, httpx timeout (>2s), HTTP 4xx/5xx, JSON/parse error, or missingcurrent.temperature_2m. It never returns an empty or partial/wrong string._should_process_message/require_mention; no auth check is added or weakened.Tests (this PR's verification — not deployed)
tests/test_intent_fast_path.py, 34 cases, mirroring the repo's flattests/+@pytest.mark.asyncioconventions:Woodstock, IL 60098→Woodstock;Denver, CO→Denver).None.ruff check(repo config): All checks passed!Live smoke run against real Open-Meteo (kept out of the unit suite — no network in CI):
weather(default Woodstock)Now: 62°F, Clear, wind 5 mph+ 3-day forecastwhat's the weather in DenverDenver, Colorado, United States—Now: 70°F, Clear, wind 5 mph+ 3-day forecastweather in Zxqwffville(bad city)None→ falls through to agentNotes for the reviewer
hermes-agent0.14.0 (the spec referenced installed 0.15.x; both anchor blocks matched verbatim, only line numbers differed).🤖 Generated with Claude Code