feat(gateway): pre-LLM intent fast-path for weather (~0.4s vs 21-63s) by davidgut1982 · Pull Request #2 · davidgut1982/hermes-agent

davidgut1982 · 2026-06-01T03:00:25Z

What

Adds a pre-LLM intent fast-path that intercepts weather questions before any agent/LLM runs, answers them directly from the Open-Meteo HTTP API, and returns in ~300-500ms. Anything that doesn't cleanly match falls through, untouched, to the normal agent pipeline.

Why

Weather queries currently traverse the two-level orchestrator/worker LLM agent loop and take 21-63s. A deterministic weather answer needs none of that. This brings the common case down to ~0.4s (measured below).

How it works

New module intent_fast_path.py (repo root, added to pyproject py-modules so it installs):

A one-line intent registry — register_intent(matcher, handler) + async _intent_fast_path(text). The first handler returning a non-None string wins; any matcher/handler exception is swallowed and treated as "no match".
A weather matcher anchored to end-of-string so it fires on questions about the weather ("weather", "weather woodstock il", "what's the weather in Denver", "is it raining") but not on conversational prose ("weather affects my mood") or slash commands.
A weather handler using httpx.AsyncClient(timeout=2.0):
- Default location is hardcoded Woodstock, IL (42.3147, -88.4487) — no geocoding call.
- Named locations are cleaned of , ST ZIP before geocoding (Open-Meteo's geocoder chokes on "City, ST ZIP"); display name built from admin1/country.
- WMO weather codes mapped to short text; terse Fahrenheit current + 3-day forecast, Telegram-safe markdown.

Two insertion points

Both are behind a guarded import — if intent_fast_path is missing, a stub that always returns None is used, so the framework behaves exactly as before (safe no-op):

gateway/platforms/api_server.py → _handle_chat_completions (HTTP/dashboard): inserted right after message validation. Honors the stream flag — non-streaming returns the standard OpenAI chat.completion JSON, streaming returns SSE chunks (delta then finish_reason: stop then data: [DONE]) with text/event-stream + CORS via the existing _cors_headers_for_origin. Logs an INFO line with elapsed ms on a hit.
gateway/run.py → GatewayRunner._handle_message (Telegram / all adapters): inserted between the Telegram lobby block and the session-sentinel claim, so the per-session lock is not taken for a fast-path reply.

Guarantees

Strict fall-through. The handler returns None (deferring to the agent) on any of: no match, empty geocoding, httpx timeout (>2s), HTTP 4xx/5xx, JSON/parse error, or missing current.temperature_2m. It never returns an empty or partial/wrong string.
Mention-gating & auth preserved. Insertion point B is downstream of auth and Telegram _should_process_message / require_mention; no auth check is added or weakened.
History-bypass tradeoff. Fast-path replies are not written to the session transcript. This is acceptable for ephemeral weather answers; it's called out here so reviewers can object if undesired.

Tests (this PR's verification — not deployed)

tests/test_intent_fast_path.py, 34 cases, mirroring the repo's flat tests/ + @pytest.mark.asyncio conventions:

Matcher positives/negatives.
Location cleaning (Woodstock, IL 60098 → Woodstock; Denver, CO → Denver).
Handler fall-through: monkeypatched httpx for timeout / HTTP 500 / empty geocoding / missing current temp → all return None.
Handler success: canned Open-Meteo current+daily payload → terse Fahrenheit string with a 3-day forecast (default + named location).
Dispatch exception safety.

34 passed in 0.41s

ruff check (repo config): All checks passed!

Live smoke run against real Open-Meteo (kept out of the unit suite — no network in CI):

Query	Wall-clock	Result
`weather` (default Woodstock)	761 ms	`Now: 62°F, Clear, wind 5 mph` + 3-day forecast
`what's the weather in Denver`	1037 ms	`Denver, Colorado, United States` — `Now: 70°F, Clear, wind 5 mph` + 3-day forecast
`weather in Zxqwffville` (bad city)	531 ms	`None` → falls through to agent

Notes for the reviewer

Fork source: hermes-agent 0.14.0 (the spec referenced installed 0.15.x; both anchor blocks matched verbatim, only line numbers differed).
Not deployed. Nothing here touches the live venv or restarts the gateway — PR only.

🤖 Generated with Claude Code

Weather questions currently traverse the two-level orchestrator/worker LLM agent loop, taking 21-63s. This adds a deterministic fast-path that intercepts weather intents BEFORE any agent/LLM runs, answers directly from the Open-Meteo HTTP API, and returns in ~300-500ms. Anything that does not cleanly match falls through untouched to the normal agent pipeline. New module intent_fast_path.py (repo root, added to pyproject py-modules): - register_intent()/_intent_fast_path() registry so future intents are one line. Any matcher/handler exception is swallowed -> fall through. - End-anchored weather matcher that ignores conversational prose ("weather affects my mood") and slash commands. - httpx.AsyncClient(timeout=2.0) handler. Default location is hardcoded Woodstock, IL (no geocoding). Named locations are cleaned of ", ST ZIP" before geocoding (Open-Meteo geocoder chokes on those). WMO codes mapped to short text; terse Fahrenheit current + 3-day forecast, Telegram-safe. - STRICT FALL-THROUGH: returns None on no match, empty geocoding, timeout, HTTP 4xx/5xx, JSON/parse error, or missing current.temperature_2m. Never returns an empty/partial/wrong string. Two insertion points, both behind a guarded import (missing module = safe no-op stub that always defers to the agent): - gateway/platforms/api_server.py _handle_chat_completions: after message validation, honoring the stream flag (OpenAI chat.completion JSON or SSE chunks + [DONE]); CORS via _cors_headers_for_origin; INFO log on hit. - gateway/run.py GatewayRunner._handle_message: between the Telegram lobby block and the session-sentinel claim, so the session lock is NOT taken for a fast-path reply. Downstream of auth + mention-gating, which are preserved. Tests: tests/test_intent_fast_path.py (34 cases) cover matcher positives/ negatives, location cleaning, handler fall-through (timeout/500/empty geocoding/missing temp), success render, and dispatch exception safety. No network in the unit suite (httpx monkeypatched). 34 passed; ruff clean. Tradeoff: fast-path replies are not written to the session transcript (history bypass) — acceptable for ephemeral weather answers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-01T03:01:10Z

🔎 Lint report: `feat/intent-fast-path-weather` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9383 on HEAD, 9373 on base (🆕 +10)

🆕 New issues (5):

Rule	Count
`unresolved-import`	3
`unresolved-attribute`	1
`invalid-argument-type`	1

First entries

tests/test_intent_fast_path.py:376: [unresolved-import] unresolved-import: Cannot resolve imported module `httpx`
intent_fast_path.py:573: [unresolved-attribute] unresolved-attribute: Object of type `object` has no attribute `get`
intent_fast_path.py:622: [unresolved-import] unresolved-import: Cannot resolve imported module `httpx`
intent_fast_path.py:264: [invalid-argument-type] invalid-argument-type: Argument to constructor `int.__new__` is incorrect: Expected `str | Buffer | SupportsInt | SupportsIndex | SupportsTrunc`, found `object`
tests/test_intent_fast_path.py:18: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`

✅ Fixed issues: none

Unchanged: 4957 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

… review Adversarial review of the weather fast-path found the connector path ("in|for|at|near|around <X>") accepting ANY trailing noun as a location, so non-weather prose got geocoded into a bogus weather answer (CONFIRMED LIVE: "forecast for the meeting" -> Nenagh, Ireland; "forecast in the lab" -> Indiana). It also let a noun-filler bridge to a connector ("weather report for the Q3 sales"). HIGH-1: Add _connector_location_ok() guarding loc1/loc2 connector captures in both _weather_matcher and _extract_location — reject "the <noun>", >5 tokens, >60 chars, or any _NON_PLACE_WORDS stopword (meeting, lab, code, budget, sales, quarter, report, market, stock, project, team, …). Tolerates ZIP/state tokens so "Woodstock, IL 60098" still passes. HIGH-2: Restructure _WEATHER_RE into two end-anchored, non-optional alternations: (a) keyword + optional filler + EOL, or (b) keyword + optional idiomatic "like" + connector + location + EOL. A noun filler (report/budget/…) can no longer be followed by connector+location. ("what's the weather like in Denver" preserved via the "like" bridge.) MEDIUM: Replace flat httpx timeout=2.0 (per-call, ~4s worst case) with httpx.Timeout(connect=1.0, read=1.5) and wrap the two-call named-location branch in asyncio.wait_for(..., 3.5s) hard ceiling. Default Woodstock path stays single-call. LOW: _is_place_like rejects single tokens >60 chars. Tests: +20 cases (54 total) — all confirmed-live false positives asserted no-match AND _extract_location None; true positives preserved; named-branch hard-ceiling test with a mocked slow httpx. ruff clean; full fast-path suite and 150 gateway api_server tests green. SSE (api_server): split the fast-path role+content single delta into spec-standard chunks (role-only, content, finish_reason:stop, [DONE]) so strict OpenAI/OpenWebUI stream consumers parse cleanly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

davidgut1982 · 2026-06-01T14:59:00Z

Closing — deprioritized. Adversarial review found the named-location matcher has an unbounded false-positive class (non-weather prose geocoding to real places, e.g. 'weather for mom' -> Mauritania); a denylist can't close it. With Lore prefetch + parent init now ~345ms (was ~5s), the normal agent path is no longer catastrophic, so a pre-LLM intercept isn't worth the correctness risk. Branch kept for reference if we revisit a home-only scoped version later.

…NousResearch#34192) (NousResearch#34382) NousResearch#34192 reports Hostinger's 'Hermes WebUI' catalog crashes on startup with: /usr/bin/tini: No such file or directory The image moved from tini to s6-overlay as PID 1 (/init) earlier in 2026. Orchestration templates that still pin /usr/bin/tini as the entrypoint \u2014 like the Hostinger Hermes WebUI catalog \u2014 have no binary to exec and the container crashes immediately. Hermes has no control over the Hostinger catalog template, but we can make the image backward-compatible by symlinking /usr/bin/tini -> /init during the s6-overlay install step. External wrappers that exec /usr/bin/tini will land on the same s6-overlay reaper they would have landed on if they'd used the canonical /init entrypoint. The image's own ENTRYPOINT continues to be /init verbatim \u2014 the shim is purely for legacy external wrappers, not for the image's own runtime path. Once affected catalogs are updated, the symlink can be removed. Other issues NousResearch#34192 raises that are NOT addressed by this PR: * Problem #2 (UID 1024 vs 10000 mismatch): already fixed by NousResearch#33148 (S6_KEEP_ENV=1) and NousResearch#32412 (with-contenv shebangs). The Hostinger template likely needs to update its env-var propagation. * Problem #3 (incompatible session formats): RFC for pluggable SessionDB is tracked in NousResearch#23717. * Problem #4 (Telegram polling conflict): an operations problem on Hostinger's side, not in this codebase. This PR is scoped to the one issue that can be fixed inside Dockerfile: the missing /usr/bin/tini binary. Tests (3 in test_dockerfile_tini_compat_shim.py): - test_tini_compat_symlink_present Guard: the symlink line must exist in Dockerfile. - test_tini_compat_comment_explains_why The NousResearch#34192 anchor comment must be present so future readers know why the shim is there (avoid accidental removal). - test_entrypoint_still_init_not_tini Sanity check: ENTRYPOINT remains /init (s6-overlay). The shim is only for external wrappers. Refs: NousResearch#34192 Partial fix: addresses the immediate tini-binary crash. Catalog-side fixes still needed by Hostinger for the UID and session-format problems documented in the issue. Co-authored-by: Cursor <cursoragent@cursor.com>

davidgut1982 closed this Jun 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(gateway): pre-LLM intent fast-path for weather (~0.4s vs 21-63s)#2

feat(gateway): pre-LLM intent fast-path for weather (~0.4s vs 21-63s)#2
davidgut1982 wants to merge 2 commits into
mainfrom
feat/intent-fast-path-weather

davidgut1982 commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026 •

edited

Loading

Uh oh!

davidgut1982 commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

davidgut1982 commented Jun 1, 2026

What

Why

How it works

Two insertion points

Guarantees

Tests (this PR's verification — not deployed)

Notes for the reviewer

Uh oh!

github-actions Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔎 Lint report: feat/intent-fast-path-weather vs origin/main

ruff

ty (type checker)

Uh oh!

davidgut1982 commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented Jun 1, 2026 •

edited

Loading

🔎 Lint report: `feat/intent-fast-path-weather` vs `origin/main`