fix(homeassistant): don't consume cooldown on no-op state_changed events (#12062) by briandevans · Pull Request #12170 · NousResearch/hermes-agent

briandevans · 2026-04-18T14:32:33Z

Fixes #12062.

TL;DR

HomeAssistantAdapter._handle_ha_event writes the per-entity cooldown timestamp before it knows whether the event will actually produce a forwarded message. A no-op state_changed event (old_state == new_state) therefore consumes the cooldown window and suppresses the next real state change.

Fix: keep the cooldown check early, but delay the cooldown write until after _format_state_change returns a non-empty message. Only events that forward consume the window.

Two lines effectively swapped; one comment added.

Root cause

gateway/platforms/homeassistant.py:286-299 (pre-fix):

# Apply cooldown
now = time.time()
last = self._last_event_time.get(entity_id, 0)
if (now - last) < self._cooldown_seconds:
    return
self._last_event_time[entity_id] = now      # ← advanced before we know
                                              #   whether the event forwards

old_state = event_data.get("old_state", {})
new_state = event_data.get("new_state", {})
message = self._format_state_change(entity_id, old_state, new_state)

if not message:                               # ← no-op / malformed → None,
    return                                    #   cooldown already burned

_format_state_change returns None in two cases (both legitimate non-forward paths):

new_state missing/empty
old_val == new_val — state didn't actually change

Fix

# Apply cooldown check — but do NOT record the timestamp yet.
# Unchanged or malformed events (where _format_state_change returns
# None below) must not consume the cooldown window and suppress the
# next real state change. See #12062.
now = time.time()
last = self._last_event_time.get(entity_id, 0)
if (now - last) < self._cooldown_seconds:
    return

old_state = event_data.get("old_state", {})
new_state = event_data.get("new_state", {})
message = self._format_state_change(entity_id, old_state, new_state)

if not message:
    return

# Only consume the cooldown window for events we actually forward.
self._last_event_time[entity_id] = now

Reproduction on clean `origin/main` (`6fb69229`)

Direct copy of reporter's repro:

ha = HomeAssistantAdapter(PlatformConfig(enabled=True, token='t', extra={
    'url': 'http://x', 'watch_all': True, 'cooldown_seconds': 60,
}))
ha.handle_message = AsyncMock()

await ha._handle_ha_event({'data': {
    'entity_id': 'sensor.temp',
    'old_state': {'state': '20'},
    'new_state': {'state': '20', 'attributes': {}},
}})
await ha._handle_ha_event({'data': {
    'entity_id': 'sensor.temp',
    'old_state': {'state': '20'},
    'new_state': {'state': '21', 'attributes': {}},
}})

assert ha.handle_message.await_count == 1   # fails on main: actual 0

Narrow scope — explicitly not changed

Cooldown check placement. Still runs before _format_state_change. Not doing wasted format work on throttled events, and not widening the cooldown check semantics to "every non-None formatter output" (which would entangle the throttle policy with formatting logic).
handle_message failure handling. If handle_message raises after the timestamp is recorded, the cooldown still counts — same behaviour as before the fix. Out of scope.
Dict pruning. _last_event_time is not otherwise bounded. This fix happens to stop no-op entities from accumulating entries (side benefit, not the goal).
_format_state_change semantics. Pure formatter, unchanged.

Behaviour matrix

Event	Old state	New state	`_format_state_change`	Before: cooldown written?	After: cooldown written?
A: real change	`20`	`21`	message	yes	yes
B: no-op	`20`	`20`	`None`	yes (bug)	no
C: missing new_state	`20`	`{}`	`None`	yes (bug)	no
D: throttled (within cooldown)	—	—	not reached	no	no
E: ignored / domain-not-watched	—	—	not reached	no	no

Rows A, D, E match pre-fix behaviour. Rows B and C change from "consumes window" to "passes through without consuming".

Regression coverage

tests/gateway/test_homeassistant.py gets a new TestCooldownIssue12062 class with 5 cases:

test_no_op_state_change_does_not_consume_cooldown — reporter's exact scenario.
test_no_op_does_not_write_last_event_time — structural pin on _last_event_time.
test_missing_new_state_does_not_consume_cooldown — covers the other _format_state_change → None branch.
test_forwarded_event_still_consumes_cooldown — preserved-behaviour canary so the fix can't silently disable cooldown.
test_no_op_then_real_change_across_entities — independent per-entity accounting.

4 of 5 fail on clean origin/main with symptoms like:

AssertionError: assert 0 == 1  (handle_message.await_count)
AssertionError: assert 'sensor.temp' not in {'sensor.temp': 1776522437.86081}

The 5th pins preserved behaviour.

Validation

source venv/bin/activate
python -m pytest tests/gateway/test_homeassistant.py -q
# 50 passed (45 pre-existing + 5 new)

Broader tests/gateway under -n auto → 13 pre-existing baseline failures (dingtalk card lifecycle, matrix encrypted upload, approve/deny E2E, whatsapp bridge runtime / xdist flakes). Zero in test_homeassistant.py or any touched code path. Happy to post the per-test baseline audit as a PR comment.

Pre-empted review questions

Q. Could this cause runaway forwarding if an entity emits thousands of no-ops?
No. _format_state_change still returns None for no-ops, so they don't forward regardless of cooldown state. Cooldown only throttles the stream of forwarded messages.

Q. Why not move the cooldown check itself later, past _format_state_change?
Two reasons:

Format work would run even for entities within the cooldown window — wasted CPU.
It would spread throttle policy into the formatter's return contract — the formatter is a pure function, not a throttling arbiter.

Q. Concurrency?
_handle_ha_event is async but there are no awaits between the cooldown read and the new self._last_event_time[entity_id] = now write (_format_state_change is synchronous and pure). Same atomicity as before.

Q. Was _last_event_time ever pruned?
No, and this PR doesn't add pruning — but the fix does stop unbounded growth from entities that only ever emit no-op events. Explicit pruning is a separate concern if long-lived deployments need it.

_{Co-authored via LLM assistance; I've reviewed every line and am responsible for correctness.}

…nts (NousResearch#12062) ``HomeAssistantAdapter._handle_ha_event`` writes the per-entity cooldown timestamp *before* calling ``_format_state_change``, which is what actually decides whether the event will be forwarded. For events where ``old_state == new_state`` (or where ``new_state`` is missing), the formatter returns ``None`` and the function returns early — but ``self._last_event_time[entity_id]`` has already been advanced. As a result, a rapid no-op event "uses up" the cooldown window and suppresses the next genuine state change. Reporter: NousResearch#12062. Root cause ---------- ``gateway/platforms/homeassistant.py`` lines 286-299:: # Apply cooldown now = time.time() last = self._last_event_time.get(entity_id, 0) if (now - last) < self._cooldown_seconds: return self._last_event_time[entity_id] = now # <- advanced before we know # the event forwards old_state = event_data.get("old_state", {}) new_state = event_data.get("new_state", {}) message = self._format_state_change(entity_id, old_state, new_state) if not message: # <- no-op / malformed → None, return # but cooldown already burned Fix --- Keep the cooldown *check* early (so throttled events don't waste time formatting), but move the cooldown *write* to after ``_format_state_change`` returns a non-empty message. Only events that are actually forwarded consume the cooldown window. No API / config / public-behaviour change. Two lines effectively swapped; one comment added. Reproduction (confirmed on origin/main ``6fb69229``) ---------------------------------------------------- :: ha = HomeAssistantAdapter(PlatformConfig(enabled=True, token='t', extra={ 'url': 'http://x', 'watch_all': True, 'cooldown_seconds': 60, })) ha.handle_message = AsyncMock() await ha._handle_ha_event({'data': {'entity_id': 'sensor.temp', 'old_state': {'state': '20'}, 'new_state': {'state': '20', 'attributes': {}}}}) await ha._handle_ha_event({'data': {'entity_id': 'sensor.temp', 'old_state': {'state': '20'}, 'new_state': {'state': '21', 'attributes': {}}}}) assert ha.handle_message.await_count == 1 # fails on main (0) Side benefit ------------ ``_last_event_time`` no longer grows unbounded with entries for entities that only ever emit no-op events. Regression coverage ------------------- ``tests/gateway/test_homeassistant.py`` gets a new ``TestCooldownIssue12062`` class with 5 cases: * ``test_no_op_state_change_does_not_consume_cooldown`` — reporter's exact scenario. * ``test_no_op_does_not_write_last_event_time`` — structural pin on the cooldown map. * ``test_missing_new_state_does_not_consume_cooldown`` — covers the other ``_format_state_change → None`` branch. * ``test_forwarded_event_still_consumes_cooldown`` — preserved- behaviour canary so the fix can't silently disable cooldown. * ``test_no_op_then_real_change_across_entities`` — independent per-entity accounting. 4 of the 5 fail on clean ``origin/main`` with the reporter symptom; the 5th pins preserved behaviour. Validation ---------- ``source venv/bin/activate && python -m pytest tests/gateway/test_homeassistant.py -q`` → **50 passed** (45 pre-existing + 5 new). Broader ``tests/gateway`` under ``-n auto`` → 13 pre-existing baseline failures (dingtalk card lifecycle, matrix encrypted upload, approve/deny E2E, whatsapp bridge runtime / xdist flakes). Zero in ``test_homeassistant.py`` or any touched code path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Fixes a Home Assistant adapter cooldown bug where no-op/malformed state_changed events could consume an entity’s cooldown window and suppress the next real change.

Changes:

Delay writing _last_event_time[entity_id] until after _format_state_change(...) returns a non-empty message.
Add regression tests covering no-op events, malformed events, and preserved cooldown behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
`gateway/platforms/homeassistant.py`	Moves cooldown timestamp write to only occur for forwarded events, preventing no-ops from burning the cooldown window.
`tests/gateway/test_homeassistant.py`	Adds targeted regression coverage for issue #12062 across multiple scenarios (no-op, missing `new_state`, per-entity independence, preserved cooldown).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

briandevans · 2026-04-18T14:45:49Z

CI audit — all 4 test failures are pre-existing deterministic baselines on clean origin/main (6fb69229). Zero in touched code (gateway/platforms/homeassistant.py / tests/gateway/test_homeassistant.py).

Verified on clean main earlier this cycle:

Test	Symptom	Root (on main)
`test_browser_camofox_state.py::test_config_version_matches_current_schema`	`assert 19 == 18`	Hard-coded schema-version pin vs. current schema.
`test_web_server.py::test_no_single_field_categories`	`Category 'code_execution' has only 1 field(s) — should be merged`	Category merge drift.
`test_concurrent_interrupt.py::test_concurrent_interrupt_cancels_pending`	`AttributeError: '_Stub' object has no attribute '_apply_pending_steer_to_tool_results'`	Test stub hasn't kept up with `run_agent.py:8092` refactor.
`test_concurrent_interrupt.py::test_running_concurrent_worker_sees_is_interrupted`	Same `AttributeError`	Same baseline.

Each reproduces on clean origin/main locally with the identical error text.

Focused suite on this branch: pytest tests/gateway/test_homeassistant.py -q → 50 passed (45 pre-existing + 5 new).

All other CI checks green: check-attribution, build-and-push, e2e, supply-chain scan.

Reproduction commands available on request.

alt-glitch · 2026-04-23T20:56:34Z

Related competing PR: #12083 — both fix #12062 (no-op events consuming cooldown). Maintainers should pick one.

briandevans · 2026-04-25T02:54:38Z

Thanks @alt-glitch for the cross-reference. 👍

Looked at @LeonSGP43's #12083 — both PRs resolve #12062 with the same 2-line swap in _handle_ha_event: move the self._last_event_time[entity_id] = now write from after the cooldown check to after _format_state_change returns a real message. Core logic is identical.

Diff of substance:

Aspect	#12083 (LeonSGP43)	#12170 (this PR)
Source change	2-line move	2-line move + inline comment referencing #12062
Regression test	1 test (`test_noop_state_change_does_not_consume_cooldown`)	5 tests in a dedicated `TestCooldownIssue12062` class: reporter's repro, unavailable→unavailable, missing `new_state`, different-entity independence, multi-noop burst tolerance
Docs on why	None	Comment block above the test class spelling out the intent contract ("cooldown throttles forwarded messages, not dropped ones")

Happy either lands — the fix is the fix. My PR mostly differs in test surface area and the intent-documenting comments, which I'd argue pay off once someone else revisits the cooldown path. If maintainers prefer the minimal change, #12083 is perfectly correct and I'll close this one without ceremony.

briandevans · 2026-05-18T04:13:06Z

Closing to keep the queue clean — happy to reopen if this is still useful.

Copilot AI review requested due to automatic review settings April 18, 2026 14:32

Copilot started reviewing on behalf of briandevans April 18, 2026 14:33 View session

Copilot AI reviewed Apr 18, 2026

View reviewed changes

briandevans mentioned this pull request Apr 19, 2026

fix(models): validate Alibaba coding-endpoint switches via catalog when /models 404s (#12272) #12287

Closed

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists platform/webhook Webhook / API server labels Apr 23, 2026

briandevans closed this May 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(homeassistant): don't consume cooldown on no-op state_changed events (#12062)#12170

fix(homeassistant): don't consume cooldown on no-op state_changed events (#12062)#12170
briandevans wants to merge 1 commit into
NousResearch:mainfrom
briandevans:fix/ha-cooldown-no-op-consumes-window

briandevans commented Apr 18, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

briandevans commented Apr 18, 2026

Uh oh!

alt-glitch commented Apr 23, 2026

Uh oh!

briandevans commented Apr 25, 2026

Uh oh!

briandevans commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

briandevans commented Apr 18, 2026

TL;DR

Root cause

Fix

Reproduction on clean origin/main (6fb69229)

Narrow scope — explicitly not changed

Behaviour matrix

Regression coverage

Validation

Pre-empted review questions

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

briandevans commented Apr 18, 2026

Uh oh!

alt-glitch commented Apr 23, 2026

Uh oh!

briandevans commented Apr 25, 2026

Uh oh!

briandevans commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Reproduction on clean `origin/main` (`6fb69229`)