fix(homeassistant): don't consume cooldown on no-op state_changed events (#12062)#12170
fix(homeassistant): don't consume cooldown on no-op state_changed events (#12062)#12170briandevans wants to merge 1 commit into
Conversation
…nts (NousResearch#12062) ``HomeAssistantAdapter._handle_ha_event`` writes the per-entity cooldown timestamp *before* calling ``_format_state_change``, which is what actually decides whether the event will be forwarded. For events where ``old_state == new_state`` (or where ``new_state`` is missing), the formatter returns ``None`` and the function returns early — but ``self._last_event_time[entity_id]`` has already been advanced. As a result, a rapid no-op event "uses up" the cooldown window and suppresses the next genuine state change. Reporter: NousResearch#12062. Root cause ---------- ``gateway/platforms/homeassistant.py`` lines 286-299:: # Apply cooldown now = time.time() last = self._last_event_time.get(entity_id, 0) if (now - last) < self._cooldown_seconds: return self._last_event_time[entity_id] = now # <- advanced before we know # the event forwards old_state = event_data.get("old_state", {}) new_state = event_data.get("new_state", {}) message = self._format_state_change(entity_id, old_state, new_state) if not message: # <- no-op / malformed → None, return # but cooldown already burned Fix --- Keep the cooldown *check* early (so throttled events don't waste time formatting), but move the cooldown *write* to after ``_format_state_change`` returns a non-empty message. Only events that are actually forwarded consume the cooldown window. No API / config / public-behaviour change. Two lines effectively swapped; one comment added. Reproduction (confirmed on origin/main ``6fb69229``) ---------------------------------------------------- :: ha = HomeAssistantAdapter(PlatformConfig(enabled=True, token='t', extra={ 'url': 'http://x', 'watch_all': True, 'cooldown_seconds': 60, })) ha.handle_message = AsyncMock() await ha._handle_ha_event({'data': {'entity_id': 'sensor.temp', 'old_state': {'state': '20'}, 'new_state': {'state': '20', 'attributes': {}}}}) await ha._handle_ha_event({'data': {'entity_id': 'sensor.temp', 'old_state': {'state': '20'}, 'new_state': {'state': '21', 'attributes': {}}}}) assert ha.handle_message.await_count == 1 # fails on main (0) Side benefit ------------ ``_last_event_time`` no longer grows unbounded with entries for entities that only ever emit no-op events. Regression coverage ------------------- ``tests/gateway/test_homeassistant.py`` gets a new ``TestCooldownIssue12062`` class with 5 cases: * ``test_no_op_state_change_does_not_consume_cooldown`` — reporter's exact scenario. * ``test_no_op_does_not_write_last_event_time`` — structural pin on the cooldown map. * ``test_missing_new_state_does_not_consume_cooldown`` — covers the other ``_format_state_change → None`` branch. * ``test_forwarded_event_still_consumes_cooldown`` — preserved- behaviour canary so the fix can't silently disable cooldown. * ``test_no_op_then_real_change_across_entities`` — independent per-entity accounting. 4 of the 5 fail on clean ``origin/main`` with the reporter symptom; the 5th pins preserved behaviour. Validation ---------- ``source venv/bin/activate && python -m pytest tests/gateway/test_homeassistant.py -q`` → **50 passed** (45 pre-existing + 5 new). Broader ``tests/gateway`` under ``-n auto`` → 13 pre-existing baseline failures (dingtalk card lifecycle, matrix encrypted upload, approve/deny E2E, whatsapp bridge runtime / xdist flakes). Zero in ``test_homeassistant.py`` or any touched code path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Fixes a Home Assistant adapter cooldown bug where no-op/malformed state_changed events could consume an entity’s cooldown window and suppress the next real change.
Changes:
- Delay writing
_last_event_time[entity_id]until after_format_state_change(...)returns a non-empty message. - Add regression tests covering no-op events, malformed events, and preserved cooldown behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
gateway/platforms/homeassistant.py |
Moves cooldown timestamp write to only occur for forwarded events, preventing no-ops from burning the cooldown window. |
tests/gateway/test_homeassistant.py |
Adds targeted regression coverage for issue #12062 across multiple scenarios (no-op, missing new_state, per-entity independence, preserved cooldown). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
CI audit — all 4 Verified on clean main earlier this cycle:
Each reproduces on clean Focused suite on this branch: All other CI checks green: Reproduction commands available on request. |
|
Thanks @alt-glitch for the cross-reference. 👍 Looked at @LeonSGP43's #12083 — both PRs resolve #12062 with the same 2-line swap in Diff of substance:
Happy either lands — the fix is the fix. My PR mostly differs in test surface area and the intent-documenting comments, which I'd argue pay off once someone else revisits the cooldown path. If maintainers prefer the minimal change, #12083 is perfectly correct and I'll close this one without ceremony. |
|
Closing to keep the queue clean — happy to reopen if this is still useful. |
Fixes #12062.
TL;DR
HomeAssistantAdapter._handle_ha_eventwrites the per-entity cooldown timestamp before it knows whether the event will actually produce a forwarded message. A no-opstate_changedevent (old_state == new_state) therefore consumes the cooldown window and suppresses the next real state change.Fix: keep the cooldown check early, but delay the cooldown write until after
_format_state_changereturns a non-empty message. Only events that forward consume the window.Two lines effectively swapped; one comment added.
Root cause
gateway/platforms/homeassistant.py:286-299(pre-fix):_format_state_changereturnsNonein two cases (both legitimate non-forward paths):new_statemissing/emptyold_val == new_val— state didn't actually changeFix
Reproduction on clean
origin/main(6fb69229)Direct copy of reporter's repro:
Narrow scope — explicitly not changed
_format_state_change. Not doing wasted format work on throttled events, and not widening the cooldown check semantics to "every non-None formatter output" (which would entangle the throttle policy with formatting logic).handle_messagefailure handling. Ifhandle_messageraises after the timestamp is recorded, the cooldown still counts — same behaviour as before the fix. Out of scope._last_event_timeis not otherwise bounded. This fix happens to stop no-op entities from accumulating entries (side benefit, not the goal)._format_state_changesemantics. Pure formatter, unchanged.Behaviour matrix
_format_state_change20212020None20{}NoneRows A, D, E match pre-fix behaviour. Rows B and C change from "consumes window" to "passes through without consuming".
Regression coverage
tests/gateway/test_homeassistant.pygets a newTestCooldownIssue12062class with 5 cases:test_no_op_state_change_does_not_consume_cooldown— reporter's exact scenario.test_no_op_does_not_write_last_event_time— structural pin on_last_event_time.test_missing_new_state_does_not_consume_cooldown— covers the other_format_state_change → Nonebranch.test_forwarded_event_still_consumes_cooldown— preserved-behaviour canary so the fix can't silently disable cooldown.test_no_op_then_real_change_across_entities— independent per-entity accounting.4 of 5 fail on clean
origin/mainwith symptoms like:The 5th pins preserved behaviour.
Validation
Broader
tests/gatewayunder-n auto→ 13 pre-existing baseline failures (dingtalk card lifecycle, matrix encrypted upload, approve/deny E2E, whatsapp bridge runtime / xdist flakes). Zero intest_homeassistant.pyor any touched code path. Happy to post the per-test baseline audit as a PR comment.Pre-empted review questions
Q. Could this cause runaway forwarding if an entity emits thousands of no-ops?
No.
_format_state_changestill returnsNonefor no-ops, so they don't forward regardless of cooldown state. Cooldown only throttles the stream of forwarded messages.Q. Why not move the cooldown check itself later, past
_format_state_change?Two reasons:
Q. Concurrency?
_handle_ha_eventis async but there are noawaits between the cooldown read and the newself._last_event_time[entity_id] = nowwrite (_format_state_changeis synchronous and pure). Same atomicity as before.Q. Was
_last_event_timeever pruned?No, and this PR doesn't add pruning — but the fix does stop unbounded growth from entities that only ever emit no-op events. Explicit pruning is a separate concern if long-lived deployments need it.
Co-authored via LLM assistance; I've reviewed every line and am responsible for correctness.