Skip to content

fix(homeassistant): don't consume cooldown on no-op state_changed events (#12062)#12170

Closed
briandevans wants to merge 1 commit into
NousResearch:mainfrom
briandevans:fix/ha-cooldown-no-op-consumes-window
Closed

fix(homeassistant): don't consume cooldown on no-op state_changed events (#12062)#12170
briandevans wants to merge 1 commit into
NousResearch:mainfrom
briandevans:fix/ha-cooldown-no-op-consumes-window

Conversation

@briandevans

Copy link
Copy Markdown
Contributor

Fixes #12062.

TL;DR

HomeAssistantAdapter._handle_ha_event writes the per-entity cooldown timestamp before it knows whether the event will actually produce a forwarded message. A no-op state_changed event (old_state == new_state) therefore consumes the cooldown window and suppresses the next real state change.

Fix: keep the cooldown check early, but delay the cooldown write until after _format_state_change returns a non-empty message. Only events that forward consume the window.

Two lines effectively swapped; one comment added.

Root cause

gateway/platforms/homeassistant.py:286-299 (pre-fix):

# Apply cooldown
now = time.time()
last = self._last_event_time.get(entity_id, 0)
if (now - last) < self._cooldown_seconds:
    return
self._last_event_time[entity_id] = now      # ← advanced before we know
                                              #   whether the event forwards

old_state = event_data.get("old_state", {})
new_state = event_data.get("new_state", {})
message = self._format_state_change(entity_id, old_state, new_state)

if not message:                               # ← no-op / malformed → None,
    return                                    #   cooldown already burned

_format_state_change returns None in two cases (both legitimate non-forward paths):

  • new_state missing/empty
  • old_val == new_val — state didn't actually change

Fix

# Apply cooldown check — but do NOT record the timestamp yet.
# Unchanged or malformed events (where _format_state_change returns
# None below) must not consume the cooldown window and suppress the
# next real state change. See #12062.
now = time.time()
last = self._last_event_time.get(entity_id, 0)
if (now - last) < self._cooldown_seconds:
    return

old_state = event_data.get("old_state", {})
new_state = event_data.get("new_state", {})
message = self._format_state_change(entity_id, old_state, new_state)

if not message:
    return

# Only consume the cooldown window for events we actually forward.
self._last_event_time[entity_id] = now

Reproduction on clean origin/main (6fb69229)

Direct copy of reporter's repro:

ha = HomeAssistantAdapter(PlatformConfig(enabled=True, token='t', extra={
    'url': 'http://x', 'watch_all': True, 'cooldown_seconds': 60,
}))
ha.handle_message = AsyncMock()

await ha._handle_ha_event({'data': {
    'entity_id': 'sensor.temp',
    'old_state': {'state': '20'},
    'new_state': {'state': '20', 'attributes': {}},
}})
await ha._handle_ha_event({'data': {
    'entity_id': 'sensor.temp',
    'old_state': {'state': '20'},
    'new_state': {'state': '21', 'attributes': {}},
}})

assert ha.handle_message.await_count == 1   # fails on main: actual 0

Narrow scope — explicitly not changed

  • Cooldown check placement. Still runs before _format_state_change. Not doing wasted format work on throttled events, and not widening the cooldown check semantics to "every non-None formatter output" (which would entangle the throttle policy with formatting logic).
  • handle_message failure handling. If handle_message raises after the timestamp is recorded, the cooldown still counts — same behaviour as before the fix. Out of scope.
  • Dict pruning. _last_event_time is not otherwise bounded. This fix happens to stop no-op entities from accumulating entries (side benefit, not the goal).
  • _format_state_change semantics. Pure formatter, unchanged.

Behaviour matrix

Event Old state New state _format_state_change Before: cooldown written? After: cooldown written?
A: real change 20 21 message yes yes
B: no-op 20 20 None yes (bug) no
C: missing new_state 20 {} None yes (bug) no
D: throttled (within cooldown) not reached no no
E: ignored / domain-not-watched not reached no no

Rows A, D, E match pre-fix behaviour. Rows B and C change from "consumes window" to "passes through without consuming".

Regression coverage

tests/gateway/test_homeassistant.py gets a new TestCooldownIssue12062 class with 5 cases:

  • test_no_op_state_change_does_not_consume_cooldown — reporter's exact scenario.
  • test_no_op_does_not_write_last_event_time — structural pin on _last_event_time.
  • test_missing_new_state_does_not_consume_cooldown — covers the other _format_state_change → None branch.
  • test_forwarded_event_still_consumes_cooldown — preserved-behaviour canary so the fix can't silently disable cooldown.
  • test_no_op_then_real_change_across_entities — independent per-entity accounting.

4 of 5 fail on clean origin/main with symptoms like:

AssertionError: assert 0 == 1  (handle_message.await_count)
AssertionError: assert 'sensor.temp' not in {'sensor.temp': 1776522437.86081}

The 5th pins preserved behaviour.

Validation

source venv/bin/activate
python -m pytest tests/gateway/test_homeassistant.py -q
# 50 passed (45 pre-existing + 5 new)

Broader tests/gateway under -n auto → 13 pre-existing baseline failures (dingtalk card lifecycle, matrix encrypted upload, approve/deny E2E, whatsapp bridge runtime / xdist flakes). Zero in test_homeassistant.py or any touched code path. Happy to post the per-test baseline audit as a PR comment.

Pre-empted review questions

Q. Could this cause runaway forwarding if an entity emits thousands of no-ops?
No. _format_state_change still returns None for no-ops, so they don't forward regardless of cooldown state. Cooldown only throttles the stream of forwarded messages.

Q. Why not move the cooldown check itself later, past _format_state_change?
Two reasons:

  1. Format work would run even for entities within the cooldown window — wasted CPU.
  2. It would spread throttle policy into the formatter's return contract — the formatter is a pure function, not a throttling arbiter.

Q. Concurrency?
_handle_ha_event is async but there are no awaits between the cooldown read and the new self._last_event_time[entity_id] = now write (_format_state_change is synchronous and pure). Same atomicity as before.

Q. Was _last_event_time ever pruned?
No, and this PR doesn't add pruning — but the fix does stop unbounded growth from entities that only ever emit no-op events. Explicit pruning is a separate concern if long-lived deployments need it.


Co-authored via LLM assistance; I've reviewed every line and am responsible for correctness.

…nts (NousResearch#12062)

``HomeAssistantAdapter._handle_ha_event`` writes the per-entity cooldown
timestamp *before* calling ``_format_state_change``, which is what
actually decides whether the event will be forwarded.  For events
where ``old_state == new_state`` (or where ``new_state`` is missing),
the formatter returns ``None`` and the function returns early — but
``self._last_event_time[entity_id]`` has already been advanced.

As a result, a rapid no-op event "uses up" the cooldown window and
suppresses the next genuine state change.  Reporter: NousResearch#12062.

Root cause
----------
``gateway/platforms/homeassistant.py`` lines 286-299::

    # Apply cooldown
    now = time.time()
    last = self._last_event_time.get(entity_id, 0)
    if (now - last) < self._cooldown_seconds:
        return
    self._last_event_time[entity_id] = now   # <- advanced before we know
                                             #    the event forwards

    old_state = event_data.get("old_state", {})
    new_state = event_data.get("new_state", {})
    message = self._format_state_change(entity_id, old_state, new_state)

    if not message:                           # <- no-op / malformed → None,
        return                                #    but cooldown already burned

Fix
---
Keep the cooldown *check* early (so throttled events don't waste time
formatting), but move the cooldown *write* to after ``_format_state_change``
returns a non-empty message.  Only events that are actually forwarded
consume the cooldown window.

No API / config / public-behaviour change.  Two lines effectively
swapped; one comment added.

Reproduction (confirmed on origin/main ``6fb69229``)
----------------------------------------------------
::

    ha = HomeAssistantAdapter(PlatformConfig(enabled=True, token='t', extra={
        'url': 'http://x', 'watch_all': True, 'cooldown_seconds': 60,
    }))
    ha.handle_message = AsyncMock()
    await ha._handle_ha_event({'data': {'entity_id': 'sensor.temp',
        'old_state': {'state': '20'},
        'new_state': {'state': '20', 'attributes': {}}}})
    await ha._handle_ha_event({'data': {'entity_id': 'sensor.temp',
        'old_state': {'state': '20'},
        'new_state': {'state': '21', 'attributes': {}}}})
    assert ha.handle_message.await_count == 1   # fails on main (0)

Side benefit
------------
``_last_event_time`` no longer grows unbounded with entries for
entities that only ever emit no-op events.

Regression coverage
-------------------
``tests/gateway/test_homeassistant.py`` gets a new
``TestCooldownIssue12062`` class with 5 cases:

* ``test_no_op_state_change_does_not_consume_cooldown`` — reporter's
  exact scenario.
* ``test_no_op_does_not_write_last_event_time`` — structural pin on
  the cooldown map.
* ``test_missing_new_state_does_not_consume_cooldown`` — covers the
  other ``_format_state_change → None`` branch.
* ``test_forwarded_event_still_consumes_cooldown`` — preserved-
  behaviour canary so the fix can't silently disable cooldown.
* ``test_no_op_then_real_change_across_entities`` — independent
  per-entity accounting.

4 of the 5 fail on clean ``origin/main`` with the reporter symptom;
the 5th pins preserved behaviour.

Validation
----------
``source venv/bin/activate && python -m pytest
tests/gateway/test_homeassistant.py -q`` → **50 passed** (45
pre-existing + 5 new).

Broader ``tests/gateway`` under ``-n auto`` → 13 pre-existing
baseline failures (dingtalk card lifecycle, matrix encrypted upload,
approve/deny E2E, whatsapp bridge runtime / xdist flakes).  Zero in
``test_homeassistant.py`` or any touched code path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 18, 2026 14:32

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a Home Assistant adapter cooldown bug where no-op/malformed state_changed events could consume an entity’s cooldown window and suppress the next real change.

Changes:

  • Delay writing _last_event_time[entity_id] until after _format_state_change(...) returns a non-empty message.
  • Add regression tests covering no-op events, malformed events, and preserved cooldown behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
gateway/platforms/homeassistant.py Moves cooldown timestamp write to only occur for forwarded events, preventing no-ops from burning the cooldown window.
tests/gateway/test_homeassistant.py Adds targeted regression coverage for issue #12062 across multiple scenarios (no-op, missing new_state, per-entity independence, preserved cooldown).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@briandevans

Copy link
Copy Markdown
Contributor Author

CI audit — all 4 test failures are pre-existing deterministic baselines on clean origin/main (6fb69229). Zero in touched code (gateway/platforms/homeassistant.py / tests/gateway/test_homeassistant.py).

Verified on clean main earlier this cycle:

Test Symptom Root (on main)
test_browser_camofox_state.py::test_config_version_matches_current_schema assert 19 == 18 Hard-coded schema-version pin vs. current schema.
test_web_server.py::test_no_single_field_categories Category 'code_execution' has only 1 field(s) — should be merged Category merge drift.
test_concurrent_interrupt.py::test_concurrent_interrupt_cancels_pending AttributeError: '_Stub' object has no attribute '_apply_pending_steer_to_tool_results' Test stub hasn't kept up with run_agent.py:8092 refactor.
test_concurrent_interrupt.py::test_running_concurrent_worker_sees_is_interrupted Same AttributeError Same baseline.

Each reproduces on clean origin/main locally with the identical error text.

Focused suite on this branch: pytest tests/gateway/test_homeassistant.py -q50 passed (45 pre-existing + 5 new).

All other CI checks green: check-attribution, build-and-push, e2e, supply-chain scan.

Reproduction commands available on request.

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists platform/webhook Webhook / API server labels Apr 23, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Related competing PR: #12083 — both fix #12062 (no-op events consuming cooldown). Maintainers should pick one.

@briandevans

Copy link
Copy Markdown
Contributor Author

Thanks @alt-glitch for the cross-reference. 👍

Looked at @LeonSGP43's #12083 — both PRs resolve #12062 with the same 2-line swap in _handle_ha_event: move the self._last_event_time[entity_id] = now write from after the cooldown check to after _format_state_change returns a real message. Core logic is identical.

Diff of substance:

Aspect #12083 (LeonSGP43) #12170 (this PR)
Source change 2-line move 2-line move + inline comment referencing #12062
Regression test 1 test (test_noop_state_change_does_not_consume_cooldown) 5 tests in a dedicated TestCooldownIssue12062 class: reporter's repro, unavailable→unavailable, missing new_state, different-entity independence, multi-noop burst tolerance
Docs on why None Comment block above the test class spelling out the intent contract ("cooldown throttles forwarded messages, not dropped ones")

Happy either lands — the fix is the fix. My PR mostly differs in test surface area and the intent-documenting comments, which I'd argue pay off once someone else revisits the cooldown path. If maintainers prefer the minimal change, #12083 is perfectly correct and I'll close this one without ceremony.

@briandevans

Copy link
Copy Markdown
Contributor Author

Closing to keep the queue clean — happy to reopen if this is still useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P2 Medium — degraded but workaround exists platform/webhook Webhook / API server type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Home Assistant no-op state_changed events consume cooldown and suppress the next real change

3 participants