fix(ha-mqtt): restore HA alarm panel arm/disarm end-to-end + PIN perf#46
Conversation
HA's alarm_control_panel.mqtt exposes only `action` and `code` to
command_template — `value` is lock.mqtt's variable, not alarm's. With
`{{ value }}`, Jinja rendered the action as the literal string "None"
(undefined → str(None)) and HA published payloads like
`{"action": "None", "code": "1234"}`. Latchpoint's parser then returned
"Unknown action." without invoking the state machine, so arm/disarm
clicks in HA had no effect.
The previous test asserted the broken template against itself, which is
why nothing caught it — replaced with one that locks the correct
variable name and added a render-simulation test that asserts the
action survives as a real string.
Verified live: broker capture before this change showed
`{"action": "None", "code": "1234"}` on every click; HA docs confirm
the variable scope.
https://www.home-assistant.io/integrations/alarm_control_panel.mqtt/#command_template
There was a problem hiding this comment.
Pull request overview
Fixes Home Assistant MQTT alarm_control_panel discovery payload so arm/disarm commands publish a valid action value by using HA’s documented {{ action }} variable (instead of {{ value }}), preventing commands from being ignored by the server-side parser.
Changes:
- Update HA MQTT discovery
command_templateto use{{ action }}and expand the inline documentation to match HA’s variable scope. - Replace the previous “self-equality” regression test with one that asserts the correct HA placeholders.
- Add a render-simulation test to ensure the command template produces valid JSON with a concrete
action/code.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| backend/integrations_home_assistant/mqtt_alarm_entity.py | Switches discovery payload command_template to {{ action }} and documents HA’s template variable scope. |
| backend/integrations_home_assistant/tests/test_mqtt_alarm_entity.py | Strengthens regression coverage around the command template placeholders and rendering behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Two paho clients sharing the configured `client_id` (default "latchpoint-alarm") fight for the same broker-side identity. The broker boots the older one with "session taken over" and the loser reconnects within ~1s, ping-ponging forever. Each on_connect fires `_after_connect`, which republishes discovery + availability + state — measured at HA as constant retained-message spam. The collision shape we hit in practice was a diagnostic script run via `docker exec` into the live container: `django.setup()` triggered `transports_mqtt.apps.ready()` and spawned a *second* MqttConnectionManager in the same container, both connecting as `latchpoint-alarm`. The same shape happens for any second instance — parallel deploys, dev side-cars, accidental imports. Defensively append a per-process random suffix to the configured `client_id` when building paho clients. The suffix is generated once in `MqttConnectionManager.__init__` so it stays stable across reconnects and settings PATCHes within one process; across processes it diverges. `test_connection()` opts into a fresh suffix per call so its throwaway client never knocks the live one off, even from inside the same process. The user-facing `client_id` setting is unchanged — it now functions as a prefix and existing configurations keep working.
…ffix Existing tests prove the `client_id_suffix=` override mechanism on `_build_client` works when called directly, but no test asserted that `test_connection()` actually passes a non-None override at runtime. If that override regressed, a `test_connection` invoked from inside the live container would reuse the manager's stable suffix and the broker would evict the running daphne client mid-validation — the exact shape of the bug PR #46 was meant to prevent. Adds a SimpleTestCase that monkey-patches `_build_client` and `_import_paho` on a manager instance, calls `test_connection()` against a stub settings dict, and asserts the captured `client_id_suffix` is non-None and differs from the manager's live `_client_id_suffix`. Short-circuits via `RuntimeError` before threading or `connect_async` are touched, so no broker / no I/O.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
backend/transports_mqtt/manager.py:399
- In the
except TypeErrorfallback, the code buildsmqtt.Client()without anyclient_id, which drops the configured prefix + suffix entirely. If this fallback is ever hit (e.g., a paho version/signature mismatch), it defeats the collision-avoidance and the “stable within a process” guarantee. Consider retrying client construction with a positionalclient_id(or otherwise ensuring the computedclient_idis still applied) rather than omitting it.
try:
client = mqtt.Client(client_id=client_id)
except TypeError:
client = mqtt.Client()
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Drop redundant assertNotEqual in HA mqtt alarm template test (implied by the prior assertEqual that already pins action to ARM_AWAY). - Make _build_client's TypeError fallback preserve the computed <prefix>-<suffix> client_id by retrying positionally instead of falling through to a no-arg mqtt.Client(); a no-arg fallback silently dropped the per-process suffix and undid the collision fix it was paired with. - Add a wiring guard test that fakes a kwarg-rejecting paho Client and asserts the fallback path still ships the suffix to the broker.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Alarm PINs are 4-8 digits with a 10K-100M keyspace. Stretching them at the default user-password Argon2 cost (~150-300ms/verify) doesn't materially raise an offline-brute-force attacker's bar - the keyspace is so small that the entire space is recovered in minutes-to-hours even at strong cost - while the slow verify shows up as a multi- hundred-ms tax on every arm/disarm round trip from the HA MQTT alarm panel. Add UserCodeArgon2Hasher (algorithm "argon2-pin", time_cost=1, memory_cost=8 MiB, parallelism=2; ~30ms/verify) and route create_user_code / update_user_code through it. User passwords keep the strong default. Existing UserCode rows hashed under the original "argon2" algorithm continue to verify on the strong hasher with no migration required - check_password resolves by hash prefix. Combined with the PR #46 MQTT-template fix, this gets HA-driven arm/ disarm clicks down from ~2s to subsecond round-trip on the live home-lab broker.
HA's MQTT alarm panel renders the command template with `code = code or ""`,
so when the user arms without typing a PIN the broker payload is
`{"action":"ARM_HOME","code":""}` — the field is present but empty.
`_handle_command_payload` returned `""` for that case while the arm branch
of `handle_mqtt_alarm_command` checked `raw_code is not None`. Empty string
is not None, so code validation was forced even when `code_arm_required`
was false, and the command failed with "Code required.".
Normalize an empty/whitespace `code` to None at the parser boundary so
both arm and disarm branches see the same "no code" contract. The disarm
branch already used truthiness (`if not raw_code:`) and is unaffected.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
backend/config/settings.py:117
- This PR introduces a new password hasher (UserCodeArgon2Hasher) and changes UserCode hashing behavior, but the PR title/description are scoped to HA MQTT fixes. Please either update the PR description to cover this security/perf change (and its rollout considerations), or split it into a separate PR so the MQTT fix can be reviewed/merged independently.
PASSWORD_HASHERS = [
"django.contrib.auth.hashers.Argon2PasswordHasher",
"accounts.hashers.UserCodeArgon2Hasher",
"django.contrib.auth.hashers.PBKDF2PasswordHasher",
"django.contrib.auth.hashers.PBKDF2SHA1PasswordHasher",
"django.contrib.auth.hashers.BCryptSHA256PasswordHasher",
]
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Drop the `try mqtt.Client(client_id=...) except TypeError: mqtt.Client(client_id)` fallback in `_build_client`. paho-mqtt is pinned >=2.0; the kwarg call always succeeds. If the fallback ever did fire on paho 2.x, the positional arg would be fed to `callback_api_version` (expects an enum) and crash with a worse error than the kwarg call itself. Removed the matching test that locked in the broken behavior. - Shrink the per-process client_id suffix from 8 hex chars (4 bytes) to 6 hex chars (3 bytes). Default composed `latchpoint-alarm-XXXXXX` now lands at exactly 23 bytes — the MQTT 3.1 spec maximum. paho 2.0+ defaults to 3.1.1 which raises the cap, but staying inside 3.1 protects against stricter brokers. 6 hex still gives 16M distinct values, far above what's needed for 1-2 process collision avoidance. - Add `test_default_composed_client_id_fits_mqtt_3_1_length_limit` regression guard so a future suffix/prefix bump can't silently push us over. - Update user-facing strings: settings registry title `"Client ID"` → `"Client ID prefix"` with description explaining the suffix and the 16-char prefix budget; rc=2 connect-error messages now reference "client_id prefix" so users know which knob they're turning.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| base_client_id = str(settings.get("client_id") or "").strip() or "latchpoint-alarm" | ||
| suffix = client_id_suffix or self._client_id_suffix | ||
| client_id = f"{base_client_id}-{suffix}" |
Summary
Four changes that together restore arm/disarm functionality from the Home Assistant alarm panel and cut its end-to-end latency from ~2s to subsecond:
{{ action }}not{{ value }}in the alarmcommand_template. Without this, every click landed on the broker as{"action": "None", "code": "..."}and was rejected as "Unknown action."client_id. Defends against twoMqttConnectionManagerinstances inside the same container (e.g. a diagnostic script that randjango.setup()) fighting the live daphne client for the same MQTT identity.codefield in the MQTT command payload as no-code. Without this, arming withcode_arm_required=Falsewas rejected with "Code required." because HA always sendscode: ""in the rendered template, and our parser surfaced that as""rather thanNone.Bug 1 —
{{ action }}not{{ value }}HA's
alarm_control_panel.mqttexposes onlyactionandcodetocommand_template—valueislock.mqtt's variable, not alarm's (HA docs). With{{ value }}, Jinja sawvalueas undefined and rendered it as the literal string"None", so HA published payloads like:{"action": "None", "code": "1234"}Latchpoint's parser uppercased
"None"→"NONE", found no match in_ACTION_TO_STATE, and replied"Unknown action."to the error topic without invoking the state machine.Why nothing caught it: the previous test round-tripped the broken template through
json.loadsand asserted equality against itself — it locked the bug in place. Replaced with a test that asserts the documented HA variable name, plus a render-simulation test using Django's template engine that proves the action survives substitution as a real string.Bug 2 — Per-process
client_idsuffixWhen two paho clients share the configured
client_id(default"latchpoint-alarm"), the MQTT broker enforces single-client-per-id and boots the older one withsession taken over. The loser auto-reconnects within ~1s, ping-ponging forever. Each successful connect fires_after_connect, republishing retained discovery + availability + state.The collision shape encountered in practice during this debug session: a diagnostic script run via
docker execinto the live container calleddjango.setup(), which triggeredtransports_mqtt.apps.ready()and spawned a secondMqttConnectionManagerin the same container. Both clients connected aslatchpoint-alarm. Same shape would hit any future scenario where a second consumer (parallel deploy, dev side-car, accidental import path) importstransports_mqtt.Fix: append a per-process random suffix (
secrets.token_hex(3), 6 hex chars) to the configuredclient_idwhen building the paho client. Stable within one process so reconnects don't churn the broker's view of the live client; divergent across processes so they cannot collide.test_connection()opts into a fresh suffix per call so its throwaway client never knocks the live one off, even from inside the same process.The 6-hex suffix is sized so the default composed
client_idlatchpoint-alarm-XXXXXXlands at exactly 23 bytes — the MQTT 3.1 spec limit. paho-mqtt 2.0+ defaults to 3.1.1 (which raises this to 65535), but staying inside the 3.1 limit guards against any stricter broker. 6 hex still gives 16M distinct values, far above what's needed for 1-2 process collision avoidance. The user-facing setting was renamed toClient ID prefixwith description noting the 16-char prefix budget; rc=2 connect-error messages were updated to match.Bug 3 — Empty
codefield treated as missingHA's MQTT alarm panel renders the command template with
code = code or "", so when a user arms (or disarms) without typing a PIN the broker payload is{"action": "ARM_HOME", "code": ""}— the field is present but empty._handle_command_payloadreturned""for that case, while the arm branch ofhandle_mqtt_alarm_commandcheckedcode_required = get_setting_bool(profile, "code_arm_required") or raw_code is not None. Empty string is not None, so the handler forced code validation even whencode_arm_requiredwas False, and the broker payload came back aserror: "Code required.".Fix: normalize empty/whitespace
codetoNoneonce at the parser boundary in_handle_command_payload. The disarm branch already used truthiness (if not raw_code:) and was unaffected — collapsing the two representations at the parser removes the boundary inconsistency for any future caller too.Verified on the live HA instance via the MCP server:
alarm_control_panel.latchpoint_alarmalready hadcode_arm_required: falsein its state attributes, confirming the discovery republish pipeline was healthy and the failure was purely backend-side.Perf — Lower-cost Argon2 for PINs
UserCode PINs are 4-8 digits (10K-100M keyspace). Stretching them at the default user-password Argon2 cost (~150-300ms/verify) doesn't materially raise an offline-brute-force attacker's bar — the entire keyspace is recoverable in minutes-to-hours even at strong cost — while the slow verify shows up as a multi-hundred-ms tax on every arm/disarm round trip from the HA MQTT alarm panel.
Add
UserCodeArgon2Hasher(algorithmargon2-pin,time_cost=1,memory_cost=8 MiB,parallelism=2; ~30ms/verify) and routecreate_user_code/update_user_codethrough it. User passwords keep the strong default. Existing UserCode rows hashed under the originalargon2algorithm continue to verify on the strong hasher with no migration required —check_passwordresolves by hash prefix.Combined with Bug 1, HA-driven arm/disarm clicks now land subsecond on the live home-lab broker.
Verification
python manage.py test transports_mqtt integrations_home_assistant.tests.test_mqtt_alarm_entity accounts.tests.test_user_codes_hasher -v 2→ 34 tests passuvx ruff check backend/transports_mqtt/ backend/integrations_home_assistant/ backend/accounts/ backend/alarm/settings_registry.py→ cleanuvx ruff format --check ...→ clean{"action": "None", "code": "1234"}on every clickcode_arm_required: falseafter the discovery republish — confirmed via MCPha_get_state alarm_control_panel.latchpoint_alarmTest plan
After merge and a fresh image is built/redeployed:
latchpoint_alarm/alarm/#on the broker{"action": "ARM_AWAY", "code": "..."}followed bylatchpoint_alarm/alarm/state :: armed_awaywithin ~1smqttlog shows a single stablelatchpoint-alarm-XXXXXXclient (6 hex chars, nosession taken overcycling)code_arm_required: falsein alarm settings, click arm-away in HA without typing a PIN → expect successful state transition (noterror: "Code required.")code_arm_required: true, arm with a valid PIN → still worksLatchpointdevice under HA's MQTT integration; Latchpoint's_after_connectwill republish discovery and HA will recreate the entity with the corrected template.Files changed
backend/integrations_home_assistant/mqtt_alarm_entity.py— template variable fix + empty-code normalization in_handle_command_payloadbackend/integrations_home_assistant/tests/test_mqtt_alarm_entity.py— replace mirror-test, add render simulation, addHandleCommandPayloadTestsfor empty/whitespace/missing/real codesbackend/transports_mqtt/manager.py—_client_id_suffixin__init__(3 bytes / 6 hex chars to fit MQTT 3.1's 23-byte limit);_build_clientbecomes an instance method that appends the suffix and uses the paho 2.0+ kwarg call without a fallback;test_connectionoverrides with a fresh suffix; rc=2 error messages reference "client_id prefix"backend/transports_mqtt/tests/test_manager.py— six tests covering distinct/stable/prefix-preserving/default-prefix/override/MQTT-3.1-length behaviors plus a wiring guard thattest_connectionforwards a non-live suffixbackend/alarm/settings_registry.py—client_idschema title renamed to "Client ID prefix" with description explaining the auto-suffix and the 16-char prefix budgetbackend/accounts/hashers.py— newUserCodeArgon2Hasher(argon2-pin, low cost)backend/accounts/tests/test_user_codes_hasher.py— coverage for the PIN hasherbackend/accounts/use_cases/user_codes.py— route create/update through the PIN hasherbackend/config/settings.py— register the new hasher