fix(ha-mqtt): restore HA alarm panel arm/disarm end-to-end + PIN perf by ljmerza · Pull Request #46 · latchpoint/latchpoint

ljmerza · 2026-05-01T23:17:59Z

Summary

Four changes that together restore arm/disarm functionality from the Home Assistant alarm panel and cut its end-to-end latency from ~2s to subsecond:

Use {{ action }} not {{ value }} in the alarm command_template. Without this, every click landed on the broker as {"action": "None", "code": "..."} and was rejected as "Unknown action."
Append a per-process random suffix to the broker client_id. Defends against two MqttConnectionManager instances inside the same container (e.g. a diagnostic script that ran django.setup()) fighting the live daphne client for the same MQTT identity.
Treat an empty code field in the MQTT command payload as no-code. Without this, arming with code_arm_required=False was rejected with "Code required." because HA always sends code: "" in the rendered template, and our parser surfaced that as "" rather than None.
Lower-cost Argon2 hasher for UserCode PINs (~150-300ms → ~30ms per verify). Alarm PINs are 4-8 digits; the strong user-password cost gives no real attacker bar against a tiny keyspace but stacks ~hundreds of ms on every arm/disarm round trip.

Bug 1 — `{{ action }}` not `{{ value }}`

HA's alarm_control_panel.mqtt exposes only action and code to command_template — value is lock.mqtt's variable, not alarm's (HA docs). With {{ value }}, Jinja saw value as undefined and rendered it as the literal string "None", so HA published payloads like:

{"action": "None", "code": "1234"}

Latchpoint's parser uppercased "None" → "NONE", found no match in _ACTION_TO_STATE, and replied "Unknown action." to the error topic without invoking the state machine.

Why nothing caught it: the previous test round-tripped the broken template through json.loads and asserted equality against itself — it locked the bug in place. Replaced with a test that asserts the documented HA variable name, plus a render-simulation test using Django's template engine that proves the action survives substitution as a real string.

Bug 2 — Per-process `client_id` suffix

When two paho clients share the configured client_id (default "latchpoint-alarm"), the MQTT broker enforces single-client-per-id and boots the older one with session taken over. The loser auto-reconnects within ~1s, ping-ponging forever. Each successful connect fires _after_connect, republishing retained discovery + availability + state.

The collision shape encountered in practice during this debug session: a diagnostic script run via docker exec into the live container called django.setup(), which triggered transports_mqtt.apps.ready() and spawned a second MqttConnectionManager in the same container. Both clients connected as latchpoint-alarm. Same shape would hit any future scenario where a second consumer (parallel deploy, dev side-car, accidental import path) imports transports_mqtt.

Fix: append a per-process random suffix (secrets.token_hex(3), 6 hex chars) to the configured client_id when building the paho client. Stable within one process so reconnects don't churn the broker's view of the live client; divergent across processes so they cannot collide. test_connection() opts into a fresh suffix per call so its throwaway client never knocks the live one off, even from inside the same process.

The 6-hex suffix is sized so the default composed client_id latchpoint-alarm-XXXXXX lands at exactly 23 bytes — the MQTT 3.1 spec limit. paho-mqtt 2.0+ defaults to 3.1.1 (which raises this to 65535), but staying inside the 3.1 limit guards against any stricter broker. 6 hex still gives 16M distinct values, far above what's needed for 1-2 process collision avoidance. The user-facing setting was renamed to Client ID prefix with description noting the 16-char prefix budget; rc=2 connect-error messages were updated to match.

Bug 3 — Empty `code` field treated as missing

HA's MQTT alarm panel renders the command template with code = code or "", so when a user arms (or disarms) without typing a PIN the broker payload is {"action": "ARM_HOME", "code": ""} — the field is present but empty.

_handle_command_payload returned "" for that case, while the arm branch of handle_mqtt_alarm_command checked code_required = get_setting_bool(profile, "code_arm_required") or raw_code is not None. Empty string is not None, so the handler forced code validation even when code_arm_required was False, and the broker payload came back as error: "Code required.".

Fix: normalize empty/whitespace code to None once at the parser boundary in _handle_command_payload. The disarm branch already used truthiness (if not raw_code:) and was unaffected — collapsing the two representations at the parser removes the boundary inconsistency for any future caller too.

Verified on the live HA instance via the MCP server: alarm_control_panel.latchpoint_alarm already had code_arm_required: false in its state attributes, confirming the discovery republish pipeline was healthy and the failure was purely backend-side.

Perf — Lower-cost Argon2 for PINs

UserCode PINs are 4-8 digits (10K-100M keyspace). Stretching them at the default user-password Argon2 cost (~150-300ms/verify) doesn't materially raise an offline-brute-force attacker's bar — the entire keyspace is recoverable in minutes-to-hours even at strong cost — while the slow verify shows up as a multi-hundred-ms tax on every arm/disarm round trip from the HA MQTT alarm panel.

Add UserCodeArgon2Hasher (algorithm argon2-pin, time_cost=1, memory_cost=8 MiB, parallelism=2; ~30ms/verify) and route create_user_code / update_user_code through it. User passwords keep the strong default. Existing UserCode rows hashed under the original argon2 algorithm continue to verify on the strong hasher with no migration required — check_password resolves by hash prefix.

Combined with Bug 1, HA-driven arm/disarm clicks now land subsecond on the live home-lab broker.

Verification

✅ python manage.py test transports_mqtt integrations_home_assistant.tests.test_mqtt_alarm_entity accounts.tests.test_user_codes_hasher -v 2 → 34 tests pass
✅ uvx ruff check backend/transports_mqtt/ backend/integrations_home_assistant/ backend/accounts/ backend/alarm/settings_registry.py → clean
✅ uvx ruff format --check ... → clean
✅ Empirical: live broker capture before this change showed {"action": "None", "code": "1234"} on every click
✅ Empirical: HA entity attributes already report code_arm_required: false after the discovery republish — confirmed via MCP ha_get_state alarm_control_panel.latchpoint_alarm

Test plan

After merge and a fresh image is built/redeployed:

Sniff latchpoint_alarm/alarm/# on the broker
Click arm-away in HA → expect {"action": "ARM_AWAY", "code": "..."} followed by latchpoint_alarm/alarm/state :: armed_away within ~1s
Repeat for arm-home, arm-night, and disarm
Confirm broker mqtt log shows a single stable latchpoint-alarm-XXXXXX client (6 hex chars, no session taken over cycling)
With code_arm_required: false in alarm settings, click arm-away in HA without typing a PIN → expect successful state transition (not error: "Code required.")
With code_arm_required: true, arm with a valid PIN → still works
Disarm with a valid PIN → still works and feels noticeably faster (PIN verify dropped from hundreds of ms to ~30ms)
In the MQTT settings UI, confirm the field is now labelled "Client ID prefix" with the suffix-explanation description
If HA's UI still doesn't update after a successful state publish, remove the Latchpoint device under HA's MQTT integration; Latchpoint's _after_connect will republish discovery and HA will recreate the entity with the corrected template.

Files changed

backend/integrations_home_assistant/mqtt_alarm_entity.py — template variable fix + empty-code normalization in _handle_command_payload
backend/integrations_home_assistant/tests/test_mqtt_alarm_entity.py — replace mirror-test, add render simulation, add HandleCommandPayloadTests for empty/whitespace/missing/real codes
backend/transports_mqtt/manager.py — _client_id_suffix in __init__ (3 bytes / 6 hex chars to fit MQTT 3.1's 23-byte limit); _build_client becomes an instance method that appends the suffix and uses the paho 2.0+ kwarg call without a fallback; test_connection overrides with a fresh suffix; rc=2 error messages reference "client_id prefix"
backend/transports_mqtt/tests/test_manager.py — six tests covering distinct/stable/prefix-preserving/default-prefix/override/MQTT-3.1-length behaviors plus a wiring guard that test_connection forwards a non-live suffix
backend/alarm/settings_registry.py — client_id schema title renamed to "Client ID prefix" with description explaining the auto-suffix and the 16-char prefix budget
backend/accounts/hashers.py — new UserCodeArgon2Hasher (argon2-pin, low cost)
backend/accounts/tests/test_user_codes_hasher.py — coverage for the PIN hasher
backend/accounts/use_cases/user_codes.py — route create/update through the PIN hasher
backend/config/settings.py — register the new hasher

HA's alarm_control_panel.mqtt exposes only `action` and `code` to command_template — `value` is lock.mqtt's variable, not alarm's. With `{{ value }}`, Jinja rendered the action as the literal string "None" (undefined → str(None)) and HA published payloads like `{"action": "None", "code": "1234"}`. Latchpoint's parser then returned "Unknown action." without invoking the state machine, so arm/disarm clicks in HA had no effect. The previous test asserted the broken template against itself, which is why nothing caught it — replaced with one that locks the correct variable name and added a render-simulation test that asserts the action survives as a real string. Verified live: broker capture before this change showed `{"action": "None", "code": "1234"}` on every click; HA docs confirm the variable scope. https://www.home-assistant.io/integrations/alarm_control_panel.mqtt/#command_template

Copilot

Pull request overview

Fixes Home Assistant MQTT alarm_control_panel discovery payload so arm/disarm commands publish a valid action value by using HA’s documented {{ action }} variable (instead of {{ value }}), preventing commands from being ignored by the server-side parser.

Changes:

Update HA MQTT discovery command_template to use {{ action }} and expand the inline documentation to match HA’s variable scope.
Replace the previous “self-equality” regression test with one that asserts the correct HA placeholders.
Add a render-simulation test to ensure the command template produces valid JSON with a concrete action/code.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
backend/integrations_home_assistant/mqtt_alarm_entity.py	Switches discovery payload `command_template` to `{{ action }}` and documents HA’s template variable scope.
backend/integrations_home_assistant/tests/test_mqtt_alarm_entity.py	Strengthens regression coverage around the command template placeholders and rendering behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Two paho clients sharing the configured `client_id` (default "latchpoint-alarm") fight for the same broker-side identity. The broker boots the older one with "session taken over" and the loser reconnects within ~1s, ping-ponging forever. Each on_connect fires `_after_connect`, which republishes discovery + availability + state — measured at HA as constant retained-message spam. The collision shape we hit in practice was a diagnostic script run via `docker exec` into the live container: `django.setup()` triggered `transports_mqtt.apps.ready()` and spawned a *second* MqttConnectionManager in the same container, both connecting as `latchpoint-alarm`. The same shape happens for any second instance — parallel deploys, dev side-cars, accidental imports. Defensively append a per-process random suffix to the configured `client_id` when building paho clients. The suffix is generated once in `MqttConnectionManager.__init__` so it stays stable across reconnects and settings PATCHes within one process; across processes it diverges. `test_connection()` opts into a fresh suffix per call so its throwaway client never knocks the live one off, even from inside the same process. The user-facing `client_id` setting is unchanged — it now functions as a prefix and existing configurations keep working.

…ffix Existing tests prove the `client_id_suffix=` override mechanism on `_build_client` works when called directly, but no test asserted that `test_connection()` actually passes a non-None override at runtime. If that override regressed, a `test_connection` invoked from inside the live container would reuse the manager's stable suffix and the broker would evict the running daphne client mid-validation — the exact shape of the bug PR #46 was meant to prevent. Adds a SimpleTestCase that monkey-patches `_build_client` and `_import_paho` on a manager instance, calls `test_connection()` against a stub settings dict, and asserts the captured `client_id_suffix` is non-None and differs from the manager's live `_client_id_suffix`. Short-circuits via `RuntimeError` before threading or `connect_async` are touched, so no broker / no I/O.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

backend/transports_mqtt/manager.py:399

In the except TypeError fallback, the code builds mqtt.Client() without any client_id, which drops the configured prefix + suffix entirely. If this fallback is ever hit (e.g., a paho version/signature mismatch), it defeats the collision-avoidance and the “stable within a process” guarantee. Consider retrying client construction with a positional client_id (or otherwise ensuring the computed client_id is still applied) rather than omitting it.

        try:
            client = mqtt.Client(client_id=client_id)
        except TypeError:
            client = mqtt.Client()

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Drop redundant assertNotEqual in HA mqtt alarm template test (implied by the prior assertEqual that already pins action to ARM_AWAY). - Make _build_client's TypeError fallback preserve the computed <prefix>-<suffix> client_id by retrying positionally instead of falling through to a no-arg mqtt.Client(); a no-arg fallback silently dropped the per-process suffix and undid the collision fix it was paired with. - Add a wiring guard test that fakes a kwarg-rejecting paho Client and asserts the fallback path still ships the suffix to the broker.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Alarm PINs are 4-8 digits with a 10K-100M keyspace. Stretching them at the default user-password Argon2 cost (~150-300ms/verify) doesn't materially raise an offline-brute-force attacker's bar - the keyspace is so small that the entire space is recovered in minutes-to-hours even at strong cost - while the slow verify shows up as a multi- hundred-ms tax on every arm/disarm round trip from the HA MQTT alarm panel. Add UserCodeArgon2Hasher (algorithm "argon2-pin", time_cost=1, memory_cost=8 MiB, parallelism=2; ~30ms/verify) and route create_user_code / update_user_code through it. User passwords keep the strong default. Existing UserCode rows hashed under the original "argon2" algorithm continue to verify on the strong hasher with no migration required - check_password resolves by hash prefix. Combined with the PR #46 MQTT-template fix, this gets HA-driven arm/ disarm clicks down from ~2s to subsecond round-trip on the live home-lab broker.

HA's MQTT alarm panel renders the command template with `code = code or ""`, so when the user arms without typing a PIN the broker payload is `{"action":"ARM_HOME","code":""}` — the field is present but empty. `_handle_command_payload` returned `""` for that case while the arm branch of `handle_mqtt_alarm_command` checked `raw_code is not None`. Empty string is not None, so code validation was forced even when `code_arm_required` was false, and the command failed with "Code required.". Normalize an empty/whitespace `code` to None at the parser boundary so both arm and disarm branches see the same "no code" contract. The disarm branch already used truthiness (`if not raw_code:`) and is unaffected.

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

backend/config/settings.py:117

This PR introduces a new password hasher (UserCodeArgon2Hasher) and changes UserCode hashing behavior, but the PR title/description are scoped to HA MQTT fixes. Please either update the PR description to cover this security/perf change (and its rollout considerations), or split it into a separate PR so the MQTT fix can be reviewed/merged independently.

PASSWORD_HASHERS = [
    "django.contrib.auth.hashers.Argon2PasswordHasher",
    "accounts.hashers.UserCodeArgon2Hasher",
    "django.contrib.auth.hashers.PBKDF2PasswordHasher",
    "django.contrib.auth.hashers.PBKDF2SHA1PasswordHasher",
    "django.contrib.auth.hashers.BCryptSHA256PasswordHasher",
]

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Drop the `try mqtt.Client(client_id=...) except TypeError: mqtt.Client(client_id)` fallback in `_build_client`. paho-mqtt is pinned >=2.0; the kwarg call always succeeds. If the fallback ever did fire on paho 2.x, the positional arg would be fed to `callback_api_version` (expects an enum) and crash with a worse error than the kwarg call itself. Removed the matching test that locked in the broken behavior. - Shrink the per-process client_id suffix from 8 hex chars (4 bytes) to 6 hex chars (3 bytes). Default composed `latchpoint-alarm-XXXXXX` now lands at exactly 23 bytes — the MQTT 3.1 spec maximum. paho 2.0+ defaults to 3.1.1 which raises the cap, but staying inside 3.1 protects against stricter brokers. 6 hex still gives 16M distinct values, far above what's needed for 1-2 process collision avoidance. - Add `test_default_composed_client_id_fits_mqtt_3_1_length_limit` regression guard so a future suffix/prefix bump can't silently push us over. - Update user-facing strings: settings registry title `"Client ID"` → `"Client ID prefix"` with description explaining the suffix and the 16-char prefix budget; rc=2 connect-error messages now reference "client_id prefix" so users know which knob they're turning.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        base_client_id = str(settings.get("client_id") or "").strip() or "latchpoint-alarm"
+        suffix = client_id_suffix or self._client_id_suffix
+        client_id = f"{base_client_id}-{suffix}"


Copilot AI review requested due to automatic review settings May 1, 2026 23:18

Copilot started reviewing on behalf of ljmerza May 1, 2026 23:18 View session

Copilot AI reviewed May 1, 2026

View reviewed changes

Comment thread backend/integrations_home_assistant/tests/test_mqtt_alarm_entity.py Outdated

ljmerza changed the title ~~fix(ha-mqtt): use {{ action }} not {{ value }} in alarm command_template~~ fix(ha-mqtt): use {{ action }} in alarm command_template + unique broker client_id per process May 1, 2026

Copilot AI review requested due to automatic review settings May 1, 2026 23:32

Copilot started reviewing on behalf of ljmerza May 1, 2026 23:33 View session

Copilot AI reviewed May 1, 2026

View reviewed changes

Comment thread backend/transports_mqtt/tests/test_manager.py

ljmerza requested a review from Copilot May 1, 2026 23:41

Copilot started reviewing on behalf of ljmerza May 1, 2026 23:42 View session

Copilot AI reviewed May 1, 2026

View reviewed changes

Comment thread backend/transports_mqtt/manager.py

Comment thread backend/transports_mqtt/manager.py Outdated

Comment thread backend/transports_mqtt/tests/test_manager.py

ljmerza added 2 commits May 1, 2026 20:44

Copilot AI review requested due to automatic review settings May 2, 2026 01:06

Copilot started reviewing on behalf of ljmerza May 2, 2026 01:06 View session

ljmerza changed the title ~~fix(ha-mqtt): use {{ action }} in alarm command_template + unique broker client_id per process~~ fix(ha-mqtt): restore HA alarm panel arm/disarm end-to-end + PIN perf May 2, 2026

Copilot AI reviewed May 2, 2026

View reviewed changes

Comment thread backend/accounts/use_cases/user_codes.py

Comment thread backend/accounts/hashers.py

ljmerza added 2 commits May 1, 2026 21:20

Address Copilot review feedback on PR #46

ad4af4b

ljmerza requested review from Copilot and removed request for Copilot May 2, 2026 01:32

Copilot started reviewing on behalf of ljmerza May 2, 2026 01:33 View session

ljmerza requested review from Copilot and removed request for Copilot May 2, 2026 01:34

Copilot started reviewing on behalf of ljmerza May 2, 2026 01:34 View session

Copilot AI reviewed May 2, 2026

View reviewed changes

Address Copilot review feedback on PR #46

887547c

ljmerza requested a review from Copilot May 2, 2026 01:44

Copilot started reviewing on behalf of ljmerza May 2, 2026 01:44 View session

Copilot AI reviewed May 2, 2026

View reviewed changes

Comment thread backend/transports_mqtt/manager.py

Comment on lines +408 to +410

base_client_id = str(settings.get("client_id") or "").strip() or "latchpoint-alarm"

suffix = client_id_suffix or self._client_id_suffix

client_id = f"{base_client_id}-{suffix}"

ljmerza merged commit 8e2f4b5 into main May 2, 2026
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ha-mqtt): restore HA alarm panel arm/disarm end-to-end + PIN perf#46

fix(ha-mqtt): restore HA alarm panel arm/disarm end-to-end + PIN perf#46
ljmerza merged 9 commits into
mainfrom
fix/ha-mqtt-alarm-action-template-variable

ljmerza commented May 1, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ljmerza commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Bug 1 — {{ action }} not {{ value }}

Bug 2 — Per-process client_id suffix

Bug 3 — Empty code field treated as missing

Perf — Lower-cost Argon2 for PINs

Verification

Test plan

Files changed

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ljmerza commented May 1, 2026 •

edited

Loading

Bug 1 — `{{ action }}` not `{{ value }}`

Bug 2 — Per-process `client_id` suffix

Bug 3 — Empty `code` field treated as missing