feat(automations): Automation Engine — Phase 1a (#3653)#3668

Merged
Yeraze merged 23 commits into main from feat/automation-engine-phase1a on Jun 23, 2026
Yeraze (Owner) commented Jun 23, 2026

Automation Engine — Phase 1a (#3653)

A generic, global "when this happens, do that" automation system for MeshMonitor — Home Assistant / Node-RED / IFTTT-inspired, replacing hardcoded automations. Applies across all sources (like Map Analysis). Built on a graph data model with an IFTTT/Maintainerr-style linear builder (drag-and-drop canvas deferred to Phase 2).

Design doc: docs/internal/dev-notes/AUTOMATION_ENGINE_PLAN.md.

Screenshots

  • [Screenshot] The builder + in-app Test panel (dry-run: "Trigger matched · completed", no IO, nothing saved)
  • [Screenshot] IFTTT/Maintainerr-style builder: WHEN → RULE (IF/THEN) → optional FINALLY combine
  • [Screenshots] Automations list · Variables help drawer (types & scopes)

Highlights

Engine (server) — event-driven off dataEventEmitter; per-automation trigger index; topological graph evaluation with condition routing (If/ElseIf/Else), fanout/collapse (map/reduce: ANY/ALL/NONE), cooldown throttle, per-run action cap, and a run-log. All mesh IO is injected (ActionDeps) so the pipeline is fully unit-tested.

Triggers — message · node discovered · node updated · telemetry · schedule · system (bootup / source online / source offline / upgrade available) · geofence (enter / exit / dwell).

Conditions — numeric (event field / hydrated node field / latest telemetry, all ops, variable operands) · string (message text, node longName/shortName/roleName, all ops) · source filter · distance from a point · variable / flag · time-of-day.

Actions — tapback · send message (channel/DM, reply-to-trigger, interpolation) · node manage (favorite/ignore/delete) · notify via Apprise.

User-defined variables — types string/integer/float/boolean/flag (flag = auto-clearing boolean for anti-spam) · scopes global/source/node/sourceNode · readonly constants vs dynamic · with an in-UI help drawer.

In-app Test / dry-run (POST /api/automations/test): run a workflow against a synthetic event with no mesh IO, no Apprise dispatch, no persistence. Returns the full trace (trigger match, per-condition verdicts, resolved action params, simulated variable writes). Surfaced as a ▶ Test panel in the builder so authors preview a rule before saving, and reused as the deterministic substrate for no-hardware system tests.

Apprise: action.notify dispatches through appriseNotificationService.notifyDirect (per-source → global → APPRISE_URL → bundled :8000), with optional inline Apprise URL(s) + severity.

Data model — 4 global tables (automations, automation_runs, automation_variables, automation_variable_values) across SQLite/PostgreSQL/MySQL (migrations 098/099), registered in the migrate-db CLI table order.

Testing

  • Full Vitest suite green (7,304 passed, 0 failed). New unit suites cover the evaluator, condition evaluator, trigger context, variable resolver/codec, compile/decompile, geofence, Apprise notifyDirect, the engine service (incl. system-event prefilter + geofence + notify-failure), and the simulator.
  • Deterministic no-hardware system tests built on the test endpoint: tests/automation/{lib.sh,test-triggers.sh,test-conditions.sh} — 24 live assertions passing against a deployed container.

Scope / follow-ups (Phase 1b+)

  • MeshCore triggers/fields in the engine; "upgrade available" wired only via the version-check route so far.
  • flow.delay / stateful waits; visual drag-and-drop canvas; remaining system-test batches (flow/map-reduce, actions w/ Apprise stub, lifecycle, MQTT smoke).

🤖 Generated with Claude Code

Yeraze and others added 21 commits June 22, 2026 16:20
…ine (#3653)

Foundation for the user-creatable, global Automation Engine ("Advanced Mode").
This slice is the data layer only — engine, routes, and UI follow.

- Migration 098: create `automations` (global, no sourceId) + `automation_runs`
  (execution log now, stateful run store in Phase 1b) across SQLite/PG/MySQL.
- Drizzle schema + activeSchema/index wiring for both tables.
- AutomationsRepository: CRUD, enabled-only load, run-log create/update/list,
  list-by-status + cancel-active-runs (for the Phase 1b stateful engine).
- Wire repository into DatabaseService; export from the repo barrel.
- Document the global-by-design exception in CLAUDE.md and add the full
  design/plan doc (AUTOMATION_ENGINE_PLAN.md).

Tests: 11 repository tests (real in-memory SQLite) + migration registry
count/sequence updated to 98. All green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
Adds the variable registry behind the Automation Engine: define reusable
values once, reference them as {{ var.<name> }} in conditions/actions.

- Migration 099: create `automation_variables` (global definitions:
  name/type/scope/readonly/config) + `automation_variable_values` (per-scope
  values keyed by scopeKey, with `expiresAt` for flag auto-clear) across
  SQLite/PG/MySQL. Count test 98 -> 99.
- Drizzle schema + activeSchema/index wiring.
- AutomationVariablesRepository: definitions CRUD; buildScopeKey() for the four
  scopes (global/source/node/sourceNode); scoped value upsert/clear;
  getEffectiveValue() applies flag TTL (expired reads as absent); pruneExpired()
  sweep. Type-agnostic at the repo layer — encoding/duration live in the engine.
- Wire into DatabaseService + repo barrel.

Types: string|integer|float|boolean|flag. `readonly` marks user-set constants
(thresholds) the engine may read but not write; flags auto-clear after a
duration (anti-spam primitive).
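
The flag auto-clear behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: `flagExpiry` is named in a later commit, but its signature here, the `StoredValue` shape, the `effectiveValue` helper, and the use of epoch milliseconds are all hypothetical.

```typescript
// Assumption: timestamps are epoch milliseconds; the real column units are not stated in the PR.
function flagExpiry(nowMs: number, flagDurationSeconds?: number | null): number | null {
  // No positive duration means the flag never auto-clears (null expiry).
  if (typeof flagDurationSeconds !== "number" || !(flagDurationSeconds > 0)) return null;
  return nowMs + flagDurationSeconds * 1000;
}

// Hypothetical stored-row shape: an encoded value plus an optional expiry.
interface StoredValue {
  value: string;
  expiresAt: number | null;
}

// An expired value reads as absent (null), mirroring the described getEffectiveValue behaviour.
function effectiveValue(stored: StoredValue | undefined, nowMs: number): string | null {
  if (!stored) return null;
  if (stored.expiresAt !== null && stored.expiresAt <= nowMs) return null;
  return stored.value;
}
```

Treating an expired row as absent at read time means correctness does not depend on how often a pruning sweep runs; the sweep only reclaims space.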

Tests: 10 repository tests (real in-memory SQLite). All green (30 total across
the automation data layer).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
Adds the contract shared by the backend engine/routes and the frontend builder.

- src/types/automation.ts: canonical block-type catalog (trigger/condition/
  action/flow unions + category helpers), the { version, nodes[], edges[] } graph
  shape, and validateAutomationGraph() — a dependency-free validator that returns
  structured per-problem errors. Enforces: well-formed shape, unique/known nodes,
  exactly one trigger, valid edges (no dangling refs/self-loops), true/false
  ports only on condition edges, no trigger incoming edges, acyclic (DAG), no
  orphans, plus light per-block param checks.
- Canonical VariableType/VariableScope live here now; the variables repository
  re-exports them to avoid divergence.
- Plan doc: record the hand-written-validator decision (no Zod dependency).

Tests: 14 validator/category tests (happy path incl. If/ElseIf/Else routing +
every rejection case). All green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
Pure, DB-free engine internals (new src/server/services/automation/):

- variableCodec.ts: parseVarConfig (tolerant JSON), encodeValue/decodeValue
  (typed <-> stored string for string/integer/float/boolean/flag, rejecting
  non-representable values), and flagExpiry (now + flagDurationSeconds, null when
  no positive duration). Keeps the repository type-agnostic.
- interpolate.ts: interpolate(template, lookup) replaces {{ path }} tokens via a
  caller-supplied lookup (trigger.*/var.*/system vars resolved by the engine);
  unknown paths render empty and a throwing lookup never breaks output.
  extractPaths() lists referenced tokens.
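
A minimal sketch of the interpolation contract described above: unknown paths render empty and a throwing lookup never breaks output. The function names come from the commit message; the token regex, the `Lookup` type, and the bodies are assumptions for illustration.

```typescript
// Hypothetical lookup type: returns a value for a path, or null/undefined when unknown.
type Lookup = (path: string) => unknown;

// Matches {{ path }} with optional surrounding whitespace (assumed token grammar).
const TOKEN_SOURCE = "\\{\\{\\s*([^{}]+?)\\s*\\}\\}";

function interpolate(template: string, lookup: Lookup): string {
  return template.replace(new RegExp(TOKEN_SOURCE, "g"), (_match: string, path: string) => {
    try {
      const value = lookup(path);
      // Unknown paths (null/undefined) render as an empty string.
      return value == null ? "" : String(value);
    } catch {
      // A throwing lookup must never break the rendered output.
      return "";
    }
  });
}

// Lists each referenced path once, in order of first appearance.
function extractPaths(template: string): string[] {
  const seen = new Set<string>();
  const re = new RegExp(TOKEN_SOURCE, "g");
  let m: RegExpExecArray | null;
  while ((m = re.exec(template)) !== null) seen.add(m[1]);
  return Array.from(seen);
}
```

For example, `interpolate("Hi {{ trigger.from }}", (p) => (p === "trigger.from" ? "node1" : undefined))` yields `"Hi node1"`.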

Tests: 15 (codec round-trips + flag expiry; interpolation incl. unknown/throwing
lookups). All green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
The heart of the engine — a topological-activation walk over a validated DAG,
decoupled from mesh IO via injected hooks (evaluateCondition/executeAction/
applySetVar) so all routing logic is unit-tested without a node connection.

- Uniform activation model handles If/ElseIf/Else routing (true/false ports,
  unported = gate), fanout (multi-out), and collapse (ANY/ALL/NONE join) in one
  topological pass.
- Action + setVar errors are caught and recorded — a failure never aborts the
  run. maxActions guard caps executions per run (loop/spam backstop).
- Produces a structured result: activated nodes, condition results, action
  outcomes, and an ordered step log (for the run-log).
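
The collapse join named above can be illustrated with a small predicate. This is a sketch under assumptions: `CollapseMode` and `collapseActivates` are hypothetical names, and the treatment of an empty input list (ALL does not activate, NONE does) is a guess rather than documented engine behaviour.

```typescript
type CollapseMode = "ANY" | "ALL" | "NONE";

// incoming[i] is true when the i-th upstream branch was activated in this run.
function collapseActivates(mode: CollapseMode, incoming: boolean[]): boolean {
  const active = incoming.filter(Boolean).length;
  if (mode === "ANY") return active > 0;
  // Assumption: ALL over zero inputs does not activate.
  if (mode === "ALL") return incoming.length > 0 && active === incoming.length;
  return active === 0; // NONE
}
```

Because nodes are visited in topological order, every upstream branch has already been resolved by the time a collapse node is reached, so one pass is enough.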

Tests: 12 — linear, If/Else both branches, gate, ElseIf cascade, fanout,
collapse ANY/ALL/NONE, action-error isolation, maxActions guard, setVar.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
…3653, §5.1)

Pure helpers turning mesh event payloads into the trigger.* field map and the
subject node for variable scope binding:

- buildMessageContext (DbMessage → §5.1 fields incl. derived hops=hopStart-hopLimit,
  isDM/isBroadcast vs 0xFFFFFFFF, snr/rssi), buildNodeContext (changed keys),
  buildTelemetryContext (per reading), buildSystemContext (bootup/connect).
- messageMatchesFilter: the tight portnum/from/to/channel/textContains/regex
  fast-fail the engine applies before graph evaluation (invalid regex = no match).
- resolveTriggerPath: resolves {{ trigger.* }} + system vars (NOW, sourceId,
  timestamp) for interpolation.
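
The derived message fields mentioned above (hops = hopStart - hopLimit, broadcast detection against 0xFFFFFFFF) could look roughly like this. `MessageLike` and `deriveMessageFields` are hypothetical names; the real `DbMessage` shape is not shown in this PR.

```typescript
// Meshtastic broadcast destination address.
const BROADCAST_ADDR = 0xffffffff;

// Hypothetical subset of a stored message row.
interface MessageLike {
  from: number;
  to: number;
  hopStart?: number | null;
  hopLimit?: number | null;
}

function deriveMessageFields(msg: MessageLike): { hops: number | null; isBroadcast: boolean; isDM: boolean } {
  // hops is only derivable when both counters are present.
  const hops = msg.hopStart != null && msg.hopLimit != null ? msg.hopStart - msg.hopLimit : null;
  const isBroadcast = msg.to === BROADCAST_ADDR;
  return { hops, isBroadcast, isDM: !isBroadcast };
}
```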

Tests: 12 — hop derivation, field mapping, broadcast flags, each context builder,
filter matching, path resolution.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
…3653, §5.2)

The engine's typed read/write API over user-defined variables:

- getValue: resolves by name, keys from context (subject node/source), applies
  flag TTL, falls back to the configured default when unstored. Unknown var or
  un-resolvable scope → null.
- setValue: rejects readonly constants and non-representable values; for flags a
  truthy value arms with TTL, falsy clears.
- setFlag/clearFlag conveniences; increment for integer/float counters (seeds 0,
  rejects non-numeric types).

Tests: 11 against real in-memory SQLite — default fallback, readonly rejection,
type-representability, per-node + sourceNode scope isolation, missing-context
error, flag arm/expire/clear, counter increment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
… §5)

- engineContext.ts: EngineEvalContext (the Ctx threaded through the graph
  evaluator hooks) bundling trigger fields + variable resolver + scope context +
  clock. Async {{ }} interpolation (interpolateAsync/resolveOperand) that
  pre-resolves every referenced path since var.* needs a DB read; resolveField
  for condition field refs.
- conditionEvaluator.ts: evaluates every condition.* type → boolean, never
  throws. sourceFilter, numeric (with {{ var }} thresholds), string
  (contains/regex/...), variable (is-set + comparison), distance (haversine vs
  coords), timeRange (incl. overnight wrap), logical (AND/OR/NOT recursion).
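
The overnight wrap in the timeRange condition is the one non-obvious case in the list above. A minimal sketch, assuming times are expressed as minutes since midnight, the end bound is exclusive, and equal bounds mean "all day" (none of which is confirmed by the PR); `inTimeRange` is a hypothetical name.

```typescript
// All arguments are minutes since local midnight (0..1439).
function inTimeRange(nowMin: number, startMin: number, endMin: number): boolean {
  if (startMin === endMin) return true; // assumption: equal bounds cover the whole day
  if (startMin < endMin) return nowMin >= startMin && nowMin < endMin; // same-day window
  // start > end: the window wraps past midnight, e.g. 22:00 to 06:00.
  return nowMin >= startMin || nowMin < endMin;
}
```

With a 22:00 to 06:00 window, 23:00 and 05:59 match while 12:00 does not.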

Tests: 8 condition suites (incl. var-backed thresholds against real SQLite).
Full automation core now 101 tests, all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
Turns action.* nodes into concrete IO calls, with all mesh IO behind an
ActionDeps interface so routing/interpolation is tested without a live node:

- action.sendMessage: interpolated text, broadcast (trigger channel) or DM (to
  an explicit node), optional reply-to-trigger.
- action.tapback: emoji reaction, replyId defaults to the triggering packet,
  routed the way the trigger arrived (DM→DM, channel→channel) — mirrors auto-ack.
- action.nodeManage: favorite/unfavorite/ignore/unignore/delete on the subject
  (or explicit) node, with op validation.
- action.notify: interpolated title/body + type for Apprise/webhook.
- Target source = explicit param override else the trigger's source.

Also exposes the triggering packetId in the message context (parsed from the
load-bearing `${sourceId}_${from}_${packetId}` row id) for tapback replyIds.
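
Parsing the packet id out of that row id might be done as below. This is a sketch: `parsePacketId` is a hypothetical name, and it assumes the packet id is the final underscore-separated segment and is written in decimal. Splitting from the right keeps it working if the source id itself contains underscores.

```typescript
// Row id format per the commit message: `${sourceId}_${from}_${packetId}`.
function parsePacketId(rowId: string): number | null {
  const idx = rowId.lastIndexOf("_");
  if (idx < 0) return null;
  const tail = rowId.slice(idx + 1);
  // Assumption: the packet id is a non-negative decimal integer.
  if (!/^\d+$/.test(tail)) return null;
  return Number(tail);
}
```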

Tests: 8 executor (fake deps) + 12 trigger-context. All green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
Composes the whole pipeline: load enabled automations + validate their graphs,
index by trigger type, and on each event build the trigger context, fast-fail on
the trigger pre-filter, enforce a per-automation cooldown, evaluate the graph,
and write a run-log row (completed/failed). Mesh IO injected via ActionDeps; the
clock is injectable for cooldown/flag tests. flow.setVar handling (set/clear/
flag/increment) lives here. Phase 1a is synchronous (no waiting status).

Event entry points: onMessage / onNode / onTelemetry / onSystem (telemetry
pre-filters by telemetryType).

Tests: 5 end-to-end against real SQLite + fake deps — ping→tapback with run-log,
pre-filter miss, cooldown window, the welcome-once per-node flag anti-spam
pattern, and invalid-config skip-on-load. Full core: 114 tests, all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
…rver wiring (#3653)

Makes the engine live and controllable via API:

- Global `automations` permission resource (distinct from the legacy per-source
  `automation`; NOT in SOURCEY_RESOURCES; admin default on).
- meshActionDeps: real ActionDeps wired to MeshtasticManager (sendTextMessage for
  messages/tapbacks, sendFavoriteNode/sendRemoveFavoriteNode/sendIgnoredNode/
  sendRemoveIgnoredNode, deleteNodeAsync). notify throws for now (no clean
  per-source Apprise entry point) → records a failed step, not a crash.
- automationEngineSingleton: process-wide engine, reloadAutomations() for routes,
  and dataEventEmitter subscription (message:new / node:updated / telemetry:batch /
  connection:status → onMessage/onNode/onTelemetry/onSystem), fully guarded.
- /api/automations routes: catalog, variables CRUD, automations CRUD + enable/
  disable + import (lands disabled) + export + runs. Graph validated via
  validateAutomationGraph; mutations reload the engine.
- server.ts: mount routes + startAutomationEngine() on boot.

Backend TS-clean; automation suite still 114 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
Adds the global "Automation Engine" tab and management page:

- Register `automations` tab (TabType + VALID_TABS + App.tsx render + Sidebar nav
  + tab permission gate on the global `automations` resource).
- AutomationsPage: list/toggle/delete/export automations; editor with a
  prefilled ping→tapback template and server-side graph validation surfacing
  per-problem errors; run-log viewer; and a Variables management area
  (create/delete with type/scope/readonly/default/flag-duration).

The visual node-graph builder is Phase 2; this JSON-config surface is enough to
create, enable, and verify real automations end-to-end.

Frontend TS-clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
…rce tab (#3653)

Verification in the deployed container revealed the management UI never appeared:
it was wired as a per-source tab (App.tsx activeTab + Sidebar Configuration
NavItem), but the Automation Engine is a GLOBAL feature (like Map Analysis) and
the per-source tab nav doesn't render in the unified/global view.

- Add a global route `/automations` (sharedProviders(<AutomationsPage/>)) in
  main.tsx, mirroring the `/analysis` Map Analysis route.
- Add a permission-gated "🤖 Automation Engine" link to the global DashboardSidebar
  (next to Map Analysis / Analysis & Reports).
- Add a "← Dashboard" back control to the page (it now renders full-screen).
- Revert the incorrect per-source wiring (App.tsx import/tabPermissions/render
  block, Sidebar NavItem, TabType + VALID_TABS).

Frontend TS-clean (changed files) and Vite-bundles.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
Foundation for the IFTTT/Maintainerr-style workflow builder (the plan's linear
UI over the graph model):

- catalog.ts: UI metadata for every trigger/condition/action — the param fields
  the builder renders (kind/label/options/help/advanced), plus per-trigger field
  options for conditions.
- compile.ts: compile(form)→linear graph and decompile(graph)→form. Builder model
  is WHEN(trigger) → IF(AND-chain of conditions) → THEN(sequence of actions);
  decompile returns null for branched/fanout/non-linear graphs so the page can
  fall back to the raw-JSON editor for advanced/imported workflows.

Tests: 6 (compile validity + blank-param stripping, round-trip, and the null
fall-back cases). Green, TS-clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
#3653)

Addresses UI feedback: crowded/jumbled layout, no help for variable types/scopes,
and the missing actual workflow builder.

- AutomationBuilder: structured WHEN → IF (AND-chain) → THEN editor driven by the
  catalog — trigger/condition/action dropdowns with typed param fields, per-trigger
  field options, variable pickers, add/remove rows. Compiles to the graph model;
  a raw-JSON "Advanced" toggle remains for branched/imported workflows (auto-used
  when a graph can't be decompiled to the linear form).
- AutomationsPage.css: full Catppuccin-themed stylesheet (cards, tabs, buttons,
  form fields, WHEN/IF/THEN sections, help drawer) — fixes the crowded layout.
- Variables: a "?" help drawer explaining every type (incl. flag auto-clear) and
  scope (Global / Per Source / Per Node / Per Source+Node) + constant, with a
  link to meshmonitor.org for full docs.

Frontend TS-clean and Vite-bundles.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
Caught in UI testing: a condition saved without field/op because a <select>
shows its first option but fires no onChange until changed, so the params stayed
empty and server validation rejected it ("requires params.field/op"). Seed every
select field's default value when a block is created or its type changes (incl.
the dynamic, trigger-derived condition "field" options).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
…er (#3653)

Implements the proposed rules + combine model so the form-based builder can
express fanout (map) and collapse (reduce) without a free-form canvas:

- compile/decompile reworked to WorkflowForm { trigger, rules[], combine }:
  * multiple RULES → trigger fans out to one branch per rule (each IF→THEN)
  * optional COMBINE → flow.collapse(ANY/ALL/NONE) joining each rule's tail,
    then its actions (reduce: run if ANY/ALL/NONE of the rules matched)
  * a single rule with no combine still compiles to a plain linear chain (no
    fanout), and decompile recovers both shapes; anything more exotic returns
    null → JSON/canvas fallback. Backward-compatible with existing linear graphs.
- AutomationBuilder: renders the WHEN → RULES (add/remove rule) → FINALLY
  (ANY/ALL/NONE + actions) form; reusable BlockListEditor for conditions/actions.
- AutomationsPage: rules-aware DEFAULT_FORM + validateForm (each rule needs an
  action unless it only feeds the combine).

Tests: 11 compiler tests — linear/fanout/collapse round-trips (incl. ANY/ALL/NONE
and condition-only rules) + the null fall-back cases. Green, TS-clean, Vite-builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
…nce, source + system triggers (#3653)

Conditions can now compare far more than the trigger's own event fields:

- Engine: NodeDataProvider hydrates the subject node + its latest telemetry
  during condition evaluation (injected for testability; real impl reads
  nodesRepo.getNode + getLatestTelemetryForTypeAsync). Field resolution is async
  and supports namespaces:
    * node.*        long/short name, role, hopsAway, battery, voltage, position,
                    + calculated node.ageMinutes and node.roleName
    * telemetry.*   latest reading of ANY metric for the subject node
    * event fields  (hops, text, value, …) as before
  condition.distance now uses the hydrated node position.
- Builder: the Number/Text field pickers are grouped <optgroup> selectors (This
  event / Node / Latest telemetry). New conditions surfaced: "Source is one of…"
  (condition.sourceFilter, multi-select of sources) and "Distance from a point".
  New trigger "A system event" (System start / Source online / Source offline);
  the engine now fires `bootup` on startup. ("Upgrade available" flagged as
  coming later.)
- Source list fetched from /api/sources for the source multi-select.

Tests: +6 condition cases (node battery, any-telemetry, node age, longName +
roleName, distance-from-node-position). Full automation suite 121 green, TS-clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
Adds trigger.geofence: fires when a node crosses a circular region.

- geo.ts: shared haversineKm + pure geofenceFires(prevInside, nowInside, mode)
  transition logic (baseline never fires; enter=out→in, exit=in→out, dwell=in→in).
  conditionEvaluator now reuses haversineKm.
- Engine: checkGeofences(nodeNum, sourceId) hydrates the node position, computes
  inside/outside per geofence automation, tracks per-(automation,node) state, and
  fires on the configured transition (honoring cooldown). Refactored the
  per-automation evaluation into fireAutomation() shared with runTrigger.
- Wired into the node:updated handler — only runs when latitude/longitude changed.
- triggerContext.buildGeofenceContext (subject node = the moving node, so node.*
  conditions work). New trigger type registered in the shared catalog + builder
  ("A node enters/leaves a region": event + center lat/lon + radius km).

Note: geofence state is in-memory (re-baselines after a restart — first position
update establishes state without firing).
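
The transition rules above (baseline never fires; enter = out→in, exit = in→out, dwell = in→in) fit in a few lines. The names `haversineKm` and `geofenceFires` come from the commit message; the bodies, the `GeofenceMode` type, and representing "no prior state" as `undefined` are assumptions.

```typescript
type GeofenceMode = "enter" | "exit" | "dwell";

// Great-circle distance in kilometres, using a mean Earth radius of 6371 km.
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) * Math.sin(dLat / 2) +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) * Math.sin(dLon / 2);
  return 2 * 6371 * Math.asin(Math.min(1, Math.sqrt(a)));
}

// prevInside is undefined when no state exists yet (first update, or after a restart).
function geofenceFires(prevInside: boolean | undefined, nowInside: boolean, mode: GeofenceMode): boolean {
  if (prevInside === undefined) return false; // baseline: record state, never fire
  if (mode === "enter") return !prevInside && nowInside;
  if (mode === "exit") return prevInside && !nowInside;
  return prevInside && nowInside; // dwell
}
```

A caller would compute `nowInside` as `haversineKm(nodeLat, nodeLon, centerLat, centerLon) <= radiusKm`, then store it as the next `prevInside` for that (automation, node) pair.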

Tests: geo helpers (haversine + all transition cases) + engine enter/exit
scenarios (baseline no-fire, enter once, exit). Automation suite 107 green in
this set; TS-clean; Vite-builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
…nt (#3653)

action.notify now dispatches through appriseNotificationService.notifyDirect
(automation-specific, non-user-filtered): resolves per-source→global→env→bundled
Apprise URL, supports optional inline Apprise URL(s) + severity in the builder,
and throws on failure so the run logs a failed step. New notifyDirect unit tests.

Add the 'upgrade-available' trigger.system event, raised (deduped per version)
from /version/check, exposing latestVersion/currentVersion to conditions and
interpolation. Fix a latent bug: trigger.system now prefilters on its `event`
param so a bootup rule no longer fires on source-connected etc.

Register the four global Automation Engine tables in the migrate-db TABLE_ORDER
(automations, automation_runs, automation_variables, automation_variable_values)
so SQLite→PG/MySQL migration covers them.

Add docs/internal/dev-notes/AUTOMATION_ENGINE_SYSTEM_TESTS.md: proposal for a
gated simulate endpoint + a no-hardware system-test matrix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
…est scripts (#3653)

Add POST /api/automations/test (and /:id/test): dry-run a graph against a
synthetic trigger event with NO mesh IO, NO Apprise dispatch and NO persistence.
Recording ActionDeps return fully-resolved params; a stub NodeDataProvider
overlays caller-supplied node facts + latest telemetry on the live DB; a
recording VariableResolver reads real values (with overrides) and records writes.
Returns the full trace { matched, status, steps, conditionResults, actions
(resolvedParams), variableWrites }.

Builder gains an in-app "▶ Test" panel (AutomationTester) — trigger-specific
input form + optional subject-node facts / variable overrides → renders the
execution trace, resolved actions, and simulated variable changes, so authors
preview a rule before saving.

Add deterministic no-hardware system tests built on the endpoint:
tests/automation/{lib.sh,test-triggers.sh,test-conditions.sh}.

Unit tests: automationSimulator.test.ts (7). Full suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
github-actions Bot commented Jun 23, 2026


Claude finished @Yeraze's task


PR Review: Automation Engine — Phase 1a

Tasks:

  • Gather context and understand the PR
  • Review types and data model
  • Review engine services (evaluators, executors)
  • Review routes and API
  • Review DB schema and migrations
  • Review frontend components
  • Review test coverage
  • Compile final feedback

This is a substantial, well-architected feature. Overall code quality is high — the dependency-injected engine, the strong type model, the three-DB migration support, and the comprehensive test coverage all reflect mature engineering. The items below are observations and suggestions, not blockers.


Code Quality & Architecture

Strengths:

  • Excellent separation of concerns: AutomationEngineService orchestrates but delegates mesh IO via ActionDeps, graph walking via graphEvaluator, and condition evaluation via conditionEvaluator. Each is independently testable.
  • The validateAutomationGraph function in src/types/automation.ts returns structured errors instead of throwing — good pattern for import UIs.
  • The SimVariableResolver design (record writes, read through to real DB) is clean and makes dry-runs reliable.
  • Topological sort using Kahn's algorithm before walking the DAG is correct and efficient.
  • The getSubjectNode memo (ctx.__nodeP) in engineContext.ts:66 avoids redundant DB lookups across conditions in the same run — nice detail.
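
For readers unfamiliar with it, Kahn's algorithm as referenced in the strengths above can be sketched as follows. This is a generic illustration, not the code in graphEvaluator.ts: the `Edge` field names and `topoSort` are hypothetical, and it assumes every edge endpoint appears in `nodeIds` (which the graph validator is described as enforcing).

```typescript
interface Edge {
  from: string;
  to: string;
}

// Returns node ids in topological order, or null when the graph contains a cycle.
function topoSort(nodeIds: string[], edges: Edge[]): string[] | null {
  const indegree = new Map<string, number>();
  const outgoing = new Map<string, string[]>();
  for (const id of nodeIds) {
    indegree.set(id, 0);
    outgoing.set(id, []);
  }
  for (const e of edges) {
    outgoing.get(e.from)?.push(e.to);
    indegree.set(e.to, (indegree.get(e.to) ?? 0) + 1);
  }
  // Start from nodes with no incoming edges (the trigger, in a valid automation graph).
  const queue = nodeIds.filter((id) => indegree.get(id) === 0);
  const order: string[] = [];
  for (let i = 0; i < queue.length; i++) {
    const id = queue[i];
    order.push(id);
    for (const next of outgoing.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  // Nodes left with a positive indegree sit on a cycle.
  return order.length === nodeIds.length ? order : null;
}
```

Each node and edge is visited once, so the sort is O(V + E).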

Observations:

  1. any casts in the engine (automationEngineService.ts:110, automationEngineService.ts:233, automationEngineService.ts:249):

    const cooldownSeconds = Number((triggerNode.params as any)?.cooldownSeconds ?? 0) || 0;
    const want = (a.triggerNode.params as any)?.telemetryType;

    These are casting params?: Record<string, unknown> to any to read typed sub-fields. Since params is already typed as Record<string, unknown>, you can avoid any with a narrowing helper or a typed interface per trigger param set. This is minor but these appear in several places across the engine services.

  2. Double step push on condition error (graphEvaluator.ts:157-160):

    } catch (e: any) {
      result = false;
      steps.push({ nodeId, type: node.type, outcome: 'condition:false', error: e?.message }); // ← pushed on error
    }
    conditionResults[nodeId] = result;
    steps.push({ nodeId, type: node.type, outcome: result ? 'condition:true' : 'condition:false' }); // ← pushed again unconditionally

    When a condition throws, two steps are pushed for the same node — one with the error and one without. The error details are on the first record but the second (sans error) is also written. Consider guarding the second push with } else { or merging into a single push.

  3. In-memory geofence state leaks (automationEngineService.ts:79):

    private geofenceState = new Map<string, boolean>();

    The geofenceState map grows unbounded as nodes come and go. For deployments with thousands of tracked nodes and many geofence automations, this could accumulate stale entries. A simple LRU or periodic TTL-based eviction would be prudent once the feature gets broader use.

  4. lastFired map also unbounded (automationEngineService.ts:77):
    Similar concern — every automation that ever fires adds an entry keyed by automationId. This is less risky (bounded by automation count) but worth noting for completeness.
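
Observations 1 and 2 above could be addressed along these lines. This is a sketch only: `numberParam`, `stringParam`, and `recordCondition` are hypothetical helpers, the `Step` shape is inferred from the quoted snippet, and the condition callback is made synchronous for brevity.

```typescript
type Params = Record<string, unknown> | undefined;

// Observation 1: read typed sub-fields without casting params to any.
function numberParam(params: Params, key: string, fallback = 0): number {
  const n = Number(params?.[key] ?? fallback);
  return Number.isFinite(n) ? n : fallback;
}

function stringParam(params: Params, key: string): string | undefined {
  const v = params?.[key];
  return typeof v === "string" ? v : undefined;
}

// Observation 2: push exactly one step per condition, attaching the error when there is one.
interface Step {
  nodeId: string;
  type: string;
  outcome: string;
  error?: string;
}

function recordCondition(
  nodeId: string,
  type: string,
  evaluate: () => boolean,
  steps: Step[],
  conditionResults: Record<string, boolean>,
): boolean {
  let result = false;
  let error: string | undefined;
  try {
    result = evaluate();
  } catch (e: unknown) {
    // A throwing condition counts as false, as in the quoted code.
    error = e instanceof Error ? e.message : String(e);
  }
  conditionResults[nodeId] = result;
  const step: Step = { nodeId, type, outcome: result ? "condition:true" : "condition:false" };
  if (error !== undefined) step.error = error;
  steps.push(step);
  return result;
}
```

With these, the quoted cooldown read would become `numberParam(triggerNode.params, "cooldownSeconds")`. Unlike the original `Number(...) || 0`, this version also maps `Infinity` to the fallback.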


Potential Bugs

  1. Cooldown fires before the run succeeds (automationEngineService.ts:169-172):

    if (!this.cooledDown(a, now)) continue;
    this.lastFired.set(a.id, now);   // ← recorded before the await
    fired++;
    await this.fireAutomation(a, ctx, now);

    The cooldown timestamp is stamped before fireAutomation is awaited. If the run throws (caught inside fireAutomation) or if it writes a failed run-log, the cooldown still advances as if it succeeded. This is a deliberate tradeoff (prevents retry storms), but it means a flapping source that repeatedly throws will eat the full cooldown. Worth a comment clarifying this is intentional.

  2. trigger.nodeDiscovered is never actually emitted (automationEngineSingleton.ts:103-113):

    case 'node:updated': {
      // ...
      await e.onNode('trigger.nodeUpdated', nodeNum, changed, sourceId);

    The comment says "Discovered vs updated detection (isNew) is deferred to a later phase" — both node:updated events are emitted as nodeUpdated. Any automation with a trigger.nodeDiscovered trigger will therefore never fire. This should be documented more prominently in the UI or the trigger type should be greyed out until Phase 1b.

  3. regex condition is untested against ReDoS (conditionEvaluator.ts:41, triggerContext.ts:172):

    case 'regex':
      try { return new RegExp(b).test(a); } catch { return false; }

    And in the trigger pre-filter:

    re = new RegExp(params.regex);

    User-supplied regex patterns are compiled and evaluated on every matching message/event. A pathological pattern (ReDoS) can block the Node.js event loop. Consider using a timeout-based approach or a linear-time regex library for user-supplied patterns, or at minimum add a length cap.

  4. condition.logical recurses with raw sub-nodes (conditionEvaluator.ts:134):

    const subs = Array.isArray(p.conditions) ? (p.conditions as AutomationNode[]) : [];

    The p.conditions array is pulled straight from params (unvalidated Record<string, unknown>). The graph validator (validateAutomationGraph) does not validate nested condition.logical sub-nodes — they're invisible to the DAG. A deeply nested or circular logical structure could cause a stack overflow. A depth limit and type-guard here would make this safer.

  5. action.sendMessage missing channel bound-check (actionExecutor.ts:61):

    const channel = p.channel != null ? Number(p.channel) : triggerChannel;

    No validation that channel is in [0, 7] (Meshtastic channel range). An out-of-range channel would be silently forwarded to the manager. Minor but worth a clamp or validation.
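
Items 3 to 5 above could be hardened roughly as follows. All names and limits here are assumptions for illustration (`safeRegexTest`, `evalLogical`, `resolveChannel`, the 200-character cap, the depth of 8, the `op` param key, and NOT negating only the first sub-condition). Note that a length cap narrows the ReDoS surface but does not remove it.

```typescript
const MAX_PATTERN_LENGTH = 200; // assumed cap
const MAX_LOGICAL_DEPTH = 8; // assumed cap

// Item 3: refuse over-long user patterns; invalid regex = no match, as in the original.
function safeRegexTest(pattern: string, input: string): boolean {
  if (pattern.length > MAX_PATTERN_LENGTH) return false;
  try {
    return new RegExp(pattern).test(input);
  } catch {
    return false;
  }
}

// Item 4: type-guard and depth-limit nested condition.logical sub-nodes.
interface CondNode {
  type: string;
  params?: Record<string, unknown>;
}

function isCondNode(v: unknown): v is CondNode {
  return typeof v === "object" && v !== null && typeof (v as { type?: unknown }).type === "string";
}

function evalLogicalInner(node: CondNode, evalLeaf: (n: CondNode) => boolean, depth: number): boolean {
  if (depth > MAX_LOGICAL_DEPTH) throw new RangeError("condition.logical nested too deeply");
  if (node.type !== "condition.logical") return evalLeaf(node);
  const raw = node.params?.["conditions"];
  const subs: CondNode[] = Array.isArray(raw) ? raw.filter(isCondNode) : [];
  const rawOp = node.params?.["op"];
  const op = typeof rawOp === "string" ? rawOp.toUpperCase() : "AND";
  const run = (n: CondNode): boolean => evalLogicalInner(n, evalLeaf, depth + 1);
  if (op === "OR") return subs.some(run);
  if (op === "NOT") {
    const first = subs.find(() => true);
    return first !== undefined && !run(first);
  }
  return subs.every(run);
}

// Never throws: over-deep (or circular) structures and leaf errors evaluate to false.
function evalLogical(node: CondNode, evalLeaf: (n: CondNode) => boolean): boolean {
  try {
    return evalLogicalInner(node, evalLeaf, 0);
  } catch {
    return false;
  }
}

// Item 5: validate the channel index instead of forwarding it blindly.
function resolveChannel(raw: unknown, triggerChannel: number): number {
  if (raw == null || raw === "") return triggerChannel;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0 || n > 7) {
    throw new RangeError("channel out of range: " + String(raw));
  }
  return n;
}
```

Throwing from `resolveChannel` fits the engine's described behaviour, where action errors are caught and recorded as a failed step rather than aborting the run.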


Security Concerns

  1. POST /api/automations/test accepts arbitrary configs (automationRoutes.ts:147):
    This endpoint parses and validates the config graph but then calls simulateAutomation with liveData: createMeshNodeDataProvider() — meaning it queries the live database for node/telemetry data even during a dry-run. This is appropriate design, but note that a user with automations:write can probe any node's data by crafting a condition.numeric { field: "node.batteryLevel", nodeNum: X } and checking if the condition returns true/false. It's effectively an enumeration side-channel. Acceptable given the write permission gate, but worth being aware of.

  2. No size limit on config or log columns: An automation config that generates a flow.fanout with 50 rules, each with 50 actions, stays within maxActions: 50 but can still produce a very large log JSON blob per run. The automation_runs.log column is TEXT with no length check at the application layer. This isn't a security issue per se but could bloat the DB on prolific automations.

  3. Apprise URL injection via interpolation (actionExecutor.ts:91-92):

    const rawUrls = typeof p.urls === 'string' ? await interpolateAsync(p.urls, ctx) : '';
    const urls = rawUrls.split(/[\n,]/).map((u) => u.trim()).filter((u) => u.length > 0);

    A {{ trigger.text }} in the urls field lets a mesh message inject arbitrary Apprise service URLs. A bad actor who can send messages on a monitored channel could craft a payload that redirects notifications to an attacker-controlled endpoint. Consider stripping {{ }} interpolation from the urls field or limiting it to {{ var.* }} paths only.


Performance

  1. Every trigger event hits all enabled automations (automationEngineService.ts:162):
    The pre-filters (message regex, telemetry type, system event) are cheap, and the automation index is cached in memory, so listEnabledAutomations is not re-queried per message. However, every onMessage call that passes the pre-filter runs fireAutomation, which awaits evaluateGraph, which calls getSubjectNode (a DB query) if any condition references node.*. getSubjectNode's memo already hydrates the subject node lazily, only when a condition actually needs it, but the DB call still runs inline with the event-handling path. This is fine for typical deployments; just note it for MQTT-scale traffic (an MQTT relay with thousands of messages/second).

  2. reloadAutomations() on every CRUD mutation (automationRoutes.ts:220, 260, 271, 282, 296):
    Each create/update/enable/disable/delete calls reloadAutomations(), which re-queries all enabled automations and rebuilds the entire index. For large automation sets (hundreds), this is a full-table scan per mutation. An incremental update (add/remove individual entries) would scale better. Not an issue for the expected Phase 1a scale, but a future consideration.


Test Coverage

Strengths:

  • 7,304 tests passing; dedicated suites for evaluator, conditions, triggers, variables, geofence, simulator, and Apprise.
  • Deterministic system test harness against the live container is an excellent approach.

Gaps:
15. No test for the double-step-push bug (point 2 above) — add a test asserting steps.length === 1 when a condition throws.
16. condition.logical deep nesting — no test for recursion depth; a depth=3+ nested logical with circular reference isn't covered.
17. trigger.nodeDiscovered never fires — no test asserting this edge case; a comment or a skipped test would document the known gap.
18. ReDoS guard — no tests for pathological regex patterns in condition.string or message trigger pre-filters.


Minor / Nits

  1. automationRoutes.ts:86 UNIQUE detection is fragile:

    if (String(error?.message).includes('UNIQUE')) {

    This works for SQLite but PostgreSQL error messages say "duplicate key value violates unique constraint" (no "UNIQUE"). MySQL says "Duplicate entry". If the automation_variables.name unique constraint is ever violated on PostgreSQL or MySQL, a 409 won't be returned — it'll fall through to a 500. Either use a consistent error code check or catch per-driver.

  2. automationEngineSingleton.ts module-level mutable state (let engine, let subscribed, let lastUpgradeNotified): This is a standard singleton pattern for Node.js, but subscribed is a process-global flag that means calling startAutomationEngine() twice in a test will silently not re-subscribe. The idempotency guard is correct but unit tests that call startAutomationEngine multiple times in the same process will share this state.

  3. AutomationsPage.tsx:87 uses confirm() and alert(): Browser-native modals block the main thread and are unavailable in some embedded/WebView contexts. The existing app codebase likely has a toast/modal component — using that would be more consistent.

  4. compile.ts node IDs are positional and not UUIDs (e.g. 't', 'c0', 'r0c0'): These are deterministic and human-readable, which is nice for debugging and the test suite. However, an imported and re-saved automation will re-compile with the same IDs, making it impossible to distinguish nodes in the run log by ID alone if two automations use the same graph shape. Not a bug, but worth noting.


Summary: This is a production-quality Phase 1a implementation. The architecture is sound, the test coverage is thorough, and the design decisions (global automations, source-filter block for scoping, dry-run simulator) are well-reasoned. The most actionable items before a 1.0 stamp would be the double-step bug (#2), the Apprise URL injection via interpolation (#12), the trigger.nodeDiscovered dead trigger (#6), and the UNIQUE error detection across DBs (#19). The ReDoS concern (#7) is the one worth tracking as a security item.

Comment thread src/server/services/automation/graphEvaluator.ts Fixed
Comment thread src/server/services/automation/triggerContext.ts Fixed
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
@github-actions

github-actions Bot commented Jun 23, 2026


Claude encountered an error —— View job


I'll analyze this and get back to you.

… findings (#3653)

CodeQL:
- js/regex-injection: compile user-supplied regex with RE2 (compileUserRegex,
  linear-time, ReDoS-immune) instead of `new RegExp` in the message-trigger
  pre-filter and the condition.string regex op.
- js/remote-property-injection: the graph evaluator keyed conditionResults by
  config-derived node ids on a plain object; use a Map internally and materialise
  the return via Object.fromEntries (no prototype-pollution sink).

Review:
- Graph evaluator pushed two steps when a condition threw — now a single step
  carrying the error.
- action.notify `urls` interpolation restricted to {{ var.* }} (varsOnly) so a
  mesh-controlled {{ trigger.* }} can't inject a notification target.
- Variable-create UNIQUE-violation detection now covers PostgreSQL/MySQL
  ("duplicate") not just SQLite ("UNIQUE").
- trigger.nodeDiscovered description notes new-vs-updated detection is deferred.

Tests: evaluator single-step-on-throw + adversarial-node-id prototype-safety;
notify urls security (trigger.* stripped). Automation suite green (107).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBLhGGNh35oMwTL53va1Y5
@github-actions

github-actions Bot commented Jun 23, 2026


Claude finished @Yeraze's task —— View job


PR Review: Automation Engine — Phase 1a

Tasks:

  • Gather context and understand the PR
  • Review engine services (evaluators, trigger context, action executor)
  • Review routes and API security
  • Review DB schema and migrations
  • Review frontend components
  • Review test coverage
  • Compile final feedback

This is a substantial, well-architected feature. The architecture is sound, the CodeQL concerns from the initial review have been resolved in the code, and the test coverage is thorough. Below is my independent review.


What's Been Addressed Well (compared to initial concerns)

ReDoS (CodeQL alert): triggerContext.ts:174 and conditionEvaluator.ts:43 both now use compileUserRegex(), backed by RE2 (a linear-time regex engine). The safeRegex.ts module clearly documents the trade-off and resolves the CodeQL js/regex-injection alert. ✅

Property injection (CodeQL alert): graphEvaluator.ts:111-195 uses a Map<string, boolean> for conditionResults internally, and serializes to a plain object via Object.fromEntries only on return. The comment explicitly references the __proto__/constructor injection risk. ✅
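
To illustrate why the Map-then-Object.fromEntries pattern is safe, here is a minimal sketch. The function name collectResults and the sample ids are hypothetical and are not taken from graphEvaluator.ts; only the technique (Map internally, Object.fromEntries on return) comes from the review.

```typescript
// Illustrative only: `collectResults` is a hypothetical stand-in, not the PR's evaluator.
function collectResults(entries: Array<[string, boolean]>): Record<string, boolean> {
  // A Map stores arbitrary ids as plain keys and never consults Object.prototype.
  const results = new Map<string, boolean>();
  for (const [nodeId, value] of entries) {
    results.set(nodeId, value);
  }
  // Object.fromEntries defines own data properties, so "__proto__" becomes an
  // ordinary key instead of invoking the prototype setter.
  return Object.fromEntries(results);
}

// An adversarial node id alongside a normal one.
const out = collectResults([['c0', true], ['__proto__', false]]);
```

Here out keeps Object.prototype as its prototype, and "__proto__" is just another own property on the returned object.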

Double-step push bug: The previous review flagged a double push on condition error. The code at graphEvaluator.ts:155-165 shows this has been fixed: a single steps.push handles both the success and error cases, using spread syntax (...(error ? { error } : {})). ✅
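
The single-push pattern described above can be sketched roughly as follows. The Step shape and the evaluateStep helper are assumptions made for illustration; the real step type in graphEvaluator.ts may carry different fields.

```typescript
// Hypothetical shapes for illustration; the PR's real step type may differ.
interface Step {
  nodeId: string;
  result: boolean;
  error?: string;
}

function evaluateStep(nodeId: string, evaluate: () => boolean, steps: Step[]): boolean {
  let result = false;
  let error: string | undefined;
  try {
    result = evaluate();
  } catch (e) {
    // A throwing condition is treated as false and its message is recorded.
    error = e instanceof Error ? e.message : String(e);
  }
  // Exactly one push per node, whether or not evaluation threw.
  steps.push({ nodeId, result, ...(error ? { error } : {}) });
  return result;
}
```

A regression test for the double-push bug would then only need to assert that one call appends exactly one step.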

Apprise URL injection: actionExecutor.ts:92 uses interpolateAsync(p.urls, ctx, { varsOnly: true }) which, per engineContext.ts:94, blocks non-var.* paths. trigger.* cannot be injected into Apprise URLs. ✅
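
As a rough sketch of what a varsOnly restriction does, the following uses a hypothetical synchronous interpolate function and context shape. It is not the PR's interpolateAsync; in particular, replacing a blocked path with an empty string is an assumption.

```typescript
// Hypothetical, synchronous stand-in for the PR's interpolateAsync. Replacing a
// blocked path with an empty string is an assumption, not confirmed behaviour.
interface InterpolationContext {
  vars: Map<string, string>;    // backs {{ var.* }}
  trigger: Map<string, string>; // backs {{ trigger.* }} (mesh-controlled)
}

function interpolate(
  template: string,
  ctx: InterpolationContext,
  opts: { varsOnly?: boolean } = {},
): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match: string, path: string) => {
    const dot = path.indexOf('.');
    const scope = dot === -1 ? path : path.slice(0, dot);
    const key = dot === -1 ? '' : path.slice(dot + 1);
    if (scope === 'var') return ctx.vars.get(key) ?? '';
    if (opts.varsOnly) return ''; // every non-var scope is blocked
    if (scope === 'trigger') return ctx.trigger.get(key) ?? '';
    return ''; // unknown scope
  });
}

const ctx: InterpolationContext = {
  vars: new Map([['webhook', 'json://example.invalid/hook']]),
  trigger: new Map([['text', 'mailto://attacker.invalid']]),
};
const urlsField = '{{ var.webhook }},{{ trigger.text }}';
const safe = interpolate(urlsField, ctx, { varsOnly: true });
const unsafe = interpolate(urlsField, ctx);
```

With varsOnly set, the trigger.text placeholder collapses to an empty entry, which a split-and-filter step like the one quoted in the earlier review would then drop.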

UNIQUE error detection across DBs: automationRoutes.ts:88-89 lowercases the error message and checks for both 'unique' and 'duplicate', covering SQLite, PostgreSQL, and MySQL. ✅
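
The message-based check described above might look like this. The helper name isUniqueViolation is hypothetical; the matched fragments are the ones quoted in this thread.

```typescript
// Hypothetical helper; the matched fragments are the ones quoted in this review.
function isUniqueViolation(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error ?? '');
  const lower = message.toLowerCase();
  // SQLite messages contain "UNIQUE"; PostgreSQL and MySQL messages contain "duplicate".
  return lower.includes('unique') || lower.includes('duplicate');
}
```

Matching on message text stays somewhat fragile. Checking driver error codes where they are exposed (for example SQLSTATE 23505 on PostgreSQL or errno 1062 on MySQL) would likely be more robust, at the cost of per-driver branches.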


Remaining Issues

1. trigger.nodeDiscovered is dead code (automationEngineSingleton.ts:103-108)

case 'node:updated': {
  await e.onNode('trigger.nodeUpdated', nodeNum, changed, sourceId);
  // trigger.nodeDiscovered is never emitted
}

All node:updated events are dispatched as trigger.nodeUpdated, so any automation with trigger.nodeDiscovered will silently never fire. The UI's catalog (catalog.ts) exposes this trigger type to users with no indication it's inactive. This should either:

  • Be clearly documented/greyed out in the UI
  • Have a skipped/todo test documenting the gap

2. any casts for trigger params (automationEngineService.ts:110, 233, 249)

const cooldownSeconds = Number((triggerNode.params as any)?.cooldownSeconds ?? 0) || 0;
const want = (a.triggerNode.params as any)?.telemetryType;

params is typed as Record<string, unknown>, so the (params as any) cast is unnecessary: params?.cooldownSeconds (or params?.['cooldownSeconds']) reads the value directly, with the same runtime behavior, without suppressing type safety. Minor, but it appears three times.
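
One way to drop the casts is a pair of small typed readers. The helper names readNumber and readString are hypothetical; only the Record<string, unknown> typing comes from the review.

```typescript
// Hypothetical helper names; only `params: Record<string, unknown>` comes from the review.
type Params = Record<string, unknown>;

function readNumber(params: Params | undefined, key: string, fallback = 0): number {
  const n = Number(params?.[key] ?? fallback);
  // Unlike `Number(x) || 0`, this also maps Infinity to the fallback.
  return Number.isFinite(n) ? n : fallback;
}

function readString(params: Params | undefined, key: string): string | undefined {
  const v = params?.[key];
  return typeof v === 'string' ? v : undefined;
}
```

Call sites would then read, for example, readNumber(triggerNode.params, 'cooldownSeconds') and readString(a.triggerNode.params, 'telemetryType') without any cast.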

3. Unbounded in-memory geofence state (automationEngineService.ts:79)

private geofenceState = new Map<string, boolean>();

Keys are ${automationId}:${nodeNum}. In high-traffic MQTT deployments, with thousands of nodes and multiple geofence automations, this map grows without bound. Stale entries for deleted nodes/automations are never evicted. Acceptable for Phase 1a, but worth a // TODO: evict comment at minimum.
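
A possible eviction pass, run for instance after the index is reloaded, is sketched below. The helper name and the assumption that automation ids are strings are both hypothetical; only the key format comes from the review. It handles deleted automations only, not stale nodes.

```typescript
// Hypothetical helper; only the key format `${automationId}:${nodeNum}` comes from the review.
function evictStaleGeofenceState(
  state: Map<string, boolean>,
  liveAutomationIds: Set<string>,
): number {
  let removed = 0;
  for (const key of state.keys()) {
    // nodeNum is numeric, so the last ':' separates it from the automation id.
    const automationId = key.slice(0, key.lastIndexOf(':'));
    if (!liveAutomationIds.has(automationId)) {
      state.delete(key); // deleting during Map iteration is safe in JavaScript
      removed++;
    }
  }
  return removed;
}
```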

4. condition.logical depth not capped (conditionEvaluator.ts:134-143)

case 'condition.logical': {
  const subs = Array.isArray(p.conditions) ? (p.conditions as AutomationNode[]) : [];
  // ... recursive evaluateCondition(subs[0], ctx)

Nested condition.logical sub-nodes are not DAG nodes — they're raw params.conditions arrays outside the validator's view. A pathologically deep or circular structure causes unbounded recursion. The graph validator (validateAutomationGraph) doesn't inspect these. A depth cap of ~5 levels would prevent stack overflow from adversarial/malformed configs.
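
A depth guard plus type guard could be sketched like this. The CondNode shape, the op parameter, and the limit of 5 are assumptions for illustration; the PR's AutomationNode and evaluateCondition differ.

```typescript
// Hypothetical node shape, `op` field, and limit; not the PR's actual types.
interface CondNode {
  type: string;
  params: Record<string, unknown>;
}

const MAX_LOGICAL_DEPTH = 5; // assumption, matching the ~5 levels suggested above

function isCondNode(v: unknown): v is CondNode {
  if (typeof v !== 'object' || v === null) return false;
  const n = v as { type?: unknown; params?: unknown };
  return typeof n.type === 'string' && typeof n.params === 'object' && n.params !== null;
}

function evaluate(node: CondNode, leaf: (n: CondNode) => boolean, depth = 0): boolean {
  if (node.type !== 'condition.logical') return leaf(node);
  if (depth >= MAX_LOGICAL_DEPTH) {
    // Fails fast on deep or circular structures instead of overflowing the stack.
    throw new Error(`condition.logical nested deeper than ${MAX_LOGICAL_DEPTH} levels`);
  }
  const raw = node.params['conditions'];
  const subs: CondNode[] = Array.isArray(raw) ? raw.filter(isCondNode) : [];
  const op = node.params['op'] === 'or' ? 'or' : 'and';
  return op === 'or'
    ? subs.some((s) => evaluate(s, leaf, depth + 1))
    : subs.every((s) => evaluate(s, leaf, depth + 1));
}
```

Because the guard counts depth rather than tracking visited nodes, a self-referencing conditions array is rejected after the same number of levels as any other over-deep structure.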

5. action.sendMessage channel not clamped (actionExecutor.ts:61)

const channel = p.channel != null ? Number(p.channel) : triggerChannel;

No validation that channel ∈ [0, 7]. An out-of-range value is silently forwarded to the manager. A Math.max(0, Math.min(7, channel)) clamp would be defensive.
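
A validating variant is sketched below, using a hypothetical resolveChannel helper. It rejects rather than clamps, because Math.max(0, Math.min(7, NaN)) is still NaN, so a plain clamp would let a non-numeric channel through.

```typescript
// Hypothetical helper; the 0-7 range is the Meshtastic channel range cited in the review.
const MIN_CHANNEL = 0;
const MAX_CHANNEL = 7;

function resolveChannel(raw: unknown, triggerChannel: number): number {
  // Same fallback rule as the quoted line: use the trigger's channel when unset.
  const candidate = raw != null ? Number(raw) : triggerChannel;
  if (!Number.isInteger(candidate) || candidate < MIN_CHANNEL || candidate > MAX_CHANNEL) {
    throw new RangeError(`channel must be an integer in [${MIN_CHANNEL}, ${MAX_CHANNEL}]`);
  }
  return candidate;
}
```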

6. No foreign key between automation_runs.automationId and automations.id (migrations 098)

None of the SQLite, PostgreSQL, or MySQL migrations creates an FK constraint between automation_runs.automationId and automations.id. Orphaned run rows will accumulate when automations are deleted (the repository's deleteAutomation doesn't cascade-delete runs). This is probably deliberate for performance/flexibility, but it means listRuns can return rows for deleted automations with no corresponding parent.

7. reloadAutomations() on every mutation is a full rebuild (automationRoutes.ts:223, 263, 274, 285, 296)

Every create/update/enable/disable/delete rebuilds the entire automation index from a full table scan. For Phase 1a scale this is fine. Worth noting for future work with larger automation sets.
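
If that future work is picked up, an incremental index could look roughly like this. The TriggerIndex class and AutomationSummary shape are hypothetical and are not the PR's actual index structure.

```typescript
// Hypothetical index shape (trigger type -> automation ids); not the PR's structure.
interface AutomationSummary {
  id: string;
  triggerType: string;
  enabled: boolean;
}

class TriggerIndex {
  private byTrigger = new Map<string, Set<string>>();

  // Called on create/update/enable/disable instead of a full rebuild.
  upsert(a: AutomationSummary): void {
    this.remove(a.id); // drop any stale entry first; the trigger type may have changed
    if (!a.enabled) return;
    let ids = this.byTrigger.get(a.triggerType);
    if (!ids) {
      ids = new Set<string>();
      this.byTrigger.set(a.triggerType, ids);
    }
    ids.add(a.id);
  }

  // Called on delete.
  remove(id: string): void {
    for (const [type, ids] of this.byTrigger) {
      if (ids.delete(id) && ids.size === 0) this.byTrigger.delete(type);
    }
  }

  idsFor(triggerType: string): string[] {
    return [...(this.byTrigger.get(triggerType) ?? [])];
  }
}
```

Each mutation then touches only the affected automation's entry, so the cost no longer scales with a full table scan.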

8. confirm() / alert() in UI (AutomationsPage.tsx:87, 91)

const remove = async (a: Automation) => { if (!confirm(...)) return; ...
alert('Exported JSON copied to clipboard.');

Browser-native confirm()/alert() block the main thread and are unavailable in some embedded/WebView contexts. The existing app appears to have toast/modal patterns — using those would be more consistent. Low priority nit.

9. trigger.schedule is declared but never wired (automationEngineSingleton.ts, automationEngineService.ts)

trigger.schedule appears in TRIGGER_TYPES and the catalog but there's no cron/interval emitter anywhere in the singleton. Like trigger.nodeDiscovered, a user creating a schedule automation will have it silently never fire. Same recommendation: grey it out in the UI or add a visible "coming soon" label.


Test Coverage Assessment

Strong areas:

  • graphEvaluator.test.ts — 281 lines, covers fanout/collapse/action capping/error handling
  • conditionEvaluator.test.ts — covers numeric, string, distance, time range, variable conditions
  • automationSimulator.test.ts — end-to-end dry-run coverage
  • automationEngineService.test.ts — covers geofence, system prefilter, cooldown
  • variableResolver.test.ts + variableCodec.test.ts — thorough codec/resolver coverage

Gaps:

  • No test asserting trigger.nodeDiscovered never fires (documents the known limitation)
  • No test for condition.logical with depth > 2 (recursion safety)
  • No test for trigger.schedule being a no-op
  • graphEvaluator.test.ts could add a case asserting steps.length === 1 when a condition throws (verify no double-push regression)

Summary

The implementation is production-quality for Phase 1a. The critical security concerns from the initial review (ReDoS, property injection, Apprise URL injection) have all been properly addressed. The most actionable remaining items before a 1.0 stamp:

  1. trigger.nodeDiscovered + trigger.schedule dead triggers: UI should indicate these are inactive (#6, #9)
  2. condition.logical depth limit: add a recursion guard (#4)
  3. Run log size: automation_runs.log has no application-level size cap; prolific automations can generate large blobs
  4. any casts: minor cleanup (#2)
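
For the run log size item, an application-level cap could be sketched as follows. The helper name, the truncated marker, and the 64 KiB budget are all assumptions; the budget is measured in characters rather than bytes.

```typescript
// Hypothetical helper; the 64 KiB budget and the `truncated` marker are assumptions.
const MAX_LOG_CHARS = 64 * 1024;

function serializeRunLog(steps: unknown[], maxChars: number = MAX_LOG_CHARS): string {
  const full = JSON.stringify(steps);
  if (full.length <= maxChars) return full;
  // Keep the longest prefix of steps that still fits together with a marker
  // recording how many steps were dropped. Quadratic, which is fine for a sketch.
  const kept: unknown[] = [];
  for (const step of steps) {
    const dropped = steps.length - kept.length - 1;
    const candidate = JSON.stringify([...kept, step, { truncated: dropped }]);
    if (candidate.length > maxChars) break;
    kept.push(step);
  }
  return JSON.stringify([...kept, { truncated: steps.length - kept.length }]);
}
```

The output stays valid JSON either way, so existing run-log readers would only need to tolerate the trailing marker object.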

@Yeraze Yeraze merged commit 789d269 into main Jun 23, 2026
20 checks passed