Implement rotating JSONL recorder for persistent logging#10428
Conversation
There was a problem hiding this comment.
Pull request overview
Implements an always-on, process-global persistent recorder in the MCP server that captures Meshtastic pubsub events into rotating JSONL streams under mcp-server/.mtlog/, and exposes read/query tools (logs/telemetry/packets/events windows, export, pause/resume) plus optional build-time macro injection for leak-hunting workflows.
Changes:
- Add recorder write-side (rotating JSONL writer, log/telemetry/packet parsing, pubsub subscriptions) and read-side query helpers (
log_query.py) with new MCP tools inserver.py. - Add
build_flagssupport forflash.build()/flash.flash()propagated viaPLATFORMIO_BUILD_FLAGSandpio.run(extra_env=...). - Add unit tests for recorder/query behavior and build-flag propagation, plus a Datadog forwarder + dashboard JSON and updated agent commands.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| mcp-server/tests/unit/test_recorder.py | New unit tests covering recorder parsing, heap synthesis, rotation, and query APIs. |
| mcp-server/tests/unit/test_build_flags.py | New unit tests for build_flags → PLATFORMIO_BUILD_FLAGS translation and propagation to pio.run. |
| mcp-server/src/meshtastic_mcp/server.py | Starts recorder on import and adds MCP tools for querying/exporting recorder streams; adds build_flags args to build/flash tools. |
| mcp-server/src/meshtastic_mcp/serial_session.py | Publishes serial monitor lines to pubsub so the recorder can capture text-mode output. |
| mcp-server/src/meshtastic_mcp/recorder/rotating.py | New size-capped rotating JSONL writer with gzip archives + pruning. |
| mcp-server/src/meshtastic_mcp/recorder/recorder.py | New process-global Recorder singleton that subscribes to pubsub topics and writes logs/telemetry/packets/events streams. |
| mcp-server/src/meshtastic_mcp/recorder/parsers.py | New best-effort parsing for firmware log line prefixes and telemetry variants. |
| mcp-server/src/meshtastic_mcp/recorder/init.py | New package entrypoint exporting Recorder / get_recorder. |
| mcp-server/src/meshtastic_mcp/pio.py | Adds extra_env support to subprocess execution for build-time env injection. |
| mcp-server/src/meshtastic_mcp/log_query.py | New streaming queries over recorder JSONL streams (windows, telemetry downsampling/slope, export). |
| mcp-server/src/meshtastic_mcp/flash.py | Adds build_flags support to build/flash and threads flags through to pio.run(extra_env=...). |
| mcp-server/scripts/mtlog_to_datadog.py | New forwarder shipping .mtlog logs/telemetry to Datadog with cursor-based resume. |
| mcp-server/scripts/datadog-dashboard.json | New sample Datadog dashboard definition for recorder-derived metrics/logs. |
| mcp-server/.gitignore | Ignores .mtlog/ recorder output and cursor file. |
| .claude/commands/repro.md | Updates repro workflow to export per-attempt recorder slices. |
| .claude/commands/leakhunt.md | New leakhunt workflow documentation leveraging recorder tools and heap slope. |
| .claude/commands/diagnose.md | Updates diagnose workflow to include recorder-based error + heap trend checks. |
|
@copilot apply changes based on the comments in this thread |
Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/2541773c-869a-463f-9fae-8505272c06ff Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
Applied the remaining review-thread fixes in 13d0a16: archive pruning now follows filename chronology, |
The previous "Fixes" commit added `_files_snapshot()` which acquires
`self._lock` so handlers don't race with `stop()` clearing `_files`.
But two callers were already holding `self._lock` when they invoked
methods that go through the snapshot:
- `start()` writes the `recorder_start` event from inside its `with
self._lock:` block. `_write_event` -> `_files_snapshot` re-acquires
the same non-reentrant `threading.Lock`, freezing process startup.
- `force_rotate_all()` calls `self.status()` (which also acquires
`self._lock`) while still holding the lock from rotating each file.
Both fixes release the lock before the call. The recorder_start marker
still lands in events.jsonl because the started/started_at flags are
already set when we write it.
Verified end-to-end against the standalone /tmp/verify_pr_fixes.py
harness — all 9 PR review-comment fixes pass, including pause/resume
event ordering and concurrent start/stop without KeyError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@copilot apply changes based on the comments in this thread |
Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
Applied the latest review-thread fixes in 7714338: recorder startup is now best-effort in |
…10428) * Implement rotating JSONL recorder for persistent logging * Fixes * Update documentation and clean up imports in command files * Address remaining recorder review feedback Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/2541773c-869a-463f-9fae-8505272c06ff Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * recorder: fix lock re-entry deadlock on start() and force_rotate_all() The previous "Fixes" commit added `_files_snapshot()` which acquires `self._lock` so handlers don't race with `stop()` clearing `_files`. But two callers were already holding `self._lock` when they invoked methods that go through the snapshot: - `start()` writes the `recorder_start` event from inside its `with self._lock:` block. `_write_event` -> `_files_snapshot` re-acquires the same non-reentrant `threading.Lock`, freezing process startup. - `force_rotate_all()` calls `self.status()` (which also acquires `self._lock`) while still holding the lock from rotating each file. Both fixes release the lock before the call. The recorder_start marker still lands in events.jsonl because the started/started_at flags are already set when we write it. Verified end-to-end against the standalone /tmp/verify_pr_fixes.py harness — all 9 PR review-comment fixes pass, including pause/resume event ordering and concurrent start/stop without KeyError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix markdown linting issues in leakhunt.md and repro.md * Handle recorder startup and query review fixes Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * Tighten recorder follow-up tests Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * Stabilize recorder startup tests Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * Remove brittle recorder startup test Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * Polish recorder follow-up errors Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * Refine recorder startup and regex errors Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * Clean up recorder follow-up nits Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * Trunk --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…10428) * Implement rotating JSONL recorder for persistent logging * Fixes * Update documentation and clean up imports in command files * Address remaining recorder review feedback Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/2541773c-869a-463f-9fae-8505272c06ff Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * recorder: fix lock re-entry deadlock on start() and force_rotate_all() The previous "Fixes" commit added `_files_snapshot()` which acquires `self._lock` so handlers don't race with `stop()` clearing `_files`. But two callers were already holding `self._lock` when they invoked methods that go through the snapshot: - `start()` writes the `recorder_start` event from inside its `with self._lock:` block. `_write_event` -> `_files_snapshot` re-acquires the same non-reentrant `threading.Lock`, freezing process startup. - `force_rotate_all()` calls `self.status()` (which also acquires `self._lock`) while still holding the lock from rotating each file. Both fixes release the lock before the call. The recorder_start marker still lands in events.jsonl because the started/started_at flags are already set when we write it. Verified end-to-end against the standalone /tmp/verify_pr_fixes.py harness — all 9 PR review-comment fixes pass, including pause/resume event ordering and concurrent start/stop without KeyError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix markdown linting issues in leakhunt.md and repro.md * Handle recorder startup and query review fixes Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * Tighten recorder follow-up tests Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * Stabilize recorder startup tests Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * Remove brittle recorder startup test Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * Polish recorder follow-up errors Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * Refine recorder startup and regex errors Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * Clean up recorder follow-up nits Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com> * Trunk --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No description provided.