Implement rotating JSONL recorder for persistent logging #10428

Merged
thebentern merged 16 commits into master from log-shipping on May 10, 2026

Conversation

@thebentern

Contributor

No description provided.

thebentern requested a review from Copilot on May 9, 2026 at 00:04
thebentern added the enhancement (New feature or request) label on May 9, 2026
github-actions bot added the needs-review (Needs human review) label on May 9, 2026

Copilot AI left a comment

Contributor

Pull request overview

Implements an always-on, process-global persistent recorder in the MCP server that captures Meshtastic pubsub events into rotating JSONL streams under mcp-server/.mtlog/, and exposes read/query tools (logs/telemetry/packets/events windows, export, pause/resume) plus optional build-time macro injection for leak-hunting workflows.
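To make the "rotating JSONL streams" idea concrete, here is a minimal sketch of a size-capped writer that gzips the live file once it would exceed a byte limit. The class name `RotatingJsonlWriter`, the `max_bytes` parameter, and the archive naming scheme are assumptions for illustration; the PR's actual `rotating.py` may be organized differently, and archive pruning is left out here.

```python
import gzip
import json
import os
import shutil
import time


class RotatingJsonlWriter:
    """Hypothetical size-capped JSONL writer (illustrative, not the PR's class)."""

    def __init__(self, path, max_bytes=1_000_000):
        self.path = path
        self.max_bytes = max_bytes
        self._seq = 0  # keeps archive names unique and ordered within one second

    def write(self, record):
        # One compact JSON object per line is what makes the file JSONL.
        line = json.dumps(record, separators=(",", ":")) + "\n"
        size = os.path.getsize(self.path) if os.path.exists(self.path) else 0
        # Rotate before appending if this line would push the live file over the cap.
        if size and size + len(line.encode("utf-8")) > self.max_bytes:
            self.rotate()
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(line)

    def rotate(self):
        """Gzip the live file into a timestamped archive and start a fresh one."""
        if not os.path.exists(self.path) or os.path.getsize(self.path) == 0:
            return None
        self._seq += 1
        stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
        archive = f"{self.path}.{stamp}-{self._seq:06d}.gz"
        with open(self.path, "rb") as src, gzip.open(archive, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(self.path)
        return archive
```

A caller would point one writer at each stream, for example a `logs.jsonl` file under `mcp-server/.mtlog/`, and call `write()` from the pubsub handler. A real multi-threaded recorder would also need a lock around `write()` and `rotate()`.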

Changes:

  • Add recorder write-side (rotating JSONL writer, log/telemetry/packet parsing, pubsub subscriptions) and read-side query helpers (log_query.py) with new MCP tools in server.py.
  • Add build_flags support for flash.build() / flash.flash() propagated via PLATFORMIO_BUILD_FLAGS and pio.run(extra_env=...).
  • Add unit tests for recorder/query behavior and build-flag propagation, plus a Datadog forwarder + dashboard JSON and updated agent commands.
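The build-flag propagation described in the second bullet can be sketched as follows. This assumes `build_flags` is a mapping of macro names to values; the helper name `build_flags_to_env` and the macro names in the usage note are hypothetical, and the PR's real argument shape may differ. `PLATFORMIO_BUILD_FLAGS` is the environment variable named in the overview.

```python
import os


def build_flags_to_env(build_flags, base_env=None):
    """Return a copy of the environment with -D macros appended to
    PLATFORMIO_BUILD_FLAGS. Hypothetical helper; values containing spaces
    or quotes are not handled in this sketch."""
    env = dict(os.environ if base_env is None else base_env)
    if not build_flags:
        return env
    parts = []
    for name, value in build_flags.items():
        if value is None or value is True:
            parts.append(f"-D{name}")          # bare define
        else:
            parts.append(f"-D{name}={value}")  # define with a value
    existing = env.get("PLATFORMIO_BUILD_FLAGS", "")
    # Append rather than overwrite so flags already set by the caller survive.
    env["PLATFORMIO_BUILD_FLAGS"] = " ".join(filter(None, [existing, *parts]))
    return env
```

For example, `build_flags_to_env({"HEAP_TRACE": 1, "VERBOSE_LOG": None}, base_env={})` yields an environment whose `PLATFORMIO_BUILD_FLAGS` is `-DHEAP_TRACE=1 -DVERBOSE_LOG`. The resulting dict is the kind of value that could be passed as `extra_env` to a subprocess wrapper.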

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| mcp-server/tests/unit/test_recorder.py | New unit tests covering recorder parsing, heap synthesis, rotation, and query APIs. |
| mcp-server/tests/unit/test_build_flags.py | New unit tests for build_flags to PLATFORMIO_BUILD_FLAGS translation and propagation to pio.run. |
| mcp-server/src/meshtastic_mcp/server.py | Starts recorder on import and adds MCP tools for querying/exporting recorder streams; adds build_flags args to build/flash tools. |
| mcp-server/src/meshtastic_mcp/serial_session.py | Publishes serial monitor lines to pubsub so the recorder can capture text-mode output. |
| mcp-server/src/meshtastic_mcp/recorder/rotating.py | New size-capped rotating JSONL writer with gzip archives + pruning. |
| mcp-server/src/meshtastic_mcp/recorder/recorder.py | New process-global Recorder singleton that subscribes to pubsub topics and writes logs/telemetry/packets/events streams. |
| mcp-server/src/meshtastic_mcp/recorder/parsers.py | New best-effort parsing for firmware log line prefixes and telemetry variants. |
| mcp-server/src/meshtastic_mcp/recorder/__init__.py | New package entrypoint exporting Recorder / get_recorder. |
| mcp-server/src/meshtastic_mcp/pio.py | Adds extra_env support to subprocess execution for build-time env injection. |
| mcp-server/src/meshtastic_mcp/log_query.py | New streaming queries over recorder JSONL streams (windows, telemetry downsampling/slope, export). |
| mcp-server/src/meshtastic_mcp/flash.py | Adds build_flags support to build/flash and threads flags through to pio.run(extra_env=...). |
| mcp-server/scripts/mtlog_to_datadog.py | New forwarder shipping .mtlog logs/telemetry to Datadog with cursor-based resume. |
| mcp-server/scripts/datadog-dashboard.json | New sample Datadog dashboard definition for recorder-derived metrics/logs. |
| mcp-server/.gitignore | Ignores .mtlog/ recorder output and cursor file. |
| .claude/commands/repro.md | Updates repro workflow to export per-attempt recorder slices. |
| .claude/commands/leakhunt.md | New leakhunt workflow documentation leveraging recorder tools and heap slope. |
| .claude/commands/diagnose.md | Updates diagnose workflow to include recorder-based error + heap trend checks. |
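
The "heap slope" used by the leak-hunting workflow can be illustrated with an ordinary least-squares fit over telemetry samples. This is a sketch under the assumption that each sample is a `(timestamp_seconds, free_heap_bytes)` pair; the function name `heap_slope_bytes_per_sec` is hypothetical, and the PR's `log_query.py` may compute its slope differently.

```python
def heap_slope_bytes_per_sec(samples):
    """Least-squares slope of free heap over time, in bytes per second.

    `samples` is an iterable of (timestamp_seconds, free_heap_bytes) pairs.
    Returns None when there are fewer than two samples or no time spread.
    """
    pts = [(float(t), float(v)) for t, v in samples]
    n = len(pts)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in pts)
    if var_t == 0:
        return None  # all samples share one timestamp; slope is undefined
    cov = sum((t - mean_t) * (v - mean_v) for t, v in pts)
    return cov / var_t
```

A persistently negative slope over a long window suggests free heap is shrinking, which is the signal a leak hunt looks for; a flat or noisy slope near zero suggests steady state.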

Comment thread mcp-server/src/meshtastic_mcp/recorder/recorder.py Outdated
Comment thread mcp-server/src/meshtastic_mcp/recorder/recorder.py Outdated
Comment thread mcp-server/src/meshtastic_mcp/recorder/rotating.py Outdated
Comment thread mcp-server/src/meshtastic_mcp/log_query.py Outdated
Comment thread mcp-server/src/meshtastic_mcp/log_query.py Outdated
Comment thread mcp-server/src/meshtastic_mcp/log_query.py
Comment thread mcp-server/src/meshtastic_mcp/log_query.py
Comment thread mcp-server/src/meshtastic_mcp/recorder/parsers.py Outdated
Comment thread mcp-server/src/meshtastic_mcp/recorder/__init__.py Outdated

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.

Comment thread mcp-server/src/meshtastic_mcp/recorder/rotating.py Outdated
Comment thread mcp-server/src/meshtastic_mcp/log_query.py Outdated
Comment thread mcp-server/src/meshtastic_mcp/log_query.py Outdated
Comment thread mcp-server/src/meshtastic_mcp/recorder/parsers.py Outdated
Comment thread mcp-server/scripts/mtlog_to_datadog.py
Comment thread mcp-server/scripts/mtlog_to_datadog.py Outdated
@thebentern

Contributor Author

@copilot apply changes based on the comments in this thread

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/2541773c-869a-463f-9fae-8505272c06ff

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

Copilot AI commented May 9, 2026

Contributor

@copilot apply changes based on the comments in this thread

Applied the remaining review-thread fixes in 13d0a16: archive pruning now follows filename chronology, _iter_jsonl() short-circuits per file, the UTC/uptime/docs are consistent, the Datadog forwarder docs now match its live-file behavior, and I added focused regression coverage plus a small recorder lock fix needed to get the unit suite passing.
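
Pruning by filename chronology, as opposed to file modification time, could look roughly like this. The archive naming pattern (`<stream>.<YYYYMMDDTHHMMSS>.jsonl.gz`) and the function name `archives_to_prune` are assumptions for illustration only; the recorder's real naming scheme is not shown in this thread.

```python
import re

# Hypothetical archive name layout: "<stream>.<UTC stamp>.jsonl.gz"
_STAMP = re.compile(r"\.(\d{8}T\d{6})\.jsonl\.gz$")


def archives_to_prune(names, keep):
    """Return the oldest archive names beyond `keep`, ordered by the
    timestamp embedded in the filename rather than by mtime."""
    stamped = []
    for name in names:
        match = _STAMP.search(name)
        if match:  # ignore files that are not rotated archives
            stamped.append((match.group(1), name))
    stamped.sort()  # fixed-width stamps sort chronologically as strings
    excess = max(len(stamped) - keep, 0)
    return [name for _, name in stamped[:excess]]
```

Ordering by the embedded stamp keeps pruning stable even if archives are copied, restored from backup, or touched by another tool, any of which can change mtime without changing the age of the data.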

thebentern and others added 2 commits May 8, 2026 20:13
recorder: fix lock re-entry deadlock on start() and force_rotate_all()

The previous "Fixes" commit added `_files_snapshot()` which acquires
`self._lock` so handlers don't race with `stop()` clearing `_files`.
But two callers were already holding `self._lock` when they invoked
methods that go through the snapshot:

  - `start()` writes the `recorder_start` event from inside its `with
    self._lock:` block. `_write_event` -> `_files_snapshot` re-acquires
    the same non-reentrant `threading.Lock`, freezing process startup.

  - `force_rotate_all()` calls `self.status()` (which also acquires
    `self._lock`) while still holding the lock from rotating each file.

Both fixes release the lock before the call. The recorder_start marker
still lands in events.jsonl because the started/started_at flags are
already set when we write it.

Verified end-to-end against the standalone /tmp/verify_pr_fixes.py
harness — all 9 PR review-comment fixes pass, including pause/resume
event ordering and concurrent start/stop without KeyError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
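
The deadlock described in this commit message follows from `threading.Lock` being non-reentrant: a thread that already holds it blocks forever if it tries to acquire it again. The sketch below reproduces the shape of the fix with a hypothetical `MiniRecorder` stand-in (not the PR's `Recorder` class): state is updated under the lock, and the event is written only after the lock is released.

```python
import threading


def reacquire_would_block(lock):
    """True if the thread holding `lock` cannot take it a second time."""
    with lock:
        got = lock.acquire(blocking=False)  # non-blocking probe, never hangs
        if got:
            lock.release()
        return not got


class MiniRecorder:
    """Hypothetical stand-in; only the locking shape mirrors the commit."""

    def __init__(self):
        self._lock = threading.Lock()  # non-reentrant, as in the commit message
        self._files = {}
        self._started = False
        self.events = []

    def _files_snapshot(self):
        with self._lock:  # would deadlock if the caller already held the lock
            return dict(self._files)

    def _write_event(self, name):
        if "events" in self._files_snapshot():
            self.events.append(name)

    def start(self):
        with self._lock:
            if self._started:
                return
            self._files = {"events": "events.jsonl"}
            self._started = True
        # Lock released: the started flag is already set, so the marker
        # still gets written, but _files_snapshot() can take the lock safely.
        self._write_event("recorder_start")
```

Here `reacquire_would_block(threading.Lock())` is `True` while `reacquire_would_block(threading.RLock())` is `False`. Switching to an `RLock` is the other common remedy; the commit instead keeps the plain lock and narrows the critical section.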

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Comment thread mcp-server/src/meshtastic_mcp/server.py Outdated
Comment thread mcp-server/src/meshtastic_mcp/log_query.py Outdated
Comment thread mcp-server/src/meshtastic_mcp/recorder/recorder.py Outdated
Comment thread mcp-server/src/meshtastic_mcp/recorder/recorder.py Outdated
@thebentern

Contributor Author

@copilot apply changes based on the comments in this thread

Copilot AI and others added 6 commits May 9, 2026 18:38
Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

Copilot AI commented May 9, 2026

Contributor

@copilot apply changes based on the comments in this thread

Applied the latest review-thread fixes in 7714338: recorder startup is now best-effort in server.py, invalid grep regexes return a clear ValueError, the stale RLock comment was corrected, and pubsub subscription failures are now logged instead of being silently swallowed. Validated with .venv/bin/python -m pytest tests/unit/test_recorder.py -q (48 passed).
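
Turning an invalid grep pattern into a clear `ValueError` could be done along these lines. The helper names `compile_grep` and `grep_lines` are hypothetical and the error wording is illustrative; only `re.compile` and `re.error` are standard-library facts.

```python
import re


def compile_grep(pattern):
    """Compile a user-supplied grep pattern, or return None when no filter
    was requested. Hypothetical helper, not the PR's actual function."""
    if pattern is None:
        return None
    try:
        return re.compile(pattern)
    except re.error as exc:
        # Surface a plain ValueError so the tool caller sees which pattern
        # was rejected and why, instead of a bare regex-engine error.
        raise ValueError(f"invalid grep regex {pattern!r}: {exc}") from exc


def grep_lines(lines, pattern=None):
    """Return the lines matching `pattern`, or all lines if it is None."""
    rx = compile_grep(pattern)
    return [line for line in lines if rx is None or rx.search(line)]
```

Compiling once up front, before any file is opened, means a bad pattern fails fast rather than partway through a streaming query.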

thebentern marked this pull request as ready for review on May 9, 2026 at 23:58
thebentern merged commit f6a954b into master on May 10, 2026
79 of 80 checks passed
thebentern deleted the log-shipping branch on May 10, 2026 at 14:53
balya pushed a commit to balya/meshtastic-firmware that referenced this pull request May 14, 2026
…10428)

* Implement rotating JSONL recorder for persistent logging

* Fixes

* Update documentation and clean up imports in command files

* Address remaining recorder review feedback

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/2541773c-869a-463f-9fae-8505272c06ff

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* recorder: fix lock re-entry deadlock on start() and force_rotate_all()

The previous "Fixes" commit added `_files_snapshot()` which acquires
`self._lock` so handlers don't race with `stop()` clearing `_files`.
But two callers were already holding `self._lock` when they invoked
methods that go through the snapshot:

  - `start()` writes the `recorder_start` event from inside its `with
    self._lock:` block. `_write_event` -> `_files_snapshot` re-acquires
    the same non-reentrant `threading.Lock`, freezing process startup.

  - `force_rotate_all()` calls `self.status()` (which also acquires
    `self._lock`) while still holding the lock from rotating each file.

Both fixes release the lock before the call. The recorder_start marker
still lands in events.jsonl because the started/started_at flags are
already set when we write it.

Verified end-to-end against the standalone /tmp/verify_pr_fixes.py
harness — all 9 PR review-comment fixes pass, including pause/resume
event ordering and concurrent start/stop without KeyError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix markdown linting issues in leakhunt.md and repro.md

* Handle recorder startup and query review fixes

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* Tighten recorder follow-up tests

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* Stabilize recorder startup tests

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* Remove brittle recorder startup test

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* Polish recorder follow-up errors

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* Refine recorder startup and regex errors

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* Clean up recorder follow-up nits

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* Trunk

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Evil8it pushed a commit to Evil8it/ME4TACTNK that referenced this pull request Jun 10, 2026
…10428)
Labels

enhancement (New feature or request), needs-review (Needs human review)

3 participants