
feat(training-export): overhaul trigger system and message conversion #79703

Draft
wzhgba wants to merge 1 commit into openclaw:main from SenseTime-FVG:wuzehuan/training-export

@wzhgba wzhgba commented May 9, 2026

Draft / work-in-progress — this PR is under active development. Feedback welcome on the overall direction.

Summary

Introduce a trajectory-first, trigger-driven training export system that produces episode-level JSONL data from the OpenClaw runtime — no offline reconstruction, no separate pipeline. The system is opt-in (trainingExport.enabled: true) and writes to:

~/.openclaw/training-export/episodes.jsonl

Each line is a self-contained training sample: a task episode (full agent turn with system prompt, messages, tools, metadata) or a compact-summary episode (compression prompt → summary pair for RL compaction training).
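As a rough illustration of the two episode kinds and the one-JSON-object-per-line layout, here is a minimal sketch. The type names, the `kind` discriminator, and every field name are assumptions made for the example; the PR's actual schema lives in src/training-export.ts and may differ.

```typescript
// Hypothetical episode shapes; field names are illustrative, not the PR's schema.
type ChatMessage = { role: "system" | "user" | "assistant" | "tool"; content: string };

type TaskEpisode = {
  kind: "task";
  systemPrompt: string;
  messages: ChatMessage[];
  tools: { name: string; description: string }[];
  metadata: Record<string, unknown>;
};

type CompactSummaryEpisode = {
  kind: "compact_summary";
  prompt: string; // compression prompt sent to the summarization model
  summary: string; // summary the model produced
  metadata: Record<string, unknown>;
};

type Episode = TaskEpisode | CompactSummaryEpisode;

// JSON.stringify escapes newlines inside strings, so each episode
// stays on a single physical line of the JSONL file.
function toJsonlLine(episode: Episode): string {
  return JSON.stringify(episode) + "\n";
}

// Parse a JSONL document back into episodes, skipping blank lines.
function parseJsonl(text: string): Episode[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as Episode);
}
```

Because each line parses on its own, a consumer can stream the file row by row without loading the whole export.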

Relationship to Existing Systems

/export-trajectory (human-facing debug bundles)

The existing /export-trajectory command (docs at docs/tools/trajectory.md) produces redacted interactive support bundles for human debugging — prompt timelines, tool traces, transcript snapshots, usage metadata. It is triggered manually by users or support staff.

The training export introduced here is complementary and non-overlapping:

| Aspect | /export-trajectory | Training Export |
| --- | --- | --- |
| Purpose | Human debugging, support | Machine training data |
| Trigger | Manual command | Automatic (compaction, reset, export command) |
| Output | Redacted bundle directory | JSONL episodes |
| Format | Directory of text/markdown files | One JSON line per episode |
| Privacy | Redacted (best-effort) | Full content (machine-consumed; opt-in) |
| Audience | Developers, support | RL training pipelines |

Both systems read from the same trajectory (trajectory capture / cache-trace). Training export simply produces a different output format for a different consumer, alongside the existing mechanism.

Compaction subsystem

Training export hooks into the Pi SDK compaction lifecycle (session_before_compact + session_compact) to capture:

  • Pre-compaction context (task episode): the full conversation before compression
  • Post-compaction summary (summary episode): the prompt sent to the summarization model + the summary it produced, with compaction metadata (tokensBefore, firstKeptEntryId, fromExtension)

This is the same data the compaction system already computes internally — training export just persists it in a structured training format before it is discarded.
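The two-phase capture described above can be sketched as follows. This is not the Pi SDK extension API: the `CompactionCapture` class, its method names, and the `summaryPrompt` field are hypothetical stand-ins. Only `tokensBefore`, `firstKeptEntryId`, and `fromExtension` are named in this PR.

```typescript
type Msg = { role: string; content: string };

// Hypothetical result shape; only the three metadata fields are named in the PR.
type CompactionResult = {
  summaryPrompt: string;
  summary: string;
  tokensBefore: number;
  firstKeptEntryId: string;
  fromExtension: boolean;
};

type CapturedPair = {
  task: { messages: Msg[] };
  summary: {
    prompt: string;
    summary: string;
    metadata: { tokensBefore: number; firstKeptEntryId: string; fromExtension: boolean };
  };
};

class CompactionCapture {
  private pending = new Map<string, Msg[]>();

  // Phase 1 (session_before_compact): snapshot the conversation before it is compressed.
  onBeforeCompact(sessionId: string, messages: Msg[]): void {
    this.pending.set(sessionId, [...messages]);
  }

  // Phase 2 (session_compact): join the snapshot with the produced summary.
  // Returns null when no pre-compaction snapshot was recorded for the session.
  onCompact(sessionId: string, result: CompactionResult): CapturedPair | null {
    const messages = this.pending.get(sessionId);
    this.pending.delete(sessionId);
    if (!messages) return null;
    return {
      task: { messages },
      summary: {
        prompt: result.summaryPrompt,
        summary: result.summary,
        metadata: {
          tokensBefore: result.tokensBefore,
          firstKeptEntryId: result.firstKeptEntryId,
          fromExtension: result.fromExtension,
        },
      },
    };
  }
}
```

Clearing the pending snapshot in the second phase keeps the map from growing across repeated compactions of the same session.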

Key Design Decisions

1. Trajectory-first

All training fields (system prompt, messages, tools, model metadata) come from runtime trajectory context.compiled events. Message and tool conversion delegates to the Pi SDK / provider layer (convertMessages from @mariozechner/pi-ai/openai-completions).

2. Unified compaction hook

A single Pi SDK extension (session_before_compact + session_compact) handles all compaction modes (default, safeguard, manual, overflow, timeout). No runTrainingExport calls scattered across individual compaction paths.

3. Pair-export guarantee

For compaction-triggered exports, task and summary episodes are built as a batch. If either is filtered by quality checks, the entire batch is discarded — no orphaned episodes.
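A minimal sketch of the all-or-nothing rule, assuming a hypothetical `buildCompactionBatch` helper and a caller-supplied quality check (neither name comes from the PR):

```typescript
type Episode = {
  kind: "task" | "compact_summary";
  messages: { role: string; content: string }[];
};

type QualityCheck = (episode: Episode) => boolean;

// Returns both episodes or none: if either fails the check, the batch is empty,
// so a summary episode is never written without its matching task episode.
function buildCompactionBatch(task: Episode, summary: Episode, isUsable: QualityCheck): Episode[] {
  const batch = [task, summary];
  return batch.every(isUsable) ? batch : [];
}
```

A caller would append rows only when the returned batch is non-empty, which keeps the file free of orphaned episodes.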

4. Config-gated at every call site

getTrainingExportConfig(cfg)?.enabled === true is checked at all three entry points (extension registration, session reset, trajectory export command), so reviewers can see the opt-in gating logic without digging into implementation details.
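The gating pattern looks roughly like the sketch below. `getTrainingExportConfig` is the name used in this PR, but its body here is a stand-in, and `maybeRunTrainingExport` and the config types are hypothetical helpers added for illustration.

```typescript
type TrainingExportConfig = { enabled?: boolean; compat?: Record<string, unknown> };
type ConfigLike = { trainingExport?: TrainingExportConfig };

// Stand-in implementation; the real accessor may do more than a property read.
function getTrainingExportConfig(cfg: ConfigLike | undefined): TrainingExportConfig | undefined {
  return cfg?.trainingExport;
}

// Runs the export only when the flag is explicitly true.
// A missing config, a missing flag, and enabled: false all skip the call.
function maybeRunTrainingExport(cfg: ConfigLike | undefined, run: () => void): boolean {
  if (getTrainingExportConfig(cfg)?.enabled !== true) return false;
  run();
  return true;
}
```

Comparing against `true` rather than testing truthiness means a malformed value such as the string "false" cannot enable the export by accident.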

5. compactionSummary bridging

Pi SDK's convertToLlm converts compactionSummary messages to user messages, but the upstream convertMessages from @mariozechner/pi-ai/openai-completions does not handle the compactionSummary role. A pre-processing step (sharing a single map with thinking-block stripping) mirrors Pi SDK's conversion format before handing off to the upstream converter.
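A sketch of such a single-pass pre-processing step is shown below. The message and block shapes are simplified assumptions, and `SUMMARY_PREFIX` is a placeholder: the exact text format Pi SDK uses when it rewrites a compaction summary is not given in this PR.

```typescript
type Block = { type: "text"; text: string } | { type: "thinking"; thinking: string };

type AgentMessage =
  | { role: "user" | "assistant"; content: Block[] }
  | { role: "compactionSummary"; summary: string };

type ConvertibleMessage = { role: "user" | "assistant"; content: Block[] };

// Placeholder wrapper text; the real format should mirror Pi SDK's convertToLlm.
const SUMMARY_PREFIX = "Summary of earlier conversation:\n";

// One map handles both concerns: compactionSummary becomes a user message,
// and thinking blocks are dropped from every other message.
function preprocessForConversion(messages: AgentMessage[]): ConvertibleMessage[] {
  return messages.map((m): ConvertibleMessage => {
    if (m.role === "compactionSummary") {
      return { role: "user", content: [{ type: "text", text: SUMMARY_PREFIX + m.summary }] };
    }
    return { role: m.role, content: m.content.filter((b) => b.type !== "thinking") };
  });
}
```

The output contains only roles the upstream converter already understands, so it can be passed on without further special cases.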

6. Training-quality message filtering (all triggers)

Training episodes must end with a complete assistant message — regardless of trigger type. Any snapshot (compaction, reset, or trajectory export) may end mid-turn at a non-assistant message (e.g. toolResult). Trailing non-assistant messages are trimmed from every trigger's output. The trainExampleMessagesAreUsable check requires ≥1 user + ≥1 assistant; if trimming leaves the episode unusable, it is discarded entirely. This is a universal training-data quality requirement, not a compaction-specific behavior.
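The trim-then-check rule can be sketched like this. `trainExampleMessagesAreUsable` is the check named above, reimplemented here from its description; `trimTrailingNonAssistant`, `prepareEpisodeMessages`, and the `Msg` type are hypothetical names for the example.

```typescript
type Msg = { role: "system" | "user" | "assistant" | "toolResult"; content: string };

// Drop messages from the end until the last one is an assistant message.
function trimTrailingNonAssistant(messages: Msg[]): Msg[] {
  let end = messages.length;
  while (end > 0 && messages[end - 1]?.role !== "assistant") end--;
  return messages.slice(0, end);
}

// Usable means at least one user message and at least one assistant message.
function trainExampleMessagesAreUsable(messages: Msg[]): boolean {
  return messages.some((m) => m.role === "user") && messages.some((m) => m.role === "assistant");
}

// Returns the trimmed messages, or null when the episode should be discarded.
function prepareEpisodeMessages(messages: Msg[]): Msg[] | null {
  const trimmed = trimTrailingNonAssistant(messages);
  return trainExampleMessagesAreUsable(trimmed) ? trimmed : null;
}
```

For instance, a snapshot of user, assistant, toolResult is trimmed to its first two messages, while user, toolResult trims down to nothing usable and is dropped.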

7. Reset export is independent of plugin hooks

The before_reset training export call is placed outside emitGatewayBeforeResetPluginHook, so it fires regardless of whether any before_reset plugin hooks are registered.

8. Private file permissions

The export directory (~/.openclaw/training-export) and JSONL file are created with private filesystem modes (0o700 / 0o600) to prevent world-readable access to training data.
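One way to get these modes with Node's fs module is sketched below, assuming a POSIX filesystem. `appendPrivateJsonl` and `permissionBits` are hypothetical helpers, not functions from the PR. The explicit `chmodSync` calls are there because the `mode` option only applies when a path is first created and is masked by the process umask.

```typescript
import { appendFileSync, chmodSync, mkdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Append one JSONL row, keeping the directory at 0o700 and the file at 0o600.
function appendPrivateJsonl(dir: string, fileName: string, line: string): string {
  mkdirSync(dir, { recursive: true, mode: 0o700 });
  const file = join(dir, fileName);
  appendFileSync(file, line, { mode: 0o600 });
  // Re-assert the modes in case the paths already existed with looser bits.
  chmodSync(dir, 0o700);
  chmodSync(file, 0o600);
  return file;
}

// Permission bits only (owner/group/other), without the file-type bits.
function permissionBits(path: string): number {
  return statSync(path).mode & 0o777;
}
```

On Windows these mode bits are largely ignored, so this sketch only illustrates the POSIX case.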

Files Changed

| File | Change |
| --- | --- |
| src/training-export.ts | New: core module (snapshot collection, episode construction, JSONL I/O, prompt constants, compaction extension) |
| src/training-export.test.ts | New: test suite |
| docs/training-export.md | New: formal feature documentation |
| src/config/types.openclaw.ts | Add trainingExport config type (enabled, compat) |
| src/config/zod-schema.ts | Add trainingExport schema |
| src/config/schema.help.ts | Add field help text |
| src/config/schema.labels.ts | Add field labels |
| src/agents/pi-embedded-runner/extensions.ts | Register compaction extension (config-gated, opt-in) |
| src/gateway/session-reset-service.ts | before_reset trigger (config-gated, outside hook function) |
| src/auto-reply/reply/commands-export-trajectory.ts | trajectory_export trigger (config-gated, alongside existing command) |
| src/agents/openai-transport-stream.ts | Minor: export convertResponsesMessages for use in conversion pipeline |

Configuration

trainingExport:
  enabled: true            # default: false (opt-in)
  compat: {}               # optional ModelCompatConfig override for export path

When enabled is false (the default), the extension is not registered and runTrainingExport is never called, so the feature adds no runtime work.

How to Test

  1. Enable via trainingExport.enabled: true
  2. Trigger a compaction in a session long enough to exceed the context threshold
  3. Check ~/.openclaw/training-export/episodes.jsonl — should contain paired task + summary episodes with compaction metadata
  4. Reset a session — should produce a task episode
  5. Run /export-trajectory — should produce a task episode via the training export path as well
  6. Disable via trainingExport.enabled: false — episodes file should receive no new entries

Open Questions for Review

  1. Default to opt-in (enabled: false) — is this the right default, or should we consider a different approach?
  2. Privacy and retention policy — the training export writes full (non-redacted) session content to disk. Should there be a retention/cleanup mechanism?

@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation gateway Gateway runtime agents Agent runtime and tooling size: XL triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 9, 2026
@clawsweeper

clawsweeper Bot commented May 9, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed June 2, 2026, 1:10 AM ET / 05:10 UTC.

Summary
The PR adds an opt-in training export system that writes episode-level JSONL from compaction, session reset, and trajectory export triggers, plus config schema/help, docs, and tests.

PR surface: Source +1276, Tests +506, Docs +173. Total +1955 across 11 files.

Reproducibility: yes, for the review finding. Source inspection shows enabled reset, compaction, and trajectory export paths call the new training exporter, which reads the full trajectory sidecar synchronously without the current exporter’s caps.

Review metrics: 2 noteworthy metrics.

  • Config surface: 1 added object, 2 fields. trainingExport.enabled and trainingExport.compat control automatic unredacted export behavior, so maintainers need to review the upgrade and operator contract.
  • Automatic export triggers: 3 trigger paths. Compaction, session reset, and trajectory export can all write the new JSONL file when enabled, so runtime and privacy proof must cover each path.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🦪 silver shellfish
Result: blocked until real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add redacted real behavior proof for enabled compaction, reset, and trajectory export triggers, then update the PR body so ClawSweeper can re-review.
  • [P1] Bound trajectory sidecar reads and add focused coverage for oversized or high-event-count trajectory files.
  • [P1] Resolve the privacy and retention policy for unredacted training export before merge.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body has manual test instructions but no after-fix real environment output, logs, terminal proof, screenshot, or artifact showing enabled compaction/reset/export behavior writing the expected JSONL. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] The provided GitHub context marks this PR as draft and dirty, so it needs a rebase/refresh before a final merge review.
  • [P1] The feature persists unredacted prompts, messages, tool data, and metadata automatically when enabled; privacy, retention, and operator expectations need explicit maintainer approval.
  • [P1] The new config surface and persistent export path need upgrade/docs alignment before landing.

Maintainer options:

  1. Bound runtime-sidecar reads first (recommended)
    Reuse the existing trajectory file-size and event-count contracts, or stream only the latest relevant snapshot, before automatic reset/compaction/export triggers can write training rows.
  2. Pause for privacy policy
    Hold the PR until maintainers decide whether unredacted automatic JSONL export belongs in core and what retention, redaction, or warning controls are required.
  3. Accept opt-in export risk
    Maintainers could deliberately accept full-content local export as an advanced opt-in feature, but that decision should be explicit in the PR discussion and docs.

Next step before merge

  • [P1] Human review is needed because the PR lacks contributor real-behavior proof, is dirty against base, and has an unresolved privacy/product decision even though one code defect is mechanically fixable.

Security
Needs attention: The diff adds an opt-in but automatic unredacted export of session content, so privacy, retention, and file handling need maintainer attention before merge.

Review findings

  • [P2] Cap automatic trajectory reads before parsing — src/training-export.ts:210-221
Review details

Best possible solution:

Keep the feature opt-in, but implement it through bounded trajectory parsing, current main module paths, and an explicit maintainer-approved privacy/retention policy before merge.

Do we have a high-confidence way to reproduce the issue?

Yes, for the review finding: source inspection shows enabled reset, compaction, and trajectory export paths call the new training exporter, which reads the full trajectory sidecar synchronously without the current exporter’s caps.

Is this the best way to solve the issue?

No. The feature direction may be useful, but the current implementation should reuse bounded trajectory contracts and settle the unredacted export privacy/retention policy first.

Full review comments:

  • [P2] Cap automatic trajectory reads before parsing — src/training-export.ts:210-221
    runTrainingExport is reached from reset, /export-trajectory, and compaction hooks when the new config is enabled, but this helper reads and parses the entire trajectory sidecar synchronously with no file-size or event-count limit. Current trajectory export rejects sidecars over TRAJECTORY_RUNTIME_FILE_MAX_BYTES and caps runtime events, so an existing large session can now stall the gateway/agent path or consume excessive memory just by resetting or compacting. Reuse the bounded trajectory parser or stream only the latest relevant snapshot before appending training rows.
    Confidence: 0.9
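
    A bounded read along the lines suggested above might look like the following sketch. It does not reuse the existing trajectory parser; the `Limits` type, function names, and any limit values passed in are assumptions, and the real cap would come from TRAJECTORY_RUNTIME_FILE_MAX_BYTES and the existing event-count contract.

```typescript
import { readFileSync, statSync } from "node:fs";

type Limits = { maxBytes: number; maxEvents: number };
type BoundedResult = { ok: true; events: unknown[] } | { ok: false; reason: string };

// Parse JSONL rows, rejecting input that exceeds the byte or event-count limits.
function parseTrajectoryRows(text: string, limits: Limits): BoundedResult {
  if (Buffer.byteLength(text, "utf8") > limits.maxBytes) {
    return { ok: false, reason: "trajectory exceeds byte limit" };
  }
  const events: unknown[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    if (events.length >= limits.maxEvents) {
      return { ok: false, reason: "trajectory exceeds event limit" };
    }
    try {
      events.push(JSON.parse(line));
    } catch {
      // Skip malformed rows rather than failing the whole read.
    }
  }
  return { ok: true, events };
}

// Check the size on disk first so an oversized sidecar is never loaded into memory.
function readTrajectoryBounded(path: string, limits: Limits): BoundedResult {
  const stat = statSync(path);
  if (!stat.isFile()) return { ok: false, reason: "not a regular file" };
  if (stat.size > limits.maxBytes) return { ok: false, reason: "trajectory exceeds byte limit" };
  return parseTrajectoryRows(readFileSync(path, "utf8"), limits);
}
```

    Rejecting before the read keeps reset and compaction paths from stalling on a large existing session; streaming only the latest snapshot would be the alternative the review mentions.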

Overall correctness: patch is incorrect
Overall confidence: 0.86

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against ebf20241bd17.

Label changes

Label changes:

  • add P2: This is a substantial opt-in feature with concrete runtime and privacy risks, but it is not a shipped urgent regression.
  • add merge-risk: 🚨 compatibility: The PR adds new config/default schema surface and a persistent export artifact path that affects operator setup and upgrade expectations.
  • add merge-risk: 🚨 security-boundary: The PR persists unredacted session content outside the existing redacted support-bundle workflow.
  • add merge-risk: 🚨 availability: Automatic trigger paths can synchronously read and parse large trajectory sidecars on runtime paths.
  • add rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🦪 silver shellfish.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body has manual test instructions but no after-fix real environment output, logs, terminal proof, screenshot, or artifact showing enabled compaction/reset/export behavior writing the expected JSONL. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
  • remove rating: 🌊 off-meta tidepool: Current PR rating is rating: 🧂 unranked krab, so this older rating label is no longer current.

Label justifications:

  • P2: This is a substantial opt-in feature with concrete runtime and privacy risks, but it is not a shipped urgent regression.
  • merge-risk: 🚨 availability: Automatic trigger paths can synchronously read and parse large trajectory sidecars on runtime paths.
  • merge-risk: 🚨 security-boundary: The PR persists unredacted session content outside the existing redacted support-bundle workflow.
  • merge-risk: 🚨 compatibility: The PR adds new config/default schema surface and a persistent export artifact path that affects operator setup and upgrade expectations.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🦪 silver shellfish.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body has manual test instructions but no after-fix real environment output, logs, terminal proof, screenshot, or artifact showing enabled compaction/reset/export behavior writing the expected JSONL. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +1276, Tests +506, Docs +173. Total +1955 across 11 files.

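| Area | Files | Added | Removed | Net |
| --- | --- | --- | --- | --- |
| Source | 9 | 1278 | 2 | +1276 |
| Tests | 1 | 506 | 0 | +506 |
| Docs | 1 | 173 | 0 | +173 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 11 | 1957 | 2 | +1955 |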

Security concerns:

  • [medium] Unredacted automatic training data export — src/training-export.ts:824
    The PR stores full prompts, messages, tool data, and metadata in a persistent JSONL file outside the existing redacted support-bundle flow, while the PR body still lists privacy and retention as open questions.
    Confidence: 0.86

What I checked:

  • Repository policy read: Read the full root AGENTS.md and scoped docs/agents/gateway policies; config, storage, dependency, privacy, and runtime hot-path guidance applies to this review. (AGENTS.md:1, ebf20241bd17)
  • Current main lacks training export: Search found trajectory support-bundle and cache-trace surfaces but no current trainingExport config or episodes.jsonl runtime surface on main. (ebf20241bd17)
  • Unbounded PR reader: The new helper reads the whole trajectory sidecar synchronously, splits all rows, parses all valid JSON, and returns every event before selecting the latest snapshot. (src/training-export.ts:206, 6ace302b5313)
  • Existing bounded trajectory contract: Current trajectory export checks regular file size and event caps before parsing runtime sidecars, then rejects oversized exports rather than parsing unbounded input. (src/trajectory/export.ts:223, ebf20241bd17)
  • Automatic trigger paths: The PR wires runTrainingExport into session reset and /export-trajectory, and registers the compaction extension when trainingExport.enabled is true. (src/gateway/session-reset-service.ts:641, 6ace302b5313)
  • Config surface added: The PR adds a new trainingExport object with enabled and compat fields to the OpenClaw config type and schema. (src/config/zod-schema.ts:444, 6ace302b5313)

Likely related people:

  • vincentkoc: Local git log/blame ties the current trajectory export, gateway reset, compaction hook, and provider conversion surfaces to Vincent Koc's recent merged main history in this checkout. (role: recent area contributor; confidence: medium; commits: 459abfc26baf, ebf20241bd17; files: src/trajectory/export.ts, src/gateway/session-reset-service.ts, src/agents/agent-hooks/compaction-safeguard.ts)
  • mariozechner: The PR discussion/timeline mentions and subscribes this account, and the proposed design depends on Pi/agent-session compaction hook and message-conversion contracts. (role: adjacent dependency/contact; confidence: low; files: src/agents/sessions/agent-session.ts, packages/agent-core/src/harness/messages.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@wzhgba wzhgba force-pushed the wuzehuan/training-export branch from b675a0d to 4f6af34 Compare May 9, 2026 08:10
@openclaw-barnacle openclaw-barnacle Bot added the triage: refactor-only Candidate: refactor/cleanup-only PR without maintainer context. label May 9, 2026
@wzhgba wzhgba force-pushed the wuzehuan/training-export branch 3 times, most recently from 8649e3e to 9f06f41 Compare May 9, 2026 10:46
@openclaw-barnacle


This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Jun 1, 2026
@clawsweeper clawsweeper Bot added the rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. label Jun 1, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label Jun 2, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels Jun 2, 2026

Labels

agents Agent runtime and tooling docs Improvements or additions to documentation gateway Gateway runtime merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. P2 Normal backlog priority with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: XL status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. triage: refactor-only Candidate: refactor/cleanup-only PR without maintainer context.


1 participant