feat(training-export): overhaul trigger system and message conversion #79703
wzhgba wants to merge 1 commit
Codex review: needs real behavior proof before merge. Reviewed June 2, 2026, 1:10 AM ET / 05:10 UTC.
Summary
PR surface: Source +1276, Tests +506, Docs +173. Total +1955 across 11 files.
Reproducibility: yes, for the review finding. Source inspection shows that the enabled reset, compaction, and trajectory export paths call the new training exporter, which reads the full trajectory sidecar synchronously without the current exporter's caps.
Review metrics: 2 noteworthy metrics.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Review details
Best possible solution: Keep the feature opt-in, but implement it through bounded trajectory parsing, current main module paths, and an explicit maintainer-approved privacy/retention policy before merge.
Do we have a high-confidence way to reproduce the issue? Yes, for the review finding: source inspection shows that the enabled reset, compaction, and trajectory export paths call the new training exporter, which reads the full trajectory sidecar synchronously without the current exporter's caps.
Is this the best way to solve the issue? No. The feature direction may be useful, but the current implementation should reuse bounded trajectory contracts and settle the unredacted export privacy/retention policy first.
Overall correctness: patch is incorrect.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against ebf20241bd17.
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
This pull request has been automatically marked as stale due to inactivity.
Summary
Introduce a trajectory-first, trigger-driven training export system that produces episode-level JSONL data directly from the OpenClaw runtime, with no offline reconstruction and no separate pipeline. The system is opt-in (trainingExport.enabled: true) and writes to ~/.openclaw/training-export/episodes.jsonl.
Each line is a self-contained training sample: either a task episode (a full agent turn with system prompt, messages, tools, and metadata) or a compact-summary episode (a compression prompt → summary pair for RL compaction training).
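To make the "one self-contained sample per line" idea concrete, here is a minimal sketch of how a consumer might parse such a file. The `kind` discriminator and every field name below are illustrative assumptions; the PR description does not spell out the on-disk schema.

```typescript
// Hypothetical episode shapes. The `kind` discriminator and the field names
// are assumptions for illustration, not the schema used by this PR.
type ChatMessage = { role: string; content: string };

type TaskEpisode = {
  kind: "task";
  systemPrompt: string;
  messages: ChatMessage[];
  tools: unknown[];
  metadata: Record<string, unknown>;
};

type CompactSummaryEpisode = {
  kind: "compact_summary";
  prompt: string; // compression prompt
  summary: string; // resulting summary
};

type Episode = TaskEpisode | CompactSummaryEpisode;

// Parse JSONL text: one JSON object per non-empty line.
function parseEpisodes(jsonl: string): Episode[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as Episode);
}

// Count how many episodes of each kind the file contains.
function countByKind(episodes: Episode[]): Record<Episode["kind"], number> {
  const counts: Record<Episode["kind"], number> = { task: 0, compact_summary: 0 };
  for (const episode of episodes) {
    counts[episode.kind] += 1;
  }
  return counts;
}
```

Because every line parses on its own, a consumer can stream the file and skip or inspect individual samples without loading related lines.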
Relationship to Existing Systems
/export-trajectory (human-facing debug bundles)
The existing /export-trajectory command (docs at docs/tools/trajectory.md) produces redacted interactive support bundles for human debugging: prompt timelines, tool traces, transcript snapshots, and usage metadata. It is triggered manually by users or support staff.
The training export introduced here is complementary and non-overlapping.
Both systems read from the same trajectory (trajectory capture / cache-trace). Training export simply produces a different output format for a different consumer, alongside the existing mechanism.
Compaction subsystem
Training export hooks into the Pi SDK compaction lifecycle (session_before_compact + session_compact) to capture:
- compaction metadata (tokensBefore, firstKeptEntryId, fromExtension)
This is the same data the compaction system already computes internally; training export just persists it in a structured training format before it is discarded.
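The two-phase capture can be sketched roughly as follows. This is not the Pi SDK API: the class, its method names, and the way the two events deliver their data are hypothetical. Only the metadata field names (tokensBefore, firstKeptEntryId, fromExtension) come from the description above, and their types are assumed.

```typescript
// Field names are from the PR description; the types are assumptions.
interface CompactionMetadata {
  tokensBefore: number;
  firstKeptEntryId: string;
  fromExtension: boolean;
}

interface SummaryEpisode {
  sessionId: string;
  prompt: string;
  summary: string;
  compaction: CompactionMetadata;
}

// Hypothetical collector: remembers the compression prompt seen in the
// "before" phase and pairs it with the summary seen in the "after" phase.
class CompactionCapture {
  private pendingPrompts = new Map<string, string>();

  // Would be called from a session_before_compact handler.
  onBeforeCompact(sessionId: string, prompt: string): void {
    this.pendingPrompts.set(sessionId, prompt);
  }

  // Would be called from a session_compact handler. Returns undefined when
  // no matching "before" event was recorded, so no half-built episode leaks.
  onCompact(
    sessionId: string,
    summary: string,
    compaction: CompactionMetadata,
  ): SummaryEpisode | undefined {
    const prompt = this.pendingPrompts.get(sessionId);
    this.pendingPrompts.delete(sessionId);
    if (prompt === undefined) {
      return undefined;
    }
    return { sessionId, prompt, summary, compaction };
  }
}
```

Keeping the pending state keyed by session means one collector can serve every compaction mode, as long as each mode emits both events.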
Key Design Decisions
1. Trajectory-first
All training fields (system prompt, messages, tools, model metadata) come from runtime trajectory context.compiled events. Message and tool conversion delegates to the Pi SDK / provider layer (convertMessages from @mariozechner/pi-ai/openai-completions).
2. Unified compaction hook
A single Pi SDK extension (session_before_compact + session_compact) handles all compaction modes (default, safeguard, manual, overflow, timeout). No runTrainingExport calls are scattered across individual compaction paths.
3. Pair-export guarantee
For compaction-triggered exports, task and summary episodes are built as a batch. If either is filtered by quality checks, the entire batch is discarded — no orphaned episodes.
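The all-or-nothing behavior can be sketched as below. The types and function names are hypothetical, and modelling the summary episode as a message list is an assumption. The quality rules used here (trim trailing non-assistant messages, then require at least one user and one assistant message) follow the description under decision 6.

```typescript
// Hypothetical shapes for illustration only.
type Role = "system" | "user" | "assistant" | "toolResult";
type Message = { role: Role; content: string };
type Episode = { kind: "task" | "compact_summary"; messages: Message[] };

// Drop trailing messages until the episode ends with an assistant message.
function trimTrailingNonAssistant(messages: Message[]): Message[] {
  let end = messages.length;
  while (end > 0 && messages[end - 1]?.role !== "assistant") {
    end -= 1;
  }
  return messages.slice(0, end);
}

// Usable means at least one user and at least one assistant message.
function isUsable(messages: Message[]): boolean {
  return (
    messages.some((m) => m.role === "user") &&
    messages.some((m) => m.role === "assistant")
  );
}

// Build the batch as a unit: if any episode fails the quality check,
// return an empty batch so no orphaned episode is written.
function buildBatch(episodes: Episode[]): Episode[] {
  const cleaned = episodes.map((episode) => ({
    ...episode,
    messages: trimTrailingNonAssistant(episode.messages),
  }));
  return cleaned.every((episode) => isUsable(episode.messages)) ? cleaned : [];
}
```

Deciding on the whole batch before anything is written is what keeps the task and summary episodes paired in the output file.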
4. Config-gated at every call site
getTrainingExportConfig(cfg)?.enabled === true is checked at all three entry points (extension registration, session reset, trajectory export command), so reviewers can see the opt-in gating logic without digging into implementation details.
5. compactionSummary bridging
Pi SDK's convertToLlm converts compactionSummary → user messages, but the upstream convertMessages from @mariozechner/pi-ai/openai-completions does not handle the compactionSummary role. A pre-processing step (sharing a single map with thinking-block stripping) mirrors Pi SDK's conversion format before handing off to the upstream converter.
6. Training-quality message filtering (all triggers)
Training episodes must end with a complete assistant message, regardless of trigger type. Any snapshot (compaction, reset, or trajectory export) may end mid-turn at a non-assistant message (e.g. toolResult). Trailing non-assistant messages are trimmed from every trigger's output. The trainExampleMessagesAreUsable check requires ≥1 user + ≥1 assistant message; if trimming leaves the episode unusable, it is discarded entirely. This is a universal training-data quality requirement, not a compaction-specific behavior.
7. Reset export is independent of plugin hooks
The before_reset training export call is placed outside emitGatewayBeforeResetPluginHook, so it fires regardless of whether any before_reset plugin hooks are registered.
8. Private file permissions
The export directory (~/.openclaw/training-export) and JSONL file are created with private filesystem modes (0o700 / 0o600) to prevent world-readable access to training data.
Files Changed
- src/training-export.ts
- src/training-export.test.ts
- docs/training-export.md
- src/config/types.openclaw.ts: trainingExport config type (enabled, compat)
- src/config/zod-schema.ts: trainingExport schema
- src/config/schema.help.ts
- src/config/schema.labels.ts
- src/agents/pi-embedded-runner/extensions.ts
- src/gateway/session-reset-service.ts: before_reset trigger (config-gated, outside hook function)
- src/auto-reply/reply/commands-export-trajectory.ts: trajectory_export trigger (config-gated, alongside existing command)
- src/agents/openai-transport-stream.ts: convertResponsesMessages for use in conversion pipeline
Configuration
When enabled is false (the default), the extension is not registered and runTrainingExport is never called, so there is zero overhead.
How to Test
- Set trainingExport.enabled: true
- Check ~/.openclaw/training-export/episodes.jsonl: it should contain paired task + summary episodes with compaction metadata
- Run /export-trajectory: it should produce a task episode via the training export path as well
- Set trainingExport.enabled: false: the episodes file should receive no new entries
Open Questions for Review
enabled: false) — is this the right default, or should we consider a different approach?