research: evaluate RL training infrastructure options for memory consolidation #708

@Aureliolo

Context

Multiple research findings point to RL-based optimization as the path to significantly better agent memory and experience extraction:

  • Complementary RL (arXiv:2603.17621): Co-evolutionary actor + experience-extractor with GRPO/CISPO optimization. The paper's own ablation (Figure 3a) shows that a static extractor without RL yields only marginal gains, so the architecture is adoptable but the performance claims are RL-specific.
  • Memex(RL) (arXiv:2603.04257): RL reward shaping trains write/read behaviors under a context budget. Reported result: success on hardened ALFWorld rises from 24.2% to 85.6%.
  • EvoSkill (arXiv:2603.02766): Evolutionary loop auto-discovers reusable skills from failure trajectories with Pareto frontier selection.
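For reference, Pareto frontier selection can be sketched in a few lines. This is a generic illustration of the technique, not EvoSkill's implementation: the candidate names and the two objectives (success rate, and negated token cost so that higher is better on both axes) are assumptions made up for the example.

```python
def dominates(a, b):
    """True if score tuple `a` is at least as good as `b` on every
    objective and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates.

    `candidates` maps a candidate name to a tuple of objective scores.
    """
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
    }


# Hypothetical skills scored as (success_rate, -token_cost_in_thousands).
skills = {
    "retry_with_summary": (0.90, -8.0),
    "cache_tool_output": (0.70, -3.0),
    "verbose_replay": (0.65, -9.0),   # worse than retry_with_summary on both axes
    "minimal_prompt": (0.40, -1.0),
}
front = pareto_front(skills)
```

Selection keeps every non-dominated candidate rather than a single winner, so trade-offs between objectives stay visible to whoever picks a skill afterwards.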

The non-RL adaptations from these papers are filed in #704. This issue tracks the longer-term question: should SynthOrg invest in RL training infrastructure to unlock the full performance gains?

Evaluation Criteria

  • What RL frameworks exist for LLM agent optimization? (RLHF libraries, GRPO implementations, etc.)
  • What infrastructure is required? (GPU compute, training pipeline, evaluation harness)
  • What is the minimum viable RL loop? (e.g., reward signal from task outcomes, no custom training)
  • Can existing execution history serve as training data?
  • Cost/benefit: what performance improvement justifies the infrastructure investment?
  • Is there a hosted/managed option that avoids self-hosted training infrastructure?
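As a starting point for the "minimum viable RL loop" and "execution history as training data" questions, the sketch below turns logged task outcomes into group-relative advantages, the normalization step used by GRPO (each reward minus its group mean, divided by the group standard deviation). The `Episode` record and its fields are hypothetical, and a binary success reward is assumed; nothing here reflects SynthOrg's actual execution-history schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Episode:
    task_id: str      # hypothetical: identifies rollouts of the same task
    succeeded: bool   # hypothetical: outcome flag taken from execution history


def group_relative_advantages(episodes, eps=1e-6):
    """Return one advantage per episode, normalized within its task group."""
    groups = defaultdict(list)
    for index, episode in enumerate(episodes):
        groups[episode.task_id].append(index)

    advantages = [0.0] * len(episodes)
    for indices in groups.values():
        # Assumed reward: 1.0 for success, 0.0 for failure.
        rewards = [1.0 if episodes[i].succeeded else 0.0 for i in indices]
        baseline = mean(rewards)
        spread = pstdev(rewards)
        for i, reward in zip(indices, rewards):
            advantages[i] = (reward - baseline) / (spread + eps)
    return advantages


history = [
    Episode("task-a", True),
    Episode("task-a", False),
    Episode("task-a", False),
    Episode("task-a", False),
    Episode("task-b", True),
    Episode("task-b", True),
]
advantages = group_relative_advantages(history)
```

One thing the sketch makes visible: a task whose logged attempts all succeed or all fail gets zero advantage everywhere, so existing history is only useful as a training signal where it contains several attempts at the same task with mixed outcomes.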

Metadata

    Labels

      • prio:medium (Should do, but not blocking)
      • scope:large (3+ days of work)
      • spec:memory (DESIGN_SPEC Section 7 - Memory & Persistence)
      • type:research (Evaluate options, make tech decisions)
      • v0.6 (Minor version v0.6)
      • v0.6.8 (Patch release v0.6.8)
