

🌟 MemGovern: Enhancing Code Agents with Experience Memory

🚀 Boost SWE-Agent Performance with Governance-Aware Memory Retrieval



🚀 Overview

🎯 Remember · Retrieve · Resolve: Make Past Experience Work for You

MemGovern enhances SWE-Agent by injecting governance-aware experience memories into the agent's reasoning loop. When facing a new GitHub issue, the agent retrieves similar past experiences and learns from successful resolution patterns.

πŸ› New Issue β†’ πŸ” Memory Retrieval β†’ πŸ“š Experience Injection β†’ 🧠 Enhanced Reasoning β†’ βœ… Better Patches


📊 Performance

Figure: SWE-Agent vs MemGovern (Ours) on SWE-Bench Verified across 9 LLMs

🏆 Key Results

| Model | SWE-Agent | MemGovern (Ours) | Improvement |
|---|---|---|---|
| Claude-4-Sonnet | 66.6% | 69.8% | +3.2 |
| GPT5-Medium | 65.0% | 67.4% | +2.4 |
| DeepSeek-V3.1T | 62.8% | 65.8% | +3.0 |
| Qwen3-235B | 47.2% | 55.4% | +8.2 |
| Kimi-K2-Instruct | 43.8% | 51.8% | +8.0 |
| GPT-4o | 23.2% | 32.6% | +9.4 |
| GPT-4o-Mini | 14.0% | 17.2% | +3.2 |

From "Solving from Scratch" → To "Learning from Experience"


πŸ“ Repository Structure

MemGovern/
β”œβ”€β”€ data/                  # πŸ“¦ Experience DB artifacts (Git LFS)
β”‚   └── agentic_exp_data_1220_13w_DSnewPrompt/
β”‚       β”œβ”€β”€ experience_data.json
β”‚       └── chroma_db_experience/
β”œβ”€β”€ trajectories/          # πŸ—‚οΈ Model trajectory archives (Git LFS)
β”‚   β”œβ”€β”€ gpt4o_*.tar.gz
β”‚   β”œβ”€β”€ gemini3_pro_trajectory.tar.gz
β”‚   └── ...
β”œβ”€β”€ config/                # βš™οΈ SWE-Agent compatible YAML configs
β”‚   β”œβ”€β”€ benchmarks/        #    Benchmark sweep configurations
β”‚   β”œβ”€β”€ demo/              #    Lightweight demo presets
β”‚   β”œβ”€β”€ human/             #    Human study protocols
β”‚   └── exotic/            #    Ablation experiment settings
β”œβ”€β”€ tools/                 # πŸ”§ Memory pipeline utilities
β”‚   β”œβ”€β”€ experience_server.py
β”‚   β”œβ”€β”€ issue_memory_rag/
β”‚   β”œβ”€β”€ exp_search/
β”‚   └── ...
β”œβ”€β”€ scripts/               # πŸ“œ Data collection scripts
β”‚   β”œβ”€β”€ github_scraper.py
β”‚   └── experience_process.py
β”œβ”€β”€ figs/                  # πŸ–ΌοΈ Publication-ready figures
└── requirements.txt       # πŸ“¦ Runtime deps (installs SWE-agent + utilities)

βš™οΈ Reproducing MemGovern (Linux + Docker)

MemGovern is implemented as memory tools + configs on top of SWE-agent. A full run uses two terminals:

  • Terminal A: start the Experience Server (vector search + experience lookup)
  • Terminal B: run SWE-agent on SWE-bench with a MemGovern config that calls the server tools

Requirements: Linux (or WSL2), Python ≥ 3.11, Git, Docker.

WSL2 note: Windows drives are mounted under /mnt/ (e.g., E:\ → /mnt/e/).

1) Install (this also downloads SWE-agent code)

git clone https://github.com/QuantaAlpha/MemGovern.git
cd MemGovern

python3 -m venv SWE
source SWE/bin/activate
pip install -U pip
pip install -r requirements.txt

2) Prepare MemGovern experience data

The Experience Server needs two artifacts:

  • experience_data.json: governed experience cards (key → structured fields, including bug_description / fix_experience)
  • chroma_db_experience/: a persistent ChromaDB store used for semantic retrieval

In this repository, we provide them under:

  • data/agentic_exp_data_1220_13w_DSnewPrompt/ (tracked via Git LFS)

Place them in a directory (example layout):

<EXPERIENCE_DATA_DIR>/
├── experience_data.json
└── chroma_db_experience/
    ├── chroma.sqlite3
    └── <uuid>/
        ├── data_level0.bin
        ├── header.bin
        ├── index_metadata.pickle
        ├── length.bin
        └── link_lists.bin

Notes:

  • These artifacts are large; we recommend hosting them via Git LFS or a separate dataset release.
  • Retrieval quality depends on using the same embedding model at serving time as was used to build the ChromaDB store.

In our internal runs, we keep these files under a folder named agentic_exp_data_1220_13w_DSnewPrompt/.
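A quick sanity check of this layout before starting the server can save a failed run. The helper below is a minimal sketch: the layout it checks comes from the example above, but the function name is ours, not part of the repository.

```python
import json
import tempfile
from pathlib import Path

def check_experience_dir(root):
    """Verify an <EXPERIENCE_DATA_DIR> has both artifacts the server needs."""
    root = Path(root)
    json_path = root / "experience_data.json"
    db_dir = root / "chroma_db_experience"
    if not json_path.is_file():
        return "missing experience_data.json"
    if not db_dir.is_dir():
        return "missing chroma_db_experience/"
    cards = json.loads(json_path.read_text())  # key -> structured card fields
    return f"ok: {len(cards)} experience cards"

# Demo against a throwaway directory holding one fake card:
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "chroma_db_experience").mkdir()
    (root / "experience_data.json").write_text(json.dumps(
        {"card-001": {"bug_description": "stub", "fix_experience": "stub"}}))
    status = check_experience_dir(root)
```

Point it at your real <EXPERIENCE_DATA_DIR> instead of the temporary directory to verify both artifacts are in place.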

3) Start the Experience Server (Terminal A)

cd <MEMGOVERN_ROOT>/data/agentic_exp_data_1220_13w_DSnewPrompt
source <MEMGOVERN_ROOT>/SWE/bin/activate

export DB_DIR="$PWD/chroma_db_experience"
export JSON_DATA_PATH="$PWD/experience_data.json"
export MODEL_PATH="<PATH_OR_MODEL_ID_FOR_SENTENCE_TRANSFORMERS>"
export HOST="0.0.0.0"
export PORT="9030"

python <MEMGOVERN_ROOT>/tools/experience_server.py

How to confirm it is running

In another shell:

curl -s http://localhost:9030/health

You should also see log lines like:

  • [TOOL] /search ...
  • [TOOL] /get_experience ...

when the agent invokes the tools; these log lines are the evidence we use to confirm an end-to-end run.

4) Run SWE-agent with MemGovern config (Terminal B)

Before running, edit config/dsv31t_agenticMemSearch_1220_13w.yaml and replace:

  • agent.model.api_base: YOUR_API_BASE
  • agent.model.api_key: YOUR_API_KEY

cd <MEMGOVERN_ROOT>
source SWE/bin/activate

sweagent run-batch \
  --config config/dsv31t_agenticMemSearch_1220_13w.yaml \
  --instances.type swe_bench \
  --instances.subset verified \
  --instances.split test \
  --num_workers 12 \
  --instances.shuffle=False

About the config → server wiring

config/dsv31t_agenticMemSearch_1220_13w.yaml sets tool endpoints:

  • GRAPH_EXP_SEARCH_URL: http://host.docker.internal:9030/search
  • GRAPH_EXP_READ_URL: http://host.docker.internal:9030/get_experience

This is the recommended setup when SWE-agent runs tasks inside Docker and the Experience Server runs on the host.
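For reference, a host-side smoke test of the /search endpoint could be built like this. The JSON payload fields (query, top_k) are our assumption about the wire format, not a documented MemGovern schema; the URLs mirror the config values above.

```python
import json
import urllib.request

# Endpoint values mirroring the YAML config; inside the SWE-agent container,
# host.docker.internal resolves to the host running the Experience Server.
SEARCH_URL = "http://host.docker.internal:9030/search"
READ_URL = "http://host.docker.internal:9030/get_experience"

def build_search_request(query, top_k=3, url=SEARCH_URL):
    """Prepare a JSON POST; the payload field names are assumed."""
    payload = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})

req = build_search_request("IndexError in pagination helper", top_k=5)
# Once the server is up, send it with urllib.request.urlopen(req).
```

When testing from the host itself rather than from inside Docker, swap host.docker.internal for localhost, matching the curl health check above.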

5) Evaluate (SWE-bench harness)

After the run finishes, evaluate the produced predictions:

python -m swebench.harness.run_evaluation \
  --predictions_path <PATH_TO_PREDS_JSON> \
  --dataset_name princeton-nlp/SWE-bench_Verified \
  --run_id <RUN_ID> \
  --max_workers 8

The predictions file is typically named preds.json under your run’s trajectories/ output directory.

If python -m swebench... is not available in your environment, install the SWE-bench harness following the official SWE-bench instructions.


πŸ› οΈ Tools & Configs

πŸ“Š Data Collection

Scrape GitHub PR data (metadata + patch + comments):

export GITHUB_TOKEN=your_github_token
python scripts/github_scraper.py \
  --csv-path <PATH_TO_INPUT_CSV> \
  --output-dir <OUTPUT_DIR> \
  --chunk-size 200
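The --chunk-size flag suggests batch-wise processing; the generic pattern can be sketched as below (our illustration, not the scraper's internals). With 450 PR rows and a chunk size of 200, the scrape proceeds in three batches, so progress can be checkpointed between chunks.

```python
def chunked(rows, chunk_size):
    """Yield fixed-size batches so a long scrape can checkpoint between them."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

# 450 PR rows with --chunk-size 200 -> batches of 200, 200, and 50.
batches = list(chunked(list(range(450)), 200))
```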

🧾 Experience Governance (Experience Card generation)

We provide experience_process.py to transform issue/PR/patch fields into governed experience cards using an LLM. It reads an input parquet table and writes JSONL/parquet with the Experience Card fields.

export API_KEY=your_llm_key
export BASE_URL=your_llm_base_url   # optional if using OpenAI default
export MODEL=your_model_name

python scripts/experience_process.py \
  --input <INPUT_PARQUET> \
  --output <OUTPUT_JSONL_OR_PARQUET> \
  --output-format jsonl \
  --max-workers 200
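Each output row is one experience card. A minimal sketch of a JSONL record, using the card fields named earlier in this README, looks like this; the real transformation is done by an LLM inside experience_process.py, so this hand-written condenser is purely illustrative.

```python
import json

def to_experience_card(issue_title, patch_summary):
    """Toy condenser: real cards are generated by an LLM, not string joins."""
    return {
        "bug_description": issue_title,
        "fix_experience": patch_summary,
    }

card = to_experience_card("crash on empty config file",
                          "guard against None before parsing the config")
jsonl_line = json.dumps(card)  # one card per line in the JSONL output
```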

🧠 Experience Server

Launch the memory retrieval service (see β€œReproducing MemGovern” above). The server reads these env vars:

  • DB_DIR
  • JSON_DATA_PATH
  • MODEL_PATH
  • HOST (default 0.0.0.0)
  • PORT (default 9030)
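The variable list above maps to a settings loader along these lines (a sketch with the listed defaults; the actual parsing in experience_server.py may differ).

```python
import os

def server_settings(env=None):
    """Read the Experience Server env vars, applying the listed defaults."""
    env = os.environ if env is None else env
    return {
        "db_dir": env["DB_DIR"],                  # required
        "json_data_path": env["JSON_DATA_PATH"],  # required
        "model_path": env["MODEL_PATH"],          # required
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "9030")),
    }

# Example with only the required variables set; HOST/PORT fall back to defaults.
settings = server_settings({
    "DB_DIR": "/data/chroma_db_experience",
    "JSON_DATA_PATH": "/data/experience_data.json",
    "MODEL_PATH": "<PATH_OR_MODEL_ID_FOR_SENTENCE_TRANSFORMERS>",
})
```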

βš™οΈ Config Categories

Config Use Case
config/benchmarks/*.yaml Full benchmark sweeps with different governance settings
config/demo/*.yaml Quick demos with minimal latency
config/human/*.yaml Human evaluation study protocols
config/exotic/*.yaml Ablation: windowed replace, late reproduction

πŸ–ΌοΈ Visuals

Tool Usage

Patch vs Experience Cases

🤝 Contributing

🌟 Join Us in Building Better Code Agents

We welcome contributions of all kinds: new configs, tools, bug fixes, or documentation improvements!

🚀 Ways to Contribute

  • 🐛 Bug Reports: Open an issue
  • 💡 New Configs: Add timestamped YAML files under config/
  • 🔧 New Tools: Extend the tools/ directory with your utilities
  • 📊 Trajectories: Share model runs via Git LFS

Note: Large files (>50 MB) should use Git LFS. Run git lfs ls-files before committing.


πŸ™ Acknowledgments

Special thanks to:


🌐 About QuantaAlpha

QuantaAlpha was founded in April 2025 by researchers from Tsinghua University, Peking University, CAS, CMU, HKUST, and more.

🌟 Our mission: Explore the "quantum" of intelligence and pioneer the "alpha" frontier of agent research.

✨ Research Directions:

  • CodeAgent: End-to-end autonomous task execution
  • DeepResearch: Deep reasoning & retrieval-augmented intelligence
  • Agentic RL: Agent-based reasoning and reinforcement learning
  • Self-evolution: Multi-agent coordination and learning

🔗 Team Homepage: QuantaAlpha
📧 Email: quantaalpha.ai@gmail.com


Star History

Star History Chart


⭐ If MemGovern helps your research, please give us a star!

Made with ❀️ by the QuantaAlpha Team
