SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

[📖 Paper] [🤗 Daily Paper]

🔥 Overview

We introduce SKILL0, an in-context reinforcement learning framework designed for skill internalization.

SKILL0 achieves substantial improvements over the standard RL baseline on ALFWorld and Search-QA.

🗞️ News

2026-04-03: We released our paper and code.

🛠️ Installation

Python environment

conda create -n skillzero python=3.12 -y
conda activate skillzero

pip install vllm==0.10.0
pip install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir
pip install -e .

Log in to Weights & Biases if you use WandB logging (scripts pass trainer.logger=['console','wandb'] in many cases):

export WANDB_API_KEY=your_key_here
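
If WandB logging is enabled, a missing key may only surface once the trainer starts. A minimal pre-flight sketch follows; `require_wandb_key` is a hypothetical helper name and is not part of this repository.

```shell
# Hypothetical pre-flight check (not part of the repo): fail early when
# WandB logging is requested but no API key has been exported.
require_wandb_key() {
  if [ -z "${WANDB_API_KEY:-}" ]; then
    echo "WANDB_API_KEY is not set; export it or run 'wandb login' first" >&2
    return 1
  fi
}

# Example (not executed here):
#   require_wandb_key && bash scripts/train_alfworld_skillzero_3b.sh
```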

Install Supported Environments

1. ALFWorld

Install with pip:

pip3 install gymnasium==0.29.1
pip3 install stable-baselines3==2.6.0
pip3 install alfworld

Download PDDL & Game files and pre-trained MaskRCNN detector (will be stored in ~/.cache/alfworld/):

alfworld-download -f

2. Search

cd ./agent_system/environments/env_package/search/third_party
pip install -e .
pip install gym==0.26.2

Prepare dataset (data will be saved at ~/data/searchR1_processed_direct):

cd repo_root/
python examples/data_preprocess/preprocess_search_r1_dataset.py

Build Retriever environments:

conda create -n retriever python=3.10 -y
conda activate retriever

conda install numpy==1.26.4 
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124

pip install transformers datasets pyserini huggingface_hub
conda install faiss-gpu==1.8.0 -c pytorch -c nvidia -y
pip install uvicorn fastapi

Download the index:

conda activate retriever

local_dir=~/data/searchR1
python examples/search/searchr1_download.py --local_dir $local_dir
cat $local_dir/part_* > $local_dir/e5_Flat.index
gzip -d $local_dir/wiki-18.jsonl.gz
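
The steps above should leave `e5_Flat.index` and `wiki-18.jsonl` in `$local_dir`. A small sanity check can be sketched as follows; the function name `check_retriever_data` is an assumption for illustration, and it only tests that both files exist and are non-empty, not that they are valid.

```shell
# Hypothetical helper (not part of the repo): confirm that the concatenated
# index and the decompressed corpus exist and are non-empty.
check_retriever_data() {
  crd_dir="$1"
  crd_missing=0
  for crd_file in e5_Flat.index wiki-18.jsonl; do
    if [ ! -s "$crd_dir/$crd_file" ]; then
      echo "missing or empty: $crd_dir/$crd_file" >&2
      crd_missing=1
    fi
  done
  return "$crd_missing"
}

# Example (not executed here):
#   check_retriever_data "$HOME/data/searchR1" && echo "retriever data looks complete"
```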

Start the local flat e5 retrieval server:

conda activate retriever

# redirect the output to a file to avoid cluttering the terminal
# we have observed outputting to the terminal causing spikes in server response times
bash examples/search/retriever/retrieval_launch.sh > retrieval_server.log
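
The redirect above captures stdout only and keeps the server in the foreground. If you would rather free the terminal and also capture stderr, one possible sketch is shown here; `run_in_background` and `BG_PID` are hypothetical names, not something the repository provides.

```shell
# Hypothetical wrapper (not part of the repo): start a command in the
# background, send both stdout and stderr to a log file, and remember its PID.
run_in_background() {
  rib_log="$1"
  shift
  nohup "$@" > "$rib_log" 2>&1 &
  BG_PID=$!
}

# Example (not executed here):
#   run_in_background retrieval_server.log bash examples/search/retriever/retrieval_launch.sh
#   tail -f retrieval_server.log   # follow the server log
#   kill "$BG_PID"                 # stop the server when finished
```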

Generate the validation parquet for SkillZero Search:

python -m examples.data_preprocess.generate_search_r1_val

Training

All scripts live under scripts/ and assume the repo root as working directory (they cd there automatically). You can run either:

bash scripts/train_alfworld_skillzero_3b.sh
bash scripts/train_search_skillzero_3b.sh

### Merge checkpoints

See `scripts/model_merger.py` for FSDP/Megatron merge examples using paths under `./checkpoints/...`.

⭐️ Citation

If you find this project useful, please cite our paper.

@misc{lu2026skill0,
      title={SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization}, 
      author={Zhengxi Lu and Zhiyuan Yao and Jinyang Wu and Chengcheng Han and Qi Gu and Xunliang Cai and Weiming Lu and Jun Xiao and Yueting Zhuang and Yongliang Shen},
      year={2026},
      eprint={2604.02268},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2604.02268}, 
}

🤝 Acknowledgement

This project builds on AgentOCR, verl-agent, veRL, ALFWorld, SkillRL, and Search-R1. We thank the authors of those projects.
