
👋 Hi, everyone!
We are ByteDance Seed team.


In-Place Test-Time Training

Seamlessly Endowing LLMs with Test-Time Training Ability

Guhao Feng*, Shengjie Luo*, Kai Hua, Ge Zhang, Wenhao Huang, Di He, Tianle Cai

In-Place TTT is a drop-in test-time training method for Transformer LLMs. This repository provides the training, checkpoint conversion, inference, and evaluation stack built on VeOmni, together with recommended configs for Qwen3-8B and LLaMA-3.1-8B.

News

[2026/03] The codebase is open-sourced.
[2026/02] In-Place TTT is accepted to ICLR 2026 as an Oral presentation.

Table of Contents

  • Introduction
  • Getting Started
  • Training
  • Checkpoint Conversion
  • Evaluation
  • Features
  • License
  • Citation

Introduction

Current large language models follow a static "train then deploy" paradigm. Once deployed, model weights are frozen and cannot adapt to new information encountered during inference. This limits long-context reasoning, where useful information arrives progressively and the model would benefit from updating itself as it reads.

In-Place Test-Time Training (In-Place TTT) addresses this by updating a small subset of model parameters (the MLP down-projection fast weights) during inference. Unlike prior TTT approaches that require architectural side modules or external memory, In-Place TTT stays inside the standard Transformer block and remains compatible with off-the-shelf autoregressive LLMs.

The method is centered around three ideas:

  1. Architectural compatibility. Fast weights live in the existing MLP down-projection matrix, so no extra attention heads or memory modules are introduced.
  2. LM-aligned objective. The fast-weight update is aligned with next-token prediction instead of a generic reconstruction target.
  3. Chunk-wise update. Long sequences are split into chunks so updates can be computed efficiently and scaled to long contexts.
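The chunk-wise mechanic can be illustrated with a deliberately simplified sketch. The snippet below is a scalar toy with a squared-error proxy loss, not the repo's LM-aligned objective or its actual down-projection update; it only shows the idea of adapting a fast weight chunk by chunk as a sequence streams in.

```python
import random

random.seed(0)

# Toy setup: a single "fast weight" w should track an unknown mapping
# (w_true) from hidden states to targets, using one gradient step per chunk.
w_true, w, lr, chunk = 2.0, 0.0, 0.1, 16
hs = [random.gauss(0.0, 1.0) for _ in range(64)]   # stand-in hidden states
ys = [w_true * h for h in hs]                      # targets for the proxy loss

chunk_losses = []
for start in range(0, len(hs), chunk):
    h = hs[start:start + chunk]
    y = ys[start:start + chunk]
    preds = [w * hi for hi in h]                   # forward with current fast weight
    loss = sum((p - yi) ** 2 for p, yi in zip(preds, y)) / len(h)
    chunk_losses.append(loss)
    grad = sum(2 * (p - yi) * hi for p, yi, hi in zip(preds, y, h)) / len(h)
    w -= lr * grad                                 # in-place fast-weight update

# The loss on later chunks is lower because w has adapted while "reading".
print(chunk_losses[0] > chunk_losses[-1])
```

Each chunk is processed with the current fast weight, then the weight is updated once before the next chunk, which is why later chunks see a smaller loss.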

In-Place TTT Method Overview

As used in this repo, the end-to-end workflow is:

  1. Provide your own VeOmni-compatible processed dataset and base model assets.
  2. Launch continual pretraining with VeOmni through train.sh and tasks/train_torch.py.
  3. Export DCP checkpoints into HuggingFace format with scripts/merge_dcp_to_hf.py.
  4. Run TTT-aware inference and RULER evaluation with inference_model/, eval.sh, and eval_config/.

The repository includes recommended training configs for Qwen3-8B and LLaMA-3.1-8B, checkpoint conversion utilities, and a full RULER evaluation pipeline via OpenCompass from 4K to 256K context lengths.

Getting Started

Environment Setup

Step 1. Install PyTorch and FlashAttention:

pip3 install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128

wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
pip3 install flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
rm flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp311-cp311-linux_x86_64.whl

Step 2. Install VeOmni from the validated commit:

pip3 install "veomni @ git+https://github.com/ByteDance-Seed/VeOmni.git@9b91e164bea9e17f17ed490aab5e076c2335ca25"

Step 3. Install the remaining dependencies:

pip3 install liger-kernel
pip3 install byted-wandb torchdata blobfile datasets diffusers tiktoken timm
pip3 install transformers==4.57.3
pip3 install opt_einsum einops

pip3 uninstall -y byted-wandb wandb
pip3 install byted-wandb

Step 4. Optionally verify the installed VeOmni source:

python3 - <<'PY'
import json, pathlib, veomni
p = pathlib.Path(veomni.__file__).resolve().parents[1] / "veomni-0.1.0.dist-info" / "direct_url.json"
print("veomni file:", veomni.__file__)
print("direct_url:", json.loads(p.read_text()) if p.exists() else "not found")
PY

Data Preparation

This repository no longer ships data-processing scripts. Provide your own processed dataset through data.train_path.

The recommended configs assume:

  • data.data_type=plaintext
  • data.datasets_type=iterable
  • data.text_keys=content_split
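For illustration, a dataset matching these settings might be a JSONL file whose records carry the training text under the content_split key (matching data.text_keys). The schema below is an assumption, not the documented VeOmni format; confirm it against the VeOmni dataset docs for your version.

```python
import json
import pathlib
import tempfile

# Hypothetical record shape: one JSON object per line, with the text under
# "content_split". The real schema may differ -- check the VeOmni docs.
records = [
    {"content_split": "First long-context training document ..."},
    {"content_split": "Second long-context training document ..."},
]
path = pathlib.Path(tempfile.mkdtemp()) / "sample.jsonl"
path.write_text("".join(json.dumps(r) + "\n" for r in records))
print(len(path.read_text().splitlines()))  # 2 records
```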

For dataset argument definitions and supported loading modes, refer to the official VeOmni documentation.

Example:

bash train.sh tasks/train_torch.py configs/pretrain/qwen3_longct.yaml \
  --data.train_path /path/to/your_data \
  --train.output_dir /path/to/your_output_dir

Recommended Config

Below is the recommended model config pattern used in the provided Qwen and LLaMA examples.

model:
  model_path: /path/to/your_base_model
  foundation:
    ttt_layers: [0, 6, 12, 18, 24, 30, 36]
    ttt_mode: true
    ttt_proj: true
    ttt_lr: 3
    ttt_chunk: 4096

data:
  train_path: /path/to/your_data
  train_size: 20000000000
  dataloader_type: native
  datasets_type: iterable
  data_type: plaintext
  max_seq_len: 65536
  text_keys: content_split
  drop_last: true

train:
  output_dir: /path/to/your_output_dir
  data_parallel_mode: fsdp2
  global_batch_size: 64
  micro_batch_size: 1
  optimizer: adamw
  lr: 5.0e-6
  lr_warmup_ratio: 0.02
  lr_decay_style: cosine
  lr_decay_ratio: 0.90
  weight_decay: 0.1
  max_grad_norm: 1.0
  max_steps: 5000
  enable_mixed_precision: true
  enable_gradient_checkpointing: true
  enable_full_shard: true
  init_device: meta
  ckpt_manager: dcp
  save_steps: 500
  save_hf_weights: true
  use_wandb: true

The corresponding recommended config files are:

  • configs/pretrain/qwen3_longct.yaml
  • configs/pretrain/llama3_longct.yaml
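A few quantities follow directly from the numbers in the recommended config, assuming (as is conventional, though worth verifying against VeOmni) that the warmup and decay ratios are fractions of max_steps:

```python
# Schedule bookkeeping implied by the recommended train config. The ratio
# semantics are an assumption about VeOmni's scheduler conventions.
max_steps = 5000
warmup_steps = int(max_steps * 0.02)   # lr_warmup_ratio -> 100 warmup steps
decay_steps = int(max_steps * 0.90)    # lr_decay_ratio  -> cosine decay over 4500 steps
accum_per_step = 64 // 1               # global_batch_size / micro_batch_size,
                                       # before dividing across data-parallel ranks
print(warmup_steps, decay_steps, accum_per_step)  # 100 4500 64
```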

Training

Quick smoke run:

bash train.sh tasks/train_torch.py configs/pretrain/qwen3_longct.yaml \
  --train.output_dir /path/to/your_output_dir \
  --train.max_steps 1 \
  --train.use_wandb false

Recommended Qwen config override:

bash train.sh tasks/train_torch.py configs/pretrain/qwen3_longct.yaml \
  --train.wandb_project your_wandb_project \
  --train.wandb_name your_run_name \
  --train.output_dir /path/to/your_output_dir \
  --model.foundation '{"ttt_layers":[0,6,12,18,24,30,36],"ttt_mode":true,"ttt_proj":true,"ttt_lr":3,"ttt_chunk":4096}'

Recommended LLaMA config override:

bash train.sh tasks/train_torch.py configs/pretrain/llama3_longct.yaml \
  --train.wandb_project your_wandb_project \
  --train.wandb_name your_run_name \
  --train.output_dir /path/to/your_output_dir \
  --model.foundation '{"ttt_layers":[0,6,12,18,24,30,36],"ttt_mode":true,"ttt_proj":true,"ttt_lr":3,"ttt_chunk":4096}'
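The --model.foundation override is a JSON object, so one quick sanity check before launching is to parse it the same way (it is assumed here the trainer parses it as plain JSON) and look at what it implies for the recommended 64K sequence length:

```python
import json

# The JSON string passed to --model.foundation in the commands above.
override = ('{"ttt_layers":[0,6,12,18,24,30,36],"ttt_mode":true,'
            '"ttt_proj":true,"ttt_lr":3,"ttt_chunk":4096}')
foundation = json.loads(override)

max_seq_len = 65536  # from the recommended data config
print(len(foundation["ttt_layers"]))           # 7 layers carry fast weights
print(max_seq_len // foundation["ttt_chunk"])  # 16 chunk-wise updates per full-length sequence
```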

Checkpoint Conversion

Convert VeOmni DCP checkpoints into HuggingFace format:

python scripts/merge_dcp_to_hf.py \
  --load-dir /path/to/your_checkpoint_dir

python scripts/merge_dcp_to_hf.py \
  --load-dir /path/to/your_checkpoint_dir \
  --save-dir /path/to/your_hf_checkpoint_dir \
  --model-assets-dir /path/to/your_base_model \
  --shard-size 5000000000
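Before pointing inference at the exported directory, a minimal structural check can catch an incomplete export. The export layout is assumed here to be a standard HuggingFace directory (config.json plus .safetensors shards); the helper is a hypothetical convenience, demonstrated on a synthetic directory rather than a real checkpoint:

```python
import pathlib
import tempfile

def looks_like_hf_export(d: pathlib.Path) -> bool:
    """Minimal structural check: config.json plus at least one safetensors shard."""
    return (d / "config.json").exists() and any(d.glob("*.safetensors"))

# Demonstrate on a synthetic directory standing in for the real export dir.
demo = pathlib.Path(tempfile.mkdtemp())
(demo / "config.json").write_text("{}")
(demo / "model-00001-of-00002.safetensors").write_bytes(b"")
print(looks_like_hf_export(demo))  # True
```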

Evaluation

Run the default RULER evaluation sweep:

bash eval.sh

Single-config smoke run:

CUDA_VISIBLE_DEVICES=0 python3 -c \
  "import inference_model; from opencompass.cli.main import main; import sys; sys.argv=['opencompass','eval_config/ruler_4k.py','--debug']; main()"

To evaluate your own checkpoints, update eval_config/models.py with your model name and HuggingFace checkpoint path.

Features

  • Drop-in TTT for standard Transformers. In-Place TTT updates the MLP down-projection fast weights without introducing extra architectural side modules.
  • LM-aligned fast-weight updates. The optimization target is derived for autoregressive language modeling instead of a generic reconstruction objective.
  • Long-context continual pretraining stack. The repo includes recommended Qwen3-8B and LLaMA-3.1-8B configs built on VeOmni and FSDP2.
  • Checkpoint export path. scripts/merge_dcp_to_hf.py converts VeOmni DCP checkpoints into HuggingFace format.
  • TTT-aware inference and evaluation. inference_model/, eval.sh, and eval_config/ cover inference and RULER evaluation through OpenCompass.
  • Long-context coverage. The evaluation setup spans 4K, 8K, 16K, 32K, 64K, 128K, and includes a 256K config.

License

This project is licensed under the Apache License 2.0.

Citation

If you find this work useful for your research and applications, feel free to give us a star or cite us using:

@inproceedings{feng2026inplace,
  title     = {In-Place Test-Time Training},
  author    = {Feng, Guhao and Luo, Shengjie and Hua, Kai and Zhang, Ge and Huang, Wenhao and He, Di and Cai, Tianle},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026},
  note      = {Oral Presentation},
  url       = {https://openreview.net/forum?id=dTWfCLSoyl}
}

Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.
