# Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments (ICLR 2026)
TMoW is a novel framework that enables embodied agents to dynamically adapt to unseen domains at test time without requiring costly retraining. Unlike traditional Mixture-of-Experts (MoE) architectures with fixed routing functions, TMoW performs test-time training of routing functions by leveraging world models as internal simulators.
- Multi-granular Prototype-based Router: Adapts world model mixtures by comparing input observations with learned prototype representations across different levels of spatial abstraction (from local objects to global scenes)
- Test-time Prototype Refinement: Refines prototypes through weighted interpolation between existing prototypes based on their similarity to the current environment, enabling zero-shot adaptation to unseen domains
- Distilled Mixture-based Model Augmentation: Supports data-efficient creation of new world models by distilling knowledge from existing model mixtures using few-shot demonstrations
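As a rough sketch of the first idea (names and shapes here are illustrative, not the repository's API), prototype-based routing can be pictured as similarity-weighted expert selection, averaged across granularity levels:

```python
import numpy as np

def routing_weights(embeddings, prototypes, temperature=1.0):
    """Illustrative sketch: compute mixture weights by comparing per-level
    input embeddings against per-level domain prototypes.

    embeddings: list of L arrays of shape (d,)   -- one per granularity level
    prototypes: list of L arrays of shape (K, d) -- K domain prototypes per level
    Returns softmax-normalized weights over the K experts.
    """
    scores = np.zeros(prototypes[0].shape[0])
    for e, P in zip(embeddings, prototypes):
        # cosine similarity between the level-l embedding and each prototype
        sims = P @ e / (np.linalg.norm(P, axis=1) * np.linalg.norm(e) + 1e-8)
        scores += sims
    scores /= len(embeddings)                    # average over granularity levels
    z = np.exp(scores / temperature - np.max(scores / temperature))
    return z / z.sum()

# Toy usage: 2 granularity levels, 3 experts, 4-dim embeddings
rng = np.random.default_rng(0)
emb = [rng.normal(size=4) for _ in range(2)]
protos = [rng.normal(size=(3, 4)) for _ in range(2)]
w = routing_weights(emb, protos)
print(w)
```

The averaging over levels is what makes the routing "multi-granular": a domain can match on local object structure even when the global scene differs.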
```
                ┌─────────────────────────────────────┐
                │      Multi-granular Prototype       │
                │           based Router              │
                │  ┌─────┐  ┌─────┐  ┌─────┐          │
Observation ───►│  │ GCN │─►│ GCN │─►│ GCN │─►...     │
+ Instruction   │  └──┬──┘  └──┬──┘  └──┬──┘          │
                │     │        │        │             │
                │  Layer 1  Layer 2  Layer N          │
                │  Routing  Routing  Routing          │
                └─────┼────────┼────────┼─────────────┘
                      │        │        │
                      ▼        ▼        ▼
                ┌─────────────────────────────────────┐
                │   Mixture of World Models (MoW)     │
                │  ┌───────┐ ┌───────┐ ┌───────┐      │
                │  │Expert │ │Expert │ │Expert │ ...  │
                │  │   1   │ │   2   │ │   N   │      │
                │  │(LoRA) │ │(LoRA) │ │(LoRA) │      │
                │  └───────┘ └───────┘ └───────┘      │
                │        Base LLM (Frozen)            │
                └─────────────────────────────────────┘
```
```bash
# Clone the repository
git clone https://github.com/doldam0/tmow.git
cd tmow

# Install dependencies with uv (Python 3.12+)
uv sync
uv pip install -e tmow/environments/virtualhome
```

ALFWorld is automatically installed as a dependency. Make sure to set up the required data:

```bash
export ALFWORLD_DATA=/path/to/alfworld/data
```

First, train individual domain-specific expert models:

```bash
uv run tmow train expert --config configs/train_expert_config.py
```

Train the multi-granular prototype-based router:
```bash
uv run tmow train --config configs/train_mow_config.yaml
```

Evaluate on VirtualHome:

```bash
uv run tmow eval virtualhome \
    --model_path /path/to/mow/model \
    --domain_type seen \
    --task_type seen
```

Evaluate on ALFWorld:

```bash
uv run tmow eval alfworld \
    --model_path /path/to/mow/model \
    --dataset_path /path/to/eval/dataset
```

Expand the model to new domains with few-shot demonstrations:
```bash
uv run tmow expand \
    --config configs/train_mow_config.yaml \
    --datasets /path/to/new/domain/data \
    --num_samples 10 \
    --output_path /path/to/expanded/model
```

```
tmow/
├── tmow/
│   ├── cli/           # Command-line interface
│   ├── common/        # Common utilities (data, graph, trainer)
│   ├── dataset/       # Dataset builders (VirtualHome, ALFWorld, etc.)
│   ├── environments/  # Environment wrappers
│   │   └── virtualhome/  # VirtualHome simulator
│   ├── modules/       # Core model components
│   │   ├── mow.py        # Mixture of World Models
│   │   ├── routers.py    # Graph-based router
│   │   ├── gcn.py        # Graph Convolutional Network
│   │   └── mlp.py        # MLP modules
│   ├── scripts/       # Training and evaluation scripts
│   └── utils/         # Utility functions
├── configs/           # Configuration files
└── tests/             # Unit tests
```
The core model that combines multiple domain-specific world models (implemented as LoRA adapters) with a prototype-based routing mechanism. See `tmow/modules/mow.py`.
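The mixture mechanics can be sketched as a frozen base layer plus a router-weighted sum of low-rank LoRA deltas (an illustrative toy, not the repository's actual forward pass):

```python
import numpy as np

def mixture_forward(x, W_base, lora_experts, weights):
    """Illustrative sketch: frozen base linear layer plus a router-weighted
    mixture of LoRA adapter deltas.

    x:            (d_in,) input activation
    W_base:       (d_out, d_in) frozen base weight
    lora_experts: list of (A, B) pairs, A: (r, d_in), B: (d_out, r)
    weights:      (K,) routing weights over the K experts
    """
    y = W_base @ x                        # frozen base path
    for w_k, (A, B) in zip(weights, lora_experts):
        y = y + w_k * (B @ (A @ x))       # low-rank expert delta, scaled by routing weight
    return y

# Toy usage: 3 rank-2 experts on an 8 -> 6 linear layer
rng = np.random.default_rng(1)
d_in, d_out, r, K = 8, 6, 2, 3
x = rng.normal(size=d_in)
W = rng.normal(size=(d_out, d_in))
experts = [(rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r))) for _ in range(K)]
y = mixture_forward(x, W, experts, np.array([0.2, 0.5, 0.3]))
```

Because only the small `(A, B)` factors differ per domain, adding a new world model costs a tiny fraction of the base LLM's parameters.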
A multi-layer GCN-based router that computes routing scores by comparing input graph embeddings with domain prototypes at multiple granularity levels. See `tmow/modules/routers.py`.
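For readers unfamiliar with graph convolutions, one router layer can be sketched in the standard Kipf–Welling form (a generic GCN layer, not the code in `tmow/modules/gcn.py`):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution layer (Kipf & Welling style), illustrating how
    a scene graph of the observation can be embedded.

    A: (n, n) adjacency matrix of the scene graph
    X: (n, d_in) node features
    W: (d_in, d_out) layer weights
    """
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt     # symmetric normalization
    return np.maximum(A_norm @ X @ W, 0.0)       # ReLU activation

# Toy scene graph: 3 nodes in a chain, one-hot node features
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = gcn_layer(A, np.eye(3), np.ones((3, 2)))
```

Stacking such layers grows each node's receptive field, which is what gives the router its progression from local-object to global-scene granularity.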
During inference, the router can be refined using the `refine_router()` method, which updates prototypes based on similarity to the current environment without requiring gradient updates.
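The gradient-free refinement step can be sketched as a similarity-weighted interpolation of the existing prototypes (function and parameter names here are hypothetical, not the repository's API):

```python
import numpy as np

def refine_prototypes(prototypes, env_embedding, temperature=0.5):
    """Illustrative sketch of gradient-free prototype refinement: form a
    prototype for the current environment as a similarity-weighted
    interpolation of existing domain prototypes.

    prototypes:    (K, d) existing domain prototypes
    env_embedding: (d,) embedding of the current (possibly unseen) environment
    """
    sims = prototypes @ env_embedding / (
        np.linalg.norm(prototypes, axis=1) * np.linalg.norm(env_embedding) + 1e-8
    )
    alpha = np.exp(sims / temperature)
    alpha /= alpha.sum()                 # interpolation coefficients (sum to 1)
    new_proto = alpha @ prototypes       # convex combination of prototypes
    return new_proto, alpha

# Toy usage: interpolate 3 prototypes toward an unseen environment
rng = np.random.default_rng(2)
P = rng.normal(size=(3, 4))
proto, alpha = refine_prototypes(P, rng.normal(size=4))
```

Because the refined prototype is a convex combination, no optimizer state or backward pass is needed at test time, which is what makes zero-shot adaptation cheap.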
TMoW demonstrates significant improvements over baselines:
- 27.21% improvement over SayCanPay in zero-shot adaptation scenarios
- 25.66% gain in few-shot expansion when constructing new world models
Evaluated on:
- VirtualHome
- ALFWorld
- RLBench
- Real-world robotic scenarios
```bibtex
@inproceedings{tmow2026,
  title={Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments},
  author={Jinwoo Jang and Minjong Yoo and Sihyung Yoon and Honguk Woo},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
