Fengyi Wu1,*,
Yifei Dong1,*,
Zhi-Qi Cheng1,†,
Yilong Dai1,
Guangyu Chen1,
Hang Wang2,
Qi Dai3,
Alexander G Hauptmann4
1UW, 2PolyU, 3Microsoft Research, 4CMU
GoViG introduces a new task in embodied AI: generating navigation instructions directly from egocentric visual observations of the initial and goal states. Unlike previous methods that rely on semantic maps or structured annotations, GoViG operates purely on egocentric visual input—making it highly adaptable to unseen and unstructured environments.
GoViG decomposes the instruction generation task into two interconnected subtasks:
- Navigation Visualization: Predicts intermediate visual states that bridge the initial and goal views.
- Instruction Generation with Visual Cues: Synthesizes linguistically coherent and spatially grounded instructions based on both observed and anticipated visuals.
These components are unified within an autoregressive MLLM, trained with tailored objectives to ensure spatial accuracy and linguistic clarity.
Inspired by human navigation behavior, GoViG supports two multimodal reasoning paradigms:
- One-Pass Reasoning: Generates instructions in a single forward pass.
- Interleaved Reasoning: Alternates between visual prediction and language generation for incremental planning.
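The difference between the two paradigms can be sketched with a toy loop. This is a minimal illustration only: `predict_next_view` and `generate_segment` are hypothetical stand-ins for the MLLM's visual-prediction and text-generation calls, and the "views" are simplified to integer positions rather than egocentric images.

```python
def predict_next_view(current_view, goal_view):
    # Stand-in for visual prediction: move the toy 1-D "view" one step toward the goal.
    return current_view + (1 if goal_view > current_view else -1)

def generate_segment(view):
    # Stand-in for instruction generation conditioned on a predicted view.
    return f"move to position {view}"

def one_pass(initial_view, goal_view):
    # One-Pass Reasoning: a single generation conditioned on both endpoints.
    return f"navigate from {initial_view} to {goal_view}"

def interleaved(initial_view, goal_view):
    # Interleaved Reasoning: alternate visual prediction and instruction
    # generation until the predicted view reaches the goal state.
    view, segments = initial_view, []
    while view != goal_view:
        view = predict_next_view(view, goal_view)
        segments.append(generate_segment(view))
    return "; ".join(segments)

print(interleaved(0, 3))
# → "move to position 1; move to position 2; move to position 3"
```

In the actual model both calls are served by the same autoregressive MLLM; the loop above only shows how interleaved reasoning grounds each instruction segment in a freshly predicted intermediate state, whereas one-pass reasoning commits to the full instruction at once.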
To evaluate GoViG, we introduce R2R-Goal, a dataset combining synthetic and real-world trajectories.
```bash
conda create -n GoViG python=3.10
conda activate GoViG
pip install torch==2.4.0
pip install -r requirements.txt --user
```

We release a partial dataset for debugging and for demonstrating the data format; you can find it in `data_samples`. You can access the full dataset here.

```bash
unzip R2R_Goal.zip
bash train.sh
bash eval.sh
```

You can find the detailed metrics calculation in `taskeval_vis.py`.
We would like to thank ANOLE and MVOT for their publicly available codebases, which we referenced during our implementation of Anole training.
More examples of GoViG results on the Real-world Subset of our R2R-Goal dataset.
If you find this repository or our paper useful, please consider starring this repository and citing our paper:
@article{wu2025govig,
title={GoViG: Goal-Conditioned Visual Navigation Instruction Generation},
author={Wu, Fengyi and Dong, Yifei and Cheng, Zhi-Qi and Dai, Yilong and Chen, Guangyu and Wang, Hang and Dai, Qi and Hauptmann, Alexander G},
journal={arXiv preprint arXiv:2508.09547},
year={2025}
}