
F1y1113/GoViG



🧭 GoViG: Goal-Conditioned Visual Navigation Instruction Generation

Fengyi Wu1,*, Yifei Dong1,*, Zhi-Qi Cheng1,†, Yilong Dai1, Guangyu Chen1, Hang Wang2, Qi Dai3, Alexander G Hauptmann4
1UW, 2PolyU, 3Microsoft Research, 4CMU


GoViG introduces a new task in embodied AI: generating navigation instructions directly from egocentric visual observations of the initial and goal states. Unlike previous methods that rely on semantic maps or structured annotations, GoViG operates purely on this egocentric visual input, which makes it adaptable to unseen and unstructured environments.

🔍 Overview


GoViG decomposes the instruction generation task into two interconnected subtasks:

  • Navigation Visualization
    Predicts intermediate visual states that bridge the initial and goal views.

  • Instruction Generation with Visual Cues
    Synthesizes linguistically coherent and spatially grounded instructions based on both observed and anticipated visuals.

These components are unified within an autoregressive MLLM, trained with tailored objectives to ensure spatial accuracy and linguistic clarity.

🧠 Reasoning Strategies

Inspired by human navigation behavior, GoViG supports two multimodal reasoning paradigms:

  • One-Pass Reasoning: Generates instructions in a single forward pass.
  • Interleaved Reasoning: Alternates between visual prediction and language generation for incremental planning.
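The difference between the two paradigms can be sketched as two control loops. This is only an illustration: `predict` and `generate` are hypothetical callables standing in for the MLLM's visual prediction and language generation, the `View` type is a placeholder for an image, and the fixed `max_steps` budget is an assumption. None of these names come from the repository.

```python
from typing import Callable, List

# Hypothetical stand-ins for the MLLM's two capabilities; the names and
# signatures are assumptions for illustration, not the repository's API.
View = str  # placeholder for an image (e.g. a tensor or a file path)
PredictView = Callable[[View, View, List[str]], View]
GenerateText = Callable[[List[View], View, List[str]], str]


def one_pass(initial: View, goal: View, generate: GenerateText) -> str:
    """Produce the instruction with a single generation call."""
    return generate([initial], goal, [])


def interleaved(initial: View, goal: View, predict: PredictView,
                generate: GenerateText, max_steps: int = 3) -> List[str]:
    """Alternate visual prediction and language generation step by step."""
    views: List[View] = [initial]
    steps: List[str] = []
    for _ in range(max_steps):
        # 1) Anticipate the next intermediate view on the way to the goal.
        views.append(predict(views[-1], goal, steps))
        # 2) Describe the move using the observed and anticipated views.
        steps.append(generate(views, goal, steps))
    return steps


# Toy stubs so the sketch runs without a model.
def toy_predict(current: View, goal: View, steps: List[str]) -> View:
    return f"view_{len(steps) + 1}"


def toy_generate(views: List[View], goal: View, steps: List[str]) -> str:
    return f"step {len(steps) + 1}: head from {views[-1]} toward {goal}"


if __name__ == "__main__":
    print(one_pass("start", "goal", toy_generate))
    print(interleaved("start", "goal", toy_predict, toy_generate, max_steps=2))
```

In the interleaved loop each instruction step is conditioned on the views predicted so far, which is what allows incremental planning. The one-pass variant trades that feedback for a single call.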

📦 Dataset: R2R-Goal

To evaluate GoViG, we introduce R2R-Goal, a dataset combining synthetic and real-world trajectories.

Quick Start

conda create -n GoViG python=3.10
conda activate GoViG
pip install torch==2.4.0
pip install -r requirements.txt --user
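After installation, a quick sanity check can confirm that the environment matches the versions pinned above. The helper below is a minimal sketch that relies only on the standard library. The function names are illustrative and not part of the repository.

```python
import sys
from importlib import metadata
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> Dict[str, Optional[str]]:
    """Collect the versions that the setup commands pin."""
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": installed_version("torch"),
    }


if __name__ == "__main__":
    report = check_environment()
    print(report)
    if report["python"] != "3.10":
        print("warning: the setup commands pin python=3.10")
    if report["torch"] is None or not report["torch"].startswith("2.4.0"):
        print("warning: the setup commands pin torch==2.4.0")
```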

Data

We release a partial dataset for debugging and for demonstrating the data format; you can find it in data_samples. You can also access the full dataset here

unzip R2R_Goal.zip
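Since the data layout is not described here, one way to get oriented is to count the files by extension. The snippet below is a generic sketch; `summarize_files` is a hypothetical helper, and the directory you pass (for example `data_samples` or wherever the archive was extracted) is an assumption about your local setup.

```python
from collections import Counter
from pathlib import Path
from typing import Dict


def summarize_files(root: str) -> Dict[str, int]:
    """Count files under root by lowercased extension ('' if there is none)."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return dict(counts)


if __name__ == "__main__":
    # Assumed location; adjust to where the samples or the archive live.
    print(summarize_files("data_samples"))
```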

Training

bash train.sh

Evaluation

bash eval.sh

You can find the detailed metric calculations in taskeval_vis.py.
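For a quick spot check of a generated instruction against a reference, a simple word-overlap score can help. The function below is an illustrative unigram F1 and is an assumption for demonstration only. It is not the metric implemented in taskeval_vis.py, which remains the authoritative source.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of unigram precision and recall (case-insensitive)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared word at most min(count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(unigram_f1("Go up the stairs", "Walk up the stairs"))
```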

Acknowledgement

We would like to thank ANOLE and MVOT for their publicly available codebases, which we referenced when implementing Anole training.

🧭 GoViG Gallery

Columns: Initial View, Goal View, Trajectory (1P), Instructions (1P), Trajectory (Int), Instructions (Int)

Example 1
  • Instructions (1P): Stop in the doorway.
  • Instructions (Int):
    Stop in front of the last door on your right.
    Then take a slight left turn to go towards the bathroom.
    After you leave the kitchen and go through the double doors, keep going and go into the living room.
    Turn left at the first door past the oven and continue down the hallway.
    Go into the powder room that is straight ahead.
    Walk past the bathroom door.

Example 2
  • Instructions (1P): Walk into the bedroom.
  • Instructions (Int):
    Walk out of the bedroom using the door on your right.
    Walk out of bedroom and turn right.
    Leave the bedroom.
    Turn to your right and go outside.
    Exit the room.
    Exit bedroom through doorway on the right.

Example 3
  • Instructions (1P): Across the kitchen.
  • Instructions (Int):
    Exit the kitchen.
    Turn right at the counter.
    Walk past kitchen island.
    Turn past the sink, and in front of the oven to your left.
    Make a left immediately through the kitchenette, then turn right into the hallway.
    Walk past the sink.

Example 4
  • Instructions (1P): Go through the door.
  • Instructions (Int):
    Straight through the bedroom with the lamp.
    Turn left and wait in the doorway.
    Stop in the bedroom doorway.
    Then turn right and wait in bedroom at the end of the hall.
    Stop in the doorway.
    Turn slight left, continue straight. Turn slight left, stop at bed.

Example 5
  • Instructions (1P): Walk out of the kitchen.
  • Instructions (Int):
    Walk through the kitchen stop at the oven.
    Continue walking straight down the kitchen.
    Turn left, walk down the kitchen hallway.
    Turn left and enter kitchen.
    Walk and stop right before washing area.
    Turn right and continue down the hall until you get to a refrigerator.

Example 6
  • Instructions (1P): Walk past the room on the left.
  • Instructions (Int):
    Walk past the door directly across from you.
    Continue straight and continue through a second set of double doors.
    Pass the wall on the right.
    Turn left and enter kitchen.
    Go down the hall into the office on the left.
    Walk to the end of the hall and through the open door.

Example 7
  • Instructions (1P): Walk up stairs, turn right, continue up stairs
  • Instructions (Int):
    Walk up stairs.
    Go up the stairs.
    Walk straight ahead passed the stairs.
    Go up the stairs.
    Go up three steps then wait at the top.
    Go all of the way up the stairs.

Example 8
  • Instructions (1P): Walk past the room on the left.
  • Instructions (Int):
    Stop in entryway of house.
    Stop at sliding barn door.
    Wait near the patio.
    Turn to the front row of couches is showing and walk over to the patio. Wait in the doorway to the patio.
    Walk straight besides the wooden tables.
    Stop when you reach the sliding glass doors.

More examples of GoViG results on the Real-world Subset of our R2R-Goal dataset.


🌟 Citation

If you find this repository or our paper useful, please consider starring this repository and citing our paper:

@article{wu2025govig,
  title={GoViG: Goal-Conditioned Visual Navigation Instruction Generation},
  author={Wu, Fengyi and Dong, Yifei and Cheng, Zhi-Qi and Dai, Yilong and Chen, Guangyu and Wang, Hang and Dai, Qi and Hauptmann, Alexander G},
  journal={arXiv preprint arXiv:2508.09547},
  year={2025}
}

About

Official implementation of paper "GoViG: Goal-Conditioned Visual Navigation Instruction Generation"
