lhy0807/stereonavnet-code
StereoNavNet (SNN)

Learning to Navigate using Stereo Cameras with Auxiliary Occupancy Voxels

Hongyu Li, Taskin Padir, Huaizu Jiang

Northeastern University

[arXiv]

TODO

  • Model code release
  • Dataset release
  • Simulation environment release

Overview

StereoNavNet (SNN) is a visual navigation approach that combines stereo vision with 3D occupancy voxel grids for robust, geometry-aware robot navigation. Unlike semantic approaches that learn navigation directly from raw images, SNN uses an explicit 3D geometric representation as an intermediate step, leading to higher learning efficiency and improved generalizability across environments.

Architecture

SNN follows a modular design with two main components:

Stereo Images (I_L, I_R)
             │
             ▼
┌──────────────────────────┐
│    Perception Module     │  (StereoVoxelNet / SVN)
│    Stereo → 3D Voxels    │
│                          │  Loss: IoU between predicted and GT voxel grids
└────────────┬─────────────┘
             │  Occupancy Grid G (64×64×64)
             ▼
┌──────────────────────────┐
│      Policy Module       │  (3D CNN + MLP)
│  Voxels + Goal → Action  │
│                          │  Loss: MSE between predicted and expert actions
└────────────┬─────────────┘
             │
             ▼
      Action a = (v, ω)
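In code, the flow above amounts to composing the two modules. The sketch below uses illustrative function and argument names (they are not the repository's actual API):

```python
def snn_forward(perception, policy, img_left, img_right, goal):
    """Compose the two SNN modules: stereo images -> occupancy grid -> action.

    `perception` maps a stereo pair to a 64x64x64 occupancy grid G;
    `policy` maps (G, goal) to a velocity command (v, omega).
    """
    grid = perception(img_left, img_right)   # Occupancy Grid G
    action = policy(grid, goal)              # Action a = (v, omega)
    return grid, action
```

Because the occupancy grid is an explicit intermediate, each module can be trained (and debugged) in isolation before the joint fine-tuning stage described below.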

Perception Module (StereoVoxelNet)

Estimates a 3D voxel occupancy grid from stereo RGB image pairs:

  1. Feature Extraction: MobileNet-based backbone extracts features from left and right images
  2. Cost Volume: Constructs a stereo cost volume by interweaving left/right features at multiple disparity levels
  3. 3D Decoder (UNet): A 2D-to-3D encoder-decoder architecture predicts the 64×64×64 occupancy grid
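Step 2 (cost volume construction) can be sketched as follows. This is a minimal NumPy illustration of interweaving left/right features at each disparity level; the exact interleaving scheme and the zero-padding of out-of-view pixels are assumptions, not the repository's implementation:

```python
import numpy as np

def interleaved_cost_volume(feat_l, feat_r, max_disp):
    """Build a stereo cost volume by interweaving left/right feature maps
    at each disparity level (illustrative sketch, see caveats above).

    feat_l, feat_r: (C, H, W) feature maps. Returns (2C, max_disp, H, W):
    even channels hold left features, odd channels hold shifted right features.
    """
    C, H, W = feat_l.shape
    vol = np.zeros((2 * C, max_disp, H, W), dtype=feat_l.dtype)
    for d in range(max_disp):
        # Shift the right feature map by disparity d; columns with no
        # corresponding right-view pixel stay zero.
        vol[0::2, d, :, d:] = feat_l[:, :, d:]
        vol[1::2, d, :, d:] = feat_r[:, :, : W - d]
    return vol
```

The 3D decoder then consumes this (2C, D, H, W) volume to predict the 64×64×64 occupancy grid.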

Policy Module (3D CNN + MLP)

Extracts geometric obstacle features and predicts velocity actions:

  1. 3D CNN Feature Extraction: Four 3D convolutional layers (channels [4, 8, 16, 32], kernel size 4, stride 2) encode the occupancy grid into a 256-dimensional obstacle feature vector h_o
  2. Goal Encoding: The relative goal position is represented as (distance, angle)
  3. Action Prediction: An MLP with layers [258→128→32→8→2] (each with BatchNorm + ReLU) takes concatenated [h_o, goal] and predicts the velocity pair (v, ω)
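A minimal PyTorch sketch of this module is below. The single input channel, the pooling used to reach the 256-dim feature (four stride-2 convs take 64³ to 4³, so some reduction to 2³ is needed for 32·2³ = 256), and the activation-free output head are assumptions, not the repository's code:

```python
import torch
import torch.nn as nn

class PolicyModuleSketch(nn.Module):
    """Illustrative reconstruction of the policy module described above."""

    def __init__(self):
        super().__init__()
        convs, chans = [], [1, 4, 8, 16, 32]
        for cin, cout in zip(chans[:-1], chans[1:]):
            # kernel 4, stride 2, pad 1: 64^3 -> 32^3 -> 16^3 -> 8^3 -> 4^3
            convs += [nn.Conv3d(cin, cout, kernel_size=4, stride=2, padding=1),
                      nn.ReLU(inplace=True)]
        # Pool 4^3 -> 2^3 so the flattened feature is 32 * 2^3 = 256 (assumed)
        self.encoder = nn.Sequential(*convs, nn.AdaptiveAvgPool3d(2), nn.Flatten())
        mlp, dims = [], [258, 128, 32, 8]
        for din, dout in zip(dims[:-1], dims[1:]):
            mlp += [nn.Linear(din, dout), nn.BatchNorm1d(dout), nn.ReLU(inplace=True)]
        mlp.append(nn.Linear(8, 2))  # (v, omega) head; no activation (assumed)
        self.mlp = nn.Sequential(*mlp)

    def forward(self, grid, goal):
        h_o = self.encoder(grid)            # (B, 256) obstacle feature
        x = torch.cat([h_o, goal], dim=1)   # (B, 258) = [h_o, distance, angle]
        return self.mlp(x)                  # (B, 2): velocity pair (v, omega)
```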

Modular Learning

SNN is trained in three stages:

  1. Stage 1 – Train the perception module (SVN) using stereo images and ground-truth voxel occupancy grids (IoU loss), 150 epochs
  2. Stage 2 – Train the policy module (Planner) using ground-truth occupancy as input (MSE loss on expert actions), 50 epochs
  3. Stage 3 – Joint end-to-end fine-tuning with combined loss: L = L_perception + α · L_policy (α = 0.1), 300 epochs
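The two losses and the stage-3 objective can be sketched as follows. Using a soft (differentiable) IoU over occupancy probabilities is an assumption about the exact loss form:

```python
import torch
import torch.nn.functional as F

def soft_iou_loss(pred, gt, eps=1e-6):
    """1 - IoU between a predicted occupancy grid (probabilities in [0, 1])
    and a binary ground-truth grid; soft form so it is differentiable."""
    inter = (pred * gt).sum()
    union = (pred + gt - pred * gt).sum()
    return 1.0 - inter / (union + eps)

def joint_loss(pred_grid, gt_grid, pred_action, expert_action, alpha=0.1):
    """Stage-3 objective: L = L_perception + alpha * L_policy."""
    l_perception = soft_iou_loss(pred_grid, gt_grid)
    l_policy = F.mse_loss(pred_action, expert_action)
    return l_perception + alpha * l_policy
```

With alpha = 0.1 the perception term dominates, which keeps the intermediate occupancy grid faithful to geometry while the policy is fine-tuned end to end.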

File Structure

stereonavnet/
├── __init__.py         # Package init
├── args.py             # Command-line argument definitions
├── dataset.py          # Dataset loading and preprocessing
├── network.py          # Core model definitions (SVN, Planner, SNN)
├── submodule.py        # MobileNet feature extraction, cost volume utilities
├── train.py            # Training script with modular learning
└── stereonavnet/       # Package directory

Usage

Training

Stage 1: Train Perception Module (StereoVoxelNet)

python train.py --stereo_only --epochs 150

Stage 2: Train Policy Module (Planner)

python train.py --planner_only --epochs 50

Stage 3: Joint End-to-End Fine-Tuning (SNN)

python train.py --epochs 300 --alpha 0.1

Testing

python train.py --test

Key Arguments

Argument            Default   Description
------------------  --------  ----------------------------------------------
--stereo_only       False     Train only the perception module (SVN)
--planner_only      False     Train only the policy module with GT occupancy
--epochs            400       Number of training epochs
--batch_size        4         Training batch size
--lr                1e-3      Learning rate (Adam optimizer)
--alpha             0.1       Weight for the velocity loss in joint training
--gpus              1         Number of GPUs
--val_frequency     3         Run validation every N epochs
--dataset_portion   1.0       Fraction of the training data to use
--test              False     Run in test mode

Citation

@INPROCEEDINGS{10802385,
  author={Li, Hongyu and Padır, Taşkın and Jiang, Huaizu},
  booktitle={2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}, 
  title={StereoNavNet: Learning to Navigate using Stereo Cameras with Auxiliary Occupancy Voxels}, 
  year={2024},
  pages={5559-5566},
  doi={10.1109/IROS58592.2024.10802385}}
