NeuroVLA: A Brain-like Embodied Intelligence for Fluid and Fast Reflexive Robotics Control

Weiyu Guo1,4, He Zhang1,4, Pengteng Li1,4, Tiefu Cai1,4, Ziyang Chen1,4, Yandong Guo1,4,
He Xiao4, Yongkui Yang3*, Ying Sun1,2*, Hui Xiong1,2*

1The Thrust of Artificial Intelligence, HKUST (Guangzhou), China
2The Department of CSE, HKUST, Hong Kong, China
3Shenzhen Institutes of Advanced Technology, CAS, China
4AI2Robotics, Shenzhen, China



(a) Fine-grained Manipulation: a precision pouring task demanding high-frequency feedback for stable robotic control.
(b) Temporal Memory: a multi-stage shaking task demonstrating phase tracking and working-memory capabilities.
(c) Reflexive Safety: rapid withdrawal and recovery after an unexpected collision to protect the hardware.

Real-world experiments evaluating precision (Pouring), memory (Shaking), and safety reflexes (Collision Recovery).


📖 Overview

The pursuit of general-purpose embodied intelligence faces a critical sensorimotor paradox: traditional Vision-Language-Action (VLA) models suffer from "temporal blindness" and high latency, leading to action jitter and an inability to react reflexively in dynamic scenarios.

NeuroVLA introduces a bio-inspired, tri-level hierarchical architecture that restores the canonical division of labor found in biological motor systems. Instead of a monolithic processor, NeuroVLA decouples high-level cognition from low-level motor control:

  1. Cortical Module (Vision-Language): Responsible for semantic planning and high-level goal generation.
  2. Cerebellar Module (Adaptive): Functions as a high-frequency adaptive filter to predict sensory consequences and refine timing.
  3. Spinal Module (Spiking Neural Network): Implements asynchronous, localized actuation and fast sensorimotor loops.

By mapping the spinal module to event-driven spiking networks, NeuroVLA exploits temporal sparsity to minimize end-to-end latency, enabling localized, hardware-efficient learning on edge devices.
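The listing below is a minimal conceptual sketch of this tri-level loop, assuming a slow cortical planner, a fast cerebellar correction step, and an event-driven leaky integrate-and-fire (LIF) spinal layer. The module names, update rates, and dynamics here are illustrative placeholders, not the repository implementation.

# Conceptual sketch of the tri-level control loop described above.
# NOT the repository implementation: names, rates, and LIF dynamics are assumptions.
import numpy as np

class CorticalModule:
    """Slow vision-language planner: emits a high-level goal every N control steps."""
    def plan(self, observation):
        # Placeholder for VLM-based goal generation (e.g. a target end-effector pose).
        return np.array([0.3, 0.0, 0.2])

class CerebellarModule:
    """High-frequency adaptive filter: refines the goal using a forward-model error."""
    def __init__(self, gain=0.1):
        self.gain = gain
        self.predicted = None
    def refine(self, goal, proprioception):
        if self.predicted is None:
            self.predicted = proprioception
        error = self.predicted - proprioception            # sensory prediction error
        self.predicted = proprioception + self.gain * (goal - proprioception)
        return goal - self.gain * error                     # timing/trajectory correction

class SpinalModule:
    """Event-driven spiking layer: LIF neurons fire only when the drive is large enough."""
    def __init__(self, n=3, threshold=0.05, leak=0.9):
        self.v = np.zeros(n)
        self.threshold, self.leak = threshold, leak
    def step(self, drive):
        self.v = self.leak * self.v + drive                 # membrane integration
        spikes = (np.abs(self.v) > self.threshold).astype(float)
        self.v = np.where(spikes > 0, 0.0, self.v)          # reset fired neurons
        return spikes * np.sign(drive)                       # sparse motor commands

# One episode of the hierarchical loop: the cortex replans rarely, while the
# cerebellar and spinal stages run at the fast control rate.
cortex, cerebellum, spinal = CorticalModule(), CerebellarModule(), SpinalModule()
state = np.zeros(3)
for t in range(200):
    if t % 50 == 0:                                          # low-rate cortical replanning
        goal = cortex.plan(observation=state)
    drive = cerebellum.refine(goal, proprioception=state) - state
    command = spinal.step(drive)
    state = state + 0.01 * command                           # toy plant integration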

🔥 Key Results

Our experiments on both simulated benchmarks and physical robotic hardware demonstrate distinctive capabilities that cannot be replicated simply by scaling up monolithic VLAs:

  • Kinematic Smoothness (75% Jerk Reduction): The cerebellar module functions as an adaptive filter, effectively suppressing high-frequency intention tremor. This reduces kinematic jerk by over 75%, ensuring fluid execution even with noisy visual feedback (a sketch of the jerk metric follows this list).
  • Survival Reflexes (< 20 ms Latency): Under unexpected physical collisions, the cerebellar-spinal loops trigger rapid withdrawal reflexes in < 20 ms, bypassing the prohibitive latency (> 200 ms) of the cortical loop to protect hardware.
  • Emergent Sparsity: The neuromorphic spinal layer exhibits unsupervised functional self-organization without explicit training signals:
    • Temporal Sparsity: Neurons spontaneously revert to quiescence during static posturing to minimize metabolic cost.
    • Spatial Disentanglement: The network naturally segregates high-dimensional control signals into distinct, somatotopic behavioral modes.
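As a concrete illustration of the smoothness claim, the sketch below computes jerk as the third finite difference of a logged end-effector trajectory and compares a noisy baseline against a filtered one. The RMS-jerk definition and the 50 Hz sampling rate are assumptions made for illustration; the exact metric used in our evaluation may differ.

# Hedged sketch of a jerk-reduction measurement from logged trajectories.
import numpy as np

def rms_jerk(positions, dt):
    """positions: (T, D) array of end-effector positions sampled at interval dt."""
    jerk = np.diff(positions, n=3, axis=0) / dt**3    # third finite difference
    return float(np.sqrt(np.mean(np.sum(jerk**2, axis=1))))

dt = 0.02                                             # assumed 50 Hz control rate
t = np.arange(0, 2, dt)
target = np.stack([np.sin(t), np.cos(t), 0.1 * t], axis=1)
baseline = target + 0.005 * np.random.randn(*target.shape)   # jittery raw VLA output
smoothed = target                                              # cerebellar-filtered output

reduction = 1 - rms_jerk(smoothed, dt) / rms_jerk(baseline, dt)
print(f"jerk reduction: {reduction:.1%}")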

🛠️ Installation

The environment setup is based on standard VLA dependencies. We recommend using conda to manage the environment.

Prerequisites

  • Linux (Ubuntu 20.04/22.04 recommended)
  • Python 3.10+
  • NVIDIA GPU with CUDA support

Environment Setup

# 1. Create a conda environment
conda create -n neurovla python=3.10 -y
conda activate neurovla

# 2. Install PyTorch (Adjust CUDA version as needed)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# 3. Install requirements
pip install -r requirements.txt

# 4. Install FlashAttention2
pip install flash-attn --no-build-isolation

Note: For specific dependency versions and detailed configuration related to the base VLA framework, please refer to the StarVLA Environment Setup Guide. Our implementation builds upon these foundational libraries.
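Before launching training, a quick sanity check can confirm that PyTorch sees the GPU and that FlashAttention2 imports correctly. This snippet is an assumed convenience check, not part of the repository:

# Verify the environment (assumed helper, not shipped with the repo).
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn not installed; FlashAttention2 kernels will be unavailable")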

🚀 Usage

# 1. Run training example
bash NeuroVLA/scripts/run_scripts/run_libero_train_NeuroVLA.sh

# 2. Run evaluation example
bash NeuroVLA/examples/LIBERO/eval_libero.sh

📝 Citation

If you find our code or architecture helpful in your research, please cite our repository:

@misc{guo2025neurovla,
  author = {Guo, Weiyu and Zhang, He and Li, Pengteng and Cai, Tiefu and Chen, Ziyang and Guo, Yandong and Xiao, He and Yang, Yongkui and Sun, Ying and Xiong, Hui},
  title = {NeuroVLA: A Brain-like Embodied Intelligence for Fluid and Fast Reflexive Robotics Control},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {https://github.com/guoweiyu/NeuroVLA}
}

🛡️ License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

This is a strict copyleft license. If you use this software (or a modified version of it) to provide a service over a network, you must make the source code available to the users of that service.

See LICENSE for more details.
