Weiyu Guo1,4, He Zhang1,4, Pengteng Li1,4, Tiefu Cai1,4, Ziyang Chen1,4, Yandong Guo1,4,
He Xiao4, Yongkui Yang3*, Ying Sun1,2*, Hui Xiong1,2*
1The Thrust of Artificial Intelligence, HKUST (Guangzhou), China
2The Department of CSE, HKUST, Hong Kong, China
3Shenzhen Institutes of Advanced Technology, CAS, China
4AI2Robotics, Shenzhen, China
Real-world experiments evaluating precision (Pouring), memory (Shaking), and safety reflexes (Collision Recovery).
The pursuit of general-purpose embodied intelligence faces a critical sensorimotor paradox: traditional Vision-Language-Action (VLA) models suffer from "temporal blindness" and high latency, leading to action jitter and an inability to react reflexively in dynamic scenarios.
NeuroVLA introduces a bio-inspired, tri-level hierarchical architecture that restores the canonical division of labor found in biological motor systems. Instead of a monolithic processor, NeuroVLA decouples high-level cognition from low-level motor control:
- Cortical Module (Vision-Language): Responsible for semantic planning and high-level goal generation.
- Cerebellar Module (Adaptive): Functions as a high-frequency adaptive filter to predict sensory consequences and refine timing.
- Spinal Module (Spiking Neural Network): Implements asynchronous, localized actuation and fast sensorimotor loops.
By mapping the spinal module to event-driven spiking networks, NeuroVLA exploits temporal sparsity to minimize end-to-end latency, enabling localized, hardware-efficient learning on edge devices.
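To make the event-driven idea concrete, here is a minimal, illustrative sketch of a leaky integrate-and-fire (LIF) neuron, the standard building block of spiking networks like the spinal module. This is not NeuroVLA's implementation; the function name and all parameter values (`tau`, `v_threshold`, `v_reset`) are hypothetical defaults chosen for clarity.

```python
# Illustrative sketch only: a discrete-time leaky integrate-and-fire neuron.
# Parameter values are hypothetical, not taken from NeuroVLA.

def lif_step(v, input_current, tau=0.9, v_threshold=1.0, v_reset=0.0):
    """One LIF update: leak, integrate, and fire when the threshold is crossed."""
    v = tau * v + input_current      # leaky integration of the input
    spike = v >= v_threshold         # event-driven output: emit a spike only past threshold
    if spike:
        v = v_reset                  # hard reset of the membrane potential after a spike
    return v, spike

# With zero input the neuron stays quiescent, illustrating temporal sparsity:
v = 0.0
spikes = []
for current in [0.0, 0.0, 1.5, 0.0, 0.0]:
    v, s = lif_step(v, current)
    spikes.append(s)
# spikes -> [False, False, True, False, False]
```

Because computation happens only on spike events, a hardware implementation can idle between spikes, which is the source of the latency and energy advantages claimed above.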
Our experiments on both simulated benchmarks and physical robotic hardware demonstrate distinctive capabilities that scaling monolithic VLAs alone cannot replicate:
- Kinematic Smoothness (75% Jerk Reduction): The cerebellar module functions as an adaptive filter, effectively suppressing high-frequency intention tremor. This reduces kinematic jerk by over 75%, ensuring fluid execution even with noisy visual feedback.
- Survival Reflexes (< 20 ms Latency): Under unexpected physical collisions, the cerebellar-spinal loops trigger rapid withdrawal reflexes in < 20 ms, bypassing the prohibitive latency (> 200 ms) of the cortical loop to protect hardware.
- Emergent Sparsity: The neuromorphic spinal layer exhibits unsupervised functional self-organization without explicit training signals:
- Temporal Sparsity: Neurons spontaneously revert to quiescence during static posturing to minimize metabolic cost.
- Spatial Disentanglement: The network naturally segregates high-dimensional control signals into distinct, somatotopic behavioral modes.
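For intuition on the smoothness metric above, the following sketch shows one common way to estimate kinematic jerk from a sampled joint trajectory: a third-order finite difference of position. The function name, sample traces, and timestep are hypothetical examples, not NeuroVLA's evaluation code.

```python
# Illustrative sketch only: a jerk metric via third finite differences.
# Names and values are hypothetical, not NeuroVLA's evaluation pipeline.

def mean_abs_jerk(positions, dt):
    """Mean absolute third finite difference of a 1-D position trace."""
    jerks = []
    for i in range(len(positions) - 3):
        # third-order forward difference approximates d^3 x / dt^3
        j = (positions[i + 3] - 3 * positions[i + 2]
             + 3 * positions[i + 1] - positions[i]) / dt ** 3
        jerks.append(abs(j))
    return sum(jerks) / len(jerks)

smooth = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]    # constant velocity: near-zero jerk
jittery = [0.0, 0.2, 0.1, 0.4, 0.2, 0.5]   # oscillating trace: high jerk
assert mean_abs_jerk(smooth, dt=0.1) < mean_abs_jerk(jittery, dt=0.1)
```

A "75% jerk reduction" would then correspond to the filtered trajectory scoring a quarter of the unfiltered one under such a metric.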
The environment setup is based on standard VLA dependencies. We recommend using conda to manage the environment.
- Linux (Ubuntu 20.04/22.04 recommended)
- Python 3.10+
- NVIDIA GPU with CUDA support
# 1. Create a conda environment
conda create -n neurovla python=3.10 -y
conda activate neurovla
# 2. Install PyTorch (Adjust CUDA version as needed)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# Install requirements
pip install -r requirements.txt
# Install FlashAttention2
pip install flash-attn --no-build-isolation
Note: For specific dependency versions and detailed configuration related to the base VLA framework, please refer to the StarVLA Environment Setup Guide. Our implementation builds upon these foundational libraries.
# 1. Run training example
bash NeuroVLA/scripts/run_scripts/run_libero_train_NeuroVLA.sh
# 2. Run evaluation example
bash NeuroVLA/examples/LIBERO/eval_libero.sh
If you find our code or architecture helpful in your research, please cite our repository:
@misc{guo2025neurovla,
  author       = {Guo, Weiyu and Zhang, He and Li, Pengteng and Cai, Tiefu and Chen, Ziyang and Guo, Yandong and Xiao, He and Yang, Yongkui and Sun, Ying and Xiong, Hui},
  title        = {NeuroVLA: A Brain-like Embodied Intelligence for Fluid and Fast Reflexive Robotics Control},
  year         = {2025},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {https://github.com/guoweiyu/NeuroVLA}
}
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
This is a strict copyleft license. If you use this software (or a modified version of it) to provide a service over a network, you must make the source code available to the users of that service.
See LICENSE for more details.