HKUST-MINSys-Lab/MMEdge

MMEdge: Accelerating On-device Multimodal Inference via Pipelined Sensing and Encoding

This is the repository for the SenSys 2026 paper "MMEdge: Accelerating On-device Multimodal Inference via Pipelined Sensing and Encoding".

Introduction

This repository provides the public release of MMEdge, a real-time on-device multimodal inference framework based on pipelined sensing and encoding. Unlike traditional multimodal systems that wait for complete sensor inputs before inference, MMEdge decomposes data collection and computation into fine-grained sensing and encoding units, enabling fully pipelined and parallel execution across modalities. To maintain accuracy under this fine-grained design, MMEdge introduces a lightweight temporal aggregation module that preserves temporal continuity across units. It further incorporates an adaptive multimodal configuration optimizer that dynamically selects optimal sensing and model configurations under latency constraints, and a cross-modal speculative skipping mechanism that bypasses redundant computations when early predictions reach high confidence. MMEdge achieves up to 75.8% reduction in end-to-end latency while maintaining comparable accuracy on multiple public datasets (LRW, NuScenes-QA) and a real-world UAV multimodal testbed, demonstrating its effectiveness for low-latency on-device multimodal perception and reasoning.

✅ Paper: https://arxiv.org/abs/2510.25327v5
✅ Demo Video: https://www.youtube.com/watch?v=n36M9ho2z9o

System Overview

MMEdge introduces three core modules to accelerate on-device multimodal inference:

[Figure: MMEdge system overview]

  • Pipelined Sensing and Encoding: Decomposes sensing and inference into fine-grained units (e.g., frames or chunks), allowing parallel data collection and feature encoding to reduce idle time.

  • Adaptive Multimodal Configuration: Dynamically selects the optimal sensing and model configurations under latency constraints using a lightweight accuracy predictor and pre-profiled latency table.

  • Cross-Modal Speculative Skipping: Enables early inference termination by leveraging faster modalities’ features, skipping redundant processing of slower modalities when confidence is sufficient.
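To make the benefit of the first module concrete, here is a minimal timing model rather than MMEdge code. It assumes a single encoder, a fixed per-unit sensing time, and a fixed per-unit encoding time; the function names and the numbers are illustrative only.

```python
def sequential_latency(num_units, sense_ms, encode_ms):
    """Wait for all units to be sensed, then encode them one by one."""
    return num_units * sense_ms + num_units * encode_ms


def pipelined_latency(num_units, sense_ms, encode_ms):
    """Encode each unit as soon as it is sensed, overlapping with later sensing."""
    encoder_free_at = 0
    for i in range(num_units):
        arrival = (i + 1) * sense_ms           # unit i is fully sensed at this time
        start = max(arrival, encoder_free_at)  # wait for the data and for the encoder
        encoder_free_at = start + encode_ms
    return encoder_free_at


print(sequential_latency(5, 20, 10))  # 150
print(pipelined_latency(5, 20, 10))   # 110
```

In this toy setting only the encoding of the last unit remains after sensing ends, as long as encoding a unit is no slower than sensing it.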

Quick Start

Offline Stage

1. Installation

cd Offline
pip install -r requirements.txt

2. Dataset preparation

  • This repository uses the audio-visual speech recognition task as its example. Please download the Lip Reading in the Wild (LRW) dataset.
  • Place the dataset in both ./Offline/data and ./Online/data.

3. Train multimodal models

  1. Train video models
    bash scripts/train_video.sh
  2. Train audio models
    bash scripts/train_audio.sh
  3. Train fusion models
    bash scripts/train_fusion.sh

4. Train accuracy predictor

  1. Generate accuracy table
    cd Offline
    python make_accuracy_table.py
  2. Train accuracy predictor
    python train_accuracy_predictor.py
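As a rough illustration of how an accuracy predictor and a pre-profiled latency table could be combined at run time, the sketch below picks the feasible configuration with the highest predicted accuracy. It is not the repository's optimizer: `select_config`, the configuration labels, and all numbers are hypothetical, and the predictor is replaced by a plain lookup.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical configuration label: (video setting, audio setting).
Config = Tuple[str, str]


def select_config(
    latency_table: Dict[Config, float],
    predict_accuracy: Callable[[Config], float],
    latency_budget_ms: float,
) -> Optional[Config]:
    """Return the config with the best predicted accuracy within the budget."""
    feasible = [c for c, lat in latency_table.items() if lat <= latency_budget_ms]
    if not feasible:
        return None  # no configuration meets the latency constraint
    return max(feasible, key=predict_accuracy)


# Made-up profiling results and predicted accuracies, for illustration only.
latency_table = {
    ("low_res", "short_chunk"): 40.0,
    ("low_res", "long_chunk"): 60.0,
    ("high_res", "long_chunk"): 95.0,
}
predicted = {
    ("low_res", "short_chunk"): 0.80,
    ("low_res", "long_chunk"): 0.85,
    ("high_res", "long_chunk"): 0.90,
}

print(select_config(latency_table, predicted.get, latency_budget_ms=70.0))
```

An exhaustive search like this is only practical for a small configuration space; a larger space would need pruning or a smarter search.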

5. Train cross-modal speculative skipping model

python train_gating.py
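The idea behind speculative skipping can be sketched as follows. Note the substitution: the repository trains a gating model for this decision, whereas this sketch uses a plain softmax-confidence threshold as a stand-in. `speculative_predict`, `run_slow_path`, and the threshold value are hypothetical names and settings.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def speculative_predict(
    fast_logits: Sequence[float],
    run_slow_path: Callable[[], int],
    threshold: float = 0.9,
) -> Tuple[int, bool]:
    """Return (predicted class, skipped), where skipped means the slow path was bypassed."""
    probs = softmax(fast_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return best, True          # confident early prediction: skip the slow modality
    return run_slow_path(), False  # otherwise fall back to the full multimodal path


# The fast modality is confident here, so the slow path is never called.
print(speculative_predict([5.0, 0.0, 0.0], run_slow_path=lambda: 2))
```

Passing the slow path as a callable keeps the expensive computation lazy, so it runs only when the early prediction is not confident enough.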

Online Stage

1. Installation

cd Online
pip install -r requirements.txt

Download the checkpoints and dataset to the device and save them at ./Online/checkpoints and ./Online/data, respectively.

2. Run the data collection module to simulate real-time streaming sensor data during inference.

python data_collection_simulation.py
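For readers unfamiliar with the idea, replaying recorded data as a live stream can be sketched as below. This is not the logic of data_collection_simulation.py, which is not shown here; `replay_stream`, its parameters, and the sampling rate are assumptions made for illustration.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def replay_stream(
    samples: Iterable[T],
    rate_hz: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[float, T]]:
    """Yield (timestamp_s, sample) pairs paced at the given sampling rate."""
    period = 1.0 / rate_hz
    start = time.monotonic()
    for i, sample in enumerate(samples):
        delay = (start + i * period) - time.monotonic()
        if delay > 0:
            sleep(delay)  # wait until this sample would have been captured
        yield i * period, sample


# Replay three recorded frames as if they arrived at 4 Hz.
for timestamp, frame in replay_stream(["frame0", "frame1", "frame2"], rate_hz=4.0):
    print(timestamp, frame)
```

Pacing against absolute target times, rather than sleeping a fixed period per sample, keeps timing errors from accumulating over a long replay.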

3. Run end-to-end inference

python main.py

Citation

Please consider citing our paper if you use the code or data in your research project.

@article{huang2025mmedge,
  title={MMEdge: Accelerating On-device Multimodal Inference via Pipelined Sensing and Encoding},
  author={Huang, Runxi and Yu, Mingxuan and Tsoi, Mingyu and Ouyang, Xiaomin},
  journal={arXiv preprint arXiv:2510.25327},
  year={2025}
}

About

[SenSys 2026] "MMEdge: Accelerating On-device Multimodal Inference via Pipelined Sensing and Encoding" by Runxi Huang, Mingxuan Yu, Mingyu Tsoi and Xiaomin Ouyang
