Official implementation of Urban Socio-Semantic Segmentation with Vision-Language Reasoning.
Abstract: This paper introduces SocioSeg, an Urban Socio-Semantic Segmentation dataset comprising satellite imagery, digital maps, and pixel-level labels of social semantic entities organized in a hierarchical structure. We further propose SocioReasoner, a novel vision-language reasoning framework that simulates the human process of identifying and annotating social semantic entities via cross-modal recognition and multi-stage reasoning. Because this process is non-differentiable, we optimize it with reinforcement learning, eliciting the reasoning capabilities of the vision-language model. Experiments demonstrate significant gains over state-of-the-art models and strong zero-shot generalization.
- Code: Available in this repository.
- Dataset and Model: Undergoing review (Link coming soon).
- OS: A Linux distribution with CUDA support
- Hardware: At least 4x NVIDIA H20 (or A100 80GB) GPUs
- Framework: This repository is built on ROLL; follow the installation instructions below.
```bash
conda create -n socioseg python=3.10 -y
conda activate socioseg
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0
pip install -r requirements.txt
pip install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir
pip install 'transformer-engine[pytorch]==2.2.0' deepspeed==0.16.4 vllm==0.8.4 --no-build-isolation
pip install -e .
```

Please download the dataset and update `actor_train.data_args.file_name` and `validation.data_args.file_name` in `examples/train/rlvr_megatron.yaml`.
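The two fields are nested under `actor_train` and `validation`; a minimal sketch of the relevant part of the config is below (only the two `file_name` keys are prescribed above — the example paths and file names are placeholders for your downloaded data):

```yaml
# Sketch of examples/train/rlvr_megatron.yaml -- only the two file_name keys
# are confirmed by the instructions above; the paths are hypothetical.
actor_train:
  data_args:
    file_name: /path/to/SocioSeg/train_data   # your downloaded training split
validation:
  data_args:
    file_name: /path/to/SocioSeg/val_data     # your downloaded validation split
```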
```bash
sh examples/train/train.sh
```

The trained model will be saved in `./output/train/checkpoint/`.
Please download the dataset and the pretrained model (or use a model you trained yourself), and update `actor_train.data_args.file_name` and `pretrain` in `examples/infer/rlvr_megatron.yaml`.
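As above, a minimal sketch of the fields to edit (only the key names come from the instructions above; the paths and the exact placement of `pretrain` within the file are assumptions):

```yaml
# Sketch of examples/infer/rlvr_megatron.yaml -- only the key names are
# confirmed above; paths and the placement of pretrain are assumptions.
pretrain: /path/to/SocioReasoner/checkpoint   # downloaded or self-trained model
actor_train:
  data_args:
    file_name: /path/to/SocioSeg/test_data    # evaluation split
```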
```bash
sh examples/infer/infer.sh
```

The evaluation and visualization results will be saved in `./output/infer/result/`.
```bibtex
@article{wang2026socioreasoner,
  title={Urban Socio-Semantic Segmentation with Vision-Language Reasoning},
  author={Yu Wang and Yi Wang and Rui Dai and Yujie Wang and Kaikui Liu and Xiangxiang Chu and Yansheng Li},
  journal={arXiv preprint arXiv:2601.10477},
  year={2026}
}
```
