
[feat] support Qwen3.5#172

Merged
JinhuiYE merged 2 commits into starVLA:starVLA from LiamLian0727:starVLA
Mar 3, 2026

Conversation

@LiamLian0727
Contributor

This PR includes two updates:

  1. Integrated Qwen3.5 into the current pipeline.
    The official Qwen team has just released weights for Qwen3.5 0.8B, 2B, 4B, and 9B, so it is time to integrate Qwen3.5 into starVLA. I have tested Qwen3.5-4B with QwenGR00T on SimplerEnv (global batch = 128); training loss behaves normally during the first 500 steps.
    [image: training loss curve, first 500 steps]
  2. Improved LangForce text descriptions.
    Added and updated several textual descriptions for LangForce to make prompts and related outputs clearer.

@JinhuiYE
Contributor

JinhuiYE commented Mar 2, 2026

Thank you so much for this contribution! Integrating Qwen3.5 into the pipeline is indeed timely and valuable. It's great to hear that the training loss looks normal in the first 500 steps.

Could you possibly run the training for a few more steps (e.g., 5k or 10k) and share some initial results (like loss curves or preliminary evaluation on SimplerEnv) in this PR? That would be super helpful for the community to get a quick sense of how Qwen3.5 performs with our framework.

Thanks again for your work!

@LiamLian0727
Contributor Author

Of course, I'd be happy to do so. I'll share some of the latest results once tomorrow's experiments are complete.

@JinhuiYE JinhuiYE merged commit d1f7196 into starVLA:starVLA Mar 3, 2026
@LiamLian0727
Contributor Author

LiamLian0727 commented Mar 5, 2026

I trained QwenGR00T-Qwen3.5-4B on the standard Bridge and Fractal datasets.

Training Configuration:

  • Hardware: 8 × H100 GPUs
  • Batch Size: 16 per GPU (global = 128)
  • Steps: 100k
  • Final Loss: Converged to approximately 0.05
[image: training loss curve]

SimplerEnv Evaluation Results:
Below are the specific success rates obtained in SimplerEnv. I hope these results are helpful to the community:

| Task | Success Rate |
| --- | --- |
| StackGreenCubeOnYellowCubeBakedTexInScene | 0.375 |
| PutCarrotOnPlateInScene | 0.625 |
| PutSpoonOnTableClothInScene | 0.75 |
| PutEggplantInBasketScene | 0.875 |

Note: Each task was evaluated over the standard 120 episodes. Given the stochasticity inherent in SimplerEnv, some fluctuation in these numbers is expected.
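For readers skimming the numbers, the global batch size and the mean success rate over the four tasks can be sanity-checked in a couple of lines (a minimal sketch; the values are copied from this comment, nothing is measured here):

```python
# Quick arithmetic check on the reported training and evaluation numbers.
per_gpu_batch = 16
num_gpus = 8
global_batch = per_gpu_batch * num_gpus  # 16 x 8 = 128

success_rates = {
    "StackGreenCubeOnYellowCubeBakedTexInScene": 0.375,
    "PutCarrotOnPlateInScene": 0.625,
    "PutSpoonOnTableClothInScene": 0.75,
    "PutEggplantInBasketScene": 0.875,
}
mean_success = sum(success_rates.values()) / len(success_rates)

print(global_batch)  # 128
print(mean_success)  # 0.65625
```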

💡 Performance Tips:
When using Qwen3.5, you can achieve significant acceleration by installing flash-linear-attention and causal_conv1d. Below is the software environment configuration I used:

torch==2.6.0+cu124
triton==3.2.0
flash-attn==2.7.4.post1
flash-linear-attention==0.3.2
causal_conv1d==1.5.0.post8
transformers==5.3.0

fderxs pushed a commit to fderxs/starVLA that referenced this pull request Mar 10, 2026
* feat(vlm): add Qwen3.5 model integration

* docs(LangForce): updated several textual descriptions for LangForce
hjxwhy pushed a commit to hjxwhy/starVLA that referenced this pull request Mar 19, 2026
* feat(vlm): add Qwen3.5 model integration

* docs(LangForce): updated several textual descriptions for LangForce
@HaronW
Contributor

HaronW commented Mar 25, 2026

@LiamLian0727 thank you for your work. I am working on adding Qwen3.5 with FAST tokens. This works well with starVLA/model/modules/vlm/tools/add_qwen_special_tokens/add_special_tokens_to_qwen.py. However, the transformers AutoProcessor fails to load physical-intelligence/fast. I have tried transformers==5.2.0 and 5.3.0, since Qwen3.5 requires >=5.2.0. Both show this error message:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "../lib/python3.10/site-packages/transformers/models/auto/processing_auto.py", line 394, in from_pretrained
    return processor_class.from_pretrained(
  File "../lib/python3.10/site-packages/transformers/processing_utils.py", line 1402, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, processor_dict, **kwargs)
  File "../lib/python3.10/site-packages/transformers/processing_utils.py", line 1516, in _get_arguments_from_pretrained
    tokenizer = cls._load_tokenizer_from_pretrained(
  File "../lib/python3.10/site-packages/transformers/processing_utils.py", line 1469, in _load_tokenizer_from_pretrained
    tokenizer = auto_processor_class.from_pretrained(
  File "../lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 725, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "../lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1712, in from_pretrained
    return cls._from_pretrained(
  File "../lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1900, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "../lib/python3.10/site-packages/transformers/tokenization_utils_tokenizers.py", line 274, in __init__
    raise ValueError(
ValueError: Couldn't instantiate the backend tokenizer from one of: 
(1) a `tokenizers` library serialization file, 
(2) a slow tokenizer instance to convert or 
(3) an equivalent slow tokenizer class to instantiate and convert. 
You need to have sentencepiece or tiktoken installed to convert a slow tokenizer to a fast one.

I have installed sentencepiece and tiktoken and tried:

from transformers import AutoProcessor
p = AutoProcessor.from_pretrained("../physical-intelligence/fast", trust_remote_code=True, use_fast=False)
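Since the traceback ends with "You need to have sentencepiece or tiktoken installed", one thing worth ruling out is that pip installed those packages into a different environment than the one running the script. A stdlib-only check (a debugging sketch, not a fix for the error itself):

```python
# Verify that the backend converters transformers looks for are importable
# in the *current* interpreter, without importing or downloading anything.
import importlib.util

for backend in ("sentencepiece", "tiktoken", "tokenizers"):
    found = importlib.util.find_spec(backend) is not None
    print(f"{backend}: {'found' if found else 'MISSING'}")
```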

I am guessing this might be a bug in transformers. Have you encountered the same error?
