[GRPO] Sequence-level TIS + MIS #4493

### Feature request

By default, when vLLM is used for rollouts, we apply [Token-level Truncated Importance Sampling (TIS)](https://huggingface.co/docs/trl/main/en/paper_index#truncated-importance-sampling) to mitigate the training–inference mismatch. I propose making **sequence-level importance sampling** the default instead, paired with either truncation (TIS) or masking (MIS).

### Motivation

This is motivated by the following studies:

1. [Masked Importance Sampling (MIS)](https://ringtech.notion.site/icepop), introduced by the Ling Team in *IcePop*, takes a more conservative approach by completely masking noisy gradient updates. Empirically, they found MIS to be more stable than TIS.

2. The [Qwen Team’s analysis](https://yingru.notion.site/When-Speed-Kills-Stability-Demystifying-RL-Collapse-from-the-Training-Inference-Mismatch-271211a558b7808d8b12d403fd15edda) of the training–inference mismatch provides deeper theoretical insight. In Section 4.2.1, they show that while token-level importance sampling has lower variance than sequence-level methods, it is a biased and theoretically unsound estimator. They propose instead using sequence-level importance sampling and experimentally compare truncated (TIS) and masked (MIS) variants at both token and sequence levels, finding *Seq-MIS* to be the most stable and effective approach.

3. The strength of sequence-level masking (Seq-MIS) is further supported by recent work in [*Defeating the Training–Inference Mismatch via FP16*](https://arxiv.org/abs/2510.26788) (Sections 4.1–4.2).
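
To make the four variants compared above concrete, here is a minimal pure-Python sketch for a single completion. It is not TRL's implementation: the function name `importance_weights` and its `level`, `mode`, and `cap` parameters are hypothetical, the sequence-level ratio is assumed to be the product of the per-token ratios (some implementations length-normalize it instead), and only an upper threshold is applied, whereas the cited works may use different or two-sided bounds.

```python
import math


def importance_weights(train_logps, rollout_logps, level="sequence", mode="mask", cap=2.0):
    """Per-token correction weights for one sampled completion.

    train_logps / rollout_logps: log-probabilities of the sampled tokens under
    the training policy and the rollout (e.g. vLLM) policy, respectively.
    All names and defaults here are illustrative assumptions.
    """
    if len(train_logps) != len(rollout_logps):
        raise ValueError("log-prob lists must have the same length")

    # log(pi_train / pi_rollout) for each sampled token
    log_ratios = [t - r for t, r in zip(train_logps, rollout_logps)]

    if level == "token":
        # One ratio per token.
        ratios = [math.exp(d) for d in log_ratios]
    elif level == "sequence":
        # One ratio for the whole completion, shared by every token.
        seq_ratio = math.exp(sum(log_ratios))
        ratios = [seq_ratio] * len(log_ratios)
    else:
        raise ValueError(f"unknown level: {level!r}")

    if mode == "truncate":
        # TIS: clip large ratios to the cap, keeping the update.
        return [min(r, cap) for r in ratios]
    if mode == "mask":
        # MIS: drop the update entirely when the ratio exceeds the cap.
        return [r if r <= cap else 0.0 for r in ratios]
    raise ValueError(f"unknown mode: {mode!r}")


# Toy values: the second token is much more likely under the training policy.
train_logps = [-1.0, -0.5, -2.0]
rollout_logps = [-1.2, -1.5, -2.1]

token_tis = importance_weights(train_logps, rollout_logps, level="token", mode="truncate")
token_mis = importance_weights(train_logps, rollout_logps, level="token", mode="mask")
seq_tis = importance_weights(train_logps, rollout_logps, level="sequence", mode="truncate")
seq_mis = importance_weights(train_logps, rollout_logps, level="sequence", mode="mask")
```

With these toy values and a cap of 2.0, token-level masking zeroes only the second token, while sequence-level masking zeroes the whole completion because the product of the token ratios exceeds the cap. That is the sense in which Seq-MIS is the more conservative choice.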

