CI fails with dev dependencies: https://github.com/huggingface/trl/actions/runs/18493152127/job/52691262212
TypeError: 'NoneType' object is not subscriptable
FAILED tests/test_modeling_geometric_mixture_wrapper.py::TestGeometricMixtureWrapper::test_prepare_inputs_for_generation - TypeError: 'NoneType' object is not subscriptable
Stacktrace:
> inputs = self.wrapper.prepare_inputs_for_generation(input_ids, attention_mask=attention_mask, use_cache=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tests/test_modeling_geometric_mixture_wrapper.py:65:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
trl/models/modeling_base.py:717: in prepare_inputs_for_generation
model_inputs = self.model.prepare_inputs_for_generation(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/transformers/generation/utils.py:613: in prepare_inputs_for_generation
inputs_embeds, input_ids = self._cache_dependant_input_preparation(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = Qwen2ForCausalLM(
(model): Qwen2Model(
(embed_tokens): Embedding(151665, 8)
(layers): ModuleList(
(0-1...-06)
(rotary_emb): Qwen2RotaryEmbedding()
)
(lm_head): Linear(in_features=8, out_features=151665, bias=False)
)
input_ids = tensor([[1, 2, 3, 4, 5]], device='cuda:0'), inputs_embeds = None
cache_position = None
def _cache_dependant_input_preparation(
self,
input_ids: torch.LongTensor,
inputs_embeds: Optional[torch.FloatTensor],
cache_position: Optional[torch.LongTensor],
) -> tuple[torch.FloatTensor, torch.LongTensor]:
"""
Generic cache-dependent input preparation
The code is put in a separate function to allow granular unit testing
as it needs a different implementation to be exportable.
If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
- Exception 1: when passing input_embeds, input_ids may be missing entries
- Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
- Exception 3: with synced GPUs cache_position may go out of bounds, but we only want dummy token in that case.
- Exception 4: If input_embeds are passed then slice it through `cache_position`, to keep only the unprocessed tokens and
generate the first token for each sequence. Later use the generated Input ids for continuation.
The current implementation does not rely on ``self`` and could be
a class method. It is left as a standard method to be easily rewritten.
"""
if is_torchdynamo_exporting():
return self._cache_dependant_input_preparation_exporting(input_ids, inputs_embeds, cache_position)
if inputs_embeds is not None and input_ids.shape[1] == 0: # Exception 4
inputs_embeds = inputs_embeds[:, -cache_position.shape[0] :]
elif (
inputs_embeds is not None # Exception 1
> or (cache_position[-1] >= input_ids.shape[1]) # Exception 3
^^^^^^^^^^^^^^^^^^
):
E TypeError: 'NoneType' object is not subscriptable
.venv/lib/python3.12/site-packages/transformers/generation/utils.py:509: TypeError
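The traceback shows `cache_position = None` reaching the comparison `cache_position[-1] >= input_ids.shape[1]`, which is what raises the TypeError. Below is a minimal sketch of that failure mode using plain Python lists instead of tensors. The helper names `position_out_of_bounds` and `default_cache_position` are hypothetical and exist only for illustration; the fallback to positions `0..input_len-1` is an assumption about one possible guard, not the fix adopted in transformers or TRL.

```python
from typing import Optional, Sequence


def position_out_of_bounds(input_len: int, cache_position: Optional[Sequence[int]]) -> bool:
    """Plain-Python stand-in for the comparison marked 'Exception 3' in the traceback."""
    # With cache_position=None, the subscript below raises
    # TypeError: 'NoneType' object is not subscriptable
    return cache_position[-1] >= input_len


def default_cache_position(input_len: int, cache_position: Optional[Sequence[int]] = None) -> list[int]:
    """Hypothetical guard: fall back to positions 0..input_len-1 when none is given."""
    if cache_position is None:
        return list(range(input_len))
    return list(cache_position)


# Reproduce the error seen in CI: five input tokens, no cache_position.
try:
    position_out_of_bounds(5, None)
except TypeError as exc:
    error_message = str(exc)

# With the hypothetical guard applied first, the comparison no longer fails.
positions = default_cache_position(5)
guarded = position_out_of_bounds(5, positions)
```

In the failing test, `prepare_inputs_for_generation` is called with only `input_ids`, `attention_mask` and `use_cache=True`, so no `cache_position` is supplied by the caller; whether the guard belongs in the test, in the TRL wrapper, or upstream is left open here.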