Fix TypicalLogitsWarper tensor OOB indexing edge case #26579
gante merged 3 commits into huggingface:main
Conversation
I'm going to be honest - I don't fully understand the code here! The array `last_ind` is the sum of a boolean array, which should be strictly nonnegative, because boolean arrays only contain 0 and 1 values. Therefore, I don't understand why the original line guarded against negative values at all. Do you know why this is necessary in the first place?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@Rocketknight1 good point, I hadn't considered whether the existing check was valid/redundant; I guess it's always been there. It would make more sense if the prior line was `last_ind = (cumulative_probs < self.mass).sum(dim=1) - 1`.
I guess this will actually mean a subtle change in behaviour, but I'm fairly sure it's what was originally intended. Not sure whether this is ok though w.r.t. transformers policies around this kind of thing... |
cc @gante here - I think you might know better than me what the code is doing! |
```diff
- last_ind.clamp_(0)
+ last_ind.clamp_(min=0)
```
nit: let's make it more clear that it is a minimum clamp
I still don't understand how negative values get in there, though!
@Rocketknight1 After the change, `(cumulative_probs < self.mass).sum(dim=1)` can be 0, which added to -1 turns into a negative value. The cumsum vector at index N already contains the value of the tensor at index N, so if the first member of `cumulative_probs` is larger than `self.mass`, a negative value will come out of here :D
Before the change, it seemed to be a redundant op, yes 😂
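The negative-index scenario described above can be reproduced in a few lines. This is a hypothetical standalone sketch with made-up probability values, not the actual transformers source:

```python
import torch

# If the very first cumulative probability already exceeds the mass
# threshold, the boolean mask sums to 0, and subtracting 1 yields a
# negative index that would be out of bounds without a clamp.
mass = 0.99
sorted_probs = torch.tensor([[0.995, 0.004, 0.001]])
cumulative_probs = sorted_probs.cumsum(dim=-1)  # tensor([[0.9950, 0.9990, 1.0000]])

last_ind = (cumulative_probs < mass).sum(dim=1) - 1
print(last_ind)  # tensor([-1])

last_ind.clamp_(min=0)  # the explicit minimum clamp suggested in review
print(last_ind)  # tensor([0])
```

With the clamp in place, index 0 (the single highest-probability token) is kept, which is the intended behaviour when one token alone carries the whole mass.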
That makes sense, lol. Just making sure I wasn't missing something incredibly obvious!
This can be triggered fairly quickly with low precision, e.g. `bfloat16` and `typical_p=0.99`.
Force-pushed from 6ad5ee6 to 0094d0e
Thanks @gante @Rocketknight1, I've now rebased and added a commit with the explicit `min=0`.
Thank you for the fix @njhill 💪
A recent PR #26579 fixed an edge-case out-of-bounds tensor indexing error in TypicalLogitsWarper, and a related behaviour change was made that we thought fixed a long-standing bug w.r.t. the token inclusion cutoff. However, after looking more closely, I am pretty certain that the original logic was correct and that the OOB fix should have been made differently. Specifically, the docs state that it should include the "smallest set of tokens that add up to P or higher", so `last_ind` should actually be one more than the index of the last token satisfying `(cumulative_probs < self.mass)`. We still need a max clamp in case that last token is the very last one in the tensor.
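The corrected cutoff described above can be sketched as follows. The tensor values are made up for illustration, and this is a simplified stand-in for the library code, not the actual transformers source:

```python
import torch

# `last_ind` is one more than the index of the last token with
# cumulative_probs < mass, i.e. the first index at which the kept set
# "adds up to P or higher". A max clamp guards the case where every
# entry is below mass, which would otherwise index past the end.
mass = 0.99
sorted_probs = torch.tensor([[0.4, 0.3, 0.2, 0.1]])
cumulative_probs = sorted_probs.cumsum(dim=-1)  # [[0.4, 0.7, 0.9, 1.0]]

last_ind = (cumulative_probs < mass).sum(dim=1)      # 3: first index where cumsum >= mass
last_ind.clamp_(max=cumulative_probs.shape[-1] - 1)  # never exceed the final valid index
print(last_ind)  # tensor([3])
```

Here tokens 0 through `last_ind` are kept, so the selected set reaches at least the target mass, matching the "smallest set of tokens that add up to P or higher" wording in the docs.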
This can be triggered fairly quickly with low precision, e.g. `bfloat16` and `typical_p=0.99`. @gante