Fix TypicalLogitsWarper tensor OOB indexing edge case #26579
gante merged 3 commits into huggingface:main
Conversation
I'm going to be honest - I don't fully understand the code here! The array `last_ind` is the sum of a boolean array, which should be strictly nonnegative, because boolean arrays only contain 0 and 1 values. Therefore, I don't understand why the original line guarded against negative values at all. Do you know why this is necessary in the first place?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@Rocketknight1 good point, I hadn't considered whether the existing check was valid/redundant; I guess it's always been there. It would make more sense if the prior line was `last_ind = (cumulative_probs < self.mass).sum(dim=1) - 1`.
I guess this will actually mean a subtle change in behaviour, but I'm fairly sure it's what was originally intended. Not sure whether this is ok though w.r.t. transformers policies around this kind of thing... |
cc @gante here - I think you might know better than me what the code is doing! |
```diff
- last_ind.clamp_(0)
+ last_ind.clamp_(min=0)
```
nit: let's make it more clear that it is a minimum clamp
I still don't understand how negative values get in there, though!
@Rocketknight1 After the change, `(cumulative_probs < self.mass).sum(dim=1)` can be 0, which added to -1 turns into a negative value. The cumsum vector at index N already contains the value of the tensor at index N, so if the first member of `cumulative_probs` is larger than `self.mass`, a negative value will come out of here :D
Before the change, it seemed to be a redundant op, yes 😂
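The negative-index scenario described above can be reproduced in a few lines. This is a hypothetical standalone sketch with made-up probability values, not the actual transformers source:

```python
import torch

# If the very first cumulative probability already exceeds the mass
# threshold, the boolean mask sums to 0, and subtracting 1 yields a
# negative index that would be out of bounds without a clamp.
mass = 0.99
sorted_probs = torch.tensor([[0.995, 0.004, 0.001]])
cumulative_probs = sorted_probs.cumsum(dim=-1)  # tensor([[0.9950, 0.9990, 1.0000]])

last_ind = (cumulative_probs < mass).sum(dim=1) - 1
print(last_ind)  # tensor([-1])

last_ind.clamp_(min=0)  # the explicit minimum clamp suggested in review
print(last_ind)  # tensor([0])
```

With the clamp in place, index 0 (the single highest-probability token) is kept, which is the intended behaviour when one token alone carries the whole mass.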
That makes sense, lol. Just making sure I wasn't missing something incredibly obvious!
This can be triggered fairly quickly with low precision, e.g. `bfloat16` and `typical_p=0.99`.
Force-pushed from 6ad5ee6 to 0094d0e
Thanks @gante @Rocketknight1, I've now rebased and added a commit with the explicit `min=0`.
Thank you for the fix @njhill 💪
A recent PR #26579 fixed an edge-case out-of-bounds tensor indexing error in TypicalLogitsWarper, and a related behaviour change was made that we thought fixed a long-standing bug w.r.t. the token inclusion cutoff. However, after looking more closely, I am pretty certain that the original logic was correct and that the OOB fix should have been made differently. Specifically, the docs state that it should include the "smallest set of tokens that add up to P or higher", so `last_ind` should actually be one more than the index of the last token satisfying `(cumulative_probs < self.mass)`. We still need a max clamp in case that last token is the very last one in the tensor.
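The corrected cutoff described above can be sketched as follows. The tensor values are made up for illustration, and this is a simplified stand-in for the library code, not the actual transformers source:

```python
import torch

# `last_ind` is one more than the index of the last token with
# cumulative_probs < mass, i.e. the first index at which the kept set
# "adds up to P or higher". A max clamp guards the case where every
# entry is below mass, which would otherwise index past the end.
mass = 0.99
sorted_probs = torch.tensor([[0.4, 0.3, 0.2, 0.1]])
cumulative_probs = sorted_probs.cumsum(dim=-1)  # [[0.4, 0.7, 0.9, 1.0]]

last_ind = (cumulative_probs < mass).sum(dim=1)      # 3: first index where cumsum >= mass
last_ind.clamp_(max=cumulative_probs.shape[-1] - 1)  # never exceed the final valid index
print(last_ind)  # tensor([3])
```

Here tokens 0 through `last_ind` are kept, so the selected set reaches at least the target mass, matching the "smallest set of tokens that add up to P or higher" wording in the docs.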
This can be triggered fairly quickly with low precision, e.g. `bfloat16` and `typical_p=0.99`. @gante