Fix panic in DecodeStream::step due to incorrect index usage#1699
Fix panic in DecodeStream::step due to incorrect index usage#1699Narsil merged 3 commits intohuggingface:mainfrom
Conversation
I'm very confused by my own implementation re-reading it. Your fix is definitely correct and we can remove than internal pointer. What's weird is that I remember having only that in my first iteration of this. I have no idea why I re-added half-brokenly that read_index. Thanks a lot for the fix. Note: I will rewrite your unit test. They are good but I find them not really linearly readable and also panicking within tests is perfectly fine (it's a failure). |
When calling
DecodeStream::stepmultiple times, it eventually panics with attempt to subtract with overflow in the following lines of code:tokenizers/tokenizers/src/tokenizer/mod.rs
Lines 1108 to 1109 in 24d29f4
The panic can be easily reproduced, and I have added a test case to demonstrate the issue.
Upon inspecting the code, I found that the shrinking of the token buffer references
read_indexinstead ofprefix_index. This PR corrects the issue by using the correct index.However, this change makes
read_indexunused, so I am not entirely certain if it aligns with the intended logic of the original implementation. Please let me know if further adjustments or clarifications are needed, or if there is additional context regarding the intended use ofread_index.