[generate] add faster stop_strings stopping criteria #40520
self.assertEqual(len(stopping_criteria), 1)
def test_stop_string_criteria(self):
@parameterized.expand(
(reuses the extensive tests for StopStringCriteria on StopStringTextMatchCriteria, ensuring 1:1 compatibility)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thank you for your PR!
@MaxBourdon thank you for the feedback, corrected 😉 (there was another edge case failing: the last token fully contains the stop string, but doesn't start with stop string characters; added a test)
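To make the edge case above concrete, here is a minimal sketch, not the PR's implementation. It assumes a hypothetical toy vocabulary (`TOY_VOCAB`) and a `toy_decode` helper where decoding is plain concatenation, which real tokenizers do not guarantee. The helper `last_token_completes_stop` is also hypothetical: it searches the decoded text from the earliest position at which a match can still reach into the last token, so a token that fully contains the stop string is caught even when it does not start with stop string characters.

```python
from typing import Callable, Sequence

# Hypothetical toy vocabulary (token id -> text); a real tokenizer replaces this.
TOY_VOCAB = {0: "Hello", 1: " wor", 2: "<st", 3: "op>", 4: "say <stop> now"}


def toy_decode(ids: Sequence[int]) -> str:
    """Assumption: decoding is plain concatenation of token texts."""
    return "".join(TOY_VOCAB[i] for i in ids)


def last_token_completes_stop(
    ids: Sequence[int],
    stop_strings: Sequence[str],
    decode: Callable[[Sequence[int]], str] = toy_decode,
) -> bool:
    """Return True if a stop string overlaps the text added by the last token."""
    if not ids:
        return False
    text = decode(ids)
    prev_len = len(decode(ids[:-1]))  # text length before the last token
    for stop in stop_strings:
        if not stop:
            continue  # an empty stop string would match everywhere
        # Earliest start index at which a match still reaches into the last token.
        start = max(0, prev_len - len(stop) + 1)
        if stop in text[start:]:
            return True
    return False
```

With this toy vocabulary, token `4` ("say <stop> now") triggers a stop although it begins with "say", while a stop string that was completed by an earlier token is not reported again.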
last_two_tokens_text = self.tokenizer.decode(input_ids[batch_idx, -2:])
last_tokens_with_prefix_text = self.tokenizer.decode(input_ids[batch_idx, -1:])
pretty sure we should be using the decode stream here!
What is that? 👀
I see some related docs (https://huggingface.co/docs/tokenizers/v0.20.3/en/api/decoders#tokenizers.decoders.DecodeStream), but they lead nowhere
https://huggingface.co/docs/tokenizers/main/en/api/decoders#tokenizers.decoders.DecodeStream
What I am saying is that decoding stuff like a brute here is risking not following the "stream" (acting on tokens and not strings).
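One way to see the concern about decoding token slices in isolation is a toy byte-level example. This is a sketch under stated assumptions: `ToyDecodeStream` is a hypothetical stand-in built on Python's standard incremental UTF-8 decoder, not `tokenizers.decoders.DecodeStream`, whose interface is not reproduced here, and `TOY_BYTE_VOCAB` is invented for illustration. It shows that decoding each token on its own can garble a character split across tokens, while a stateful stream holds the partial bytes until the character is complete.

```python
import codecs
from typing import Dict, Optional, Sequence

# Hypothetical byte-level vocabulary: "é" (0xC3 0xA9) is split across tokens 1 and 2.
TOY_BYTE_VOCAB: Dict[int, bytes] = {0: b"caf", 1: b"\xc3", 2: b"\xa9", 3: b"!"}


class ToyDecodeStream:
    """Toy stateful decoder; NOT the tokenizers library class."""

    def __init__(self, vocab: Dict[int, bytes]) -> None:
        self.vocab = vocab
        # The incremental decoder buffers incomplete multi-byte sequences.
        self._decoder = codecs.getincrementaldecoder("utf-8")()

    def step(self, token_id: int) -> Optional[str]:
        """Feed one token; return newly completed text, or None if nothing is ready."""
        chunk = self._decoder.decode(self.vocab[token_id])
        return chunk or None


def naive_per_token_decode(ids: Sequence[int], vocab: Dict[int, bytes]) -> str:
    """Decode every token independently, replacing undecodable bytes."""
    return "".join(vocab[i].decode("utf-8", errors="replace") for i in ids)


def streamed_decode(ids: Sequence[int], vocab: Dict[int, bytes]) -> str:
    """Decode token by token while carrying state between steps."""
    stream = ToyDecodeStream(vocab)
    chunks = (stream.step(i) for i in ids)
    return "".join(chunk for chunk in chunks if chunk is not None)
```

For the ids `[0, 1, 2, 3]`, the streamed variant yields "café!", whereas the per-token variant produces replacement characters in place of "é". A stop string containing "é" would therefore only be matched reliably on the streamed text.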
What does this PR do?
Adds `StopStringTextMatchCriteria`, a faster alternative to `StopStringCriteria`. Unlike `StopStringCriteria`, `StopStringTextMatchCriteria` can't be compiled.

Some additional context:
- When `StopStringCriteria` was added, we were looking forward to having end-to-end `generate` compilation, so it made sense to focus on compilable options;
- `StopStringCriteria` can be really slow in some contexts. More specifically, at initialization time (see benchmarks below);
- `StopStringTextMatchCriteria` is faster, so it's the new default. `StopStringCriteria` is kept for `torch.compile` users.

Thank you @MaxBourdon for surfacing the problem
Benchmarks
TL;DR
`StopStringCriteria` is very slow to initialize on new `stop_strings` inputs, >2s on my machine. This is cached, so successive calls with the same `stop_strings` are not as bad. However, it's particularly troublesome when trying small models, as this initialization may take much more than the generation time. Excluding init time, the new `StopStringTextMatchCriteria` is also slightly faster.
Benchmark script
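The author's benchmark script is not reproduced in this text. As a separate, hypothetical illustration of the caching behavior described in the TL;DR, the sketch below uses an invented `build_stop_tables` precompute step (not the library's code) wrapped in `functools.lru_cache`. The first call with a new set of stop strings pays the full cost, and repeated calls with the same inputs return the cached result. The `timed` helper and the `CALLS` counter are also assumptions, added only to make the effect observable.

```python
import time
from functools import lru_cache
from typing import Callable, Dict, Tuple

CALLS = {"n": 0}  # counts how often the precompute actually runs


@lru_cache(maxsize=8)
def build_stop_tables(
    stop_strings: Tuple[str, ...], vocab: Tuple[str, ...]
) -> Dict[str, Tuple[str, ...]]:
    """Toy precompute: for each stop string, list tokens that contain it
    or that end with a prefix of it (tokens that could begin a match)."""
    CALLS["n"] += 1
    tables = {}
    for stop in stop_strings:
        tables[stop] = tuple(
            tok
            for tok in vocab
            if stop in tok
            or any(stop.startswith(tok[i:]) for i in range(len(tok)))
        )
    return tables


def timed(fn: Callable, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Arguments are tuples because `lru_cache` requires hashable inputs. With a realistic vocabulary size, the elapsed time of the first `timed(build_stop_tables, stops, vocab)` call would dominate, while later calls with the same `stops` hit the cache.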
Benchmark results on my machine: