Add descriptive docstring to WhisperTimeStampLogitsProcessor #25642
ArthurZucker merged 18 commits into huggingface:main from jprivera44:main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hey! Thanks for opening a PR!
gante left a comment:
Hi @jprivera44 👋 Thank you for opening the PR :)
I have an important request here -- let's aim at having a short example with short outputs. ~20 lines of code is the sweet spot. Showing an example where we control returning the timestamps would be enough :)
Formatting: we only leave at most one line between example lines
Adding in suggested fix to the LogitProcessor description. Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Removing tip per suggestion. Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Removing redundant code per suggestion. Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Hi @gante, thank you for your feedback :) I've made the suggested changes. However, I'm a little confused about returning the timestamps: the WhisperTimeStampLogitsProcessor only returns the scores, which then get transcribed into words. Would outputting the token timestamps be sufficient, since these are directly accessible within modeling_whisper.py? @ArthurZucker, I ran pytest --doctest-modules and all tests within the WhisperLogits passed. I noticed the recommendation for --doctest; does the transformers repo use a custom configuration that requires this argument?
@jprivera44 Since the last reviews, we've added this file to the list of files to be doctested in our PR CI. If you rebase with … Regarding the examples themselves: something is odd. We should be getting a timestamp at the start of the transcription, right @ArthurZucker?
ArthurZucker left a comment:
The timestamps are ignored because of `skip_special_tokens=True`; let's try to show them.
>>> # This allows the user to select a specific token to terminate the sequence on; in this case it's the word "poem" (21247)
>>> model.generation_config.eos_token_id = 21247
>>> generated_ids = model.generate(inputs=input_features, return_timestamps=True)
>>> transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
Suggested change:
- >>> transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ >>> transcription = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
Let's have one example with timestamps, no?
@jprivera44 let's a) limit the number of examples (one with and one without timestamps is enough); and b) use Arthur's suggestion here :)
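To make the point being discussed concrete, here is a minimal, self-contained sketch of why `skip_special_tokens=True` hides Whisper's timestamps: the tokenizer treats timestamp markers such as `<|0.00|>` as special tokens, so they are dropped during decoding unless special tokens are kept. The toy vocabulary and the `toy_decode` helper below are invented for illustration; this is not the transformers API itself.

```python
# Timestamp markers are registered as special tokens, so a decode that
# skips special tokens silently drops them. Token strings here are made up.
SPECIAL = {"<|startoftranscript|>", "<|0.00|>", "<|5.44|>", "<|endoftext|>"}

def toy_decode(tokens, skip_special_tokens=True):
    """Join tokens into a string, optionally dropping special tokens."""
    kept = [t for t in tokens if not (skip_special_tokens and t in SPECIAL)]
    return " ".join(kept)

generated = ["<|startoftranscript|>", "<|0.00|>", "Mr.", "Quilter", "<|5.44|>", "<|endoftext|>"]

print(toy_decode(generated, skip_special_tokens=True))   # timestamps disappear
print(toy_decode(generated, skip_special_tokens=False))  # timestamps are visible
```

This mirrors Arthur's suggestion above: passing `skip_special_tokens=False` to `batch_decode` is what makes the `<|t|>` markers show up in the transcription.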
@jprivera44 double-checking: you are still planning on iterating on this PR, correct? :)
Hello @gante, apologies, I was having some issues getting the timestamps to show up, but it's all fixed now. I've added the updated code we discussed, with two examples: one with timestamps and one without. This includes running pytest as @ArthurZucker mentioned. Please let me know what else is needed, and thank you again for the help!
Hello @gante, I hope you're well. Is there anything else needed on this from my end? I just want to make sure I've completed the requirements.
Hey, @gante is OOO for a while; I'll take this over
[`LogitsProcessor`] Whisper specific Processor. This processor is specifically designed for the Whisper Automatic Speech Recognition model. It facilitates the manipulation of log probabilities for a predefined list of tokens during text generation. By using this processor, you can effectively "force" certain tokens to be selected at specific positions in the generated sequence. When tokens are passed to this processor, their log probabilities are set to `inf` (infinity), ensuring that they are chosen at their corresponding indices.
This is more general but feels AI-generated and is partly wrong. If we add a docstring, let's explain in detail what the processor does!
Ah, I see what you mean. I'll go ahead and take a look to add more details and fix the mistakes.
@ArthurZucker I've updated the docstring; please let me know what you think. I appreciate the help on this.
>>> # No timestamps & change EOS:
>>> # This allows the user to select a specific token to terminate the sequence on; in this case it's the word "poem" (21247)
Looks like it was not stopped on poem, no?
Yes, the phrase stops on that token in order to showcase how changing the eos_token_id parameter changes the output of the transcription. If preferred, I can only showcase generating with timestamps vs. without timestamps.
My point is that the doc says it stops on poem.
Oh I see what you mean, and I just went through and made the fix.
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
ArthurZucker left a comment:
Hey! Again, let's make this helpful for users 😉
[`LogitsProcessor`] that modifies the logits for the generation of timestamps in the transcription. When the input tokens are at a specific threshold, the processor sets the scores to negative infinity and returned. The timestamps are processed to be in pairs except right before the end-of-sequence token. If the total probability of all timestamp tokens is greater than any individual non-timestamp token, the processor sets those non-timestamp logits
Suggested change:
- [`LogitsProcessor`] that modifies the logits for the generation of timestamps in the transcription. When the input tokens are at a specific threshold, the processor sets the scores to negative infinity and returned. The timestamps are processed to be in pairs except right before the end-of-sequence token. If the total probability of all timestamp tokens is greater than any individual non-timestamp token, the processor sets those non-timestamp logits
+ [`LogitsProcessor`] that modifies the logits for the generation of timestamps in the transcription. When the input tokens are at a specific threshold, the processor sets the scores to negative infinity. The processor makes sure that timestamp tokens appear in pair, by doing .... This is done because ...... It also ensure that when the predicted probability of sampling any of the timestamp token is greater than ........ This is done to ensure that.....
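One of the rules the suggested docstring describes is: if the summed probability of all timestamp tokens exceeds the probability of the single most likely text token, the processor masks the text tokens so a timestamp must be sampled. The sketch below is a hedged, pure-Python illustration of that one rule only, not the actual transformers implementation; the vocabulary split, ids, and probabilities are all made up.

```python
import math

# Toy vocabulary: ids >= TIMESTAMP_BEGIN are "timestamp" tokens, ids below
# are "text" tokens. All numbers and the split point are illustrative,
# not Whisper's real token ids.
TIMESTAMP_BEGIN = 4  # ids 4..7 are timestamps, 0..3 are text

def force_timestamp_if_likely(logprobs):
    """Mask text tokens to -inf when the total timestamp probability
    beats the best individual text token."""
    timestamp_logprob = math.log(sum(math.exp(lp) for lp in logprobs[TIMESTAMP_BEGIN:]))
    max_text_logprob = max(logprobs[:TIMESTAMP_BEGIN])
    if timestamp_logprob > max_text_logprob:
        return [-math.inf] * TIMESTAMP_BEGIN + list(logprobs[TIMESTAMP_BEGIN:])
    return logprobs

# Total timestamp mass (0.15 * 4 = 0.6) beats the best text token (0.25),
# so the text tokens get masked and a timestamp will be sampled next:
scores = [math.log(p) for p in [0.25, 0.05, 0.05, 0.05, 0.15, 0.15, 0.15, 0.15]]
masked = force_timestamp_if_likely(scores)
```

The real processor also enforces the pairing rule (timestamps come in begin/end pairs except right before the end-of-sequence token), which this sketch deliberately leaves out.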
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, sorry this got to stale status. @ArthurZucker, I got in an update on this! Thank you again.
ArthurZucker left a comment:
A lot better thanks 😉
No, thank you guys! Apologies this took a while!
…face#25642)

* adding in logit examples for Whisper processor
* adding in updated logits processor for Whisper
* adding in cleaned version of logits processor for Whisper
* adding docstrings for whisper processor
* making sure the formatting is correct
* adding logits after doc builder
* Update src/transformers/generation/logits_process.py
  Adding in suggested fix to the LogitProcessor description.
  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/logits_process.py
  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/logits_process.py
  Removing tip per suggestion.
  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/logits_process.py
  Removing redundant code per suggestion.
  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* adding in revised version
* adding in version with timestamp examples
* Update src/transformers/generation/logits_process.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* enhanced paragraph on behavior of processor
* fixing doc quality issue
* removing the word poem from example
* adding in updated docstring
* adding in new version of file after doc-builder

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
What does this PR do?
This PR adds docstrings that explain the usage of the arguments that can be passed to the Whisper logits processor.
Fixes #24783
Before submitting
Who can review?
@gante