Add descriptive docstring to WhisperTimeStampLogitsProcessor#25642

Merged
ArthurZucker merged 18 commits into huggingface:main from jprivera44:main
Oct 24, 2023

Conversation

@jprivera44
Contributor

What does this PR do?

This PR adds docstrings that explain the usage of the arguments that can be passed to the Whisper logits processor.

Fixes #24783

Before submitting

  • [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

Who can review?

@gante

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@ArthurZucker
Collaborator

Hey! Thanks for opening a PR!
This looks a bit too specific / long, so I will not be accepting it as-is. Also, you should run pytest --doctest src/transformers/generation/logits_process.py and rebase on main to check whether the docstring examples actually work 😉
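As an aside for readers following along: the doctest workflow Arthur refers to can be reproduced locally with Python's stdlib runner (a minimal sketch with a hypothetical demo_module.py; the transformers CI wires doctests through its own pytest configuration):

```shell
# Write a tiny module whose docstring contains a runnable example,
# mimicking the ">>> " examples being added in this PR.
cat > demo_module.py <<'EOF'
def double(x):
    """Return twice x.

    >>> double(21)
    42
    """
    return x * 2
EOF

# Run every ">>> " example in the docstrings; silence means all passed.
python -m doctest demo_module.py
```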

Contributor

@gante gante left a comment

Hi @jprivera44 👋 Thank you for opening the PR :)

I have an important request here -- let's aim at having a short example with short outputs. ~20 lines of code is the sweet spot. Showing an example where we control returning the timestamps would be enough :)

Comment on lines +1378 to +1379


Contributor

Formatting: we leave at most one blank line between example lines

jprivera44 and others added 5 commits August 31, 2023 14:03
Adding in suggested fix to the LogitProcessor description.

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Removing tip per suggestion.

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Removing redundant code per suggestion.

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
@jprivera44
Contributor Author

Hi @gante, thank you for your feedback :) I've made the suggested changes. However, I'm a little confused about returning the timestamps. The WhisperTimeStampLogitsProcessor only returns the scores, which then get transcribed into words. Would outputting the token timestamps be sufficient, since these are directly accessible within modeling_whisper.py?

@ArthurZucker , I ran pytest --doctest-modules and all tests within the WhisperLogits passed. I noticed the recommendation for --doctest – does the transformers repo use a custom configuration that requires this argument?

@gante
Contributor

gante commented Sep 5, 2023

@jprivera44 Since the last reviews, we've added this file to the list of files to be doctested in our PR CI. If you rebase with main, your examples will be tested :)

Regarding the examples themselves... something is odd. We should be getting a timestamp at the start of the transcription, right @ArthurZucker?

Collaborator

@ArthurZucker ArthurZucker left a comment

The timestamps are ignored because of skip_special_tokens=True; let's try to show them

>>> # This allows the user to select a specific token to terminate the sequence on, in this case it's the word poem (21247)
>>> model.generation_config.eos_token_id = 21247
>>> generated_ids = model.generate(inputs=input_features, return_timestamps=True)
>>> transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
Collaborator

Suggested change
>>> transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
>>> transcription = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

Let's have one example with timestamps no?

Contributor

TIL 👍

Contributor

@jprivera44 let's a) limit the number of examples (one with and another without timestamps is enough); b) use Arthur's suggestion here :)

@gante
Contributor

gante commented Sep 12, 2023

@jprivera44 double-checking: you are still planning on iterating on this PR, correct? :)

@jprivera44
Contributor Author

Hello @gante, apologies, I was having some issues getting the timestamps to show up, but it's all fixed now. I've added the updated code we discussed, with two examples: one with timestamps and one without. This includes running pytest as @ArthurZucker mentioned. Please let me know what else is needed, and thank you again for the help!

@jprivera44
Contributor Author

Hello @gante, hope you're well. Is there anything else needed on this from my end? I just want to make sure I've completed the requirements.

Collaborator

@ArthurZucker ArthurZucker left a comment

Hey, @gante is OOO for a while; I'll take this over

Comment on lines +1333 to +1337
[`LogitsProcessor`] Whisper specific Processor. This processor is specifically designed for the Whisper Automatic
Speech Recognition model. It facilitates the manipulation of log probabilities for a predefined list of tokens
during text generation. By using this processor, you can effectively "force" certain tokens to be selected at
specific positions in the generated sequence. When tokens are passed to this processor, their log probabilities are
set to `inf` (infinity), ensuring that they are chosen at their corresponding indices.
Collaborator

This is more general but feels AI-generated and is partly wrong. If we add a docstring, let's explain in detail what the processor does!

Contributor Author

Ah, I see what you mean. I'll go ahead and take a look to add more details and fix the mistakes.

Contributor Author

@ArthurZucker I've updated the docstring; please let me know what you think. I appreciate the help on this.



>>> # No timestamps & change EOS:
>>> # This allows the user to select a specific token to terminate the sequence on, in this case it's the word poem (21247)
Collaborator

Looks like it was not stopped on poem, no?

Contributor Author

Yes, the phrase stops on that token in order to showcase how changing the EOS token parameter changes the output of the transcription. If preferred, I can only showcase generating with timestamps vs. without timestamps.
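The EOS-override trick under discussion can be illustrated with a toy greedy loop (hypothetical stand-in names, not the Whisper generate API): decoding halts the moment the configured eos_token_id is produced, so pointing it at an ordinary word's id truncates the output there.

```python
def greedy_generate(next_token_fn, eos_token_id, max_new_tokens=10):
    """Toy greedy decoding loop: stop as soon as eos_token_id is emitted."""
    tokens = []
    for step in range(max_new_tokens):
        tok = next_token_fn(step)
        tokens.append(tok)
        if tok == eos_token_id:
            break
    return tokens

# A fake "model" that deterministically emits token ids 1, 2, 3, ...
fake_model = lambda step: step + 1

# Default EOS (id 0) never appears, so we run to the length cap.
print(greedy_generate(fake_model, eos_token_id=0, max_new_tokens=5))  # [1, 2, 3, 4, 5]

# Overriding EOS to an ordinary token id (here 3) truncates generation early.
print(greedy_generate(fake_model, eos_token_id=3, max_new_tokens=5))  # [1, 2, 3]
```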

Collaborator

My point is that the doc says it stops on poem.

Contributor Author

Oh I see what you mean, and I just went through and made the fix.

Collaborator

@ArthurZucker ArthurZucker left a comment

Hey! Again, let's make this helpful for users 😉

Comment on lines +1334 to +1337
[`LogitsProcessor`] that modifies the logits for the generation of timestamps in the transcription. When the input
tokens are at a specific threshold, the processor sets the scores to negative infinity and returned. The timestamps
are processed to be in pairs except right before the end-of-sequence token. If the total probability of all
timestamp tokens is greater than any individual non-timestamp token, the processor sets those non-timestamp logits
Collaborator

Suggested change
[`LogitsProcessor`] that modifies the logits for the generation of timestamps in the transcription. When the input
tokens are at a specific threshold, the processor sets the scores to negative infinity and returned. The timestamps
are processed to be in pairs except right before the end-of-sequence token. If the total probability of all
timestamp tokens is greater than any individual non-timestamp token, the processor sets those non-timestamp logits
[`LogitsProcessor`] that modifies the logits for the generation of timestamps in the transcription. When the input
tokens are at a specific threshold, the processor sets the scores to negative infinity. The processor makes sure that timestamp tokens appear in pairs, by doing .... This is done because ......
It also ensures that when the predicted probability of sampling any of the timestamp tokens is greater than ........ This is done to ensure that.....
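The two rules Arthur's suggested wording gestures at can be sketched in a few lines of plain Python (an illustrative toy with a made-up vocabulary split, not the actual transformers implementation): timestamp tokens must come in pairs, and when the summed probability of all timestamp tokens exceeds the single best text token, text logits are masked so a timestamp is sampled.

```python
import math

NEG_INF = float("-inf")
TS_BEGIN = 4  # toy convention: token ids >= 4 are "timestamp" tokens

def log_softmax(xs):
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]

def process(prev_tokens, scores):
    """Toy sketch of the two rules: timestamps come in pairs, and
    timestamps are forced when their total probability dominates."""
    scores = list(scores)
    if prev_tokens and prev_tokens[-1] >= TS_BEGIN:
        if len(prev_tokens) >= 2 and prev_tokens[-2] >= TS_BEGIN:
            # A <start, end> pair was just completed: forbid a third timestamp.
            for i in range(TS_BEGIN, len(scores)):
                scores[i] = NEG_INF
        else:
            # A lone opening timestamp: force its closing partner next.
            for i in range(TS_BEGIN):
                scores[i] = NEG_INF
    # If the summed probability of all timestamp tokens exceeds that of the
    # single most likely text token, mask text so a timestamp is sampled.
    probs = [math.exp(p) for p in log_softmax(scores)]
    if sum(probs[TS_BEGIN:]) > max(probs[:TS_BEGIN]):
        for i in range(TS_BEGIN):
            scores[i] = NEG_INF
    return scores

# After a lone opening timestamp (id 5), only timestamp slots stay open.
print(process([5], [1.0] * 8))  # first four entries are -inf
```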

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@jprivera44
Contributor Author

Hi @ArthurZucker, sorry this went stale; I've pushed an update on this! Thank you again.

Collaborator

@ArthurZucker ArthurZucker left a comment

A lot better thanks 😉

@ArthurZucker ArthurZucker merged commit 576e282 into huggingface:main Oct 24, 2023
@jprivera44
Contributor Author

No, thank you guys! Apologies this took a while!

i4never pushed a commit to i4never/transformers that referenced this pull request Oct 25, 2023
…face#25642)

* adding in logit examples for Whisper processor

* adding in updated logits processor for Whisper

* adding in cleaned version of  logits processor for Whisper

* adding docstrings for whisper processor

* making sure the formatting is correct

* adding logits after doc builder

* Update src/transformers/generation/logits_process.py

Adding in suggested fix to the LogitProcessor description.

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/logits_process.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/logits_process.py

Removing tip per suggestion.

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/logits_process.py

Removing redundant code per suggestion.

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* adding in revised version

* adding in version with timestamp examples

* Update src/transformers/generation/logits_process.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* enhanced paragraph on behavior of processor

* fixing doc quality issue

* removing the word poem from example

* adding in updated docstring

* adding in new version of file after doc-builder

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023
…face#25642)

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Generate: have an example on each LogitsProcessor class docstring

4 participants