
Track each row separately for stopping criteria#29116

Merged
gante merged 2 commits into huggingface:main from zucchini-nlp:stopping_crtiteria
Feb 26, 2024

Conversation

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Feb 19, 2024

What does this PR do?

Addresses the question raised in #28932. I accidentally messed up the first PR (#29056) that was approved, so this is the second version with the same changes.
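The semantics this PR introduces — one stopping flag per batch row rather than a single flag for the whole batch — can be illustrated with a toy sketch. This is plain Python with hypothetical names, not the actual transformers `StoppingCriteria` API:

```python
# Toy sketch (hypothetical names, not the transformers API) of per-row
# stopping: the criterion reports one flag per batch row instead of a
# single boolean for the whole batch.
class MaxLengthPerRow:
    def __init__(self, max_length):
        self.max_length = max_length

    def __call__(self, sequences):
        # sequences: one list of token ids per batch row.
        # Returns one boolean per row, so rows can stop independently.
        return [len(seq) >= self.max_length for seq in sequences]

criterion = MaxLengthPerRow(max_length=5)
batch = [[1, 2, 3], [1, 2, 3, 4, 5]]
print(criterion(batch))  # -> [False, True]: only the second row stops
```

In the real implementation the return value is a boolean tensor over the batch dimension; generation can then reduce it (e.g. stop the whole batch only when every row's flag is set).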

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@gante, can you please take another look at this? It's the same change you approved earlier.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@gante gante left a comment


Looks good -- let's just check why the test is failing :)

@zucchini-nlp
Member Author

Yep, it should be unrelated to the stopping criteria, but I will check

@zucchini-nlp
Member Author

@gante, I found why the tests were failing. It was a flaky test, so I just pushed an empty commit here. Now it's all green :)

Also, I tried to fix the flaky test; the PR for it is here. I would love to get your review on that.

@Rocketknight1
Member

I support this - I'll make sure not to merge #28932 until this is in. After this is merged, I'll rebase and then modify my PR to also return a per-sample vector.

Contributor

@gante gante left a comment


Perfect!

@gante gante requested a review from amyeroberts February 21, 2024 15:21
Contributor

@amyeroberts amyeroberts left a comment


Thanks for adding this - looks great!

Two things to do before merging:

  • Running all the generation tests (incl. slow) to confirm that this doesn't break anything
  • Adding some tests. At the moment we can adapt the current tests with just `all`, but we don't check 1) that only some elements are set to stop when only some sequences reach a criterion, and 2) the generation behaviour when some sequences are stopped and others aren't.
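The kind of test being asked for could look roughly like this. A toy sketch with a hypothetical helper, not the actual transformers test suite: verify that a per-row criterion can flag some sequences as finished while others continue.

```python
# Toy sketch of a mixed-result test (hypothetical helper, not the
# transformers test suite).
def length_criterion(rows, max_length):
    # One boolean per batch row: True once that row reaches max_length.
    return [len(row) >= max_length for row in rows]

def test_only_some_rows_stop():
    rows = [[1, 2, 3, 4, 5], [1, 2]]
    flags = length_criterion(rows, max_length=5)
    # Mixed result: the first row stops, the second keeps generating.
    assert flags == [True, False]

test_only_some_rows_stop()
```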

@zucchini-nlp
Member Author

@amyeroberts

  1. I ran all the slow tests, everything is fine.
  2. In this PR the stopping criteria will only return all-true or all-false. Since the main goal was to support Terminator strings for generate() (#28932), which can stop some sequences and not others, I believe it's better to add the tests there.

@gante
Contributor

gante commented Feb 26, 2024

Merging, as the additional test request [what happens when some sequences are finished and others aren't] is not yet possible to satisfy [the existing criteria are time- or length-based, i.e. equal for all sequences].

#28932 will add a stopping criterion that enables us to do [and test] that. @Rocketknight1 plz don't forget to include a test for this case :D
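A stop-string criterion is exactly the kind that produces mixed per-row results, because each row's decoded text reaches the stop string at a different step. A hypothetical sketch, not the #28932 implementation:

```python
# Hypothetical sketch (not the #28932 implementation): a stop-string
# criterion checks each row's decoded text independently, so some rows
# can stop while others keep generating.
def stop_string_per_row(texts, stop_string):
    # texts: decoded generations, one string per batch row.
    return [stop_string in text for text in texts]

flags = stop_string_per_row(["Hello world.", "Hello"], stop_string=".")
print(flags)  # -> [True, False]: only the first row hit the stop string
```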

@gante gante merged commit 8f2f0f0 into huggingface:main Feb 26, 2024
@Rocketknight1
Member

Got it!

@amyeroberts
Contributor

amyeroberts commented Feb 26, 2024

@zucchini-nlp @gante There should still be a follow-up PR for tests. I understand that the current tests will return only all-true or all-false, but new tests should be added for when this is not the case.

In fact, I'm not sure how we can know this is behaving correctly without them?

@Rocketknight1
Member

@amyeroberts agreed - I'll make sure something like that is in the stop-strings PR, and let people know if it fails

@gante
Contributor

gante commented Feb 26, 2024

@amyeroberts I hear you.

But isn't that overzealous, testing a property that we can't yet trigger? We would have to build a mock class with the sole purpose of testing this property now, which seems worse than delegating testing the property to the PR that introduces its usage 🤔

@amyeroberts
Contributor

@gante Tests should go with the feature they're adding. Otherwise, we end up having to have a bunch of follow-up PRs to remedy something that should have been caught in the first place. If it's not easy to test, then that indicates either 1) this feature cannot properly be added at this time or 2) something else needs to be reworked to enable testing as it shouldn't be hard to do.

@gante
Contributor

gante commented Feb 26, 2024

@amyeroberts I'll try to be more conscious of full feature tests in future merges 🤗

Nevertheless, I believe the approach you mention has a different set of drawbacks: it forces us to write larger PRs, as changes are less atomic. On features that touch many models, like generate features, that often means nasty, time-consuming merge conflicts :(

@amyeroberts
Contributor

@gante I don't believe that's the case. For one, I don't normally count the diff added in tests towards the overall diff size. Otherwise it creates perverse incentives where someone might write fewer tests than necessary just to stay under ~400 lines.

I don't understand why adding tests for a new feature, or the new feature itself, should affect existing generate models or their tests. If they do, then we should work on separating out the generation logic so that it can be tested independently. Otherwise, development will slow down as we can't quickly add features we can trust to work reliably.

@gante
Contributor

gante commented Feb 26, 2024

@amyeroberts generate sadly is a monolith with little flexibility at the moment, and it's easy to have PRs working on overlapping lines of code. For example, working on the static cache and some input preparation bug fixes will likely result in a conflict. Due to the volume of PRs on the generate side, it happens quite frequently.

It is indeed a sign that we should break generate down into a more flexible structure; you've possibly seen me writing about it in other places 🤗 I'd love to work on it soon! Meanwhile, I'm trying to keep the scope of each PR within reasonable limits: smaller PRs -> less time open -> fewer merge conflicts.

(It also helps that I take a lighter view of testing requirements, as I don't face the CI pain you surely face as a core maintainer 😉 As I've written above, I'll try to be stricter, as your judgment is probably fairer on testing questions.)

