
Track each row separately for stopping criteria#29116

Merged
gante merged 2 commits into huggingface:main from zucchini-nlp:stopping_crtiteria
Feb 26, 2024

Conversation

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Feb 19, 2024

What does this PR do?

Addresses the question raised in #28932. I accidentally messed up the first PR (#29056) that was approved, so this is the second version with the same changes.
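The semantics this PR introduces — one stopping flag per batch row rather than a single flag for the whole batch — can be illustrated with a toy sketch. This is plain Python with hypothetical names, not the actual transformers `StoppingCriteria` API:

```python
# Toy sketch (hypothetical names, not the transformers API) of per-row
# stopping: the criterion reports one flag per batch row instead of a
# single boolean for the whole batch.
class MaxLengthPerRow:
    def __init__(self, max_length):
        self.max_length = max_length

    def __call__(self, sequences):
        # sequences: one list of token ids per batch row.
        # Returns one boolean per row, so rows can stop independently.
        return [len(seq) >= self.max_length for seq in sequences]

criterion = MaxLengthPerRow(max_length=5)
batch = [[1, 2, 3], [1, 2, 3, 4, 5]]
print(criterion(batch))  # -> [False, True]: only the second row stops
```

In the real implementation the return value is a boolean tensor over the batch dimension; generation can then reduce it (e.g. stop the whole batch only when every row's flag is set).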

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@gante, can you please take another look at this? It's the same change you approved earlier.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@gante gante left a comment


Looks good -- let's just check why the test is failing :)

@zucchini-nlp
Member Author

Yep, it should be unrelated to the stopping criteria, but I will check

@zucchini-nlp
Member Author

@gante, I found why the tests were failing. It was a flaky test, so I just pushed an empty commit here. Now it's all green :)

Also, I tried to fix the flaky test; the PR for it is here. I would love to get your review on that.

@Rocketknight1
Member

I support this - I'll make sure not to merge #28932 until this is in. After this is merged, I'll rebase and then modify my PR to also return a per-sample vector.

Contributor

@gante gante left a comment


Perfect!

@gante gante requested a review from amyeroberts February 21, 2024 15:21
Contributor

@amyeroberts amyeroberts left a comment


Thanks for adding this - looks great!

Two things to do before merging:

  • Running all the generation tests (incl. slow) to confirm that this doesn't break anything
  • Adding some tests. At the moment we can adapt the current tests with just `all`, but we don't check 1) that only some elements are set to stop when only some sequences reach a criterion, and 2) the generation behaviour when some sequences are stopped and others aren't.
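The kind of test being asked for could look roughly like this. A toy sketch with a hypothetical helper, not the actual transformers test suite: verify that a per-row criterion can flag some sequences as finished while others continue.

```python
# Toy sketch of a mixed-result test (hypothetical helper, not the
# transformers test suite).
def length_criterion(rows, max_length):
    # One boolean per batch row: True once that row reaches max_length.
    return [len(row) >= max_length for row in rows]

def test_only_some_rows_stop():
    rows = [[1, 2, 3, 4, 5], [1, 2]]
    flags = length_criterion(rows, max_length=5)
    # Mixed result: the first row stops, the second keeps generating.
    assert flags == [True, False]

test_only_some_rows_stop()
```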

@zucchini-nlp
Member Author

@amyeroberts

  1. I ran all the slow tests, everything is fine.
  2. In this PR the stopping criteria will only return all-true or all-false. Since the main goal was to support Terminator strings for generate() (#28932), which can stop some sequences and not others, I believe it's better to add the tests there.

@gante
Contributor

gante commented Feb 26, 2024

Merging, as the additional test request [what happens when some sequences are finished and others aren't] is not yet possible to satisfy [the existing criteria are time- or length-based, i.e. equal for all sequences].

#28932 will add a stopping criterion that enables us to do [and test] that. @Rocketknight1 plz don't forget to include a test for this case :D
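A stop-string criterion is exactly the kind that produces mixed per-row results, because each row's decoded text reaches the stop string at a different step. A hypothetical sketch, not the #28932 implementation:

```python
# Hypothetical sketch (not the #28932 implementation): a stop-string
# criterion checks each row's decoded text independently, so some rows
# can stop while others keep generating.
def stop_string_per_row(texts, stop_string):
    # texts: decoded generations, one string per batch row.
    return [stop_string in text for text in texts]

flags = stop_string_per_row(["Hello world.", "Hello"], stop_string=".")
print(flags)  # -> [True, False]: only the first row hit the stop string
```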

@gante gante merged commit 8f2f0f0 into huggingface:main Feb 26, 2024
@Rocketknight1
Member

Got it!

@amyeroberts
Contributor

amyeroberts commented Feb 26, 2024

@zucchini-nlp @gante There should still be a follow-up PR for tests. I understand that the current tests will return only all-true or all-false, but new tests should be added for when this is not the case.

In fact, I'm not sure how we can know this is behaving correctly without them?

@Rocketknight1
Member

@amyeroberts agreed - I'll make sure something like that is in the stop-strings PR, and let people know if it fails

@gante
Contributor

gante commented Feb 26, 2024

@amyeroberts I hear you.

But isn't that overzealous, testing a property that we can't yet trigger? We would have to build a mock class with the sole purpose of testing this property now, which seems worse than delegating testing the property to the PR that introduces its usage 🤔

@amyeroberts
Contributor

@gante Tests should go with the feature they're adding. Otherwise, we end up having to have a bunch of follow-up PRs to remedy something that should have been caught in the first place. If it's not easy to test, then that indicates either 1) this feature cannot properly be added at this time or 2) something else needs to be reworked to enable testing as it shouldn't be hard to do.

@gante
Contributor

gante commented Feb 26, 2024

@amyeroberts I'll try to be more conscious of full feature tests in future merges 🤗

Nevertheless, I believe the approach you mention has a different set of drawbacks: it forces us to write larger PRs, as changes are less atomic. On features that touch many models, like generate features, that often means nasty, time-consuming merge conflicts :(

@amyeroberts
Contributor

@gante I don't believe that's the case. For one, I don't normally count the diff added in tests towards the overall diff size. Otherwise it creates perverse incentives where someone might write fewer tests than necessary just to stay under ~400 lines.

I don't understand why adding tests for a new feature, or the new feature itself, should affect existing generate models or their tests. If they do, then we should work on separating out the generation logic so that it can be tested independently. Otherwise, development will slow down as we can't quickly add features we can trust to work reliably.

@gante
Contributor

gante commented Feb 26, 2024

@amyeroberts generate sadly is a monolith with little flexibility at the moment, and it's easy to have PRs working on overlapping lines of code. For example, working on the static cache and some input preparation bug fixes will likely result in a conflict. Due to the volume of PRs on the generate side, it happens quite frequently.

It is indeed a sign that we should break generate down into a more flexible structure; you've possibly seen me writing about it in other places 🤗 I'd love to work on it soon! Meanwhile, I'm trying to keep the scope of each PR within reasonable limits: smaller PRs -> less time open -> fewer merge conflicts.

(It also helps that I take a lighter view of testing requirements, as I don't face the CI pain you surely face as a core maintainer 😉 As I've written above, I'll try to be stricter, as your judgment is probably fairer on testing questions.)

