Track each row separately for stopping criteria#29116
Track each row separately for stopping criteria#29116gante merged 2 commits intohuggingface:mainfrom
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
gante
left a comment
There was a problem hiding this comment.
Looks good -- let's just check why the test is failing :)
|
Yep, should be unrelated to stopping criteria but I will check |
|
I support this - I'll make sure not to merge #28932 until this is in. After this is merged, I'll rebase and then modify my PR to also return a per-sample vector. |
amyeroberts
left a comment
There was a problem hiding this comment.
Thanks for adding this - looks great!
Two things to do before merging:
- Running all the generation tests (incl slow) to confirm that this doesn't break anything
- Adding some tests. At the moment, we're able to adapt the current tests with just
allbut we don't check that we 1) correctly only have some elements set to stop if only some sequences reach a criterion and 2) the generation behaviour when some sequences are stopped and others aren't.
|
|
Merging as the additional test request [what happens when some sequences are finished and others aren't] is not yet possible [the existing criterias are time or sequence base, i.e. equal for all sequences]. #28932 will add a stopping criteria that enable us to do [and test] that. @Rocketknight1 plz don't forget to include a test for this case :D |
|
Got it! |
|
@zucchini-nlp @gante There should still be a follow up PR for tests. I understand that the current tests will return only all true or all false, but new tests should be added when this is not the case In fact, I'm not sure how we can know this is behaving correctly if we don't have this? |
|
@amyeroberts agreed - I'll make sure something like that is in the stop-strings PR, and let people know if it fails |
|
@amyeroberts I hear you. But isn't that overzealous, testing a property that we can't yet trigger? We would have to build a mock class with the sole purpose of testing this property now, which seems worse than delegating testing the property to the PR that introduces its usage 🤔 |
|
@gante Tests should go with the feature they're adding. Otherwise, we end up having to have a bunch of follow-up PRs to remedy something that should have been caught in the first place. If it's not easy to test, then that indicates either 1) this feature cannot properly be added at this time or 2) something else needs to be reworked to enable testing as it shouldn't be hard to do. |
|
@amyeroberts I'll try to be more conscious of full feature tests in future merges 🤗 Nevertheless, I believe the approach you mention has a different set of drawbacks: it forces us to write larger PRs, as changes are less atomic. On features that touch many models, like |
|
@gante I don't believe that's the case. For one, I don't normally consider the diff added in the tests as part of the overall diff count. Otherwise it ends in perverse situations when someone might write fewer tests then necessary just to keep under ~400 lines. I don't understand why adding tests for a new feature, or the new feature itself should affect existing generate models or their tests? If they do, then we should work on separating out the generation logic such that it can be tested independently of this. Otherwise, development will start to slow as we can't quickly add features we can trust to work reliably. |
|
@amyeroberts It is indeed a sign that we should break down (It also helps that I have a lighter view in terms of testing requirements, as I don't face the CI pain you surely face as a core maintainer 😉 As I've written above, I'll try to improve my strictness, as your judgment is probably fairer on testing questions.) |
What does this PR do?
Addresses the question raised in #28932. I accidentally messed up the first PR (#29056) that was approved, so this is the second version with the same changes.
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@gante , can you please look again into this. It's the same thing you approved earlier.