Specify scheduler in DataFrame assert_eq#8559
Conversation
|
Thanks @gjoseph92! cc @rjzamora who I recall wanted something like this previously |
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks @gjoseph92 -- I left a couple comments about the test that was added, but am looking forward to seeing this added
| with pytest.raises(AssertionError): | ||
| assert_eq(ddf2, ddf2) | ||
|
|
||
| assert_eq(ddf2, ddf2, scheduler=custom_scheduler) |
There was a problem hiding this comment.
I think this is close to what we want, but it's not clear to me that this is actually using custom_scheduler. For example, if we just added a scheduler= keyword to assert_eq and then did nothing with it, this line would still pass. If custom_scheduler incremented some counter when it was called and we then asserted that the counter has been incremented the expected number of times after calling assert_eq it would be more straightforward to see that custom_scheduler is actually being used (there certainly could be other approaches too -- I'm just throwing one possibility out there)
There was a problem hiding this comment.
The point here was that if you weren't using custom_scheduler, the computation would fail (as just tested above). I could also have the custom_scheduler function increment a counter every time it's called, but that seems kind of redundant to me.
There was a problem hiding this comment.
Ah, I see. Thanks for the clarification.
Somewhat related, what about updating the assert_eq calls to be the original pandas DataFrame (i.e. something along the of assert_eq(df + 1, ddf2))? That should result in us exercising all the same logic currently in the test, while also giving more confidence since we know we're comparing the computed ddf2 result to a known object which doesn't require a dask computation
There was a problem hiding this comment.
I checked it against itself just because it seemed like an easier way to verify that both sides of the assert_eq were respecting the scheduler= logic. Otherwise, besides parametrizing the test to flip them around, it would be possible to have a future regression where one of the calls inside assert_eq forgot to pass the argument through.
Support passing
scheduler=todask.dataframe.utils.assert_eq. By default,assert_eqcontinues to use thesyncscheduler, but this allows specific tests to override that as necessary.Split out from #8392.
pre-commit run --all-files