Conversation
|
hmm Is there a way I can trigger a retry of a given GHA? |
|
Apologies for the inconvenience, |
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks for the cleanup @ryan-williams!
docs/requirements-docs.txt
Outdated
| @@ -1,5 +1,5 @@ | |||
| numpydoc | |||
| sphinx | |||
| sphinx<3.5.0 | |||
There was a problem hiding this comment.
Per the discussion over in #7217
| sphinx<3.5.0 | |
| sphinx |
There was a problem hiding this comment.
thanks, removed
dask/dataframe/io/tests/test_io.py
Outdated
|
|
||
| def test_from_pandas_dataframe(): | ||
| a = list("aaaaaaabbbbbbbbccccccc") | ||
| np.random.seed(123) |
There was a problem hiding this comment.
This should be testing behavior that is independent of the values in the b column. Did you find that not having the seed set resulting in this test failing?
There was a problem hiding this comment.
Yea, agreed that it shouldn't matter here. I added this a while back unfortunately and don't remember the specifics.
My guess is I was debugging general from_pandas failures with my iloc implementation, and found it hard to follow when stepping repeatedly in the debugger when the values weren't the same each time.
Happy to remove if it feels like a distraction (or, to file an issue for a more holistic audit that all random tests are seeded, which I feel like should be the default in general)
dask/dataframe/io/parquet/utils.py
Outdated
| assert all( | ||
| p[:l] == basepath for p in path_parts_list | ||
| ), "All paths must begin with the given root" | ||
| l = len(basepath) |
There was a problem hiding this comment.
It looks like l is used a few lines down in "/".join(path_parts[l:])
There was a problem hiding this comment.
yea, it's set in both the if and else branches above though, so this line is redundant
| def test_gh580(): | ||
| df = pd.DataFrame({"x": np.arange(10, dtype=float)}) | ||
| ddf = dd.from_pandas(df, 2) | ||
| assert_eq(np.cos(df["x"]), np.cos(ddf["x"])) | ||
| assert_eq(np.cos(df["x"]), np.cos(ddf["x"])) |
There was a problem hiding this comment.
I started out trying to just add a comment documenting what this test was doing, since it's a bit cryptic, and ultimately decided it was stale/redundant enough to be considered "dead code".
Looking into the history (#580, #582) it seems it was verifying some workaround related to ufuncs that has since been superseded with more correct behavior. I'm not sure if the duplicated assertion is intentional or was ever significant, and I believe e.g. test_ufunc.py covers whatever is being tested here (and then some).
Happy to leave it in, but in that case maybe someone else should try their hand at documenting exactly what this test is doing? If no one can, that seems like a good motivation to remove it, I think?
ryan-williams
left a comment
There was a problem hiding this comment.
thanks, think I responded to everything, will force-push a new rebased version now
dask/dataframe/io/parquet/utils.py
Outdated
| assert all( | ||
| p[:l] == basepath for p in path_parts_list | ||
| ), "All paths must begin with the given root" | ||
| l = len(basepath) |
There was a problem hiding this comment.
yea, it's set in both the if and else branches above though, so this line is redundant
dask/dataframe/io/tests/test_io.py
Outdated
|
|
||
| def test_from_pandas_dataframe(): | ||
| a = list("aaaaaaabbbbbbbbccccccc") | ||
| np.random.seed(123) |
There was a problem hiding this comment.
Yea, agreed that it shouldn't matter here. I added this a while back unfortunately and don't remember the specifics.
My guess is I was debugging general from_pandas failures with my iloc implementation, and found it hard to follow when stepping repeatedly in the debugger when the values weren't the same each time.
Happy to remove if it feels like a distraction (or, to file an issue for a more holistic audit that all random tests are seeded, which I feel like should be the default in general)
| def test_gh580(): | ||
| df = pd.DataFrame({"x": np.arange(10, dtype=float)}) | ||
| ddf = dd.from_pandas(df, 2) | ||
| assert_eq(np.cos(df["x"]), np.cos(ddf["x"])) | ||
| assert_eq(np.cos(df["x"]), np.cos(ddf["x"])) |
There was a problem hiding this comment.
I started out trying to just add a comment documenting what this test was doing, since it's a bit cryptic, and ultimately decided it was stale/redundant enough to be considered "dead code".
Looking into the history (#580, #582) it seems it was verifying some workaround related to ufuncs that has since been superseded with more correct behavior. I'm not sure if the duplicated assertion is intentional or was ever significant, and I believe e.g. test_ufunc.py covers whatever is being tested here (and then some).
Happy to leave it in, but in that case maybe someone else should try their hand at documenting exactly what this test is doing? If no one can, that seems like a good motivation to remove it, I think?
docs/requirements-docs.txt
Outdated
| @@ -1,5 +1,5 @@ | |||
| numpydoc | |||
| sphinx | |||
| sphinx<3.5.0 | |||
There was a problem hiding this comment.
thanks, removed
eacad02 to
0cf1d05
Compare
dask/array/core.py
Outdated
| if np.isnan(sum(map(sum, blockdims))): | ||
| raise ValueError( | ||
| "Arrays chunk sizes (%s) are unknown.\n\n" | ||
| "Array's chunk sizes (%s) are unknown.\n\n" |
There was a problem hiding this comment.
| "Array's chunk sizes (%s) are unknown.\n\n" | |
| "Arrays' chunk sizes (%s) are unknown.\n\n" |
|
alright, I trimmed this to just the comment typo fixes. I'll send a PR w/ the dead code removal, but defer to y'all about whether the "obsolete test" removal / random-test-seeding is worth the PR noise. |
|
hmm this looks like a spurious failure? pip failed to resolve some dep? Can someone try running it again? (I dl'd a full log just in case) |
Tests added / passed(N/A)black dask/flake8 daskFactored out of #6661 (DDF.iloc).