Skip to content

comment fixes#7215

Merged
jrbourbeau merged 1 commit intodask:mainfrom
celsiustx:nits
Mar 15, 2021
Merged

comment fixes#7215
jrbourbeau merged 1 commit intodask:mainfrom
celsiustx:nits

Conversation

@ryan-williams
Copy link
Contributor

  • Tests added / passed (N/A)
  • Passes black dask / flake8 dask

Factored out of #6661 (DDF.iloc).

@ryan-williams
Copy link
Contributor Author

hmm dask/array/tests/test_linalg.py::test_norm_any_slice[False--1-shape1-chunks1], I suspect this is a flaky test.

Is there a way I can trigger a retry of a given GHA?

@jrbourbeau
Copy link
Member

Apologies for the inconvenience, test_norm_any_slice is a known flaky tests which we're working to resolve (xref #7189). I just bumped CI

@ryan-williams
Copy link
Contributor Author

Thanks! I also rebased this on top of #7217 to try to get CI passing; can re-rebase once #7220 is in

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the cleanup @ryan-williams!

@@ -1,5 +1,5 @@
numpydoc
sphinx
sphinx<3.5.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the discussion over in #7217

Suggested change
sphinx<3.5.0
sphinx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, removed


def test_from_pandas_dataframe():
a = list("aaaaaaabbbbbbbbccccccc")
np.random.seed(123)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be testing behavior that is independent of the values in the b column. Did you find that not having the seed set resulting in this test failing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, agreed that it shouldn't matter here. I added this a while back unfortunately and don't remember the specifics.

My guess is I was debugging general from_pandas failures with my iloc implementation, and found it hard to follow when stepping repeatedly in the debugger when the values weren't the same each time.

Happy to remove if it feels like a distraction (or, to file an issue for a more holistic audit that all random tests are seeded, which I feel like should be the default in general)

assert all(
p[:l] == basepath for p in path_parts_list
), "All paths must begin with the given root"
l = len(basepath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like l is used a few lines down in "/".join(path_parts[l:])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, it's set in both the if and else branches above though, so this line is redundant

Comment on lines -2538 to -2539
def test_gh580():
df = pd.DataFrame({"x": np.arange(10, dtype=float)})
ddf = dd.from_pandas(df, 2)
assert_eq(np.cos(df["x"]), np.cos(ddf["x"]))
assert_eq(np.cos(df["x"]), np.cos(ddf["x"]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started out trying to just add a comment documenting what this test was doing, since it's a bit cryptic, and ultimately decided it was stale/redundant enough to be considered "dead code".

Looking into the history (#580, #582) it seems it was verifying some workaround related to ufuncs that has since been superseded with more correct behavior. I'm not sure if the duplicated assertion is intentional or was ever significant, and I believe e.g. test_ufunc.py covers whatever is being tested here (and then some).

Happy to leave it in, but in that case maybe someone else should try their hand at documenting exactly what this test is doing? If no one can, that seems like a good motivation to remove it, I think?

Copy link
Contributor Author

@ryan-williams ryan-williams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, think I responded to everything, will force-push a new rebased version now

assert all(
p[:l] == basepath for p in path_parts_list
), "All paths must begin with the given root"
l = len(basepath)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, it's set in both the if and else branches above though, so this line is redundant


def test_from_pandas_dataframe():
a = list("aaaaaaabbbbbbbbccccccc")
np.random.seed(123)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, agreed that it shouldn't matter here. I added this a while back unfortunately and don't remember the specifics.

My guess is I was debugging general from_pandas failures with my iloc implementation, and found it hard to follow when stepping repeatedly in the debugger when the values weren't the same each time.

Happy to remove if it feels like a distraction (or, to file an issue for a more holistic audit that all random tests are seeded, which I feel like should be the default in general)

Comment on lines -2538 to -2539
def test_gh580():
df = pd.DataFrame({"x": np.arange(10, dtype=float)})
ddf = dd.from_pandas(df, 2)
assert_eq(np.cos(df["x"]), np.cos(ddf["x"]))
assert_eq(np.cos(df["x"]), np.cos(ddf["x"]))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started out trying to just add a comment documenting what this test was doing, since it's a bit cryptic, and ultimately decided it was stale/redundant enough to be considered "dead code".

Looking into the history (#580, #582) it seems it was verifying some workaround related to ufuncs that has since been superseded with more correct behavior. I'm not sure if the duplicated assertion is intentional or was ever significant, and I believe e.g. test_ufunc.py covers whatever is being tested here (and then some).

Happy to leave it in, but in that case maybe someone else should try their hand at documenting exactly what this test is doing? If no one can, that seems like a good motivation to remove it, I think?

@@ -1,5 +1,5 @@
numpydoc
sphinx
sphinx<3.5.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, removed

Copy link
Member

@jsignell jsignell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR would be easier to review if it did one thing. Comment improvements would be merged very quickly while discussion of changes to tests could happen separately.

if np.isnan(sum(map(sum, blockdims))):
raise ValueError(
"Arrays chunk sizes (%s) are unknown.\n\n"
"Array's chunk sizes (%s) are unknown.\n\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Array's chunk sizes (%s) are unknown.\n\n"
"Arrays' chunk sizes (%s) are unknown.\n\n"

Base automatically changed from master to main March 8, 2021 20:19
@ryan-williams
Copy link
Contributor Author

alright, I trimmed this to just the comment typo fixes. I'll send a PR w/ the dead code removal, but defer to y'all about whether the "obsolete test" removal / random-test-seeding is worth the PR noise.

@ryan-williams ryan-williams mentioned this pull request Mar 14, 2021
3 tasks
@ryan-williams
Copy link
Contributor Author

hmm this looks like a spurious failure? pip failed to resolve some dep? Can someone try running it again? (I dl'd a full log just in case)

Run pre-commit/action@v2.0.0
install pre-commit
  /opt/hostedtoolcache/Python/3.9.2/x64/bin/pip install pre-commit
  Collecting pre-commit
    Downloading pre_commit-2.11.1-py2.py3-none-any.whl (187 kB)
  Collecting virtualenv>=20.0.8
    Downloading virtualenv-20.4.2-py2.py3-none-any.whl (7.2 MB)
  Collecting toml
    Downloading toml-0.10.2-py2.py3-none-any.whl (16 kB)
  Collecting pyyaml>=5.1
    Downloading PyYAML-5.4.1-cp39-cp39-manylinux1_x86_64.whl (630 kB)
  ERROR: Exception:
  Traceback (most recent call last):
    File "/opt/hostedtoolcache/Python/3.9.2/x64/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 171, in _merge_into_criterion
      crit = self.state.criteria[name]
  KeyError: 'pyyaml'

@ryan-williams ryan-williams changed the title comment fixes, dead code removal comment fixes Mar 15, 2021
Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ryan-williams!

@jrbourbeau jrbourbeau merged commit a1187b1 into dask:main Mar 15, 2021
@ryan-williams ryan-williams deleted the nits branch March 15, 2021 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants