Skip to content

Feat: read entrypoints in dask.sizeof#7688

Merged
jsignell merged 24 commits intodask:mainfrom
agoose77:feature-add-entrypoint-sizeof
Jun 29, 2022
Merged

Feat: read entrypoints in dask.sizeof#7688
jsignell merged 24 commits intodask:mainfrom
agoose77:feature-add-entrypoint-sizeof

Conversation

@agoose77
Copy link
Copy Markdown
Contributor

@agoose77 agoose77 commented May 21, 2021

An example plugin package is here: https://github.com/agoose77/dask-sizeof-snek

A "useful" example is also the Scikit-HEP sizeof package, which is a practical example of why this is useful.

@agoose77
Copy link
Copy Markdown
Contributor Author

agoose77 commented May 21, 2021

This is a WIP PR. The logic to support entry-points is, as you can see, very simple.

At the moment, entrypoints are loaded as soon as dask.sizeof is imported. It would be possible to modify the Dispatch object such that we only load entry-points when there is no dispatch for a given type. However, if entry-points are needed by an application, this just moves the cost of reading entry_points to the point at which sizeof is dispatched upon. This would eliminate the cost for applications which do not need entry-point based implementations.

I decided not to use the "name" field for anything practical; it would be possible to use the name as the import path to a Python class object, such that Dask would be responsible for calling sizeof.register. However precludes the ability to use register_lazy, or to register multiple plugins.

I haven't added any tests yet, and I didn't see anywhere in the docs to put any new documentation on this. Any suggestions here would be helpful!

@agoose77
Copy link
Copy Markdown
Contributor Author

@mrocklin @jsignell @jrbourbeau I'd appreciate any of your thoughts on this 👋

@agoose77 agoose77 marked this pull request as ready for review May 21, 2021 13:10
Copy link
Copy Markdown
Member

@jsignell jsignell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a nice enhancement and sorry I hadn't gotten a chance to look at it yet. It'd be helpful if you could write some documentation and maybe open a WIP PR on a downstream library that implements sizeof entrypoints. That would help with discussion.

@jsignell
Copy link
Copy Markdown
Member

@agoose77 we are in agreement that this is a nice improvement, so you should feel free to finish it up whenever you get a chance.

You can put the documentation in the docstring or you could write a new section on https://docs.dask.org/en/latest/develop.html#

@agoose77 agoose77 force-pushed the feature-add-entrypoint-sizeof branch from 6729024 to 49c9824 Compare June 16, 2021 12:55
@agoose77
Copy link
Copy Markdown
Contributor Author

@jsignell how does the documentation look? Is that sufficiently detailed?

Copy link
Copy Markdown
Member

@jsignell jsignell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documentation is fantastic - thank you so much for writing it up!

I am wondering what would happen if two packages both implement sizeof for the same type. I guess that was already a risk with the old approach, but at least in the old approach, the sizeof only gets registered when the external library is imported. Do you have a sense of what should happen for this kind of case in the entry points approach?

@jrbourbeau
Copy link
Copy Markdown
Member

Thanks for the updates @agoose77! For reference, we run a few code style checks which are outlined at https://docs.dask.org/en/latest/develop.html#code-formatting. In particular, I recommend using the (optional) pre-commit hooks, as they will ensure all style checks are automatically run when you make a git commit locally

@agoose77
Copy link
Copy Markdown
Contributor Author

Thanks for the updates @agoose77! For reference, we run a few code style checks which are outlined at https://docs.dask.org/en/latest/develop.html#code-formatting. In particular, I recommend using the (optional) pre-commit hooks, as they will ensure all style checks are automatically run when you make a git commit locally

Huh, I didn't notice the .pre-commit-config.yaml, my bad. I'll run the linters.

Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @agoose77! Can we add a test to ensure that sizeof entrypoints are properly registered?

dask/sizeof.py Outdated
entry_points = importlib_metadata.entry_points()

try:
sizeof_entry_points = entry_points["dask.sizeof"]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like Python 3.10+ / importlib_metadata >= 3.9.0 has deprecated dict access in favor of select(). This is what's causing the CI failures here (which are using importlib_metadata >= 3.9.0). To accommodate this, let's add a hasattr(entry_points, "select") check to determine which method we should use (see fsspec/filesystem_spec#680 for a place where a similar change was made).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrbourbeau yes, this was the reason I originally hoped to use a restricted version of importlib_metadata. I agree that your suggestion is going to be the most robust / least invasive going forward :)

@agoose77
Copy link
Copy Markdown
Contributor Author

Thanks @agoose77! Can we add a test to ensure that sizeof entrypoints are properly registered?

What do you mean by this? I already test for the group key, would you like a warning?

@jrbourbeau
Copy link
Copy Markdown
Member

Sorry, I mean can we add a test to our test suite for this? Right now we don't have any tests which check that this functionality works as we expect it to

@agoose77
Copy link
Copy Markdown
Contributor Author

@jrbourbeau I'm not against it in principle, but I'm not even sure how we would test this - really we would just be testing importlib_metadata - this code just invokes the given entry-point function with a single arg, and so it's really only external user code that is testable.

@jrbourbeau
Copy link
Copy Markdown
Member

I agree there's no need to test importlib_metadata functionality. Though we should test that calling _register_entry_point_plugins results in sizeof entrypoints actually being registered with the dask.sizeof.sizeof dispatch object.

One possible test would be to do something like:

  1. Create some custom sizeof entrypoint implementation
  2. Call _register_entry_point_plugins
  3. assert that the dask.sizeof.sizeof dispatch object has a corresponding implementation for the custom entrypoint

We have some support for entrypoints in dask/distributed. Perhaps test_register_backend_entrypoint over there could help inform the test we add here.

@GPUtester
Copy link
Copy Markdown
Collaborator

Can one of the admins verify this patch?

@agoose77
Copy link
Copy Markdown
Contributor Author

@jrbourbeau finally got around to this. I've made the changes you mentioned earlier!

@jsignell
Copy link
Copy Markdown
Member

jsignell commented Sep 7, 2021

ok to test

@agoose77 agoose77 force-pushed the feature-add-entrypoint-sizeof branch from 10926bf to a058244 Compare March 30, 2022 13:36
@github-actions github-actions bot added the documentation Improve or add to documentation label Mar 30, 2022
@agoose77
Copy link
Copy Markdown
Contributor Author

agoose77 commented Mar 30, 2022

@graingert thanks, I've made those changes. I've reabsed on main too.

I am not sure where is best to add the link for the small docs that I wrote. Have you any suggestions?

@agoose77 agoose77 force-pushed the feature-add-entrypoint-sizeof branch from fcfbe7c to 1223f70 Compare April 2, 2022 10:17
@agoose77 agoose77 force-pushed the feature-add-entrypoint-sizeof branch from 1223f70 to 4d9f038 Compare April 2, 2022 10:24
@jsignell
Copy link
Copy Markdown
Member

jsignell commented Apr 7, 2022

For docs, I could imagine writing a new how-to section, those can be as short as you like, but I can't think of another place where it would fit.

@agoose77
Copy link
Copy Markdown
Contributor Author

@jsignell I think this is done, or at least ready for feedback!

@agoose77
Copy link
Copy Markdown
Contributor Author

agoose77 commented Jun 3, 2022

@jsignell Just pinging again to see whether I need to do anything further here? Let me know if there's no scope for looking at this right now, I can put it on the backburner.

Copy link
Copy Markdown
Member

@jsignell jsignell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for dropping the ball on this (again). This looks good to me! I will merge it at the end of the day unless there is any feedback from others.

@agoose77 agoose77 changed the title WIP: read entrypoints in dask.sizeof Feat: read entrypoints in dask.sizeof Jun 6, 2022
@agoose77
Copy link
Copy Markdown
Contributor Author

agoose77 commented Jun 6, 2022

Please, no need to apologise, this is a big project and a small PR! I'll keep my eye on the issue :)

@jsignell
Copy link
Copy Markdown
Member

jsignell commented Jun 6, 2022

Seems like something in that last commit broke linting :)

@agoose77
Copy link
Copy Markdown
Contributor Author

agoose77 commented Jun 6, 2022

Ah, my mistake - I assumed that the linters were also run by precommit, but it looks like this particular linter was added recently.

Copy link
Copy Markdown
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally this looks good. Thanks for working on it 🙏

Had a few questions below

@@ -1,4 +1,6 @@
import importlib.metadata
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use the importlib_metadata backport package for earlier Python versions? If so, should we include that as a requirement on those Python versions?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 3.8 includes importlib.metadata, so we only use the version guard to check which API to use (see next comment)

dask/sizeof.py Outdated
Comment on lines +222 to +225
if sys.version_info >= (3, 10):
sizeof_entry_points = entry_points.select(group="dask.sizeof")
else:
sizeof_entry_points = entry_points.get("dask.sizeof", [])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we know this is Python 3.10 related as opposed to something else (like the importlib_metadata backport package being used)?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I think part of the issue is this allows an _EntryPoints object to be passed in. Though maybe we could get rid of that functionality (if it is not needed) and move that code from Distributed into a utility function here and use in both libraries

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@graingert is there a particular reason for your suggestion? pyupgrade should catch this case too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless there's a perf benefit from sharing the loaded entrypoints between dask and distributed, my view is that the distributed implementation is overkill for what we actually need here.

@jsignell
Copy link
Copy Markdown
Member

This seems like a great first step. We can always iterate and refactor if we like. But for now merging! Thanks for sticking with this @agoose77! And thank you @graingert and @jakirkham for reviewing.

@jsignell jsignell merged commit 0a7c41d into dask:main Jun 29, 2022
@agoose77
Copy link
Copy Markdown
Contributor Author

Thanks all for helping get this merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

config documentation Improve or add to documentation needs review Needs review from a contributor.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: add sizeof entry-points hook

7 participants