Feat: read entrypoints in dask.sizeof#7688
Conversation
|
This is a WIP PR. The logic to support entry-points is, as you can see, very simple. At the moment, entrypoints are loaded as soon as I decided not to use the "name" field for anything practical; it would be possible to use the name as the import path to a Python class object, such that Dask would be responsible for calling I haven't added any tests yet, and I didn't see anywhere in the docs to put any new documentation on this. Any suggestions here would be helpful! |
|
@mrocklin @jsignell @jrbourbeau I'd appreciate any of your thoughts on this 👋 |
jsignell
left a comment
There was a problem hiding this comment.
This seems like a nice enhancement and sorry I hadn't gotten a chance to look at it yet. It'd be helpful if you could write some documentation and maybe open a WIP PR on a downstream library that implements sizeof entrypoints. That would help with discussion.
|
@agoose77 we are in agreement that this is a nice improvement, so you should feel free to finish it up whenever you get a chance. You can put the documentation in the docstring or you could write a new section on https://docs.dask.org/en/latest/develop.html# |
6729024 to
49c9824
Compare
|
@jsignell how does the documentation look? Is that sufficiently detailed? |
jsignell
left a comment
There was a problem hiding this comment.
This documentation is fantastic - thank you so much for writing it up!
I am wondering what would happen if two packages both implement sizeof for the same type. I guess that was already a risk with the old approach, but at least in the old approach, the sizeof only gets registered when the external library is imported. Do you have a sense of what should happen for this kind of case in the entry points approach?
|
Thanks for the updates @agoose77! For reference, we run a few code style checks which are outlined at https://docs.dask.org/en/latest/develop.html#code-formatting. In particular, I recommend using the (optional) |
Huh, I didn't notice the |
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks @agoose77! Can we add a test to ensure that sizeof entrypoints are properly registered?
dask/sizeof.py
Outdated
| entry_points = importlib_metadata.entry_points() | ||
|
|
||
| try: | ||
| sizeof_entry_points = entry_points["dask.sizeof"] |
There was a problem hiding this comment.
It looks like Python 3.10+ / importlib_metadata >= 3.9.0 has deprecated dict access in favor of select(). This is what's causing the CI failures here (which are using importlib_metadata >= 3.9.0). To accommodate this, let's add a hasattr(entry_points, "select") check to determine which method we should use (see fsspec/filesystem_spec#680 for a place where a similar change was made).
There was a problem hiding this comment.
@jrbourbeau yes, this was the reason I originally hoped to use a restricted version of importlib_metadata. I agree that your suggestion is going to be the most robust / least invasive going forward :)
What do you mean by this? I already test for the group key, would you like a warning? |
|
Sorry, I mean can we add a test to our test suite for this? Right now we don't have any tests which check that this functionality works as we expect it to |
|
@jrbourbeau I'm not against it in principle, but I'm not even sure how we would test this - really we would just be testing |
|
I agree there's no need to test One possible test would be to do something like:
We have some support for entrypoints in |
|
Can one of the admins verify this patch? |
|
@jrbourbeau finally got around to this. I've made the changes you mentioned earlier! |
|
ok to test |
10926bf to
a058244
Compare
|
@graingert thanks, I've made those changes. I've reabsed on main too. I am not sure where is best to add the link for the small docs that I wrote. Have you any suggestions? |
fcfbe7c to
1223f70
Compare
1223f70 to
4d9f038
Compare
|
For docs, I could imagine writing a new how-to section, those can be as short as you like, but I can't think of another place where it would fit. |
|
@jsignell I think this is done, or at least ready for feedback! |
|
@jsignell Just pinging again to see whether I need to do anything further here? Let me know if there's no scope for looking at this right now, I can put it on the backburner. |
jsignell
left a comment
There was a problem hiding this comment.
Sorry for dropping the ball on this (again). This looks good to me! I will merge it at the end of the day unless there is any feedback from others.
|
Please, no need to apologise, this is a big project and a small PR! I'll keep my eye on the issue :) |
|
Seems like something in that last commit broke linting :) |
|
Ah, my mistake - I assumed that the linters were also run by precommit, but it looks like this particular linter was added recently. |
jakirkham
left a comment
There was a problem hiding this comment.
Generally this looks good. Thanks for working on it 🙏
Had a few questions below
| @@ -1,4 +1,6 @@ | |||
| import importlib.metadata | |||
There was a problem hiding this comment.
Should we use the importlib_metadata backport package for earlier Python versions? If so, should we include that as a requirement on those Python versions?
There was a problem hiding this comment.
Python 3.8 includes importlib.metadata, so we only use the version guard to check which API to use (see next comment)
dask/sizeof.py
Outdated
| if sys.version_info >= (3, 10): | ||
| sizeof_entry_points = entry_points.select(group="dask.sizeof") | ||
| else: | ||
| sizeof_entry_points = entry_points.get("dask.sizeof", []) |
There was a problem hiding this comment.
Why do we know this is Python 3.10 related as opposed to something else (like the importlib_metadata backport package being used)?
There was a problem hiding this comment.
I think copying the implementation from distributed would be best https://github.com/dask/distributed/blob/50d29110a430fe8457f73163082716844e1f11a6/distributed/comm/registry.py#L15
There was a problem hiding this comment.
Well I think part of the issue is this allows an _EntryPoints object to be passed in. Though maybe we could get rid of that functionality (if it is not needed) and move that code from Distributed into a utility function here and use in both libraries
There was a problem hiding this comment.
@graingert is there a particular reason for your suggestion? pyupgrade should catch this case too.
There was a problem hiding this comment.
Unless there's a perf benefit from sharing the loaded entrypoints between dask and distributed, my view is that the distributed implementation is overkill for what we actually need here.
|
This seems like a great first step. We can always iterate and refactor if we like. But for now merging! Thanks for sticking with this @agoose77! And thank you @graingert and @jakirkham for reviewing. |
|
Thanks all for helping get this merged! |
black dask/flake8 dask/isort daskAn example plugin package is here: https://github.com/agoose77/dask-sizeof-snek
A "useful" example is also the Scikit-HEP sizeof package, which is a practical example of why this is useful.