Skip to content

Parse top-level import name from wheel file#3006

Merged
ryanking13 merged 29 commits intopyodide:mainfrom
ryanking13:toplevel
Sep 6, 2022
Merged

Parse top-level import name from wheel file#3006
ryanking13 merged 29 commits intopyodide:mainfrom
ryanking13:toplevel

Conversation

@ryanking13
Copy link
Copy Markdown
Member

@ryanking13 ryanking13 commented Aug 23, 2022

Description

Changes how imports key in repodata.json is calculated. This key is used in loadPackagesFromImport to get the top-level import name of a package.

Previously we were using test/imports key in meta.yaml to calculate this key, without mentioning this in docs. Therefore it was very easy to make a mistake such as #3001.

This PR changes this by calculating top-level imports dynamically from the wheel file:

1) Try to parse the top-level import name from top_level.txt file inside the wheel file. This file exists if the package is built with setuptools, but may not exist if the package is built with flit, poetry, etc.
2) Iterate the top-level directories and files inside the wheel file, and add to top-level import if there are directories containing python file or if there is a top-level python file.
3) If both of the above fails, use the package name inside meta.yaml as a top-level import name (e.g. shared libraries)

FYI these packages don't have top_level.txt:

- packages that use flit: pyparsing, more_itertools, tomli, tomli_w, threadpoolctl, typing_extensions
- packages that use hatchling: jsonschema, soupsieve,
~~- packages that use poetry: pkgconfig
- packages that use distutils: distlib (changed to setuptools in latest release)

Checklists

Copy link
Copy Markdown
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The nice thing is it also catches private imports (e.g. _pytest). It's an interesting approach, but note that are a number of problematic imports it finds below. Some are easy to fix, others are harder. Also this is hard to notice / review for users so I wonder whether the current system (but possibly with auto-generated list of imports by mkpkg using this code) so it could be reviewed would be more reliable.

image

test and tests should be explicitly excluded,
image

image

Not sure about those lib* folders, but maybe not critical,
image

image

image

I stopped at "l" letter in repodata.json. I imagine there are more of the same types of issues later.

@ryanking13
Copy link
Copy Markdown
Member Author

but note that are a number of problematic imports it finds below. Some are easy to fix, others are harder.

Right... it seems like the top_level.txt is not very reliable so probably we should iterate directories even if top_level.txt exist. Thanks for pointing that out.

so I wonder whether the current system (but possibly with auto-generated list of imports by mkpkg using this code) so it could be reviewed would be more reliable.

Yeah, that would be more flexible.

@ryanking13 ryanking13 marked this pull request as draft August 23, 2022 08:30
@ryanking13
Copy link
Copy Markdown
Member Author

ryanking13 commented Aug 23, 2022

The issue with auto-generating top level imports in mkpkg is that the final directory structure is not visible until we build a wheel.
Probably we can download the wheel for other platforms and use its structure.

@ofek
Copy link
Copy Markdown

ofek commented Aug 23, 2022

Iterate the top-level directories and files inside the wheel file, and add to top-level import if there are directories containing __init__py or if there is a top-level python file.

Don't forget about implicit namespace packages https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#native-namespace-packages

Copy link
Copy Markdown
Member Author

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated such as:

  1. Add a new key package/top-level in meta.yaml which is used for top-level import names.
  2. package/top-level is auto calculated in mkpkg command. This is done by downloading a wheel file of a native platform using its directory structure. (I hope the directory structure doesn't vary between platforms...)
  3. If a wheel file doesn't exist, it will add a placeholder so one can manually add top-level import names.

Also I stopped using top_level.txt since it often contains invalid import names as pointed out by Roman.

To make a review easier, I added comments on packages that have changes.

version: 21.4.0
top-level:
- attr
- attrs
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added attrs

version: "1.79"
top-level:
- Bio
- BioSQL
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added BioSQL.

version: 3.5.2

top-level:
- pylab
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matplotlib has a top-level file pylab.py. Not sure about the usage of it.

name: pycryptodomex
version: 3.15.0
top-level:
- Cryptodome
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intestingly, this file was named wrong (meta.yml), so this package was not built.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the build is faling, disabled for now.

name: pyrsistent
version: 0.18.1
top-level:
- _pyrsistent_version
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has a top-level version file...

@@ -1,6 +1,8 @@
package:
name: ruamel
name: ruamel.yaml
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ruamel.yaml is a correct package name.

version: 0.19.3
top-level:
- skimage
# - doc # scikit-image has top level `doc` directory... but it is too common name so we don't want to use it as a hint for top-level import.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scikit-image have top-level doc directory, but the name doc is too common so I think we don't want to load scikit-image is someone do import doc.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name: setuptools
version: 62.6.0
top-level:
- _distutils_hack
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setuptools has _distutils_hack.

name: sympy
version: 1.10.1
top-level:
- isympy
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sympy has top-leve isympy.py file.

name: toolz
version: 0.11.2
top-level:
- tlz
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tlz.

@hoodmane
Copy link
Copy Markdown
Member

hoodmane commented Sep 2, 2022

Sorry for the delay on review for this @ryanking13. I'll try to look at it when I get a chance. Seems like it's in good shape.

Copy link
Copy Markdown
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments, otherwise LGTM. Thanks!

version: 0.19.3
top-level:
- skimage
# - doc # scikit-image has top level `doc` directory... but it is too common name so we don't want to use it as a hint for top-level import.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanking13
Copy link
Copy Markdown
Member Author

Thanks for the thorough reviews!

@ryanking13 ryanking13 merged commit 555f782 into pyodide:main Sep 6, 2022
@ryanking13 ryanking13 deleted the toplevel branch September 6, 2022 06:20
@ryanking13 ryanking13 mentioned this pull request Sep 8, 2022
3 tasks
@rth rth mentioned this pull request Sep 10, 2022
3 tasks
hoodmane pushed a commit that referenced this pull request Nov 16, 2022
pycryptodomex was added in #2966, but it had an invalid recipe name meta.yml (not meta.yaml) so our build system didn't build that package.

I found that in #3006, and I also found that it is not building well, so I disabled it then. So, in other words, pycryptodomex never worked in Pyodide.

I would like to remove it from the changelog and the repository for now, so that we don't add it to our next stable release accidentally. Perhaps someone interested can re-add this package.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants