Skip to content

Allow package-data for stubs packages#248

Merged
abravalheri merged 6 commits into
abravalheri:mainfrom
cdce8p-forks:stubs-package-data
Jan 21, 2026
Merged

Allow package-data for stubs packages#248
abravalheri merged 6 commits into
abravalheri:mainfrom
cdce8p-forks:stubs-package-data

Conversation

@cdce8p

@cdce8p cdce8p commented Mar 29, 2025

Copy link
Copy Markdown
Contributor

I'm currently working on moving the stubs_uploader from setup.py to pyproject.toml. It's used to build and upload the typeshed stubs packages automatically. While doing so I noticed that setuptools rejects stub package names like requests-stubs for package-data. I know that .pyi and py.typed files are auto-included. Each stubs package also contains a METADATA.toml file which should be included in the distribution. Do to the nature of the project, it would be desirable to use include-package-data = false and instead specify the wanted files directly.

This PR slightly modifies the schema to allow for that. If accepted, it would be great if a new version could be released so I include these changes in setuptools.

https://github.com/typeshed-internal/stub_uploader/blob/cbdc0b85972e476b4dedbd626cd12dadf7dd60a9/stub_uploader/build_wheel.py#L51-L84

@cdce8p

cdce8p commented Mar 31, 2025

Copy link
Copy Markdown
Contributor Author

/CC @abravalheri

@abravalheri

Copy link
Copy Markdown
Owner

Sorry for the delay.

I think that this is very reasonable. Can we somehow re-use the existing definition?

https://github.com/abravalheri/validate-pyproject/blob/v0.24.1/src/validate_pyproject/plugins/setuptools.schema.json#L251-L261?

If accepted, it would be great if a new version could be released so I include these changes in setuptools.

Sure, it may take a little bit however, I want to wait for the situation to stabilise.

@cdce8p

cdce8p commented Apr 23, 2025

Copy link
Copy Markdown
Contributor Author

Sorry for the late reply. I somehow lost track of it.

I think that this is very reasonable. Can we somehow re-use the existing definition?

https://github.com/abravalheri/validate-pyproject/blob/v0.24.1/src/validate_pyproject/plugins/setuptools.schema.json#L251-L261?

Unless I'm missing something, it isn't really possible unfortunately. While the package name itself is normalized by setuptools, I don't think the name in the package-data dict is. Thus the best we can probably do is add a separate validation function for it to reduce duplications. See 80a6d7d

@cdce8p

cdce8p commented Aug 5, 2025

Copy link
Copy Markdown
Contributor Author

@abravalheri Did you had a chance to look at this again?

@abravalheri

abravalheri commented Aug 6, 2025

Copy link
Copy Markdown
Owner

Hi @cdce8p, sorry for the delay in getting back to this.

Unless I'm missing something, it isn't really possible unfortunately. While the package name itself is normalized by setuptools, I don't think the name in the package-data dict is. Thus the best we can probably do is add a separate validation function for it to reduce duplications.

I am not sure I follow. I tested the following snippet for package-data and exclude-package-data on top of 4b72449 and the added tests seem to pass as expected1:

        "anyOf": [
          { "$ref": "#/definitions/package-name" },
          { "const": "*" }
        ]

Note that #/definitions/package-name already includes the *-stubs case, so that part is covered.

As for the python-module-name-relaxed format used in the shared definition, I believe it’s consistent with how we validate package and package-dir, so it should be fine. We could technically abstract this further into a new definition in the JSON Schema to reduce repetition even more, but I haven’t found a good name to abstract such a concept, so I think the snippet above is already a reasonable compromise2.

Personally, I’d prefer keeping this logic in the JSON Schema itself rather than introducing another format function. If you’re okay with that, I can push my local copy with these changes.

Footnotes

  1. The only error I got testing locally on 3.12 was FAILED tests/test_pre_compile.py::test_invalid_examples_api[store/ruff-unknown.toml-cli_pre_compile] - urllib.error.URLError: <urlopen error [Errno 11] Resource temporarily unavailable> which is unrelated to the schema changes.

  2. Maybe python-module-name-relaxed-or-wildcard (based on your suggested function)... It is a big name, but it does a good job.

@cdce8p

cdce8p commented Aug 6, 2025

Copy link
Copy Markdown
Contributor Author

I am not sure I follow. I tested the following snippet for package-data and exclude-package-data on top of 4b72449 and the added tests seem to pass as expected:

        "anyOf": [
          { "$ref": "#/definitions/package-name" },
          { "const": "*" }
        ]

Note that #/definitions/package-name already includes the *-stubs case, so that part is covered.

As for the python-module-name-relaxed format used in the shared definition, I believe it’s consistent with how we validate package and package-dir, so it should be fine.

Not quite, the point I'm worried about is that the package name itself is normalized by setuptools, however the package-data module name isn't. Even with python_module_name_relaxed aborting early for stubs packages, this would still allow invalid names for modules inside the package-data dict. E.g.

[project]
name = "some-project"

[tool.setuptools.package-data]
"some-project" = ["*.yaml"]

Files stored in some_project/.... Validation would succeed while the data files are not included.

@cdce8p

cdce8p commented Oct 23, 2025

Copy link
Copy Markdown
Contributor Author

Any update?

@abravalheri

Copy link
Copy Markdown
Owner

Sorry for the delay,

```toml
[project]
name = "some-project"

[tool.setuptools.package-data]
"some-project" = ["*.yaml"]

Files stored in some_project/.... Validation would succeed while the data files are not included.

So if I understand correctly, the error you are concerned about is when the person writes some-project but the actual directory uses underscore instead of dashes (or vice versa)...

This is a tricky one... but since both variations are allowed to coexist and they are both individually considered valid packages, I don't think we can enforce that distinction. This was a feature request from the setuptools issue tracker (if I am not wrong), previously only _ were allowed.

Now both some-project or some_project are allowed to exist, so they both can have data files inside of them...

@cdce8p

cdce8p commented Nov 27, 2025

Copy link
Copy Markdown
Contributor Author

So if I understand correctly, the error you are concerned about is when the person writes some-project but the actual directory uses underscore instead of dashes (or vice versa)...

Correct.

This is a tricky one... but since both variations are allowed to coexist and they are both individually considered valid packages, I don't think we can enforce that distinction. This was a feature request from the setuptools issue tracker (if I am not wrong), previously only _ were allowed.

Now both some-project or some_project are allowed to exist, so they both can have data files inside of them...

I think of it differently. On is the project name itself whereas the other references the directory. So I think the distinction does make sense. Unless I'm missing something, there are basically two options now

  1. Leave the PR as is. This enforces that the name in package-data matches the directory exactly (without normalization). This is how setuptools handles it currently. The PR would only allow the added -stubs suffix.
  2. Or we could relax the validation for package-data names. In that case though I believe we should definitely add the normalization to setuptools. Otherwise users might encounter situations where package files aren't included when they should have been, without any errors from setuptools.

Personally, I'd recommend option 1. If there is a desire for package-data normalization, this could be added separately later.

@abravalheri

abravalheri commented Nov 28, 2025

Copy link
Copy Markdown
Owner

Thanks for the clarification. I think we have some partial alignment, but I would like to refine one point for accuracy (and comment on its consequences):

This enforces that the name in package-data matches the directory exactly (without normalization). This is how setuptools handles it currently.

This statement seem to imply that "setuptools enforces the name to match the directory", which would not be entirely precise. What setuptools actually does is: if the directory name matches the package name in the configuration (exactly, without normalisation), then the files are included. If they don't match, setuptools simply doesn't include them. It doesn't actively enforce the match beyond that behaviour.

It's also worth noting that setuptools allows two distinct package names that differ only by _ vs -. For example, pip-run and pip_run are both valid and treated as separate packages, each capable of having its own package data.

This flexibility was introduced because of a feature request from setuptools: previously, validate-pyproject enforced normalisation to _ (since identifiers in import statements require underscores). But the FR highlighted real-world use cases where dashes are useful/desired: pip-run being the canonical example. Both importlib.import_module and runpy.run_module handle names with dashes correctly, so the request was accepted.

Here's a quick demo showing Python treating them as distinct:

# docker run --rm -it python:3.12-bookworm /bin/bash

mkdir -p /tmp/pkg/pip_run
echo "print('1: this is pip_run')" > /tmp/pkg/pip_run/__main__.py

mkdir -p /tmp/pkg/pip-run
echo "print('2: this is pip-run')" > /tmp/pkg/pip-run/__main__.py

cd /tmp/pkg
python -m pip_run

# 1: this is pip_run
python -c 'print(__import__("importlib").import_module("pip_run"))'
# <module 'pip_run' (namespace) from ['/tmp/pkg/pip_run']>

python -m pip-run
# 2: this is pip-run
python -c 'print(__import__("importlib").import_module("pip-run"))'
# <module 'pip-run' (namespace) from ['/tmp/pkg/pip-run']>

(not saying that this is a good or bad thing, just illustrating existing behaviour).

(Sorry for leaning on pip-run. It's the example from the original FR.)

So, let's assume a hypothetical package wants to offer a "normal" API access. It would likely have pkg_name/py.typed or pkg_name/__init__.pyi.

If the same package also want to have a convenient CLI shortcut via -m, it might want to include pkg-name/py.typed or pkg-name/__init__.pyi. In practice, projects may choose to have all four files coexisting inside a single wheel.


At the end of the day allowing - or applying normalisation would be mutually exclusive options. Since setuptools and validate-pyproject do allow - (as initially requested in the pip-run-related FR), I think it make sense to keep following that line, and use consistent validation for package names in packages = [...] and package-data = { ... } in pyproject.toml.

@cdce8p

cdce8p commented Nov 29, 2025

Copy link
Copy Markdown
Contributor Author

Thanks for the detailed explanation! It seems I've got some things messed up along the way. If I understand you correctly, your suggestion is to reuse the package-name definition, which is already used for packages and package-dir, here. Correct?

Just pushed a new commit for it.

@cdce8p

cdce8p commented Jan 21, 2026

Copy link
Copy Markdown
Contributor Author

@abravalheri Any update?

@abravalheri abravalheri merged commit c55827b into abravalheri:main Jan 21, 2026
22 checks passed
@abravalheri

Copy link
Copy Markdown
Owner

Sure, sorry for the delay :P

(it does not apply immediately to setuptools until we update the code there)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants