Add dependency on `pathlib-abc` by barneygale · Pull Request #132 · jaraco/zipp

barneygale · 2025-03-24T17:24:59Z

Make zipp.Path subclass pathlib_abc.ReadablePath. This allows us to remove implementations of read_text(), read_bytes(), glob(), joinpath() and __truediv__(), and to simplify the implementations of a few more methods.

Maintain a tree of PathInfo objects representing the hierarchy of zip file members. We traverse the tree whenever we need to resolve a path to a ZipInfo object. This effectively hides the .zip-specific quirk that directories are recorded with a trailing slash in their filenames.
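To make the tree idea concrete, here is a minimal stdlib-only sketch. The `Node`, `build_tree` and `lookup` names are hypothetical stand-ins, not the PR's `PathInfo` implementation. Each member name is split on `/`. Intermediate segments become implied directories, and a member whose filename ends with `/` is a recorded directory. Lookups therefore succeed with or without a trailing slash.

```python
import io
import zipfile


class Node:
    """Hypothetical tree node standing in for the PR's PathInfo objects."""

    def __init__(self):
        self.children = {}  # segment name -> Node
        self.zinfo = None   # ZipInfo, only for members recorded in the archive

    def is_dir(self):
        # Implied directories have no ZipInfo of their own; recorded
        # directories are the ones whose filename ends with '/'.
        return self.zinfo is None or self.zinfo.is_dir()


def build_tree(zf):
    """Build a tree from a ZipFile's members, creating implied parents."""
    root = Node()
    for zinfo in zf.infolist():
        node = root
        # 'b/d/' splits into ['b', 'd', '']; skip the empty trailing segment.
        for part in filter(None, zinfo.filename.split("/")):
            node = node.children.setdefault(part, Node())
        node.zinfo = zinfo
    return root


def lookup(root, at):
    """Resolve a member path, with or without a trailing slash."""
    node = root
    for part in filter(None, at.split("/")):
        node = node.children.get(part)
        if node is None:
            return None
    return node


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "A")
    zf.writestr("b/", "")        # explicitly recorded directory
    zf.writestr("c/d.txt", "D")  # 'c' is only implied

with zipfile.ZipFile(buf) as zf:
    tree = build_tree(zf)

print(lookup(tree, "b").is_dir(), lookup(tree, "b/").is_dir())
print(lookup(tree, "c").is_dir(), lookup(tree, "a.txt").is_dir())
```

Because the trailing slash only matters when a node is built, callers never need to know whether `b` was stored as `b/`.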

Adjust __str__() so that it doesn't raise for unnamed zip files. Instead it returns a string beginning with :fileobj:. The prefix is made available as a new anchor attribute.

Add the following methods/attributes, which are required by ReadablePath:

- `info`
- `parser`
- `__open_rb__()`
- `readlink()`
- `with_segments()`
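The following sketch shows why subclassing lets zipp drop its own `read_text()` and `read_bytes()`. Once a subclass supplies a binary-open primitive, the higher-level readers can be written once in the base class. This is a stand-alone toy, not pathlib_abc's code. `MiniReadablePath` and `DictPath` are hypothetical names. The primitive is called `__open_rb__` only to mirror the list above, and its real signature in pathlib-abc may differ.

```python
import io
from abc import ABC, abstractmethod


class MiniReadablePath(ABC):
    """Hypothetical base class: subclasses supply only the primitive."""

    @abstractmethod
    def __open_rb__(self):
        """Return a binary file-like object for this path."""

    def read_bytes(self):
        # Written once here, in terms of the primitive.
        with self.__open_rb__() as f:
            return f.read()

    def read_text(self, encoding="utf-8", errors="strict"):
        # Simplification: a fixed default encoding instead of the locale's.
        return self.read_bytes().decode(encoding, errors)


class DictPath(MiniReadablePath):
    """Toy path backed by a dict of name -> bytes."""

    def __init__(self, files, name):
        self.files = files
        self.name = name

    def __open_rb__(self):
        return io.BytesIO(self.files[self.name])


p = DictPath({"a.txt": b"hello"}, "a.txt")
print(p.read_text())
```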

Disable the following ReadablePath methods/attributes that we don't (yet) test in zipp:

- `anchor`
- `parts`
- `parents`
- `__rtruediv__()`
- `with_name()`
- `with_stem()`
- `with_suffix()`
- `full_match()`
- `walk()`
- `copy()`
- `copy_into()`

barneygale · 2025-03-24T22:33:52Z

Jason, this might be a good opportunity for me to say "thank you" for zipp! It's a gem of a package and it was the original inspiration for my work on pathlib ~5 years ago, which led to me becoming a CPython triager and then a developer. Thank you for the splendid inspiration :-)

I hope I can convince you that this PR is a good idea. To summarize the improvements:

- It provides a glob() implementation that's 100% compatible with stdlib pathlib/glob.
- It solves all issues around trailing slashes for directories. For example, zipp.Path(..., at=at).is_dir() now returns the correct value whether or not at has a trailing slash!
- It adds a few attributes/methods that are useful to users, with more available but disabled:
  - We could enable walk(), which is pretty handy!
  - copy() and copy_into() would allow zip files to be extracted with zipp.Path(...).copy(pathlib.Path(...)) (3.14+ only)
- It's a stepping stone towards full write support:
  - We could also subclass WritablePath and add mkdir() and symlink_to() methods
  - This would allow zip files to be populated with pathlib.Path(...).copy(zipp.Path(...)) (3.14+ only)
  - Copies between zip files can be done with zipp.Path(...).copy(zipp.Path(...))
- ~~It removes 250 lines of code!~~
  - Edit: I've restored most of the code I deleted because I realised much of it is public and useful independently of zipp.Path
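Here is a hand-rolled, top-down equivalent of `walk()` over the stdlib `zipfile.Path` that exists today. It gives a feel for what enabling `walk()` would offer. `walk_zip` is a hypothetical helper, not the `ReadablePath.walk()` implementation. Unlike `os.walk()`, it does not support pruning by mutating `dirnames`. It reports each directory through the `.at` member path used throughout this thread.

```python
import io
import zipfile


def walk_zip(top):
    """Yield (path, dirnames, filenames) top-down, similar to os.walk()."""
    dirs, files = [], []
    for child in top.iterdir():
        (dirs if child.is_dir() else files).append(child)
    # Sort for deterministic output; archive member order is arbitrary.
    dirs.sort(key=lambda p: p.name)
    files.sort(key=lambda p: p.name)
    yield top, [d.name for d in dirs], [f.name for f in files]
    for d in dirs:
        yield from walk_zip(d)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "A")
    zf.writestr("b/c.txt", "C")
    zf.writestr("b/d/e.txt", "E")

for path, dirnames, filenames in walk_zip(zipfile.Path(buf)):
    print(repr(path.at), dirnames, filenames)
```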

I've tried to keep the changes minimal. I'd love to hear what you think, when you have time. Thanks again!

jaraco

This is great. I love what you're thinking here, and I love the elegance that comes out of it (especially in the Path class). Let's work through the details and some of the questions, but I'm confident we can get something very similar to this merged.

pyproject.toml

tests/test_path.py

zipp/__init__.py

jaraco · 2025-04-26T14:04:04Z

pyproject.toml

 ]
 requires-python = ">=3.9"
 dependencies = [
+	"pathlib-abc == 0.4.1"


I'm not sure zipp can have dependencies. Adding the first dependency is always the worst. I can see that I'd previously vendored packages to avoid having dependencies. Perhaps that was simply because of the need to port into CPython, in which case this dependency won't be an issue.

@jaraco As you say, in CPython it should be OK because you can pull in the relevant stuff from pathlib.types and pathlib._os.

But I think you might have a point about the PyPI package. zipp is core packaging infrastructure, and if it depends on pathlib-abc then that becomes core infra too. But pathlib-abc's testing/publishing setup isn't as sophisticated as zipp's, and you'd be dependent on me to release new versions as you need them.

Just a thought: would you be interested in co-maintaining pathlib-abc? It would be terrific if you can bring it up to the same standard as your other packages, but I understand completely if you'd rather not, or don't have the time.

jaraco · 2025-04-26T14:31:53Z

zipp/__init__.py

+    def __reduce__(self):
+        return (self.__class__, (self.root.filename, self.at))


This change replaces the InitializedState mix-in, which specifically aimed to reconstruct the object from any inputs that were pickleable, so if the ZipFile was supplied and it was somehow pickleable, the Path would also be pickleable. With this change, that assumption is lost and pickleability is limited to zip files that have a meaningful filename.

This change feels to me to be unrelated to the adoption of pathlib-abc. How does it relate?

Right! Thanks for the explanation - I didn't quite understand what InitializedState was doing. I'll restore the previous behaviour.

BTW what's the use case for pickleable zipp.Path objects? It took me a bit by surprise because obviously zipfile.ZipFile isn't pickleable

I've "fixed" this by recording the initial argument as Path._initial_arg and propagating that to derived paths. I guess I don't really see the point of turning it into a whole class with its own backing module - isn't that something of an over-generalisation of something used only once? Perhaps I'm missing something
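Here is a minimal sketch of that approach. `TinyZipPath` is a hypothetical, drastically reduced class, not the PR's `zipp.Path`; only the `_initial_arg` name comes from the comment above. The constructor remembers whatever the caller passed in. Derived paths reuse the already-open `ZipFile` but inherit that original argument. `__reduce__` rebuilds from the original argument, so pickling works whenever it is picklable (a filename or an `io.BytesIO`, say), even though `ZipFile` itself is not.

```python
import io
import pickle
import posixpath
import zipfile


class TinyZipPath:
    """Hypothetical, much-reduced stand-in for zipp.Path."""

    def __init__(self, root, at="", *, _initial_arg=None):
        if isinstance(root, zipfile.ZipFile):
            self.root = root
        else:
            self.root = zipfile.ZipFile(root)
        # Remember what the *user* supplied; derived paths pass it along.
        self._initial_arg = root if _initial_arg is None else _initial_arg
        self.at = at

    def with_segments(self, at):
        # Reuse the open ZipFile instead of re-opening the archive.
        return type(self)(self.root, at, _initial_arg=self._initial_arg)

    def joinpath(self, name):
        return self.with_segments(posixpath.join(self.at, name))

    def read_bytes(self):
        return self.root.read(self.at)

    def __reduce__(self):
        # Rebuild from the original argument, never from the ZipFile.
        return (self.__class__, (self._initial_arg, self.at))


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("b/c.txt", "C")

child = TinyZipPath(buf).joinpath("b/c.txt")
clone = pickle.loads(pickle.dumps(child))
print(clone.at, clone.read_bytes())
```

A caller who passes an open `ZipFile` still gets an unpicklable path, which matches the InitializedState behaviour described above.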

> BTW what's the use case for pickleable zipp.Path objects? It took me a bit by surprise because obviously zipfile.ZipFile isn't pickleable

#81 describes the motivation.

> I've "fixed" this by recording the initial argument as Path._initial_arg and propagating that to derived paths. I guess I don't really see the point of turning it into a whole class with its own backing module - isn't that something of an over-generalisation of something used only once? Perhaps I'm missing something

My motivation for creating the standalone class was to separate concerns. By making it an independent class, it allows that functionality to be selectively added or removed, and it makes it easier to see what bits affect pickleability (instead of reading through the whole code of Path to try to infer which bits are relevant). I try to avoid entangling concerns wherever possible. It also makes it easier to detect when that behavior is altered (helps avoid unintentional regressions). It does also have the benefit of potentially being generalizable and reusable, which could prove useful should another project ask for something similar.

Gotcha, thanks for the extra detail. I've restored the InitializedState implementation.

But the proposed zipp.Path implementation uses neither CompleteDirs nor FastLookup, and I don't think it can inherit InitializedState directly. So I think I still need a Path.__reduce__ method, unless I'm missing something.

There's little value in retaining the InitializedState if it's not somehow inherited by the Path class. I wonder why Path couldn't inherit from InitializedState. If that class can't be used to add pickleability to zipfile.ZipFile (or whatever property of Path that isn't pickleable), then maybe it's not usable. It gets complicated because Path encapsulates ZipFile, so pickleability concerns are entangled. I'll take another look.

> I wonder why Path couldn't inherit from InitializedState.

The problem comes when you want to create derived paths, e.g. from Path.parent or Path.iterdir(). Those methods call with_segments() (previously _next()). That method calls something like:

return self.__class__(self.root, at)

Where self.root is a ZipFile object, not a filename string, because you don't want to re-open the zip file with every derived path. But if zipp.Path were to inherit InitializedState, then only the ZipFile object (not the str) would be captured by InitializedState, and so any derived paths wouldn't be pickleable.
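The problem can be reproduced with a small stand-alone sketch. `ArgsRecordingPath` is a hypothetical class that pickles by replaying the arguments its constructor received, as an InitializedState-style design would. The root path is built from an `io.BytesIO` and pickles fine. The derived path, however, was constructed from the open `ZipFile`, so pickling it fails.

```python
import io
import pickle
import zipfile


class ArgsRecordingPath:
    """Hypothetical path that pickles by replaying its constructor args."""

    def __init__(self, root, at=""):
        self._saved_args = (root, at)  # exactly what the caller passed in
        if isinstance(root, zipfile.ZipFile):
            self.root = root
        else:
            self.root = zipfile.ZipFile(root)
        self.at = at

    def with_segments(self, at):
        # Derived paths reuse the open ZipFile rather than re-opening it...
        return type(self)(self.root, at)

    def __reduce__(self):
        # ...so for a derived path, the replayed argument is a ZipFile.
        return (type(self), self._saved_args)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "A")

root = ArgsRecordingPath(buf)
pickle.dumps(root)  # works: the saved argument is a BytesIO
child = root.with_segments("a.txt")
try:
    pickle.dumps(child)
except (TypeError, pickle.PicklingError) as exc:
    print("derived path is not picklable:", exc)
```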

> There's little value in retaining the InitializedState if it's not somehow inherited by the Path class.

The only fly in the ointment is that zipfile exposes CompleteDirs, which inherits InitializedState, so arguably we need to go through a deprecation period first.

zipp/__init__.py

jaraco · 2025-04-26T15:55:17Z

I've pushed a few changes to address some of the nitpicky linter and other obvious fixes. I've saved those changes in a separate branch, so it's no problem if you wish to re-write anything and force push.

I've also noticed there are some missed lines in the coverage:

Name                       Stmts   Miss  Cover   Missing
--------------------------------------------------------
conftest.py                    4      0   100%
docs/conf.py                  15      0   100%
tests/__init__.py              0      0   100%
tests/_support.py              5      0   100%
tests/_test_params.py         21      0   100%
tests/compat/__init__.py       0      0   100%
tests/compat/py38.py           2      0   100%
tests/compat/py39.py           4      0   100%
tests/compat/py310.py          4      0   100%
tests/test_complexity.py      52      1    98%   30
tests/test_path.py           435      3    99%   193-195
tests/write-alpharep.py        2      0   100%
zipp/__init__.py             154      4    97%   81, 245, 309, 314
zipp/compat/__init__.py        0      0   100%
zipp/compat/overlay.py        11      0   100%
zipp/compat/py310.py           3      0   100%
--------------------------------------------------------
TOTAL                        712      8    99%

It would be nice to get back to 100%.

jaraco · 2025-04-26T15:57:19Z

One failure I see when I build locally is test_encoding_warnings:

____________________________________________________________ TestPath.test_encoding_warnings ____________________________________________________________

self = <tests.test_path.TestPath testMethod=test_encoding_warnings>, alpharep = <zipfile.ZipFile file=<_io.BytesIO object at 0x1057ceb60> mode='w'>

    @unittest.skipIf(
        not getattr(sys.flags, 'warn_default_encoding', 0),
        "Requires warn_default_encoding",
    )
    @pass_alpharep
    def test_encoding_warnings(self, alpharep):
        """EncodingWarning must blame the read_text and open calls."""
        assert sys.flags.warn_default_encoding
        root = zipfile.Path(alpharep)
        with self.assertWarns(EncodingWarning) as wc:  # noqa: F821 (astral-sh/ruff#13296)
            root.joinpath("a.txt").read_text()
>       assert __file__ == wc.filename
E       AssertionError: assert '/Users/jarac.../test_path.py' == '/Users/jarac...ib_abc/_os.py'
E         
E         - /Users/jaraco/code/jaraco/zipp/.tox/py/lib/python3.13/site-packages/pathlib_abc/_os.py
E         + /Users/jaraco/code/jaraco/zipp/tests/test_path.py

tests/test_path.py:192: AssertionError

It's important that the stack level is set correctly so that the warning points at the right location.
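When the warning is raised inside a helper several frames below the public method, `stacklevel` has to count those extra frames. Otherwise the warning is attributed to library code, as with `pathlib_abc/_os.py` in the failure above. Here is a minimal stand-alone sketch with hypothetical `_check_encoding` and `read_text` functions. It follows the convention of `io.text_encoding()`, where the caller passes a stack level relative to itself and the helper adds one for its own frame.

```python
import builtins
import warnings

# EncodingWarning exists on Python 3.10+; fall back for older versions.
_Category = getattr(builtins, "EncodingWarning", UserWarning)


def _check_encoding(encoding, stacklevel=2):
    """Warn if encoding is omitted; stacklevel is relative to our caller."""
    if encoding is None:
        # +1 accounts for this helper's own frame.
        warnings.warn("'encoding' argument not specified", _Category, stacklevel + 1)
        return "utf-8"  # simplification: real code would use the locale encoding
    return encoding


def read_text(data, encoding=None):
    # stacklevel=2 means "blame whoever called read_text()".
    encoding = _check_encoding(encoding, stacklevel=2)
    return data.decode(encoding)
```

Passing `stacklevel=1` from `read_text()` would instead attribute the warning to the line inside `read_text()`, which is the kind of misattribution `test_encoding_warnings` catches.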

barneygale · 2025-04-27T16:54:26Z

I've opened a CPython PR to fix test_encoding_warnings: python/cpython#133051

barneygale · 2025-04-27T17:06:56Z

Thanks for the review btw, much appreciated. I have plenty to be getting on with :)

If you wouldn't mind, please could you review #135? I think we need to fix #134 somehow or other before this PR will be ready to land.

jaraco · 2025-05-04T22:58:46Z

I took a quick refresh of this review. It looks like it's progressing nicely. I may have time this week to work on this more. It's top of my list.

barneygale · 2025-05-05T10:48:09Z

> I took a quick refresh of this review. It looks like it's progressing nicely. I may have time this week to work on this more. It's top of my list.

No rush at all mate, this is a big change and I understand completely if you can only review occasionally

barneygale · 2025-05-05T13:35:11Z

Here's a CPython PR that replaces __str__() with __vfspath__() in the pathlib ABCs: python/cpython#133437

zipp/__init__.py


barneygale · 2025-07-27T14:13:07Z

Hey @jaraco, I think this is reviewable now.

I've published a new version of pathlib-abc that adds a JoinablePath.__vfspath__() method, which replaces __str__() as an abstract method. Consequently this patch no longer makes any changes to zipp.Path.__str__().

I haven't yet made any zipp changelog/docs changes. Let me know when you think they're due.

Coverage is reduced because InitializedState, CompleteDirs, FastLookup, save_method_args, Translator and separator are no longer used by Path. I'm not sure whether to write new tests for them or remove them from zipp. Let me know what you think is best.

barneygale · 2025-08-18T23:35:14Z

It would be good to land #149 first, as it would simplify this patch.

barneygale mentioned this pull request Mar 24, 2025

glob('**') returns all files and directories #102

Open

barneygale mentioned this pull request Apr 10, 2025

Make pathlib ABCs usable by zipfile.Path python/cpython#128520

Open


jaraco reviewed Apr 26, 2025


barneygale mentioned this pull request May 3, 2025

zipfile: file type issues python/cpython#133324

Open

barneygale mentioned this pull request May 5, 2025

zipfile: zipfile.Path’s glob() and rglob() are not documented python/cpython#133360

Open

jaraco mentioned this pull request May 7, 2025

Path.name returns the absolute file path on Windows #133

Closed

jaraco reviewed May 26, 2025



barneygale force-pushed the pathlib-abc branch from f662822 to a25b331 July 27, 2025 13:31

barneygale added 2 commits July 27, 2025 14:46

Skip test in old PyPy versions

fe112ba

Lint

5811902

barneygale requested a review from jaraco July 27, 2025 14:13
