Conversation
|
Jason, this might be a good opportunity for me to say "thank you" for I hope I can convince you that this PR is a good idea. To summarize the improvements:
I've tried to keep the changes minimal. I'd love to hear what you think, when you have time. Thanks again! |
jaraco
left a comment
There was a problem hiding this comment.
This is great. I love what you're thinking here, and I love the elegance that comes out of it (especially in the Path class). Let's work through the details and some of the questions, but I'm confident we can get something very similar to this merged.
pyproject.toml
Outdated
| ] | ||
| requires-python = ">=3.9" | ||
| dependencies = [ | ||
| "pathlib-abc == 0.4.1" |
There was a problem hiding this comment.
I'm not sure zipp can have dependencies. Adding the first dependency is always the worst. I can see that I'd previously vendored packages to avoid having dependencies. Perhaps that was simply because of the need to port into CPython, in which case this dependency won't be an issue.
There was a problem hiding this comment.
@jaraco As you say, in CPython it should be OK because you can pull in the relevant stuff from pathlib.types and pathlib._os.
But I think you might have a point about the PyPI package. zipp is core packaging infrastructure, and if it depends on pathlib-abc then that becomes core infra too. But the testing/publishing setup isn't as sophisticated as zipp's, and you'd be dependent on me to release new versions as you need.
Just a thought: would you be interested in co-maintaining pathlib-abc? It would be terrific if you can bring it up to the same standard as your other packages, but I understand completely if you'd rather not, or don't have the time.
zipp/__init__.py
Outdated
| def __reduce__(self): | ||
| return (self.__class__, (self.root.filename, self.at)) |
There was a problem hiding this comment.
This change replaces the InitializationState mix-in, which specifically aimed to reconstruct the object from any inputs that were pickleable, so if the ZipFile was supplied and it was somehow pickleable, the Path would also be pickleable. With this change, that assumption is lost and pickleability is limited to zipfiles that have a meaningful filename.
This change feels to me to be unrelated to the adoption of pathlib-abc. How does it relate?
There was a problem hiding this comment.
Right! Thanks for the explanation - I didn't quite understand what InitializationState was doing. I'll restore previous behaviour.
BTW what's the use case for pickleable zipp.Path objects? It took me a bit by surprise because obviously zipfile.ZipFile isn't pickleable
There was a problem hiding this comment.
I've "fixed" this by recording the initial argument as Path._initial_arg and propagating that to derived paths. I guess I don't really see the point of turning it into a whole class with its own backing module - isn't that something of an over-generalisation of something used only once? Perhaps I'm missing something
There was a problem hiding this comment.
BTW what's the use case for pickleable
zipp.Pathobjects? It took me a bit by surprise because obviouslyzipfile.ZipFileisn't pickleable
#81 describes the motivation.
I've "fixed" this by recording the initial argument as
Path._initial_argand propagating that to derived paths. I guess I don't really see the point of turning it into a whole class with its own backing module - isn't that something of an over-generalisation of something used only once? Perhaps I'm missing something
My motivation for creating the standalone class was to separate concerns. By making it an independent class, it allows that functionality to be selectively added or removed, and it makes it easier to see what bits affect Pickleability (instead of reading through the whole code of Path to try to infer which bits are relevant). I try to avoid entangling concerns wherever possible. It also makes it easier to detect when that behavior is altered (helps avoid unintentional regressions). It does also have the benefit of potentially being generalizable and reusable, which could prove useful should another project ask for something similar.
There was a problem hiding this comment.
Gotcha, thanks for the extra detail. I've restored the InitializedState implementation.
But the proposed zipp.Path implementation doesn't use CompleteDirs nor FastLookup, and I don't think it can inherit InitializedState directly. So I think I still need a Path.__reduce__ method, unless I'm missing something
There was a problem hiding this comment.
There's little value in retaining the InitializedState if it's not somehow inherited by the Path class. I wonder why Path couldn't inherit from InitializedState. If that class can't be used to add pickleability to zipfile.ZipFile (or whatever property of Path that isn't pickleable), then maybe it's not usable. It gets complicated because Path encapsulates ZipFile, so pickleability concerns are entangled. I'll take another look.
There was a problem hiding this comment.
I wonder why
Pathcouldn't inherit fromInitializedState.
The problem comes when you want to create derived paths, e.g. from Path.parent or Path.iterdir(). Those methods call with_segments() (previously _next()). That method calls something like:
return self.__class__(self.root, at)Where self.root is a ZipFile object, not a filename string, because you don't want to re-open the zip file with every derived path. But if zipp.Path were to inherit InitializedState, then only the ZipFile object (not the str) would be captured by InitializedState, and so any derived paths wouldn't be pickleable.
There was a problem hiding this comment.
There's little value in retaining the
InitializedStateif it's not somehow inherited by thePathclass.
The only fly in the ointment is that zipfile exposes CompleteDirs, which inherits InitializedState, so arguably we need to go through a deprecation period first
|
I've pushed a few changes to address some of the nitpicky linter and other obvious fixes. I've saved those changes in a separate branch, so it's no problem if you wish to re-write anything and force push. I've also noticed there are some missed lines in the coverage: It would be nice to get back to 100%. |
|
One failure I see when I build locally is test_encoding_warnings: It's important that the stack level is set correctly so it warns at the the right location. |
|
I've opened a CPython PR to fix |
|
I took a quick refresh of this review. It looks like it's progressing nicely. I may have time this week to work on this more. It's top of my list. |
No rush at all mate, this is a big change and I understand completely if you can only review occasionally |
|
Here's a CPython PR that replaces |
Make `zipp.Path` subclass `pathlib_abc.ReadablePath`. This allows us to remove implementations of `read_text()`, `read_bytes()`, `glob()`, `joinpath()` and `__truediv__()`, and to simplify implementations of a couple of few more methods. Maintain a tree of `PathInfo` objects representing the hierarchy of zip file members. We traverse the tree whenever we need to resolve a path to a `ZipInfo` object. This effectively hides the `.zip`-specific quirk that directories are recorded with a trailing slash in their filenames. Adjust `__str__()` so that it doesn't raise for unnamed zip files. Instead it returns a string beginning with `:fileobj:`. The prefix is made available as a new `anchor` attribute. Add the following methods/attributes, which are required by `ReadablePath`: - `info` - `parser` - `__open_rb__()` - `readlink()` - `with_segments()` Disable the following `ReadablePath` methods/attributes that we don't (yet) test in zipp: - `anchor` - `parts` - `parents` - `__rtruediv__()` - `with_name()` - `with_stem()` - `with_suffix()` - `full_match()` - `walk()` - `copy()` - `copy_into()`
|
Hey @jaraco, I think this is reviewable now. I've published a new version of pathlib-abc that adds a I haven't yet made any zipp changelog/docs changes. Let me know when you think they're due. Coverage is reduced because |
|
It would be good to land #149 first, as it would simplify this patch. |
Make
zipp.Pathsubclasspathlib_abc.ReadablePath. This allows us to remove implementations ofread_text(),read_bytes(),glob(),joinpath()and__truediv__(), and to simplify implementations of a couple of few more methods.Maintain a tree of
PathInfoobjects representing the hierarchy of zip file members. We traverse the tree whenever we need to resolve a path to aZipInfoobject. This effectively hides the.zip-specific quirk that directories are recorded with a trailing slash in their filenames.Adjust
__str__()so that it doesn't raise for unnamed zip files. Instead it returns a string beginning with:fileobj:. The prefix is made available as a newanchorattribute.Add the following methods/attributes, which are required by
ReadablePath:infoparser__open_rb__()readlink()with_segments()Disable the following
ReadablePathmethods/attributes that we don't (yet) test in zipp:anchorpartsparents__rtruediv__()with_name()with_stem()with_suffix()full_match()walk()copy()copy_into()