Skip to content

Conversation

@larsoner
Copy link
Contributor

@larsoner larsoner commented Nov 30, 2023

Distilled based on the conversation in pypa/pip#11835 (comment) and related comments (cc @pfmoore). Happy to modify language or whatever, or close if this recommendation is incorrect!


📚 Documentation preview 📚: https://python-packaging-user-guide--1423.org.readthedocs.build/en/1423/

Copy link
Member

@pfmoore pfmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is a subtle "gotcha" that probably warrants a warning, but we need to be careful not to make too much of this particular issue. After all, it's basically just saying "if you don't maintain the metadata the way you should, then things will break". Which is fairly self-evident when you put it like that.

I don't expect the people who are writing files into site-packages will even think they need to read this spec - so this probably isn't going to signficantly reduce how often this issue comes up in practice. It's more about raising general awareness, and should be written from that perspective.

@jeanas
Copy link
Contributor

jeanas commented Nov 30, 2023

It's definitely correct that if you dynamically change files in site-packages, then you must update RECORD (BTW, this applies to modifying existing files as well, since RECORD contains hashes). However, I think that this text could lead to the impression that dynamically changing installed packages is officially supported (as long as you do update RECORD). Is there consensus on that? It seems really fragile and janky, to me anyway. What if site-packages is read-only? What if a Python process unrelated to the package that is updating itself is running at the same time (maybe pip, maybe something that uses importlib APIs) and this creates a race condition? Also, what about environment reproducibility?

@jeanas
Copy link
Contributor

jeanas commented Nov 30, 2023

Sorry, I started writing before @pfmoore posted and didn't see his comments.

@pfmoore
Copy link
Member

pfmoore commented Nov 30, 2023

However, I think that this text could lead to the impression that dynamically changing installed packages is officially supported (as long as you do update RECORD). Is there consensus on that?

I think the opposite is true, the presumption is that only installers should be writing to package directories. It's not a rule (how would you even define it without having to pin down what counts as an installer?) but it's definitely baked into how people think about things.

I'd be a strong -1 on anything that suggested modifying installed packages was supported. Do it if you have to, but when everything breaks, you have nobody to blame but yourself :-) And in particular, don't expect us to tell you what you need to do in order to do it safely.

@jeanas
Copy link
Contributor

jeanas commented Nov 30, 2023

Here's my suggestion:

diff --git a/source/specifications/recording-installed-packages.rst b/source/specifications/recording-installed-packages.rst
index 2610717d..005b6e5b 100644
--- a/source/specifications/recording-installed-packages.rst
+++ b/source/specifications/recording-installed-packages.rst
@@ -110,6 +110,16 @@ The RECORD file
 The ``RECORD`` file holds the list of installed files.
 It is a CSV file containing one record (line) per installed file.
 
+.. note::
+
+   It is *strongly discouraged* for an installed package to modify itself
+   (e.g., store cache files under its namespace in ``site-packages``).
+   Changes inside ``site-packages`` should be left to specialized installer
+   tools such as pip. If a package is nevertheless modified in this way,
+   then the ``RECORD`` must be updated, otherwise uninstalling the package
+   will leave unlisted files in place (possibly resulting in a zombie
+   namespace package).
+
 The CSV dialect must be readable with the default ``reader`` of Python's
 ``csv`` module:
 

@larsoner
Copy link
Contributor Author

larsoner commented Dec 1, 2023

👍 works for me! @jeanas do you want to open a new PR, comment inline as a suggestion so you can get credit, or want me to just copy-paste here? Would be nice for you to get credit but if you don't care then I can easily copy-paste

@jeanas
Copy link
Contributor

jeanas commented Dec 1, 2023

Just copy paste, it doesn't matter to me.

@larsoner
Copy link
Contributor Author

larsoner commented Dec 1, 2023

Put in @jeanas wording, @pfmoore do you want to look?

@willingc willingc enabled auto-merge December 1, 2023 22:02
@willingc willingc added this pull request to the merge queue Dec 1, 2023
@willingc willingc added the diataxis: reference Information oriented technical description (https://diataxis.fr/reference/) label Dec 1, 2023
Merged via the queue into pypa:main with commit b461de8 Dec 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

diataxis: reference Information oriented technical description (https://diataxis.fr/reference/)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants