Skip to content

[CI] Prevent tmp_src test fixture from copying the .git/.tox and other heavy folders#2922

Closed
abravalheri wants to merge 2 commits intopypa:mainfrom
abravalheri:fix-tmp-src
Closed

[CI] Prevent tmp_src test fixture from copying the .git/.tox and other heavy folders#2922
abravalheri wants to merge 2 commits intopypa:mainfrom
abravalheri:fix-tmp-src

Conversation

@abravalheri
Copy link
Copy Markdown
Contributor

@abravalheri abravalheri commented Dec 11, 2021

The tmp_src fixture copies the setuptools directory to prevent errors that appear when running tests concurrently. However it seems that is copying everything including the .git directory (and possibly others like .tox). These directories can be quite heavy and error prone to copy.

The changes introduced here prevent copying these unnecessary folders/files. As a side effect, the tests should run slightly faster.

Summary of changes

  • In the tmp_src fixture, instead of blindly using shutil.copytree, glob the src directory and copy entry-by-entry, if they are not hidden or unnecessary file/folders.

Closes #2921

Pull Request Checklist

The `tmp_src` fixture copies the setuptools directory to prevent errors
that appear when running tests concurrently. However it seems that is
copying everything including the `.git` directory (and possibly others
like `.tox`). These directories can be quite heavy and error prone to
copy.

The changes introduced here prevent copying these unnecessary
folders/files. As a side effect, the tests should run slightly faster.
@abravalheri abravalheri changed the title [CI] Prevent tests from copying the .git/.tox and other heavy folders [CI] Prevent tmp_src test fixture from copying the .git/.tox and other heavy folders Dec 11, 2021
@abravalheri abravalheri marked this pull request as ready for review December 11, 2021 18:26
@jaraco
Copy link
Copy Markdown
Member

jaraco commented Dec 18, 2021

There is value to copying the git metadata. If Setuptools were using setuptools_scm for file discovery or version attribution (both of which I'd like to adopt here if not for bootstrapping issues), the git directory would be necessary. If this PR is adopted, it will introduce an impediment to adopting setuptools_scm. I could also imagine other scenarios down the road where a .-prefixed file or the docs are pertinent to an invocation.

I definitely see the value in excluding .tox and maybe build, but I imagine the cost of copying all of the other trees is negligible, especially given that it's the necessary .git directory that's the main offender.

I do note that checking out a shallower version of the repo could save some (most?) of this cost.

I wonder if it would be possible instead, in order to avoid copying the source tree, to simply implement a lock, so that only one test at a time can rely on the source tree, saving the copying altogether (but losing the benefit of parallelism for those tests). I wonder if pytest-xdist has a feature that you can take tests that should not be run together - that would be even more efficient, allowing xdist to chose tests that can be run in parallel while deferring those that would conflict.

@abravalheri
Copy link
Copy Markdown
Contributor Author

abravalheri commented Dec 21, 2021

Thank you very much for the review and discussion @jaraco, I can clearly (and agree with) your points.

I do note that checking out a shallower version of the repo could save some (most?) of this cost.

In some cases that might help, but the .git directory is always tricky to copy around... I can definitely see the current CI infrastructure struggling with it from times to times (e.g. https://github.com/pypa/setuptools/runs/4590066172?check_suite_focus=true#step:5:113, https://github.com/pypa/setuptools/runs/4605477320?check_suite_focus=true#step:5:121 - other timeout errors may also be related to it).

I wonder if it would be possible instead, in order to avoid copying the source tree, to simply implement a lock

It seems that the latest versions of pytest-xdist have a --dist loadgroup feature that "kind of works as a lock". All tests marked with the same group should be executed by the same worker. This might be something worth exploring, however there is a risk of slowing down the tests.

Another approach that can be worthy exploring is setting (for each test) the following options in a pydistutils.cfg style file:

egg_info.egg_base
sdist.dist_dir
build.dist_dir
build.build_base
# Any other missing? Other commands seem to derive their 
# configuration from these ones when not explicitly provided

I believe these are the directories that end up being populated during the build, right?

The approach with the pydistutils.cfg seems to be the one that would deliver better isolation (the rationale is that, even if we use xdist loadgroups, a file/folder left behind from one test might influence the other)...

What would be your opinion on this?

@abravalheri
Copy link
Copy Markdown
Contributor Author

I think I found a better approach, so I will close this PR in favour of #2968.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] tmp_src fixture seems to be copying everything including .git

2 participants