Skip to content

Use global cache directory#5122

Closed
Thomasdezeeuw wants to merge 6 commits intomainfrom
thomas/global_cache
Closed

Use global cache directory#5122
Thomasdezeeuw wants to merge 6 commits intomainfrom
thomas/global_cache

Conversation

@Thomasdezeeuw
Copy link
Contributor

@Thomasdezeeuw Thomasdezeeuw commented Jun 15, 2023

Summary

This is based on #5117.

Updates #1292.

Falling back to the old .ruff_cache directory in the current directory
if the global cache directory can't be determined.

For reference a number of other Python tools also use global caches (by
default):

Python projects that don't:

Other tools:

Test Plan

Existing tests should keep working.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 15, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      6.4±0.02ms     6.4 MB/sec    1.02      6.5±0.01ms     6.3 MB/sec
formatter/numpy/ctypeslib.py               1.00   1309.8±6.23µs    12.7 MB/sec    1.02   1338.3±2.93µs    12.4 MB/sec
formatter/numpy/globals.py                 1.00    127.6±0.28µs    23.1 MB/sec    1.03    131.2±0.90µs    22.5 MB/sec
formatter/pydantic/types.py                1.00      2.6±0.01ms     9.8 MB/sec    1.01      2.6±0.01ms     9.7 MB/sec
linter/all-rules/large/dataset.py          1.11     15.0±2.04ms     2.7 MB/sec    1.00     13.5±0.08ms     3.0 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.3±0.02ms     5.1 MB/sec    1.00      3.3±0.01ms     5.1 MB/sec
linter/all-rules/numpy/globals.py          1.00    416.9±0.88µs     7.1 MB/sec    1.01    420.6±0.64µs     7.0 MB/sec
linter/all-rules/pydantic/types.py         1.00      5.9±0.05ms     4.3 MB/sec    1.00      5.9±0.04ms     4.3 MB/sec
linter/default-rules/large/dataset.py      1.00      6.6±0.03ms     6.1 MB/sec    1.00      6.6±0.03ms     6.2 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00   1430.1±3.72µs    11.6 MB/sec    1.00   1436.1±4.77µs    11.6 MB/sec
linter/default-rules/numpy/globals.py      1.00    161.9±0.26µs    18.2 MB/sec    1.00    161.6±0.18µs    18.3 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.0±0.01ms     8.5 MB/sec    1.01      3.0±0.01ms     8.5 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      7.7±0.11ms     5.3 MB/sec    1.03      8.0±0.23ms     5.1 MB/sec
formatter/numpy/ctypeslib.py               1.00  1603.3±26.62µs    10.4 MB/sec    1.02  1632.0±28.39µs    10.2 MB/sec
formatter/numpy/globals.py                 1.00    157.6±3.63µs    18.7 MB/sec    1.04    164.5±7.44µs    17.9 MB/sec
formatter/pydantic/types.py                1.00      3.2±0.05ms     7.9 MB/sec    1.02      3.3±0.10ms     7.8 MB/sec
linter/all-rules/large/dataset.py          1.00     15.9±0.24ms     2.6 MB/sec    1.03     16.4±0.21ms     2.5 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      4.2±0.08ms     4.0 MB/sec    1.05      4.4±0.15ms     3.8 MB/sec
linter/all-rules/numpy/globals.py          1.00    506.2±9.03µs     5.8 MB/sec    1.02    517.9±8.60µs     5.7 MB/sec
linter/all-rules/pydantic/types.py         1.00      7.1±0.11ms     3.6 MB/sec    1.06      7.5±0.13ms     3.4 MB/sec
linter/default-rules/large/dataset.py      1.00      8.1±0.10ms     5.0 MB/sec    1.05      8.5±0.14ms     4.8 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00  1771.2±26.41µs     9.4 MB/sec    1.01  1795.7±24.22µs     9.3 MB/sec
linter/default-rules/numpy/globals.py      1.00    204.3±4.23µs    14.4 MB/sec    1.02    208.0±4.08µs    14.2 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.8±0.06ms     6.8 MB/sec    1.02      3.8±0.06ms     6.6 MB/sec

@Thomasdezeeuw
Copy link
Contributor Author

I forgot to remove the (old) cache initialization:

// Initialize the cache.
if cache.into() {
fn init_cache(path: &Path) {
if let Err(e) = cache::init(path) {
error!("Failed to initialize cache at {}: {e:?}", path.display());
}
}
match pyproject_config.strategy {
PyprojectDiscoveryStrategy::Fixed => {
init_cache(&pyproject_config.settings.cli.cache_dir);
}
PyprojectDiscoveryStrategy::Hierarchical => {
for settings in std::iter::once(&pyproject_config.settings).chain(resolver.iter()) {
init_cache(&settings.cli.cache_dir);
}
}
}
};

Base automatically changed from thomas/cache to main June 19, 2023 15:46
@MichaReiser
Copy link
Member

Can you add an example on how to use Ruff in CI, e.g. together with the github cache action? Would it make sense to automatically detect CI and then use a different directory?

Thomas de Zeeuw added 5 commits June 20, 2023 14:34
This avoids an allocation in the code and in the usage of global the sub
directory isn't really needed.
Uses ensure_tag of the cachedir crate instead of doing the check
ourselves. Also uses std::fs::write to write the .gitignore file.
Not relevant to global cache directories.
@Thomasdezeeuw Thomasdezeeuw marked this pull request as ready for review June 20, 2023 12:48
@Thomasdezeeuw
Copy link
Contributor Author

Can you add an example on how to use Ruff in CI, e.g. together with the github cache action?

When running on the CI you probably want to se the cache-dir argument to use a fixed cache.

Would it make sense to automatically detect CI and then use a different directory?

I'm not sure if we can consistently detect this, but even if we could than we have the problem what directory to use. I don't think GitHub Actions has a default directory that is always cache, I think you need to explicitly set this.

Maybe an example GitHub Actions config file can tackle both issues.

@Thomasdezeeuw
Copy link
Contributor Author

@MichaReiser I believe you were still on the fence on whether or not a global cache directory would be an improvement, have you made a decision on this?

@MichaReiser
Copy link
Member

MichaReiser commented Jun 28, 2023

@MichaReiser I believe you were still on the fence on whether or not a global cache directory would be an improvement, have you made a decision on this?

Thanks for pinging me. Not really. My concern is that this change introduces additional complexity and potentially breaks the workflows of users and, thus, the cost of the change may be too height.

My concerns are:

  • ruff clean now needs an additional option to limit the scope to a single project
  • A user could unintentionally nuke all caches by simply running ruff clean because they assume that the command is scoped to the current project (as are all? other commands). There's no real harm by this, but may be surprising.
  • I would be interested in seeing an example on how to setup the github action (with the cache action) when using the global cache directory (or would we recommend using a local directory?) for a task that runs cross-platform.

@MichaReiser
Copy link
Member

But... this is not a me decision. If the team is in favor then go for it.

@Thomasdezeeuw
Copy link
Contributor Author

Thanks for pinging me. Not really. My concern is that this change introduces additional complexity and potentially breaks the workflows of users and, thus, the cost of the change may be too height.

My concerns are:

* `ruff clean` now needs an additional option to limit the scope to a single project

* A user could unintentionally nuke all caches by simply running `ruff clean` because they assume that the command is scoped to the current project (as are all? other commands). There's no real harm by this, but may be surprising.

The above two concerns are valid. Don't really have a good suggestion to resolve these.

* I would be interested in seeing an example on how to setup the github action (with the cache action) when using the global cache directory (or would we recommend using a local directory?) for a task that runs cross-platform.

We could recommend to set --cache-dir flag and cache that directory in GitHub Action.

But... this is not a me decision. If the team is in favor then go for it.

Maybe you can discuss this within the team? I think I can still merge this but, I think consensus is important.

@MichaReiser
Copy link
Member

Maybe you can discuss this within the team? I think I can still merge this but, I think consensus is important.

Will do. I'll bring it up in our next standup.

@zanieb
Copy link
Member

zanieb commented Jun 28, 2023

We could recommend to set --cache-dir flag and cache that directory in GitHub Action.

We should probably just provide a GitHub Action with reasonable defaults anyway.

@MichaReiser MichaReiser self-assigned this Jul 3, 2023
@MichaReiser MichaReiser added the breaking Breaking API change label Jul 3, 2023
@MichaReiser
Copy link
Member

I brought this PR up in today's standup, and we decided not to merge this as of today because moving to a global cache directory only provides limited benefits in our view and comes with a few downsides. This isn't saying that we don't want to move to a global cache directory eventually. It's just that we want to have more time to consider all the tradeoffs to decide this for good.

Thanks again @Thomasdezeeuw for working on this and all your caching improvements.

@MichaReiser MichaReiser closed this Jul 4, 2023
@MichaReiser MichaReiser deleted the thomas/global_cache branch February 20, 2025 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking Breaking API change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants