Skip to content

ENH add auto .gitignore in the Memory cache folder#1674

Merged
tomMoral merged 27 commits intojoblib:mainfrom
StefanieSenger:gitignore
Mar 5, 2025
Merged

ENH add auto .gitignore in the Memory cache folder#1674
tomMoral merged 27 commits intojoblib:mainfrom
StefanieSenger:gitignore

Conversation

@StefanieSenger
Copy link
Copy Markdown
Contributor

@StefanieSenger StefanieSenger commented Feb 27, 2025

Closes #1643

This PR adds an automatically created .gitignore file to the cache folder created with Memory().__init__().

Memory().clear() and rm_subdirs() are modified to delete joblib's .gitignore file while removing the other joblib folders.` Now it stays.

PR also add a corresponding test.

Copy link
Copy Markdown
Contributor

@tomMoral tomMoral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For directory that we do not create, I would not add the .gitignore.
I also added a few nipicks.

@codecov
Copy link
Copy Markdown

codecov bot commented Feb 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.43%. Comparing base (389daf2) to head (5699336).
Report is 25 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1674      +/-   ##
==========================================
- Coverage   95.51%   95.43%   -0.08%     
==========================================
  Files          46       46              
  Lines        7711     7731      +20     
==========================================
+ Hits         7365     7378      +13     
- Misses        346      353       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

joblib/disk.py Outdated
Comment on lines +77 to +78
The directory indicated by `path` is left in place, and its subdirectories
are erased.
The directory indicated by `path` is left in place, and its subdirectories and the
.gitignore file are erased.
Copy link
Copy Markdown
Contributor Author

@StefanieSenger StefanieSenger Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty directory is not tracked by git anyways, but I assume there must be some benefit of re-using this directory somehow (so the .gitignore file should also be removed), otherwise I cannot see how users would benefit from a function like this one - compared to simply deleting the whole folder.

Let's say users have added other files into the cache folder - they would be wiped as well. This is what puzzles me the most about this...

Copy link
Copy Markdown
Member

@lesteve lesteve Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch but I would not add .gitignore logic here, mostly for simplicity reasons.

rm_subdirs seems to imply that it deletes subdirectories so it was not deleting any file before and I think we can decide to keep it like this, at least not changing anything doesn't seem to have any downside

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @lesteve, I have removed this. So the file stays in the new folder also after running rm_subdirs() or clearing cache.

Copy link
Copy Markdown
Contributor Author

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @lesteve and @tomMoral,

thank you for your comments, I have fixed the doctest failure by adding a verbosity level to the memory object in this example (still not sure why this didn't raise before, since it was unrelated to the PR) and have implemented your requested changes.

This PR is ready for a review.

joblib/disk.py Outdated
Comment on lines +77 to +78
The directory indicated by `path` is left in place, and its subdirectories
are erased.
The directory indicated by `path` is left in place, and its subdirectories and the
.gitignore file are erased.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @lesteve, I have removed this. So the file stays in the new folder also after running rm_subdirs() or clearing cache.

Copy link
Copy Markdown
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reopened a tracking issue for the str vs pathlib.Path inconsistency.

+1 for merging this PR without resolving that bug and instead fix it in a dedicated PR.

Comment on lines +459 to +462
# automatically add `.gitignore` file to the cache folder; the condition is
# necessary because in Memory.__init__, the user passed `location` param is
# modified to be either `{location}` or `{location}/joblib `depending on input
# type
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# automatically add `.gitignore` file to the cache folder; the condition is
# necessary because in Memory.__init__, the user passed `location` param is
# modified to be either `{location}` or `{location}/joblib `depending on input
# type
# Automatically add `.gitignore` file to the cache folder.
# XXX: the condition is necessary because in `Memory.__init__`, the user
# passed `location` param is modified to be either `{location}` or
# `{location}/joblib `depending on input type (`pathlib.Path` vs `str`).
# The proper resolution of this inconsistency is tracked in:
# https://github.com/joblib/joblib/issues/1684

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commited this, but what is XXX supposed to mean or stand for?

@ogrisel
Copy link
Copy Markdown
Contributor

ogrisel commented Mar 3, 2025

@StefanieSenger can you please add an entry in the changelog to document this enhancement?

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Copy link
Copy Markdown
Contributor

@tomMoral tomMoral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments but I think we are close to the end

StefanieSenger and others added 3 commits March 5, 2025 09:31
Co-authored-by: Thomas Moreau <thomas.moreau.2010@gmail.com>
Co-authored-by: Thomas Moreau <thomas.moreau.2010@gmail.com>
@StefanieSenger
Copy link
Copy Markdown
Contributor Author

Thank you for reviewing and taking care of absolute paths, @tomMoral. I had not been aware that you could pass them.

I have committed your suggestions.

@tomMoral tomMoral merged commit adc3a7c into joblib:main Mar 5, 2025
29 checks passed
@tomMoral
Copy link
Copy Markdown
Contributor

tomMoral commented Mar 5, 2025

Merging!
Thanks a lot @StefanieSenger for pushing this through! 🎉
It will avoid a lot of cache to end up in github repo ^^

@StefanieSenger StefanieSenger deleted the gitignore branch March 5, 2025 12:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Auto-added .gitignore in cache directories

4 participants