Skip to content

ENH: add support for setting astropy's cache and config directories via environment variables#17567

Closed
neutrinoceros wants to merge 1 commit intoastropy:mainfrom
neutrinoceros:config/enh/17507/support_setting_cache_dir_via_env_var
Closed

ENH: add support for setting astropy's cache and config directories via environment variables#17567
neutrinoceros wants to merge 1 commit intoastropy:mainfrom
neutrinoceros:config/enh/17507/support_setting_cache_dir_via_env_var

Conversation

@neutrinoceros
Copy link
Contributor

@neutrinoceros neutrinoceros commented Dec 19, 2024

Description

This is a proof of concept solution to #17507, based atop #17554
Fixes #17507

TODO:

  • implement ASTROPY_CACHE_DIR
  • stabilize CI
  • implement ASTROPY_CONFIG_DIR
  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@neutrinoceros neutrinoceros added this to the v7.1.0 milestone Dec 19, 2024
@github-actions
Copy link
Contributor

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@github-actions
Copy link
Contributor

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

@neutrinoceros neutrinoceros force-pushed the config/enh/17507/support_setting_cache_dir_via_env_var branch from d31ce13 to f1bac6b Compare December 19, 2024 17:58
@neutrinoceros neutrinoceros changed the title ENH: add support for setting astropy's cache directory via an environment variable ENH: add support for setting astropy's cache directory via an environment variable (⏰ wait for #17554) Dec 19, 2024
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my testing strategy is not ideal in that it repeats fixture code but I couldn't find a smart way to avoid that. In particular, parametrized fixtures didn't seem applicable here given the subtle differences between each setup.
These tests are also very likely thread unsafe as they all modify a global state (os.environ). This isn't a problem right now but could be in the future when we start experimenting with free-threaded Python.

@pllim
Copy link
Member

pllim commented Dec 19, 2024

Thanks! Should I look at this now or should I wait for the CI to be green first?

@neutrinoceros
Copy link
Contributor Author

Feel free to have a look now but I'll be fixing those errors tomorrow. The takeaway message from now is: this can be implemented on top of #17554, but maybe that point doesn't get across as well with a red CI 😅


"""
return _get_dir_path(rootname, set_temp_config, "config")
return set_temp_config.get_dir_path(rootname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about using set_temp_config here. It was meant for temporary stuff, not suitable for cases like you want to permanently change the path for a machine.

Currently your patch only deals with cache, but config also should be consistent:

HOME
  |__ .astropy
        |___ cache
        |___ config

I think the ask was to change where HOME points to, which affects both cache and config.

What I had in mind, I think, was simpler, we add one more if in _get_dir_path to check for the new env var first before going to XDG env var (for both cache and config).

Additionally, the user doc would need to make clear of this new option and how it might catch users unawares if they accidentally have those new env var defined, etc.

Also have to make sure we don't break https://docs.astropy.org/en/latest/config/index.html#customizing-config-location-in-affiliated-packages in the process.

Copy link
Contributor Author

@neutrinoceros neutrinoceros Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit is from #17554. I'll move your comment and answer there !
to clarify, only the second commit on this PR is to be reviewed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually never mind, I think I should answer here because the second half of your comment makes the most sense in this context;

I am not sure about using set_temp_config here.

Note that it is already being used. I'm just moving the function around to make it a class method instead of a floating function that takes a class as one of its arguments.

Currently your patch only deals with cache, but config also should be consistent

You're right, it seems I was half-remembering a conversation from #17507. I'll add ASTROPY_CONFIG_DIR too once I stabilized CI (listed both tasks as todos on the original post).

What I had in mind, I think, was simpler, we add one more if in _get_dir_path to check for the new env var first before going to XDG env var (for both cache and config).

that is actually exactly what I did in this branch. It's easier to see if you just look at the second commit.

@neutrinoceros neutrinoceros force-pushed the config/enh/17507/support_setting_cache_dir_via_env_var branch from f1bac6b to 3d12505 Compare December 20, 2024 07:33
@neutrinoceros
Copy link
Contributor Author

Reproducing the failures in CI is a bit costly but feasible: I need to run the whole test suite sequentially.
It took me a couple of iterations to realize that my tests are causing other tests to fail, not the code change itself. So, I need to form a better test strategy and try to avoid side effects.

@neutrinoceros
Copy link
Contributor Author

Epilog: the teardow logic in my test functions was flawed and caused changes to ASTROPY_CACHE_DIR to leak. I'll push the fix in a couple minutes.

@neutrinoceros neutrinoceros force-pushed the config/enh/17507/support_setting_cache_dir_via_env_var branch 2 times, most recently from 407eccc to 0bbc9b6 Compare December 20, 2024 08:46
@neutrinoceros neutrinoceros changed the title ENH: add support for setting astropy's cache directory via an environment variable (⏰ wait for #17554) ENH: add support for setting astropy's cache and config directories via environment variables (⏰ wait for #17554) Dec 20, 2024
@neutrinoceros
Copy link
Contributor Author

Once I got tests to be stable (at least locally), adding ASTROPY_CONFIG_DIR was trivial.

@neutrinoceros neutrinoceros force-pushed the config/enh/17507/support_setting_cache_dir_via_env_var branch 2 times, most recently from 9da2256 to 6445dcd Compare December 20, 2024 09:28
@neutrinoceros neutrinoceros force-pushed the config/enh/17507/support_setting_cache_dir_via_env_var branch from 6445dcd to a87f45c Compare January 15, 2025 07:29
@neutrinoceros neutrinoceros changed the title ENH: add support for setting astropy's cache and config directories via environment variables (⏰ wait for #17554) ENH: add support for setting astropy's cache and config directories via environment variables Jan 15, 2025
@neutrinoceros
Copy link
Contributor Author

With #17554 merged, this is now ready for review.

@neutrinoceros neutrinoceros marked this pull request as ready for review January 15, 2025 07:30
@pllim pllim added Extra CI Run cron CI as part of PR Build all wheels Run all the wheel builds rather than just a selection labels Jan 15, 2025
Copy link
Member

@eerovaher eerovaher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our testing infrastructure sets up temporary cache and configuration directories for the tests to ensure both that the normal cache and configuration do not interfere with the tests and also to ensure that the tests don't leave anything behind in the normal directories. This is set up in conftest.py, astropy/conftest.py, docs/conftest.py and in our test runner using the XDG environment variables. The new Astropy environment variables are supposed to have priority over the XDG variables, so using them will turn off that testing setup unless the setup is updated accordingly.

The config documentation needs to be updated, but pull request currently only adds a change log entry.

Comment on lines +169 to +186
if (usrpth_str := os.getenv(cls._astropy_env_dir)) is not None:
usrpth = Path(usrpth_str).expanduser()

if usrpth.is_file():
raise FileExistsError(
f"Cannot create directory under {usrpth_str} as requested "
f"via the environment variable {cls._astropy_env_dir}: "
"a file with this name already exists."
)

if not usrpth.parent.exists():
raise FileNotFoundError(
f"Cannot create directory under {usrpth_str} as requested "
f"via the environment variable {cls._astropy_env_dir}: "
f"the parent directory {usrpth.parent} does not exist."
)
usrpth.mkdir(exist_ok=True)
return usrpth.resolve()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be simplest both for us and for the users if the effect of the new Astropy-specific environment variables would be as similar as possible to the effect of the already available XDG environment variables. My (naive) expectation is that the code that handles that should look something like

for env_dir in (cls._astropy_env_dir, cls._xdg_env_dir):
  ...

Why is there a separate code block for the new functionality?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember what my reasoning was, maybe I was just trying to get the most simple behavior right from the get go; the existing logic for XDG variables is quite complex and I probably thought I couldn't test it all efficiently.

usrpth.mkdir(exist_ok=True)
return usrpth.resolve()

if (xch := cls._temp_path) is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cls._temp_path being handled after the Astropy environment variables means that set_temp_cache and set_temp_config will fail silently at runtime if the corresponding Astropy environment variable is used. The fact that this is not being caught be the tests means that we need to add tests that would check which directory customization mechanism has priority if multiple are used simultaneously.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

Comment on lines +121 to +125
def _cleanup_environment(env_var_name: str, old_value: str | None) -> None:
if old_value is None:
os.environ.pop(env_var_name)
else:
os.environ[env_var_name] = old_value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you implementing this functionality yourself instead of making use of pytest monkeypatching?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to reserve mock/monkeypatching as last resorts because I think they are not well understood and tend to effectively make tests harder to read or refactor (see for instance #16659).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have added several new tests, but you have not modified any of the existing ones. I would have thought that the tests involving the XDG environment variables could be rewritten to be parametrized tests that check both the XDG and the new Astropy environment variables.

Copy link
Member

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

What is the intended design? That is, if both XDG and ASTROPY variables are defined, which would take precedence? All this should be documented in the user facing doc.

@pllim
Copy link
Member

pllim commented Jan 16, 2025

p.s. Ooops I didn't see #17567 (review) before posting my comment as I had an outdated tab open.

@neutrinoceros
Copy link
Contributor Author

Thank you both for reviewing this. In hindsight, I realize my approach is different enough from what I should have done that I prefer replacing this PR with a brand new one instead of mutating this one so much that it doesn't retain any of its history. I'll move it to draft for now.

@neutrinoceros
Copy link
Contributor Author

neutrinoceros commented Jan 27, 2025

Closing this as the alternative PR #17640 appears to be converging.

@neutrinoceros neutrinoceros deleted the config/enh/17507/support_setting_cache_dir_via_env_var branch January 27, 2025 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Build all wheels Run all the wheel builds rather than just a selection config Extra CI Run cron CI as part of PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Let user define cache location

3 participants