Skip to content

ENH: add support for setting astropy's cache and config directories via environment variables (blocked by #17934)#17640

Draft
neutrinoceros wants to merge 3 commits intoastropy:mainfrom
neutrinoceros:config/enh/17507/support_setting_cache_dir_via_env_var_2
Draft

ENH: add support for setting astropy's cache and config directories via environment variables (blocked by #17934)#17640
neutrinoceros wants to merge 3 commits intoastropy:mainfrom
neutrinoceros:config/enh/17507/support_setting_cache_dir_via_env_var_2

Conversation

@neutrinoceros
Copy link
Contributor

@neutrinoceros neutrinoceros commented Jan 16, 2025

Description

The approach is different enough from #17507 that it warranted a separate PR. Hopefully this one satisfies the needs highlighted in @eerovaher's and @pllim's early review there.
I'll add documentation once everyone is happy with the approach !

Blocked by

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@neutrinoceros neutrinoceros added this to the v7.1.0 milestone Jan 16, 2025
@github-actions
Copy link
Contributor

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@github-actions
Copy link
Contributor

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

@neutrinoceros neutrinoceros force-pushed the config/enh/17507/support_setting_cache_dir_via_env_var_2 branch 3 times, most recently from 2c5461e to d0a3b7b Compare January 16, 2025 09:52
Copy link
Member

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to focus on the big picture first before diving into implementation and tests.

  • Should we allow mix and match, e.g., ASTROPY_CONFIG_DIR and XDG_CACHE_HOME?
  • What if a long time user now starts to use ASTROPY prefix but has a lot of stuff in XDG already; should we have helper function to help them move the config and cache over? We might already have export/import for cache, but not sure about config.
  • Is it simpler going forward if we have a plan to eventually exclusively use the ASTROPY prefix?
  • Are there special use cases (e.g., HPC, cloud, people sending their config to collaborators manually) where this change will come back and bite us?

Comment on lines +45 to +49
pytest.param("XDG_CONFIG_HOME", paths.get_config_dir_path, id="config"),
pytest.param(
"XDG_CACHE_HOME",
paths.set_temp_cache,
paths.get_cache_dir_path,
id="xdg-cache",
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to parametrize twice: once over "ASTROPY" and "XDG" and once over "cache" and "config"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. I just pushed a second commit to that effect. Let me know if you like it better.

@eerovaher
Copy link
Member

Should we allow mix and match, e.g., ASTROPY_CONFIG_DIR and XDG_CACHE_HOME?

Obviously yes.

What if a long time user now starts to use ASTROPY prefix but has a lot of stuff in XDG already; should we have helper function to help them move the config and cache over? We might already have export/import for cache, but not sure about config.

If a user wants to customize the astropy cache or config directories then it should be their job to ensure that the relevant files are moved.

Independently from the above, temporarily setting the Astropy environment variable allows running astropy with the default configuration, but if there is some clever mechanism to automagically move the configuration then this use case will not be possible.

Is it simpler going forward if we have a plan to eventually exclusively use the ASTROPY prefix?

Handling the Astropy and XDG environment variables has a lot of common code and they can share tests too, if the tests are parametrized correctly. The code functionality over code complexity ratio very favorable for supporting both sets of variables.

Are there special use cases (e.g., HPC, cloud, people sending their config to collaborators manually) where this change will come back and bite us?

In its current state the changes could cause problems to anyone who has set the Astropy environment variables and then runs our tests, like I already described in the other pull request, but addressing that is not complicated (it is only somewhat tedious because of code repetition). Once that has been taken care of I don't foresee any real problems.

@neutrinoceros
Copy link
Contributor Author

I agree with @eerovaher's answers.

In its current state the changes could cause problems to anyone who has set the Astropy environment variables and then runs our tests, like I already described in the other pull request, but addressing that is not complicated (it is only somewhat tedious because of code repetition).

sorry, I missed that. I'll address this shortly !

@neutrinoceros
Copy link
Contributor Author

@eerovaher I should have addressed all your points so far

@neutrinoceros neutrinoceros force-pushed the config/enh/17507/support_setting_cache_dir_via_env_var_2 branch from 603eff0 to 7878773 Compare January 19, 2025 10:51
Copy link
Member

@eerovaher eerovaher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs/conftest.py still needs to be updated.

@neutrinoceros neutrinoceros force-pushed the config/enh/17507/support_setting_cache_dir_via_env_var_2 branch 2 times, most recently from e6145c0 to 4147533 Compare January 22, 2025 09:41
@neutrinoceros
Copy link
Contributor Author

It looks like we are getting somewhere with this, and the testing approach was hugely improved thanks to @eerovaher's suggestions already, so let's undraft this now !

@neutrinoceros neutrinoceros marked this pull request as ready for review January 22, 2025 09:43
Copy link
Member

@eerovaher eerovaher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs/conftest.py still needs to be updated and config documentation should be updated too.

Comment on lines +40 to +49
@pytest.fixture
def isolated_setup(monkeypatch):
# ignore global state of the test session
monkeypatch.delenv("ASTROPY_CACHE_DIR", raising=False)
monkeypatch.delenv("ASTROPY_CONFIG_DIR", raising=False)
monkeypatch.delenv("XDG_CACHE_HOME", raising=False)
monkeypatch.delenv("XDG_CONFIG_HOME", raising=False)

monkeypatch.setattr(paths.set_temp_cache, "_temp_path", None)
monkeypatch.setattr(paths.set_temp_config, "_temp_path", None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this thread safe? Don't you need to use locks explicitly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not. The current implementation is just meant for code sharing. I'll add locking now but thread safety will only be guaranteed in between users of this fixture.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...thread safety will only be guaranteed in between users of this fixture.

That's why it would be better to make the code inherently type safe, but if that's too complicated then the fixture will have to do.

_directory_type: Literal["cache", "config"]
_directory_env_var: Literal["XDG_CACHE_HOME", "XDG_CONFIG_HOME"]
_xdg_env_dir: Literal["XDG_CACHE_HOME", "XDG_CONFIG_HOME"]
_astropy_env_dir: Literal["ASTROPY_CACHE_DIR", "ASTROPY_CONFIG_DIR"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to name the variables ASTROPY_CACHE_HOME and ASTROPY_CONFIG_HOME so that they would be more similar to the corresponding XDG variable names? Or are the *_CACHE_DIR and *_CONFIG_DIR used commonly enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of any widely used example of any. I know I'm biased to view *_DIR as more "natural" because I've seen it used in HPC culture where I spent my years as a researcher. No strong opinions on this point, I guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ASTROPY_*_DIR names are not bad, but we should think about the names very carefully because whatever we choose will be very difficult to change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's worth taking the time to make sure we're not ignoring an existing convention.
I'll do some more research now.

Copy link
Contributor Author

@neutrinoceros neutrinoceros Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a list of resources in arbitrary places I happened to think of

so I found at least two popular packaging tools (uv and poetry) happen to use the naming convention I spontaneously used (i.e. <TOOLNAME>_CONFIG_DIR). The only directly comparable case I found (MPLCONFIGDIR) doesn't fit our overarching convention for naming env variables ASTROPY_* (using underscores between words), so I think we can safely disregard it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for using ASTROPY_{CONFIG/CACHE}_DIR, with the caveat that it should directly point to the directory to be used. The system shouldn't look for another astropy directory in there.

"environment_variable,func",
"env_var_template",
[
# Regression test for #17514 - XDG_CACHE_HOME had no effect
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be kept.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking this again it's clear that the comment isn't very helpful anymore because the parametrization of this test has changed too much. So the information in this comment is still valuable, but the exact phrasing is not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eerovaher , so you want a new comment added with different phrasing, still?

xdg_target_dir.mkdir(parents=True)
monkeypatch.setenv(xdg_env_var, str(tmp_path / "xdg"))

monkeypatch.setattr(cls, "_temp_path", None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed if you are using the isolated_setup fixture?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's just accidental left overs. Thanks !

env_var_template,
dir_type,
):
match dir_type:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This match-statement shouldn't be needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my solution to your previous suggestion #17640 (comment)
If none of my approaches are satisfactory, could you suggest another one ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include cls and func in the parametrization like you do in the other tests.

@neutrinoceros neutrinoceros force-pushed the config/enh/17507/support_setting_cache_dir_via_env_var_2 branch from 4147533 to 33b0854 Compare January 23, 2025 09:41
Copy link
Member

@eerovaher eerovaher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be best if more people would have a look at the proposed environment variable names.



@pytest.fixture
def isolated_setup(monkeypatch):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fixture itself looks fine, but this name seems too generic.

"environment_variable,func",
"env_var_template",
[
# Regression test for #17514 - XDG_CACHE_HOME had no effect
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking this again it's clear that the comment isn't very helpful anymore because the parametrization of this test has changed too much. So the information in this comment is still valuable, but the exact phrasing is not.

env_var_template,
dir_type,
):
match dir_type:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include cls and func in the parametrization like you do in the other tests.

``$HOME/.astropy/config``. It can be customized with the environment variable
``XDG_CONFIG_HOME`` in which case the ``$XDG_CONFIG_HOME/astropy`` directory
must exist. Note that ``XDG_CONFIG_HOME`` comes from a Linux-centric
``$HOME/.astropy/config``. It can be customized with the environment variables
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default should be $HOME/.config/astropy, but that's far beyond the scope of this pull request.

Comment on lines +76 to +78
.. note::
``ASTROPY_CONFIG_DIR`` and ``ASTROPY_CACHE_DIR`` require ``astropy`` 7.1
or newer. They are ignored in older versions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our documentation is versioned together with the code, so this should not be needed at all.

Suggested change
.. note::
``ASTROPY_CONFIG_DIR`` and ``ASTROPY_CACHE_DIR`` require ``astropy`` 7.1
or newer. They are ignored in older versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have other places in configuration docs that mention when some behaviors started being supported. I think it's helpful for anyone trying to use this, not realizing that they are using a version that's too old.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually this information should be in the API documentation, not in the narrative documentation. But we don't have a good way of documenting environment variables in our API documentation, so maybe we should keep this here.

Comment on lines +63 to +59
.. note::
``astropy`` also respects ``XDG_CACHE_HOME`` and ``ASTROPY_CACHE_DIR`` for
defining cache location. If both are defined, the latter takes precedence.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cache has its own documentation starting at:

The ``astropy`` cache is stored in a centralized place (on Linux machines by

It appears to be incomplete, but now would be a good time to improve it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I missed that !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looking more closely, it may not be an oversight, but an intentional decision to avoid repetitions; the very next line to the one you quoted links to the page I'm updating here "for more details", so I think it's reasonable to keep my change as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it be reasonable to document the environment variables that control the cache directory in config documentation instead of the caching documentation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, the cost of having docs that repeats the underlying logic feels greater than the mild inconvenience for users to have to navigate to another page, in particular because it creates a risk of these sections going out of sync.
In any case this came from #9182, which is a pretty heavy PR, and where this point didn't seem to be discussed at all. Maybe we should ask for a third opinion here ? @pllim, what do you think ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a high level, there is a philosophy that "every page could be first page" because of search engine. So having multiple sections link back to the text (instead of repeating word by word) could help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including important information about how our cache works in our config documentation still doesn't sound good to me. Documenting config in cache documentation doesn't sound any better. One option would be to have a separate page for documenting environment variables and both config and cache documentation could link to that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for falling behind. I'm coming back to this PR and I just pushed a patch where I centralized the information on environment variables to a dedicated page. I'll undraft the PR back when I'm 100% sure that it renders at intented.

@pllim
Copy link
Member

pllim commented Jan 27, 2025

@ejeschke , would this satisfy your original request to be able to custom cache/config directories for Astropy for Ginga's use?

Thanks, all!

@ejeschke
Copy link
Contributor

@ejeschke , would this satisfy your original request to be able to custom cache/config directories for Astropy for Ginga's use?

Thanks, @neutrinoceros, @pllim ! Yes, this looks very useful for several applications where we utilize astropy here at the observatory.

@pllim
Copy link
Member

pllim commented Jan 27, 2025

@ejeschke , are you interested to test out this branch before merge?

@ejeschke
Copy link
Contributor

@ejeschke , are you interested to test out this branch before merge?

I will do so, but not necessary to hold up the merge as long as the developers have tested it thoroughly. It's a rather straightforward change.

@neutrinoceros neutrinoceros force-pushed the config/enh/17507/support_setting_cache_dir_via_env_var_2 branch from 6e1224d to 3f49c01 Compare March 25, 2025 12:50
@neutrinoceros
Copy link
Contributor Author

neutrinoceros commented Mar 25, 2025

Actually, I failed to remember that _get_dir_path already creates missing dirs automatically. So, in hindsight, I went for compatible and consistent behavior: a warning is emmited for any incorrectly set env var.

I suppose this warrants a changelog fragment on its own, but I'd like to get feedback from @pllim on this approach first.

@neutrinoceros neutrinoceros force-pushed the config/enh/17507/support_setting_cache_dir_via_env_var_2 branch from e4d3ff5 to 3d1412f Compare March 25, 2025 14:16
@pllim
Copy link
Member

pllim commented Mar 25, 2025

Thanks! I will get back to you soon but probably not for a few days.

@eerovaher
Copy link
Member

It looks to me like this pull request could be split into three separate pull requests:

  1. introducing a separate documentation page for the environment variables that astropy uses,
  2. validating the directories set by the environment variables,
  3. introducing the astropy-specific environment variables (which is what this pull request should be limited to).

@neutrinoceros
Copy link
Contributor Author

I opened

@neutrinoceros neutrinoceros changed the title ENH: add support for setting astropy's cache and config directories via environment variables ENH: add support for setting astropy's cache and config directories via environment variables (blocked by #17932 and #17934) Mar 25, 2025
@pllim pllim modified the milestones: v7.1.0, v7.2.0 Apr 21, 2025
XDG_CONFIG_HOME environment variable is set and the
``$XDG_CONFIG_HOME/astropy`` directory exists, it will be that directory.
ASTROPY_CONFIG_DIR (or XDG_CONFIG_HOME) environment variable is set and the
``$ASTROPY_CONFIG_DIR/astropy`` directory exists, it will be that directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let ASTROPY_CONFIG_DIR directly point to the directory instead of expecting another level of astropy in there.

That makes sense for the XDG_... scheme, it doesn't for a specific environment variable for astropy. Let it point to the directory directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good idea and the naming difference between *_DIR and *_HOME should make it reasonably clear that one set of variables points to the directories themselves and the other to the parent directories. However, it would be best to try behave similarly to other packages that define analogous environment variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the case for all systems I know. Just checking the specific examples posted above:

The logic should be:

astropy looks at the following locations for it's configuration (highest precedence first):

  • $ASTROPY_CONFIG_DIR
  • $XDG_CONFIG_HOME/astropy
  • $HOME/.astropy/config

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that astropy should look for the configuration in $ASTROPY_CONFIG_DIR instead of $ASTROPY_CONFIG_DIR/astropy.

I disagree that if neither $ASTROPY_CONFIG_DIR nor $XDG_CONFIG_HOME/astropy are set then astropy should use $HOME/.astropy/config. It should instead use $HOME/.config/astropy (see #6803). But I do acknowledge that this is beyond the scope of this pull request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should instead use $HOME/.config/astropy

Agreed! Maybe add that in-between to keep backwards compatibility?

Copy link
Member

@maxnoe maxnoe Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    $ASTROPY_CONFIG_DIR
    $XDG_CONFIG_HOME/astropy
    $HOME/.config/astropy
    $HOME/.astropy/config 

@maxnoe
Copy link
Member

maxnoe commented Jul 17, 2025

Is there something we can do to help advance this PR?

@pllim
Copy link
Member

pllim commented Jul 17, 2025

I think @neutrinoceros needs to resolve conflicts, for a start. If he no longer has time or interest to pursue, I could possibly take over with reduced scope.

@neutrinoceros
Copy link
Contributor Author

I'll pick it up as soon as possible. Unlikely to be able to this week, but I'll do my best to get back to it next week. Thanks for the ping and for your patience.

@pllim
Copy link
Member

pllim commented Jul 17, 2025

I also don't think #17934 necessarily block this. We can still keep the old behavior for the XDG stuff in this PR.

@github-actions github-actions bot added the Close? Tell stale bot that this issue/PR is stale label Aug 23, 2025
@github-actions
Copy link
Contributor

Hi humans 👋 - this pull request hasn't had any new commits for approximately 4 months. I plan to close this in 30 days if the pull request doesn't have any new commits by then.

In lieu of a stalled pull request, please consider closing this and open an issue instead if a reminder is needed to revisit in the future. Maintainers may also choose to add keep-open label to keep this PR open but it is discouraged unless absolutely necessary.

If this PR still needs to be reviewed, as an author, you can rebase it to reset the clock.

If you believe I commented on this pull request incorrectly, please report this here.

@neutrinoceros
Copy link
Contributor Author

In lights of recent comments on #17934, I would still like it to go forward before I rebase and undraft this one, otherwise I'll be facing avoidable merge conflicts. Will keep this open in the mean time.

@neutrinoceros neutrinoceros added keep-open and removed Close? Tell stale bot that this issue/PR is stale labels Aug 23, 2025
@neutrinoceros neutrinoceros changed the title ENH: add support for setting astropy's cache and config directories via environment variables (blocked by #17932 and #17934) ENH: add support for setting astropy's cache and config directories via environment variables (blocked by #17934) Sep 1, 2025
@neutrinoceros neutrinoceros force-pushed the config/enh/17507/support_setting_cache_dir_via_env_var_2 branch from 3d1412f to f9bb03a Compare September 2, 2025 14:54
@neutrinoceros
Copy link
Contributor Author

I dropped some of the commits that were splitted to #17932 and #17934, but I'm trying to avoid solving similar merge conflicts more than needed, so I'll only do a full rebase once #17934 is merged.

@astrofrog astrofrog modified the milestones: v7.2.0, v8.0.0 Nov 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Build all wheels Run all the wheel builds rather than just a selection config Extra CI Run cron CI as part of PR keep-open

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Let user define cache location Use user-specified custom cache directory

6 participants