Skip to content

UX: warn when env variables are ignored in cache and config directory discovery#17934

Merged
pllim merged 3 commits intoastropy:mainfrom
neutrinoceros:api/validate-dir-env-vars
Nov 2, 2025
Merged

UX: warn when env variables are ignored in cache and config directory discovery#17934
pllim merged 3 commits intoastropy:mainfrom
neutrinoceros:api/validate-dir-env-vars

Conversation

@neutrinoceros
Copy link
Contributor

@neutrinoceros neutrinoceros commented Mar 25, 2025

Description

This is split from #17640 at @eerovaher's request.

Fix #18436

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@neutrinoceros neutrinoceros added this to the v7.1.0 milestone Mar 25, 2025
@neutrinoceros neutrinoceros added config API change PRs and issues that change an existing API, possibly requiring a deprecation period labels Mar 25, 2025
@neutrinoceros neutrinoceros requested a review from eerovaher March 25, 2025 18:56
@github-actions github-actions bot added the utils label Mar 25, 2025
@github-actions
Copy link
Contributor

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@github-actions
Copy link
Contributor

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

@eerovaher
Copy link
Member

Have you thought about what should happen if the directory exists but the user doesn't have permissions to access it?

@neutrinoceros
Copy link
Contributor Author

Read access to the cache and config directories should be sufficient as long as one doesn't need to download additional data. So, we could check for read permission and raise an exception if not present. It's probably a good idea to check for write access too (at least for the cache directory), but a warning might be sufficient there.

... but I think these improvements are orthogonal to what I'm doing here, right ?

@eerovaher
Copy link
Member

In this pull request we should think about what should we do to validate the directories. As far as I can tell there are three things that can go wrong:

  1. the directory does not exist, in which case it might be a good idea to warn the user (like the patch here already does),
  2. user does not have access to the directory, but that should cause runtime errors which might be more informative so a separate warning might not be needed,
  3. the directory was specified using a relative, not an absolute path, in which case changing the working directory might automatically change the cache and config directories too, which would be not be good.

Is there any failure mode I missed?

@pllim
Copy link
Member

pllim commented Apr 14, 2025

Sorry I am late to the discussion. The XDG behavior has been established for like almost 15 years now, there is little benefit to only now throw warning. In my original review of #17640 , I want the new ASTROPY env vars to either auto-create the dir tree or be verbose instead of silently falling back to what XDG does. Because in that case, user already purposely chooses to not use the XDG behavior. So in response to Eero's #17640 (comment) , I am not sure if (2) and (3) should be separate.

@eerovaher
Copy link
Member

API consistency makes things simpler to implement, simpler to test, simpler to document and, most importantly, simpler for the users to understand. Why should we deviate from the general guideline in this case?

@pllim
Copy link
Member

pllim commented Apr 14, 2025

Cost / benefit. There is little benefit to break something that no one complained about for almost 15 years.

@neutrinoceros
Copy link
Contributor Author

Is a warning considered "breaking" now ? I think the complexity cost of treating ASTROPY_* env vars differently may be justified if we know of a case where this PR, as is, would break, but I'm not found of piling up complexity purely out of caution. What about giving this change a chance for 7.1.0 (perhaps 8.0.0 ?) first ?

@pllim
Copy link
Member

pllim commented Apr 15, 2025

Is a warning considered "breaking" now ?

Yes.

@neutrinoceros
Copy link
Contributor Author

I don't get it. In what conditions do we expect any user to have these variables set to invalid locations, and it not be a mistake ?

@eerovaher
Copy link
Member

I'd be willing to accept that the users don't get warnings because that is the current behavior (I don't really like the current behavior, but it is what it is). But I cannot agree that the ASTROPY_*_DIR and XDG_*_HOME variables should behave differently in this regard.

@pllim pllim modified the milestones: v7.1.0, v7.2.0 Apr 21, 2025
@maxnoe
Copy link
Member

maxnoe commented Apr 23, 2025

But I cannot agree that the ASTROPY_DIR and XDG_HOME variables should behave differently in this regard.

Why not? One is specific to astropy and you can be sure that this variable is set because of astropy.

The other is a generic variable that users might set for all kinds of reasons and have no idea it actually also affects astropy.

@maxnoe
Copy link
Member

maxnoe commented Apr 23, 2025

Also, XDG_CONFIG_HOME has an additional failure mode, because it does not directly point to the config directory itself but you expect it in ${XDG_CONFIG_HOME}/astropy, you have the additional case where XDG_CONFIG_HOME exists (is readable / writable) but ${XDG_CONFIG_HOME}/astropy doesn't.

@hamogu
Copy link
Member

hamogu commented Aug 7, 2025

I agree with @neutrinoceros and @eerovaher above: This is a bug and fixing bugs is exactly what minor releases are for (in fact, that's even what bug fix releases are for, it doesn't even have to be a minor release). To paraphrase what others have said above: There is no way someone sets the defaults for their configuration directories without wanting to set the defaults for their configuration directories.

So, if I have a typo and that directory does not exist or the disk is not mounted or something, then I absolutely have to know that - astropy should issue a warning or exception.

@taldcroft
Copy link
Member

I also agree with @neutrinoceros and @eerovaher and now @hamogu above on the principle of adding a warning here.

I have not looked at the code or done technical review, but I think that once @eerovaher agrees that this is technically sound then it can be approved for 7.2.

@maxnoe
Copy link
Member

maxnoe commented Aug 14, 2025

I agree for the config dir, but I think there should be no warning for a non-existent cache directory as long as the location is writable. It's common to clear caches or create them fresh by setting the cache variable to another value. An empty cache shouldn't be worth a warning.

@neutrinoceros
Copy link
Contributor Author

that's a good point. We could create the astropy subdirectory automatically and silently, just not the parent dir. Does this sound sane to you ?

@github-actions github-actions bot added the Close? Tell stale bot that this issue/PR is stale label Aug 23, 2025
@github-actions
Copy link
Contributor

Hi humans 👋 - this pull request hasn't had any new commits for approximately 4 months. I plan to close this in 30 days if the pull request doesn't have any new commits by then.

In lieu of a stalled pull request, please consider closing this and open an issue instead if a reminder is needed to revisit in the future. Maintainers may also choose to add keep-open label to keep this PR open but it is discouraged unless absolutely necessary.

If this PR still needs to be reviewed, as an author, you can rebase it to reset the clock.

If you believe I commented on this pull request incorrectly, please report this here.

Copy link
Member

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're very close now. The logic route LGTM but I feel like the warning stuff could be simplified.

@@ -0,0 +1,3 @@
``get_config_dir()`` and ``get_cache_dir()`` now emit warnings in all cases
where the ``XDG_CACHE_HOME`` (resp ``XDG_CONFIG_HOME``) environment variable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what "resp" means. Can you please clarify more here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoop, that must be french only, I assumed it was common in english too. It's just short for "respectively". I'll just use the unabbreviated word then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(FWIW, I found 4 other occurences in the code base, some of which were not from french contributors 😅, but I agree there's no point shortening a word if it's not commonly understood. Thanks !)

@neutrinoceros
Copy link
Contributor Author

@pllim I haven't included your suggested simplifications for now, as I anticipate they would lead to further churn in following developments, but if you still think it's worth it, please apply all of them and squash(-merge?) !

@pllim pllim force-pushed the api/validate-dir-env-vars branch from 4d5be99 to 90e8e13 Compare October 31, 2025 16:05
@pllim

This comment was marked as resolved.

@pllim
Copy link
Member

pllim commented Oct 31, 2025

I haven't included your suggested simplifications for now

It is okay. Since you anticipate for the follow-up PR, no need to apply the suggestions. Thanks for the clarifications!

@neutrinoceros
Copy link
Contributor Author

Do you want to clean up the commits now?

On it

@neutrinoceros neutrinoceros force-pushed the api/validate-dir-env-vars branch from 90e8e13 to f953162 Compare October 31, 2025 16:24
@neutrinoceros
Copy link
Contributor Author

rebased and cleaned the branch history !

@neutrinoceros neutrinoceros force-pushed the api/validate-dir-env-vars branch from 0487ad2 to 74e5561 Compare October 31, 2025 18:03
Comment on lines +235 to +245
if linkto.exists() and linkto.resolve() == maindir:
# no-op
linkto = None
elif sys.platform.startswith("win"):
warn("is not supported on Windows")
linkto = None
elif maindir.is_dir() and linkto.resolve() != maindir:
warn(
f"the default location, {maindir}, already exists, and takes precedence"
)
linkto = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is more aligned with existing main logic, which skips linkto as long as it exists, no matter what. Here, we can emit extra warnings, but we should not change the main logic.

Suggested change
if linkto.exists() and linkto.resolve() == maindir:
# no-op
linkto = None
elif sys.platform.startswith("win"):
warn("is not supported on Windows")
linkto = None
elif maindir.is_dir() and linkto.resolve() != maindir:
warn(
f"the default location, {maindir}, already exists, and takes precedence"
)
linkto = None
if linkto.exists():
linkto = None
if maindir.is_dir() and linkto.resolve() != maindir:
warn(
f"the default location, {maindir}, already exists, and takes precedence"
)
elif sys.platform.startswith("win"):
warn("is not supported on Windows")
linkto = None

For reference, on main:

            if (
                not sys.platform.startswith("win")
                and linkto is not None
                and not linkto.exists()
            ):
                linkto.symlink_to(maindir)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I can't take this suggestion as is because:

linkto = None
if maindir.is_dir() and linkto.resolve() != maindir:
    ...

would raise an AttributeError: None has not attribute 'resolve'.
Though I take your point. Let me think about it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'm now pretty sure that you just discovered yet another bug in my implementation, but this one is worse, in a sense, because I think it shows how I already broke perfect bug-for-bug compat between 7.0 and 7.1. I'm going to add a test case for it and revert to 7.0 behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(luckily, this should actually make the logic simpler, for once)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind that, the weird behavior I was looking at was actually already in 6.1, which is before I got involved with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay so bear with me: at this point in the code, we've already returned something if linkto is either a symlink or a dir, and we warned + discarded it if it was a file. Knowing that it's not None here, we already know that it can't be an existing path, so a branch such as if linkto.exists() will not be reachable at all. I would normally be okay with having redundant checks, since this logic is spread across more than one function, but here I actually think it's better to keep is as "simple" as possible, so my conclusion was to just remove this branch. I hope this makes sense

@neutrinoceros neutrinoceros force-pushed the api/validate-dir-env-vars branch from 47eb532 to 7ab9245 Compare October 31, 2025 19:18
@neutrinoceros neutrinoceros force-pushed the api/validate-dir-env-vars branch from 9a68306 to 0f0749f Compare November 1, 2025 05:56
Copy link
Member

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now at the state where

  • it issues the warning it originally wanted to issue
  • a few more warnings for some weird combos
  • no annoying unexpected warning for common use cases (as far as I could locally test and I did not test on Windows)

While not perfect, we should get this in before we confuse ourselves further. Thank you very much for your patience!

@pllim pllim enabled auto-merge (squash) November 2, 2025 18:47
@neutrinoceros
Copy link
Contributor Author

And thank you a thousand times over for your unfailing diligence throughout this difficult review. I know we both got very frustrated with it but I'm convinced the end result is much better than what I originally envisioned, and I couldn't have made it without your help.

@pllim pllim merged commit b115548 into astropy:main Nov 2, 2025
49 of 50 checks passed
@neutrinoceros neutrinoceros deleted the api/validate-dir-env-vars branch November 2, 2025 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Build all wheels Run all the wheel builds rather than just a selection config Extra CI Run cron CI as part of PR Ready-for-final-review utils

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Custom cache location does not work when the directory does not exist

6 participants