Skip to content

RFC: rewrite config.paths._get_dir_path as a class method#17554

Merged
pllim merged 1 commit intoastropy:mainfrom
neutrinoceros:config/rfc/simplify_get_dir_path
Jan 14, 2025
Merged

RFC: rewrite config.paths._get_dir_path as a class method#17554
pllim merged 1 commit intoastropy:mainfrom
neutrinoceros:config/rfc/simplify_get_dir_path

Conversation

@neutrinoceros
Copy link
Contributor

Description

Follow up to #17188 and #17546
This was initially discussed in #17546 (comment). I'll quote myself here for convenience:

I think your new annotations are revealing a small design issue with these functions:
if fallback can only be exactly 'cache' or 'config', shouldn't this mean that cls can only be exactly set_temp_cache or set_temp_config ? We could also naturally exclude invalid combinations like cls=set_temp_cache, fallback='config' by simply removing the fallback argument and making it a class attribute instead (in which case we can also keep the generalized annotation you used for cls)

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@github-actions
Copy link
Contributor

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@github-actions
Copy link
Contributor

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

@neutrinoceros neutrinoceros marked this pull request as ready for review December 18, 2024 07:59
Comment on lines 41 to 43
_, _, fallback = cls.__name__.rpartition("_")
if (
(xdg_dir := os.getenv(f"XDG_{fallback.upper()}_HOME")) is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fallback is used for constructing the name of an environment variable. Having a fallback parameter with the type Literal["cache", "config"] lets type checkers ensure that the environment variable that will be looked up is valid. The change here removes that information from the type system because the __name__ of a subclass of _SetTempPath is not restricted.

The current annotation is not perfect because it does not prevent calls like _get_dir_path(..., set_temp_cache, "config"), so it can be a good idea to introduce a class attribute for set_temp_cache and set_temp_config to eliminate such calling errors, but trying to extract something from the class name is not good enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear you. How about this ?

@neutrinoceros neutrinoceros force-pushed the config/rfc/simplify_get_dir_path branch from 4f50fa9 to 4d77c3f Compare December 18, 2024 14:45

return wrapper

@classmethod
Copy link
Contributor Author

@neutrinoceros neutrinoceros Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm intentionally avoiding astropy.utils.decorators.classproperty because this module is so low in the stack that I don't think it's wise to make it depend on other parts of astropy

@eerovaher
Copy link
Member

I'm starting to get the impression that _get_dir_path() should be a class method in _SetTempPath, but I might not have time to think this through properly this week.

@neutrinoceros neutrinoceros changed the title RFC: simplify config.paths._get_dir_path RFC: simplify config.paths._get_dir_path Dec 18, 2024
Copy link
Member

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given #17507 discussions, I am not sure if this is worth doing...

@neutrinoceros
Copy link
Contributor Author

I'm not sure what you mean here @pllim. Are you worried that we paint ourselves in a corner with this PR ? IMO, removing existing complexity will likely help more than hurt with adding new functionality. But I'm actually neutral.

I'm starting to get the impression that _get_dir_path() should be a class method in _SetTempPath, but I might not have time to think this through properly this week.

This comment grew on me and it now seems clear to me that this function belongs in this class. I'm going to push this change now, hopefully you'll agree with it when you have more time.

@neutrinoceros neutrinoceros force-pushed the config/rfc/simplify_get_dir_path branch from 4d77c3f to c716fb5 Compare December 19, 2024 09:27
@neutrinoceros neutrinoceros changed the title RFC: simplify config.paths._get_dir_path RFC: rewrite config.paths._get_dir_path as a class method Dec 19, 2024
@neutrinoceros neutrinoceros force-pushed the config/rfc/simplify_get_dir_path branch from c716fb5 to 2ff0d45 Compare December 19, 2024 09:29
@pllim
Copy link
Member

pllim commented Dec 19, 2024

I am not sure yet but I have a feeling that all this code will change when we support custom env var.

@neutrinoceros
Copy link
Contributor Author

It will indeed, but I expect that change will be almost completely additive. I could open a follow up PR as a draft if it helps lifting your hesitation.

@neutrinoceros
Copy link
Contributor Author

neutrinoceros commented Dec 19, 2024

@pllim I've opened #17567 to address #17507

@neutrinoceros neutrinoceros force-pushed the config/rfc/simplify_get_dir_path branch from 2ff0d45 to c51d91c Compare December 20, 2024 07:33
@@ -145,6 +119,7 @@ def get_cache_dir(rootname: str = "astropy") -> str:
class _SetTempPath:
_temp_path: Path | None = None
_default_path_getter: Callable[[str], str]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need _default_path_getter at all anymore?

Copy link
Contributor Author

@neutrinoceros neutrinoceros Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. It can easily be refactored out as follow:

diff --git a/astropy/config/paths.py b/astropy/config/paths.py
index 35ee8b2786..73e62ddb1b 100644
--- a/astropy/config/paths.py
+++ b/astropy/config/paths.py
@@ -118,7 +118,6 @@ if get_cache_dir_path.__doc__ is not None:
 
 class _SetTempPath:
     _temp_path: Path | None = None
-    _default_path_getter: Callable[[str], str]
     _fallback_dirname: Literal["cache", "config"]
 
     def __init__(
@@ -133,8 +132,9 @@ class _SetTempPath:
 
     def __enter__(self) -> str:
         self.__class__._temp_path = self._path
+        mtd = get_cache_dir if self._fallback_dirname == "cache" else get_config_dir
         try:
-            return self._default_path_getter("astropy")
+            return mtd(rootname="astropy")
         except Exception:
             self.__class__._temp_path = self._prev_path
             raise
@@ -211,7 +211,6 @@ class set_temp_config(_SetTempPath):
         context (default: False).
     """
 
-    _default_path_getter = staticmethod(get_config_dir)
     _fallback_dirname = "config"
 
     def __enter__(self) -> str:
@@ -267,7 +266,6 @@ class set_temp_cache(_SetTempPath):
         context (default: False).
     """
 
-    _default_path_getter = staticmethod(get_cache_dir)
     _fallback_dirname = "cache"

and it would be a good idea to do something similar, as it currently only serves as a confusing layer of indirection (IMO).
However, I think it's out of scope here. Can I do it as a follow up PR ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you use cls.get_dir_path instead of mtd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, that's a lot cleaner !

class _SetTempPath:
_temp_path: Path | None = None
_default_path_getter: Callable[[str], str]
_fallback_dirname: Literal["cache", "config"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a comment to explain why there are only two allowed values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@neutrinoceros neutrinoceros force-pushed the config/rfc/simplify_get_dir_path branch from c51d91c to 7b9fbb4 Compare December 26, 2024 10:26
return wrapper

@classmethod
def get_dir_path(cls, rootname: str) -> Path:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should this be a public method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning is that this method being used outside the class itself, it shouldn't be private, though I understand there's an ambiguity here about how we interpret "private".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Eero pointed out above, might be slightly less controversial to keep the leading underscore since the original name already has it. There is no expectation that this new class method will be used beyond this module, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no expectation that this new class method will be used beyond this module, right?

indeed, there isn't

else:
linkto = None

return _find_or_create_root_dir(cls._fallback_dirname, linkto, rootname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If _find_or_create_root_dir were a class method too then we wouldn't have to pass cls._fallback_dirname as an argument when we call it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, let me give this a spin

@neutrinoceros neutrinoceros marked this pull request as draft January 1, 2025 09:38
@neutrinoceros neutrinoceros force-pushed the config/rfc/simplify_get_dir_path branch from 13a9bf0 to e6aa8af Compare January 1, 2025 09:59
@neutrinoceros neutrinoceros marked this pull request as ready for review January 1, 2025 10:04
@neutrinoceros neutrinoceros force-pushed the config/rfc/simplify_get_dir_path branch 2 times, most recently from 6b9683e to 4170299 Compare January 7, 2025 16:51
Copy link
Member

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. Thanks!

@pllim
Copy link
Member

pllim commented Jan 7, 2025

@eerovaher , do you want to do a final review?

return path.resolve()

if (
(xdg_dir := os.getenv(f"XDG_{cls._fallback_dirname.upper()}_HOME"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the code is storing the name of the corresponding directory on the class as _fallback_dirname (by the way, that's not a good name) and constructing the name of the environment variable from _fallback_dirname at runtime. It looks like this is trying to be too clever and it would be simpler to store both the directory name and the environment variable name in separate class attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all fair points. I've taken this into account in my latest update.

@neutrinoceros neutrinoceros force-pushed the config/rfc/simplify_get_dir_path branch from 4170299 to b842871 Compare January 10, 2025 15:05
# This base class serves as a deduplication layer for its only two intended
# children (set_temp_cache and set_temp_config)
_directory_type: Literal["cache", "config"]
_directory_env_var: str
Copy link
Member

@eerovaher eerovaher Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_directory_env_var should be annotated using Literal because it can only hold a small set of values which we are not free to choose because they are defined by an external standard. It is better to document that in the annotation than in a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not satisfied with this suggestion because it leaves room for 2 invalid cases, namely

    _directory_typ = "cache"
    _directory_env_var = "XDG_CONFIG_HOME"

and

    _directory_typ = "config"
    _directory_env_var = "XDG_CACHE_HOME"

So I refactored this again into an enum. Now, the pairing is expressed in the annotation.

Copy link
Contributor Author

@neutrinoceros neutrinoceros Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update: my enum refactor was broken beyond repair. Welp, maybe I'll stop chasing perfection here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that it's simpler to check that these two attributes that are right next to each other agree when reading the code than it would be to understand some clever mechanism that would generate one from the other at runtime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'm still slightly frustrated by the outcome but I can certainly live with it !

@neutrinoceros neutrinoceros force-pushed the config/rfc/simplify_get_dir_path branch 2 times, most recently from b05d4ea to 7531359 Compare January 11, 2025 08:32
Copy link
Member

@eerovaher eerovaher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request was triggered by the observation that the cls and fallback parameters of _get_dir_path() were not independent from each other, which suggested that the fallback parameter should instead be a class attribute in the _SetTempPath subclasses. Another observation was that because _get_dir_path() expected cls to be a subclass of _SetTempPath then it essentially was a class method of _SetTempPath even if it was not explicitly decorated as such. A few more similar changes followed from that.

Overall this pull request simplifies the code because it ensures that functions that are tightly coupled with _SetTempPath are defined as methods of that class instead of being scattered all over the module.

# This base class serves as a deduplication layer for its only two intended
# children (set_temp_cache and set_temp_config)
_directory_type: Literal["cache", "config"]
_xdg_directory_env_var: Literal["XDG_CACHE_HOME", "XDG_CONFIG_HOME"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the plan to follow up with #17567 for a different env var name, maybe no need to hardcode xdg into the new class attribute now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@neutrinoceros neutrinoceros force-pushed the config/rfc/simplify_get_dir_path branch from 7531359 to 6956a5c Compare January 14, 2025 16:14
@pllim pllim merged commit 8594e8e into astropy:main Jan 14, 2025
23 of 24 checks passed
@pllim
Copy link
Member

pllim commented Jan 14, 2025

Thanks for your patience!

@neutrinoceros neutrinoceros deleted the config/rfc/simplify_get_dir_path branch January 14, 2025 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants