Skip to content

Config package and data downloaders#35

Closed
eteq wants to merge 57 commits intoastropy:masterfrom
eteq:config-pkg
Closed

Config package and data downloaders#35
eteq wants to merge 57 commits intoastropy:masterfrom
eteq:config-pkg

Conversation

@eteq
Copy link
Member

@eteq eteq commented Oct 27, 2011

This pull request implements the layout of the config package, And also implements functionality for remote download/caching of data files and configuration file utility functions.

The remote data aspect of this is ready to go (including tests). One choice I ended up making that could be changed if desired: As it stands right now, data included in the source distribution should live in the astropy/data directory (or subdirectories), rather than in data directories scattered around in each package. If that's strongly desired, it can probably be implemented, although it may require fiddling with setup.py. I shy aware from that because it'll be confusing for future developers - specifically, how does the directory structure of package map onto the directory structure in the data directory? It could be multiple ways, but the simplest thing is to just put them all in one place.

As for the configuration files, there are a few questions to be addressed:

  • Should this simple interface (get_config('subpackagename')) be it? I could implement a scheme where package authors could just define constants at the top of the modules and have them automatically get passed out to the configuration files (or a new version adopted or whatever), but I wasn't sure what everyone else wanted, so I started with this basic interface (which will be useful in more complicated cases, anyway).
  • Right now, the configuration files and remote data cache are in $HOME/.astropy ... given that affiliated packages will want to use this stuff, should we use the .astropy name? Right now, inside that directory there will be directories with the name e.g. astropy for the astropy core, afpkg1 for the first afilliated package, and so on... You specify which one you want with the second argument to get_config. This could change to have separate main directories for each affiliate package, but I think this is easier to deal with, as it doens't clutter up the user's home directory.
  • In the docstring for get_config I'm pointing to the configobj web site for definitions of the ConfigObj object and its interface. Should I pull in a copy of the ConfigObj documentation and included it in the sphinx docs (just as a separate html file include in the source dist)?

Once this has been decided on and the pull request completed, I will write up documentation describing how to use these modules in the sphinx docs and issue a new pull request.

@astrofrog
Copy link
Member

This seems to include commits by @mdboom that have already been merged in - maybe a rebase is necessary against the most recent master?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the directory created in get_config_dir?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the .astropy directory is, but the config directory (for astropy, this is .astropy/astropy) is not until this point. (see below for more on this)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Ok. Disregard ;)

@mdboom
Copy link
Contributor

mdboom commented Oct 27, 2011

See my low-level comments above.

Here's some comments pertaining to the pull request discussion:

  • I would really like to see subpackage-specific data live within the subpackages. It makes it clear which code cares about what data. astropy.wcs already has some for its tests, and it just loads them using paths relative to the source code (using file) since because they are small local files, there no special caching to do etc. They are installed using the get_data_files() hook in setup_package.py. I'm not sure why this approach isn't sufficient for small, local files where all of the caching/hashing infrastructure etc. isn't necessary. If there's good reasons to put all data under "data/", then perhaps by convention we could have a parallel directory structure there, where wcs's data lives under "data/wcs"?
  • Configuration and cache files are separate and shouldn't be commingled. Configuration is something I would want to synchronize between machines, backup and basically guard with my life. Cache data is ephemeral and opaque -- I don't care what's in it or if it disappears as long as it can be recreated. (As alluded in my line comments, this is why LSB recommends putting configuration under ~/.local and cache under ~/.cache). To that end, I like the suggestion of configuration all being under the same root folder, including for affiliated packages, as long as everything is organized by subpackage underneath that. For caching, I think we should have a single bucket of stuff for anything using the astropy caching framework. No need to divvy that up at all.
  • I think it's sufficient to link to (but not include) the ConfigObj docs. Should we document this as a blanket policy wrt external libraries?

@eteq
Copy link
Member Author

eteq commented Oct 27, 2011

Those updates are just a rebase against master to get rid of the extraneous commits.

@eteq
Copy link
Member Author

eteq commented Oct 28, 2011

@mdboom - Regarding the general comments(in the same order):

  • I think its important to have a one-stop-shop function to get data file names, because otherwise people will end up implementing different slightly incompatible ways of organizing their data files, even if we ask them not to... and they will have to use tricks like looking at the file variable to figure out the directory and such, and that sort of thing is error-prone if we ever have to refactor anything. Remember that this same infrastructure will be used by affiliated packages, and we want to reduce the complications involved in merging in those packages.
    Having said that, it's certainly possible to implement the function something like get_data_filename('wcs/filename'), and have that intelligently go looking in the wcs source directory for a "data" directory. But that's a bit confusing if you don't know there's magic being used. The parallel structure in the data directory is cleaner, as you suggest. Note that that is supported in the current version - all you have to do is move the astropy/wcs/data directory into data/wcs.
  • I agree with this sentiment, so I'm not sure what you're saying needs to be changed here... currently, if I assume your home directory is /home/mdboom, the cached data goes in /home/mdboom/.astropy/datacache, and all configuration goes in /home/mdboom/.astropy/astropy, so the cache and config info are in separate directories. I'm avoiding doing something like putting it in ~/.local or ~/.cache because we want to stay cross-platform, and e.g. windows is laid out differently (hence the ridiculous code for get_config_dir that looks for something like a home directory). Or are you suggesting that the configuration be in /home/mdboom/.astropy/config/astropy or something like that (along with affiliated package config)? I could certain do that quite easily.
  • You're probably right we should say in general we don't want to copy over external docs. The only reason why I say maybe ConfigObj is because it's not obvious how to use it without that, and the docs are all one single html file, so it's pretty straightforward to include them. I don't have a strong opinion on this, though - I could go either way.

@astrofrog
Copy link
Member

One quick question - do we really expect to have so many configuration options that we'll need to split it up into multiple fils, one for each sub-package? If not, then we can have a single file with e.g.

[wcs]
origin = 0

[io.fits]
memmap = True

[util.odict]
option = True

That would be easier than having to create a config file for each sub-package, which could be a pain. Maybe you were already thinking about having this?

If we went with that, then we could even have just a file called .astropyrc and a directory called .astropy.cache. If you foresee the need for other astropy-related files that are not cache files then you could have .astropy/astropyrc (consistent with matplotlib) and .astropy/cache or .astropy.cache for the cache. Just some ideas :-)

@hamogu
Copy link
Member

hamogu commented Oct 28, 2011

+1 for @astrofrog

Having all options in one file makes it so much easier to backup, to synchronize or to give my options to colleagues new to astropy. I don't mind if that single file needs a few hundred lines, because I could just use a text editor to modify it. And, if the options are named sensibly, it's often easier to edit a text file then to search though the documentation to find functions like astropy.util.odict.set_option(option=True) (to stay with @astrofrog's example).

If you think memmap = True is too short, .astropyrc could contain comment lines::

[io.fits]
# Should memory mapping be used to speed up fits file reading? [True, False]
memmap = True

The fitting package sherpa is one example for a larger package which has a single .sherpa.rc file structured in this fashion.

@mdboom
Copy link
Contributor

mdboom commented Oct 28, 2011

@eteq: In response to the general comments:

  • Sure. That seems fine.
  • I'm suggesting that cache goes into ~/.cache/astropy and configuration goes into ~/.local/astropy. The idea is that all of my configuration, not just from astropy, would go in ~/.local. Mac OS-X has a similar standard. Caches are meant to go in ~/Library/Caches. Native apps usually put configuration in ~/Library/Preferences. I know these are platform-specific differences, but they are very useful differences when managing multiple systems etc..
  • Sure.

To @astrofrog 's comment: I don't know if being consistent with how matplotlib handles configuration should be a goal :) It's been on my TODO list for some time to make it more like I described above. My suggestion (to put the files in the modern platform-specific locations for configuration and cache) is orthogonal to whether we decide to have a single file for configuration or a directory with multiple files.

@eteq
Copy link
Member Author

eteq commented Oct 28, 2011

@mdboom regarding the directory structure: My point was that there's no straightforward way to do this in a cross-platform way. I suppose we could look to see if its unix-like, and put it in ~/.local and ~/.cache in those cases, use ~/Libarary/Caches if its OS X, and something else sepcific to windows, but I think that will be confusing to users, because its not a consistent standard. It also isn't inline with ipython or matplotlib, which I think our users are more likely to be familiar with... So while I see the value of the separate directory structures, I'm wondering if we should be bowing to the practical fact that it may be confusing and not terribly portable. I'd be good to get other peoples' opinion on this, though - I am not terribly attached to either solution, I just want to do whatever will be the least confusing.

@eteq
Copy link
Member Author

eteq commented Oct 28, 2011

@astrofrog and @hamogu : I had conversations with a couple other people who favored multiple-file scheme, as that then allows you to skip having sections entirely for smaller subpackages. Again, I don't really have a terribly strong opinion here, so one or many files is fine.

Two complications to consider with a single-file scheme, though: First, affiliated packages that get merged will have to switch to using the astropy config file, so settings from affiliated packages will have to get copied over. This may not be an issue, but is something to be aware of. Second, it's a bit more complicated to enforce the sectioning in ConfigObj if its not at a per-file level. To add a section in ConfigObj, you do configobj['io.fits'] = {}... and the person acccessing the config file will have to be sure to do something like this, which is a bit awkward. Alternatively, I could have it automatically always pick out the right section, but then it's a little more confusing because you have to do sec.parent.write() or similar to write the file. Again, not a major problem, but a potential source of confusion that isn't present for single-file setups, where the onus is on the writer of the package to actually look up ConfigObj before they do anythign else.

@astrofrog
Copy link
Member

@eteq - ok, I don't have a strong preference for single file vs multiple file. By the way, do your comments mean that affiliated packages will be allowed (and should be encouraged) to place their configuration files in the astropy configuration directory? (to avoid having to move the config once the package is merged) If so, how should we deal with name changes? (e.g. specutils might become part of astropy.spectrum)

@mdboom - I don't really have a strong preference for where the cache should go, but I think we want to keep the configuration files in the same location for all systems (~/.astropy for example) rather than having it in ~/Library/Preferences/ on mac for example (I associate that directory with plist files, not unix-style configuration files).

@eteq
Copy link
Member Author

eteq commented Oct 30, 2011

@astrofrog - you make a good point with the name change bit... that's actually a good reason to use a single file for astropy. So perhaps it sould be that the core astropy package configuration would be in ~/.astropy/astropy.cfg , affiliated packages would go in ~/.astropy/affilpkgname.cfg ? Then if the name changes in the switch to astropy, there's no confusion because it ends up in a different file astropy.cfg anyway.

@mdboom - just to be clear, were you saying you thought single-file (e.g. like matplotlib) is a good idea, a bad idea, or you don't have a strong preference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be relative imports .configs, .data and .affiliated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - I fixed that with a commit just now

@astrofrog
Copy link
Member

@eteq - maybe we could even do ~/.astropy/astropy.cfg for the core package and ~/.astropy/affil/affilpkgname.cfg for affiliated packages (i.e. sandbox affiliated packages into ~/.astropy/affil) - I don't have a strong opinion though.

@mdboom
Copy link
Contributor

mdboom commented Oct 31, 2011

@eteq: Note that the current version of IPython puts its configuration XDG_CONFIG_HOME on Linux. matplotlib has a bug out for this change as well, which will probably be fixed for the next major release. IPython doesn't have any cache, of course, only configuration.

If we really want everything under ~/.astropy, then may I suggest the following: configuration goes under ~/.astropy/config, and cache under ~/.astropy.cache. That will let Linux users put symlinks to each of those under ~/.local and ~/.cache and have a sane backup and sync strategy. An equally good alternative would be to put configuration in ~/.astropy and cache in ~/.cache/astropy.

I don't have strong feelings about whether configuration is a single file or not.

@eteq
Copy link
Member Author

eteq commented Oct 31, 2011

@mdboom: Hmm... it looks like IPython first looks for XDG_CONFIG_HOME and then does .config if it doesn't find it. They do have something like a cache in the form of the history database - that's stored in the config dir (although you're right that just because other people do something that doesn't mean we should). There also seems to be a XDG_CACHE_HOME variable that XDG defines... but neither of those variables seem to be defined by default on either OS X or Ubuntu (although the Ubuntu I checked is a bit out-of-date).

I'm leaning towards a variant of your intermediate solution - I'll look for a directory called astropy in XDG_CONFIG_HOME, then if nothing's there, I'll try ~/.config/astropy, and then ~/.local/astropy. If none of those exist, if XDG_CONFIG_HOME is defined, it will use that, otherwise it will use ~/.astropy/config. similarly, for the cache, it will try XDG_CACHE_HOME and ~/.cache/astropy, and if not found, it will either use XDG_CACHE_HOME or ~/.astropy/cache. If XDG_CACHE_HOME or XDG_CONFIG_HOME are present, I think the safest thing is to still define the ~/.astropy directory, but symlink the subdirs to the appropriate XDG directories. Does that seem like a good solution that satisfies all needs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion for this function that came up in other discussions: It should also be possible to specify the location of the Astropy config dir in an environment variable, such as ASTROPY_CONFIG_DIR. Another nice feature which matplotlib has, is to look for a '.astropy' directory in the cwd. I know we haven't done anything with logging yet, but there should be a log message as to which config dir is being used. If you prefer I could just open a separate issue for this for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this makes a lot of sense (modulo my comments below). Regarding the message, for now I'll use print statements, with #TODO comments that clearly indicate this should be switched to the logging framework as soon as it has been merged.

@eteq
Copy link
Member Author

eteq commented Nov 21, 2011

I am closing this pull request and replacing it with a new one because the new version is pretty much different almost everywhere...

@eteq eteq closed this Nov 21, 2011
@eteq eteq mentioned this pull request Nov 21, 2011
mdboom pushed a commit to mdboom/astropy that referenced this pull request Jul 18, 2013
Define an UnitNotSet class so we can have uninitialised units
embray added a commit to embray/astropy that referenced this pull request Jan 23, 2015
…eds_stub attribute to all Extension objects (awesome =_=), so adding this extension to the extension list *before* the original build_ext.finalize causes things to break later when the original build_ext.run tries to access that attribute.
embray pushed a commit to embray/astropy that referenced this pull request Jan 23, 2015
embray added a commit to embray/astropy that referenced this pull request Jan 23, 2015
…ight, but will be useful in the future as a template for tests for build_ext. This also involved reorganizing setup_helpers so that it will be easier to reset its module global variables between tests.
jeffjennings pushed a commit to jeffjennings/astropy that referenced this pull request Jul 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants