Description
Hi,
I'm now regularly using xarray (& dask) for organising and analysing the output of the simulation code I use (BOUT++) and it's very helpful, thank you!
However, my current approach is quite clunky at dealing with the extra information and functionality that's specific to the simulation code I'm using, and I have questions about the recommended way to extend the xarray Dataset class. This seems like a general enough problem that I thought I would make an issue for it.
Desired
What I ideally want to do is extend the xarray.Dataset class to accommodate extra attributes and methods, retaining as much xarray functionality as possible without reimplementing any of the API. This might not be possible, but ideally I want to make a BoutDataset class that has extra attributes holding information about the run that doesn't fit naturally into the xarray data model, and extra methods for analysis/plotting that only users of this code would need, while still being usable with xarray Dataset methods and top-level functions:
```python
import xarray as xr

bd = BoutDataset('/path/to/data')
ds = bd.data  # access the wrapped xarray dataset
extra_data = bd.extra_data  # access the BOUT-specific data
bd.isel(time=-1)  # use xarray dataset methods
bd2 = BoutDataset('/path/to/other/data')
concatenated_bd = xr.concat([bd, bd2], dim='time')  # apply top-level xarray functions to the data
bd.plot_tokamak()  # methods implementing BOUT-specific functionality
```

Problems with my current approach
I have read the documentation about extending xarray, and the issue threads about subclassing Datasets (#706) and accessors (#1080), but I wanted to check that what I'm doing is the recommended approach.
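For comparison, the accessor pattern in the xarray documentation on extending xarray differs in one key way: xarray constructs the accessor itself, passing in the Dataset (`__init__(self, xarray_obj)`), rather than a path. A minimal sketch, assuming the BOUT-specific metadata can be stored in the Dataset's `attrs` (the `BoutAccessor` class, the `"bout_extra"` key and the `describe_run` method are hypothetical names used only for illustration):

```python
import xarray as xr


@xr.register_dataset_accessor("bout")
class BoutAccessor:
    """Hypothetical accessor: xarray constructs it with the Dataset itself."""

    def __init__(self, xarray_obj):
        self._obj = xarray_obj

    @property
    def extra_data(self):
        # Assumption: BOUT-specific run metadata lives in ds.attrs under a
        # hypothetical "bout_extra" key, so it travels with the Dataset.
        return self._obj.attrs.get("bout_extra", {})

    def describe_run(self):
        # Stand-in for BOUT-specific functionality such as plot_tokamak()
        return (f"{len(self._obj.data_vars)} data variables; "
                f"extra keys: {sorted(self.extra_data)}")
```

Because `Dataset.isel` keeps `attrs`, and `xr.concat` by default keeps the attrs of the first input, `ds.isel(time=-1).bout.extra_data` should still work on the plain `xr.Dataset` that comes back. The trade-off is that `attrs` is meant for simple metadata: non-scalar values such as the nested dict here may not be writable with `to_netcdf`.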
Right now I'm trying to do something like this (`collect_data`, `read_extra_data` and `plot_in_bout_specific_way` are my own helper functions):
```python
import xarray as xr

@xr.register_dataset_accessor('bout')
class BoutDataset:
    def __init__(self, path):
        self.data = collect_data(path)  # collect all my numerical data from output files
        self.extra_data = read_extra_data(path)  # collect extra data about the simulation

    def plot_tokamak(self):
        plot_in_bout_specific_way(self.data, self.extra_data)
```
which works in the sense that I can do
```python
bd = BoutDataset('/path/to/data')
ds = bd.bout.data  # access the wrapped xarray dataset
extra_data = bd.bout.extra_data  # access the BOUT-specific data
bd.bout.plot_tokamak()  # methods implementing BOUT-specific functionality
```
but not so well with
```python
bd.isel(time=-1)  # AttributeError: 'BoutDataset' object has no attribute 'isel'
bd.bout.data.isel(time=-1)  # have to do this instead, but this returns an xr.Dataset, not a BoutDataset
concatenated_bd = xr.concat([bd1, bd2], dim='time')  # TypeError: can only concatenate xarray Dataset and DataArray objects, got <class 'BoutDataset'>
concatenated_ds = xr.concat([bd1.bout.data, bd2.bout.data], dim='time')  # again have to do this instead, which again returns an xr.Dataset, not a BoutDataset
```
If I have to reimplement the API for methods like `.isel()` and top-level functions like `concat()`, then why shouldn't I just subclass `xr.Dataset`?
There aren't very many top-level xarray functions, so reimplementing them would be okay, but there are loads of Dataset methods. However, I think I know how I want my BoutDataset class to behave when an xr.Dataset method is called on it: I want it to call that method on the underlying dataset and return the full BoutDataset with the extra data and attributes still attached.
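The behaviour described in this paragraph (delegate Dataset methods to the wrapped dataset and re-wrap the result, plus thin wrappers for the few top-level functions) can be sketched with Python's `__getattr__` hook, which is only called when normal attribute lookup fails. This is a hypothetical wrapper, not an xarray API; the `BoutDataset` constructor signature and the choice to keep the first input's `extra_data` in `concat` are assumptions for illustration:

```python
import xarray as xr


class BoutDataset:
    """Hypothetical wrapper: an xr.Dataset (data) plus BOUT-specific extra_data."""

    def __init__(self, data, extra_data=None):
        self.data = data
        self.extra_data = {} if extra_data is None else extra_data

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for names that
        # BoutDataset does not define itself. Guard the wrapper's own
        # fields so a half-initialised instance cannot recurse forever.
        if name in ("data", "extra_data"):
            raise AttributeError(name)
        attr = getattr(self.data, name)
        if callable(attr):
            def forwarded(*args, **kwargs):
                return self._rewrap(attr(*args, **kwargs))
            return forwarded
        return self._rewrap(attr)

    def _rewrap(self, result):
        # Results of the wrapped type (e.g. the Dataset returned by isel)
        # are re-wrapped and share the same extra_data dict (no copy);
        # anything else (DataArrays, sizes, ...) is returned unchanged.
        if isinstance(result, type(self.data)):
            return type(self)(result, self.extra_data)
        return result


def concat(bds, dim, **kwargs):
    """Hypothetical wrapper around xr.concat for BoutDataset objects.

    Assumption: the first input's extra_data is kept; how metadata from
    several runs should really be merged is left open.
    """
    bds = list(bds)
    if not bds:
        raise ValueError("need at least one BoutDataset to concatenate")
    combined = xr.concat([bd.data for bd in bds], dim=dim, **kwargs)
    return BoutDataset(combined, bds[0].extra_data)
```

With this, `bd.isel(time=-1)` returns a `BoutDataset`. Two limitations remain: special methods such as `bd["n"]` or `bd + 1` are looked up on the class and never reach `__getattr__`, so they would need explicit definitions; and `xr.concat` itself still rejects a `BoutDataset`, hence the separate `concat` helper.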
Is it possible to do something like:
"if calling an xr.Dataset method on an instance of BoutDataset, call the corresponding method on the wrapped dataset and return a BoutDataset that has the extra BOUT-specific data propagated through"?
Thanks in advance, and apologies if this is either impossible or relatively trivial; I just thought other xarray users might have the same questions.