implement the inline repr by keewis · Pull Request #22 · xarray-contrib/pint-xarray

keewis · 2020-08-06T17:22:54Z

As discussed in hgrecco/pint#1133, this monkeypatches the _repr_inline_ method introduced in pydata/xarray#4248 onto pint.Quantity, resulting in:

In [1]: import xarray as xr 
   ...: import numpy as np 
   ...: import pint 
   ...: import pint_xarray 
   ...:  
   ...: ds = xr.Dataset({"a": ("x", np.linspace(0, 1, 100), {"units": "m"})}) 
   ...: display(ds) 
   ...:  
   ...: quantified = ds.pint.quantify() 
   ...: display(quantified)
<xarray.Dataset>
Dimensions:  (x: 100)
Dimensions without coordinates: x
Data variables:
    a        (x) float64 0.0 0.0101 0.0202 0.0303 ... 0.9697 0.9798 0.9899 1.0
<xarray.Dataset>
Dimensions:  (x: 100)
Dimensions without coordinates: x
Data variables:
    a        (x) float64 [m] 0.0 0.0101 0.0202 0.0303 ... 0.9798 0.9899 1.0

However, this uses xarray.core.formatting.format_array_flat which is not public API (maybe it should be?).

Also, no tests, yet, because I don't know how to best check a repr.

codecov · 2020-08-06T17:23:59Z

Codecov Report

Merging #22 into master will decrease coverage by 1.08%.
The diff coverage is 78.02%.

@@            Coverage Diff             @@
##           master      #22      +/-   ##
==========================================
- Coverage   87.25%   86.16%   -1.09%     
==========================================
  Files           9       11       +2     
  Lines         706      795      +89     
==========================================
+ Hits          616      685      +69     
- Misses         90      110      +20

Impacted Files	Coverage Δ
pint_xarray/formatting.py	`74.35% <74.35%> (ø)`
pint_xarray/__init__.py	`100.00% <100.00%> (ø)`
pint_xarray/tests/test_formatting.py	`100.00% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4347a32...35ff26e. Read the comment docs.

keewis · 2020-08-06T23:01:31Z

thinking about this some more, this doesn't work for anything other than numpy.ndarray objects. For now I guess we could use the magnitude's _repr_inline_ (falling back to __repr__ if that doesn't exist) and only use format_array_flat if the magnitude is a ndarray.

However, as we nest deeper (e.g. xarray(pint(uncertainties(dask(sparse(cupy))))) – for argument's sake, let's assume that this actually makes sense) this might break or become really complicated. Does anyone have any ideas how to deal with that?

If I'm simply missing something we have that discussion here, otherwise I guess we should open a issue on xarray's issue tracker.

jthielen · 2020-08-06T23:30:47Z

Yes, I agree that format_array_flat should probably just apply to magnitude being an ndarray.

I think a cascading series of _repr_inline_ should work for nested arrays, so long as

the metadata of the higher nested objects is considered the priority (if not, then we're back to a fully managed solution to the likes of Design: nested _meta dask/dask#5329)
small max lengths are handled gracefully (i.e., a minimum where it is just like Dask.Array(...), then ..., then nothing)
we're okay with the lowest arrays in large nesting chains not having any information show up in the inline repr (situation where there is not enough characters to even describe the full nesting has to be accounted for somehow)
it can be adopted without too much complaint across the ecosystem

Assuming all this, then each layer of the nesting will reduce the max length of the inline repr string available to the layers below it, until a layer reaches a reasonable minimum where it "gives up". At least that's the natural design that I inferred from the simple _repr_inline_(max_width) API.

All that being said, it might still be good to bring up on xarray's end since this is a more general issue with inline reprs of nested duck arrays, with nothing pint-specific other than it being the motivating use-case.

jthielen · 2020-08-06T23:34:30Z

Also, as far as tests go, I'd think a unit test for this could be as simple as

assert q._repr_inline_(40) == "expected 40 character inline string repr..."

for a couple combinations of dtypes, shapes, and max widths. I'd see no need for integration tests making sure the final repr is as expected (since that's more on xarray's end).

keewis · 2020-08-20T15:27:04Z

not sure how to test that we forward to duck arrays properly, but that's something that depends on pydata/xarray#4324.

So I guess this is ready for review?

jthielen · 2020-08-23T17:13:11Z

My only suggestion is vendoring xarray.core.formatting.format_array_flat instead of using it from xarray. I've gotten burned before relying on xarray's private API. If at some point format_array_flat becomes public API, then we can remove the vendoring and just rely on that.

Otherwise looks good!

keewis · 2020-08-23T20:07:06Z

Vendoring format_array_flat would increase the number of direct dependencies (we'd depend on pandas) and also pull in 12 other functions, so this might be pretty painful. We could make it semi-public in xarray instead? Any thoughts on this, @shoyer?

shoyer · 2020-08-23T20:24:24Z

👍 for vendoring. I suspect you can make use a much simplified version of the associated helper functions, because pandas is really only used for datetime formatting.

keewis · 2020-08-23T20:46:51Z

well, although it doesn't really make sense, this works:

In [5]: import pint
   ...: import pandas as pd
   ...: 
   ...: ureg = pint.UnitRegistry()
   ...: ureg.Quantity(pd.date_range("2010-01-01", periods=10).to_numpy(), "m")
Out[5]: 
array(['2010-01-01T00:00:00.000000000', '2010-01-02T00:00:00.000000000',
       '2010-01-03T00:00:00.000000000', '2010-01-04T00:00:00.000000000',
       '2010-01-05T00:00:00.000000000', '2010-01-06T00:00:00.000000000',
       '2010-01-07T00:00:00.000000000', '2010-01-08T00:00:00.000000000',
       '2010-01-09T00:00:00.000000000', '2010-01-10T00:00:00.000000000'],
      dtype='datetime64[ns]') <Unit('meter')>

if we disregard that, vendoring would indeed be much easier.

keewis · 2020-08-23T21:12:49Z

with the vendoring the coverage decreases, but I think that's fine as long as we keep it in sync with the xarray version.

jthielen · 2020-08-24T16:44:18Z

Looks good now with the vendoring.

I did catch one more thing on another go-through (sorry about that). Hopefully this won't matter in practice, but the pathological case where max_width - len(units_repr) - 3 < 0 doesn't look to be explicitly handled. There should probably at least be a test that it degrades properly and does not error? Not sure what we actually want the repr to look like in that case (perhaps just ...)?

keewis · 2020-08-24T23:50:22Z

I switched to using the vendored maybe_truncate and added a test for max_width - len(units) - 3 < 0. Right now, it will keep the units, no matter how short the leftover width becomes. Not sure if that's okay? The displayed repr of dask and sparse arrays (which are special cased in xarray) will also be longer than max_width if necessary.

jthielen · 2020-08-26T17:05:02Z

I switched to using the vendored maybe_truncate and added a test for max_width - len(units) - 3 < 0. Right now, it will keep the units, no matter how short the leftover width becomes. Not sure if that's okay? The displayed repr of dask and sparse arrays (which are special cased in xarray) will also be longer than max_width if necessary.

Seems okay to me based on the existing behavior as you mentioned. If it becomes a problem, it can always be adjusted later.

keewis added 2 commits August 6, 2020 19:04

implement the inline repr

6a8c77d

use unit abbreviations

149750c

keewis mentioned this pull request Aug 7, 2020

constructing nested inline reprs pydata/xarray#4324

Open

keewis added 3 commits August 20, 2020 16:47

fall back to the repr for non-ndarray magnitudes

53a5071

add tests for the inline repr

79930e5

Merge branch 'master' into inline-repr

1c5af03

keewis mentioned this pull request Aug 22, 2020

API? #6

Open

16 tasks

vendor format_array_flat

a69ed59

keewis added 3 commits August 25, 2020 01:42

use maybe_truncate instead of slicing

4674197

add a test where the calculation of remaining chars becomes negative

a5a2424

Merge branch 'master' into inline-repr

41e51db

keewis added 2 commits August 26, 2020 11:09

configure pytest to use the xunit2 junit family

3dbe838

fix the pytest configuration

35ff26e

Merge branch 'master' into inline-repr

7abdf26

keewis merged commit 857a8e1 into xarray-contrib:master Aug 26, 2020

keewis deleted the inline-repr branch August 26, 2020 21:32

Conversation

keewis commented Aug 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Aug 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

keewis commented Aug 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jthielen commented Aug 6, 2020

Uh oh!

jthielen commented Aug 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

keewis commented Aug 20, 2020

Uh oh!

jthielen commented Aug 23, 2020

Uh oh!

keewis commented Aug 23, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shoyer commented Aug 23, 2020

Uh oh!

keewis commented Aug 23, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

keewis commented Aug 23, 2020

Uh oh!

jthielen commented Aug 24, 2020

Uh oh!

keewis commented Aug 24, 2020

Uh oh!

jthielen commented Aug 26, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

keewis commented Aug 6, 2020 •

edited

Loading

codecov bot commented Aug 6, 2020 •

edited

Loading

keewis commented Aug 6, 2020 •

edited

Loading

jthielen commented Aug 6, 2020 •

edited

Loading

keewis commented Aug 23, 2020 •

edited

Loading

keewis commented Aug 23, 2020 •

edited

Loading