Read colormap and colorinterp from GeoTIFF as an attribute#414
Read colormap and colorinterp from GeoTIFF as an attribute#414weiji14 wants to merge 10 commits intocorteva:masterfrom
Conversation
Cherry-picked from pydata/xarray#3136 Co-Authored-By: Alando Ballantyne <alando@radiant.earth>
| ) | ||
| assert attrs == { | ||
| "add_offset": 0.0, | ||
| "colorinterp": (rasterio.enums.ColorInterp.gray,), |
There was a problem hiding this comment.
This will need to be converted to either a string representation or the integer representation. The reason for this is to enable calling to_netcdf without serialization issues.
Here is an example converting to the integer enumeration value:
>>> import rasterio
>>> rasterio.enums.ColorInterp.gray
<ColorInterp.gray: 1>
>>> rasterio.enums.ColorInterp.gray.value
1There was a problem hiding this comment.
An alternative would be to read/write the photometric creation option of the profile into/from the attributes. This is always in the string format and shouldn't have any serialization issues.
There was a problem hiding this comment.
Actually, the colorinterp enum can be serialized since it has an integer representation. It's the colormap dict which I'm having trouble with. The colormap dict looks something like this:
{0: (248, 224, 168, 255),
1: (240, 224, 152, 255),
2: (104, 104, 96, 255),
...
255: (248, 248, 248, 255)}Do you think we should keep this colormap as an xarray attr, or should it be loaded into the data_var as 4 bands (i.e. RGBA)?
There was a problem hiding this comment.
Since it is a dict, you could use json.dumps & json.loads to convert to/from a string.
There was a problem hiding this comment.
Ok, the json serialization has been implemented in 54f9560. Had to do some extra magic because the dict keys have to be int, but JSON only allows str keys. I did try an alternative method using yaml in 5b07d7f as suggested by https://stackoverflow.com/questions/17099556/why-do-int-keys-of-a-python-dict-turn-into-strings-when-using-json-dumps#comment24737635_17099556, but maybe will stick with JSON.
FYI, this is how the xarray.Dataset looks like:
<xarray.DataArray (band: 1, y: 180, x: 360)>
array([[[251, 251, ..., 251, 251],
[251, 252, ..., 251, 252],
...,
[ 68, 16, ..., 68, 217],
[149, 149, ..., 149, 16]]], dtype=uint8)
Coordinates:
* band (band) int64 1
* x (x) float64 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
* y (y) float64 89.5 88.5 87.5 86.5 ... -86.5 -87.5 -88.5 -89.5
spatial_ref int64 0
Attributes:
scale_factor: 1.0
add_offset: 0.0
colormap: {"0": [248, 224, 168, 255], "1": [240, 224, 152, 255], "2"...
colorinterp: (<ColorInterp.palette: 2>,)
There was a problem hiding this comment.
Do you think we should keep this
colormapas an xarray attr, or should it be loaded into the data_var as 4 bands (i.e. RGBA)?
Just to follow up on the data_var loading, I tried converting the earth_day_01d_p.tif GeoTiff to NetCDF directly using gdal_translate -of NetCDF -expand rgba earth_day_01d_p.tif earth_day_01d_p.nc.
This is the output of gdalinfo earth_day_01d_p.nc
Driver: netCDF/Network Common Data Format
Files: earth_day_01d_p.nc
Size is 512, 512
Metadata:
NC_GLOBAL#Conventions=CF-1.5
NC_GLOBAL#GDAL=GDAL 3.3.2, released 2021/09/01
NC_GLOBAL#GDAL_AREA_OR_POINT=Area
NC_GLOBAL#history=Sat Nov 06 23:41:47 2021: GDAL CreateCopy( earth_day_01d_p.nc, ... )
Subdatasets:
SUBDATASET_1_NAME=NETCDF:"earth_day_01d_p.nc":Band1
SUBDATASET_1_DESC=[180x360] Band1 (8-bit integer)
SUBDATASET_2_NAME=NETCDF:"earth_day_01d_p.nc":Band2
SUBDATASET_2_DESC=[180x360] Band2 (8-bit integer)
SUBDATASET_3_NAME=NETCDF:"earth_day_01d_p.nc":Band3
SUBDATASET_3_DESC=[180x360] Band3 (8-bit integer)
SUBDATASET_4_NAME=NETCDF:"earth_day_01d_p.nc":Band4
SUBDATASET_4_DESC=[180x360] Band4 (8-bit integer)
Corner Coordinates:
Upper Left ( 0.0, 0.0)
Lower Left ( 0.0, 512.0)
Upper Right ( 512.0, 0.0)
Lower Right ( 512.0, 512.0)
Center ( 256.0, 256.0)
So the bands have been expanded into NetCDF subdatasets. I then loaded earth_day_01d_p.nc using rioxarray.open_rasterio, and it produces the following:
<xarray.Dataset>
Dimensions: (y: 180, x: 360, band: 1)
Coordinates:
* y (y) float64 89.5 88.5 87.5 86.5 85.5 ... -86.5 -87.5 -88.5 -89.5
* x (x) float64 -179.5 -178.5 -177.5 -176.5 ... 176.5 177.5 178.5 179.5
* band (band) int64 1
crs int64 0
Data variables:
Band1 (band, y, x) uint8 ...
Band2 (band, y, x) uint8 ...
Band3 (band, y, x) uint8 ...
Band4 (band, y, x) uint8 ...
Attributes:
Conventions: CF-1.5
GDAL: GDAL 3.3.2, released 2021/09/01
GDAL_AREA_OR_POINT: Area
history: Sat Nov 06 23:41:47 2021: GDAL CreateCopy( earth_day...
Notice the Band1-Band4 data variables. So I'm thinking there should be a way to mimic this structure. Need to find a way to use the colormap dict currently in the attr, and get these Bands 1/2/3/4 information out as data variables.
There was a problem hiding this comment.
Mind doing ncdump -h earth_day_01d_p.nc and pasting the output here?
rioxarray/_io.py
Outdated
| if hasattr(riods, "colormap"): | ||
| # A dict containing the colormap for a band | ||
| try: | ||
| attrs["colormap"] = riods.colormap(1) |
There was a problem hiding this comment.
This only gets the colormap for the first band. I imagine it would be good to handle this for all of the bands. docs
There was a problem hiding this comment.
Yep, probably need to loop through all the bands, but need to think about https://github.com/corteva/rioxarray/pull/414/files#r722041137 first 🙂
|
#429 should help. |
Codecov Report
@@ Coverage Diff @@
## master #414 +/- ##
==========================================
+ Coverage 96.07% 96.11% +0.04%
==========================================
Files 13 13
Lines 1530 1546 +16
==========================================
+ Hits 1470 1486 +16
Misses 60 60
Continue to review full report at Codecov.
|
| # write colorinterp and colormap | ||
| # see https://rasterio.readthedocs.io/en/latest/topics/color.html | ||
| try: | ||
| raster_handle.colorinterp = tags["colorinterp"] |
There was a problem hiding this comment.
Mind using separate try/except blocks for colorinterp and colormap?
|
Closing because this PR has become stale. Feel free to re-open or submit a new PR if you would like to contribute this feature. |
Preliminary support to read in and write color interpretation information from GeoTIFF files. Partially based on @alando46's code at pydata/xarray#3136. Rasterio reference at https://rasterio.readthedocs.io/en/latest/topics/color.html
docs/history.rstfor all changes anddocs/rioxarray.rstfor new API