flexible codecs cannot handle compound dtypes containing objects #333

@martindurant

Description

This functionality is missing, but users could reasonably expect it. If a zarr array has a compound dtype (i.e., records) and any of its fields is of object type (e.g., strings), the data cannot be round-tripped, even though JSON/msgpack/pickle are capable of converting the array.

Minimal, reproducible code sample, a copy-pastable example if possible

>>> a = np.array([('aaa', 1, 4.2),
...               ('bbb', 2, 8.4),
...               ('ccc', 3, 12.6)],
...              dtype=[('foo', 'O'), ('bar', 'i4'), ('baz', 'f8')])
>>> z = zarr.array(a, object_codec=numcodecs.JSON(), fill_value=None)
>>> z["foo"]
ValueError: setting an array element with a sequence.

(Without fill_value, this errors earlier due to the use of np.zeros to guess a fill; with an appropriate fill_value=("", 0, 0.), it fails at array creation too.)

Problem description

JSON and similar codecs store only the dtype.str, which is "Vxx" in these cases: an opaque void type that carries no field names or field types. As a result, a suitable empty array cannot be made at load time.
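
To illustrate the loss of information, here is a minimal sketch using only NumPy and the standard library. It compares `dtype.str` with `str(dtype)` for the compound dtype from the example above. The variable names are illustrative, and the exact byte count in the "V" string depends on the platform, so it is not shown.

```python
import ast

import numpy as np

# The compound dtype from the reproducer above.
dt = np.dtype([('foo', 'O'), ('bar', 'i4'), ('baz', 'f8')])

# dtype.str collapses the record to an opaque void type ("|V" + itemsize).
short = dt.str
opaque = np.dtype(short)      # rebuilds a plain void dtype
print(short, opaque.names)    # the field names are gone (names is None)

# str(dtype) keeps the full field list as a Python literal,
# which ast.literal_eval can safely turn back into a dtype spec.
full = str(dt)
restored = np.dtype(ast.literal_eval(full))
print(full, restored == dt)
```
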

The following fixes this for JSON, but looks ugly.

--- a/numcodecs/json.py
+++ b/numcodecs/json.py
@@ -1,3 +1,4 @@
+import ast
 import json as _json
 import textwrap

@@ -56,14 +57,18 @@ class JSON(Codec):
     def encode(self, buf):
         buf = np.asarray(buf)
         items = buf.tolist()
-        items.append(buf.dtype.str)
+        items.append(str(buf.dtype))
         items.append(buf.shape)
         return self._encoder.encode(items).encode(self._text_encoding)

     def decode(self, buf, out=None):
         items = self._decoder.decode(ensure_text(buf, self._text_encoding))
-        dec = np.empty(items[-1], dtype=items[-2])
-        dec[:] = items[:-2]
+        if "[" in items[-2]:
+            dec = np.empty(items[-1], dtype=ast.literal_eval(items[-2]))
+            dec[:] = [tuple(_) for _ in items[:-2]]
+        else:
+            dec = np.empty(items[-1], dtype=items[-2])
+            dec[:] = items[:-2]
         if out is not None:
             np.copyto(out, dec)
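
For experimenting outside numcodecs, the same idea can be sketched as a pair of standalone functions. This is not the numcodecs implementation: `encode_records` and `decode_records` are hypothetical names, the standard `json` module stands in for the codec's encoder and decoder, and only 1-D arrays are handled.

```python
import ast
import json

import numpy as np


def encode_records(arr):
    """Hypothetical helper: serialize a 1-D array plus its full dtype and shape."""
    arr = np.asarray(arr)
    items = arr.tolist()
    items.append(str(arr.dtype))   # full field list, not the opaque dtype.str
    items.append(arr.shape)
    return json.dumps(items).encode("utf-8")


def decode_records(buf):
    """Hypothetical helper: inverse of encode_records (1-D arrays only)."""
    items = json.loads(buf.decode("utf-8"))
    shape, dtype_text = items[-1], items[-2]
    if "[" in dtype_text:
        # Compound dtype: parse the field list and restore each row as a
        # tuple, since JSON turned the record tuples into lists.
        dec = np.empty(shape, dtype=ast.literal_eval(dtype_text))
        dec[:] = [tuple(row) for row in items[:-2]]
    else:
        dec = np.empty(shape, dtype=dtype_text)
        dec[:] = items[:-2]
    return dec


a = np.array([('aaa', 1, 4.2),
              ('bbb', 2, 8.4),
              ('ccc', 3, 12.6)],
             dtype=[('foo', 'O'), ('bar', 'i4'), ('baz', 'f8')])
out = decode_records(encode_records(a))
print(out.dtype == a.dtype, out.tolist() == a.tolist())
```

As in the patch, the `"[" in dtype_text` check is a heuristic for detecting a compound dtype. Nested or multi-dimensional record arrays would need more careful handling.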

Version and installation information

Please provide the following:

  • numcodecs.__version__ 0.10.0
  • Version of Python interpreter 3.8.8
  • Operating system: Mac
  • How NumCodecs was installed: pip from source
