
Joblib.hash fails on numpy structured array with overlapping fields #826

@adrinjalali


I encountered this issue while working on scikit-learn/scikit-learn#12866.

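The problem is that joblib.hash fails on numpy arrays with a structured dtype whose fields have overlapping offsets. Hashing the dtype goes through dtype.descr, which numpy does not define for types with overlapping or out-of-order fields. Here's a minimal example: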

>>> import numpy as np
>>> import joblib
>>> 
>>> d = np.dtype({'names': ['x', 'y'],
...               'formats': [np.int64, np.float64],
...               'offsets': [0, 0]})
>>> 
>>> a = np.array([1], dtype=d)
>>> joblib.hash(a)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File ".../joblib/hashing.py", line 263, in hash
    return hasher.hash(obj)
  File ".../joblib/hashing.py", line 69, in hash
    self.dump(obj)
  File "/usr/lib64/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File ".../joblib/hashing.py", line 243, in save
    Hasher.save(self, obj)
  File ".../joblib/hashing.py", line 95, in save
    Pickler.save(self, obj)
  File "/usr/lib64/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib64/python3.7/pickle.py", line 771, in save_tuple
    save(element)
  File ".../joblib/hashing.py", line 243, in save
    Hasher.save(self, obj)
  File ".../joblib/hashing.py", line 95, in save
    Pickler.save(self, obj)
  File "/usr/lib64/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib64/python3.7/pickle.py", line 786, in save_tuple
    save(element)
  File ".../joblib/hashing.py", line 242, in save
    obj = (klass, ('HASHED', obj.descr))
  File ".../numpy/core/_internal.py", line 115, in _array_descr
    "dtype.descr is not defined for types with overlapping or "
ValueError: dtype.descr is not defined for types with overlapping or out-of-order fields
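The traceback shows the hasher building its key from obj.descr (hashing.py, line 242), which numpy refuses to compute for overlapping or out-of-order fields. One possible workaround is to fall back to pickling the dtype when descr is unavailable. The sketch below assumes that pickling a dtype with a fixed protocol yields a stable byte string. dtype_fingerprint is a hypothetical helper, not part of joblib's API, and this is not necessarily how joblib itself resolves the issue.

```python
import hashlib
import pickle

import numpy as np


def dtype_fingerprint(dtype):
    """Hypothetical helper (not joblib API): return an md5 hex digest of a dtype.

    Tries dtype.descr first, mirroring the code path in the traceback. For
    structured dtypes with overlapping or out-of-order fields, descr raises
    ValueError, so the dtype is pickled instead (assumed to be stable).
    """
    try:
        # Well-formed dtypes: descr is a list of (name, format) tuples.
        payload = b"DESCR" + repr(dtype.descr).encode("utf-8")
    except ValueError:
        # descr is undefined here; pickle with a fixed protocol so the
        # digest does not depend on the interpreter's default protocol.
        payload = b"PICKLE" + pickle.dumps(dtype, protocol=2)
    return hashlib.md5(payload).hexdigest()


overlapping = np.dtype({'names': ['x', 'y'],
                        'formats': [np.int64, np.float64],
                        'offsets': [0, 0]})
print(dtype_fingerprint(overlapping))  # no ValueError on this dtype
```

The b"DESCR" and b"PICKLE" prefixes keep the two encodings from ever producing the same payload, so a dtype hashed through one path cannot collide with a dtype hashed through the other.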
