Labels: bug (Something isn't working)
Describe the bug
When building the HDF5 file of the graph database with the exposure feature component, using PDB files from the Propedia database (and from the ProtCID database as well), I get the following error:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/root/deeprankcore/deeprank-core/deeprankcore/query.py", line 245, in _process_one_query
graph.write_to_hdf5(output_path)
File "/root/deeprankcore/deeprank-core/deeprankcore/utils/graph.py", line 218, in write_to_hdf5
node_features_group.create_dataset(
File "/usr/local/lib/python3.9/site-packages/h5py/_hl/group.py", line 183, in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
File "/usr/local/lib/python3.9/site-packages/h5py/_hl/dataset.py", line 86, in make_new_dset
tid = h5t.py_create(dtype, logical=1)
File "h5py/h5t.pyx", line 1664, in h5py.h5t.py_create
File "h5py/h5t.pyx", line 1688, in h5py.h5t.py_create
File "h5py/h5t.pyx", line 1748, in h5py.h5t.py_create
TypeError: Object dtype dtype('O') has no native HDF5 equivalent
"""
The above exception was the direct cause of the following exception:
```
Traceback (most recent call last):
  File "/mnt/netcache/data/dlepikhov/propedia_ssl/script/build_propedia.py", line 44, in <module>
    h5_p = queries.process(
  File "/root/deeprankcore/deeprank-core/deeprankcore/query.py", line 329, in process
    pool.map(pool_function, self.queries)
  File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
TypeError: Object dtype dtype('O') has no native HDF5 equivalent
```
This happens only when the exposure feature component is included. At first I thought the H atoms were the problem, but removing them from the PDB files didn't help.
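For context, `dtype('O')` in the message means NumPy fell back to an object array, which h5py cannot map to a native HDF5 type. A minimal sketch of one way this can happen with per-residue feature values (the `None` entry here is a hypothetical missing exposure value, not taken from the actual data):

```python
import numpy as np

# A feature list where one residue has no value: NumPy falls back to
# dtype('O') (object), which h5py cannot map to a native HDF5 type.
values = [0.25, None, 0.75]
arr = np.asarray(values)
print(arr.dtype)  # object

# The same list with the gap filled as NaN stores fine as a float array.
filled = np.asarray([v if v is not None else np.nan for v in values])
print(filled.dtype)  # float64
```

If the exposure module produces a `None` (or a ragged per-residue list) for even one node, the whole feature array becomes object-typed and `create_dataset` fails exactly as in the traceback above.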
Environment:
- OS: Ubuntu
- Version:
- Branch commit ID:
- Inputs:
To Reproduce
Steps/commands/screenshots to reproduce the behaviour:
Run the following script:
```python
import sys
import os
sys.path.append(os.path.abspath("."))

from deeprankcore.features import torsion_angle, components, contact, exposure
from deeprankcore.query import QueryCollection, ProteinProteinInterfaceResidueQuery
from deeprankcore.dataset import GraphDataset
import pickle
import argparse
import glob

arg_parser = argparse.ArgumentParser(description="""
    Script used to build the features using the deeprankcore package.
""")
arg_parser.add_argument("--h5out",
    help="Path where the HDF5 features will be saved."
)
arg_parser.add_argument("--pdb",
    help="Glob string to look for PDB files used to generate features."
)
arg_parser.add_argument("--nworkers",
    help="""
        Providing this argument will set a specific number of CPUs used to process the query.
        By default, all CPUs are used.
    """,
    default=None,
    type=int
)
a = arg_parser.parse_args()

pdb_paths = glob.glob(a.pdb)
queries = QueryCollection()
chain_ids = [p.split("/")[-1].replace(".pdb", "").split("_")[-2:] for p in pdb_paths]
print(f"Number of cases: {len(pdb_paths)}")

for i, p in enumerate(pdb_paths):
    queries.add(ProteinProteinInterfaceResidueQuery(
        pdb_path=p,
        chain_id1=chain_ids[i][0],
        chain_id2=chain_ids[i][1],
    ))

h5_p = queries.process(
    a.h5out,
    cpu_count=a.nworkers,
    feature_modules=[
        components,
        torsion_angle,
        contact,
        exposure,
    ]
)
```
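To narrow down which feature is responsible before re-running the full pipeline, a small diagnostic like the one below may help. It is a hypothetical helper, not part of the deeprankcore API, and the feature names in `example` are made up for illustration:

```python
import numpy as np

def find_unstorable_features(features: dict) -> list:
    """Return names of feature value lists that would trip h5py's
    'Object dtype has no native HDF5 equivalent' TypeError."""
    bad = []
    for name, values in features.items():
        try:
            arr = np.asarray(values)
        except ValueError:
            # ragged shapes cannot even form a regular array
            bad.append(name)
            continue
        if arr.dtype == object:
            bad.append(name)
    return bad

# made-up feature names and values, for illustration only
example = {
    "res_size": [1.0, 2.0, 3.0],   # fine: stored as float64
    "hse": [0.5, None, 0.7],       # None entry -> dtype('O')
}
print(find_unstorable_features(example))  # ['hse']
```

Running such a check on the node features of one failing graph (before `write_to_hdf5` is called) should point directly at the offending exposure array.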
Expected Results
Normally, processing produces a single concatenated HDF5 file.