HDF5 file not generated when exposure feature module is used on PDB files from Propedia database #463

@DanLep97

Description

Describe the bug
When building the HDF5 file of the graph database with the exposure feature module on PDB files from the Propedia database (and from the ProtCID database as well), I get the following error:

"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/root/deeprankcore/deeprank-core/deeprankcore/query.py", line 245, in _process_one_query
    graph.write_to_hdf5(output_path)
  File "/root/deeprankcore/deeprank-core/deeprankcore/utils/graph.py", line 218, in write_to_hdf5
    node_features_group.create_dataset(
  File "/usr/local/lib/python3.9/site-packages/h5py/_hl/group.py", line 183, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/usr/local/lib/python3.9/site-packages/h5py/_hl/dataset.py", line 86, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5py/h5t.pyx", line 1664, in h5py.h5t.py_create
  File "h5py/h5t.pyx", line 1688, in h5py.h5t.py_create
  File "h5py/h5t.pyx", line 1748, in h5py.h5t.py_create
TypeError: Object dtype dtype('O') has no native HDF5 equivalent
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/netcache/data/dlepikhov/propedia_ssl/script/build_propedia.py", line 44, in <module>
    h5_p = queries.process(
  File "/root/deeprankcore/deeprank-core/deeprankcore/query.py", line 329, in process
    pool.map(pool_function, self.queries)
  File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
TypeError: Object dtype dtype('O') has no native HDF5 equivalent

This happens only when the exposure feature module is used. At first I thought hydrogen atoms were the problem, but removing them from the PDB files didn't help.
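For context, h5py raises this `TypeError` whenever the array passed to `create_dataset` has NumPy's generic object dtype (`dtype('O')`), which has no HDF5 counterpart. A common way to end up with such an array is a feature list that mixes numbers with `None`. The sketch below rests on an assumption, not a confirmed diagnosis: it supposes that the exposure module leaves some per-residue values as `None`. It shows how that yields an object array and how a hypothetical helper, `to_float_features` (not part of deeprankcore), could replace missing entries with NaN so the array becomes `float64`.

```python
import numpy as np


def to_float_features(rows, width):
    """Hypothetical helper: turn per-node feature rows into a float64 array.

    `rows` has one entry per node; each entry is either a sequence of
    `width` numbers or None. None rows and None values become NaN.
    """
    cleaned = []
    for row in rows:
        if row is None:
            # Missing row: fill every column with NaN.
            cleaned.append([np.nan] * width)
        else:
            # Missing individual values: replace only those entries.
            cleaned.append([np.nan if v is None else float(v) for v in row])
    return np.asarray(cleaned, dtype=np.float64)


# A nested list mixing numbers and None becomes an object array in NumPy,
# which is the dtype('O') that h5py rejects.
mixed = np.array([[12.0, 3.0, 0.5], [10.0, None, 0.4]])
print(mixed.dtype)  # object

fixed = to_float_features([[12.0, 3.0, 0.5], [10.0, None, 0.4], None], width=3)
print(fixed.dtype, fixed.shape)  # float64 (3, 3)
```

Whether NaN is a sensible fill value for downstream training is a separate choice. Zero or a per-feature mean are common alternatives.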

Environment:

  • OS: Ubuntu
  • Version:
  • Branch commit ID:
  • Inputs:

To Reproduce
Steps/commands/screenshots to reproduce the behaviour:

Run the following script:

import sys
import os
sys.path.append(os.path.abspath("."))
from deeprankcore.features import torsion_angle, components, contact, exposure
from deeprankcore.query import QueryCollection, ProteinProteinInterfaceResidueQuery
from deeprankcore.dataset import GraphDataset
import pickle
import argparse
import glob

arg_parser = argparse.ArgumentParser(description="""
    Script used to build the features using deeprankcore package.
""")
arg_parser.add_argument("--h5out",
    help="Path where the HDF5 features will be saved."
)
arg_parser.add_argument("--pdb",
    help="glob string to look for pdb files used to generate features."
)
arg_parser.add_argument("--nworkers",
    help="""
    Providing this argument will set a specific number of cpus used to process the query.
    By default, all cpus are used.
    """,
    default=None,
    type=int
)
a = arg_parser.parse_args()

pdb_paths = glob.glob(a.pdb)

queries = QueryCollection()

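# Chain IDs are taken from the last two underscore-separated fields of the
# file name, e.g. ".../1abc_A_B.pdb" -> ["A", "B"].
chain_ids = [p.split("/")[-1].replace(".pdb", "").split("_")[-2:] for p in pdb_paths]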
print(f"Number of cases: {len(pdb_paths)}")

for i, p in enumerate(pdb_paths):
    queries.add(ProteinProteinInterfaceResidueQuery(
        pdb_path = p,
        chain_id1 = chain_ids[i][0],
        chain_id2 = chain_ids[i][1],
    ))

h5_p = queries.process(
    a.h5out,
    cpu_count = a.nworkers,
    feature_modules = [
        components,
        torsion_angle,
        contact,
        exposure
    ]
)
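To narrow down which feature triggers the error, a generic check like the following could be run on the feature values before they reach `write_to_hdf5`. It deliberately avoids deeprankcore's API: `find_object_features` is a hypothetical helper that takes a plain dict mapping feature names to per-node values (extracting that dict from a deeprankcore graph is left open), and the feature names in the example are illustrative only. It reports the names whose values NumPy cannot turn into a numeric array.

```python
import numpy as np


def find_object_features(features):
    """Hypothetical helper: return the feature names whose values would not
    form a numeric NumPy array (and would therefore fail in h5py)."""
    bad = []
    for name, values in features.items():
        try:
            arr = np.asarray(values)
        except ValueError:
            # Ragged nested lists raise ValueError in recent NumPy versions
            # (older versions return an object array instead).
            bad.append(name)
            continue
        if arr.dtype == object:
            # Mixed numbers and None end up as dtype('O').
            bad.append(name)
    return sorted(bad)


example = {
    "res_depth": [3.2, 4.1, 2.8],
    "hse": [[12, 3, 0.5], [10, None, 0.4], [9, 4, 0.6]],
    "bsa": [[1.0, 2.0], [3.0]],  # ragged rows
}
print(find_object_features(example))  # ['bsa', 'hse']
```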

Expected Results
Normally (without the exposure module) I get a single concatenated HDF5 file.

Metadata

Labels

bug (Something isn't working)

Status

Done