Creating queries from pdb files with underscore in the filename gives unexpected query ids #411

@DaniBodor

If the part of a filename before the first underscore has already been seen, the query is flagged as having an identical filename: everything after the underscore is discarded and replaced by a running index.

For example, in the tests/data/hdf5/_generate_testdata.ipynb notebook, the "Generating 1ATN_ppi.hdf5" cell should add these files:

pdb_paths = [
    str(PATH_TEST / "data/pdb/1ATN/1ATN_1w.pdb"),
    str(PATH_TEST / "data/pdb/1ATN/1ATN_2w.pdb"),
    str(PATH_TEST / "data/pdb/1ATN/1ATN_3w.pdb"),
    str(PATH_TEST / "data/pdb/1ATN/1ATN_4w.pdb")]

but then gives the following:

Query with ID residue-ppi:A-B:1ATN has already been added to the collection. Renaming it as residue-ppi:A-B:1ATN_2
Query with ID residue-ppi:A-B:1ATN has already been added to the collection. Renaming it as residue-ppi:A-B:1ATN_3
Query with ID residue-ppi:A-B:1ATN has already been added to the collection. Renaming it as residue-ppi:A-B:1ATN_4

This is likely because the add function in query.py does not account for underscores that are already part of a filename and assumes that any underscore comes from its own index numbering:

query_id_base = query_id.split("_")[0]
if query_id_base not in self.ids_count:
    self.ids_count[query_id_base] = 1
else:
    self.ids_count[query_id_base] += 1
    new_id = query.model_id.split("_")[0] + "_" + str(self.ids_count[query_id_base])
    query.model_id = new_id
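
One possible direction for a fix, sketched in isolation: track the full id instead of the part before the first underscore, and only append an index when that exact id is already taken. The `IdRegistry` class and its `unique_id` method are hypothetical names used for illustration; they are not part of query.py, and the real collection may need to integrate this differently.

```python
class IdRegistry:
    """Hypothetical stand-in for the id bookkeeping done when adding queries."""

    def __init__(self):
        # Every id that has been handed out so far.
        self.used = set()

    def unique_id(self, model_id: str) -> str:
        """Return model_id unchanged if it is free, otherwise append _2, _3, ..."""
        if model_id not in self.used:
            self.used.add(model_id)
            return model_id
        # The exact id is taken: look for the first free numbered variant.
        n = 2
        while f"{model_id}_{n}" in self.used:
            n += 1
        new_id = f"{model_id}_{n}"
        self.used.add(new_id)
        return new_id


registry = IdRegistry()
# Underscores that belong to the filename are left alone.
distinct = [registry.unique_id(m) for m in ["1ATN_1w", "1ATN_2w", "1ATN_3w", "1ATN_4w"]]
# Only a true repeat of the same id gets an index.
repeat = registry.unique_id("1ATN_1w")
```

With this approach the four files from the notebook keep their own ids, and a suffix is added only when the same id is added a second time.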

Labels: Query (query module related issues), stale (issue not touched from too much time)

Status: Done