Improve Trainer and DeeprankDataset for production testing #510

@gcroci2

There are some issues when using the package for testing a pre-trained model on newly generated data:

  • The GraphDataset class requires dataset_train as input even in such cases (whenever train is False). We should be able to use a test dataset without needing the original model's training dataset. We can use the information stored in the pre-trained model to inherit the needed attributes (see _check_inherited_params in dataset.py).
  • In the Trainer class's init, there is a check for the target before the parameters and the pre-trained model are loaded; in the pre-trained model case the target may not be present at all.
  • The Trainer class expects the attribute epoch_saved_model, which should be saved within the state of the pre-trained model.
  • If the test dataset has no labels, the output exporter doesn't work (ValueError("All arrays must be of the same length"))
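To make two of the points above concrete, here is a minimal sketch in plain Python. It is not the package's code: resolve_inherited_params and build_output_columns are hypothetical helpers, and the dictionary keys are made up for illustration. The first falls back to the pre-trained model's saved state when no training dataset is given; the second pads the target column with None when the test data has no labels, so every column has the same length before it is exported.

```python
from typing import Any, Dict, List, Optional, Sequence


def resolve_inherited_params(
    names: Sequence[str],
    dataset_train_params: Optional[Dict[str, Any]] = None,
    pretrained_state: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: pick the attributes a test dataset must inherit.

    Prefers the training dataset's parameters and falls back to the
    parameters stored with the pre-trained model.
    """
    if dataset_train_params is not None:
        source = dataset_train_params
    elif pretrained_state is not None:
        source = pretrained_state
    else:
        raise ValueError(
            "Provide either a training dataset or a pre-trained model state."
        )
    missing = [name for name in names if name not in source]
    if missing:
        raise KeyError(f"Missing inherited parameters: {missing}")
    return {name: source[name] for name in names}


def build_output_columns(
    entries: Sequence[str],
    outputs: Sequence[float],
    targets: Optional[Sequence[Any]] = None,
) -> Dict[str, List[Any]]:
    """Hypothetical helper: build equal-length columns for an output table.

    When no labels are available, the target column is filled with None
    instead of being left empty.
    """
    if len(entries) != len(outputs):
        raise ValueError("entries and outputs must have the same length")
    if targets is None or len(targets) == 0:
        targets = [None] * len(entries)  # unlabeled test data
    elif len(targets) != len(entries):
        raise ValueError("targets must be empty or match entries in length")
    return {
        "entry": list(entries),
        "output": list(outputs),
        "target": list(targets),
    }


# Test-only use: no training dataset, attributes come from the saved state.
params = resolve_inherited_params(
    ["features"], pretrained_state={"features": ["a", "b"], "target": None}
)
columns = build_output_columns(["e1", "e2"], [0.1, 0.9])
```

The ValueError message quoted above matches the one pandas raises when a DataFrame is built from a dict of arrays with different lengths, which is why the sketch pads the target column rather than leaving it empty.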

In order to make reasonable changes, I think we need to take into account all the possible scenarios using a mock example:

  • No pre-trained model, train, valid, and test. (should be good)
  • No pre-trained model, train, valid, no test. (should be good)
  • No pre-trained model, train only. (should be good)
  • Pre-trained model, test only, with labels. (the one to improve the code for)
  • Pre-trained model, test only, with no labels. (the one to improve the code for)
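The scenarios above could be written down as data, so a mock example can loop over all of them. This is only a sketch: the Scenario class, its field names, and needs_work are hypothetical, and the labels flag for the first scenario is an assumption since the list does not state it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scenario:
    """One combination of inputs from the list above (hypothetical names)."""

    pretrained: bool
    train: bool
    valid: bool
    test: bool
    test_labels: bool  # only meaningful when test is True


SCENARIOS: List[Scenario] = [
    # Assumption: the test set carries labels in the first scenario.
    Scenario(pretrained=False, train=True, valid=True, test=True, test_labels=True),
    Scenario(pretrained=False, train=True, valid=True, test=False, test_labels=False),
    Scenario(pretrained=False, train=True, valid=False, test=False, test_labels=False),
    Scenario(pretrained=True, train=False, valid=False, test=True, test_labels=True),
    Scenario(pretrained=True, train=False, valid=False, test=True, test_labels=False),
]


def needs_work(scenario: Scenario) -> bool:
    """True for the test-only cases flagged above as needing code changes."""
    return scenario.pretrained and scenario.test and not scenario.train


to_improve = [s for s in SCENARIOS if needs_work(s)]
```

A list like this could be passed to pytest.mark.parametrize so that each scenario becomes its own test case.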

Labels: stale (issue not touched for too long)

Status: Done