Add integration with huggingface dataset API #118

@SebieF

Many users would find it convenient to point directly to a dataset on the Hugging Face Hub instead of manually providing the sequence and label FASTA files.

For example, the configuration could look like this:

sequence_file: null
hf_dataset:
  path: proteinea/fluorescence
  name: null  # Optional name can specify a dataset configuration
  sequence_column: primary
  target_column: log_fluorescence
protocol: sequence_to_class
embeddings_file: per_protein_embeddings.h5
model_choice: FNN
optimizer_choice: adam
learning_rate: 0.001
dropout_rate: 0.25
loss_choice: cross_entropy_loss
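
To make the proposal more concrete, here is a minimal Python sketch of how the `hf_dataset` block above might be resolved into FASTA text. It is only an illustration, not an existing implementation: the function names `rows_to_fasta` and `hf_dataset_to_fasta` are hypothetical, the `>SeqN TARGET=... SET=...` header layout is an assumption about what the downstream FASTA reader expects, and the `train` split name is assumed to exist in the dataset. The only external call used is `datasets.load_dataset(path, name, split=...)`.

```python
from typing import Any, Dict, Iterable, Optional


def rows_to_fasta(
    rows: Iterable[Dict[str, Any]],
    sequence_column: str,
    target_column: str,
    split_name: str = "train",
) -> str:
    """Convert dataset rows into FASTA text.

    Assumption: targets and set membership are stored in the header as
    `TARGET=<value> SET=<split>`; adjust this to the format actually required.
    """
    records = []
    for index, row in enumerate(rows):
        # Hypothetical ID scheme: sequences are numbered in dataset order.
        header = f">Seq{index} TARGET={row[target_column]} SET={split_name}"
        records.append(f"{header}\n{row[sequence_column]}\n")
    return "".join(records)


def hf_dataset_to_fasta(hf_config: Dict[str, Optional[str]], split: str = "train") -> str:
    """Download one split from the Hugging Face Hub and convert it to FASTA text.

    `hf_config` mirrors the proposed `hf_dataset` block:
    path, name (optional), sequence_column, target_column.
    """
    # Imported lazily so that `datasets` is only required when this option is used.
    from datasets import load_dataset

    dataset = load_dataset(hf_config["path"], hf_config.get("name"), split=split)
    # Iterating over a split yields one dict per example.
    return rows_to_fasta(
        dataset,
        hf_config["sequence_column"],
        hf_config["target_column"],
        split_name=split,
    )
```

Keeping the conversion separate from the download means the FASTA-writing logic can be tested on in-memory rows without network access, and the rest of the pipeline could keep consuming files as it does today.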

Further resources:
