Many users would find it convenient to point to a dataset on Hugging Face instead of manually providing the sequence and label FASTA files.
For example, the configuration could look like this:
sequence_file: null
hf_dataset:
  path: proteinea/fluorescence
  name: null  # Optional: selects a specific dataset configuration
  sequence_column: primary
  target_column: log_fluorescence
protocol: sequence_to_class
embeddings_file: per_protein_embeddings.h5
model_choice: FNN
optimizer_choice: adam
learning_rate: 0.001
dropout_rate: 0.25
loss_choice: cross_entropy_loss
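As a rough illustration of what the `hf_dataset` option could do internally, here is a minimal sketch that loads a dataset with the Hugging Face `datasets` library and writes the two configured columns out as FASTA text. The function names `load_hf_split` and `records_to_fasta`, the `train` split, and the `>SeqN TARGET=...` header layout are assumptions made for this example, not an existing implementation.

```python
from typing import Any, Iterable, Mapping, Optional


def load_hf_split(path: str, name: Optional[str] = None, split: str = "train"):
    """Download one split of a Hugging Face dataset (needs network access).

    `path` and `name` mirror the `hf_dataset` keys in the config above;
    the default split name "train" is an assumption.
    """
    # Imported lazily so the rest of the module works without `datasets`.
    from datasets import load_dataset

    return load_dataset(path, name, split=split)


def records_to_fasta(
    records: Iterable[Mapping[str, Any]],
    sequence_column: str,
    target_column: str,
) -> str:
    """Render rows as FASTA text, one entry per row.

    The header layout (">Seq<index> TARGET=<value>") is a hypothetical
    choice for this sketch.
    """
    return "".join(
        f">Seq{index} TARGET={row[target_column]}\n{row[sequence_column]}\n"
        for index, row in enumerate(records)
    )
```

With the example configuration, this could be used as `records_to_fasta(load_hf_split("proteinea/fluorescence"), "primary", "log_fluorescence")`. Keeping the download separate from the conversion means the conversion can be tested on plain dictionaries without network access.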
Further resources: