🚀 Feature request
It seems that run_glue.py supports local CSV/JSON files for the training and validation sets, but it has no argument for a local test set.
transformers/examples/text-classification/run_glue.py, lines 90 to 95 in 7a9f1b5:

    train_file: Optional[str] = field(
        default=None, metadata={"help": "A csv or a json file containing the training data."}
    )
    validation_file: Optional[str] = field(
        default=None, metadata={"help": "A csv or a json file containing the validation data."}
    )

I think the script is intended to cover testing as well as training and validation, since the test sets of the GLUE tasks are downloaded as shown in https://huggingface.co/docs/datasets/loading_datasets.html#selecting-a-configuration.
It also has the --do_predict option for the test sets.
If there is no particular reason for leaving out the ability to read a test set from a local dataset, would it be OK for me to add the feature?
Or is there some intention behind the current implementation?
Motivation
I'd like to train, validate, and test my own local dataset.
Your contribution
I think modifications like the following would add the feature.

    test_file: Optional[str] = field(
        default=None, metadata={"help": "A csv or a json file containing the test data."}
    )

    datasets = load_dataset(
        "csv", data_files={"train": data_args.train_file, "validation": data_args.validation_file, "test": data_args.test_file}
    )

    # if data_args.task_name is not None:
    #     test_dataset = datasets["test_matched" if data_args.task_name == "mnli" else "test"]
    test_dataset = datasets["test_matched" if data_args.task_name == "mnli" else "test"]

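One detail the snippet above leaves open is what happens when no test file is given. As a rough sketch only (not the script's actual code), the data_files mapping could include the test split just when a test file is supplied, and the loader name could follow the file extension instead of being fixed to "csv". The names build_data_files and load_local_splits are hypothetical helpers made up for this illustration. I am also assuming that all splits share one format.

```python
from typing import Dict, Optional, Tuple


def build_data_files(
    train_file: str,
    validation_file: str,
    test_file: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return the loader name ("csv" or "json") and the split-to-path mapping.

    Hypothetical helper for illustration; it is not part of run_glue.py.
    """
    data_files = {"train": train_file, "validation": validation_file}
    # Only register a test split when a file was actually supplied.
    if test_file is not None:
        data_files["test"] = test_file

    # Assumption: every split uses the same format, since one loader reads them all.
    extensions = {path.rsplit(".", 1)[-1].lower() for path in data_files.values()}
    if len(extensions) != 1:
        raise ValueError(f"All files must share one extension, got {sorted(extensions)}")
    extension = extensions.pop()
    if extension not in ("csv", "json"):
        raise ValueError(f"Expected a csv or json file, got '.{extension}'")
    return extension, data_files


def load_local_splits(train_file: str, validation_file: str, test_file: Optional[str] = None):
    """Load the local splits with the `datasets` library (hypothetical wrapper)."""
    # Imported lazily so build_data_files can be used without `datasets` installed.
    from datasets import load_dataset

    extension, data_files = build_data_files(train_file, validation_file, test_file)
    return load_dataset(extension, data_files=data_files)
```

With this shape, a run without a test file would simply have no "test" split, so the prediction step would still need a check that the split exists before indexing into it.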
Thank you in advance.