We provide a dataset of equations represented as PNG images and LaTeX code. This dataset is useful for learning to detect similarities between equations. Check it out HERE!
You have to build the datasets before you can train or evaluate the equation-encoder.
To build the training dataset, run:
python formula_data.py train path/to/weak_data_train
To build the evaluation dataset (Gold-Label Evaluation Data), run:
python formula_data.py eval path/to/eval2
To build the evaluation dataset (Hold-Out Data), run:
python formula_data.py test path/to/weak_data_test
In order to pretrain the equation-encoder, run:
python pretrain_experiment.py with dataset=task data_source=path/to/weak_data_train
Here, task should be either abstract or symbols, depending on which pretraining task you want to run.
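The `with key=value` syntax resembles the command-line interface of the Sacred experiment framework, where each pair after `with` overrides one configuration entry. Assuming that is what these scripts use, the stdlib-only sketch below mimics the merge step for illustration. The defaults and the `parse_overrides` function are hypothetical, not the project's code, and unlike Sacred (which parses values as Python literals) it keeps every value as a string.

```python
from typing import Dict, List

# Hypothetical defaults; the real configuration lives in pretrain_experiment.py.
DEFAULTS: Dict[str, str] = {"dataset": "abstract", "data_source": "path/to/weak_data_train"}
# The two pretraining tasks named in this README.
VALID_TASKS = {"abstract", "symbols"}


def parse_overrides(argv: List[str], defaults: Dict[str, str]) -> Dict[str, str]:
    """Merge `with key=value ...` pairs from argv into a copy of defaults."""
    config = dict(defaults)
    # Everything after the `with` keyword is treated as a config update.
    start = argv.index("with") + 1 if "with" in argv else len(argv)
    for pair in argv[start:]:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        config[key] = value
    # Reject anything other than the two documented pretraining tasks.
    if config.get("dataset") not in VALID_TASKS:
        raise ValueError(f"dataset must be one of {sorted(VALID_TASKS)}")
    return config


if __name__ == "__main__":
    argv = ["with", "dataset=symbols", "data_source=data/weak_data_train"]
    print(parse_overrides(argv, DEFAULTS))
```

Only the keys you name are replaced; anything left out keeps its default value.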
In order to train the equation-encoder, run:
python equen_experiment.py
If you want to use weights from pretraining, run something like:
python equen_experiment.py with pretrained_weights=path/to/weights
Here, path/to/weights should be something like equen_runs/x, where x is the number of the respective training run.
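If the directories under equen_runs are named by consecutive integers (as Sacred's file-storage observer does, alongside a non-numeric `_sources` folder), a small helper can locate the most recent run to pass as pretrained_weights or run. This is a sketch under that assumption; `latest_run` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Optional


def latest_run(root: str = "equen_runs") -> Optional[Path]:
    """Return the numbered run directory with the highest number, or None."""
    base = Path(root)
    if not base.is_dir():
        return None
    # Keep only purely numeric directory names (skips e.g. `_sources`).
    runs = [p for p in base.iterdir() if p.is_dir() and p.name.isdecimal()]
    # Compare numerically so that run 10 sorts after run 9.
    return max(runs, key=lambda p: int(p.name), default=None)


if __name__ == "__main__":
    run = latest_run()
    if run is None:
        print("no runs found")
    else:
        print(f"python equen_experiment.py with pretrained_weights={run}")
```

Sorting numerically rather than by string matters once there are ten or more runs, since "10" < "9" as strings.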
In order to evaluate the weights from all epochs of a training run, run:
python evaluation.py with run=path/to/run
Here, path/to/run should be something like equen_runs/x, where x is the number of the respective training run.
If you want to evaluate on the Hold-Out data instead of the Gold-Label data, run:
python evaluation.py with run=path/to/run dataset=test
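The steps above can be collected into a single script. This is a sketch, assuming the placeholder paths used in this README and run directories you supply yourself (one from pretraining, one from training); `workflow_commands` and `run_all` are hypothetical helpers. By default it only prints the commands (a dry run); pass execute=True to actually run them in order.

```python
import shlex
import subprocess
import sys
from typing import List


def workflow_commands(task: str, pretrained: str, trained: str) -> List[List[str]]:
    """Build argv lists for every step in this README, in order."""
    if task not in {"abstract", "symbols"}:
        raise ValueError("task must be 'abstract' or 'symbols'")
    py = sys.executable  # the interpreter running this script
    # Replace the placeholder paths with your own directories.
    return [
        [py, "formula_data.py", "train", "path/to/weak_data_train"],
        [py, "formula_data.py", "eval", "path/to/eval2"],
        [py, "formula_data.py", "test", "path/to/weak_data_test"],
        [py, "pretrain_experiment.py", "with", f"dataset={task}",
         "data_source=path/to/weak_data_train"],
        [py, "equen_experiment.py", "with", f"pretrained_weights={pretrained}"],
        [py, "evaluation.py", "with", f"run={trained}"],                  # Gold-Label data
        [py, "evaluation.py", "with", f"run={trained}", "dataset=test"],  # Hold-Out data
    ]


def run_all(commands: List[List[str]], execute: bool = False) -> None:
    """Print each command; run it only when execute is True."""
    for cmd in commands:
        print(shlex.join(cmd))
        if execute:
            # check=True stops the pipeline as soon as one step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(workflow_commands("abstract", "equen_runs/1", "equen_runs/2"))
```

In practice the training run number is only known after training finishes, so you may prefer to run the steps one at a time.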