There is a newer version of the record available.

Published October 25, 2024 | Version v1
Software Open

Artifact for "Tidyparse: A Tool for Realtime Syntax Repair"

  • 1. McGill University

Description

This is the experimental artifact for the TACAS '25 submission "Tidyparse: A Tool for Realtime Syntax Repair". To run it, ensure you have Java 21 installed, then download the file tacas25-artifact.jar and run the command:

java -Xmx32g -jar tacas25-artifact.jar 2>&1 | tee repairs.log

After a while, the log should contain the repair instances and a list of aggregate statistics. The parent folder will contain two files named bar_hillel_results_{positive, negative}*.csv, containing the statistics for each repair instance. For the negative examples, this will contain the following columns:

length, lev_dist, samples...

Where length is the length of the broken code snippet, lev_dist is the Levenshtein distance between the broken and fixed code snippets, and samples is the total number of samples drawn before timeout. For the positive examples, this will contain the following columns:

length, lev_dist, sample_ms ... rank

Where the first two columns are the same, sample_ms is the time it took to find the human repair after constructing the language intersection automaton, and rank is the rank of the true repair in the list of all repair suggestions.
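As a rough illustration of how these CSV files might be post-processed, the following Python sketch groups rows by lev_dist and averages one numeric column. It assumes the files are comma-separated and begin with a header row using the column names listed above; neither assumption is confirmed here, and the helper name mean_by_distance and the demo rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

def mean_by_distance(handle, column):
    """Return {lev_dist: mean of `column`} for an open CSV file handle.

    Assumes a header row containing `lev_dist` and `column` (an assumption,
    not something the artifact description guarantees).
    """
    # skipinitialspace tolerates a space after each comma, as in the listing above.
    reader = csv.DictReader(handle, skipinitialspace=True)
    totals = defaultdict(lambda: [0.0, 0])  # lev_dist -> [sum, count]
    for row in reader:
        dist = int(row["lev_dist"])
        totals[dist][0] += float(row[column])
        totals[dist][1] += 1
    return {d: s / n for d, (s, n) in sorted(totals.items())}

# Made-up rows for illustration only; these are not results from the artifact.
demo = io.StringIO(
    "length, lev_dist, samples\n"
    "12, 1, 100\n"
    "30, 1, 300\n"
    "25, 3, 9000\n"
)
print(mean_by_distance(demo, "samples"))  # prints {1: 200.0, 3: 9000.0}
```

For a real run, the handle would come from opening one of the generated bar_hillel_results_*.csv files, with the column argument set to samples or sample_ms as appropriate.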

These statistics will also be aggregated and displayed in a streaming fashion in the terminal and repairs.log file. Next to the individual repair instances, it will periodically display running statistics that look as follows:

Lev(*): Top-1/rec/pos/total: 1 / 1 / 1 / 1, errors: 0, P@1: 1.0, P@All: 1.0
Lev(3): Top-1/rec/pos/total: 1 / 1 / 1 / 1, errors: 0, P@1: 1.0, P@All: 1.0
Draw timings (ms): {1=0.0, 2=0.0, 3=731.0}
Full timings (ms): {1=0.0, 2=0.0, 3=10513.0}
Avg samples drawn: {1=0.0, 2=0.0, 3=9853.0}
  • Top-1 is the number of repair instances where the true repair was sampled and ranked first
  • rec is the number of repair instances where the true repair was sampled, but not ranked first
  • pos is the number of instances where the true repair could have been sampled, but was not
  • total is the total number of repair instances evaluated so far
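To track these running statistics over a long run, the Lev(...) lines can be pulled out of repairs.log with a regular expression. The Python sketch below is based only on the sample lines shown above; the exact log format may vary (for example, the precision fields could print values this pattern does not match), and the function name parse_stat_line and the output keys are hypothetical.

```python
import re

# Pattern modelled on the sample "Lev(...)" lines above; an assumption about
# the general format, not a specification of it.
STAT_LINE = re.compile(
    r"Lev\((?P<dist>\*|\d+)\): Top-1/rec/pos/total: "
    r"(?P<top1>\d+) / (?P<rec>\d+) / (?P<pos>\d+) / (?P<total>\d+), "
    r"errors: (?P<errors>\d+), P@1: (?P<p1>[\d.]+), P@All: (?P<pall>[\d.]+)"
)

def parse_stat_line(line):
    """Return the fields of one running-statistics line, or None if no match."""
    m = STAT_LINE.search(line)
    if m is None:
        return None
    fields = {k: int(m[k]) for k in ("top1", "rec", "pos", "total", "errors")}
    fields["dist"] = m["dist"]          # "*" (all distances) or a digit string
    fields["p_at_1"] = float(m["p1"])
    fields["p_at_all"] = float(m["pall"])
    return fields

sample = "Lev(3): Top-1/rec/pos/total: 1 / 1 / 1 / 1, errors: 0, P@1: 1.0, P@All: 1.0"
print(parse_stat_line(sample))
```

Applying this to each line of repairs.log and keeping the last match per distance gives the most recent snapshot of the running counts.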

It will also contain the following data, which is a more granular summary of the running average precision across all repair instances, broken down by snippet length and edit distance, where |σ| is the length of the broken code snippet and Δ indicates the Levenshtein distance of the true repair.

Precision@1
===========
|σ|∈[0, 10): Top-1/total: 5 / 26 ≈ 0.19230769230769232
|σ|∈[10, 20): Top-1/total: 9 / 31 ≈ 0.2903225806451613
|σ|∈[20, 30): Top-1/total: 9 / 27 ≈ 0.3333333333333333
|σ|∈[30, 40): Top-1/total: 9 / 26 ≈ 0.34615384615384615
|σ|∈[40, 50): Top-1/total: 3 / 5 ≈ 0.6
Δ(1)= Top-1/total: 17 / 32 ≈ 0.53125
Δ(2)= Top-1/total: 10 / 38 ≈ 0.2631578947368421
Δ(3)= Top-1/total: 8 / 45 ≈ 0.17777777777777778
(|σ|∈[0, 10), Δ=1): Top-1/total: 3 / 10 ≈ 0.3
(|σ|∈[0, 10), Δ=2): Top-1/total: 2 / 7 ≈ 0.2857142857142857
(|σ|∈[0, 10), Δ=3): Top-1/total: 0 / 9 ≈ 0.0
(|σ|∈[10, 20), Δ=1): Top-1/total: 4 / 7 ≈ 0.5714285714285714
(|σ|∈[10, 20), Δ=2): Top-1/total: 2 / 12 ≈ 0.16666666666666666
(|σ|∈[10, 20), Δ=3): Top-1/total: 3 / 12 ≈ 0.25
(|σ|∈[20, 30), Δ=1): Top-1/total: 3 / 3 ≈ 1.0
(|σ|∈[20, 30), Δ=2): Top-1/total: 3 / 10 ≈ 0.3
(|σ|∈[20, 30), Δ=3): Top-1/total: 3 / 14 ≈ 0.21428571428571427
(|σ|∈[30, 40), Δ=1): Top-1/total: 5 / 8 ≈ 0.625
(|σ|∈[30, 40), Δ=2): Top-1/total: 3 / 9 ≈ 0.3333333333333333
(|σ|∈[30, 40), Δ=3): Top-1/total: 1 / 9 ≈ 0.1111111111111111
(|σ|∈[40, 50), Δ=1): Top-1/total: 2 / 4 ≈ 0.5
(|σ|∈[40, 50), Δ=3): Top-1/total: 1 / 1 ≈ 1.0
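Each line in this breakdown is a Top-1 count divided by the number of instances in that bucket, for example 5 / 26 ≈ 0.192 for snippets shorter than 10 tokens. The Python sketch below shows one way such a breakdown could be recomputed from per-instance records; it is not the tool's own implementation, and the input format (length, lev_dist, was the true repair ranked first), the bucket width of 10, and the function name precision_at_1 are assumptions made for illustration.

```python
from collections import defaultdict

def precision_at_1(instances, width=10):
    """Tally [top-1 count, total] per length bucket and per edit distance.

    `instances` is an iterable of (length, lev_dist, is_top1) tuples,
    a hypothetical record format chosen for this sketch.
    """
    by_len = defaultdict(lambda: [0, 0])
    by_dist = defaultdict(lambda: [0, 0])
    for length, dist, is_top1 in instances:
        lo = (length // width) * width  # lower bound of the half-open bucket [lo, lo + width)
        for table, key in ((by_len, (lo, lo + width)), (by_dist, dist)):
            table[key][0] += int(is_top1)
            table[key][1] += 1
    return dict(by_len), dict(by_dist)

# Made-up instances for illustration only.
by_len, by_dist = precision_at_1([(5, 1, True), (7, 2, False), (12, 1, True)])
for (lo, hi), (top1, total) in sorted(by_len.items()):
    print(f"|σ|∈[{lo}, {hi}): Top-1/total: {top1} / {total} ≈ {top1 / total}")
for dist, (top1, total) in sorted(by_dist.items()):
    print(f"Δ({dist})= Top-1/total: {top1} / {total} ≈ {top1 / total}")
```

The joint (|σ|, Δ) rows could be produced the same way by keying a third table on the pair of bucket and distance.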

Running the full set of experiments can take several hours depending on the machine.

Files

Files (370.5 MB)

Size: 370.5 MB
md5:71955bdec1d1dfc0b3a14d8c6fd0644d

Additional details

Software

Repository URL
http://github.com/tidyparse/tidyparse
Programming language
Kotlin
Development Status
Active