Artifact for "Tidyparse: A Tool for Realtime Syntax Repair"
Description
This is the experimental artifact for the TACAS '25 submission "Tidyparse: A Tool for Realtime Syntax Repair". To run it, ensure that Java 21 is installed, download the file tacas25-experiments.jar, and run the following command, substituting the actual name of the downloaded JAR file if it differs:
java -Xmx32g -jar tacas25-artifact.jar 2>&1 | tee repairs.log
After a while, the log should contain the repair instances and a list of aggregate statistics. The parent folder will contain two files named bar_hillel_results_{positive, negative}*.csv, which hold the statistics for each repair instance. For the negative examples, the file will contain the following columns:
length, lev_dist, samples...
Here, length is the length of the broken code snippet, lev_dist is the Levenshtein distance between the broken and fixed code snippets, and samples is the total number of samples drawn before the timeout. For the positive examples, the file will contain the following columns:
length, lev_dist, sample_ms ... rank
Here, the first two columns are the same as above, sample_ms is the time it took to find the human repair after constructing the language intersection automaton, and rank is the rank of the true repair in the list of all repair suggestions.
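As a rough illustration of how these files could be inspected, the following Python sketch averages one numeric column per Levenshtein distance. It assumes, without confirmation from the artifact, that the CSV files are comma-separated and begin with a header row containing the column names listed above; the function name `mean_by_lev_dist` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_by_lev_dist(csv_text, column):
    """Average a numeric column for each Levenshtein distance.

    Assumption: the text is comma-separated and its first row is a
    header that contains 'lev_dist' and the requested column name.
    """
    # skipinitialspace tolerates headers written as "length, lev_dist, ..."
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    groups = defaultdict(list)
    for row in reader:
        groups[int(row["lev_dist"])].append(float(row[column]))
    return {dist: mean(values) for dist, values in sorted(groups.items())}


# Made-up rows for illustration only; these are not results from the artifact.
example = """length, lev_dist, sample_ms, rank
12, 1, 40, 0
18, 1, 60, 2
25, 3, 700, 5
"""

print(mean_by_lev_dist(example, "sample_ms"))  # {1: 50.0, 3: 700.0}
```

To apply this to real output, the text of a file matching bar_hillel_results_positive*.csv could be read and passed in place of `example`. The same function should also work on the negative file with the `samples` column, provided the header assumption holds.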
These statistics will also be aggregated and displayed in a streaming fashion in the terminal and in the repairs.log file. Next to the individual repair instances, the log will periodically display running statistics that look as follows:
Lev(*): Top-1/rec/pos/total: 1 / 1 / 1 / 1, errors: 0, P@1: 1.0, P@All: 1.0
Lev(3): Top-1/rec/pos/total: 1 / 1 / 1 / 1, errors: 0, P@1: 1.0, P@All: 1.0
Draw timings (ms): {1=0.0, 2=0.0, 3=731.0}
Full timings (ms): {1=0.0, 2=0.0, 3=10513.0}
Avg samples drawn: {1=0.0, 2=0.0, 3=9853.0}
In the first two lines:
- Top-1 is the number of repair instances where the true repair was sampled and ranked first
- rec is the number of repair instances where the true repair was sampled, but not ranked first
- pos is the number of instances where the true repair could have been sampled, but was not
- total is the total number of repair instances evaluated so far
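For post-processing repairs.log, the running statistics can be extracted with regular expressions. The sketch below is based only on the sample lines shown above; the function names `parse_stats_line` and `parse_map_line` are hypothetical, and lines whose layout differs from the sample (for example, a non-numeric precision value) are returned as `None`.

```python
import re

# Pattern derived from the sample "Lev(...)" lines shown above.
STATS_RE = re.compile(
    r"Lev\((?P<bucket>[^)]+)\): Top-1/rec/pos/total: "
    r"(?P<top1>\d+) / (?P<rec>\d+) / (?P<pos>\d+) / (?P<total>\d+), "
    r"errors: (?P<errors>\d+), "
    r"P@1: (?P<p_at_1>[\d.]+), P@All: (?P<p_at_all>[\d.]+)"
)

# Pattern for lines such as "Draw timings (ms): {1=0.0, 2=0.0, 3=731.0}".
MAP_RE = re.compile(r"^(?P<name>[^:]+): \{(?P<body>.*)\}$")


def parse_stats_line(line):
    """Return the counters of one 'Lev(...)' line, or None if it does not match."""
    m = STATS_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    result = {"bucket": fields["bucket"]}
    for key in ("top1", "rec", "pos", "total", "errors"):
        result[key] = int(fields[key])
    for key in ("p_at_1", "p_at_all"):
        result[key] = float(fields[key])
    return result


def parse_map_line(line):
    """Return (name, {key: value}) for a '{k=v, ...}' line, or None."""
    m = MAP_RE.match(line.strip())
    if m is None:
        return None
    entries = {}
    for item in m.group("body").split(","):
        if item.strip():
            key, value = item.split("=")
            entries[int(key)] = float(value)
    return m.group("name"), entries


print(parse_stats_line(
    "Lev(3): Top-1/rec/pos/total: 1 / 1 / 1 / 1, errors: 0, P@1: 1.0, P@All: 1.0"
))
print(parse_map_line("Draw timings (ms): {1=0.0, 2=0.0, 3=731.0}"))
```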
The log will also contain the following data, which is a more granular summary of the running average precision across all repair instances, broken down by snippet length and edit distance, where |σ| is the length of the broken code snippet and Δ indicates the Levenshtein distance of the true repair.
Precision@1
===========
|σ|∈[0, 10): Top-1/total: 5 / 26 ≈ 0.19230769230769232
|σ|∈[10, 20): Top-1/total: 9 / 31 ≈ 0.2903225806451613
|σ|∈[20, 30): Top-1/total: 9 / 27 ≈ 0.3333333333333333
|σ|∈[30, 40): Top-1/total: 9 / 26 ≈ 0.34615384615384615
|σ|∈[40, 50): Top-1/total: 3 / 5 ≈ 0.6
Δ(1)= Top-1/total: 17 / 32 ≈ 0.53125
Δ(2)= Top-1/total: 10 / 38 ≈ 0.2631578947368421
Δ(3)= Top-1/total: 8 / 45 ≈ 0.17777777777777778
(|σ|∈[0, 10), Δ=1): Top-1/total: 3 / 10 ≈ 0.3
(|σ|∈[0, 10), Δ=2): Top-1/total: 2 / 7 ≈ 0.2857142857142857
(|σ|∈[0, 10), Δ=3): Top-1/total: 0 / 9 ≈ 0.0
(|σ|∈[10, 20), Δ=1): Top-1/total: 4 / 7 ≈ 0.5714285714285714
(|σ|∈[10, 20), Δ=2): Top-1/total: 2 / 12 ≈ 0.16666666666666666
(|σ|∈[10, 20), Δ=3): Top-1/total: 3 / 12 ≈ 0.25
(|σ|∈[20, 30), Δ=1): Top-1/total: 3 / 3 ≈ 1.0
(|σ|∈[20, 30), Δ=2): Top-1/total: 3 / 10 ≈ 0.3
(|σ|∈[20, 30), Δ=3): Top-1/total: 3 / 14 ≈ 0.21428571428571427
(|σ|∈[30, 40), Δ=1): Top-1/total: 5 / 8 ≈ 0.625
(|σ|∈[30, 40), Δ=2): Top-1/total: 3 / 9 ≈ 0.3333333333333333
(|σ|∈[30, 40), Δ=3): Top-1/total: 1 / 9 ≈ 0.1111111111111111
(|σ|∈[40, 50), Δ=1): Top-1/total: 2 / 4 ≈ 0.5
(|σ|∈[40, 50), Δ=3): Top-1/total: 1 / 1 ≈ 1.0
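The per-distance lines in this summary are the sums of the corresponding joint (length, distance) cells. For instance, the five Δ=1 cells above add up to 17 / 32, which matches the Δ(1) line. The following Python sketch performs that check on log text; the line layout is taken from the sample above, and the function name `marginals_by_delta` is hypothetical.

```python
import re
from collections import defaultdict

# Matches joint cells such as "(|σ|∈[0, 10), Δ=1): Top-1/total: 3 / 10 ≈ 0.3".
CELL_RE = re.compile(
    r"\(\|σ\|∈\[(?P<lo>\d+), (?P<hi>\d+)\), Δ=(?P<delta>\d+)\): "
    r"Top-1/total: (?P<top1>\d+) / (?P<total>\d+)"
)


def marginals_by_delta(log_text):
    """Sum Top-1 and total counts over all length buckets for each Δ."""
    sums = defaultdict(lambda: [0, 0])
    for m in CELL_RE.finditer(log_text):
        counts = sums[int(m["delta"])]
        counts[0] += int(m["top1"])
        counts[1] += int(m["total"])
    return {delta: tuple(counts) for delta, counts in sorted(sums.items())}


# The Δ=1 cells copied from the sample output above.
sample = """\
(|σ|∈[0, 10), Δ=1): Top-1/total: 3 / 10 ≈ 0.3
(|σ|∈[10, 20), Δ=1): Top-1/total: 4 / 7 ≈ 0.5714285714285714
(|σ|∈[20, 30), Δ=1): Top-1/total: 3 / 3 ≈ 1.0
(|σ|∈[30, 40), Δ=1): Top-1/total: 5 / 8 ≈ 0.625
(|σ|∈[40, 50), Δ=1): Top-1/total: 2 / 4 ≈ 0.5
"""

for delta, (top1, total) in marginals_by_delta(sample).items():
    print(f"Δ({delta})= Top-1/total: {top1} / {total} ≈ {top1 / total}")
```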
Running the full set of experiments can take several hours depending on the machine.
Files
| Size | MD5 checksum |
|---|---|
| 370.5 MB | 71955bdec1d1dfc0b3a14d8c6fd0644d |
Additional details
Software
- Repository URL: http://github.com/tidyparse/tidyparse
- Programming language: Kotlin
- Development status: Active