DafnyGym

DafnyGym is a benchmark suite for evaluating tools that generate single Dafny assertions.

Our benchmark is also available on HuggingFace.

Set-Up

First, build the Docker image:

docker buildx build -t dgym .

Then you can evaluate your predictions using the following command:

cat perfect_predictions.json | docker run -i dgym python3 run_eval.py | python3 analysis_results.py

where perfect_predictions.json is your prediction file.

This command might take a while, since it verifies every case in DafnyGym.
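
Because the evaluation can be slow, it may be convenient to keep the raw output of run_eval.py so that the analysis step can be rerun without verifying everything again. The Python sketch below wraps the same pipeline as the command above. The helper names (eval_command, analysis_command, run_pipeline) and the output file name are illustrative assumptions, not part of DafnyGym; the sketch also assumes the dgym image has already been built and that it is run from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def eval_command(image="dgym"):
    """Return the docker invocation from the Set-Up section as an argument list."""
    return ["docker", "run", "-i", image, "python3", "run_eval.py"]


def analysis_command(python=sys.executable):
    """Return the invocation of the analysis script (reads eval output on stdin)."""
    return [python, "analysis_results.py"]


def run_pipeline(predictions, raw_output, image="dgym"):
    """Run the evaluation, save its raw output, then run the analysis on it.

    `raw_output` is a hypothetical file name chosen by the caller.
    """
    # Step 1: feed the prediction file to the container on stdin and capture stdout.
    with open(predictions, "rb") as src:
        result = subprocess.run(
            eval_command(image), stdin=src, stdout=subprocess.PIPE, check=True
        )
    # Keep a copy so the (slow) verification does not have to be repeated.
    Path(raw_output).write_bytes(result.stdout)
    # Step 2: pipe the captured output into the analysis script.
    subprocess.run(analysis_command(), input=result.stdout, check=True)
```

Calling run_pipeline("perfect_predictions.json", "eval_output.txt") would then correspond to the shell pipeline above, with the intermediate output saved to eval_output.txt.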

Content

Our benchmark contains assertions from the following three repositories:

Cedar commit 2b9a0cbd
Dafny-VMC commit b79a4c7
Libraries commit ae8708c

Citation

@misc{mugnier2024laurelgeneratingdafnyassertions,
      title={Laurel: Generating Dafny Assertions Using Large Language Models},
      author={Eric Mugnier and Emmanuel Anaya Gonzalez and Ranjit Jhala and Nadia Polikarpova and Yuanyuan Zhou},
      year={2024},
      eprint={2405.16792},
      archivePrefix={arXiv},
      primaryClass={cs.LO},
      url={https://arxiv.org/abs/2405.16792},
}

