This repository was archived by the owner on Nov 4, 2025. It is now read-only.

allenai/RegSet

RegSet

Datasets

The data directory contains .jsonl files with train/dev/test splits of the Exploration, Hard, and Mixed datasets.
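A split can be read with the standard library alone. The following is a minimal sketch: the helper name `read_jsonl` and the example path are hypothetical, and it makes no assumption about which fields each record contains.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return a list of JSON objects, one per non-empty line of the file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Hypothetical usage; substitute the actual file name found in the data directory:
# records = read_jsonl("data/<split-file>.jsonl")
# print(len(records), records[0].keys())
```

Inspecting the keys of the first record is a quick way to see which attributes a given split provides.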

Scripts

This repository also contains scripts for generating data and computing attributes of data instances. The Python scripts in the root directory are as follows:

  • exps2.py provides definitions of the regex type as well as functions for enumerating regexes (for use in sampling) and computing their properties.
  • dfa2.py provides definitions of the DFA (Deterministic Finite Automaton) type as well as methods for computing properties of regular languages and converting from regex to DFA.
  • sample_v2.py generates a cache of regexes and the Exploration set.
  • make_hard_set.py generates the Hard dataset from the cached regexes not used in the Exploration training set.
  • compute_properties.py provides additional helper functions for computing attributes.
  • cache_loader.py provides utilities for loading cached regexes generated by sample_v2.py.
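For readers unfamiliar with the DFA type mentioned above, here is a minimal, self-contained sketch of what such a type can look like. It is an illustration only and does not reproduce the classes or method names in dfa2.py; the `DFA` class, its fields, and `accepts` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """Illustrative DFA; not the type defined in dfa2.py."""
    states: frozenset
    alphabet: frozenset
    transitions: dict  # maps (state, symbol) -> next state
    start: object
    accepting: frozenset

    def accepts(self, word):
        """Run the automaton on word and report whether it ends in an accepting state."""
        state = self.start
        for symbol in word:
            key = (state, symbol)
            if key not in self.transitions:
                return False  # missing transition is treated as rejection
            state = self.transitions[key]
        return state in self.accepting


# Example language: strings over {a, b} with an even number of a's.
even_as = DFA(
    states=frozenset({0, 1}),
    alphabet=frozenset("ab"),
    transitions={(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
    start=0,
    accepting=frozenset({0}),
)
```

With this definition, `even_as.accepts("abab")` is true and `even_as.accepts("ab")` is false, since the state only flips when an `a` is read.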

Generate the Exploration dataset:

python sample_v2.py \
  -d <output-folder> \
  [--depths <maximum number of compositions>]
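As a rough intuition for what a bound on the number of compositions means, the sketch below enumerates regex strings whose operators are nested at most `max_depth` levels deep. It is a toy illustration under stated assumptions: the function `enumerate_regexes`, the choice of operators (star, concatenation, union), and the string representation are hypothetical and are not taken from sample_v2.py or exps2.py.

```python
from itertools import product


def enumerate_regexes(alphabet, max_depth):
    """Return sorted regex strings built with at most max_depth nested compositions."""
    seen = set(alphabet)  # depth 0: single symbols
    for _ in range(max_depth):
        new = set()
        for r in seen:
            new.add(f"({r})*")  # Kleene star
        for r, s in product(seen, repeat=2):
            new.add(f"({r})({s})")   # concatenation
            new.add(f"({r})|({s})")  # union
        seen = seen | new
    return sorted(seen)
```

The number of expressions grows very quickly with depth, which is one reason to sample from the enumeration rather than use all of it.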

Generate the Hard dataset:

python make_hard_set.py
