RegSet

Datasets

The data directory contains .jsonl files with train/dev/test splits of the Exploration, Hard, and Mixed datasets.
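As a minimal sketch of reading one of these splits, the snippet below parses a .jsonl file into a list of Python dictionaries, one per line. The helper name load_jsonl and the example path are hypothetical, and no assumption is made about which fields each record contains.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file and return a list with one parsed object per line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at end of file
                records.append(json.loads(line))
    return records


# Hypothetical usage; the actual file names under data/ may differ:
# train = load_jsonl("data/<dataset>/train.jsonl")
# print(len(train), train[0].keys())
```

Reading line by line keeps the loader independent of the record schema, so the same helper should work for any of the three datasets.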

Scripts

This repository also contains scripts for generating data and computing attributes of data instances. The Python scripts in the root directory are as follows:

exps2.py provides definitions of the regex type, along with functions for enumerating regexes (for use in sampling) and computing their properties.
dfa2.py provides definitions of the DFA (deterministic finite automaton) type, as well as methods for computing properties of regular languages and converting a regex to a DFA.
sample_v2.py generates a cache of regexes and the Exploration set.
make_hard_set.py generates the Hard dataset from the cached regexes not used in the Exploration training set.
compute_properties.py provides additional helper functions for computing attributes.
cache_loader.py provides utilities for loading the cached regexes generated by sample_v2.py.
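To illustrate the kind of object a DFA type represents, here is a small self-contained sketch. It is not the API of dfa2.py: the class name, field names, and methods below are assumptions made for illustration only. It shows membership testing and one simple language property (emptiness, checked by reachability).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DFA:
    """Illustrative DFA; names here are hypothetical, not those used in dfa2.py."""
    alphabet: frozenset
    transitions: dict   # maps (state, symbol) -> next state; assumed total
    start: int
    accepting: frozenset

    def accepts(self, word):
        """Return True if the DFA ends in an accepting state after reading word."""
        state = self.start
        for symbol in word:
            if symbol not in self.alphabet:
                return False
            state = self.transitions[(state, symbol)]
        return state in self.accepting

    def is_empty(self):
        """Return True if no accepting state is reachable from the start state."""
        seen = {self.start}
        queue = deque([self.start])
        while queue:
            state = queue.popleft()
            if state in self.accepting:
                return False
            for symbol in self.alphabet:
                nxt = self.transitions[(state, symbol)]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return True


# Example: strings over {a, b} containing an even number of a's.
even_as = DFA(
    alphabet=frozenset("ab"),
    transitions={(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
    start=0,
    accepting=frozenset({0}),
)
```

With this example, even_as.accepts("abba") holds because the word contains two a's, while even_as.accepts("ab") does not.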

Generate the Exploration dataset:

python sample_v2.py \
  -d <output-folder> \
  [--depths <maximum number of compositions>]
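One way to picture what a bound on the number of compositions means is depth-bounded enumeration: start from single symbols and repeatedly apply star, concatenation, and union. The sketch below assumes that "compositions" refers to the nesting depth of operators; the function name and the choice of operators are illustrative and are not taken from sample_v2.py or exps2.py.

```python
from itertools import product


def enumerate_regexes(alphabet, max_depth):
    """Return all regex strings with operator nesting depth at most max_depth.

    Depth 0 is a single symbol; each round applies star, concatenation,
    and union once to everything built so far.
    """
    current = set(alphabet)
    for _ in range(max_depth):
        nxt = set(current)
        for r in current:
            nxt.add(f"({r})*")          # Kleene star
        for r, s in product(current, repeat=2):
            nxt.add(f"({r}{s})")        # concatenation
            nxt.add(f"({r}|{s})")       # union
        current = nxt
    return sorted(current)


depth_one = enumerate_regexes("ab", 1)
```

The number of expressions grows very quickly with depth, which is one reason a generator script might cache its output rather than re-enumerate on every run.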

Generate the Hard dataset:

python make_hard_set.py
