Feat: Automatically Derive Generators for Algebraic Data Types#35
Feat: Automatically Derive Generators for Algebraic Data Types#35ngernest wants to merge 60 commits intoleanprover-community:mainfrom
Conversation
hargoniX
left a comment
There was a problem hiding this comment.
Hey, first of all thanks so much for doing this! I had contemplated to implement this on multiple occasions in the past but unfortunately never found the time for it. I think this can really do a lot to push plausible adoption forward!
I left a few conceptual comments along the PR and didn't take a look at the implementations in detail yet.
| - In QuickChick, this typeclass is called `Gen`, but `Gen` is already | ||
| a reserved keyword in Plausible, so we call this typeclass `Arbitrary` | ||
| following the Haskell QuickCheck convention). -/ | ||
| class Arbitrary (α : Type) where |
There was a problem hiding this comment.
I don't think it is a good long term design decision to partially replicate Sampleable functionality in a second type class that lives next to it. To me the two sensible choices seem to be:
- Either we just drop
Arbitraryentirely and build a deriving handler forSampleablethat, at least for now, uses theidshrinker function. SampleableExtis refactored to be based on bothArbitraryandShrinkable
can you explain why you went for neither of those?
There was a problem hiding this comment.
Thanks -- both are good suggestions!
For context, this PR is part of a broader project to replicate Rocq QuickChick's functionality for deriving generators that produce data satisfying user-defined inductive predicates (this PR is spun off this repo).
Part of this broader project involves defining a ArbitrarySuchThat typeclass:
/-- Generators of type `α` such that `P : α -> Prop` holds for all generated values.
Note that these generators may fail, which is why they have type `OptionT Gen α`. -/
class ArbitrarySuchThat (α : Type) (P : α → Prop) where
arbitraryST : OptionT Gen αFor simplicity / ease of comparison with QuickChick, I tried to make the derived generators in ArbitrarySuchThat be as close as possible to the generators produced by QuickChick, which resulted in me adopting the same typeclass definitions as QuickChick. (I also wanted to avoid modifying Plausible internals, which is why I adopted to make Arbitrary its own typeclass instead of re-factoring SampleableExt directly.)
This was useful for rapid prototyping, but I agree with you that it's a bad idea to have Arbitrary replicate some of SampleableExt's functionality.
I think the best solution moving forward might be to refactor SampleableExt to be based on both Arbitrary and Shrinkable -- this makes the design more modular and separates the generation/shrinking logic. What do you think?
|
Thanks a bunch for your feedback @hargoniX! |
| - In QuickChick, this typeclass is called `Gen`, but `Gen` is already | ||
| a reserved keyword in Plausible, so we call this typeclass `Arbitrary` | ||
| following the Haskell QuickCheck convention). -/ | ||
| class Arbitrary (α : Type) where |
There was a problem hiding this comment.
What makes this typeclass different from Plausible.Random Id?
There was a problem hiding this comment.
Ah sorry, this is an easy question; Arbitrary is a version of Random with a "size" parameter.
There was a problem hiding this comment.
Yep, you're correct! Having a dedicated typeclass for generation (Arbitrary) allows us to maintain a similar design to Haskell QuickCheck & Rocq QuickChick and makes it easier to refer to the generators for a particular type in the body of derived generators.
There was a problem hiding this comment.
makes it easier to refer to the generators for a particular type in the body of derived generators.
Is Plausible.Random.random really any harder to refer to than Arbitrary.arbitrary? Maybe let's resolve the size argument issue below before continuing.
There was a problem hiding this comment.
Ah yes, you're right -- thanks! Random Id is basically the same as Arbitrary.
(The following explanation is purely for my own understanding / for documentation purposes)
Plausible.Random.random has type RandGT g m α. If we set g = StdGen and m = Id, then unfolding definitions gives us:
RandGT StdGen Id α
≡ RandG StdGen α -- since for any `g`, we have `RandG g := RandGT g Id`
≡ Rand α -- by definition of `Rand`Gen α is defined as ReaderT (ULift Nat) Rand α, so under the hood, Arbitrary.arbitrary, which has type Gen α is using the same Rand α random generator as Plausible.Random.random.
I guess in this case, the main benefit of having an Arbitrary typeclass is that it maintains the same naming convention as Haskell QuickCheck (which one could argue is somewhat tenuous), and that Arbitrary serves as an abbreviation for Random Id, which may make the deriving infrastructure easier to implement (Random Id is a partially-applied typeclass whereas Arbitrary isn't).
There was a problem hiding this comment.
If you want to go that route, I'd suggest abbrev Arbitrary α := Random Id α.
Note that the idea in Random m α is that some types such as Expr are much easier to sample in interesting ways in MetaM. It's totally fair to declare the m a distraction and specialize to Id in this Pr, but I think creating a brand new typeclass is a bad idea.
There was a problem hiding this comment.
Ah that's fair, thanks for the suggestion! I'll defer to @hargoniX on whether to keep Arbitrary since we were discussing in another thread above whether to keep Arbitrary as a new typeclass or to refactor the existing SampleableExt typeclass.
Plausible/Arbitrary.lean
Outdated
| /-- Takes a `Nat` and produces a random generator dependent on the `Nat` parameter | ||
| (which indicates the size of the output) -/ | ||
| arbitrarySized : Nat → Gen α |
There was a problem hiding this comment.
Doesn't Gen already have this size parameter built in?
There was a problem hiding this comment.
Yep, you're correct that Gen already has a size parameter built-in! However, when deriving proofs of soundness + completeness for the derived generators (which we're currently working on in a separate PR), as observed in Generating Good Generators for Inductive Relations (POPL 2018), it is convenient to have a separate typeclass which makes the generator's size parameter explicit (i.e. ArbitrarySized in this PR). Specifically, section 5.1 of the paper discusses how the semantics of Gen (in Rocq QuickChick, but a similar argument can be made for Plausible generators) are defined by the set of values that are generated for a particular size parameter, and how reasoning about monotonicity with respect to generators' size parameter leads to natural specifications for generator combinators.
There was a problem hiding this comment.
But to be clear, this is currently defined as arbitrarySized : Nat -> ULift Nat -> Plausible.Rand α (after unfolding), where both Nats are the size. If you genuinely have two different sizes at play, then the docstring should distinguish them.
Are you sure that our Gen corresponds to the one QuickCh(e|i)ck and not to something else? Right now your Arbitrary is a generator with one size argument, and ArbitrarySized is a generator with two size arguments, which seems like an off-by-one-error.
There was a problem hiding this comment.
Ah you're right, sorry, I missed that Gen contains an ULift Nat argument after unfolding -- thanks for the heads-up! In this case, the Nat parameter in (what is currently called) ArbitrarySized acts more like fuel -- the derived ArbitrarySized generator recurses on this Nat parameter and terminates when the Nat is 0.
I've renamed ArbitrarySized to ArbitraryFueled and updated the docstrings -- hopefully that's ok!
There was a problem hiding this comment.
What does "terminates" mean? Does this mean that if sampling random elements of List Nat, then this will sample both smaller Nats and smaller Lists? I think this might already be the intended semantics of the ULIft Nat argument, and so you might want just arbitrarySized : Gen α
There was a problem hiding this comment.
Ah, by "terminates", I mean that the generator picks one of the values in its base case to return. As an example, here is the generator that is derived for a binary tree datatype:
inductive Tree
| Leaf : Tree
| Node : Nat → Tree → Tree → Tree
-- Derived code below
-- (The code for the derived generator closely follows the generator that Rocq QuickChick derives)
instance : Plausible.ArbitraryFueled Tree where
arbitraryFueled :=
let rec aux_arb (fuel : Nat) : Plausible.Gen Tree :=
match size with
| Nat.zero => Plausible.Gen.oneOfWithDefault (pure Tree.Leaf) [(pure Tree.Leaf)]
| size' + 1 =>
Plausible.Gen.frequency (pure Tree.Leaf)
[(1, (pure Tree.Leaf)),
(fuel' + 1,
(do
let a_0 ← Plausible.Arbitrary.arbitrary
let a_1 ← aux_arb fuel'
let a_2 ← aux_arb fuel'
return Tree.Node a_0 a_1 a_2))]
fun fuel => aux_arb fuelWhen fuel = 0, we just return a Leaf.
For the List Nat example, the generator would just return the empty list when fuel = 0.
kmill
left a comment
There was a problem hiding this comment.
Thanks for your contribution.
I'm not a maintainer for this repository per se, but I have some comments.
Is there any particular reason you're not using some of the logic that you can find in other deriving handlers, e.g. in Deriving.Repr, which supports nested and mutually recursive inductive types, with parameters? The techniques there would make it possible to derive Arbitrary instances for types like List.
Plausible/Arbitrary.lean
Outdated
| - Note: Plausible's `SampleableExt` is analogous to QuickChick's `Arbitrary` typeclass | ||
| (which combines QuickChick's `Gen` and `Shrink` typeclass)-/ | ||
| instance [Repr α] [Shrinkable α] [ArbitrarySized α] : SampleableExt α := | ||
| SampleableExt.mkSelfContained (Gen.sized ArbitrarySized.arbitrarySized) |
There was a problem hiding this comment.
If you start with a SampleableExt, does this instance with the next one produce an equivalent SampleableExt? If not, then there should not be such an instance loop.
There was a problem hiding this comment.
Thanks -- good catch! I removed this instance since it's not strictly necessary (all tests pass without this instance).
Plausible/TSyntaxCombinators.lean
Outdated
| | [] => `(True) | ||
| | [p] => pure p | ||
| | p :: ps => | ||
| List.foldlM (fun acc pred => `($acc && $pred)) (init := p) ps |
There was a problem hiding this comment.
A function that creates an n-ary and from an array of terms seems potentially useful, but a combinator to create an if doElem doesn't.
There was a problem hiding this comment.
Ah that's fair, thanks -- I've removed the TSyntaxCombinators.lean file altogether since the combinators there are of dubious utility for this PR.
Plausible/DeriveArbitrary.lean
Outdated
| let mut doElems := #[] | ||
|
|
||
| -- Determine whether the constructor has any recursive arguments | ||
| let ctorIsRecursive ← isConstructorRecursive targetTypeName ctorName |
There was a problem hiding this comment.
| let ctorIsRecursive ← isConstructorRecursive targetTypeName ctorName | |
| let mut ctorIsRecursive := false |
then set it when you detect it is recursive.
It's better design not to duplicate logic, where it's implemented in slightly different ways each time. The source of truth is this loop.
There was a problem hiding this comment.
Thanks, this is a very good point! Adopted this change.
Plausible/Utils.lean
Outdated
| - Note: this function only considers constructors with arrow types -/ | ||
| def isConstructorRecursive (inductiveName : Name) (ctorName : Name) : MetaM Bool := do | ||
| let ctorInfo ← getConstInfo ctorName | ||
| let ctorType := ctorInfo.type |
There was a problem hiding this comment.
Shouldn't recursive mean that ctorType contains ctorName anywhere?
Option.isSome <| ctorType.find? (·.isConstOf inductiveName)
This would admit some more possibilities than what the deriving handler actually allows however.
There was a problem hiding this comment.
Thanks -- I removed the Utils.lean file altogether since this function is no longer strictly necessary. (This function was used in the parent project to analyze constructors of inductive Props when deriving generators of values satisfying the Prop, which is outside the scope of this PR.)
…licitly parenthesized)
Co-authored-by: Kyle Miller <kmill31415@gmail.com>
|
@kmill Thanks for your feedback! Re: why this PR doesn't use some of the existing logic in deriving handlers, this PR is part of a broader project (repo) to derive generators of random values that satisfy some user-specified inductive proposition (i.e. some |
…mplicit Arbitrary binders
WIP: Handle parameterized + mutually recursive types when deriving `Arbitrary` instances
|
I'm happy to say that the deriving To support this, I re-implemented the handler following the idioms for the Other high-level changes:
I think the main outstanding question, as pointed out by @codyroux, is whether |
hargoniX
left a comment
There was a problem hiding this comment.
Regarding Arbitrary vs SampleableExt, I do not really have stakes in how this is resolved, the two important points to me would be:
First I don't think it is reasonable UX to have the user write:
instance [Repr α] [Shrinkable α] [Arbitrary α] : SampleableExt (MyList α) :=
SampleableExt.mkSelfContained Arbitrary.arbitraryafter every deriving Arbitrary. Whatever we derive should allow them to run plausible with the type right away without further complications.
As for how the Arbitrary vs other typeclasses is resolved, as we have established in one of the threads Arbitrary is really Random Id, this, to me, indicates very clearly that having Arbitrary is a bit of a weird design choice. I understand, not wanting to write a full SampleableExt deriving handler though.
What about the following approach: We split up SampleableExt into its sampling and its shrinking component as two type classes (the shrinking already being split out as Shrinkable). We then define a default_instance for Shrinkable that is just fun _ => [], this allows users to still handle Shrinkable on their own if they so desire. Then we finally define an instance that combines a sampling and a Shrinkable instance to a SampleableExt instance which then gets fed into the rest of the framework.
As for the design of the sampling component a few points that seem sensible to me though I haven't read all of the QuickChick literature:
- I think it should maintain the ability to sample through a proxy type (for things such as functions), the deriving handling for the sampling component can make that part just be
id. If there are other ways to sample functions I'd also be open to that. - Please do try to keep it simple, I think the double parameter with
Genand the fuel that we have inArbitraryFueledwould be very confusing for users getting into this framework, ideally there should only be one parameter (as you mention in one of the doc-strings QuickChick already does this so this seems doable?)
Note that plausible currently has a basically non existing userbase for its internal implementation details so you can basically do whatever refactoring you want at this point in time. Given that breaking SampleableExt up can be done entirely separately from building a deriving handler for its sampling component this can probably be done in a separate PR to ease the review as well.
| @@ -0,0 +1,81 @@ | |||
| /- | |||
| Copyright (c) 2025 Ernest Ng. All rights reserved. | |||
There was a problem hiding this comment.
On a more general note, are you sure it is you that has copyright here and not AWS, given that you are being paid?
There was a problem hiding this comment.
I previously asked @codyroux and he said to put my name here (we've received permission to open-source this code). I'll double check with him!
Ok. After doing some soul searching and repo diving, here's my proposal for the medium term.
As a side note, the default I'd like to be able to fit this all into a monad, to be able to e.g. lift failure but I think that'd be more work. Any thoughts about this? |
I guess this probably raises that we are missing |
Overview
This PR introduces a Deriving Handler for deriving type-based generators for algebraic data types, similar to generic-random in Haskell and ppx_quickcheck in OCaml.
To facilitate this, this PR also introduces an
Arbitrarytypeclass, akin to Haskell QuickCheck and Rocq QuickChick:Note that Plausible's current
Gentype constructor is not a typeclass, which is why we need to introduce a separateArbitraryclass. We have typeclass infrastructure set up so that every type which has an instance of Plausible's existingSampleableExtclass automatically gets an instance ofArbitrary.(Note:
SampleableExtdescribes types which have both a generator & a shrinker, whereasArbitrarydescribes types which have a generator only.)Users can write
deriving Arbitraryafter an inductive type definition, e.g.This derives a generator for random inhabitants of
Tree.We also support deriving
Arbitraryinstances forstructures with named fields:Instead of writing
deriving Arbitrary, users can also writederiving instance Arbitrary for T1, ..., Tnas a top-level command to deriveArbitraryinstances for typesT1, ..., Tnsimultaneously.For debugging purposes, we also provide a command elaborator for the
#derive_arbitrarycommand:-- `#derive_arbitrary` derives an instance of `Arbitrary` for `Tree` #derive_arbitrary TreeRunning this command prints the code for the derived generator to
stdout. (Note that the definition of the derived generator is already registered at this point.)To sample from a derived generator, users can simply call
Arbitrary.runArbitrary, specify the typefor the desired generated values and provide some
Natto act as the generator's size parameter (10in the example below):Tests
Test/DeriveArbitrarysubdirectory for snapshot tests, in which we use the#guard_msgscommand to check that the code for the derived generator coheres with the docstrings in the test filelake testto run all snapshot tests (this also ensures that the code for the example derived generators type check)Linter
This PR also adds a linter to the repo -- we use the default linter from Batteries , as described in the Lean wiki.
Overview of files (listed in suggested review order)
Arbitrary.lean: TheArbitrary&ArbitraryFueledtypeclasses for unconstrained generators, adapted from QuickChickDeriveArbitrary.lean: Deriver for unconstrained generators (instances of theArbitrary/ArbitrarySizedtypeclasses)Test/DeriveArbitrary: snapshot tests containing derived generators for example inductive types