Skip to content

feat: add basic BitVec library#286

Merged
digama0 merged 25 commits intomainfrom
bitvec
Oct 18, 2023
Merged

feat: add basic BitVec library#286
digama0 merged 25 commits intomainfrom
bitvec

Conversation

@leodemoura
Copy link
Copy Markdown
Contributor

This library was started by Joe Hendrix and Wojciech Nawrocki. It is crucial for software and hardware verification projects. We need it in Std. Moreover lean-auto project needs it.

This library was started by Joe Hendrix and Wojciech Nawrocki. It is
crucial for software and hardware verification projects. We need it in
`Std`. Moreover `lean-auto` project needs it.
@leodemoura
Copy link
Copy Markdown
Contributor Author

Notation for bit vector literals. `i#n` is a shorthand for `BitVec.ofNat n i`.
TODO: is this a good notation? Do we need it? We can also write `(i : BitVec n)`
-/
syntax:max num noWs "#" noWs num : term
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
syntax:max num noWs "#" noWs num : term
scoped syntax:max num noWs "#" noWs num : term

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems narrowly useful if it only works for literals. I'd say we don't want it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes proof goals way more readable. Specially the ones coming from software/hardware verification.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Scott, the only thing this does is add a type ascription so it seems more suited as either #n as a notation for Bitvec n (so (5 : #3) works), or letting downstream projects define the notation themselves as a local notation (it's a one-liner). I don't find the particular syntax very descriptive and would prefer the definition be moved closer to the use for readability.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to change the notation, but it would be great to have something very compact and working "out of the box" for people working on software/hardware verification.

Copy link
Copy Markdown
Member

@digama0 digama0 Oct 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut feeling is that writing the bitwidth all the time will get old quick, especially in cases where the bitwidth is mostly fixed (e.g. everything is on 32 bit words). For sure frequently varied bitwidth definitely comes up too, especially in hardware verification, but I don't think it is the majority case, and inventing a new domain specific notation for type ascription seems like it would lead to more confusion and not less, unless this is a very standard notation in the trade (and my impression is that it is not). It might be good to do a survey of what VHDLs and such do for this situation.

Also, I don't see any reason this needs to be restricted to numeric literals, it could just be term:max instead of num.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I am slightly in favor of the notation. This is probably orthogonal, but it'd be neat if there was a way to have access to the number of digits in a hex or binary literal (e.g., if we knew 0b01010 was a BitVec, then it would default to having 5 bits). Similar with hex digits having a multiple of the input 0x0010 is a BitVec 16. Cryptol does this, but also is predominantly about bitvector programming.

@alexkeizer
Copy link
Copy Markdown
Contributor

It might be worth mentioning that there is an open Mathlib PR which I believe was based on the same BitVec library as this PR, and has undergone several rounds of review.

@digama0
Copy link
Copy Markdown
Member

digama0 commented Oct 15, 2023

It might be better to get this into mathlib first, rather than having to copy things around if it is already under review there. Is there a representative project that is currently using this API? I would feel more comfortable merging a new API like this if I knew it was already meeting the needs of another project, and getting it in mathlib is one way to do that.

@leodemoura
Copy link
Copy Markdown
Contributor Author

@alexkeizer and @digama0 Bit vectors must be in the Std. They are essential for software and hardware verification, and support for bit vectors in lean-auto is blocked because we don't have bit vectors in Std. lean-auto is a package for invoking SMT solvers from Lean. I need lean-auto for applications.
I am happy to move the Mathlib PR here myself or tolerate the duplication for now. Happy with any of these two options, tell me which one you prefer and I will get it done by tomorrow.

Co-authored-by: Scott Morrison <scott.morrison@gmail.com>
@digama0
Copy link
Copy Markdown
Member

digama0 commented Oct 15, 2023

@alexkeizer and @digama0 Bit vectors must be in the Std. They are essential for software and hardware verification, and support for bit vectors in lean-auto is blocked because we don't have bit vectors in Std.

To be clear, I don't contest this and would like us to have a bitvectors API in Std. My concern is about whether the API here is the right one, or whether we need to expose things in a different way or with different definitions (e.g. little endian vs big endian). My main tool for evaluating this is to look at usage of the API in a downstream project. It is relatively easier to merge PRs migrating from mathlib because they have already undergone some API battle testing in order to incorporate them in the library.

How exactly is lean-auto blocked? (Not contesting the claim, asking for details.) Should it not be possible to simply have this module in lean-auto since dependencies of it will be able to use this? This is what aesop and mathlib have done in the past when it comes to merging things upstream: they keep a copy of the module and use that instead pending the merge. This practice also helps to get more data and refine the API.

@PratherConid
Copy link
Copy Markdown

PratherConid commented Oct 15, 2023

Lean-auto is not necessarily "blocked", but importing the current implementation of BitVec from Mathlib makes compilation and initialization unacceptably slow. @digama0 It seemed to me that there would be no problem to have Bitvec in Std immediately. But since it contradicts the method used to evaluate APIs, I'd be happy to just have this module in Lean-auto.

@digama0
Copy link
Copy Markdown
Member

digama0 commented Oct 15, 2023

I'm not trying to block this PR, I'm soliciting information on how it is used. Maybe there could be a test file or something, to demonstrate intended usage?

but importing the current implementation of BitVec from Mathlib makes compilation and initialization unacceptably slow

This is actually very valuable information, it indicates that the Mathlib BitVec API has issues (which I have also long suspected). It would be great to have a summary of how this API differs from it, what the problems were and why this implementation works better.

@PratherConid
Copy link
Copy Markdown

PratherConid commented Oct 15, 2023

Lean-auto just imports those Bitvec definitions, and matches their name against constant names in the input so that it can recognize them as Bitvec operations. The actual implementation of Bitvec has no impact on the performance of Lean-auto. What really matters to Lean-auto is the name of the Bitvec operations.

@digama0 I think the slow compilation might not be caused by the specific Bitvec implementation. Generally, whenever I import files from mathlib, the compilation of lean-auto becomes significantly slower, so I've been avoiding importing anything from mathlib, and I have several modules copied from mathlib in Lean-auto (I didn't do the same to Bitvec because I thought it makes sense to have it in std).

@leodemoura
Copy link
Copy Markdown
Contributor Author

@PratherConid Could you provide a list of all definitions you need in the BitVec library? I am assuming you need all the functions that are available on SMT-Lib, correct? The actual names will not be exactly the same, but we can put the corresponding SMT-Lib names in the doc strings.

@digama0 I am going to be using the BitVec library in actual applications, and we will polish the API and add theorems as we go. @joehendrix will help making sure the library is good for software and hardware verification applications. AFACIT, the Bitvec module in Mathlib has no users. So, this PR and future ones will have no impact on Mathlib.
BTW, SMT-Lib is a great reference for bit vectors since SMT solvers are already used in many hardware and software verification applications.

@PratherConid
Copy link
Copy Markdown

@leodemoura Yes, I only need the functions available on smt-lib 2.6 https://smtlib.cs.uiowa.edu/theories-FixedSizeBitVectors.shtml


/--
Extraction of bits `i` down to `j` from a bit vector of size `n` to yield a
new bitvector of size `i - j + 1`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In mathlib4 we were leaning towards an API that satisfied

Suggested change
new bitvector of size `i - j + 1`
new bitvector of size `i - j`

As this saves some branching in shrink

Copy link
Copy Markdown
Member

@eric-wieser eric-wieser Oct 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine this does not align with SMTlib (and am aware it does not align with the convention in VHDL and Verilog), but I imagine it will be much more pleasant to prove things avout, and it aligns with Std.Range.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eric-wieser Thanks for the feedback. Yes, the goal is to align with SMTlib which is a standard in software and hardware verification. SMT solvers are widely used in industry, and they all use the SMTlib standard.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about defining extract in terms of extract' using Eric's convention?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract' is a good idea. Please feel free to add a commit with its implementation.

@alexkeizer
Copy link
Copy Markdown
Contributor

We are also looking to use bitvectors in our project. We started using the current bitvector API, but when the discussed refactor was proposed we stopped until it became clear which representation mathlib would use.

I did not necessarily mean to suggest postponing this PR, I would be happy to have bitvectors in STD ASAP, I merely wanted to point out the existing efforts.

The refactor also got discussed on Zulip (https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/Proposed.20refactor.20of.20Bitvec).

@leodemoura
Copy link
Copy Markdown
Contributor Author

@alexkeizer Thanks for sharing. The good news is that things will start moving really fast here in Std :) Our plans are

  • Std should be great for software/hardware verification projects.
  • Std should be great for software development, and should provide a library with the same capabilities of the Rust std.
  • Better automation for software/hardware verification projects using Std.

We're lucky to have Joe Hendrix (@joehendrix) on board with us at Lean FRO. He's a pro when it comes to software verification and development. Joe's got our back in hitting our goals and will help us get those PRs merged super fast.

The perf improvement will be in core. See:
leanprover/lean4@de719c9
leodemoura added a commit to leanprover/lean4 that referenced this pull request Oct 15, 2023
We now have the missing proofs `Nat.mod_le` and `Nat.div_le_self` in
core.
See:
leanprover-community/batteries#286 (comment)
@digama0
Copy link
Copy Markdown
Member

digama0 commented Oct 16, 2023

I pushed some changes. Summary:

  • Added more documentation to all of the operations
  • General style guide things (indentation etc)
  • Simplified some of the definitions
  • Added fill as a version of replicate that just splats a bool across the vector
  • Changed LT and LE to delegate to the Fin LT/LE (which delegates to the Nat one), instead of using the Bool function which will cause spurious decide (x < y) = true things in proofs using these ops
  • Clarified things like what zero extension, left/right shift, and other bitwise ops mean in terms of bit positions and also numbers where applicable. I tried to avoid saying things like "the left side" in favor of "the low bits" to avoid ambiguity depending on how the user's mental picture is laid out. (The Rust stdlib also uses this guideline, because RTL languages exist too.)
  • Renamed shrink to truncate, because shrink is not particularly explanatory regarding how the shrinking is supposed to happen (are we losing high bits or low bits?), while "truncate" is more consistently used for dropping high bits in integer conversion.
  • shrink and zeroExtend are actually aliases, but they have documentation which focuses on the case in the name.

I did not make any changes to the things under discussion (big endian, the # notation).

mk ::
/-- Interpret a bitvector as a number less than `2^w`.
O(1), because we use `Fin` as the internal representation of a bitvector. -/
val : Fin (2^w)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the multitude of possible representations, would calling mk/val as ofFin / toFin instead be reasonable? That would make them feel on equal footing with ofBits / toBits or similar.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea.

@joehendrix
Copy link
Copy Markdown
Contributor

@digama0 I personally think the comment changes and cleanups improve readability so thanks.

One interesting thing is that most-significant being on the left is part of our Hindu Arabic numeral system -- even traditionally right-to-left languages have most-significant digits on the left.

I don't want the bit order discussion to block this PR from being merged, so if you haven't been persuaded I am ok with avoiding committing to a bit order in this PR.

@digama0
Copy link
Copy Markdown
Member

digama0 commented Oct 16, 2023

As things stand currently, the only place where big endian has to be explicitly acknowledged is in the argument order for append, and actually we might also want the reversed argument order not simply because of endianness but also because it would have the addition in the opposite order, which would make it a bit easier re: dependent typing in some cases (note that there are some uses of Nat.add_comm .. ▸ (v1 ++ v2) in this PR, and probably even more in usage code). Given that we are explicitly acknowledging that these functions are intended to be compatible with SMT-lib and the main functions have cross references to it, I don't mind keeping this argument order since we'd need a SMT-lib compatible version anyway.

In general, dependent typing hell will definitely arise when working with this API when the bitwidths are not numeric literals, so we probably need to do something to help with that (although as usual it is difficult to predict in advance what that might be). One thing is that we probably want a ofEq function for coercing between bitvector types with equal but not defeq bitwidth; mathlib makes a point of defining these for all dependent type families because they compute better than Eq.rec terms.

@joehendrix
Copy link
Copy Markdown
Contributor

I've pushed ofFin/toFin changes and a change to use hex for rendering bitvectors. I don't know if people will always like that, but is is easy to convert to a Nat if one wants a decimal rendering.

digama0 I can add a ofEq. Is the conventional signature something like ofEq : u = v -> BitVec u -> BitVec v?

@digama0
Copy link
Copy Markdown
Member

digama0 commented Oct 16, 2023

I've pushed ofFin/toFin changes and a change to use hex for rendering bitvectors. I don't know if people will always like that, but is is easy to convert to a Nat if one wants a decimal rendering.

Actually I was thinking you might want binary rendering. I can't imagine you would ever want decimal, that seems exceedingly unhelpful for understanding a bitvector value. But binary is a bit too verbose and is probably not suitable for more than 4-8 bits. You could have a delaborator option to control this.

@alexkeizer
Copy link
Copy Markdown
Contributor

I've pushed ofFin/toFin changes and a change to use hex for rendering bitvectors. I don't know if people will always like that, but is is easy to convert to a Nat if one wants a decimal rendering.

digama0 I can add a ofEq. Is the conventional signature something like ofEq : u = v -> BitVec u -> BitVec v?

The signature is right, but I think these kind of functions are usually called cast, as in https://leanprover-community.github.io/mathlib4_docs/Std/Data/Fin/Basic.html#Fin.cast

Joe Hendrix added 2 commits October 17, 2023 17:18
* Introduce signed operations.
* Rename unsigned operations to indicate unsigned nature.
* Introduce shiftRight and complement integer bitwise operations.
* Rename byte-order dependent operations to clarify endianess.
* Add examples to documentation
* Add basic test cases.
Comment on lines +174 to +183
if t.msb then
if s.msb then
.udiv (.neg s) (.neg t)
else
.neg (.udiv s (.neg t))
else
if s.msb then
.neg (.udiv (.neg s) t)
else
.udiv s t
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if t.msb then
if s.msb then
.udiv (.neg s) (.neg t)
else
.neg (.udiv s (.neg t))
else
if s.msb then
.neg (.udiv (.neg s) t)
else
.udiv s t
match s.msb, t.msb with
| true, true => .udiv (.neg s) (.neg t)
| false, true => .neg (.udiv s (.neg t))
| true, false => .neg (.udiv (.neg s) t)
| false, false => .udiv s t

ditto for smt_sdiv, srem, smod

@digama0
Copy link
Copy Markdown
Member

digama0 commented Oct 18, 2023

Thanks @alexkeizer , @leodemoura , @joehendrix and everyone else for working on this. I think it looks really good now, and although there is still more to do (toList, split, lemmas, automation) this looks like a good starting point from which to start getting user feedback and iterate further.

@digama0 digama0 merged commit a4b2584 into main Oct 18, 2023
@digama0 digama0 deleted the bitvec branch October 18, 2023 02:10
leodemoura added a commit to leanprover/lean4 that referenced this pull request Oct 21, 2023
We now have the missing proofs `Nat.mod_le` and `Nat.div_le_self` in
core.
See:
leanprover-community/batteries#286 (comment)
thorimur pushed a commit to thorimur/batteries that referenced this pull request Oct 24, 2023
This library was started by Joe Hendrix and Wojciech Nawrocki. It is
crucial for software and hardware verification projects. We need it in
`Std`. Moreover `lean-auto` project needs it.

---------

Co-authored-by: Scott Morrison <scott.morrison@gmail.com>
Co-authored-by: Mario Carneiro <di.gama@gmail.com>
Co-authored-by: Joe Hendrix <joe@lean-fro.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants