feat: add basic `BitVec` library by leodemoura · Pull Request #286 · leanprover-community/batteries

leodemoura · 2023-10-14T19:16:54Z

This library was started by Joe Hendrix and Wojciech Nawrocki. It is crucial for software and hardware verification projects. We need it in Std. Moreover lean-auto project needs it.

This library was started by Joe Hendrix and Wojciech Nawrocki. It is crucial for software and hardware verification projects. We need it in `Std`. Moreover `lean-auto` project needs it.

leodemoura · 2023-10-14T19:41:00Z

https://github.com/avigad/lean-auto

Std/Data/BitVec/Basic.lean

kim-em · 2023-10-14T23:14:09Z

Std/Data/BitVec/Basic.lean

+Notation for bit vector literals. `i#n` is a shorthand for `BitVec.ofNat n i`.
+TODO: is this a good notation? Do we need it? We can also write `(i : BitVec n)`
+-/
+syntax:max num noWs "#" noWs num : term


Suggested change

syntax:max num noWs "#" noWs num : term

scoped syntax:max num noWs "#" noWs num : term

It seems narrowly useful if it only works for literals. I'd say we don't want it.

It makes proof goals way more readable. Specially the ones coming from software/hardware verification.

I agree with Scott, the only thing this does is add a type ascription so it seems more suited as either #n as a notation for Bitvec n (so (5 : #3) works), or letting downstream projects define the notation themselves as a local notation (it's a one-liner). I don't find the particular syntax very descriptive and would prefer the definition be moved closer to the use for readability.

Happy to change the notation, but it would be great to have something very compact and working "out of the box" for people working on software/hardware verification.

My gut feeling is that writing the bitwidth all the time will get old quick, especially in cases where the bitwidth is mostly fixed (e.g. everything is on 32 bit words). For sure frequently varied bitwidth definitely comes up too, especially in hardware verification, but I don't think it is the majority case, and inventing a new domain specific notation for type ascription seems like it would lead to more confusion and not less, unless this is a very standard notation in the trade (and my impression is that it is not). It might be good to do a survey of what VHDLs and such do for this situation.

Also, I don't see any reason this needs to be restricted to numeric literals, it could just be term:max instead of num.

I think I am slightly in favor of the notation. This is probably orthogonal, but it'd be neat if there was a way to have access to the number of digits in a hex or binary literal (e.g., if we knew 0b01010 was a BitVec, then it would default to having 5 bits). Similar with hex digits having a multiple of the input 0x0010 is a BitVec 16. Cryptol does this, but also is predominantly about bitvector programming.

Std/Data/BitVec/Basic.lean

alexkeizer · 2023-10-15T00:14:23Z

It might be worth mentioning that there is an open Mathlib PR which I believe was based on the same BitVec library as this PR, and has undergone several rounds of review.

digama0 · 2023-10-15T01:49:50Z

It might be better to get this into mathlib first, rather than having to copy things around if it is already under review there. Is there a representative project that is currently using this API? I would feel more comfortable merging a new API like this if I knew it was already meeting the needs of another project, and getting it in mathlib is one way to do that.

leodemoura · 2023-10-15T02:38:11Z

@alexkeizer and @digama0 Bit vectors must be in the Std. They are essential for software and hardware verification, and support for bit vectors in lean-auto is blocked because we don't have bit vectors in Std. lean-auto is a package for invoking SMT solvers from Lean. I need lean-auto for applications.
I am happy to move the Mathlib PR here myself or tolerate the duplication for now. Happy with any of these two options, tell me which one you prefer and I will get it done by tomorrow.

Co-authored-by: Scott Morrison <scott.morrison@gmail.com>

digama0 · 2023-10-15T03:34:10Z

@alexkeizer and @digama0 Bit vectors must be in the Std. They are essential for software and hardware verification, and support for bit vectors in lean-auto is blocked because we don't have bit vectors in Std.

To be clear, I don't contest this and would like us to have a bitvectors API in Std. My concern is about whether the API here is the right one, or whether we need to expose things in a different way or with different definitions (e.g. little endian vs big endian). My main tool for evaluating this is to look at usage of the API in a downstream project. It is relatively easier to merge PRs migrating from mathlib because they have already undergone some API battle testing in order to incorporate them in the library.

How exactly is lean-auto blocked? (Not contesting the claim, asking for details.) Should it not be possible to simply have this module in lean-auto since dependencies of it will be able to use this? This is what aesop and mathlib have done in the past when it comes to merging things upstream: they keep a copy of the module and use that instead pending the merge. This practice also helps to get more data and refine the API.

PratherConid · 2023-10-15T03:57:35Z

Lean-auto is not necessarily "blocked", but importing the current implementation of BitVec from Mathlib makes compilation and initialization unacceptably slow. @digama0 It seemed to me that there would be no problem to have Bitvec in Std immediately. But since it contradicts the method used to evaluate APIs, I'd be happy to just have this module in Lean-auto.

digama0 · 2023-10-15T04:03:17Z

I'm not trying to block this PR, I'm soliciting information on how it is used. Maybe there could be a test file or something, to demonstrate intended usage?

but importing the current implementation of BitVec from Mathlib makes compilation and initialization unacceptably slow

This is actually very valuable information, it indicates that the Mathlib BitVec API has issues (which I have also long suspected). It would be great to have a summary of how this API differs from it, what the problems were and why this implementation works better.

PratherConid · 2023-10-15T04:14:58Z

Lean-auto just imports those Bitvec definitions, and matches their name against constant names in the input so that it can recognize them as Bitvec operations. The actual implementation of Bitvec has no impact on the performance of Lean-auto. What really matters to Lean-auto is the name of the Bitvec operations.

@digama0 I think the slow compilation might not be caused by the specific Bitvec implementation. Generally, whenever I import files from mathlib, the compilation of lean-auto becomes significantly slower, so I've been avoiding importing anything from mathlib, and I have several modules copied from mathlib in Lean-auto (I didn't do the same to Bitvec because I thought it makes sense to have it in std).

leodemoura · 2023-10-15T04:43:23Z

@PratherConid Could you provide a list of all definitions you need in the BitVec library? I am assuming you need all the functions that are available on SMT-Lib, correct? The actual names will not be exactly the same, but we can put the corresponding SMT-Lib names in the doc strings.

@digama0 I am going to be using the BitVec library in actual applications, and we will polish the API and add theorems as we go. @joehendrix will help making sure the library is good for software and hardware verification applications. AFACIT, the Bitvec module in Mathlib has no users. So, this PR and future ones will have no impact on Mathlib.
BTW, SMT-Lib is a great reference for bit vectors since SMT solvers are already used in many hardware and software verification applications.

PratherConid · 2023-10-15T05:02:55Z

@leodemoura Yes, I only need the functions available on smt-lib 2.6 https://smtlib.cs.uiowa.edu/theories-FixedSizeBitVectors.shtml

eric-wieser · 2023-10-15T08:05:39Z

Std/Data/BitVec/Basic.lean

+
+/--
+Extraction of bits `i` down to `j` from a bit vector of size `n` to yield a
+new bitvector of size `i - j + 1`


In mathlib4 we were leaning towards an API that satisfied

Suggested change

new bitvector of size `i - j + 1`

new bitvector of size `i - j`

As this saves some branching in shrink

I imagine this does not align with SMTlib (and am aware it does not align with the convention in VHDL and Verilog), but I imagine it will be much more pleasant to prove things avout, and it aligns with Std.Range.

@eric-wieser Thanks for the feedback. Yes, the goal is to align with SMTlib which is a standard in software and hardware verification. SMT solvers are widely used in industry, and they all use the SMTlib standard.

How about defining extract in terms of extract' using Eric's convention?

extract' is a good idea. Please feel free to add a commit with its implementation.

Std/Data/BitVec/Basic.lean

alexkeizer · 2023-10-15T09:25:12Z

We are also looking to use bitvectors in our project. We started using the current bitvector API, but when the discussed refactor was proposed we stopped until it became clear which representation mathlib would use.

I did not necessarily mean to suggest postponing this PR, I would be happy to have bitvectors in STD ASAP, I merely wanted to point out the existing efforts.

The refactor also got discussed on Zulip (https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/Proposed.20refactor.20of.20Bitvec).

leodemoura · 2023-10-15T14:30:48Z

@alexkeizer Thanks for sharing. The good news is that things will start moving really fast here in Std :) Our plans are

Std should be great for software/hardware verification projects.
Std should be great for software development, and should provide a library with the same capabilities of the Rust std.
Better automation for software/hardware verification projects using Std.

We're lucky to have Joe Hendrix (@joehendrix) on board with us at Lean FRO. He's a pro when it comes to software verification and development. Joe's got our back in hitting our goals and will help us get those PRs merged super fast.

The perf improvement will be in core. See: leanprover/lean4@de719c9

We now have the missing proofs `Nat.mod_le` and `Nat.div_le_self` in core. See: leanprover-community/batteries#286 (comment)

digama0 · 2023-10-16T05:48:04Z

I pushed some changes. Summary:

Added more documentation to all of the operations
General style guide things (indentation etc)
Simplified some of the definitions
Added fill as a version of replicate that just splats a bool across the vector
Changed LT and LE to delegate to the Fin LT/LE (which delegates to the Nat one), instead of using the Bool function which will cause spurious decide (x < y) = true things in proofs using these ops
Clarified things like what zero extension, left/right shift, and other bitwise ops mean in terms of bit positions and also numbers where applicable. I tried to avoid saying things like "the left side" in favor of "the low bits" to avoid ambiguity depending on how the user's mental picture is laid out. (The Rust stdlib also uses this guideline, because RTL languages exist too.)
Renamed shrink to truncate, because shrink is not particularly explanatory regarding how the shrinking is supposed to happen (are we losing high bits or low bits?), while "truncate" is more consistently used for dropping high bits in integer conversion.
shrink and zeroExtend are actually aliases, but they have documentation which focuses on the case in the name.

I did not make any changes to the things under discussion (big endian, the # notation).

Std/Data/BitVec/Basic.lean

eric-wieser · 2023-10-16T07:55:22Z

Std/Data/BitVec/Basic.lean

+  mk ::
+  /-- Interpret a bitvector as a number less than `2^w`.
+  O(1), because we use `Fin` as the internal representation of a bitvector. -/
+  val : Fin (2^w)


Given the multitude of possible representations, would calling mk/val as ofFin / toFin instead be reasonable? That would make them feel on equal footing with ofBits / toBits or similar.

I like that idea.

joehendrix · 2023-10-16T13:11:16Z

@digama0 I personally think the comment changes and cleanups improve readability so thanks.

One interesting thing is that most-significant being on the left is part of our Hindu Arabic numeral system -- even traditionally right-to-left languages have most-significant digits on the left.

I don't want the bit order discussion to block this PR from being merged, so if you haven't been persuaded I am ok with avoiding committing to a bit order in this PR.

digama0 · 2023-10-16T21:40:55Z

As things stand currently, the only place where big endian has to be explicitly acknowledged is in the argument order for append, and actually we might also want the reversed argument order not simply because of endianness but also because it would have the addition in the opposite order, which would make it a bit easier re: dependent typing in some cases (note that there are some uses of Nat.add_comm .. ▸ (v1 ++ v2) in this PR, and probably even more in usage code). Given that we are explicitly acknowledging that these functions are intended to be compatible with SMT-lib and the main functions have cross references to it, I don't mind keeping this argument order since we'd need a SMT-lib compatible version anyway.

In general, dependent typing hell will definitely arise when working with this API when the bitwidths are not numeric literals, so we probably need to do something to help with that (although as usual it is difficult to predict in advance what that might be). One thing is that we probably want a ofEq function for coercing between bitvector types with equal but not defeq bitwidth; mathlib makes a point of defining these for all dependent type families because they compute better than Eq.rec terms.

joehendrix · 2023-10-16T22:49:37Z

I've pushed ofFin/toFin changes and a change to use hex for rendering bitvectors. I don't know if people will always like that, but is is easy to convert to a Nat if one wants a decimal rendering.

digama0 I can add a ofEq. Is the conventional signature something like ofEq : u = v -> BitVec u -> BitVec v?

digama0 · 2023-10-16T22:54:29Z

I've pushed ofFin/toFin changes and a change to use hex for rendering bitvectors. I don't know if people will always like that, but is is easy to convert to a Nat if one wants a decimal rendering.

Actually I was thinking you might want binary rendering. I can't imagine you would ever want decimal, that seems exceedingly unhelpful for understanding a bitvector value. But binary is a bit too verbose and is probably not suitable for more than 4-8 bits. You could have a delaborator option to control this.

alexkeizer · 2023-10-17T09:18:07Z

I've pushed ofFin/toFin changes and a change to use hex for rendering bitvectors. I don't know if people will always like that, but is is easy to convert to a Nat if one wants a decimal rendering.

digama0 I can add a ofEq. Is the conventional signature something like ofEq : u = v -> BitVec u -> BitVec v?

The signature is right, but I think these kind of functions are usually called cast, as in https://leanprover-community.github.io/mathlib4_docs/Std/Data/Fin/Basic.html#Fin.cast

* Introduce signed operations. * Rename unsigned operations to indicate unsigned nature. * Introduce shiftRight and complement integer bitwise operations. * Rename byte-order dependent operations to clarify endianess. * Add examples to documentation * Add basic test cases.

Std/Data/BitVec/Basic.lean

Std/Data/Int/Basic.lean

digama0 · 2023-10-18T01:25:28Z

Std/Data/BitVec/Basic.lean

+  if t.msb then
+    if s.msb then
+      .udiv (.neg s) (.neg t)
+    else
+      .neg (.udiv s (.neg t))
+  else
+    if s.msb then
+      .neg (.udiv (.neg s) t)
+    else
+      .udiv s t


Suggested change

if t.msb then

if s.msb then

.udiv (.neg s) (.neg t)

else

.neg (.udiv s (.neg t))

else

if s.msb then

.neg (.udiv (.neg s) t)

else

.udiv s t

match s.msb, t.msb with

| true, true => .udiv (.neg s) (.neg t)

| false, true => .neg (.udiv s (.neg t))

| true, false => .neg (.udiv (.neg s) t)

| false, false => .udiv s t

ditto for smt_sdiv, srem, smod

Std/Data/BitVec/Basic.lean

Std/Data/BitVec.lean

Co-authored-by: Mario Carneiro <di.gama@gmail.com>

digama0 · 2023-10-18T02:09:47Z

Thanks @alexkeizer , @leodemoura , @joehendrix and everyone else for working on this. I think it looks really good now, and although there is still more to do (toList, split, lemmas, automation) this looks like a good starting point from which to start getting user feedback and iterate further.

We now have the missing proofs `Nat.mod_le` and `Nat.div_le_self` in core. See: leanprover-community/batteries#286 (comment)

This library was started by Joe Hendrix and Wojciech Nawrocki. It is crucial for software and hardware verification projects. We need it in `Std`. Moreover `lean-auto` project needs it. --------- Co-authored-by: Scott Morrison <scott.morrison@gmail.com> Co-authored-by: Mario Carneiro <di.gama@gmail.com> Co-authored-by: Joe Hendrix <joe@lean-fro.org>

feat: add basic BitVec library

803c082

This library was started by Joe Hendrix and Wojciech Nawrocki. It is crucial for software and hardware verification projects. We need it in `Std`. Moreover `lean-auto` project needs it.

leodemoura force-pushed the bitvec branch from b9f01c0 to 803c082 Compare October 14, 2023 19:22

kim-em reviewed Oct 14, 2023

View reviewed changes

Std/Data/BitVec/Basic.lean Outdated Show resolved Hide resolved

kim-em reviewed Oct 14, 2023

View reviewed changes

Std/Data/BitVec/Basic.lean Outdated Show resolved Hide resolved

kim-em reviewed Oct 14, 2023

View reviewed changes

Std/Data/BitVec/Basic.lean Outdated Show resolved Hide resolved

kim-em reviewed Oct 14, 2023

View reviewed changes

Std/Data/BitVec/Basic.lean Outdated Show resolved Hide resolved

kim-em reviewed Oct 14, 2023

View reviewed changes

Std/Data/BitVec/Basic.lean Outdated Show resolved Hide resolved

kim-em mentioned this pull request Oct 15, 2023

[Merged by Bors] - refactor: replace Bitvec with Std.BitVec leanprover-community/mathlib4#5920

Closed

1 task

chore: update Std/Data/BitVec/Basic.lean

9b9399f

Co-authored-by: Scott Morrison <scott.morrison@gmail.com>

leodemoura force-pushed the bitvec branch from 03c575f to 9b9399f Compare October 15, 2023 02:52

eric-wieser reviewed Oct 15, 2023

View reviewed changes

Std/Data/BitVec/Basic.lean Outdated Show resolved Hide resolved

eric-wieser reviewed Oct 15, 2023

View reviewed changes

Std/Data/BitVec/Basic.lean Outdated Show resolved Hide resolved

chore: use Fin.div and Fin.mod

cf2022d

The perf improvement will be in core. See: leanprover/lean4@de719c9

leodemoura added a commit to leanprover/lean4 that referenced this pull request Oct 15, 2023

chore: remove unnecessary % operations at Fin.mod and Fin.div

de719c9

We now have the missing proofs `Nat.mod_le` and `Nat.div_le_self` in core. See: leanprover-community/batteries#286 (comment)

leodemoura mentioned this pull request Oct 15, 2023

chore: remove unnecessary % operations at Fin.mod and Fin.div leanprover/lean4#2688

Merged

chore: rename arguments as suggested by @eric-wieser

fabf97f

eric-wieser reviewed Oct 16, 2023

View reviewed changes

Std/Data/BitVec/Basic.lean Outdated Show resolved Hide resolved

eric-wieser reviewed Oct 16, 2023

View reviewed changes

Update Std/Data/BitVec/Basic.lean

ee92901

Joe Hendrix added 2 commits October 16, 2023 14:54

chore: Replace mk/val with ofFin/`toFin in BitVec

ea68dae

chore: Use hex for rendering BitVec

2b5422e

Joe Hendrix added 2 commits October 17, 2023 17:18

Add cast operation

85ac9a0

digama0 reviewed Oct 18, 2023

View reviewed changes

Joe Hendrix added 2 commits October 17, 2023 17:45

chore: Fix inadvertent addition in last commit.

6143102

chore: Fix cast line length.

3b4fdff

digama0 reviewed Oct 18, 2023

View reviewed changes

chore: Various bitvec cleanups

1b19b1b

digama0 reviewed Oct 18, 2023

View reviewed changes

Std/Data/BitVec/Basic.lean Outdated Show resolved Hide resolved

digama0 reviewed Oct 18, 2023

View reviewed changes

Std/Data/BitVec/Basic.lean Outdated Show resolved Hide resolved

digama0 reviewed Oct 18, 2023

View reviewed changes

Std/Data/BitVec/Basic.lean Outdated Show resolved Hide resolved

chore: More BitVec cleanup

b1cfae6

digama0 reviewed Oct 18, 2023

View reviewed changes

Std/Data/BitVec.lean Outdated Show resolved Hide resolved

joehendrix and others added 2 commits October 17, 2023 18:56

Remove Std/Data/BitVec.lean copyright header.

5cf4322

Co-authored-by: Mario Carneiro <di.gama@gmail.com>

chore: Cleanup BitVec.sdiv definition.

4e3400c

digama0 merged commit a4b2584 into main Oct 18, 2023

digama0 deleted the bitvec branch October 18, 2023 02:10

leodemoura added a commit to leanprover/lean4 that referenced this pull request Oct 21, 2023

chore: remove unnecessary % operations at Fin.mod and Fin.div

9d02e0e

We now have the missing proofs `Nat.mod_le` and `Nat.div_le_self` in core. See: leanprover-community/batteries#286 (comment)

	syntax:max num noWs "#" noWs num : term
	scoped syntax:max num noWs "#" noWs num : term

	new bitvector of size `i - j + 1`
	new bitvector of size `i - j`

Conversation

leodemoura commented Oct 14, 2023

Uh oh!

leodemoura commented Oct 14, 2023

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

digama0 Oct 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alexkeizer commented Oct 15, 2023

Uh oh!

digama0 commented Oct 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leodemoura commented Oct 15, 2023

Uh oh!

digama0 commented Oct 15, 2023

Uh oh!

PratherConid commented Oct 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

digama0 commented Oct 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

PratherConid commented Oct 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leodemoura commented Oct 15, 2023

Uh oh!

PratherConid commented Oct 15, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser Oct 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

alexkeizer commented Oct 15, 2023

Uh oh!

leodemoura commented Oct 15, 2023

Uh oh!

digama0 commented Oct 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

joehendrix commented Oct 16, 2023

Uh oh!

digama0 commented Oct 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joehendrix commented Oct 16, 2023

digama0 Oct 15, 2023 •

edited

Loading

digama0 commented Oct 15, 2023 •

edited

Loading

PratherConid commented Oct 15, 2023 •

edited

Loading

digama0 commented Oct 15, 2023 •

edited

Loading

PratherConid commented Oct 15, 2023 •

edited

Loading

eric-wieser Oct 15, 2023 •

edited

Loading

digama0 commented Oct 16, 2023 •

edited

Loading

digama0 commented Oct 16, 2023 •

edited

Loading

digama0 commented Oct 16, 2023 •

edited

Loading