[WIP / RFC] Cranelift: Basic support for EGraph roundtripping. #4249
Conversation
This is a work-in-progress, and meant to sketch the direction I've been thinking in for a mid-end framework. A proper BA RFC will come soon.

This PR builds a phase in the optimization pipeline that converts a CLIF CFG into an egraph representing the function body. Each node represents an original instruction or operator. The "skeleton" of side-effecting instructions is retained, but non-side-effecting (pure) operators are allowed to "float": the egraph will naturally deduplicate them during the build, and we determine their proper placement when we convert back to a CFG representation.

The conversion from the egraph back to the CFG is done via a new algorithm I call "scoped elaboration". The basic idea is to do a preorder traversal of the domtree and, at each level, evaluate the values of the eclasses called upon by the side-effect skeleton, memoizing in an eclass-to-SSA-value map. This map is a scoped hashmap, with a scope at each domtree level. In this way, (i) when a value is computed in a location that dominates another instance of that value, the first replaces the second; but (ii) we never produce "partially dead" computations, i.e., we never hoist to a level in the domtree where a node is not "anticipated" (always eventually computed). This exactly matches what GVN does today. With a small tweak, it can also subsume LICM: we need to be loop-nest-aware in our recursive eclass elaboration, and potentially place nodes higher up the domtree (and higher up in the scoped hashmap).

Unlike what I had been thinking in Monday's meeting, this produces CLIF out of the egraph and then allows that to be lowered. It's overall simpler and a better starting point (thanks @abrown for tipping me over the edge on this). The way it produces CLIF now could be made more efficient: it could reuse instructions already in the DFG for nodes that are *not* duplicated (likely most of them) rather than clearing everything and repopulating.
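A minimal sketch of the scoped-elaboration idea, with toy stand-in types (`ScopedMap`, `elaborate`, and all names here are illustrative, not this PR's actual API): a preorder domtree walk pushes a scope per level, memoizes eclass-to-value bindings, and reuses a binding only when it was created in a dominating (i.e., still-open) scope.

```rust
use std::collections::HashMap;

// Toy stand-ins for the real CLIF/egraph types; purely illustrative.
type EclassId = u32;
type Value = u32;
type Block = usize;

/// A scoped hashmap: each domtree level pushes a scope on entry and
/// pops it on exit, forgetting bindings that only dominate the subtree.
struct ScopedMap {
    map: HashMap<EclassId, Value>,
    scopes: Vec<Vec<EclassId>>, // keys inserted per scope, for undo
}

impl ScopedMap {
    fn new() -> Self {
        ScopedMap { map: HashMap::new(), scopes: Vec::new() }
    }
    fn push_scope(&mut self) {
        self.scopes.push(Vec::new());
    }
    fn pop_scope(&mut self) {
        for key in self.scopes.pop().unwrap() {
            self.map.remove(&key);
        }
    }
    fn get(&self, key: EclassId) -> Option<Value> {
        self.map.get(&key).copied()
    }
    fn insert(&mut self, key: EclassId, value: Value) {
        self.map.insert(key, value);
        self.scopes.last_mut().unwrap().push(key);
    }
}

/// Preorder domtree walk: for each block, elaborate the eclasses its
/// side-effect skeleton demands. A map hit means a dominating block
/// already computed the value (this is GVN); a miss materializes it
/// here, so nothing is ever hoisted to a non-dominating position.
fn elaborate(
    block: Block,
    children: &[Vec<Block>],
    demanded: &[Vec<EclassId>],
    map: &mut ScopedMap,
    emitted: &mut Vec<(Block, EclassId)>,
    next_value: &mut Value,
) {
    map.push_scope();
    for &eclass in &demanded[block] {
        if map.get(eclass).is_none() {
            emitted.push((block, eclass));
            map.insert(eclass, *next_value);
            *next_value += 1;
        }
    }
    for &child in &children[block] {
        elaborate(child, children, demanded, map, emitted, next_value);
    }
    map.pop_scope();
}
```

With a domtree 0 → {1, 2} where the sibling blocks 1 and 2 both demand the same eclass, the value is materialized in each sibling rather than hoisted into block 0, matching the "no partially dead computations" property.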
This PR does *not* do anything to actually rewrite in the egraph. That's the next step! I need to work out exactly how to integrate ISLE with some sort of rewrite machinery. I have some ideas about efficient dispatch with an "operand-tree discriminants shape analysis" on the egraph and indexing rules by their matched shape; more to come.
fitzgen left a comment:
Super excited about this and very pleasantly surprised by how small the changes here ended up being (of course, no rules yet, but nonetheless!)
A few nitpicks below.
cranelift/codegen/src/egg.rs (outdated)

    /// subgraph of the egraph that we use for codegen.
    ///
    /// To avoid cycles, we do a cycle-finding DFS as part of
    /// extraction that disqualifies enodes (removes them from
    @@ -0,0 +1,183 @@
    //! Elaboration phase: lowers EGraph back to sequences of operations
    //! in CFG nodes.
I think this would really benefit from an overview of the scoped elaboration algorithm up here at the top.
| } | ||
|
|
||
| for child in domtree.children(block) { | ||
| self.elaborate_block(child, block_params_fn, block_roots_fn, domtree); |
I think this should probably be rewritten to use iteration + an explicit work stack, instead of stack recursion. Just to be more robust in the face of malicious input. I don't think we have any stack recursion anywhere else in Cranelift, afaik.
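The suggested recursion-to-iteration rewrite can be sketched with an explicit stack of visit/leave events, so per-level work (such as pushing and popping a scope) still happens in the right order. Names here are illustrative, not the PR's code:

```rust
/// Work items: enter a block (preorder visit) or leave it (after its
/// dominated subtree is done).
enum Step {
    Visit(usize),
    Leave(usize),
}

/// Iterative preorder over a domtree with explicit enter/leave hooks,
/// avoiding native stack recursion on deep (or malicious) input.
fn preorder_with_scopes(
    root: usize,
    children: &[Vec<usize>],
    mut enter: impl FnMut(usize),
    mut leave: impl FnMut(usize),
) {
    let mut stack = vec![Step::Visit(root)];
    while let Some(step) = stack.pop() {
        match step {
            Step::Visit(block) => {
                enter(block); // e.g., push a scoped-map scope here
                stack.push(Step::Leave(block));
                // Push children reversed so they pop in original order.
                for &child in children[block].iter().rev() {
                    stack.push(Step::Visit(child));
                }
            }
            Step::Leave(block) => leave(block), // e.g., pop the scope
        }
    }
}
```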
Yes for sure, this is the plan!
Incidentally we'll probably want to audit egg itself for this property too...
I know some key parts of egg are structured as a workqueue but I haven't audited it to check if that's true everywhere. In fact there are APIs (such as the Pattern and CostFunction traits) that as I recall have default implementations that are recursive, but can be implemented iteratively if the client code puts in the work.
    if matches!(node, Node::Param { .. }) {
        unreachable!("Param nodes should already be inserted");
    }
This is kinda funky, would be better IMO as
Suggested change:

    assert!(
        !matches!(node, Node::Param { .. }),
        "Param nodes should already be inserted",
    );
which is a little funky because of the `!matches!(...)` double negation, but there could also be a `Node::is_param` method to clean that up.
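A hypothetical `Node::is_param` helper along those lines (the trimmed-down `Node` enum here is purely illustrative):

```rust
// Hypothetical, trimmed-down Node enum purely for illustration.
enum Node {
    Param { index: u32 },
    Pure { op: &'static str },
}

impl Node {
    /// Reads better at call sites than `!matches!(node, Node::Param { .. })`.
    fn is_param(&self) -> bool {
        matches!(self, Node::Param { .. })
    }
}
```

The assertion then becomes `assert!(!node.is_param(), "Param nodes should already be inserted")`.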
    // Is the node a result projection? If so, at this point we
    // have everything we need; no need to allocate a new Value
    // for the result.
    if let Node::Result { value, result, .. } = node {
Nitpick-y for sure, but I'm not a huge fan of the name Result for this type of node. I think Projection would be an improvement and Pick or something along those lines might be even better.
cranelift/codegen/src/egg/extract.rs (outdated)

    let mut best_cost_and_node = None;
    for (i, node) in egraph[id].nodes.iter().enumerate() {
        let this_cost = self.visit_enode(egraph, node);
Similarly, the mutual recursion between visit_enode and visit_eclass should be rewritten with iteration and an explicit stack before we consider merging this.
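One way to avoid the mutual recursion entirely is a fixpoint ("relax until no change") formulation of extraction. This is a sketch over assumed toy types, not the PR's code; a nice side effect is that enodes on a cycle never acquire a finite cost, so cyclic choices are disqualified without a separate DFS:

```rust
/// Toy flattened egraph for illustration: each eclass is a list of
/// enodes; each enode has an intrinsic cost plus child eclasses.
struct ENode {
    cost: u64,
    children: Vec<usize>,
}

/// Fixpoint extraction: iteratively pick the cheapest enode per
/// eclass. No recursion, so no native-stack exhaustion risk.
fn extract(classes: &[Vec<ENode>]) -> Vec<Option<(usize, u64)>> {
    let mut best: Vec<Option<(usize, u64)>> = vec![None; classes.len()];
    let mut changed = true;
    while changed {
        changed = false;
        for (id, nodes) in classes.iter().enumerate() {
            for (i, node) in nodes.iter().enumerate() {
                // An enode is usable once every child eclass has a cost.
                let child_sum: Option<u64> = node
                    .children
                    .iter()
                    .map(|&c| best[c].map(|(_, k)| k))
                    .sum();
                if let Some(sum) = child_sum {
                    let total = node.cost + sum;
                    if best[id].map_or(true, |(_, k)| total < k) {
                        best[id] = Some((i, total));
                        changed = true;
                    }
                }
            }
        }
    }
    best
}
```

Each pass only lowers costs and costs are bounded below, so the loop terminates; a self-referential eclass with no acyclic alternative ends up with no pick at all.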
cranelift/codegen/src/egg/extract.rs (outdated)

    @@ -0,0 +1,90 @@
    //! Extraction phase: pick one enode per eclass, avoiding loops.
This would also benefit from a description of the algorithm here.
| } | ||
|
|
||
| /// Insert a key-value pair if absent, panicking otherwise. | ||
| pub fn insert_if_absent(&mut self, key: K, value: V) { |
Ahhhh okay, so this panics if it is already present. That's a bit confusing, since given the method name I would assume that this silently returns if the entry already exists.
Maybe call this `insert_and_assert_absent` or even just `insert_absent`?
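For illustration, a sketch of the semantics under the proposed rename (the `StrictMap` wrapper and its definition are hypothetical, not the PR's code):

```rust
use std::collections::hash_map::{Entry, HashMap};
use std::fmt::Debug;
use std::hash::Hash;

/// Hypothetical wrapper, only to illustrate the proposed rename:
/// insertion asserts the key is absent; a present entry panics.
struct StrictMap<K, V>(HashMap<K, V>);

impl<K: Hash + Eq + Debug, V> StrictMap<K, V> {
    fn new() -> Self {
        StrictMap(HashMap::new())
    }

    /// The name now says that absence is *asserted*, not that an
    /// existing entry is silently skipped.
    fn insert_and_assert_absent(&mut self, key: K, value: V) {
        match self.0.entry(key) {
            Entry::Vacant(e) => {
                e.insert(value);
            }
            Entry::Occupied(e) => panic!("key already present: {:?}", e.key()),
        }
    }
}
```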
    test optimize
    set opt_level=none
    set use_egraphs=true
Probably worth having a `test egraphs` mode in the long run, since this combination of settings is a bit of a magical incantation.
    ; check: block0(v0: i32, v1: i32):
    ; nextln: v2 = iadd v0, v1
    ; check: block1:
This seems to have had the redundant-phi-elimination pass run on it? Because the egraph isn't doing that yet, right? Another reason to have `test egraph`, so that we aren't pulling in incidental/unrelated changes to these tests.
Yes, I think so. I think we'll actually want to do something equivalent to redundant-phis as a rewrite rule eventually (you're probably thinking along these lines too :-) ); I need to think through how exactly to do this best without introducing cycles in the egraph from the blockparam nodes (or maybe we do, and just ignore during elaboration).
How fast is this code? How much would compile time increase?

I haven't measured yet but I will soon. There are some optimizations I want to do first (e.g., make the egraph-to-CLIF lowering work without rebuilding the whole function body, reusing existing …).
This moves to a strategy based on `hashbrown::raw::RawTable` and "eq-with-context" / "hash-with-context" traits, used to allow nodes to be stored once in a `BumpVec` (which encapsulates a range in a shared `Vec`) and then used in an eclass, with keys in the deduplication hashtable referring to an eclass and enode index in that eclass. This moves back to the enodes-without-IDs strategy used in `egg`, which makes the graph rebuild simpler, but retains the single-storage / no-cloning property.

Along the way, by carrying the eq-with-context / hash-with-context idea through to the `Node` itself, we can remove the `&mut [Id]` and `&mut [Type]` slices and replace them with `BumpVec`s. They could actually be made `NonGrowingBumpVec`s for 8 bytes instead of 12; that is future work.

This should allow algorithmic approaches equivalent to `egg`'s but without the separate allocations; all node allocations are now within the entity-component-system-style large `Vec`s.
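A sketch of the eq-with-context / hash-with-context idea with toy types (trait and type names here are illustrative, not necessarily the commit's exact definitions): because a node only stores a range into a shared pool, equality and hashing must be handed the pool as context.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy node pool: nodes store (start, len) ranges into a shared arg
// vector, so a node alone cannot answer equality or hashing.
struct NodePool {
    args: Vec<u32>,
}

#[derive(Clone, Copy)]
struct Node {
    op: u8,
    args_start: usize,
    args_len: usize,
}

/// Equality that needs a context (here, the pool) to dereference
/// the stored indices.
trait CtxEq<Ctx> {
    fn ctx_eq(&self, other: &Self, ctx: &Ctx) -> bool;
}

/// Hashing with the same context, kept consistent with `CtxEq`.
trait CtxHash<Ctx> {
    fn ctx_hash<H: Hasher>(&self, state: &mut H, ctx: &Ctx);
}

impl CtxEq<NodePool> for Node {
    fn ctx_eq(&self, other: &Self, ctx: &NodePool) -> bool {
        self.op == other.op
            && ctx.args[self.args_start..self.args_start + self.args_len]
                == ctx.args[other.args_start..other.args_start + other.args_len]
    }
}

impl CtxHash<NodePool> for Node {
    fn ctx_hash<H: Hasher>(&self, state: &mut H, ctx: &NodePool) {
        self.op.hash(state);
        ctx.args[self.args_start..self.args_start + self.args_len].hash(state);
    }
}
```

Two nodes pointing at different ranges with identical contents then compare equal and hash identically, which is exactly what a deduplication table keyed on (eclass, enode index) needs.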
…stimated node count
…Slice` instead). This brings the overhead of `wasmtime compile spidermonkey.wasm` with `use_egraphs=true` and `opt_level=none` to 4.3% slower than `opt_level=speed`. (This is an interesting comparison because the egraph build/extract roundtrip subsumes GVN, the main time-sink in today's opt pass.)
…parate ISLE environment.
Lots of TODOs still. The main one is how to deal with *multiple* rewrites arising from a single rule invocation on an eclass. I think what needs to happen is that we have multi-ctors as well as multi-etors; these return a `SmallVec<[T; 8]>` or something of the sort. So then the `simplify` toplevel returns a list of eclass IDs it has built that are equivalent to the original. This means that we then need to have two different ABIs for multi-terms: "eager" (internal multi-ctor) and "lazy" (external multi-etor). But this is workable I think. The egraph implementation needs much more stress-testing as well!
This is necessary to allow the `simplify` rule in the mid-end to return multiple Ids of new e-classes that are equivalent to the original. In general, multiplicity is a property of an extractor or constructor (namely, a single match returns multiple results/tuples), not just an external extractor.
…ops via generation numbers
…ive; working on improving amode lowering.
The ISLE language's lexer previously used a very primitive `i64::from_str_radix` call to parse integer constants, allowing values only in the range -2^63..2^63. Underscores to separate digits (as is allowed in Rust) were not supported, and 128-bit constants were not supported at all. This PR addresses all of the issues above:
- Integer constants are internally stored as 128-bit values.
- Parsing supports either the signed (-2^127..2^127) or the unsigned (0..2^128) range. Negation works independently of that, so one can write `-0xffff..ffff` (128 bits wide, i.e., -(2^128-1)) to get a `1`.
- Underscores are supported to separate groups of digits, so one can write `0xffff_ffff`.
- A minor oversight was fixed: hex constants can start with `0X` (uppercase) as well as `0x`, for consistency with Rust and C.

This PR also adds a new kind of ISLE test that actually runs a driver linked against compiled ISLE code; we previously had no such tests, and they are now quite useful for asserting correct interpretation of constant values.
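The parsing rules above can be sketched as a simplified stand-alone function (not the actual lexer code): strip digit-separating underscores, accept `0x`/`0X` prefixes, store the magnitude in 128 bits, and apply negation by two's-complement wrapping so that negating 2^128 - 1 yields 1.

```rust
/// Simplified sketch of the described constant parsing; returns the
/// 128-bit two's-complement bit pattern of the constant.
fn parse_const(text: &str) -> Option<u128> {
    let (neg, body) = match text.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, text),
    };
    // Underscores are digit separators and carry no value.
    let digits: String = body.chars().filter(|&c| c != '_').collect();
    let magnitude = if let Some(hex) = digits
        .strip_prefix("0x")
        .or_else(|| digits.strip_prefix("0X"))
    {
        u128::from_str_radix(hex, 16).ok()?
    } else {
        digits.parse::<u128>().ok()?
    };
    // Negation wraps mod 2^128, independent of signed/unsigned range.
    Some(if neg { magnitude.wrapping_neg() } else { magnitude })
}
```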
…ms to be down to 1%-ish or so again
…und remaining because some nodes cost zero points
    better optimization, but at the cost of a longer compile time.
    "#,
    false,
    true,
Does this mean it will do egraph optimizations even if opt_level is set to none?
This is a work-in-progress branch and I'm doing some hacking on benchmarking infrastructure; I wanted to name points-in-time by hashes only, without a separate config. We will definitely not enable opts if opts are disabled. Please note the commit comment, and please disregard throwaway commits on my work-in-progress branch; there will be ample opportunity to review when the real PRs come.
I didn't see the commit message, as I was looking at the GitHub diff view for all changes since the last notification.
Superseded by #4953; closing this one!
cc @fitzgen @jameysharp @abrown @avanhatt @mlfbrown