-
Notifications
You must be signed in to change notification settings - Fork 30
Comparison with other compiler frameworks
In the following sections, I compare some ideas in pliron with
that of
- The sway compiler: Sway is a domain-specific language inspired by Rust.
-
abstraps: A project with
similar goals as
pliron(i.e., an MLIR like extensible compiler framework), but not actively maintained at the time of writing this article. - cranelift An optimizing compiler written in Rust. Also an alternate backend for the Rust compiler (with the primary backend, today, being LLVM itself).
There also exists Melior,
a project providing Rust bindings for MLIR, as a wrapper around
MLIR's C bindings. Melior uses unsafe Rust.
In the following discussion, the terms "operation" and "instruction" are mostly used interchangeably. Traditional LLVM compilers use the term "instruction" while MLIR uses "operation" (but with more generalised semantics).
The article assumes that the reader is reasonably aware of the basic design ideas of MLIR and uses terms with that assumption.
A program representation requires different entities, such as instructions (operations), basic blocks, etc, to be part of it, with each entities referring to others. Rust's borrowing rules make it difficult to traverse and transform the program if it were to be represented directly with Rust references.
abstraps uses a type wrapper around usize to represent
Vars
in the program. Operations in abstraps, like in LLVM,
produce a single result, and the integer associated with that
operation result also represents the operation itself. At the region
level, a map
is maintained to get reference to an Operation from the Var
that represents its result.
cranelift uses custom vector based data structures
to store entities, representing them with type safe wrappers
around indices to these vectors. Entities are stored per Function.
The sway compiler uses the generational-arena
crate to store various entities, with type safe wrappers around the
arena index to represent them. A Context data structure holds all
entities in the program (i.e., at a global level). The context is
passed around for most IR operations.
cranelift mentions the tradeoffs between its custom data-structures
vs using generational-arena.
pliron uses generational-arena to represent entities. Similar
to the sway compiler , a Context data structure holds all arenas.
The similarity ends there though.
Type safe arena indices are implemented using
Ptr<T: ArenaObj>.
These Ptr objects are light-weight and hence Copyable. An entity
needs to implement the ArenaObj trait, which essentially provides
a mechanism to get the specific arena from Context. Thereafter,
type-safe "dereferencing" of Ptr objects are automatically provided.
ArenaObj also handles deallocation of contained
entities and clearing references from other external entities,
provided that the trait methods are correctly implemented.
Entities stored in the arena are wrapped with RefCell allowing safe mutation of individual entities.
Most non-Rust compilers implement sequences of instructions (operations)
in a basic block, or the sequence of basic blocks
in a function (region in general) as linked lists. The sway
compiler, abstraps and rustc all simply use Vec instead. This is quite
inefficient for many compiler transforms, which require inserting
and deleting any instruction or basic block, with just a reference
to that instruction or block. cranelift on the other hand
implements Layout, which implements linked-list operations for both
basic blocks and instructions. pliron also implements a linked
list via the LinkedList
trait, which provides a common linked list implementation for any ArenaObj
entity, and currently used for operations and basic blocks.
Given a value definition (block argument, operation result), what are its uses? This is a fundamental query used by many compiler transformations and analyses. This information is part of a data-structure called def-use chains. Traditional compilers computed this on-demand using data-flow analyses. Modern SSA compilers like LLVM store always-available and always-valid def-use info as part of the IR itself.
cranelift, although not providing def-use chains (at least from
what I could understand), allows value defs to alias to other values.
This provides an efficient way to replace_all_uses_with, which is
a common use of def-use chains, to replace all uses of a value with
that of another value. Lookups always go down the alias chain
(possibly shortening them for efficiency). All of this functionality
is provided by the Data Flow Graph data-structure.
The sway compiler provides a replace_all_uses_with functionality
that takes as input, a map from original values to the replacement
values, and does all replacements with one traversal of the function.
There is no real def-use information available. abstraps does not,
yet, implement any functionality related to def-use information.
pliron, similar to LLVM (and consequently MLIR), provides
first-class support for def-use chains.
The sway compiler and cranelift, similar to LLVM, provides
a rich, but non-extensible set of pre-defined types. A type system
does not seem to be part of the core ir of abstraps, but instead has dialect specific types, referred to as
lattice. These are used by the abstract interpreter that is part of abstraps.
pliron's type system is modelled based on that of MLIR's and allows extending the type system
to allow any arbitrary abstraction as a type.
Any Rust type can be a type in the pliron infrastructure by
implementing the Type trait.
Also similar to MLIR's types, pliron's types are
uniqued.
It may help the reader visualise this better by referring to the
definitions of some builtin types.
There is nothing special about these builtin
types, except that they are part of a dialect named builtin.
Types implement ArenaObj and are thus referred to using
Ptr<TypeObj>.
Types are immutable once constructed (and hence can be uniqued).
The exception is when the type chooses to manually implement Hash,
skipping certain components from the hash. This allows building recursive types.
In such a case, uniquing happens based on what components are part of the Hash.
See the StructType
in the llvm dialect, which allows building types such as
LinkedList { data: i64, next: llvm.ptr<LinkedList>, }.
Type objects (i.e, objects implementing the Type trait) can be
downcasted to a reference to their concrete Rust types.
cranelift and the sway compiler's instructions are similar to
those of LLVM, i.e., a closed set and not extensible.
abstraps
and pliron
both have an MLIR like Operation, which is a general
container for operation/instruction data (results, attributes and
operands). Specific Ops (opcode) are light wrappers around an
Operation.
abstraps implements this a little differently by having an
intrinsic trait object contained in the Operation.
pliron is more similar to MLIR. An Op
trait object is a thin wrapper around a Ptr<Operation> and
provides opcode specific API on the Operation.
Operations have an OpId (essentially a text identifier for that
opcode, similar to OpName in MLIR) that is provided when the Op
is registered.
It is possible to convert b/w a Ptr<Operation> and a reference
to a specific Op, again similar to how it works in MLIR.
LLVM, cranelift and the sway compiler do not have Attributes.
Like in MLIR, abstraps and pliron both provide an Attribute
trait, allowing any Rust type to implement this. Attributes are
non-SSA data that can be stored in Operations. Unlike MLIR, pliron
also provides an attribute dictionary at the BasicBlock level.
Unlike MLIR, we do not unique Attributes in pliron (although
it would be simple to do so, and was in fact done in an earlier
version). Attributes are owned by its container (Operation /
BasicBlock) and can be mutated if necessary. This makes
Attributes in pliron similar to the proposed
Properties
in MLIR.
Trait objects of the Attribute trait can be downcasted to their
concrete Rust type.
LLVM, cranelift and the sway compiler do not have the concept
of interfaces. Interfaces are an idea that is novel to extensible
compilers like MLIR. Both abstraps and pliron provide interface
functionality with similar external behaviour, but implemented differently.
What MLIR terms as Traits and Interfaces are both provided and
implemented as Interfaces in pliron. Interfaces are just Rust traits
with either of Op, Attribute or Type being a super trait. A concrete
Op, Attribute or Type can choose to impl an interface. It is
possible to check, at runtime, whether an op, attribute or type objectimpls
a particular interface and cast it to a trait object of that interface.
abstraps on the other hand provides similar functionality via
its own implementation
of vtables to keep track of trait objects.
Details on the implementation of interfaces, casting to interface objects and verifying them are provided in the interfaces article.