Skip to content

Comparison with other compiler frameworks

Vaivaswatha N edited this page Aug 16, 2024 · 7 revisions

A Comparison of pliron with other compiler frameworks

In the following sections, I compare some ideas in pliron with that of

  • The sway compiler: Sway is a domain-specific language inspired by Rust.
  • abstraps: A project with similar goals as pliron (i.e., an MLIR like extensible compiler framework), but not actively maintained at the time of writing this article.
  • cranelift An optimizing compiler written in Rust. Also an alternate backend for the Rust compiler (with the primary backend, today, being LLVM itself).

There also exists Melior, a project providing Rust bindings for MLIR, as a wrapper around MLIR's C bindings. Melior uses unsafe Rust.

In the following discussion, the terms "operation" and "instruction" are mostly used interchangeably. Traditional LLVM compilers use the term "instruction" while MLIR uses "operation" (but with more generalised semantics).

The article assumes that the reader is reasonably aware of the basic design ideas of MLIR and uses terms with that assumption.

Representing Program Entities

A program representation requires different entities, such as instructions (operations), basic blocks, etc, to be part of it, with each entities referring to others. Rust's borrowing rules make it difficult to traverse and transform the program if it were to be represented directly with Rust references.

abstraps uses a type wrapper around usize to represent Vars in the program. Operations in abstraps, like in LLVM, produce a single result, and the integer associated with that operation result also represents the operation itself. At the region level, a map is maintained to get reference to an Operation from the Var that represents its result.

cranelift uses custom vector based data structures to store entities, representing them with type safe wrappers around indices to these vectors. Entities are stored per Function. The sway compiler uses the generational-arena crate to store various entities, with type safe wrappers around the arena index to represent them. A Context data structure holds all entities in the program (i.e., at a global level). The context is passed around for most IR operations.

cranelift mentions the tradeoffs between its custom data-structures vs using generational-arena.

pliron uses generational-arena to represent entities. Similar to the sway compiler , a Context data structure holds all arenas. The similarity ends there though.

Type safe arena indices are implemented using Ptr<T: ArenaObj>. These Ptr objects are light-weight and hence Copyable. An entity needs to implement the ArenaObj trait, which essentially provides a mechanism to get the specific arena from Context. Thereafter, type-safe "dereferencing" of Ptr objects are automatically provided. ArenaObj also handles deallocation of contained entities and clearing references from other external entities, provided that the trait methods are correctly implemented.

Entities stored in the arena are wrapped with RefCell allowing safe mutation of individual entities.

Representing Lists

Most non-Rust compilers implement sequences of instructions (operations) in a basic block, or the sequence of basic blocks in a function (region in general) as linked lists. The sway compiler, abstraps and rustc all simply use Vec instead. This is quite inefficient for many compiler transforms, which require inserting and deleting any instruction or basic block, with just a reference to that instruction or block. cranelift on the other hand implements Layout, which implements linked-list operations for both basic blocks and instructions. pliron also implements a linked list via the LinkedList trait, which provides a common linked list implementation for any ArenaObj entity, and currently used for operations and basic blocks.

Def-Use Chains

Given a value definition (block argument, operation result), what are its uses? This is a fundamental query used by many compiler transformations and analyses. This information is part of a data-structure called def-use chains. Traditional compilers computed this on-demand using data-flow analyses. Modern SSA compilers like LLVM store always-available and always-valid def-use info as part of the IR itself.

cranelift, although not providing def-use chains (at least from what I could understand), allows value defs to alias to other values. This provides an efficient way to replace_all_uses_with, which is a common use of def-use chains, to replace all uses of a value with that of another value. Lookups always go down the alias chain (possibly shortening them for efficiency). All of this functionality is provided by the Data Flow Graph data-structure.

The sway compiler provides a replace_all_uses_with functionality that takes as input, a map from original values to the replacement values, and does all replacements with one traversal of the function. There is no real def-use information available. abstraps does not, yet, implement any functionality related to def-use information.

pliron, similar to LLVM (and consequently MLIR), provides first-class support for def-use chains.

The Type System

The sway compiler and cranelift, similar to LLVM, provides a rich, but non-extensible set of pre-defined types. A type system does not seem to be part of the core ir of abstraps, but instead has dialect specific types, referred to as lattice. These are used by the abstract interpreter that is part of abstraps.

pliron's type system is modelled based on that of MLIR's and allows extending the type system to allow any arbitrary abstraction as a type.

Any Rust type can be a type in the pliron infrastructure by implementing the Type trait. Also similar to MLIR's types, pliron's types are uniqued.

It may help the reader visualise this better by referring to the definitions of some builtin types. There is nothing special about these builtin types, except that they are part of a dialect named builtin. Types implement ArenaObj and are thus referred to using Ptr<TypeObj>.

Types are immutable once constructed (and hence can be uniqued). The exception is when the type chooses to manually implement Hash, skipping certain components from the hash. This allows building recursive types. In such a case, uniquing happens based on what components are part of the Hash. See the StructType in the llvm dialect, which allows building types such as LinkedList { data: i64, next: llvm.ptr<LinkedList>, }.

Type objects (i.e, objects implementing the Type trait) can be downcasted to a reference to their concrete Rust types.

Operations

cranelift and the sway compiler's instructions are similar to those of LLVM, i.e., a closed set and not extensible. abstraps and pliron both have an MLIR like Operation, which is a general container for operation/instruction data (results, attributes and operands). Specific Ops (opcode) are light wrappers around an Operation.

abstraps implements this a little differently by having an intrinsic trait object contained in the Operation.

pliron is more similar to MLIR. An Op trait object is a thin wrapper around a Ptr<Operation> and provides opcode specific API on the Operation.

Operations have an OpId (essentially a text identifier for that opcode, similar to OpName in MLIR) that is provided when the Op is registered.

It is possible to convert b/w a Ptr<Operation> and a reference to a specific Op, again similar to how it works in MLIR.

Attributes

LLVM, cranelift and the sway compiler do not have Attributes. Like in MLIR, abstraps and pliron both provide an Attribute trait, allowing any Rust type to implement this. Attributes are non-SSA data that can be stored in Operations. Unlike MLIR, pliron also provides an attribute dictionary at the BasicBlock level.

Unlike MLIR, we do not unique Attributes in pliron (although it would be simple to do so, and was in fact done in an earlier version). Attributes are owned by its container (Operation / BasicBlock) and can be mutated if necessary. This makes Attributes in pliron similar to the proposed Properties in MLIR.

Trait objects of the Attribute trait can be downcasted to their concrete Rust type.

Interfaces

LLVM, cranelift and the sway compiler do not have the concept of interfaces. Interfaces are an idea that is novel to extensible compilers like MLIR. Both abstraps and pliron provide interface functionality with similar external behaviour, but implemented differently.

What MLIR terms as Traits and Interfaces are both provided and implemented as Interfaces in pliron. Interfaces are just Rust traits with either of Op, Attribute or Type being a super trait. A concrete Op, Attribute or Type can choose to impl an interface. It is possible to check, at runtime, whether an op, attribute or type objectimpls a particular interface and cast it to a trait object of that interface.

abstraps on the other hand provides similar functionality via its own implementation of vtables to keep track of trait objects.

Details on the implementation of interfaces, casting to interface objects and verifying them are provided in the interfaces article.