Skip to content

Proposal: evolve RyuJIT emitter to improve flexibility #80589

@BruceForstall

Description

@BruceForstall

The RyuJIT emitter uses a machine-level program representation which is not designed for modification after it is generated, such as modification due to peephole optimization or other low-level code transformations. It has long been known that certain "easy" optimizations could be made by adding peephole optimization capability, and a few specific cases have been added with severe limitations due to the current representation. This issue describes some of the problems and proposes incremental changes to increase its flexibility.

Current status

RyuJIT IR (internal representation) for representing a program has three basic forms:

  • tree-based GenTrees, sometimes referred to as HIR (high-level IR) to distinguish it from LIR and reuse the terminology used in some compiler literature
  • linear GenTree LIR (linear IR, or low-level IR), used after Lowering
  • Instruction Groups + Instruction Descriptors (IG + ID; in code: insGroup + instrDesc). I propose to call this MIR for machine-level IR, since it represents machine instructions and arguments that can be used for binary encoding. (Note that some compiler literature instead uses MIR to stand for "medium-level IR".)

In RyuJIT, MIR is a single-linked list of IGs, each of which contains a packed memory buffer of variable-sized instrDescs. There is no facility to navigate to a previous instrDesc from an instrDesc pointer, delete an instrDesc, or insert a new one. An IG roughly corresponds to a BasicBlock in LIR and stores the GC state that exists at the beginning of the IG. Each instrDesc encodes enough information to describe how the instruction affects GC state.

During code generation (codegen), MIR is incrementally created by iterating over LIR and generating machine instructions as IG/ID lists. After codegen, branch tightening and alignment processing occurs. After this point, we know how much memory to request from the VM, and the final memory space for the generated code is allocated. MIR is traversed, encoding instructions into this memory, and also computing and updating GC information, collecting GC state changes at final instruction offsets to hand back to the VM.

Final instruction offsets are needed for other needs as well: (1) IP->native mapping, (2) variable live ranges, (3) unwind information.

One key mechanism used to compute final instruction offsets is the emitLocation type. During code generation, any time a location in the final code stream is needed, an emitLocation is created. It's basically a pointer to the current or specified location of generated instrDescs. Crucially, it can give the actual code offset of this location after branch tightening has changed the initial estimated instruction offsets and we have final offsets.

However, making changes to the instrDesc instruction stream after an emitLocation has captured a particular location might invalidate the emitLocation, thus requiring anyone making such modification to find and update all the appropriate captured emitLocation, which is currently difficult and costly. Thus, we currently do not have any machine-level optimizations that remove any already-generated instructions.

Proposal: Remove emitLocation

To improve emitter flexibility, eliminate the use of emitLocation. These are all cases where the code needs a final instruction offset, but during codegen, that is not available. Instead of emitLocation, create new instrDesc representing the thing for which location information is needed, such as an IP->native mapping or the birth or death of a variable. These "pseudo-instructions" (or "MIR nodes") would be represented within the instrDesc list but would not generate any code. Branch tightening would leave them alone, but they would retain their relative position. Then, either before or after instruction encoding, as needed, walk the MIR to process these pseudo-instructions as necessary, such as building up and reporting variable live range data structures.

With this, a peephole optimization would be free to delete an instruction without worrying about updating any "pointers" to the instruction being removed. The optimizations would need to be aware of non-code MIR nodes, however, and adjust behavior appropriately. This is good, though, as those nodes represent semantics that would need to be respected by the optimization.

Proposal: Support backwards navigation

To support peephole optimizations, the JIT currently maintains a "last instruction" pointer (and soon might maintain a larger, though fixed-size and limited, history of "last instruction" pointers to support looking backwards more than a single instruction).

An alternative is to introduce a "previous" pointer to the instrDesc.

Currently, given an instrDesc*, you can use emitAdvanceInstrDesc() or emitNextID() to move to the next instrDesc in the list. To move backwards, we could add a byte to an instrDesc that specifies the number of bytes to subtract from an instrDesc* to reach the previous one, or zero if there is no previous one in the ID list for this IG. This presumes that all instrDesc are less than 256 bytes (which I believe is true, but otherwise we could use a short). To support going backwards across IGs, we would need to add an insGroup* igPrev "previous" pointer to insGroup. Then, add emitter::emitPrevInstrDesc(insGroup** ig, instrDesc** id) for navigation.

Alternatives

This proposal is an incremental change over the existing RyuJIT data structures. An interesting alternative is to scrap using IG/ID as MIR altogether, and reuse the existing linear GenTree representation, and LIR operations, instead. This would be done by creating new machine-level GenTree nodes, with machine-level instructions / opcodes and formats, much as is currently stored in instrDescs. Codegen would transform from one kind of GenTree to another.

This would re-use much existing mechanism, and possibly make it easier to do something like re-use our SSA package on machine-level IR. However, transforming the current code to this would require care to avoid rewriting a significant amount of very complicated instruction encoding logic. Also, memory usage and throughput would need consideration.

Metadata

Metadata

Assignees

No one assigned

    Labels

    area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions