Enhancement: New Inline Assembly #5241

@ghost

Description

Copied over from #215, and inspired by the discussion there. Thanks also to @MasterQ32 and @kubkon for help extending it to support stack machine architectures. See #7561 for standalone assembler improvements.

New Inline Assembly

asm volatile? {bindings}? body? (: post_expression)?

TL;DR: Benefits over Status Quo

  • No mandatory sections -- flexible to any application
  • Components are listed in evaluation order
  • First-class support for stack machine architectures
  • First-class support for floating point, vector, and as yet unforeseen register types
  • Operands have types
  • Named inputs are optional
  • Input/output characteristics fully customisable
  • Not bound to an input/output model
  • Can access program symbols and call functions safely
  • Volatility is inferred in most cases
  • Concise, flexible wildcard syntax
  • Substitution syntax easier to scan and less likely to clash with native symbols
  • Open to architecture-specific extensions
  • Communicates stack-relevant metadata to compiler
  • Can be automatically distinguished from status quo; no sneaky breakages

Stack Machines

This syntax has first-class support for stack machine architectures such as WebAssembly, the JVM, and @MasterQ32's SPU Mk. II. It achieves this with a novel batch-push and batch-pop mechanism for marshaling values between Zig and the stack. Because register and stack machine architectures differ significantly, a new .paradigm() method is defined on builtin.Arch, returning an enum with the variants .register and .stack. (NOTE: supporting stack machines with LLVM is a very hard problem -- maybe defer to stage 2?)
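A minimal Python model of the proposed method. It assumes the wasm32, wasm64 and spu_2 target tags are the stack-machine architectures; the proposal gives no exact list, and Paradigm, STACK_ARCHES and paradigm() are hypothetical names:

```python
from enum import Enum


class Paradigm(Enum):
    REGISTER = "register"
    STACK = "stack"


# Assumption: which targets count as stack machines. The tags mimic Zig's
# builtin.Arch names, but this membership list is illustrative only.
STACK_ARCHES = frozenset({"wasm32", "wasm64", "spu_2"})


def paradigm(arch: str) -> Paradigm:
    """Hypothetical model of the proposed builtin.Arch.paradigm() method."""
    return Paradigm.STACK if arch in STACK_ARCHES else Paradigm.REGISTER


if __name__ == "__main__":
    for arch in ("x86_64", "riscv64", "wasm32"):
        print(arch, paradigm(arch).name)
```

Code generation for an asm block could branch on this value, for instance rejecting ?reg bindings on stack targets.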

Meta

At least one of body or post expression must be present. The expression inherits block/statement status from the post expression if present, and defaults to statement if not.

Volatile

Marks the block as having side effects, so it may not be optimised away even if its value is unused. It is implied by a return type of void or noreturn, or by a mutable symbol binding -- so, in practice, the keyword rarely needs to be written.
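The inference rule can be stated as a small predicate. This Python sketch models a block by its explicit keyword, the Zig type of its value (as a string), and its symbol bindings; all class and field names are assumptions for illustration:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SymbolBinding:
    name: str
    const: bool  # explicitly annotated const, or implied by its wildcard


@dataclass
class AsmBlock:
    explicit_volatile: bool
    return_type: str  # Zig type of the block's value, e.g. "void", "u64"
    symbols: List[SymbolBinding] = field(default_factory=list)


def is_volatile(block: AsmBlock) -> bool:
    """Proposed inference: volatile if written explicitly, if the block's
    value is void or noreturn, or if any symbol is bound mutably."""
    if block.explicit_volatile:
        return True
    if block.return_type in ("void", "noreturn"):
        return True
    return any(not sym.const for sym in block.symbols)
```

So a block whose value is [3]u32 and that binds no symbols can be dropped when unused, while one that binds a global mutably cannot.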

Bindings

There are three types of bindings: operand, symbol, and clobber. All of them use specially formatted comptime strings to interface with assembly, as in status quo. This decision was made because integrating the required functionality into Zig itself would have required either breaking several guidelines or introducing special constructs with no other use cases.

Operand

An operand binding has the form "operand" name: type = value. Within the block, ?(name) then refers to an operand compatible with the Zig type type, initialised to value. The operand may be a register (integer, float, or vector), a datum literal (only integer in every ISA I'm aware of), a stack top (an array whose size is a multiple of the stack alignment), or a processor condition code (boolean). type must be coercible to all of name's uses in the block, taking into account sign- or zero-extension and lane width/count where applicable. type may be omitted if the type of value is known, value may be omitted if no initialisation is needed, and name may be omitted if only initialisation is needed. The type of the binding must be derivable -- that is, at least one of type or value must be present (this also means that operand and symbol bindings are syntactically distinct). Stack pushes and pops must be declared separately -- see below. Condition codes may not be initialised (type must be present and must be bool). operand may be a wildcard, as described below.
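The derivability and condition-code rules are mechanical enough to sketch as a validator. This is a Python model, not compiler code: bindings are plain records, value=None stands for an omitted initialiser, and the is_condition_code flag (an assumption) stands in for target knowledge the compiler would have. Coercibility of type to each use is not modelled:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class OperandBinding:
    operand: str                # e.g. "rax", or a wildcard such as "?reg"
    name: Optional[str] = None  # omitted when only initialisation is needed
    type: Optional[str] = None  # Zig type, e.g. "u64"
    value: Any = None           # None models an omitted initial value
    is_condition_code: bool = False  # assumed known from the target ISA


def validate(b: OperandBinding) -> None:
    """Check the structural rules for an operand binding; raise ValueError."""
    # The binding's type must be derivable from type or value.
    if b.type is None and b.value is None:
        raise ValueError(f'"{b.operand}": need a type or a value')
    # Condition codes cannot be initialised and must be typed bool.
    if b.is_condition_code and (b.value is not None or b.type != "bool"):
        raise ValueError(f'"{b.operand}": condition codes need type bool and no value')
```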

Symbol

A symbol binding has the form "type" const? symbol, where symbol is a program symbol in scope. type is a wildcard indicating the type of symbol, which could be a variable or a function. Within the block, ?(symbol) then refers to the assembly program entity corresponding to the Zig program construct (which need not be an exported symbol -- it may be an internal label, a simple address, or even the referenced data itself on stack machines). A const annotation indicates an immutable binding -- this may be safety-checked by comparing the value at the associated address before and after the block. (NOTE: In some assemblies, many label operations are actually macros, which expand to multiple instructions and relocations -- we'd need some way of propagating this information through the compilation pipeline from codegen to linking.)
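The suggested safety check for const bindings amounts to a before/after comparison. A Python sketch over a simulated memory buffer, where the callable block is a hypothetical stand-in for running the assembly:

```python
from typing import Callable


def run_with_const_check(memory: bytearray, addr: int, size: int,
                         block: Callable[[bytearray], None]) -> None:
    """Run `block` and fail if the bytes of a const-bound symbol at
    [addr, addr + size) differ afterwards."""
    before = bytes(memory[addr:addr + size])
    block(memory)
    after = bytes(memory[addr:addr + size])
    if after != before:
        raise RuntimeError(f"const binding at 0x{addr:x} was modified")
```

Because it compares values rather than trapping writes, such a check misses a write that restores the original bytes, and it only covers the symbol's own storage.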

Clobber

A clobber binding is simply "location", naming a register or other location whose contents the block may overwrite. location may be a literal or a wildcard.

Wildcards

Wildcards indicate that a binding has special properties, and give the compiler freedom to fill in some details. Wildcards start with ? and run the length of the binding string. A literal ? is escaped with another one, for symmetry with in-block syntax. Wildcards may be followed by architecture-dependent :options that restrict their resolution -- for instance, ?reg:abcd for a legacy x86 register on x86_64, or ?lit:lo12 for a 12-bit integer immediate on RISC-V. Options may change the type of a binding -- for instance, "?tmp:all" callconv(.fast) is a clobber that binds all caller-saved registers under the fast calling convention.
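A Python sketch of how a compiler might split a binding string into its parts. It assumes options are separated by colons and that only a leading ? is significant; neither detail is pinned down above, and the names are hypothetical:

```python
from typing import NamedTuple, Optional, Tuple


class Binding(NamedTuple):
    wildcard: Optional[str]   # e.g. "reg"; None for a literal location
    literal: Optional[str]    # e.g. "rax"; None for a wildcard
    options: Tuple[str, ...]  # architecture-dependent, e.g. ("abcd",)


def parse_binding(s: str) -> Binding:
    """Split a binding string into wildcard, literal, and :options."""
    if s.startswith("??"):
        # "??" escapes a literal leading "?", so this is not a wildcard.
        return Binding(None, s[1:], ())
    if s.startswith("?"):
        name, *opts = s[1:].split(":")
        return Binding(name, None, tuple(opts))
    return Binding(None, s, ())
```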

The following wildcards are defined:

Operand

  • ?reg
    Arbitrary register. Register machine architectures only. value may be an integer, a float, or an int/float vector, of any architecturally-supported width and length.
  • ?tmp
    Arbitrary caller-saved register under current calling convention. See above. May be annotated with callconv to specify a different calling convention.
  • ?sav
    Arbitrary callee-saved register under current calling convention. See above.
  • ?lit
    Literal. value must be comptime-known, and may be any architecturally-supported literal type.
  • ?psh
    Array. value must be provided. Length * element size must be a multiple of platform stack alignment; elements must be size-compatible with stack cells if applicable. Pushed onto the stack at block entry, leftmost element topmost. Only one allowed per block. This is the only way of marshaling non-symbol values into assembly on stack machines.
  • ?pop
    Uninitialised array (value must not be provided). See above. Popped from the stack on block exit, topmost element leftmost. This is the only way of marshaling non-symbol values out of assembly on stack machines.
  • ?stg
    Additional stack growth, i.e. growth not already accounted for by ?psh or function calls, in bytes. name and type are omitted; value must be comptime-known. (NOTE: This does not imply that the stack pointer has a different value before and after the block -- in fact, unless it is listed as a clobber, this is not allowed.)
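The ordering rules for ?psh and ?pop mirror each other, so a block that leaves its inputs untouched hands them back in the same order. A Python model using a list whose end is the stack top; the function names are hypothetical and the sizes in any call are placeholders, not values for a real target:

```python
from typing import List


def check_stack_array(length: int, elem_size: int, stack_align: int) -> None:
    """?psh/?pop arrays must occupy a whole number of stack-alignment units."""
    if (length * elem_size) % stack_align != 0:
        raise ValueError("array size is not a multiple of the stack alignment")


def psh(stack: List[int], elems: List[int]) -> None:
    """Push at block entry so the leftmost element ends up on top."""
    for e in reversed(elems):
        stack.append(e)  # the end of the list is the top of the stack


def pop(stack: List[int], n: int) -> List[int]:
    """Pop at block exit; the topmost element becomes the leftmost."""
    return [stack.pop() for _ in range(n)]
```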

Symbol

  • ?locl
    Local variable. Stack machine only.
  • ?argm
    Argument of current function. Stack machine only. Implies const.
  • ?glob
    Global variable.
  • ?thdl
    Thread-local variable.
  • ?comp
    Comptime-known variable/constant. Substitution semantics of a literal. Implies const.
  • ?func
    Function. Registers symbol in this block's call graph. Implies const.
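The restrictions and implied const annotations above fit in a lookup table. In this Python sketch the table mirrors the list, while check_symbol_binding() and its string paradigm argument are hypothetical:

```python
# Properties of the proposed symbol wildcards, as listed above.
SYMBOL_WILDCARDS = {
    "locl": {"stack_only": True, "implies_const": False},
    "argm": {"stack_only": True, "implies_const": True},
    "glob": {"stack_only": False, "implies_const": False},
    "thdl": {"stack_only": False, "implies_const": False},
    "comp": {"stack_only": False, "implies_const": True},
    "func": {"stack_only": False, "implies_const": True},
}


def check_symbol_binding(wildcard: str, const: bool, paradigm: str) -> bool:
    """Return whether the binding is effectively const; raise ValueError if
    the wildcard is unavailable on this paradigm ("register" or "stack")."""
    props = SYMBOL_WILDCARDS[wildcard]
    if props["stack_only"] and paradigm != "stack":
        raise ValueError(f"?{wildcard} is only available on stack machines")
    return const or props["implies_const"]
```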

Clobber

  • ?memory
    Unspecified memory.
  • ?status
    Processor status flags.

Body

The assembly code itself, as a comptime string. For symbol scoping purposes, treated as a separate file, i.e. declared symbols do not leak to the rest of the program and elsewhere-defined symbols are not visible except through bindings. May be omitted if only values of registers are desired.

Bound operands and symbols are accessed within the block by enclosing their names in ?(). This syntax was chosen as the ? character is far less commonly used in assembly languages than %, and pairs well with the theme of an unknown resolution -- additionally, parentheses are less likely to have semantic significance than square brackets, so the code is easier to scan. Accessing an unbound name in this manner is a compile error. As with wildcards, names may be modified with :options, for instance ?(r:hi) to access the high byte of register r, or ?(i:x) to print integer i in hexadecimal. A literal ? is escaped with another one, as regular escaping is not possible in multiline strings.
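A Python sketch of the substitution pass, assuming operands have already been resolved to assembly text or values. The operands map, the formatters table (standing in for architecture-specific options such as :x), and the regular expression are all illustrative assumptions:

```python
import re
from typing import Callable, Dict, Optional

# "??" is an escaped "?"; otherwise ?(name) or ?(name:option).
_TOKEN = re.compile(r"\?\?|\?\(([A-Za-z_]\w*)(?::(\w+))?\)")


def substitute(body: str, operands: Dict[str, object],
               formatters: Optional[Dict[str, Callable[[object], str]]] = None) -> str:
    """Expand ?(...) references in an assembly body."""
    formatters = formatters or {}

    def repl(m) -> str:
        if m.group(0) == "??":
            return "?"
        name, opt = m.group(1), m.group(2)
        if name not in operands:
            # Under the proposal this would be a compile error.
            raise KeyError(f"unbound name ?({name})")
        value = operands[name]
        return formatters[opt](value) if opt else str(value)

    return _TOKEN.sub(repl, body)
```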

Post Expression

An expression evaluated after the body, using the final values of all bindings. Becomes the value of the whole block. Preceded by a colon. May be omitted without ambiguity, in which case the return type is void. This permits us to return as many values as we like, in whatever format and location we choose. Moreover, we don't have to specify the exact lifetimes of all of our inputs and outputs to appease the optimiser -- we can decide for ourselves how our values are allocated and consumed.

Examples

Simple assembly stays simple:

comptime assert(builtin.arch == .x86_64);

// No unused names, types on everything
asm { "rax": u64 = 60, "rdi": u64 = 0 } "syscall";

// No unnecessary detail
starting_stack_ptr = asm { "rsp" sp: usize } : sp;

More involved assembly is logical:

// Using #1717 syntax because that proposal has been accepted
// -- this proposal does not depend on #1717
const vendorId = fn () [3]u32 {
    comptime assert(builtin.arch == .x86_64);

    // Multiple return values, anyone?
    return asm {
        "eax": u32 = 0,
        "ebx" b: u32,
        "ecx" c: u32,
        "edx" d: u32,
        "?memory",
    } "cpuid"
    : .{ b, c, d };
};

// In case we have trouble getting RLS working, we can do it directly
const vendorId2 = fn (result: *[3]u32) void {
    comptime assert(builtin.arch == .x86_64);

    // void return type implies volatile
    asm {
        "eax": u32 = 0,
        "ebx" b: u32,
        "ecx" c: u32,
        "edx" d: u32,
        "?memory",
    } "cpuid"
    : {
        result[0] = b;
        result[1] = c;
        result[2] = d;
    }
};
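Both versions hand back the raw EBX, ECX and EDX contents. The 12-byte vendor string in those registers is laid out in EBX, EDX, ECX order, so a caller turning them into text must reorder them. A Python sketch of that decoding step (the function name is hypothetical):

```python
import struct


def vendor_string(ebx: int, ecx: int, edx: int) -> str:
    """Decode the 12-byte vendor string from CPUID leaf 0's output."""
    # The string is stored in EBX, EDX, ECX order, each register little-endian.
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


if __name__ == "__main__":
    print(vendor_string(0x756E6547, 0x6C65746E, 0x49656E69))  # GenuineIntel
```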

A simple bare-metal OS entry point on RISC-V:

const stack_height = 16 * 1024;
var stack: [stack_height]usize = undefined;

const _start = fn callconv(.naked) () noreturn {
    comptime assert(builtin.arch == .riscv64);

    asm {
        "?func" kmain,
        "?glob" stack,

        "?reg" stack_size: usize = stack_height,
        "?lit" slot_shift: usize = @ctz(@sizeOf(usize)),
        "sp", "ra", "t1",
    }
    \\ slli ?(stack_size), ?(stack_size), ?(slot_shift)
    \\ la sp, ?(stack)
    \\ add sp, sp, ?(stack_size)
    \\ call ?(kmain)
    : unreachable;
};

const kmain = fn () noreturn {
    // kernel kernel kernel
};
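The first three instructions compute the initial stack pointer. Since the RISC-V stack grows downwards, sp must start at the end of the array: the slot count is converted to a byte count by shifting left by log2 of the word size (the trailing-zero count of a power of two), then added to the array's base address. The arithmetic in Python, with a hypothetical helper name:

```python
def stack_top_offset(stack_height: int, word_size: int) -> int:
    """Byte offset of the top of a stack of `stack_height` words."""
    # Count trailing zeros; equals log2(word_size) for a power of two,
    # matching @ctz(@sizeOf(usize)) in the example.
    slot_shift = (word_size & -word_size).bit_length() - 1
    # Same as `slli stack_size, stack_size, slot_shift`.
    return stack_height << slot_shift


if __name__ == "__main__":
    print(stack_top_offset(16 * 1024, 8))  # 131072 on a 64-bit target
```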

POSIX startcode (adapted from lib/std/start.zig):

const _start = fn callconv(.naked) () noreturn {
    if (builtin.os.tag == .wasi) {
        std.os.wasi.proc_exit(@call(.{ .modifier = .always_inline }, callMain, .{}));
    }

    asm {
        "?reg" stack_ptr: [*]usize,
    // Much more compact and local
    } switch (builtin.arch) {
        .x86_64 => "mov ?(stack_ptr), rsp",
        .i386 => "mov ?(stack_ptr), esp",
        .aarch64, .aarch64_be, .arm => "mov ?(stack_ptr), sp",
        .riscv64 => "mv ?(stack_ptr), sp",
        .mips, .mipsel => (
          \\ .set noat
          \\ move ?(stack_ptr), $sp
        ),
        else => @compileError("unsupported arch"),
    }
    // By the time we get here, we have the stack pointer
    // -- so, no global required
    : @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{ stack_ptr });
};
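On Linux (following the System V ABI's process-initialisation layout), the word at the initial stack pointer is argc, followed by the NULL-terminated argv pointers, then the NULL-terminated envp pointers, then the auxiliary vector. A Python sketch of the unpacking posixCallMainAndExit would perform with stack_ptr, over a simulated list of words (the function name is hypothetical, and the auxiliary vector is not modelled):

```python
from typing import List, Tuple


def parse_initial_stack(words: List[int]) -> Tuple[int, List[int], List[int]]:
    """Split the words at the initial stack pointer into argc, argv, envp."""
    argc = words[0]
    argv = words[1:1 + argc]
    if words[1 + argc] != 0:
        raise ValueError("argv is not NULL-terminated")
    envp = []
    i = 2 + argc
    while words[i] != 0:  # envp is also NULL-terminated
        envp.append(words[i])
        i += 1
    return argc, argv, envp
```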

Metadata

Assignees: No one assigned
Labels: proposal (this issue suggests modifications; if it also has the "accepted" label then it is planned)
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests
