Description
Copied over from #215. Inspiration is via them. Thanks also to @MasterQ32 and @kubkon for help extending it to support stack machine architectures. See #7561 for standalone assembler improvements.
New Inline Assembly
asm volatile? {bindings}? body? : post_expression?
TL;DR: Benefits over Status Quo
- No mandatory sections -- flexible to any application
- Components are listed in evaluation order
- First-class support for stack machine architectures
- First-class support for floating point, vector, and as yet unforeseen register types
- Operands have types
- Named inputs are optional
- Input/output characteristics fully customisable
- Not bound to an input/output model
- Can access program symbols and call functions safely
- Volatility is inferred in most cases
- Concise, flexible wildcard syntax
- Substitution syntax easier to scan and less likely to clash with native symbols
- Open to architecture-specific extensions
- Communicates stack-relevant metadata to compiler
- Can be automatically distinguished from status quo; no sneaky breakages
Stack Machines
This syntax has first-class support for stack machine architectures such as WebAssembly, the JVM, and @MasterQ32's SPU Mk. II. It accomplishes this with a novel batch-push and batch-pop mechanism for marshaling between Zig and the stack. Because register and stack machine architectures differ significantly, a new .paradigm() method is defined on builtin.Arch, which returns an enum with the variants .register and .stack. (NOTE: supporting stack machines with LLVM is a very hard problem -- maybe defer to stage 2?)
Meta
At least one of body or post expression must be present. The expression inherits block/statement status from the post expression if present, and defaults to statement if not.
Volatile
Declares that the block has side effects and must not be optimised away, even if its value is not used. Implied by a return type of void or noreturn, or by a mutable symbol binding -- so, in practice, very rarely needed.
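The inference rule above can be modelled in a few lines. This is an illustrative Python sketch, not part of the proposal; the function name is_volatile and its parameters are assumptions made for the example.

```python
def is_volatile(explicit: bool, return_type: str, mutable_symbol_bindings: int) -> bool:
    """Model of when a block counts as volatile under this proposal.

    explicit: the `volatile` keyword was written on the block.
    return_type: name of the block's result type, e.g. "void" or "u32".
    mutable_symbol_bindings: number of symbol bindings without `const`.
    """
    return (
        explicit
        or return_type in ("void", "noreturn")
        or mutable_symbol_bindings > 0
    )
```

Under this model, only a block that returns a value and binds no mutable symbols needs the keyword written out.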
Bindings
There are three types of bindings: operand, symbol, and clobber. All of them use specially formatted comptime strings to interface with assembly, as in status quo. This decision was made as integrating the required functionality into Zig itself would have required either breaking several guidelines or introducing special constructs with no other use cases.
Operand
An operand binding has the form "operand" name: type = value. Within the block, ?(name) then refers to an operand compatible with Zig type type, initially holding value. The operand may be a register (integer, float, or vector), a datum literal (only integer in every ISA I'm aware of), a stack top (array with size a multiple of stack alignment), or a processor condition code (boolean). type must be coercible to all of name's uses in the block, taking into account sign- or zero-extension and lane width/count if applicable, and may be omitted if the type of value is known. In addition, value may be omitted if initialisation is not needed, and name may be omitted if only initialisation is needed. The type of the binding must be derivable -- that is, at least one of type or value must be present (this also means that operand and symbol bindings are syntactically distinct). Stack pushes and pops must be declared separately -- see below. Condition codes may not be initialised (type must be present and must be bool). operand may be a wildcard, as described below.
Symbol
A symbol binding has the form "type" const? symbol, where symbol is a program symbol in scope. type is a wildcard indicating the type of symbol, which could be a variable or a function. Within the block, ?(symbol) then refers to the assembly program entity corresponding to the Zig program construct (which need not be an exported symbol -- it may be an internal label, a simple address, or even the referenced data itself on stack machines). A const annotation indicates an immutable binding -- this may be safety-checked by comparing the value at the associated address before and after the block. (NOTE: In some assemblies, many label operations are actually macros, which expand to multiple instructions and relocations -- we'd need some way of propagating this information through the compilation pipeline from codegen to linking.)
Clobber
A clobber is simply "location", which may be a literal or a wildcard.
Wildcards
Wildcards indicate that a binding has special properties, and give the compiler freedom to fill in some details. Wildcards start with ? and run the length of the binding string. A literal ? is escaped with another one, for symmetry with in-block syntax. Wildcards may be followed by architecture-dependent :options to place restrictions on their resolution -- for instance, ?reg:abcd for a legacy x86 register on x86_64, or ?int:lo12 for a 12-bit integer immediate on RISC-V. Options may change the type of a binding -- for instance, "?tmp:all" callconv(.fast) is a clobber that binds all callee-saved registers under the fast calling convention.
The following wildcards are defined:
Operand
?reg
Arbitrary register. Register machine architectures only. value may be an integer, a float, or an int/float vector, of any architecturally-supported width and length.
?tmp
Arbitrary caller-saved register under current calling convention. See above. May be annotated with callconv to specify a different calling convention.
?sav
Arbitrary callee-saved register under current calling convention. See above.
?lit
Literal. value must be comptime-known, and may be any architecturally-supported literal type.
?psh
Array. value must be provided. Length * element size must be a multiple of platform stack alignment; elements must be size-compatible with stack cells if applicable. Pushed onto the stack at block entry, leftmost element topmost. Only one allowed per block. This is the only way of marshaling non-symbol values into assembly on stack machines.
?pop
Uninitialised array (value must not be provided). See above. Popped from the stack on block exit, topmost element leftmost. This is the only way of marshaling non-symbol values out of assembly on stack machines.
?stg
Additional stack growth, i.e. growth not already accounted for by ?psh or function calls, in bytes. name and type are omitted. value must be comptime-known. (NOTE: This does not imply that the stack pointer has a different value before and after the block -- in fact, unless it is listed as a clobber, this is not allowed.)
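The ordering and alignment rules for the stack push and pop wildcards can be checked with a small model. The Python sketch below is illustrative only: it represents the machine stack as a list whose last element is the top, and the helper names check_stack_array, push_array and pop_array are assumptions, not anything defined by the proposal.

```python
def check_stack_array(length: int, element_size: int, stack_alignment: int) -> None:
    """Reject arrays whose total size is not a multiple of the stack alignment."""
    total = length * element_size
    if total % stack_alignment != 0:
        raise ValueError(
            f"{total} bytes is not a multiple of stack alignment {stack_alignment}"
        )


def push_array(stack: list, values: list) -> None:
    """Push at block entry: the leftmost element ends up topmost."""
    # The top of the stack is the end of the list, so push right to left.
    for value in reversed(values):
        stack.append(value)


def pop_array(stack: list, count: int) -> list:
    """Pop at block exit: the topmost element becomes the leftmost."""
    return [stack.pop() for _ in range(count)]
```

One consequence of the two ordering rules is that pushing an array and then popping the same number of elements returns them in their original order.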
Symbol
?locl
Local variable. Stack machine only.
?argm
Argument of current function. Stack machine only. Implies const.
?glob
Global variable.
?thdl
Thread-local variable.
?comp
Comptime-known variable/constant. Substitution semantics of a literal. Implies const.
?func
Function. Registers symbol in this block's call graph. Implies const.
Clobber
?memory
Unspecified memory.
?status
Processor status flags.
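The wildcard string format described in this section (leading ?, optional :options, ?? as an escape) can be sketched as a small classifier. This Python model is an assumption-laden illustration rather than a specification: the Location type and parse_location function are hypothetical names, and splitting several options on ":" is a guess, since the proposal only shows single options.

```python
from typing import NamedTuple


class Location(NamedTuple):
    is_wildcard: bool
    name: str
    options: tuple


def parse_location(binding: str) -> Location:
    """Classify a binding string as a literal location or a wildcard."""
    if binding.startswith("?") and not binding.startswith("??"):
        # A single leading "?" marks a wildcard that runs the whole string.
        name, *options = binding[1:].split(":")
        return Location(True, name, tuple(options))
    # Anything else is a literal; a doubled "??" stands for one literal "?".
    return Location(False, binding.replace("??", "?"), ())
```

For instance, "?reg:abcd" is classified as the wildcard reg with the option abcd, while "rax" is a literal location. Annotations that follow the string, such as callconv, are outside the scope of this sketch.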
Body
The assembly code itself, as a comptime string. For symbol scoping purposes, treated as a separate file, i.e. declared symbols do not leak to the rest of the program and elsewhere-defined symbols are not visible except through bindings. May be omitted if only values of registers are desired.
Bound operands and symbols are accessed within the block by enclosing their names in ?(). This syntax was chosen as the ? character is far less commonly used in assembly languages than %, and pairs well with the theme of an unknown resolution -- additionally, parentheses are less likely to have semantic significance than square brackets, so the code is easier to scan. Accessing an unbound name in this manner is a compile error. As with wildcards, names may be modified with :options, for instance ?(r:hi) to access the high byte of register r, or ?(i:x) to print integer i in hexadecimal. A literal ? is escaped with another one, as regular escaping is not possible in multiline strings.
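The substitution rules above are simple enough to model directly. The following Python sketch is only an illustration of those rules, under the assumption that names and options are plain identifiers; substitute and the apply_option hook are hypothetical, and how options are interpreted is architecture-specific and left to the caller here.

```python
import re

# Either an escaped "??" or a reference of the form ?(name) or ?(name:option).
_TOKEN = re.compile(r"\?\?|\?\((\w+)(?::(\w+))?\)")


def substitute(body, resolved, apply_option=None):
    """Expand ?(name) references in an assembly body.

    resolved maps each bound name to the location chosen for it.
    apply_option(location, option) is a hypothetical hook for :options.
    """

    def expand(match):
        if match.group(0) == "??":
            return "?"  # escaped literal question mark
        name, option = match.group(1), match.group(2)
        if name not in resolved:
            # The proposal makes an unbound name a compile error.
            raise KeyError(f"unbound name in assembly body: {name}")
        location = resolved[name]
        if option is None:
            return location
        if apply_option is None:
            raise ValueError(f"no handler for option {option!r}")
        return apply_option(location, option)

    # A lone "?" that starts neither form is left untouched; the proposal
    # does not say how it should be treated.
    return _TOKEN.sub(expand, body)
```

With stack_ptr resolved to t0, the body "mv ?(stack_ptr), sp" expands to "mv t0, sp".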
Post Expression
An expression evaluated after the body, using the final values of all bindings. Becomes the value of the whole block. Preceded by a colon. May be omitted without ambiguity, in which case the return type is void. This permits us to return as many values as we like, in whatever format and location we choose. Moreover, we don't have to specify the exact lifetimes of all of our inputs and outputs to appease the optimiser -- we can decide for ourselves how our values are allocated and consumed.
Examples
Simple, bindless assembly is simple:
comptime assert(builtin.arch == .x86_64);
// No unused names, types on everything
asm { "rax": u64 = 60, "rdi": u64 = 0 } "syscall";
// No unnecessary detail
starting_stack_ptr = asm { "rsp" sp: usize } : sp;
More involved assembly is logical:
// Using #1717 syntax because that proposal has been accepted
// -- this proposal does not depend on #1717
const vendorId = fn () [3]u32 {
    comptime assert(builtin.arch == .x86_64);
    // Multiple return values, anyone?
    return asm {
        "eax": u32 = 0,
        "ebx" b: u32,
        "ecx" c: u32,
        "edx" d: u32,
        "?memory",
    } "cpuid"
    : .{ b, c, d };
};
// In case we have trouble getting RLS working, we can do it directly
const vendorId2 = fn (result: *[3]u32) void {
    comptime assert(builtin.arch == .x86_64);
    // void return type implies volatile
    asm {
        "eax": u32 = 0,
        "ebx" b: u32,
        "ecx" c: u32,
        "edx" d: u32,
        "?memory",
    } "cpuid"
    : {
        result[0] = b;
        result[1] = c;
        result[2] = d;
    }
};
A simple bare-metal OS entry point on RISC-V:
const stack_height = 16 * 1024;
var stack: [stack_height]usize = undefined;
const _start = fn callconv(.naked) () noreturn {
    comptime assert(builtin.arch == .riscv64);
    asm {
        "?func" kmain,
        "?glob" stack,
        "?reg" stack_size: usize = stack_height,
        "?int" slot_shift: usize = @ctz(@sizeOf(usize)),
        "sp", "ra", "t1",
    }
        \\ slli ?(stack_size), ?(stack_size), ?(slot_shift)
        \\ la sp, ?(stack)
        \\ add sp, sp, ?(stack_size)
        \\ call ?(kmain)
    : unreachable;
};
const kmain = fn () noreturn {
    // kernel kernel kernel
};
POSIX startcode (adapted from lib/std/start.zig):
const _start = fn callconv(.naked) () noreturn {
    if (builtin.os.tag == .wasi) {
        std.os.wasi.proc_exit(@call(.{ .modifier = .always_inline }, callMain, .{}));
    }
    asm {
        "?reg" stack_ptr: [*]usize,
    // Much more compact and local
    } switch (builtin.arch) {
        .x86_64 => "mov ?(stack_ptr), rsp",
        .i386 => "mov ?(stack_ptr), esp",
        .aarch64, .aarch64_be, .arm => "mov ?(stack_ptr), sp",
        .riscv64 => "mv ?(stack_ptr), sp",
        .mips, .mipsel => (
            \\ .set noat
            \\ move ?(stack_ptr), $sp
        ),
        else => @compileError("unsupported arch"),
    }
    // By the time we get here, we have the stack pointer
    // -- so, no global required
    : @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{ stack_ptr });
};