gnzlbg gnzlbg RWTH Aachen University Munich, Germany http://gnzlbg.github.io Physics, HPC, Heterogeneous computing, SIMD, Systems programming, C++, Rust

gnzlbg/cargo-asm 539

cargo subcommand showing the assembly or llvm-ir generated for Rust code

ericniebler/meta 226

A tiny metaprogramming library

fitzgen/mach 40

A rust interface to the Mach 3.0 kernel that underlies OSX.

gnzlbg/bitwise 38

Portable high-level bitwise manipulation algorithms

gnzlbg/bitintr 21

Portable Bitwise Manipulation Intrinsics

gnzlbg/aobench 7

Ambient Occlusion Benchmark in Rust (multi-threaded and explicitly vectorized)

gnzlbg/arithmetic_type 5

Implementation of an arithmetic type in C++

gnzlbg/cffi-panic 4

Error handling in Rust->C->Rust for C APIs taking callbacks

gnzlbg/ampi 3

Asynchronous Message Passing Interface

gnzlbg/any 1

Implementation of std::experimental::any

push event rust-lang/packed_simd

Travis CI User

commit sha 640f08c67ec91f541ca6ab367a3dbd891ed79b9d

Update documentation

push time in 3 days

issue comment regolith-linux/regolith-desktop

Unable to utilize workspace names with the workspace directive.

Thanks! workspace number 1 works perfectly!

jtcrank

comment created time in 10 days

push event rust-lang/packed_simd

Travis CI User

commit sha c29fe353a80198ec27306641c4ecf2c35b9f843d

Update documentation

push time in 10 days

issue comment regolith-linux/regolith-desktop

Unable to utilize workspace names with the workspace directive.

I ran into this issue while porting my i3 config to regolith:

In my .config/regolith/i3/config file I've added:

exec --no-startup-id ~/.config/regolith/i3/layout

where layout is defined as:

#!/usr/bin/env sh

i3-msg "workspace 1; append_layout ~/.config/regolith/i3/ws1.json"
google-chrome --app=google.com &
google-chrome --app=soundcloud.com &
spotify &
slack &

The problem is that workspace 1 does not refer to the first workspace in regolith (it creates a new workspace). This issue does not contain the solution to this problem, nor can I find it in the documentation.

jtcrank

comment created time in 10 days

issue comment regolith-linux/regolith-desktop

Wallpaper bugged after changing multi monitor display configuration

That worked for me, thank you. Might be good to document this somewhere in the customization section of the webpage. It was not clear to me which parts of the configuration are done through the usual i3 methods, and which parts follow the usual Ubuntu / GNOME methods. Thank you for all your help!

cafeoh

comment created time in 11 days

issue comment regolith-linux/regolith-desktop

regolith-desktop changes login screen background before activation?

I'd like to theme the login for ppa installation, is there a way to opt in to that?

arthur-e

comment created time in 11 days

issue comment regolith-linux/regolith-desktop

Wallpaper bugged after changing multi monitor display configuration

It's unclear to me how to change the wallpaper in regolith.

I'm using the Solarized dark theme, and have added a

exec --no-startup-id feh --bg-fill

to my ~/.config/regolith/i3/config, but the wallpaper gets "corrupted" and/or changed back to the original one when I add external monitors.

With external monitors, even the original wallpaper gets messed up, but that seems related to #133

cafeoh

comment created time in 11 days

issue opened regolith-linux/regolith-desktop

Volume unmute and volume scroll not working for me

In the bar volume control, I can use the right click to mute the sound. However:

  • right-clicking to unmute alters the display of the sound level in the bar, but does not unmute the sound; I need to manually press the FN+VolUP key to unmute.

  • when the sound is unmuted, I can scroll up and down, and this raises and lowers the sound level percentage, but the sound does not change.

created time in 12 days

issue opened regolith-linux/regolith-desktop

Volume block and mute

The volume block just shows "S" when the output is muted (as opposed to "S X%").

It would be nicer to use a speaker icon for the volume, and to switch to the "Speaker with cancellation" (unicode) icon when the volume gets muted.

created time in 12 days

issue opened regolith-linux/regolith-desktop

Add a brightness block

E.g. similar to the one from i3blocks, that allows controlling display brightness.

created time in 12 days

issue opened regolith-linux/regolith-desktop

Add a weather block

E.g. similar to the one from i3blocks.

created time in 12 days

issue opened regolith-linux/regolith-desktop

Add a bluetooth block

Similar to the one in MacOSX.

created time in 12 days

issue opened regolith-linux/regolith-desktop

Add microphone block

Add a microphone block that shows whether the microphone is muted, and that lets you control the microphone volume.

created time in 12 days

issue opened regolith-linux/regolith-desktop

Add vpn block

Add a vpn block, e.g., based on the nm-vpn block from i3blocks, that shows whether a VPN connection is in use.

created time in 12 days

issue comment vivien/i3blocks-contrib

nm-vpn fails to display my input

cc @The-King-of-Toasters

gnzlbg

comment created time in 12 days

issue opened vivien/i3blocks-contrib

nm-vpn fails to display my input

Expected behavior

I expect the only side-effect of this script to be displaying the vpn name when connected.

Actual behavior

The script does not display anything, and the adjacent wifi indicator icon disappears when it is used.

i3blocks config relevant to blocklet(s)

nm-vpn

Reproducer

The current script is:

#!/bin/sh
nmcli -t connection show --active | awk -F ':' '
/tun0/{vpn="ON"} /vpn/{name=$1}
END{if(vpn) printf("%s\n%s\n%s\n", name, vpn, "#00FF00")}'

when I execute nmcli -t connection show --active I get (note: I've modified the hashes in the middle):

MY-VPN-NAME:ab6ab8-a86ba6ab8-ab6ba9ba:vpn:wlp0s30g4
MY-WLAN-NAME:bf323-23-5ff-sd-4-254:802-11-wireless:wlp0s30g4
vpn0:4282484-22482-424824:tun:vpn0

The script looks for tun0, which does not appear in my input, and therefore, nothing happens. The script probably makes an assumption that just does not hold in my system.

I expected the script to show MY-VPN-NAME as the active vpn.
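For illustration, here is a minimal Rust sketch of the matching logic the report asks for: pick the connection whose type field is `vpn` rather than looking for a `tun0` device. The function name `active_vpn_name` is hypothetical, and the sketch assumes the terse `NAME:UUID:TYPE:DEVICE` layout shown in the sample output above. It does not handle connection names that themselves contain a colon.

```rust
/// Returns the name of the first active connection whose TYPE field is "vpn".
/// Assumes each line looks like `NAME:UUID:TYPE:DEVICE` (hypothetical helper).
fn active_vpn_name(nmcli_output: &str) -> Option<&str> {
    nmcli_output.lines().find_map(|line| {
        let mut fields = line.split(':');
        let name = fields.next()?;
        let _uuid = fields.next()?;
        let kind = fields.next()?;
        // Match on the connection type instead of a hard-coded device name.
        if kind == "vpn" {
            Some(name)
        } else {
            None
        }
    })
}

fn main() {
    // Sample input copied from the report above.
    let sample = "MY-VPN-NAME:ab6ab8-a86ba6ab8-ab6ba9ba:vpn:wlp0s30g4\n\
                  MY-WLAN-NAME:bf323-23-5ff-sd-4-254:802-11-wireless:wlp0s30g4\n\
                  vpn0:4282484-22482-424824:tun:vpn0";
    match active_vpn_name(sample) {
        Some(name) => println!("{name}"),
        None => println!(),
    }
}
```

With the sample input, the first line is the only one whose third field is `vpn`, so its name is the one selected.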

created time in 12 days

issue comment regolith-linux/regolith-desktop

Can't enable VPN

This fix did not work for me. I added:

exec --no-startup-id /usr/bin/nm-applet

to ~/.config/regolith/i3/config (which did not exist - that's the only line this file has in my system), and when restarting the computer, I get the following error:

i3 Error: Status_command not found (exit 127)

The desktop then becomes unusable: typing super+enter does not open a terminal, and there is no other way to open one. I had to restart again and log into the Ubuntu desktop to be able to remove that line.

This is something that should "just work", and not require the user to meddle with configuration files, so IMO this issue should be reopened.

JohnGebbie

comment created time in 12 days

issue opened syl20bnr/spacemacs

What does the osx layer do on non-Apple operating systems?

I have a single .spacemacs config that I share across Linux, Windows, and OSX machines (amongst others). Currently I need to maintain a fork of this config for OSX to be able to add the osx layer.

Would it be possible to guarantee that the osx layer only has any effects if the OS is actually MacOSX?

created time in 15 days

push event rust-lang/packed_simd

Travis CI User

commit sha cafbaa92872f5a443cd2552a69e7d07c2fccb1b0

Update documentation

push time in 17 days

issue comment mozilla/neqo

Rewrite to avoid usage of SliceDeque crate

This was reported to slice deque a couple of months ago, and support for OpenBSD was added immediately :/

agrover

comment created time in 19 days

issue opened rust-lang/rust

VecDeque::new allocates

I've noticed that VecDeque::new allocates. We don't guarantee that it doesn't, but this felt inconsistent with, e.g., Vec::new, for which we do guarantee that it does not allocate.

I'm not sure what the appropriate venue is for discussing this, but I think it would make sense to remove the surprise here and guarantee that VecDeque::new does not heap allocate either.
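One rough way to observe the difference is to compare the capacities of freshly constructed containers, as in the sketch below. `Vec::new` is documented not to allocate, so its reported capacity is 0 for a non-zero-sized element type. The value printed for `VecDeque::new` depends on the standard library version in use, so no output is claimed for it here; a non-zero capacity would indicate that backing storage was reserved up front.

```rust
use std::collections::VecDeque;

fn main() {
    let v: Vec<u8> = Vec::new();
    let d: VecDeque<u8> = VecDeque::new();

    // Vec::new is guaranteed not to allocate, so this reports 0.
    println!("Vec::new() capacity: {}", v.capacity());

    // Not guaranteed at the time of the report; the value depends on
    // the standard library version.
    println!("VecDeque::new() capacity: {}", d.capacity());
}
```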

cc @rust-lang/libs

created time in 19 days

issue comment mozilla/neqo

Rewrite to avoid usage of SliceDeque crate

The commit message mentions that slice_deque does not work on FreeBSD, but I use it on FreeBSD regularly. What was the reason for the removal?

agrover

comment created time in 20 days

pull request comment rust-lang/libc

Fix CI

@bors: r+

vickenty

comment created time in 20 days

issue comment gnzlbg/bitintr

counts/offsets shouldn't reuse the `Self` type from the trait

Thank you for the issue. I'm a bit low on time, but will keep this in mind for the next time I'll do a maintenance pass over the library. PRs are obviously welcome in case somebody wants to fix this.

A future version of this library will probably only expose "software fallbacks" for the core::arch intrinsics, leaving the decision of when to use the core::arch intrinsic or the software fallback up to the user, so it might be worth considering doing that in your application already.

We also have an RFC that's about to get merged in the pipeline (target-feature 1.1 RFC) that will more clearly express what is exactly unsafe about the core::arch intrinsics.

This crate predates core::arch and its design by a couple of years (although what was learned here certainly influenced its design). Right now, I'm more of the opinion that without a proper effect-system for target-features and a way to be generic about them this crate cannot really make a meaningful decision about how to generate code for these intrinsics (e.g. should the safe wrappers do compile-time feature detection? run-time feature detection? no feature detection at all because the wrapper is used in an execution path that always has the feature enabled? etc.). While we are slowly making strides towards a good effect system for Rust (e.g. for handling const-effects, see RFC2632), a really good effect system for Rust is unfortunately still a couple of PhD theses away at this point (see rust-effects/target-feature-1.1).
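As a sketch of what "leaving the decision up to the user" could look like in application code, the example below picks between a `core::arch` intrinsic and a software fallback using run-time feature detection. The function names (`popcount`, `popcount_hw`, `popcount_fallback`) are hypothetical and not part of bitintr, and run-time detection is only one of the strategies listed above; an application that always runs with the feature enabled might skip detection entirely.

```rust
/// Portable software fallback: clears the lowest set bit until none remain.
fn popcount_fallback(mut x: u64) -> u32 {
    let mut n = 0;
    while x != 0 {
        x &= x - 1;
        n += 1;
    }
    n
}

/// Hardware path, only compiled on x86_64. Callers must ensure that the
/// `popcnt` feature is available before calling this function.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "popcnt")]
unsafe fn popcount_hw(x: u64) -> u32 {
    core::arch::x86_64::_popcnt64(x as i64) as u32
}

/// The application decides the policy: run-time detection with a fallback.
fn popcount(x: u64) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("popcnt") {
            // SAFETY: the feature was detected at run time just above.
            return unsafe { popcount_hw(x) };
        }
    }
    popcount_fallback(x)
}

fn main() {
    println!("{}", popcount(0b1011));
}
```

On targets other than x86_64 the hardware path is compiled out and `popcount` always takes the fallback.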

ekmett

comment created time in 20 days

push event rust-lang/packed_simd

Travis CI User

commit sha edeefb10e1811dde40edaa2b418b2695890ed7ae

Update documentation

push time in 24 days

push event rust-lang/packed_simd

Travis CI User

commit sha 733ab3a612fda50a1c6a05b7bcf0c614112e6096

Update documentation

push time in a month

pull request comment rust-lang/rfcs

target_feature 1.1

Yes, I am available to mentor the implementation work if someone wants to go ahead and start it (just ping me in Zulip). I might become more available in February/March and might be able to do this myself but I cannot commit to that right now.

gnzlbg

comment created time in a month

issue comment rayon-rs/rayon

Prefix scans

A different way to implement this would be to use decoupled look-back: https://research.nvidia.com/sites/default/files/pubs/2016-03_Single-pass-Parallel-Prefix/nvr-2016-002.pdf
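For readers unfamiliar with the operation, the sketch below shows what an inclusive prefix scan computes, plus a simple blocked two-pass variant in which each block is scanned independently and then offset by the total of the preceding blocks. This is not decoupled look-back, which is a single-pass GPU technique, and it is not rayon's API. The function names are hypothetical, and the code is sequential; it only illustrates the block structure that parallel scans build on.

```rust
/// Sequential inclusive prefix sum: out[i] = xs[0] + ... + xs[i].
fn inclusive_scan(xs: &[u64]) -> Vec<u64> {
    xs.iter()
        .scan(0u64, |acc, &x| {
            *acc += x;
            Some(*acc)
        })
        .collect()
}

/// Blocked two-pass scan (hypothetical helper, sequential for clarity).
fn blocked_scan(xs: &[u64], block: usize) -> Vec<u64> {
    assert!(block > 0, "block size must be non-zero");

    // Pass 1: scan every block on its own. The blocks are independent,
    // so this is the part a parallel implementation would distribute.
    let mut out: Vec<u64> = xs.chunks(block).flat_map(inclusive_scan).collect();

    // Pass 2: add the running total of all earlier blocks to each block.
    let mut carry = 0u64;
    for chunk in out.chunks_mut(block) {
        for v in chunk.iter_mut() {
            *v += carry;
        }
        // After the update, the last element is the total up to this block.
        if let Some(&last) = chunk.last() {
            carry = last;
        }
    }
    out
}

fn main() {
    let xs = [1, 2, 3, 4, 5];
    println!("{:?}", blocked_scan(&xs, 2));
}
```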

jaupe

comment created time in a month

Pull request review comment rust-lang/rfcs

Inline assembly

- Feature Name: `asm`
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.

The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.

The transition from the existing `asm!` macro is described in RFC [2843][rfc-llvm-asm]. The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.

[rfc-llvm-asm]: https://github.com/rust-lang/rfcs/pull/2843

# Motivation
[motivation]: #motivation

In systems programming some tasks require dropping down to the assembly level. The primary reasons are for performance, precise timing, and low level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.

The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR.
For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.

A collection of use cases for inline asm can be found in [this repository][catalogue].

[catalogue]: https://github.com/bjorn3/inline_asm_catalogue/

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Rust provides support for inline assembly via the `asm!` macro.
It can be used to embed handwritten assembly in the assembly output generated by the compiler.
Generally this should not be necessary, but might be where the required performance or timing
cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.

> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.

## Basic usage

Let us start with the simplest possible example:

```rust
unsafe {
    asm!("nop");
}
```

This will insert a NOP (no operation) instruction into the assembly generated by the compiler.
Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert
arbitrary instructions and break various invariants. The instructions to be inserted are listed
in the first argument of the `asm!` macro as a string literal.

## Inputs and outputs

Now inserting an instruction that does nothing is rather boring. Let us do something that
actually acts on data:

```rust
let x: u32;
unsafe {
    asm!("mov {}, 5", out(reg) x);
}
assert_eq!(x, 5);
```

This will write the value `5` into the `u32` variable `x`.
You can see that the string literal we use to specify instructions is actually a template string.
It is governed by the same rules as Rust [format strings][format-syntax].
The arguments that are inserted into the template however look a bit different than you may
be familiar with. First we need to specify if the variable is an input or an output of the
inline assembly. In this case it is an output.
We declared this by writing `out`.
We also need to specify in what kind of register the assembly expects the variable.
In this case we put it in an arbitrary general purpose register by specifying `reg`.
The compiler will choose an appropriate register to insert into
the template and will read the variable from there after the inline assembly finishes executing.

Let us see another example that also uses an input:

```rust
let i: u32 = 3;
let o: u32;
unsafe {
    asm!("
        mov {0}, {1}
        add {0}, {number}
    ", out(reg) o, in(reg) i, number = imm 5);
}
assert_eq!(o, 8);
```

This will add `5` to the input in variable `i` and write the result to variable `o`.
The particular way this assembly does this is first copying the value from `i` to the output,
and then adding `5` to it.

The example shows a few things:

First we can see that inputs are declared by writing `in` instead of `out`.

Second one of our operands has a type we haven't seen yet, `imm`.
This tells the compiler to expand this argument to an immediate inside the assembly template.
This is only possible for constants and literals.

Third we can see that we can specify an argument number, or name as in any format string.
For inline assembly templates this is particularly useful as arguments are often used more than once.
For more complex inline assembly using this facility is generally recommended, as it improves
readability, and allows reordering instructions without changing the argument order.

We can further refine the above example to avoid the `mov` instruction:

```rust
let mut x: u32 = 3;
unsafe {
    asm!("add {0}, {number}", inout(reg) x, number = imm 5);
}
assert_eq!(x, 8);
```

We can see that `inout` is used to specify an argument that is both input and output.
This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.

It is also possible to specify different variables for the input and output parts of an `inout`
operand:

```rust
let x: u32 = 3;
let y: u32;
unsafe {
    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);
}
assert_eq!(y, 8);
```

## Late output operands

The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`
can be written at any time, and can therefore not share its location with any other argument.
However, to guarantee optimal performance it is important to use as few registers as possible,
so they won't have to be saved and reloaded around the inline assembly block.
To achieve this Rust provides a `lateout` specifier. This can be used on any output that is
written only after all inputs have been consumed.
There is also an `inlateout` variant of this specifier.

Here is an example where `inlateout` *cannot* be used:

```rust
let mut a = 4;
let b = 4;
let c = 4;
unsafe {
    asm!("
        add {0}, {1}
        add {0}, {2}
    ", inout(reg) a, in(reg) b, in(reg) c);
}
assert_eq!(a, 12);
```

Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`.

However the following example can use `inlateout` since the output is only modified after all input registers have been read:

```rust
let mut a = 4;
let b = 4;
unsafe {
    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);
}
assert_eq!(a, 8);
```

As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.

## Explicit register operands

Some instructions require that the operands be in a specific register.
Therefore, Rust inline assembly provides some more specific constraint specifiers.
While `reg` is generally available on any architecture, these are highly architecture specific. E.g.
for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi`
among others can be addressed by their name.

```rust
unsafe {
    asm!("out 0x64, {}", in("eax") cmd);
}
```

In this example we call the `out` instruction to output the content of the `cmd` variable
to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as operand
we had to use the `eax` constraint specifier.

It is somewhat common that instructions have operands that are not explicitly listed in the
assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:

```rust
fn mul(a: u32, b: u32) -> u64 {
    let lo: u32;
    let hi: u32;

    unsafe {
        asm!(
            // The x86 mul instruction takes eax as an implicit input and writes
            // the 64-bit result of the multiplication to eax:edx.
            "mul {}",
            in(reg) a, in("eax") b,
            lateout("eax") lo, lateout("edx") hi
        );
    }

    // Parenthesized because `+` binds tighter than `<<` in Rust.
    ((hi as u64) << 32) + lo as u64
}
```

This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.
The only explicit operand is a register, that we fill from the variable `a`.
The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.
The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.
The higher 32 bits are stored in `edx` from which we fill the variable `hi`.

Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.

## Clobbered registers

In many cases inline assembly will modify state that is not needed as an output.
Usually this is either because we have to use a scratch register in the assembly,
or instructions modify state that we don't need to further examine.
This state is generally referred to as being "clobbered".
We need to tell the compiler about this since it may need to save and restore this state
around
the inline assembly block.

```rust
let ebx: u32;
let ecx: u32;

unsafe {
    asm!(
        "cpuid",
        in("eax") 4, in("ecx") 0,
        lateout("ebx") ebx, lateout("ecx") ecx,
        lateout("eax") _, lateout("edx") _
    );
}

println!(
    "L1 Cache: {}",
    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)
);
```

In the example above we use the `cpuid` instruction to get the L1 cache size.
This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache size we only care about the contents of `ebx` and `ecx`.

However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.

This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code:

```rust
// Multiply x by 6 using shifts and adds
let mut x = 4;
unsafe {
    asm!("
        mov {tmp}, {x}
        shl {tmp}, 1
        shl {x}, 2
        add {x}, {tmp}
    ", x = inout(reg) x, tmp = out(reg) _);
}
assert_eq!(x, 4 * 6);
```

## Register template modifiers

In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g.
the low 32 bits of a 64-bit register).

```rust
let mut x: u16 = 0xab;

unsafe {
    asm!("mov {0:h}, {0:b}", inout(reg_abcd) x);
}

assert_eq!(x, 0xabab);
```

In this example, we use the `reg_abcd` register class to restrict the register allocator to the 4 legacy x86 registers (`ax`, `bx`, `cx`, `dx`) of which the first two bytes can be addressed independently.

Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.
The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.

## Flags

By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However in many cases, it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.

Let's take our previous example of an `add` instruction:

```rust
let mut a = 4;
let b = 4;
unsafe {
    asm!(
        "add {0}, {1}",
        inlateout(reg) a, in(reg) b,
        flags(pure, nomem, nostack)
    );
}
assert_eq!(a, 8);
```

Flags can be provided as an optional final argument to the `asm!` macro. We specified three flags here:
- `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.
- `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).
- `nostack` means that the asm code does not push any data onto the stack.
This allows the compiler to use optimizations such as the stack red zone on x86_64 to avoid stack pointer adjustments.

These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.

See the reference for the full list of available flags and their effects.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Inline assembler is implemented as an unsafe macro `asm!()`.
The first argument to this macro is a template string literal used to build the final assembly.
The following arguments specify input and output operands.
When required, flags are specified as the final argument.

The following ABNF specifies the general syntax:

```
dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"
reg_spec := <arch specific register class> / "<arch specific register name>"
operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"
reg_operand := dir_spec "(" reg_spec ")" operand_expr
operand := reg_operand / "imm" const_expr / "sym" path
flag := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn"
flags := "flags(" flag *["," flag] ")"
asm := "asm!(" format_string *("," [ident "="] operand) ["," flags] ")"
```

[format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax

## Template string

The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.

The assembly code syntax used is that of the GNU assembler (GAS). The only exception is on x86 where the Intel syntax is used instead of GCC's AT&T syntax.

This RFC only specifies how operands are substituted into the template string.
Actual interpretation of the final asm string is left to the assembler.

However there is one restriction on the asm string: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.

The compiler will lint against any operands that are not used in the template string, except for operands that specify an explicit register.

[rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795

## Operand type

Several types of operands are supported:

* `in(<reg>) <expr>`
  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.
  - The allocated register will contain the value of `<expr>` at the start of the asm code.
  - The allocated register must contain the same value at the end of the asm code (except if a `lateout` is allocated to the same register).
* `out(<reg>) <expr>`
  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.
  - The allocated register will contain an unknown value at the start of the asm code.
  - `<expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register are written at the end of the asm code.
  - An underscore (`_`) may be specified instead of an expression, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).
* `lateout(<reg>) <expr>`
  - Identical to `out` except that the register allocator can reuse a register allocated to an `in`.
  - You should only write to the register after all inputs are read, otherwise you may clobber an input.
  - `lateout` must be used instead of `out` if you are specifying the same explicit register as an `in`.
* `inout(<reg>) <expr>`
  - `<reg>` can refer to a register class or an explicit register.
The allocated register name is substituted into the asm template string.
  - The allocated register will contain the value of `<expr>` at the start of the asm code.
  - `<expr>` must be an initialized place expression, to which the contents of the allocated register are written at the end of the asm code.
* `inout(<reg>) <in expr> => <out expr>`
  - Same as `inout` except that the initial value of the register is taken from the value of `<in expr>`.
  - `<out expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register are written at the end of the asm code.
  - An underscore (`_`) may be specified instead of an expression for `<out expr>`, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).
  - `<in expr>` and `<out expr>` may have different types.
* `inlateout(<reg>) <expr>` / `inlateout(<reg>) <in expr> => <out expr>`
  - Identical to `inout` except that the register allocator can reuse a register allocated to an `in` (this can happen if the compiler knows the `in` has the same initial value as the `inlateout`).
  - You should only write to the register after all inputs are read, otherwise you may clobber an input.
* `imm <expr>`
  - `<expr>` must be an integer or floating-point constant expression.
  - The value of the expression is formatted as a string and substituted directly into the asm template string.
* `sym <path>`
  - `<path>` must refer to a `fn` or `static` defined in the current crate.
  - A mangled symbol name referring to the item is substituted into the asm template string.
  - The substituted string does not include any modifiers (e.g. GOT, PLT, relocations, etc).

## Register operands

Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g.
`"eax"`) while register classes are specified as identifiers (e.g. `reg`).

Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register in two input operands or in two output operands. Additionally on ARM, it is a compile-time error to use overlapping VFP registers in input operands or in output operands.

Different register classes have different constraints on which Rust types they allow. For example, `reg` generally only allows integers and pointers, but not floats or SIMD vectors.

If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. It is a compile-time error for a value to be of a larger size than the register it is allocated in.

Here is the list of currently supported register classes:

| Architecture | Register class | Registers | LLVM constraint code | Allowed types |
| ------------ | -------------- | --------- | ----- | ------------- |
| x86 | `reg` | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |
| x86 | `reg_abcd` | `ax`, `bx`, `cx`, `dx` | `Q` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |
| x86 | `vreg` | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |
| x86 | `vreg_evex` | `xmm[0-31]` (AVX-512, otherwise same as `vreg`) | `v` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |
| x86 (AVX-512) | `kreg` | `k[1-7]` | `Yk` | `i16`, `i32`, `i64` |
| AArch64 | `reg` | `x[0-28]`, `x30` | `r` | `i8`, `i16`, `i32`, `i64` |
| AArch64 | `vreg` | `v[0-31]` | `w` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |
| AArch64 | `vreg_low` | `v[0-15]` | `x` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |
| AArch64 | `vreg_low8` | `v[0-7]` | `y` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |
| ARM | `reg` | `r[0-10]`, `r12`, `r14` | `r` | `i8`, `i16`, `i32` |
| ARM | `vreg` | `s[0-31]`, `d[0-31]`, `q[0-15]` | `w` | `f32`, `f64`, `v64`, `v128` |
| ARM | `vreg_low` | `s[0-31]`, `d[0-15]`, `q[0-7]` | `t` | `f32`, `f64`, `v64`, `v128` |
| ARM | `vreg_low8` | `s[0-15]`, `d[0-7]`, `q[0-3]` | `x` | `f32`, `f64`, `v64`, `v128` |
| RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-31]` | `r` | `i8`, `i16`, `i32`, `i64` (RV64 only) |
| RISC-V | `vreg` | `f[0-31]` | `f` | `f32`, `f64` |

> Notes on allowed types:
> - Pointers and references are allowed where the equivalent integer type is allowed.
> - `iLEN` refers to both signed and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches.
> - Fat pointers are not allowed.
> - `vLEN` refers to a SIMD vector that is `LEN` bits wide.

Additional constraint specifications may be added in the future based on demand for additional register classes (e.g. MMX, x87, etc).

Some registers have multiple names. These are all treated by the compiler as identical to the base register name.
Here is the list of all supported register aliases:++| Architecture | Base register | Aliases |+| ------------ | ------------- | ------- |+| x86 | `ax` | `al`, `eax`, `rax` |+| x86 | `bx` | `bl`, `ebx`, `rbx` |+| x86 | `cx` | `cl`, `ecx`, `rcx` |+| x86 | `dx` | `dl`, `edx`, `rdx` |+| x86 | `si` | `sil`, `esi`, `rsi` |+| x86 | `di` | `dil`, `edi`, `rdi` |+| x86 | `bp` | `bpl`, `ebp`, `rbp` |+| x86 | `sp` | `spl`, `esp`, `rsp` |+| x86 | `ip` | `eip`, `rip` |+| x86 | `st(0)` | `st` |+| x86 | `r[8-15]` | `r[8-15]b`, `r[8-15]w`, `r[8-15]d` |+| x86 | `xmm[0-31]` | `ymm[0-31]`, `zmm[0-31]` |+| AArch64 | `x[0-30]` | `w[0-30]` |+| AArch64 | `x29` | `fp` |+| AArch64 | `x30` | `lr` |+| AArch64 | `sp` | `wsp` |+| AArch64 | `xzr` | `wzr` |+| AArch64 | `v[0-31]` | `b[0-31]`, `h[0-31]`, `s[0-31]`, `d[0-31]`, `q[0-31]` |+| ARM | `r[0-3]` | `a[1-4]` |+| ARM | `r[4-9]` | `v[1-6]` |+| ARM | `r9` | `rfp` |+| ARM | `r10` | `sl` |+| ARM | `r11` | `fp` |+| ARM | `r12` | `ip` |+| ARM | `r13` | `sp` |+| ARM | `r14` | `lr` |+| ARM | `r15` | `pc` |+| RISC-V | `x0` | `zero` |+| RISC-V | `x1` | `ra` |+| RISC-V | `x2` | `sp` |+| RISC-V | `x3` | `gp` |+| RISC-V | `x4` | `tp` |+| RISC-V | `x[5-7]` | `t[0-2]` |+| RISC-V | `x8` | `fp`, `s0` |+| RISC-V | `x9` | `s1` |+| RISC-V | `x[10-17]` | `a[0-7]` |+| RISC-V | `x[18-27]` | `s[2-11]` |+| RISC-V | `x[28-31]` | `t[3-6]` |+| RISC-V | `f[0-7]` | `ft[0-7]` |+| RISC-V | `f[8-9]` | `fs[0-1]` |+| RISC-V | `f[10-17]` | `fa[0-7]` |+| RISC-V | `f[18-27]` | `fs[2-11]` |+| RISC-V | `f[28-31]` | `ft[8-11]` |++Some registers cannot be used for input or output operands:++| Architecture | Unsupported register | Reason |+| ------------ | -------------------- | ------ |+| All | `sp` | The stack pointer must be restored to its original value at the end of an asm code block. |+| All | `bp` (x86), `r11` (ARM), `x29` (AArch64), `x8` (RISC-V) | The frame pointer cannot be used as an input or output. 
|+| x86 | `ah`, `bh`, `ch`, `dh` | These are poorly supported by compiler backends. Use 16-bit register views (e.g. `ax`) instead. |+| x86 | `k0` | This is a constant zero register which can't be modified. |+| x86 | `ip` | This is the program counter, not a real register. |+| x86 | `mm[0-7]` | MMX registers are not currently supported (but may be in the future). |+| x86 | `st([0-7])` | x87 registers are not currently supported (but may be in the future). |+| AArch64 | `xzr` | This is a constant zero register which can't be modified. |+| ARM | `pc` | This is the program counter, not a real register. |+| RISC-V | `x0` | This is a constant zero register which can't be modified. |+| RISC-V | `gp`, `tp` | These registers are reserved and cannot be used as inputs or outputs. |++## Template modifiers++The placeholders can be augmented by modifiers which are specified after the `:` in the curly braces. These modifiers do not affect register allocation, but change the way operands are formatted when inserted into the template string. 
Only one modifier is allowed per template placeholder.++The supported modifiers are a subset of LLVM's (and GCC's) [asm template argument modifiers][llvm-argmod].++| Architecture | Register class | Modifier | Input type | Example output |+| ------------ | -------------- | -------- | ---------- | -------------- |+| x86 | `reg` | None | `i8` | `al` |+| x86 | `reg` | None | `i16` | `ax` |+| x86 | `reg` | None | `i32` | `eax` |+| x86 | `reg` | None | `i64` | `rax` |+| x86-32 | `reg_abcd` | `b` | Any | `al` |+| x86-64 | `reg` | `b` | Any | `al` |+| x86 | `reg_abcd` | `h` | Any | `ah` |+| x86 | `reg` | `w` | Any | `ax` |+| x86 | `reg` | `k` | Any | `eax` |+| x86-64 | `reg` | `q` | Any | `rax` |+| x86 | `vreg` | None | `i32`, `i64`, `f32`, `f64`, `v128` | `xmm0` |+| x86 (AVX) | `vreg` | None | `v256` | `ymm0` |+| x86 (AVX-512) | `vreg` | None | `v512` | `zmm0` |+| x86 (AVX-512) | `kreg` | None | Any | `k1` |+| AArch64 | `reg` | None | Any | `x0` |+| AArch64 | `reg` | `w` | Any | `w0` |+| AArch64 | `reg` | `x` | Any | `x0` |+| AArch64 | `vreg` | None | Any | `v0` |+| AArch64 | `vreg` | `b` | Any | `b0` |+| AArch64 | `vreg` | `h` | Any | `h0` |+| AArch64 | `vreg` | `s` | Any | `s0` |+| AArch64 | `vreg` | `d` | Any | `d0` |+| AArch64 | `vreg` | `q` | Any | `q0` |+| ARM | `reg` | None | Any | `r0` |+| ARM | `vreg` | None | `f32` | `s0` |+| ARM | `vreg` | None | `f64`, `v64` | `d0` |+| ARM | `vreg` | None | `v128` | `q0` |+| ARM | `vreg` | `e` / `f` | `v128` | `d0` / `d1` |+| RISC-V | `reg` | None | Any | `x1` |+| RISC-V | `vreg` | None | Any | `f0` |++> Notes:+> - on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register.+> - on AArch64 `reg`: a warning is emitted if the input type is smaller than 64 bits, suggesting to use the `w` modifier. 
The warning can be suppressed by explicitly using the `x` modifier.++[llvm-argmod]: http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers++## Flags++Flags are used to further influence the behavior of the inline assembly block.+Currently the following flags are defined:+- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used. A warning is emitted if this flag is used on an `asm` with no outputs.+- `nomem`: The `asm` block does not read or write to any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.+- `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.+- `preserves_flags`: The `asm` block does not modify the flags register (defined below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.+- `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code.+- `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this flag is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.+
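
For readers who want to try the flags listed above, here is a minimal sketch under a few assumptions. It targets x86-64 and uses the syntax that was eventually stabilized, where the macro lives at `std::arch::asm!` and the flag list is spelled `options(...)` rather than the `flags(...)` form quoted in this diff. The function name `add_asm` is hypothetical and only serves the illustration.

```rust
/// Adds two integers with a single `add` instruction (hypothetical helper).
#[cfg(target_arch = "x86_64")]
fn add_asm(mut a: u64, b: u64) -> u64 {
    use std::arch::asm;
    unsafe {
        asm!(
            "add {0}, {1}",
            // `a` is read as an input and only written after `b` has been read.
            inlateout(reg) a,
            in(reg) b,
            // pure: no side effects; nomem: touches no memory;
            // nostack: pushes nothing onto the stack.
            options(pure, nomem, nostack),
        );
    }
    a
}

/// Portable fallback so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn add_asm(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

fn main() {
    assert_eq!(add_asm(4, 4), 8);
    println!("4 + 4 = {}", add_asm(4, 4));
}
```

Because the block is marked `pure` and `nomem`, the compiler may merge or drop calls whose result is unused, so this combination should only be claimed for assembly that really has no side effects.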

Since you are right that this is a potential footgun, we could add a `nounwind` flag and make it mandatory (it's a compile error if you don't specify it), at least until LLVM adds support for unwinding from inline asm.

Sounds good to me.

Amanieu

comment created time in a month

Pull request review comment rust-lang/rfcs

Inline assembly

+- Feature Name: `asm`+- Start Date: (fill me in with today's date, YYYY-MM-DD)+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)++# Summary+[summary]: #summary++This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.++The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.++The transition from the existing `asm!` macro is described in RFC [2843][rfc-llvm-asm]. The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.++[rfc-llvm-asm]: https://github.com/rust-lang/rfcs/pull/2843++# Motivation+[motivation]: #motivation++In systems programming some tasks require dropping down to the assembly level. The primary reasons are for performance, precise timing, and low level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.++The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. 
For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.++A collection of use cases for inline asm can be found in [this repository][catalogue].++[catalogue]: https://github.com/bjorn3/inline_asm_catalogue/++# Guide-level explanation+[guide-level-explanation]: #guide-level-explanation++Rust provides support for inline assembly via the `asm!` macro.+It can be used to embed handwritten assembly in the assembly output generated by the compiler.+Generally this should not be necessary, but might be where the required performance or timing+cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.++> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.++## Basic usage++Let us start with the simplest possible example:++```rust+unsafe {+    asm!("nop");+}+```++This will insert a NOP (no operation) instruction into the assembly generated by the compiler.+Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert+arbitrary instructions and break various invariants. The instructions to be inserted are listed+in the first argument of the `asm!` macro as a string literal.++## Inputs and outputs++Now inserting an instruction that does nothing is rather boring. Let us do something that+actually acts on data:++```rust+let x: u32;+unsafe {+    asm!("mov {}, 5", out(reg) x);+}+assert_eq!(x, 5);+```++This will write the value `5` into the `u32` variable `x`.+You can see that the string literal we use to specify instructions is actually a template string.+It is governed by the same rules as Rust [format strings][format-syntax].+The arguments that are inserted into the template however look a bit different than what you may+be familiar with. First we need to specify if the variable is an input or an output of the+inline assembly. In this case it is an output. 
We declared this by writing `out`.+We also need to specify in what kind of register the assembly expects the variable.+In this case we put it in an arbitrary general purpose register by specifying `reg`.+The compiler will choose an appropriate register to insert into+the template and will read the variable from there after the inline assembly finishes executing.++Let us see another example that also uses an input:++```rust+let i: u32 = 3;+let o: u32;+unsafe {+    asm!("+        mov {0}, {1}+        add {0}, {number}+    ", out(reg) o, in(reg) i, number = imm 5);+}+assert_eq!(o, 8);+```++This will add `5` to the input in variable `i` and write the result to variable `o`.+The particular way this assembly does this is first copying the value from `i` to the output,+and then adding `5` to it.++The example shows a few things:++First we can see that inputs are declared by writing `in` instead of `out`.++Second one of our operands has a type we haven't seen yet, `imm`.+This tells the compiler to expand this argument to an immediate inside the assembly template.+This is only possible for constants and literals.++Third we can see that we can specify an argument number, or name as in any format string.+For inline assembly templates this is particularly useful as arguments are often used more than once.+For more complex inline assembly using this facility is generally recommended, as it improves+readability, and allows reordering instructions without changing the argument order.++We can further refine the above example to avoid the `mov` instruction:++```rust+let mut x: u32 = 3;+unsafe {+    asm!("add {0}, {number}", inout(reg) x, number = imm 5);+}+assert_eq!(x, 8);+```++We can see that `inout` is used to specify an argument that is both input and output.+This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.++It is also possible to specify different variables for the input and output parts of an `inout` 
operand:++```rust+let x: u32 = 3;+let y: u32;+unsafe {+    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);+}+assert_eq!(y, 8);+```++## Late output operands++The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`+can be written at any time, and can therefore not share its location with any other argument.+However, to guarantee optimal performance it is important to use as few registers as possible,+so they won't have to be saved and reloaded around the inline assembly block.+To achieve this Rust provides a `lateout` specifier. This can be used on any output that is+written only after all inputs have been consumed.+There is also an `inlateout` variant of this specifier.++Here is an example where `inlateout` *cannot* be used:++```rust+let mut a = 4;+let b = 4;+let c = 4;+unsafe {+    asm!("+        add {0}, {1}+        add {0}, {2}+    ", inout(reg) a, in(reg) b, in(reg) c);+}+assert_eq!(a, 12);+```++Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`.++However the following example can use `inlateout` since the output is only modified after all input registers have been read:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);+}+assert_eq!(a, 8);+```++As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.++## Explicit register operands++Some instructions require that the operands be in a specific register.+Therefore, Rust inline assembly provides some more specific constraint specifiers.+While `reg` is generally available on any architecture, these are highly architecture specific. E.g. 
for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi`+among others can be addressed by their name.++```rust+unsafe {+    asm!("out 0x64, {}", in("eax") cmd);+}+```++In this example we call the `out` instruction to output the content of the `cmd` variable+to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as an operand+we had to use the `eax` constraint specifier.++It is somewhat common that instructions have operands that are not explicitly listed in the+assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:++```rust+fn mul(a: u32, b: u32) -> u64 {+    let lo: u32;+    let hi: u32;++    unsafe {+        asm!(+            // The x86 mul instruction takes eax as an implicit input and writes+            // the 64-bit result of the multiplication to edx:eax.+            "mul {}",+            in(reg) a, in("eax") b,+            lateout("eax") lo, lateout("edx") hi+        );+    }++    // Parenthesize the shift: in Rust `+` binds tighter than `<<`.+    ((hi as u64) << 32) + (lo as u64)+}+```++This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.+The only explicit operand is a register, that we fill from the variable `a`.+The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.+The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.+The higher 32 bits are stored in `edx` from which we fill the variable `hi`.++Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.++## Clobbered registers++In many cases inline assembly will modify state that is not needed as an output.+Usually this is either because we have to use a scratch register in the assembly,+or instructions modify state that we don't need to further examine.+This state is generally referred to as being "clobbered".+We need to tell the compiler about this since it may need to save and restore this state+around 
the inline assembly block.++```rust+let ebx: u32;+let ecx: u32;++unsafe {+    asm!(+        "cpuid",+        in("eax") 4, in("ecx") 0,+        lateout("ebx") ebx, lateout("ecx") ecx,+        lateout("eax") _, lateout("edx") _+    );+}++println!(+    "L1 Cache: {}",+    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)+);+```++In the example above we use the `cpuid` instruction to get the L1 cache size.+This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache size we only care about the contents of `ebx` and `ecx`.++However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.++This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code:++```rust+// Multiply x by 6 using shifts and adds+let mut x = 4;+unsafe {+    asm!("+        mov {tmp}, {x}+        shl {tmp}, 1+        shl {x}, 2+        add {x}, {tmp}+    ", x = inout(reg) x, tmp = out(reg) _);+}+assert_eq!(x, 4 * 6);+```++## Register template modifiers++In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. 
the low 32 bits of a 64-bit register).++```rust+let mut x: u16 = 0xab;++unsafe {+    asm!("mov {0:h}, {0:b}", inout(reg_abcd) x);+}++assert_eq!(x, 0xabab);+```++In this example, we use the `reg_abcd` register class to restrict the register allocator to the 4 legacy x86 registers (`ax`, `bx`, `cx`, `dx`) of which the first two bytes can be addressed independently.++Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.+The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.++## Flags++By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However in many cases, it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.++Let's take our previous example of an `add` instruction:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!(+        "add {0}, {1}",+        inlateout(reg) a, in(reg) b,+        flags(pure, nomem, nostack)+    );+}+assert_eq!(a, 8);+```++Flags can be provided as an optional final argument to the `asm!` macro. We specified three flags here:+- `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.+- `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).+- `nostack` means that the asm code does not push any data onto the stack. 
This allows the compiler to use optimizations such as the stack red zone on x86_64 to avoid stack pointer adjustments.++These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.++See the reference for the full list of available flags and their effects.++# Reference-level explanation+[reference-level-explanation]: #reference-level-explanation++Inline assembler is implemented as an unsafe macro `asm!()`.+The first argument to this macro is a template string literal used to build the final assembly.+The following arguments specify input and output operands.+When required, flags are specified as the final argument.++The following ABNF specifies the general syntax:++```+dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"+reg_spec := <arch specific register class> / "<arch specific register name>"+operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"+reg_operand := dir_spec "(" reg_spec ")" operand_expr+operand := reg_operand / "imm" const_expr / "sym" path+flag := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn" / "nostack"+flags := "flags(" flag *("," flag) ")"+asm := "asm!(" format_string *("," [ident "="] operand) ["," flags] ")"+```++[format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax++## Template string++The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.++The assembly code syntax used is that of the GNU assembler (GAS). The only exception is on x86 where the Intel syntax is used instead of GCC's AT&T syntax.++This RFC only specifies how operands are substituted into the template string. 
Actual interpretation of the final asm string is left to the assembler.++However there is one restriction on the asm string: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.++The compiler will lint against any operands that are not used in the template string, except for operands that specify an explicit register.++[rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795++## Operand type++Several types of operands are supported:++* `in(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - The allocated register must contain the same value at the end of the asm code (except if a `lateout` is allocated to the same register).+* `out(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain an unknown value at the start of the asm code.+  - `<expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+* `lateout(<reg>) <expr>`+  - Identical to `out` except that the register allocator can reuse a register allocated to an `in`.+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.+  - `lateout` must be used instead of `out` if you are specifying the same explicit register as an `in`.+* `inout(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. 
The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - `<expr>` must be an initialized place expression, to which the contents of the allocated register is written to at the end of the asm code.+* `inout(<reg>) <in expr> => <out expr>`+  - Same as `inout` except that the initial value of the register is taken from the value of `<in expr>`.+  - `<out expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression for `<out expr>`, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+  - `<in expr>` and `<out expr>` may have different types.+* `inlateout(<reg>) <expr>` / `inlateout(<reg>) <in expr> => <out expr>`+  - Identical to `inout` except that the register allocator can reuse a register allocated to an `in` (this can happen if the compiler knows the `in` has the same initial value as the `inlateout`).+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.+* `imm <expr>`+  - `<expr>` must be an integer or floating-point constant expression.+  - The value of the expression is formatted as a string and substituted directly into the asm template string.+* `sym <path>`+  - `<path>` must refer to a `fn` or `static` defined in the current crate.+  - A mangled symbol name referring to the item is substituted into the asm template string.+  - The substituted string does not include any modifiers (e.g. GOT, PLT, relocations, etc).++## Register operands++Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. 
`"eax"`) while register classes are specified as identifiers (e.g. `reg`).++Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register for two input operands or two output operands. Additionally on ARM, it is a compile-time error to use overlapping VFP registers in input operands or in output operands.++Different register classes have different constraints on which Rust types they allow. For example, `reg` generally only allows integers and pointers, but not floats or SIMD vectors.++If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. It is a compile-time error for a value to be of a larger size than the register it is allocated in.++Here is the list of currently supported register classes:++| Architecture | Register class | Registers | LLVM constraint code | Allowed types |+| ------------ | -------------- | --------- | ----- | ------------- |+| x86 | `reg` | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `reg_abcd` | `ax`, `bx`, `cx`, `dx` | `Q` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `vreg` | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |+| x86 | `vreg_evex` | `xmm[0-31]` (AVX-512, otherwise same as `vreg`) | `v` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |+| x86 (AVX-512) | `kreg` | `k[1-7]` | `Yk` | `i16`, `i32`, `i64` |+| AArch64 | `reg` | `x[0-28]`, `x30` | `r` | `i8`, `i16`, `i32`, `i64` |+| AArch64 | `vreg` | `v[0-31]` | `w` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| AArch64 | `vreg_low` | `v[0-15]` | `x` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| AArch64 | `vreg_low8` | `v[0-7]` | `y` | 
`i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| ARM | `reg` | `r[0-10]`, `r12`, `r14` | `r` | `i8`, `i16`, `i32` |+| ARM | `vreg` | `s[0-31]`, `d[0-31]`, `q[0-15]` | `w` | `f32`, `f64`, `v64`, `v128` |+| ARM | `vreg_low` | `s[0-31]`, `d[0-15]`, `q[0-7]` | `t` | `f32`, `f64`, `v64`, `v128` |+| ARM | `vreg_low8` | `s[0-15]`, `d[0-7]`, `q[0-3]` | `x` | `f32`, `f64`, `v64`, `v128` |+| RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-31]` | `r` | `i8`, `i16`, `i32`, `i64` (RV64 only) |+| RISC-V | `vreg` | `f[0-31]` | `f` | `f32`, `f64` |++> Notes on allowed types:+> - Pointers and references are allowed where the equivalent integer type is allowed.+> - `iLEN` refers to both signed and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches.+> - Fat pointers are not allowed.+> - `vLEN` refers to a SIMD vector that is `LEN` bits wide.++Additional constraint specifications may be added in the future based on demand for additional register classes (e.g. MMX, x87, etc).++Some registers have multiple names. These are all treated by the compiler as identical to the base register name. 
Here is the list of all supported register aliases:++| Architecture | Base register | Aliases |+| ------------ | ------------- | ------- |+| x86 | `ax` | `al`, `eax`, `rax` |+| x86 | `bx` | `bl`, `ebx`, `rbx` |+| x86 | `cx` | `cl`, `ecx`, `rcx` |+| x86 | `dx` | `dl`, `edx`, `rdx` |+| x86 | `si` | `sil`, `esi`, `rsi` |+| x86 | `di` | `dil`, `edi`, `rdi` |+| x86 | `bp` | `bpl`, `ebp`, `rbp` |+| x86 | `sp` | `spl`, `esp`, `rsp` |+| x86 | `ip` | `eip`, `rip` |+| x86 | `st(0)` | `st` |+| x86 | `r[8-15]` | `r[8-15]b`, `r[8-15]w`, `r[8-15]d` |+| x86 | `xmm[0-31]` | `ymm[0-31]`, `zmm[0-31]` |+| AArch64 | `x[0-30]` | `w[0-30]` |+| AArch64 | `x29` | `fp` |+| AArch64 | `x30` | `lr` |+| AArch64 | `sp` | `wsp` |+| AArch64 | `xzr` | `wzr` |+| AArch64 | `v[0-31]` | `b[0-31]`, `h[0-31]`, `s[0-31]`, `d[0-31]`, `q[0-31]` |+| ARM | `r[0-3]` | `a[1-4]` |+| ARM | `r[4-9]` | `v[1-6]` |+| ARM | `r9` | `rfp` |+| ARM | `r10` | `sl` |+| ARM | `r11` | `fp` |+| ARM | `r12` | `ip` |+| ARM | `r13` | `sp` |+| ARM | `r14` | `lr` |+| ARM | `r15` | `pc` |+| RISC-V | `x0` | `zero` |+| RISC-V | `x1` | `ra` |+| RISC-V | `x2` | `sp` |+| RISC-V | `x3` | `gp` |+| RISC-V | `x4` | `tp` |+| RISC-V | `x[5-7]` | `t[0-2]` |+| RISC-V | `x8` | `fp`, `s0` |+| RISC-V | `x9` | `s1` |+| RISC-V | `x[10-17]` | `a[0-7]` |+| RISC-V | `x[18-27]` | `s[2-11]` |+| RISC-V | `x[28-31]` | `t[3-6]` |+| RISC-V | `f[0-7]` | `ft[0-7]` |+| RISC-V | `f[8-9]` | `fs[0-1]` |+| RISC-V | `f[10-17]` | `fa[0-7]` |+| RISC-V | `f[18-27]` | `fs[2-11]` |+| RISC-V | `f[28-31]` | `ft[8-11]` |++Some registers cannot be used for input or output operands:++| Architecture | Unsupported register | Reason |+| ------------ | -------------------- | ------ |+| All | `sp` | The stack pointer must be restored to its original value at the end of an asm code block. |+| All | `bp` (x86), `r11` (ARM), `x29` (AArch64), `x8` (RISC-V) | The frame pointer cannot be used as an input or output. 
|+| x86 | `ah`, `bh`, `ch`, `dh` | These are poorly supported by compiler backends. Use 16-bit register views (e.g. `ax`) instead. |+| x86 | `k0` | This is a constant zero register which can't be modified. |+| x86 | `ip` | This is the program counter, not a real register. |+| x86 | `mm[0-7]` | MMX registers are not currently supported (but may be in the future). |+| x86 | `st([0-7])` | x87 registers are not currently supported (but may be in the future). |+| AArch64 | `xzr` | This is a constant zero register which can't be modified. |+| ARM | `pc` | This is the program counter, not a real register. |+| RISC-V | `x0` | This is a constant zero register which can't be modified. |+| RISC-V | `gp`, `tp` | These registers are reserved and cannot be used as inputs or outputs. |++## Template modifiers++The placeholders can be augmented by modifiers which are specified after the `:` in the curly braces. These modifiers do not affect register allocation, but change the way operands are formatted when inserted into the template string. 
Only one modifier is allowed per template placeholder.++The supported modifiers are a subset of LLVM's (and GCC's) [asm template argument modifiers][llvm-argmod].++| Architecture | Register class | Modifier | Input type | Example output |+| ------------ | -------------- | -------- | ---------- | -------------- |+| x86 | `reg` | None | `i8` | `al` |+| x86 | `reg` | None | `i16` | `ax` |+| x86 | `reg` | None | `i32` | `eax` |+| x86 | `reg` | None | `i64` | `rax` |+| x86-32 | `reg_abcd` | `b` | Any | `al` |+| x86-64 | `reg` | `b` | Any | `al` |+| x86 | `reg_abcd` | `h` | Any | `ah` |+| x86 | `reg` | `w` | Any | `ax` |+| x86 | `reg` | `k` | Any | `eax` |+| x86-64 | `reg` | `q` | Any | `rax` |+| x86 | `vreg` | None | `i32`, `i64`, `f32`, `f64`, `v128` | `xmm0` |+| x86 (AVX) | `vreg` | None | `v256` | `ymm0` |+| x86 (AVX-512) | `vreg` | None | `v512` | `zmm0` |+| x86 (AVX-512) | `kreg` | None | Any | `k1` |+| AArch64 | `reg` | None | Any | `x0` |+| AArch64 | `reg` | `w` | Any | `w0` |+| AArch64 | `reg` | `x` | Any | `x0` |+| AArch64 | `vreg` | None | Any | `v0` |+| AArch64 | `vreg` | `b` | Any | `b0` |+| AArch64 | `vreg` | `h` | Any | `h0` |+| AArch64 | `vreg` | `s` | Any | `s0` |+| AArch64 | `vreg` | `d` | Any | `d0` |+| AArch64 | `vreg` | `q` | Any | `q0` |+| ARM | `reg` | None | Any | `r0` |+| ARM | `vreg` | None | `f32` | `s0` |+| ARM | `vreg` | None | `f64`, `v64` | `d0` |+| ARM | `vreg` | None | `v128` | `q0` |+| ARM | `vreg` | `e` / `f` | `v128` | `d0` / `d1` |+| RISC-V | `reg` | None | Any | `x1` |+| RISC-V | `vreg` | None | Any | `f0` |++> Notes:+> - on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register.+> - on AArch64 `reg`: a warning is emitted if the input type is smaller than 64 bits, suggesting to use the `w` modifier. 
The warning can be suppressed by explicitly using the `x` modifier.++[llvm-argmod]: http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers++## Flags++Flags are used to further influence the behavior of the inline assembly block.+Currently, the following flags are defined:+- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used. A warning is emitted if this flag is used on an `asm` with no outputs.+- `nomem`: The `asm` block does not read or write any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.+- `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.+- `preserves_flags`: The `asm` block does not modify the flags register (defined below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.+- `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code.+- `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this flag is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.++The `nomem` and `readonly` flags are mutually exclusive: it is a compile-time error to specify both. 
Specifying `pure` on an asm block with no outputs is linted against since such a block will be optimized away to nothing.++These are the flag registers which must be preserved if `preserves_flags` is set:+- x86+  - Status flags in `EFLAGS` (CF, PF, AF, ZF, SF, OF).+  - Direction flag in `EFLAGS` (DF).+  - Floating-point status word (all).+  - Floating-point exception flags in `MXCSR` (PE, UE, OE, ZE, DE, IE).+- ARM+  - Condition flags in `CPSR` (N, Z, C, V).+  - Saturation flag in `CPSR` (Q).+  - Greater than or equal flags in `CPSR` (GE).+  - Condition flags in `FPSCR` (N, Z, C, V).+  - Saturation flag in `FPSCR` (QC).+  - Floating-point exception flags in `FPSCR` (IDC, IXC, UFC, OFC, DZC, IOC).+- AArch64+  - Condition flags (`NZCV` register).+  - Floating-point status (`FPSR` register).+- RISC-V+  - Floating-point exception flags in `fcsr` (`fflags`).++> Note: As a general rule, these are the flags which are *not* preserved when performing a function call.++## Mapping to LLVM IR++The direction specification maps to an LLVM constraint specification as follows (using a `reg` operand as an example):++* `in(reg)` => `r`+* `out(reg)` => `=&r` (Rust's outputs are early-clobber outputs in LLVM/GCC terminology)+* `inout(reg)` => `=&r,0` (an early-clobber output with an input tied to it, `0` here is a placeholder for the position of the output)+* `lateout(reg)` => `=r` (Rust's late outputs are regular outputs in LLVM/GCC terminology)+* `inlateout(reg)` => `=r,0` (cf. `inout` and `lateout`)++If an `inout` is used where the output type is smaller than the input type then some special handling is needed to avoid LLVM issues. See [this bug][issue-65452].++As written, this RFC requires architectures to map from Rust constraint specifications to LLVM constraint codes. 
This is in part for better readability on Rust's side and in part for independence of the backend:++* Register classes are mapped to the appropriate constraint code as per the table above.+* `imm` operands are formatted and injected directly into the asm string.+* `sym` is mapped to `s` for statics and `X` for functions.+* a register name `r1` is mapped to `{r1}`+* additionally mappings for register classes are added as appropriate (cf. [llvm-constraint])+* `lateout` operands with an `_` expression that are specified as an explicit register are converted to LLVM clobber constraints. For example, `lateout("r1") _` is mapped to `~{r1}` (cf. [llvm-clobber]).+* If the `nomem` flag is not set then `~{memory}` is added to the clobber list. (Although this is currently ignored by LLVM)+* If the `preserves_flags` flag is not set then the following are added to the clobber list:+  - (x86) `~{dirflag}~{flags}~{fpsr}`+  - (ARM/AArch64) `~{cc}`++For some operand types, we will automatically insert some modifiers into the template string.+* For `sym` and `imm` operands, we automatically insert the `c` modifier which removes target-specific modifiers from the value (e.g. `#` on ARM).+* On AArch64, we will warn if a value smaller than 64 bits is used without a modifier since this is likely a bug (it will produce `x*` instead of `w*`). Clang has this same warning.+* On ARM, we will automatically add the `P` or `q` LLVM modifier for `f64`, `v64` and `v128` passed into a `vreg`. This will cause those registers to be formatted as `d*` and `q*` respectively.++Additionally, the following attributes are added to the LLVM `asm` statement:++* The `nounwind` attribute is always added: unwinding from an inline asm block is not allowed (and not supported by LLVM anyways).
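As a rough illustration of the direction mapping listed under "Mapping to LLVM IR", the sketch below builds the constraint string for a single operand. The `Dir` enum and the `llvm_constraint` helper are hypothetical names invented for this example; they are not part of the RFC or of rustc. Passing the tied output's position in explicitly is likewise an assumption made to keep the sketch small.

```rust
/// Operand directions, mirroring the RFC's `dir_spec` grammar.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Dir {
    In,
    Out,
    LateOut,
    InOut,
    InLateOut,
}

/// Hypothetical helper: builds the LLVM constraint string for one operand.
///
/// `code` is the register-class constraint code (e.g. "r" for `reg`).
/// `out_index` is the position of the output that the input is tied to;
/// it is only used by the `InOut` and `InLateOut` forms.
#[allow(dead_code)]
fn llvm_constraint(dir: Dir, code: &str, out_index: usize) -> String {
    match dir {
        // Plain input: just the constraint code.
        Dir::In => code.to_string(),
        // Rust `out` is an early-clobber output in LLVM terms.
        Dir::Out => format!("=&{}", code),
        // Rust `lateout` is a regular LLVM output.
        Dir::LateOut => format!("={}", code),
        // Early-clobber output plus an input tied to that output.
        Dir::InOut => format!("=&{},{}", code, out_index),
        // Regular output plus an input tied to that output.
        Dir::InLateOut => format!("={},{}", code, out_index),
    }
}
```

For a `reg` operand whose output is at position 0, `llvm_constraint(Dir::InOut, "r", 0)` yields `=&r,0`, matching the list in the RFC text.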

Aborting would require additional code around every asm block,

Can't we just use landing pads, as we do for `#[unwind(abort)]`? That does increase code size, and code size can impact execution time, but beyond that, the non-exceptional path should be fast.
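For readers unfamiliar with the idea, the abort-on-unwind behaviour mentioned above can be approximated in plain Rust with a drop guard. This is only an analogy: `AbortOnUnwind` and `abort_on_unwind` are hypothetical names for this sketch, and it does not show how rustc actually lowers `#[unwind(abort)]` or how an `asm!` block would be wrapped.

```rust
use std::mem;
use std::process;

/// Guard whose destructor only runs if the wrapped closure unwinds.
struct AbortOnUnwind;

impl Drop for AbortOnUnwind {
    fn drop(&mut self) {
        // Reached only while a panic is unwinding through `abort_on_unwind`,
        // which plays the role of the landing pad.
        process::abort();
    }
}

/// Hypothetical helper: runs `f` and aborts the process if it unwinds.
#[allow(dead_code)]
fn abort_on_unwind<T>(f: impl FnOnce() -> T) -> T {
    let guard = AbortOnUnwind;
    let value = f();
    // Normal path: defuse the guard so that its destructor never runs.
    mem::forget(guard);
    value
}
```

On the non-panicking path the only extra work is forgetting a zero-sized guard, which is in line with the expectation that the non-exceptional path stays fast. The cost is the additional cleanup code emitted for the unwinding path.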

We could potentially support an unwind attribute in the future, if LLVM adds such support.

This would mean that some footguns are opt-in (`nomem`, `nostack`, `pure`, `readonly`, ...) while others are opt-out (unwind), and that feels inconsistent.

Amanieu

comment created time in a month

issue commentrust-lang/wg-allocators

Rename `Alloc` to `AllocRef`

cc @rust-lang/libs

TimDiekmann

comment created time in a month

push eventrust-lang/packed_simd

Travis CI User

commit sha 2811910f63659f7ec98dc6a52a6d583bf4e38169

Update documentation

push time in a month

Pull request review commentrust-lang/rfcs

Inline assembly

+- Feature Name: `asm`+- Start Date: (fill me in with today's date, YYYY-MM-DD)+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)++# Summary+[summary]: #summary++This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.++The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.++The transition from the existing `asm!` macro is described in RFC [2843][rfc-llvm-asm]. The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.++[rfc-llvm-asm]: https://github.com/rust-lang/rfcs/pull/2843++# Motivation+[motivation]: #motivation++In systems programming some tasks require dropping down to the assembly level. The primary reasons are for performance, precise timing, and low level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.++The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. 
For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.++A collection of use cases for inline asm can be found in [this repository][catalogue].++[catalogue]: https://github.com/bjorn3/inline_asm_catalogue/++# Guide-level explanation+[guide-level-explanation]: #guide-level-explanation++Rust provides support for inline assembly via the `asm!` macro.+It can be used to embed handwritten assembly in the assembly output generated by the compiler.+Generally this should not be necessary, but might be where the required performance or timing+cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.++> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.++## Basic usage++Let us start with the simplest possible example:++```rust+unsafe {+    asm!("nop");+}+```++This will insert a NOP (no operation) instruction into the assembly generated by the compiler.+Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert+arbitrary instructions and break various invariants. The instructions to be inserted are listed+in the first argument of the `asm!` macro as a string literal.++## Inputs and outputs++Now inserting an instruction that does nothing is rather boring. Let us do something that+actually acts on data:++```rust+let x: u32;+unsafe {+    asm!("mov {}, 5", out(reg) x);+}+assert_eq!(x, 5);+```++This will write the value `5` into the `u32` variable `x`.+You can see that the string literal we use to specify instructions is actually a template string.+It is governed by the same rules as Rust [format strings][format-syntax].+The arguments that are inserted into the template however look a bit different than you may+be familiar with. First we need to specify if the variable is an input or an output of the+inline assembly. In this case it is an output. 
We declared this by writing `out`.+We also need to specify in what kind of register the assembly expects the variable.+In this case we put it in an arbitrary general purpose register by specifying `reg`.+The compiler will choose an appropriate register to insert into+the template and will read the variable from there after the inline assembly finishes executing.++Let us see another example that also uses an input:++```rust+let i: u32 = 3;+let o: u32;+unsafe {+    asm!("+        mov {0}, {1}+        add {0}, {number}+    ", out(reg) o, in(reg) i, number = imm 5);+}+assert_eq!(o, 8);+```++This will add `5` to the input in variable `i` and write the result to variable `o`.+The particular way this assembly does this is first copying the value from `i` to the output,+and then adding `5` to it.++The example shows a few things:++First, we can see that inputs are declared by writing `in` instead of `out`.++Second, one of our operands has a type we haven't seen yet, `imm`.+This tells the compiler to expand this argument to an immediate inside the assembly template.+This is only possible for constants and literals.++Third, we can see that we can specify an argument number, or name as in any format string.+For inline assembly templates this is particularly useful as arguments are often used more than once.+For more complex inline assembly using this facility is generally recommended, as it improves+readability, and allows reordering instructions without changing the argument order.++We can further refine the above example to avoid the `mov` instruction:++```rust+let mut x: u32 = 3;+unsafe {+    asm!("add {0}, {number}", inout(reg) x, number = imm 5);+}+assert_eq!(x, 8);+```++We can see that `inout` is used to specify an argument that is both input and output.+This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.++It is also possible to specify different variables for the input and output parts of an `inout` 
operand:++```rust+let x: u32 = 3;+let y: u32;+unsafe {+    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);+}+assert_eq!(y, 8);+```++## Late output operands++The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`+can be written at any time, and can therefore not share its location with any other argument.+However, to guarantee optimal performance it is important to use as few registers as possible,+so they won't have to be saved and reloaded around the inline assembly block.+To achieve this Rust provides a `lateout` specifier. This can be used on any output that is+written only after all inputs have been consumed.+There is also a `inlateout` variant of this specifier.++Here is an example where `inlateout` *cannot* be used:++```rust+let mut a = 4;+let b = 4;+let c = 4;+unsafe {+    asm!("+        add {0}, {1}+        add {0}, {2}+    ", inout(reg) a, in(reg) b, in(reg) c);+}+assert_eq!(a, 12);+```++Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`.++However the following example can use `inlateout` since the output is only modified after all input registers have been read:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);+}+assert_eq!(a, 8);+```++As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.++## Explicit register operands++Some instructions require that the operands be in a specific register.+Therefore, Rust inline assembly provides some more specific constraint specifiers.+While `reg` is generally available on any architecture, these are highly architecture specific. E.g. 
for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi`+among others can be addressed by their name.++```rust+unsafe {+    asm!("out 0x64, {}", in("eax") cmd);+}+```++In this example we call the `out` instruction to output the content of the `cmd` variable+to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as operand+we had to use the `eax` constraint specifier.++It is somewhat common that instructions have operands that are not explicitly listed in the+assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:++```rust+fn mul(a: u32, b: u32) -> u64 {+    let lo: u32;+    let hi: u32;++    unsafe {+        asm!(+            // The x86 mul instruction takes eax as an implicit input and writes+            // the 64-bit result of the multiplication to edx:eax.+            "mul {}",+            in(reg) a, in("eax") b,+            lateout("eax") lo, lateout("edx") hi+        );+    }++    // Parenthesize the shift: `+` binds tighter than `<<` in Rust.+    ((hi as u64) << 32) + lo as u64+}+```++This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.+The only explicit operand is a register that we fill from the variable `a`.+The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.+The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.+The higher 32 bits are stored in `edx` from which we fill the variable `hi`.++Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.++## Clobbered registers++In many cases inline assembly will modify state that is not needed as an output.+Usually this is either because we have to use a scratch register in the assembly,+or instructions modify state that we don't need to further examine.+This state is generally referred to as being "clobbered".+We need to tell the compiler about this since it may need to save and restore this state+around 
the inline assembly block.++```rust+let ebx: u32;+let ecx: u32;++unsafe {+    asm!(+        "cpuid",+        in("eax") 4, in("ecx") 0,+        lateout("ebx") ebx, lateout("ecx") ecx,+        lateout("eax") _, lateout("edx") _+    );+}++println!(+    "L1 Cache: {}",+    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)+);+```++In the example above we use the `cpuid` instruction to get the L1 cache size.+This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache size we only care about the contents of `ebx` and `ecx`.++However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.++This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code:++```rust+// Multiply x by 6 using shifts and adds+let mut x = 4;+unsafe {+    asm!("+        mov {tmp}, {x}+        shl {tmp}, 1+        shl {x}, 2+        add {x}, {tmp}+    ", x = inout(reg) x, tmp = out(reg) _);+}+assert_eq!(x, 4 * 6);+```++## Register template modifiers++In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. 
the low 32 bits of a 64-bit register).++```rust+let mut x: u16 = 0xab;++unsafe {+    asm!("mov {0:h}, {0:b}", inout(reg_abcd) x);+}++assert_eq!(x, 0xabab);+```++In this example, we use the `reg_abcd` register class to restrict the register allocator to the four legacy x86 registers (`ax`, `bx`, `cx`, `dx`) of which the first two bytes can be addressed independently.++Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.+The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.++## Flags++By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However, in many cases it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.++Let's take our previous example of an `add` instruction:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!(+        "add {0}, {1}",+        inlateout(reg) a, in(reg) b,+        flags(pure, nomem, nostack)+    );+}+assert_eq!(a, 8);+```++Flags can be provided as an optional final argument to the `asm!` macro. We specified three flags here:+- `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.+- `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).+- `nostack` means that the asm code does not push any data onto the stack. 
This allows the compiler to use optimizations such as the stack red zone on x86_64 to avoid stack pointer adjustments.++These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.++See the reference for the full list of available flags and their effects.++# Reference-level explanation+[reference-level-explanation]: #reference-level-explanation++Inline assembler is implemented as an unsafe macro `asm!()`.+The first argument to this macro is a template string literal used to build the final assembly.+The following arguments specify input and output operands.+When required, flags are specified as the final argument.++The following ABNF specifies the general syntax:++```+dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"+reg_spec := <arch specific register class> / "<arch specific register name>"+operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"+reg_operand := dir_spec "(" reg_spec ")" operand_expr+operand := reg_operand / "imm" const_expr / "sym" path+flag := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn" / "nostack"+flags := "flags(" flag *("," flag) ")"+asm := "asm!(" format_string *("," [ident "="] operand) ["," flags] ")"+```++[format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax++## Template string++The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.++The assembly code syntax used is that of the GNU assembler (GAS). The only exception is on x86 where the Intel syntax is used instead of GCC's AT&T syntax.++This RFC only specifies how operands are substituted into the template string. 
Actual interpretation of the final asm string is left to the assembler.++However there is one restriction on the asm string: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.++The compiler will lint against any operands that are not used in the template string, except for operands that specify an explicit register.++[rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795++## Operand type++Several types of operands are supported:++* `in(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - The allocated register must contain the same value at the end of the asm code (except if a `lateout` is allocated to the same register).+* `out(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain an unknown value at the start of the asm code.+  - `<expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+* `lateout(<reg>) <expr>`+  - Identical to `out` except that the register allocator can reuse a register allocated to an `in`.+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.+  - `lateout` must be used instead of `out` if you are specifying the same explicit register as an `in`.+* `inout(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. 
The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - `<expr>` must be an initialized place expression, to which the contents of the allocated register is written to at the end of the asm code.+* `inout(<reg>) <in expr> => <out expr>`+  - Same as `inout` except that the initial value of the register is taken from the value of `<in expr>`.+  - `<out expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression for `<out expr>`, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+  - `<in expr>` and `<out expr>` may have different types.+* `inlateout(<reg>) <expr>` / `inlateout(<reg>) <in expr> => <out expr>`+  - Identical to `inout` except that the register allocator can reuse a register allocated to an `in` (this can happen if the compiler knows the `in` has the same initial value as the `inlateout`).+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.+* `imm <expr>`+  - `<expr>` must be an integer or floating-point constant expression.+  - The value of the expression is formatted as a string and substituted directly into the asm template string.+* `sym <path>`+  - `<path>` must refer to a `fn` or `static` defined in the current crate.+  - A mangled symbol name referring to the item is substituted into the asm template string.+  - The substituted string does not include any modifiers (e.g. GOT, PLT, relocations, etc).++## Register operands++Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. 
`"eax"`) while register classes are specified as identifiers (e.g. `reg`).++Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register for two input operands or two output operands. Additionally, on ARM it is a compile-time error to use overlapping VFP registers in input operands or in output operands.++Different register classes have different constraints on which Rust types they allow. For example, `reg` generally only allows integers and pointers, but not floats or SIMD vectors.++If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. It is a compile-time error for a value to be of a larger size than the register it is allocated in.++Here is the list of currently supported register classes:++| Architecture | Register class | Registers | LLVM constraint code | Allowed types |+| ------------ | -------------- | --------- | ----- | ------------- |+| x86 | `reg` | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `reg_abcd` | `ax`, `bx`, `cx`, `dx` | `Q` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `vreg` | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |+| x86 | `vreg_evex` | `xmm[0-31]` (AVX-512, otherwise same as `vreg`) | `v` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |+| x86 (AVX-512) | `kreg` | `k[1-7]` | `Yk` | `i16`, `i32`, `i64` |+| AArch64 | `reg` | `x[0-28]`, `x30` | `r` | `i8`, `i16`, `i32`, `i64` |+| AArch64 | `vreg` | `v[0-31]` | `w` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| AArch64 | `vreg_low` | `v[0-15]` | `x` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| AArch64 | `vreg_low8` | `v[0-7]` | `y` | 
`i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| ARM | `reg` | `r[0-10]`, `r12`, `r14` | `r` | `i8`, `i16`, `i32` |+| ARM | `vreg` | `s[0-31]`, `d[0-31]`, `q[0-15]` | `w` | `f32`, `f64`, `v64`, `v128` |+| ARM | `vreg_low` | `s[0-31]`, `d[0-15]`, `q[0-7]` | `t` | `f32`, `f64`, `v64`, `v128` |+| ARM | `vreg_low8` | `s[0-15]`, `d[0-7]`, `q[0-3]` | `x` | `f32`, `f64`, `v64`, `v128` |+| RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-31]` | `r` | `i8`, `i16`, `i32`, `i64` (RV64 only) |+| RISC-V | `vreg` | `f[0-31]` | `f` | `f32`, `f64` |++> Notes on allowed types:+> - Pointers and references are allowed where the equivalent integer type is allowed.+> - `iLEN` refers to both signed and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches.+> - Fat pointers are not allowed.+> - `vLEN` refers to a SIMD vector that is `LEN` bits wide.++Additional constraint specifications may be added in the future based on demand for additional register classes (e.g. MMX, x87, etc.).++Some registers have multiple names. These are all treated by the compiler as identical to the base register name. 
Here is the list of all supported register aliases:++| Architecture | Base register | Aliases |+| ------------ | ------------- | ------- |+| x86 | `ax` | `al`, `eax`, `rax` |+| x86 | `bx` | `bl`, `ebx`, `rbx` |+| x86 | `cx` | `cl`, `ecx`, `rcx` |+| x86 | `dx` | `dl`, `edx`, `rdx` |+| x86 | `si` | `sil`, `esi`, `rsi` |+| x86 | `di` | `dil`, `edi`, `rdi` |+| x86 | `bp` | `bpl`, `ebp`, `rbp` |+| x86 | `sp` | `spl`, `esp`, `rsp` |+| x86 | `ip` | `eip`, `rip` |+| x86 | `st(0)` | `st` |+| x86 | `r[8-15]` | `r[8-15]b`, `r[8-15]w`, `r[8-15]d` |+| x86 | `xmm[0-31]` | `ymm[0-31]`, `zmm[0-31]` |+| AArch64 | `x[0-30]` | `w[0-30]` |+| AArch64 | `x29` | `fp` |+| AArch64 | `x30` | `lr` |+| AArch64 | `sp` | `wsp` |+| AArch64 | `xzr` | `wzr` |+| AArch64 | `v[0-31]` | `b[0-31]`, `h[0-31]`, `s[0-31]`, `d[0-31]`, `q[0-31]` |+| ARM | `r[0-3]` | `a[1-4]` |+| ARM | `r[4-9]` | `v[1-6]` |+| ARM | `r9` | `rfp` |+| ARM | `r10` | `sl` |+| ARM | `r11` | `fp` |+| ARM | `r12` | `ip` |+| ARM | `r13` | `sp` |+| ARM | `r14` | `lr` |+| ARM | `r15` | `pc` |+| RISC-V | `x0` | `zero` |+| RISC-V | `x1` | `ra` |+| RISC-V | `x2` | `sp` |+| RISC-V | `x3` | `gp` |+| RISC-V | `x4` | `tp` |+| RISC-V | `x[5-7]` | `t[0-2]` |+| RISC-V | `x8` | `fp`, `s0` |+| RISC-V | `x9` | `s1` |+| RISC-V | `x[10-17]` | `a[0-7]` |+| RISC-V | `x[18-27]` | `s[2-11]` |+| RISC-V | `x[28-31]` | `t[3-6]` |+| RISC-V | `f[0-7]` | `ft[0-7]` |+| RISC-V | `f[8-9]` | `fs[0-1]` |+| RISC-V | `f[10-17]` | `fa[0-7]` |+| RISC-V | `f[18-27]` | `fs[2-11]` |+| RISC-V | `f[28-31]` | `ft[8-11]` |++Some registers cannot be used for input or output operands:++| Architecture | Unsupported register | Reason |+| ------------ | -------------------- | ------ |+| All | `sp` | The stack pointer must be restored to its original value at the end of an asm code block. |+| All | `bp` (x86), `r11` (ARM), `x29` (AArch64), `x8` (RISC-V) | The frame pointer cannot be used as an input or output. 
|+| x86 | `ah`, `bh`, `ch`, `dh` | These are poorly supported by compiler backends. Use 16-bit register views (e.g. `ax`) instead. |+| x86 | `k0` | This is a constant zero register which can't be modified. |+| x86 | `ip` | This is the program counter, not a real register. |+| x86 | `mm[0-7]` | MMX registers are not currently supported (but may be in the future). |+| x86 | `st([0-7])` | x87 registers are not currently supported (but may be in the future). |+| AArch64 | `xzr` | This is a constant zero register which can't be modified. |+| ARM | `pc` | This is the program counter, not a real register. |+| RISC-V | `x0` | This is a constant zero register which can't be modified. |+| RISC-V | `gp`, `tp` | These registers are reserved and cannot be used as inputs or outputs. |++## Template modifiers++The placeholders can be augmented by modifiers which are specified after the `:` in the curly braces. These modifiers do not affect register allocation, but change the way operands are formatted when inserted into the template string. 
Only one modifier is allowed per template placeholder.++The supported modifiers are a subset of LLVM's (and GCC's) [asm template argument modifiers][llvm-argmod].++| Architecture | Register class | Modifier | Input type | Example output |+| ------------ | -------------- | -------- | ---------- | -------------- |+| x86 | `reg` | None | `i8` | `al` |+| x86 | `reg` | None | `i16` | `ax` |+| x86 | `reg` | None | `i32` | `eax` |+| x86 | `reg` | None | `i64` | `rax` |+| x86-32 | `reg_abcd` | `b` | Any | `al` |+| x86-64 | `reg` | `b` | Any | `al` |+| x86 | `reg_abcd` | `h` | Any | `ah` |+| x86 | `reg` | `w` | Any | `ax` |+| x86 | `reg` | `k` | Any | `eax` |+| x86-64 | `reg` | `q` | Any | `rax` |+| x86 | `vreg` | None | `i32`, `i64`, `f32`, `f64`, `v128` | `xmm0` |+| x86 (AVX) | `vreg` | None | `v256` | `ymm0` |+| x86 (AVX-512) | `vreg` | None | `v512` | `zmm0` |+| x86 (AVX-512) | `kreg` | None | Any | `k1` |+| AArch64 | `reg` | None | Any | `x0` |+| AArch64 | `reg` | `w` | Any | `w0` |+| AArch64 | `reg` | `x` | Any | `x0` |+| AArch64 | `vreg` | None | Any | `v0` |+| AArch64 | `vreg` | `b` | Any | `b0` |+| AArch64 | `vreg` | `h` | Any | `h0` |+| AArch64 | `vreg` | `s` | Any | `s0` |+| AArch64 | `vreg` | `d` | Any | `d0` |+| AArch64 | `vreg` | `q` | Any | `q0` |+| ARM | `reg` | None | Any | `r0` |+| ARM | `vreg` | None | `f32` | `s0` |+| ARM | `vreg` | None | `f64`, `v64` | `d0` |+| ARM | `vreg` | None | `v128` | `q0` |+| ARM | `vreg` | `e` / `f` | `v128` | `d0` / `d1` |+| RISC-V | `reg` | None | Any | `x1` |+| RISC-V | `vreg` | None | Any | `f0` |++> Notes:+> - on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register.+> - on AArch64 `reg`: a warning is emitted if the input type is smaller than 64 bits, suggesting to use the `w` modifier. 
The warning can be suppressed by explicitly using the `x` modifier.++[llvm-argmod]: http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers++## Flags++Flags are used to further influence the behavior of the inline assembly block.+Currently the following flags are defined:+- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used. A warning is emitted if this flag is used on an `asm` with no outputs.+- `nomem`: The `asm` blocks does not read or write to any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.+- `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.+- `preserves_flags`: The `asm` block does not modify the flags register (defined below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.+- `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code.+- `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this flag is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.++The `nomem` and `readonly` flags are mutually exclusive: it is a compile-time error to specify both. 
Specifying `pure` on an asm block with no outputs is linted against since such a block will be optimized away to nothing.++These flag registers which must be preserved if `preserves_flags` is set:+- x86+  - Status flags in `EFLAGS` (CF, PF, AF, ZF, SF, OF).+  - Direction flag in `EFLAGS` (DF).+  - Floating-point status word (all).+  - Floating-point exception flags in `MXCSR` (PE, UE, OE, ZE, DE, IE).+- ARM+  - Condition flags in `CPSR` (N, Z, C, V)+  - Saturation flag in `CPSR` (Q)+  - Greater than or equal flags in `CPSR` (GE).+  - Condition flags in `FPSCR` (N, Z, C, V)+  - Saturation flag in `FPSCR` (QC)+  - Floating-point exception flags in `FPSCR` (IDC, IXC, UFC, OFC, DZC, IOC).+- AArch64+  - Condition flags (`NZCV` register).+  - Floating-point status (`FPSR` register).+- RISC-V+  - Floating-point exception flags in `fcsr` (`fflags`).++> Note: As a general rule, these are the flags which are *not* preserved when performing a function call.++## Mapping to LLVM IR++The direction specification maps to a LLVM constraint specification as follows (using a `reg` operand as an example):++* `in(reg)` => `r`+* `out(reg)` => `=&r` (Rust's outputs are early-clobber outputs in LLVM/GCC terminology)+* `inout(reg)` => `=&r,0` (an early-clobber output with an input tied to it, `0` here is a placeholder for the position of the output)+* `lateout(reg)` => `=r` (Rust's late outputs are regular outputs in LLVM/GCC terminology)+* `inlateout(reg)` => `=r, 0` (cf. `inout` and `lateout`)++If an `inout` is used where the output type is smaller than the input type then some special handling is needed to avoid LLVM issues. See [this bug][issue-65452].++As written this RFC requires architectures to map from Rust constraint specifications to LLVM constraint codes. 
This is in part for better readability on Rust's side and in part for independence of the backend:++* Register classes are mapped to the appropriate constraint code as per the table above.+* `imm` operands are formatted and injected directly into the asm string.+* `sym` is mapped to `s` for statics and `X` for functions.+* a register name `r1` is mapped to `{r1}`+* additionally mappings for register classes are added as appropriate (cf. [llvm-constraint])+* `lateout` operands with an `_` expression that are specified as an explicit register are converted to LLVM clobber constraints. For example, `lateout("r1") _` is mapped to `~{r1}` (cf. [llvm-clobber]).+* If the `nomem` flag is not set then `~{memory}` is added to the clobber list. (Although this is currently ignored by LLVM)+* If the `preserves_flags` flag is not set then the following are added to the clobber list:+  - (x86) `~{dirflag}~{flags}~{fpsr}`+  - (ARM/AArch64) `~{cc}`++For some operand types, we will automatically insert some modifiers into the template string.+* For `sym` and `imm` operands, we automatically insert the `c` modifier which removes target-specific modifiers from the value (e.g. `#` on ARM).+* On AArch64, we will warn if a value smaller than 64 bits is used without a modifier since this is likely a bug (it will produce `x*` instead of `w*`). Clang has this same warning.+* On ARM, we will automatically add the `P` or `q` LLVM modifier for `f64`, `v64` and `v128` passed into a `vreg`. This will cause those registers to be formatted as `d*` and `q*` respectively.++Additionally, the following attributes are added to the LLVM `asm` statement:++* The `nounwind` attribute is always added: unwinding from an inline asm block is not allowed (and not supported by LLVM anyways).

This seems like a big footgun, e.g., if an asm! block calls a function that unwinds.
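
One way to sidestep that footgun, sketched under the assumption that the asm code calls back into Rust: route the call through a shim that catches the panic, so nothing unwinds through the asm frame. `risky` and `risky_shim` are hypothetical names used only for illustration, and the `-1` error code is an arbitrary choice; `std::panic::catch_unwind` is the standard library function.

```rust
use std::panic;

/// Hypothetical callee that may panic (illustration only).
fn risky(x: u32) -> u32 {
    if x == 0 {
        panic!("x must be non-zero");
    }
    x * 2
}

/// Hypothetical C-ABI shim that asm code would call instead of `risky`.
/// A panic is caught here and reported as -1, so no unwinding ever
/// crosses the frame of the asm block that made the call.
extern "C" fn risky_shim(x: u32) -> i64 {
    match panic::catch_unwind(|| risky(x)) {
        Ok(value) => i64::from(value),
        Err(_) => -1,
    }
}
```

This only helps with the default `-C panic=unwind`; under `-C panic=abort` the process terminates at the panic and the shim never sees it.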

Amanieu

comment created time in a month

Pull request review commentrust-lang/rfcs

Inline assembly

+- Feature Name: `asm`+- Start Date: (fill me in with today's date, YYYY-MM-DD)+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)++# Summary+[summary]: #summary++This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.++The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.++The transition from the existing `asm!` macro is described in RFC [2843][rfc-llvm-asm]. The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.++[rfc-llvm-asm]: https://github.com/rust-lang/rfcs/pull/2843++# Motivation+[motivation]: #motivation++In systems programming some tasks require dropping down to the assembly level. The primary reasons are for performance, precise timing, and low level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.++The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. 
For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.++A collection of use cases for inline asm can be found in [this repository][catalogue].++[catalogue]: https://github.com/bjorn3/inline_asm_catalogue/++# Guide-level explanation+[guide-level-explanation]: #guide-level-explanation++Rust provides support for inline assembly via the `asm!` macro.+It can be used to embed handwritten assembly in the assembly output generated by the compiler.+Generally this should not be necessary, but might be where the required performance or timing+cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.++> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.++## Basic usage++Let us start with the simplest possible example:++```rust+unsafe {+    asm!("nop");+}+```++This will insert a NOP (no operation) instruction into the assembly generated by the compiler.+Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert+arbitrary instructions and break various invariants. The instructions to be inserted are listed+in the first argument of the `asm!` macro as a string literal.++## Inputs and outputs++Now inserting an instruction that does nothing is rather boring. Let us do something that+actually acts on data:++```rust+let x: u32;+unsafe {+    asm!("mov {}, 5", out(reg) x);+}+assert_eq!(x, 5);+```++This will write the value `5` into the `u32` variable `x`.+You can see that the string literal we use to specify instructions is actually a template string.+It is governed by the same rules as Rust [format strings][format-syntax].+The arguments that are inserted into the template however look a bit different than you may+be familiar with. First we need to specify if the variable is an input or an output of the+inline assembly. In this case it is an output. 
We declared this by writing `out`.+We also need to specify in what kind of register the assembly expects the variable.+In this case we put it in an arbitrary general purpose register by specifying `reg`.+The compiler will choose an appropriate register to insert into+the template and will read the variable from there after the inline assembly finishes executing.++Let us see another example that also uses an input:++```rust+let i: u32 = 3;+let o: u32;+unsafe {+    asm!("+        mov {0}, {1}+        add {0}, {number}+    ", out(reg) o, in(reg) i, number = imm 5);+}+assert_eq!(o, 8);+```++This will add `5` to the input in variable `i` and write the result to variable `o`.+The particular way this assembly does this is first copying the value from `i` to the output,+and then adding `5` to it.++The example shows a few things:++First, we can see that inputs are declared by writing `in` instead of `out`.++Second, one of our operands has a type we haven't seen yet, `imm`.+This tells the compiler to expand this argument to an immediate inside the assembly template.+This is only possible for constants and literals.++Third, we can see that we can specify an argument number, or name as in any format string.+For inline assembly templates this is particularly useful as arguments are often used more than once.+For more complex inline assembly using this facility is generally recommended, as it improves+readability, and allows reordering instructions without changing the argument order.++We can further refine the above example to avoid the `mov` instruction:++```rust+let mut x: u32 = 3;+unsafe {+    asm!("add {0}, {number}", inout(reg) x, number = imm 5);+}+assert_eq!(x, 8);+```++We can see that `inout` is used to specify an argument that is both input and output.+This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.++It is also possible to specify different variables for the input and output parts of an `inout` 
operand:++```rust+let x: u32 = 3;+let y: u32;+unsafe {+    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);+}+assert_eq!(y, 8);+```++## Late output operands++The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`+can be written at any time, and can therefore not share its location with any other argument.+However, to guarantee optimal performance it is important to use as few registers as possible,+so they won't have to be saved and reloaded around the inline assembly block.+To achieve this Rust provides a `lateout` specifier. This can be used on any output that is+written only after all inputs have been consumed.+There is also a `inlateout` variant of this specifier.++Here is an example where `inlateout` *cannot* be used:++```rust+let mut a = 4;+let b = 4;+let c = 4;+unsafe {+    asm!("+        add {0}, {1}+        add {0}, {2}+    ", inout(reg) a, in(reg) b, in(reg) c);+}+assert_eq!(a, 12);+```++Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`.++However the following example can use `inlateout` since the output is only modified after all input registers have been read:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);+}+assert_eq!(a, 8);+```++As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.++## Explicit register operands++Some instructions require that the operands be in a specific register.+Therefore, Rust inline assembly provides some more specific constraint specifiers.+While `reg` is generally available on any architecture, these are highly architecture specific. E.g. 
for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi`+among others can be addressed by their name.++```rust+unsafe {+    asm!("out 0x64, {}", in("eax") cmd);+}+```++In this example we call the `out` instruction to output the content of the `cmd` variable+to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as operand+we had to use the `eax` constraint specifier.++It is somewhat common that instructions have operands that are not explicitly listed in the+assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:++```rust+fn mul(a: u32, b: u32) -> u64 {+    let lo: u32;+    let hi: u32;++    unsafe {+        asm!(+            // The x86 mul instruction takes eax as an implicit input and writes+            // the 64-bit result of the multiplication to edx:eax.+            "mul {}",+            in(reg) a, in("eax") b,+            lateout("eax") lo, lateout("edx") hi+        );+    }++    // Parentheses are required: `+` binds tighter than `<<` in Rust.+    ((hi as u64) << 32) + lo as u64+}+```++This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.+The only explicit operand is a register, that we fill from the variable `a`.+The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.+The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.+The higher 32 bits are stored in `edx` from which we fill the variable `hi`.++Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.++## Clobbered registers++In many cases inline assembly will modify state that is not needed as an output.+Usually this is either because we have to use a scratch register in the assembly,+or instructions modify state that we don't need to further examine.+This state is generally referred to as being "clobbered".+We need to tell the compiler about this since it may need to save and restore this state+around 
the inline assembly block.++```rust+let ebx: u32;+let ecx: u32;++unsafe {+    asm!(+        "cpuid",+        in("eax") 4, in("ecx") 0,+        lateout("ebx") ebx, lateout("ecx") ecx,+        lateout("eax") _, lateout("edx") _+    );+}++println!(+    "L1 Cache: {}",+    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)+);+```++In the example above we use the `cpuid` instruction to get the L1 cache size.+This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache size we only care about the contents of `ebx` and `ecx`.++However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.++This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code:++```rust+// Multiply x by 6 using shifts and adds+let mut x = 4;+unsafe {+    asm!("+        mov {tmp}, {x}+        shl {tmp}, 1+        shl {x}, 2+        add {x}, {tmp}+    ", x = inout(reg) x, tmp = out(reg) _);+}+assert_eq!(x, 4 * 6);+```++## Register template modifiers++In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. 
the low 32 bits of a 64-bit register).++```rust+let mut x: u16 = 0xab;++unsafe {+    asm!("mov {0:h}, {0:b}", inout(reg_abcd) x);+}++assert_eq!(x, 0xabab);+```++In this example, we use the `reg_abcd` register class to restrict the register allocator to the 4 legacy x86 registers (`ax`, `bx`, `cx`, `dx`) of which the first two bytes can be addressed independently.++Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.+The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.++## Flags++By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However in many cases, it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.++Let's take our previous example of an `add` instruction:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!(+        "add {0}, {1}",+        inlateout(reg) a, in(reg) b,+        flags(pure, nomem, nostack)+    );+}+assert_eq!(a, 8);+```++Flags can be provided as an optional final argument to the `asm!` macro. We specified three flags here:+- `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.+- `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).+- `nostack` means that the asm code does not push any data onto the stack. 
This allows the compiler to use optimizations such as the stack red zone on x86_64 to avoid stack pointer adjustments.++These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.++See the reference for the full list of available flags and their effects.++# Reference-level explanation+[reference-level-explanation]: #reference-level-explanation++Inline assembler is implemented as an unsafe macro `asm!()`.+The first argument to this macro is a template string literal used to build the final assembly.+The following arguments specify input and output operands.+When required, flags are specified as the final argument.++The following ABNF specifies the general syntax:++```+dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"+reg_spec := <arch specific register class> / "<arch specific register name>"+operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"+reg_operand := dir_spec "(" reg_spec ")" operand_expr+operand := reg_operand / "imm" const_expr / "sym" path+flag := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn"+flags := "flags(" flag *["," flag] ")"+asm := "asm!(" format_string *("," [ident "="] operand) ["," flags] ")"+```++[format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax++## Template string++The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.++The assembly code syntax used is that of the GNU assembler (GAS). The only exception is on x86 where the Intel syntax is used instead of GCC's AT&T syntax.++This RFC only specifies how operands are substituted into the template string. 
Actual interpretation of the final asm string is left to the assembler.++However there is one restriction on the asm string: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.++The compiler will lint against any operands that are not used in the template string, except for operands that specify an explicit register.++[rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795++## Operand type++Several types of operands are supported:++* `in(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - The allocated register must contain the same value at the end of the asm code (except if a `lateout` is allocated to the same register).+* `out(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain an unknown value at the start of the asm code.+  - `<expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+* `lateout(<reg>) <expr>`+  - Identical to `out` except that the register allocator can reuse a register allocated to an `in`.+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.+  - `lateout` must be used instead of `out` if you are specifying the same explicit register as an `in`.+* `inout(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. 
The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - `<expr>` must be an initialized place expression, to which the contents of the allocated register is written to at the end of the asm code.+* `inout(<reg>) <in expr> => <out expr>`+  - Same as `inout` except that the initial value of the register is taken from the value of `<in expr>`.+  - `<out expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression for `<out expr>`, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+  - `<in expr>` and `<out expr>` may have different types.+* `inlateout(<reg>) <expr>` / `inlateout(<reg>) <in expr> => <out expr>`+  - Identical to `inout` except that the register allocator can reuse a register allocated to an `in` (this can happen if the compiler knows the `in` has the same initial value as the `inlateout`).+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.+* `imm <expr>`+  - `<expr>` must be an integer or floating-point constant expression.+  - The value of the expression is formatted as a string and substituted directly into the asm template string.+* `sym <path>`+  - `<path>` must refer to a `fn` or `static` defined in the current crate.+  - A mangled symbol name referring to the item is substituted into the asm template string.+  - The substituted string does not include any modifiers (e.g. GOT, PLT, relocations, etc).++## Register operands++Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. 
`"eax"`) while register classes are specified as identifiers (e.g. `reg`).++Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register two input operand or two output operands. Additionally on ARM, it is a compile-time error to use overlapping VFP registers in input operands or in output operands.++Different registers classes have different constraints on which Rust types they allow. For example, `reg` generally only allows integers and pointers, but not floats or SIMD vectors.++If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. It is a compile-time error for a value to be of a larger size than the register it is allocated in.++Here is the list of currently supported register classes:++| Architecture | Register class | Registers | LLVM constraint code | Allowed types |+| ------------ | -------------- | --------- | ----- | ------------- |+| x86 | `reg` | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `reg_abcd` | `ax`, `bx`, `cx`, `dx` | `Q` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `vreg` | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |+| x86 | `vreg_evex` | `xmm[0-31]` (AVX-512, otherwise same as `vreg`) | `v` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |+| x86 (AVX-512) | `kreg` | `k[1-7]` | `Yk` | `i16`, `i32`, `i64` |+| AArch64 | `reg` | `x[0-28]`, `x30` | `r` | `i8`, `i16`, `i32`, `i64` |+| AArch64 | `vreg` | `v[0-31]` | `w` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| AArch64 | `vreg_low` | `v[0-15]` | `x` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| AArch64 | `vreg_low8` | `v[0-7]` | `y` | 
`i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| ARM | `reg` | `r[0-r10]`, `r12`, `r14` | `r` | `i8`, `i16`, `i32` |+| ARM | `vreg` | `s[0-31]`, `d[0-31]`, `q[0-15]` | `w` | `f32`, `f64`, `v64`, `v128` |+| ARM | `vreg_low` | `s[0-31]`, `d[0-15]`, `q[0-7]` | `t` | `f32`, `f64`, `v64`, `v128` |+| ARM | `vreg_low8` | `s[0-15]`, `d[0-d]`, `q[0-3]` | `x` | `f32`, `f64`, `v64`, `v128` |+| RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-31]` | `r` | `i8`, `i16`, `i32`, `i64` (RV64 only) |+| RISC-V | `vreg` | `f[0-31]` | `f` | `f32`, `f64` |++> Notes on allowed types:+> - Pointers and references are allowed where the equivalent integer type is allowed.+> - `iLEN` refers to both signed and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches.+> - Fat pointers are not allowed.+> - `vLEN` refers to a SIMD vector that is `LEN` bits wide.++Additional constraint specifications may be added in the future based on demand for additional register classes (e.g. MMX, x87, etc).++Some registers have multiple names. These are all treated by the compiler as identical to the base register name. 
Here is the list of all supported register aliases:++| Architecture | Base register | Aliases |+| ------------ | ------------- | ------- |+| x86 | `ax` | `al`, `eax`, `rax` |+| x86 | `bx` | `bl`, `ebx`, `rbx` |+| x86 | `cx` | `cl`, `ecx`, `rcx` |+| x86 | `dx` | `dl`, `edx`, `rdx` |+| x86 | `si` | `sil`, `esi`, `rsi` |+| x86 | `di` | `dil`, `edi`, `rdi` |+| x86 | `bp` | `bpl`, `ebp`, `rbp` |+| x86 | `sp` | `spl`, `esp`, `rsp` |+| x86 | `ip` | `eip`, `rip` |+| x86 | `st(0)` | `st` |+| x86 | `r[8-15]` | `r[8-15]b`, `r[8-15]w`, `r[8-15]d` |+| x86 | `xmm[0-31]` | `ymm[0-31]`, `zmm[0-31]` |+| AArch64 | `x[0-30]` | `w[0-30]` |+| AArch64 | `x29` | `fp` |+| AArch64 | `x30` | `lr` |+| AArch64 | `sp` | `wsp` |+| AArch64 | `xzr` | `wzr` |+| AArch64 | `v[0-31]` | `b[0-31]`, `h[0-31]`, `s[0-31]`, `d[0-31]`, `q[0-31]` |+| ARM | `r[0-3]` | `a[1-4]` |+| ARM | `r[4-9]` | `v[1-6]` |+| ARM | `r9` | `rfp` |+| ARM | `r10` | `sl` |+| ARM | `r11` | `fp` |+| ARM | `r12` | `ip` |+| ARM | `r13` | `sp` |+| ARM | `r14` | `lr` |+| ARM | `r15` | `pc` |+| RISC-V | `x0` | `zero` |+| RISC-V | `x1` | `ra` |+| RISC-V | `x2` | `sp` |+| RISC-V | `x3` | `gp` |+| RISC-V | `x4` | `tp` |+| RISC-V | `x[5-7]` | `t[0-2]` |+| RISC-V | `x8` | `fp`, `s0` |+| RISC-V | `x9` | `s1` |+| RISC-V | `x[10-17]` | `a[0-7]` |+| RISC-V | `x[18-27]` | `s[2-11]` |+| RISC-V | `x[28-31]` | `t[3-6]` |+| RISC-V | `f[0-7]` | `ft[0-7]` |+| RISC-V | `f[8-9]` | `fs[0-1]` |+| RISC-V | `f[10-17]` | `fa[0-7]` |+| RISC-V | `f[18-27]` | `fs[2-11]` |+| RISC-V | `f[28-31]` | `ft[8-11]` |++Some registers cannot be used for input or output operands:++| Architecture | Unsupported register | Reason |+| ------------ | -------------------- | ------ |+| All | `sp` | The stack pointer must be restored to its original value at the end of an asm code block. |+| All | `bp` (x86), `r11` (ARM), `x29` (AArch64), `x8` (RISC-V) | The frame pointer cannot be used as an input or output. 
|+| x86 | `ah`, `bh`, `ch`, `dh` | These are poorly supported by compiler backends. Use 16-bit register views (e.g. `ax`) instead. |+| x86 | `k0` | This is a constant zero register which can't be modified. |+| x86 | `ip` | This is the program counter, not a real register. |+| x86 | `mm[0-7]` | MMX registers are not currently supported (but may be in the future). |+| x86 | `st([0-7])` | x87 registers are not currently supported (but may be in the future). |+| AArch64 | `xzr` | This is a constant zero register which can't be modified. |+| ARM | `pc` | This is the program counter, not a real register. |+| RISC-V | `x0` | This is a constant zero register which can't be modified. |+| RISC-V | `gp`, `tp` | These registers are reserved and cannot be used as inputs or outputs. |++## Template modifiers++The placeholders can be augmented by modifiers which are specified after the `:` in the curly braces. These modifiers do not affect register allocation, but change the way operands are formatted when inserted into the template string. 
Only one modifier is allowed per template placeholder.++The supported modifiers are a subset of LLVM's (and GCC's) [asm template argument modifiers][llvm-argmod].++| Architecture | Register class | Modifier | Input type | Example output |+| ------------ | -------------- | -------- | ---------- | -------------- |+| x86 | `reg` | None | `i8` | `al` |+| x86 | `reg` | None | `i16` | `ax` |+| x86 | `reg` | None | `i32` | `eax` |+| x86 | `reg` | None | `i64` | `rax` |+| x86-32 | `reg_abcd` | `b` | Any | `al` |+| x86-64 | `reg` | `b` | Any | `al` |+| x86 | `reg_abcd` | `h` | Any | `ah` |+| x86 | `reg` | `w` | Any | `ax` |+| x86 | `reg` | `k` | Any | `eax` |+| x86-64 | `reg` | `q` | Any | `rax` |+| x86 | `vreg` | None | `i32`, `i64`, `f32`, `f64`, `v128` | `xmm0` |+| x86 (AVX) | `vreg` | None | `v256` | `ymm0` |+| x86 (AVX-512) | `vreg` | None | `v512` | `zmm0` |+| x86 (AVX-512) | `kreg` | None | Any | `k1` |+| AArch64 | `reg` | None | Any | `x0` |+| AArch64 | `reg` | `w` | Any | `w0` |+| AArch64 | `reg` | `x` | Any | `x0` |+| AArch64 | `vreg` | None | Any | `v0` |+| AArch64 | `vreg` | `b` | Any | `b0` |+| AArch64 | `vreg` | `h` | Any | `h0` |+| AArch64 | `vreg` | `s` | Any | `s0` |+| AArch64 | `vreg` | `d` | Any | `d0` |+| AArch64 | `vreg` | `q` | Any | `q0` |+| ARM | `reg` | None | Any | `r0` |+| ARM | `vreg` | None | `f32` | `s0` |+| ARM | `vreg` | None | `f64`, `v64` | `d0` |+| ARM | `vreg` | None | `v128` | `q0` |+| ARM | `vreg` | `e` / `f` | `v128` | `d0` / `d1` |+| RISC-V | `reg` | None | Any | `x1` |+| RISC-V | `vreg` | None | Any | `f0` |++> Notes:+> - on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register.+> - on AArch64 `reg`: a warning is emitted if the input type is smaller than 64 bits, suggesting to use the `w` modifier. 
The warning can be suppressed by explicitly using the `x` modifier.++[llvm-argmod]: http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers++## Flags++Flags are used to further influence the behavior of the inline assembly block.+Currently the following flags are defined:+- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used. A warning is emitted if this flag is used on an `asm` with no outputs.+- `nomem`: The `asm` block does not read from or write to any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.+- `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.+- `preserves_flags`: The `asm` block does not modify the flags register (defined below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.+- `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code.+- `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this flag is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.+
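To make the difference between `nomem` and `readonly` concrete, here is a minimal sketch of a block that reads memory through a pointer but never writes it. It is written against the `asm!` macro as it was later stabilized in `std::arch`, which spells the flag list `options(...)` rather than the `flags(...)` of this draft. The function name `load_u64` is hypothetical, and the plain-Rust fallback only exists so the sketch builds on targets other than x86-64.

```rust
/// Loads a `u64` through a pointer using inline assembly.
/// The block reads memory, so `nomem` would be wrong here; `readonly` is the
/// strongest memory flag that still describes it correctly.
#[cfg(target_arch = "x86_64")]
fn load_u64(src: &u64) -> u64 {
    use std::arch::asm;

    let value: u64;
    // SAFETY: `src` is a valid reference, so the 8-byte load is in bounds.
    unsafe {
        asm!(
            "mov {value}, qword ptr [{ptr}]",
            ptr = in(reg) src as *const u64,
            // `lateout`: the output is written only after the input is read,
            // so both may share a register.
            value = lateout(reg) value,
            // `mov` leaves the flags register alone and never touches the stack.
            options(pure, readonly, nostack, preserves_flags),
        );
    }
    value
}

/// Fallback for other targets (assumption: behavior matters, not the asm).
#[cfg(not(target_arch = "x86_64"))]
fn load_u64(src: &u64) -> u64 {
    *src
}
```

Marking such a block `nomem` instead would let the compiler keep a stale copy of `*src` in a register across the block, which is exactly what the flag descriptions above rule out.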

Am I correct to assume that all asm! blocks can unwind if -C panic=unwind? If so, I think it would be nice to have a nounwind flag here as well, such that if the asm! block unwinds, either the behavior is undefined or the program terminates. cc @rust-lang/wg-ffi-unwind

Amanieu

comment created time in a month

Pull request review commentrust-lang/rfcs

Inline assembly

+- Feature Name: `asm`+- Start Date: (fill me in with today's date, YYYY-MM-DD)+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)++# Summary+[summary]: #summary++This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.++The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.++The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.++# Motivation+[motivation]: #motivation++In systems programming some tasks require dropping down to the assembly level. The primary reasons are for performance, precise timing, and low level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.++The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.++# Guide-level explanation+[guide-level-explanation]: #guide-level-explanation++Rust provides support for inline assembly via the `asm!` macro.+It can be used to embed handwritten assembly in the assembly output generated by the compiler.+Generally this should not be necessary, but might be where the required performance or timing+cannot be otherwise achieved. Accessing low level hardware primitives, e.g. 
in kernel code, may also demand this functionality.++> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.++## Basic usage++Let us start with the simplest possible example:++```rust+unsafe {+    asm!("nop");+}+```++This will insert a NOP (no operation) instruction into the assembly generated by the compiler.+Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert+arbitrary instructions and break various invariants. The instructions to be inserted are listed+in the first argument of the `asm!` macro as a string literal.++## Inputs and outputs++Now inserting an instruction that does nothing is rather boring. Let us do something that+actually acts on data:++```rust+let x: u32;+unsafe {+    asm!("mov {}, 5", out(reg) x);+}+assert_eq!(x, 5);+```++This will write the value `5` into the `u32` variable `x`.+You can see that the string literal we use to specify instructions is actually a template string.+It is governed by the same rules as Rust [format strings][format-syntax].+The arguments that are inserted into the template however look a bit different than you may+be familiar with. First we need to specify if the variable is an input or an output of the+inline assembly. In this case it is an output. 
We declared this by writing `out`.+We also need to specify in what kind of register the assembly expects the variable.+In this case we put it in an arbitrary general purpose register by specifying `reg`.+The compiler will choose an appropriate register to insert into+the template and will read the variable from there after the inline assembly finishes executing.++Let us see another example that also uses an input:++```rust+let i: u32 = 3;+let o: u32;+unsafe {+    asm!("+        mov {0}, {1}+        add {0}, {number}+    ", out(reg) o, in(reg) i, number = imm 5);+}+assert_eq!(o, 8);+```++This will add `5` to the input in variable `i` and write the result to variable `o`.+The particular way this assembly does this is first copying the value from `i` to the output,+and then adding `5` to it.++The example shows a few things:++First we can see that inputs are declared by writing `in` instead of `out`.++Second one of our operands has a type we haven't seen yet, `imm`.+This tells the compiler to expand this argument to an immediate inside the assembly template.+This is only possible for constants and literals.++Third we can see that we can specify an argument number, or name as in any format string.+For inline assembly templates this is particularly useful as arguments are often used more than once.+For more complex inline assembly using this facility is generally recommended, as it improves+readability, and allows reordering instructions without changing the argument order.++We can further refine the above example to avoid the `mov` instruction:++```rust+let mut x: u32 = 3;+unsafe {+    asm!("add {0}, {number}", inout(reg) x, number = imm 5);+}+assert_eq!(x, 8);+```++We can see that `inout` is used to specify an argument that is both input and output.+This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.++It is also possible to specify different variables for the input and output parts of an `inout` 
operand:++```rust+let x: u32 = 3;+let y: u32;+unsafe {+    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);+}+assert_eq!(y, 8);+```++## Late output operands++The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`+can be written at any time, and can therefore not share its location with any other argument.+However, to guarantee optimal performance it is important to use as few registers as possible,+so they won't have to be saved and reloaded around the inline assembly block.+To achieve this Rust provides a `lateout` specifier. This can be used on any output that is+guaranteed to be written only after all inputs have been consumed.+There is also a `inlateout` variant of this specifier.++Here is an example where `inlateout` *cannot* be used:++```rust+let mut a = 4;+let b = 4;+let c = 4;+unsafe {+    asm!("+        add {0}, {1}+        add {0}, {2}+    ", inout(reg) a, in(reg) b, in(reg) c);+}+assert_eq!(a, 12);+```++Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`.++However the following example can use `inlateout` since the output is only modified after all input registers have been read:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);+}+assert_eq!(a, 8);+```++As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.++## Explicit register operands++Some instructions require that the operands be in a specific register.+Therefore, Rust inline assembly provides some more specific constraint specifiers.+While `reg` is generally available on any architecture, these are highly architecture specific. E.g. 
for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi`+among others can be addressed by their name.++```rust+unsafe {+    asm!("out 0x64, {}", in("eax") cmd);+}+```++In this example we call the `out` instruction to output the content of the `cmd` variable+to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as operand+we had to use the `eax` constraint specifier.++It is somewhat common that instructions have operands that are not explicitly listed in the+assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:++```rust+fn mul(a: u32, b: u32) -> u64 {+    let lo: u32;+    let hi: u32;++    unsafe {+        asm!(+            // The x86 mul instruction takes eax as an implicit input and writes+            // the 64-bit result of the multiplication to edx:eax.+            "mul {}",+            in(reg) a, in("eax") b,+            lateout("eax") lo, lateout("edx") hi+        );+    }++    // Shift first, then add: addition binds tighter than `<<` in Rust.+    ((hi as u64) << 32) + lo as u64+}+```++This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.+The only explicit operand is a register, that we fill from the variable `a`.+The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.+The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.+The higher 32 bits are stored in `edx` from which we fill the variable `hi`.++Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.++## Clobbered registers++In many cases inline assembly will modify state that is not needed as an output.+Usually this is either because we have to use a scratch register in the assembly,+or instructions modify state that we don't need to further examine.+This state is generally referred to as being "clobbered".+We need to tell the compiler about this since it may need to save and restore this state+around the inline 
assembly block.++```rust+let ebx: u32;+let ecx: u32;++unsafe {+    asm!(+        "cpuid",+        in("eax") 4, in("ecx") 0,+        lateout("ebx") ebx, lateout("ecx") ecx,+        lateout("eax") _, lateout("edx") _+    );+}++println!(+    "L1 Cache: {}",+    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)+);+```++In the example above we use the `cpuid` instruction to get the L1 cache size.+This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache size we only care about the contents of `ebx` and `ecx`.++However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.++This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code.++## Register template modifiers++In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. the low 32 bits of a 64-bit register).++```rust+let mut x: u16 = 0xab;++unsafe {+    asm!("mov {0:h}, {0:b}", inout(reg_abcd) x);+}++assert_eq!(x, 0xabab);+```++In this example, we use the `reg_abcd` register class to restrict the register allocator to the 4 legacy x86 registers (`ax`, `bx`, `cx`, `dx`) of which the first two bytes can be addressed independently.++Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.+The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. 
The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.++## Flags++By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However in many cases, it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.++Let's take our previous example of an `add` instruction:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!(+        "add {0}, {1}",+        inlateout(reg) a, in(reg) b,+        flags(pure, nomem, nostack)+    );+}+assert_eq!(a, 8);+```++Flags can be provided as an optional final argument to the `asm!` macro. We specified three flags here:+- `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.+- `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).+- `nostack` means that the asm code does not push any data onto the stack. 
This allows the compiler to use optimizations such as the stack red zone on x86_64 to avoid stack pointer adjustments.++These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.++See the reference for the full list of available flags and their effects.++# Reference-level explanation+[reference-level-explanation]: #reference-level-explanation++Inline assembler is implemented as an unsafe macro `asm!()`.+The first argument to this macro is a template string literal used to build the final assembly.+The following arguments specify input and output operands.+When required, flags are specified as the final argument.++The following ABNF specifies the general syntax:++```+dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"+reg_spec := <arch specific register class> / "<arch specific register name>"+operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"+reg_operand := dir_spec "(" reg_spec ")" operand_expr+operand := reg_operand / "imm" const_expr / "sym" path+flag := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn" / "nostack"+flags := "flags(" flag *["," flag] ")"+asm := "asm!(" format_string *("," [ident "="] operand) ["," flags] ")"+```++[format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax++## Template string++The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.++The assembly code syntax used is that of the GNU assembler (GAS). The only exception is on x86 where the Intel syntax is used instead of GCC's AT&T syntax.++This RFC only specifies how operands are substituted into the template string. 
Actual interpretation of the final asm string is left to the assembler.++However there is one restriction on the asm string: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.++The compiler will lint against any operands that are not used in the template string, except for operands that specify an explicit register.++[rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795++## Operand type++Several types of operands are supported:++* `in(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - The allocated register must contain the same value at the end of the asm code (except if a `lateout` is allocated to the same register).+* `out(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain an unknown value at the start of the asm code.+  - `<expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+* `lateout(<reg>) <expr>`+  - Identical to `out` except that the register allocator can reuse a register allocated to an `in`.+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.+  - `lateout` must be used instead of `out` if you are specifying the same explicit register as an `in`.+* `inout(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. 
The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - `<expr>` must be an initialized place expression, to which the contents of the allocated register is written to at the end of the asm code.+* `inout(<reg>) <in expr> => <out expr>`+  - Same as `inout` except that the initial value of the register is taken from the value of `<in expr>`.+  - `<out expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression for `<out expr>`, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+  - `<in expr>` and `<out expr>` may have different types.+* `inlateout(<reg>) <expr>` / `inlateout(<reg>) <in expr> => <out expr>`+  - Identical to `inout` except that the register allocator can reuse a register allocated to an `in` (this can happen if the compiler knows the `in` has the same initial value as the `inlateout`).+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.+* `imm <expr>`+  - `<expr>` must be an integer or floating-point constant expression.+  - The value of the expression is formatted as a string and substituted directly into the asm template string.+* `sym <path>`+  - `<path>` must refer to a `fn` or `static` defined in the current crate.+  - A mangled symbol name referring to the item is substituted into the asm template string.+  - The substituted string does not include any modifiers (e.g. GOT, PLT, relocations, etc).++## Register operands++Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. 
`"eax"`) while register classes are specified as raw identifiers (e.g. `reg`).++Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register in two input operands or two output operands. Additionally on ARM, it is a compile-time error to use overlapping VFP registers in input operands or in output operands.++Different register classes have different constraints on which Rust types they allow. For example, `reg` generally only allows integers and pointers, but not floats or SIMD vectors.++If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. It is a compile-time error for a value to be of a larger size than the register it is allocated in.++Here is the list of currently supported register classes:++| Architecture | Register class | Registers | LLVM constraint code | Allowed types |+| ------------ | -------------- | --------- | ----- | ------------- |+| x86 | `reg` | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `reg_abcd` | `ax`, `bx`, `cx`, `dx` | `Q` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `vreg` | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |

@Amanieu isn't that target-dependent? e.g. on RV128 you should be able to use i128 as a register just fine. OTOH, x86_64 does not have arithmetic registers for 128-bit integers and stores i128 in two 64-bit registers instead, so there you need to split it when passing it to/from an assembly block (e.g. as the result of mulxq, which "returns" two 64-bit integers).
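A minimal sketch of the x86-64 case described above, using the plain `mul` instruction rather than `mulx` so that no BMI2 support is assumed. The 128-bit product comes back in two 64-bit registers (`rdx:rax`) and is reassembled into a `u128` on the Rust side. It is written against the `asm!` macro as later stabilized in `std::arch` (`options(...)` instead of this draft's `flags(...)`); the name `mul_wide` is hypothetical, and the fallback exists only so the sketch builds on other targets.

```rust
/// Full 64x64 -> 128-bit multiply via the x86-64 `mul` instruction.
#[cfg(target_arch = "x86_64")]
fn mul_wide(a: u64, b: u64) -> u128 {
    use std::arch::asm;

    let lo: u64;
    let hi: u64;
    // SAFETY: `mul` only touches rax, rdx and the flags register.
    unsafe {
        asm!(
            "mul {a}",
            a = in(reg) a,
            // rax is the implicit second factor and receives the low half.
            inlateout("rax") b => lo,
            // rdx receives the high half.
            lateout("rdx") hi,
            // No `preserves_flags`: `mul` modifies the flags register.
            options(pure, nomem, nostack),
        );
    }
    // Reassemble the two 64-bit halves into one 128-bit value.
    ((hi as u128) << 64) | (lo as u128)
}

/// Fallback for other targets (assumption: native 128-bit arithmetic suffices).
#[cfg(not(target_arch = "x86_64"))]
fn mul_wide(a: u64, b: u64) -> u128 {
    (a as u128) * (b as u128)
}
```

On a target with native 128-bit registers the split would not be needed, which is the target dependence the comment is pointing at.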

Amanieu

comment created time in a month

Pull request review commentrust-lang/rfcs

Inline assembly

+- Feature Name: `asm`+- Start Date: (fill me in with today's date, YYYY-MM-DD)+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)++# Summary+[summary]: #summary++This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.++The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.++The transition from the existing `asm!` macro is described in RFC [2843][rfc-llvm-asm]. The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.++[rfc-llvm-asm]: https://github.com/rust-lang/rfcs/pull/2843++# Motivation+[motivation]: #motivation++In systems programming some tasks require dropping down to the assembly level. The primary reasons are for performance, precise timing, and low level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.++The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. 
For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.++A collection of use cases for inline asm can be found in [this repository][catalogue].++[catalogue]: https://github.com/bjorn3/inline_asm_catalogue/++# Guide-level explanation+[guide-level-explanation]: #guide-level-explanation++Rust provides support for inline assembly via the `asm!` macro.+It can be used to embed handwritten assembly in the assembly output generated by the compiler.+Generally this should not be necessary, but might be where the required performance or timing+cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.++> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.++## Basic usage++Let us start with the simplest possible example:++```rust+unsafe {+    asm!("nop");+}+```++This will insert a NOP (no operation) instruction into the assembly generated by the compiler.+Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert+arbitrary instructions and break various invariants. The instructions to be inserted are listed+in the first argument of the `asm!` macro as a string literal.++## Inputs and outputs++Now inserting an instruction that does nothing is rather boring. Let us do something that+actually acts on data:++```rust+let x: u32;+unsafe {+    asm!("mov {}, 5", out(reg) x);+}+assert_eq!(x, 5);+```++This will write the value `5` into the `u32` variable `x`.+You can see that the string literal we use to specify instructions is actually a template string.+It is governed by the same rules as Rust [format strings][format-syntax].+The arguments that are inserted into the template however look a bit different than you may+be familiar with. First we need to specify if the variable is an input or an output of the+inline assembly. In this case it is an output. 
We declared this by writing `out`.+We also need to specify in what kind of register the assembly expects the variable.+In this case we put it in an arbitrary general purpose register by specifying `reg`.+The compiler will choose an appropriate register to insert into+the template and will read the variable from there after the inline assembly finishes executing.++Let us see another example that also uses an input:++```rust+let i: u32 = 3;+let o: u32;+unsafe {+    asm!("+        mov {0}, {1}+        add {0}, {number}+    ", out(reg) o, in(reg) i, number = imm 5);+}+assert_eq!(o, 8);+```++This will add `5` to the input in variable `i` and write the result to variable `o`.+The particular way this assembly does this is first copying the value from `i` to the output,+and then adding `5` to it.++The example shows a few things:++First we can see that inputs are declared by writing `in` instead of `out`.++Second one of our operands has a type we haven't seen yet, `imm`.+This tells the compiler to expand this argument to an immediate inside the assembly template.+This is only possible for constants and literals.++Third we can see that we can specify an argument number, or name as in any format string.+For inline assembly templates this is particularly useful as arguments are often used more than once.+For more complex inline assembly using this facility is generally recommended, as it improves+readability, and allows reordering instructions without changing the argument order.++We can further refine the above example to avoid the `mov` instruction:++```rust+let mut x: u32 = 3;+unsafe {+    asm!("add {0}, {number}", inout(reg) x, number = imm 5);+}+assert_eq!(x, 8);+```++We can see that `inout` is used to specify an argument that is both input and output.+This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.++It is also possible to specify different variables for the input and output parts of an `inout` 
operand:++```rust+let x: u32 = 3;+let y: u32;+unsafe {+    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);+}+assert_eq!(y, 8);+```++## Late output operands++The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`+can be written at any time, and can therefore not share its location with any other argument.+However, to guarantee optimal performance it is important to use as few registers as possible,+so they won't have to be saved and reloaded around the inline assembly block.+To achieve this Rust provides a `lateout` specifier. This can be used on any output that is+written only after all inputs have been consumed.+There is also a `inlateout` variant of this specifier.++Here is an example where `inlateout` *cannot* be used:++```rust+let mut a = 4;+let b = 4;+let c = 4;+unsafe {+    asm!("+        add {0}, {1}+        add {0}, {2}+    ", inout(reg) a, in(reg) b, in(reg) c);+}+assert_eq!(a, 12);+```++Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`.++However the following example can use `inlateout` since the output is only modified after all input registers have been read:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);+}+assert_eq!(a, 8);+```++As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.++## Explicit register operands++Some instructions require that the operands be in a specific register.+Therefore, Rust inline assembly provides some more specific constraint specifiers.+While `reg` is generally available on any architecture, these are highly architecture specific. E.g. 
for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi`+among others can be addressed by their name.++```rust+unsafe {+    asm!("out 0x64, {}", in("eax") cmd);+}+```++In this example we call the `out` instruction to output the content of the `cmd` variable+to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as operand+we had to use the `eax` constraint specifier.++It is somewhat common that instructions have operands that are not explicitly listed in the+assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:++```rust+fn mul(a: u32, b: u32) -> u64 {+    let lo: u32;+    let hi: u32;++    unsafe {+        asm!(+            // The x86 mul instruction takes eax as an implicit input and writes+            // the 64-bit result of the multiplication to edx:eax.+            "mul {}",+            in(reg) a, in("eax") b,+            lateout("eax") lo, lateout("edx") hi+        );+    }++    // Shift first, then add: addition binds tighter than `<<` in Rust.+    ((hi as u64) << 32) + lo as u64+}+```++This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.+The only explicit operand is a register, that we fill from the variable `a`.+The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.+The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.+The higher 32 bits are stored in `edx` from which we fill the variable `hi`.++Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.++## Clobbered registers++In many cases inline assembly will modify state that is not needed as an output.+Usually this is either because we have to use a scratch register in the assembly,+or instructions modify state that we don't need to further examine.+This state is generally referred to as being "clobbered".+We need to tell the compiler about this since it may need to save and restore this state+around 
the inline assembly block.++```rust+let ebx: u32;+let ecx: u32;++unsafe {+    asm!(+        "cpuid",+        in("eax") 4, in("ecx") 0,+        lateout("ebx") ebx, lateout("ecx") ecx,+        lateout("eax") _, lateout("edx") _+    );+}++println!(+    "L1 Cache: {}",+    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)+);+```++In the example above we use the `cpuid` instruction to get the L1 cache size.+This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache size we only care about the contents of `ebx` and `ecx`.++However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.++This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code:++```rust+// Multiply x by 6 using shifts and adds+let mut x = 4;+unsafe {+    asm!("+        mov {tmp}, {x}+        shl {tmp}, 1+        shl {x}, 2+        add {x}, {tmp}+    ", x = inout(reg) x, tmp = out(reg) _);+}+assert_eq!(x, 4 * 6);+```++## Register template modifiers++In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. 
the low 32 bits of a 64-bit register).++```rust+let mut x: u16 = 0xab;++unsafe {+    asm!("mov {0:h}, {0:b}", inout(reg_abcd) x);+}++assert_eq!(x, 0xabab);+```++In this example, we use the `reg_abcd` register class to restrict the register allocator to the 4 legacy x86 registers (`ax`, `bx`, `cx`, `dx`) of which the first two bytes can be addressed independently.++Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.+The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.++## Flags++By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However in many cases, it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.++Let's take our previous example of an `add` instruction:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!(+        "add {0}, {1}",+        inlateout(reg) a, in(reg) b,+        flags(pure, nomem, nostack)+    );+}+assert_eq!(a, 8);+```++Flags can be provided as an optional final argument to the `asm!` macro. We specified three flags here:+- `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.+- `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).+- `nostack` means that the asm code does not push any data onto the stack. 
This allows the compiler to use optimizations such as the stack red zone on x86_64 to avoid stack pointer adjustments.++These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.++See the reference for the full list of available flags and their effects.++# Reference-level explanation+[reference-level-explanation]: #reference-level-explanation++Inline assembler is implemented as an unsafe macro `asm!()`.+The first argument to this macro is a template string literal used to build the final assembly.+The following arguments specify input and output operands.+When required, flags are specified as the final argument.++The following ABNF specifies the general syntax:++```+dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"+reg_spec := <arch specific register class> / "<arch specific register name>"+operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"+reg_operand := dir_spec "(" reg_spec ")" operand_expr+operand := reg_operand / "imm" const_expr / "sym" path+flag := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn" / "nostack"+flags := "flags(" flag *("," flag) ")"+asm := "asm!(" format_string *("," [ident "="] operand) ["," flags] ")"+```++[format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax++## Template string++The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.++The assembly code syntax used is that of the GNU assembler (GAS). The only exception is on x86 where the Intel syntax is used instead of GCC's AT&T syntax.++This RFC only specifies how operands are substituted into the template string. 
Actual interpretation of the final asm string is left to the assembler.++However there is one restriction on the asm string: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.++The compiler will lint against any operands that are not used in the template string, except for operands that specify an explicit register.++[rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795++## Operand type++Several types of operands are supported:++* `in(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - The allocated register must contain the same value at the end of the asm code (except if a `lateout` is allocated to the same register).+* `out(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain an unknown value at the start of the asm code.+  - `<expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+* `lateout(<reg>) <expr>`+  - Identical to `out` except that the register allocator can reuse a register allocated to an `in`.+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.
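A minimal sketch of the `lateout` rules quoted above, written against the `asm!` syntax as it was later stabilized (`std::arch::asm!`, with `options(...)` in place of the `flags(...)` spelling used in this draft). The function name `mul_wide` and the portable fallback for non-x86-64 targets are assumptions made for illustration and do not appear in the RFC text.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Full 64x64 -> 128-bit unsigned multiply (hypothetical helper).
#[cfg(target_arch = "x86_64")]
pub fn mul_wide(a: u64, b: u64) -> u128 {
    let lo: u64;
    let hi: u64;
    unsafe {
        asm!(
            // `mul` reads rax implicitly and writes the product to rdx:rax.
            "mul {}",
            in(reg) a,
            // rax is both an input and an output, so its output half is "late".
            inlateout("rax") b => lo,
            // rdx is written by the same instruction that reads the inputs,
            // so the allocator is free to place `a` in rdx.
            lateout("rdx") hi,
            options(pure, nomem, nostack)
        );
    }
    ((hi as u128) << 64) + lo as u128
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn mul_wide(a: u64, b: u64) -> u128 {
    (a as u128) * (b as u128)
}
```

Because `mul` is a single instruction, every input has been read by the time either output register is written, which is the situation `lateout` is meant for.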

Notice that if you mark the input as a clobber, then it is ok to clobber the input. Arguably, there isn't much point in clobbering an input before reading it and one might as well just make it an output, but... this feels more like a logic error than UB to me.

However, if you don't mark the input as a clobber, then I'd say that it is illegal for the asm! block to modify it as a side-effect.

I'm not sure if this is the case within the asm! block itself, e.g., if the block spills the input to the stack, reuses the input register, and then pops the input back from the stack into the input register before leaving the asm! block. In that case, modifying the input is not a side-effect of the block, and that might be ok.

I think it is up to us to say whether that's ok or not, and that the RFC should be clearer about this.
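To make the two cases concrete, here is a minimal sketch using the `asm!` syntax as it was later stabilized (`std::arch::asm!` and `options(...)`), which differs from the draft's `flags(...)`. The function names `triple` and `triple_scratch` and the non-x86-64 fallbacks are hypothetical. The first function leaves its input register untouched, so a plain `in` is enough. The second overwrites the register that carried the input, so it declares that register as clobbered with `inout(reg) x => _`.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Computes x * 3 without modifying the input register.
#[cfg(target_arch = "x86_64")]
pub fn triple(x: u64) -> u64 {
    let out: u64;
    unsafe {
        asm!(
            // A single instruction: `x` is read before `out` is written,
            // so `out` can be a late output and may share a register with `x`.
            "lea {out}, [{x} + 2*{x}]",
            x = in(reg) x,
            out = lateout(reg) out,
            options(pure, nomem, nostack)
        );
    }
    out
}

/// Computes x * 3 but overwrites the register that held the input.
#[cfg(target_arch = "x86_64")]
pub fn triple_scratch(x: u64) -> u64 {
    let out: u64;
    unsafe {
        asm!(
            "mov {out}, {x}",
            "shl {x}, 1", // overwrites the input register
            "add {out}, {x}",
            // Input whose register is declared clobbered: the final value is discarded.
            x = inout(reg) x => _,
            // Written before the last read of `x`, so this must not be a late output.
            out = out(reg) out,
            options(pure, nomem, nostack)
        );
    }
    out
}

/// Portable fallbacks so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn triple(x: u64) -> u64 {
    x.wrapping_mul(3)
}

#[cfg(not(target_arch = "x86_64"))]
pub fn triple_scratch(x: u64) -> u64 {
    x.wrapping_mul(3)
}
```

If `triple_scratch` declared `x` as a plain `in(reg)`, the `shl` would break the rule that an input register holds the same value at the end of the asm code. The spill-and-restore case raised above is not covered by this sketch.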

Amanieu

comment created time in a month

Pull request review comment rust-lang/rfcs

Inline assembly

+- Feature Name: `asm`+- Start Date: (fill me in with today's date, YYYY-MM-DD)+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)++# Summary+[summary]: #summary++This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.++The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.++The transition from the existing `asm!` macro is described in RFC [2843][rfc-llvm-asm]. The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.++[rfc-llvm-asm]: https://github.com/rust-lang/rfcs/pull/2843++# Motivation+[motivation]: #motivation++In systems programming some tasks require dropping down to the assembly level. The primary reasons are for performance, precise timing, and low level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.++The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. 
For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.++A collection of use cases for inline asm can be found in [this repository][catalogue].++[catalogue]: https://github.com/bjorn3/inline_asm_catalogue/++# Guide-level explanation+[guide-level-explanation]: #guide-level-explanation++Rust provides support for inline assembly via the `asm!` macro.+It can be used to embed handwritten assembly in the assembly output generated by the compiler.+Generally this should not be necessary, but might be where the required performance or timing+cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.++> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.++## Basic usage++Let us start with the simplest possible example:++```rust+unsafe {+    asm!("nop");+}+```++This will insert a NOP (no operation) instruction into the assembly generated by the compiler.+Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert+arbitrary instructions and break various invariants. The instructions to be inserted are listed+in the first argument of the `asm!` macro as a string literal.++## Inputs and outputs++Now inserting an instruction that does nothing is rather boring. Let us do something that+actually acts on data:++```rust+let x: u32;+unsafe {+    asm!("mov {}, 5", out(reg) x);+}+assert_eq!(x, 5);+```++This will write the value `5` into the `u32` variable `x`.+You can see that the string literal we use to specify instructions is actually a template string.+It is governed by the same rules as Rust [format strings][format-syntax].+The arguments that are inserted into the template however look a bit different than what you may+be familiar with. First we need to specify if the variable is an input or an output of the+inline assembly. In this case it is an output. 
We declared this by writing `out`.+We also need to specify in what kind of register the assembly expects the variable.+In this case we put it in an arbitrary general purpose register by specifying `reg`.+The compiler will choose an appropriate register to insert into+the template and will read the variable from there after the inline assembly finishes executing.++Let us see another example that also uses an input:++```rust+let i: u32 = 3;+let o: u32;+unsafe {+    asm!("+        mov {0}, {1}+        add {0}, {number}+    ", out(reg) o, in(reg) i, number = imm 5);+}+assert_eq!(o, 8);+```++This will add `5` to the input in variable `i` and write the result to variable `o`.+The particular way this assembly does this is first copying the value from `i` to the output,+and then adding `5` to it.++The example shows a few things:++First we can see that inputs are declared by writing `in` instead of `out`.++Second one of our operands has a type we haven't seen yet, `imm`.+This tells the compiler to expand this argument to an immediate inside the assembly template.+This is only possible for constants and literals.++Third we can see that we can specify an argument number, or name as in any format string.+For inline assembly templates this is particularly useful as arguments are often used more than once.+For more complex inline assembly using this facility is generally recommended, as it improves+readability, and allows reordering instructions without changing the argument order.++We can further refine the above example to avoid the `mov` instruction:++```rust+let mut x: u32 = 3;+unsafe {+    asm!("add {0}, {number}", inout(reg) x, number = imm 5);+}+assert_eq!(x, 8);+```++We can see that `inout` is used to specify an argument that is both input and output.+This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.++It is also possible to specify different variables for the input and output parts of an `inout` 
operand:++```rust+let x: u32 = 3;+let y: u32;+unsafe {+    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);+}+assert_eq!(y, 8);+```++## Late output operands++The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`+can be written at any time, and can therefore not share its location with any other argument.+However, to guarantee optimal performance it is important to use as few registers as possible,+so they won't have to be saved and reloaded around the inline assembly block.+To achieve this Rust provides a `lateout` specifier. This can be used on any output that is+written only after all inputs have been consumed.+There is also a `inlateout` variant of this specifier.++Here is an example where `inlateout` *cannot* be used:++```rust+let mut a = 4;+let b = 4;+let c = 4;+unsafe {+    asm!("+        add {0}, {1}+        add {0}, {2}+    ", inout(reg) a, in(reg) b, in(reg) c);+}+assert_eq!(a, 12);+```++Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`.++However the following example can use `inlateout` since the output is only modified after all input registers have been read:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);+}+assert_eq!(a, 8);+```++As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.++## Explicit register operands++Some instructions require that the operands be in a specific register.+Therefore, Rust inline assembly provides some more specific constraint specifiers.+While `reg` is generally available on any architecture, these are highly architecture specific. E.g. 
for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi`+among others can be addressed by their name.++```rust+unsafe {+    asm!("out 0x64, {}", in("eax") cmd);+}+```++In this example we call the `out` instruction to output the content of the `cmd` variable+to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub-registers) as an operand+we had to use the `eax` constraint specifier.++It is somewhat common that instructions have operands that are not explicitly listed in the+assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:++```rust+fn mul(a: u32, b: u32) -> u64 {+    let lo: u32;+    let hi: u32;++    unsafe {+        asm!(+            // The x86 mul instruction takes eax as an implicit input and writes+            // the 64-bit result of the multiplication to edx:eax.+            "mul {}",+            in(reg) a, in("eax") b,+            lateout("eax") lo, lateout("edx") hi+        );+    }++    ((hi as u64) << 32) + lo as u64+}+```++This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.+The only explicit operand is a register that we fill from the variable `a`.+The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.+The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.+The higher 32 bits are stored in `edx` from which we fill the variable `hi`.++Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.++## Clobbered registers++In many cases inline assembly will modify state that is not needed as an output.+Usually this is either because we have to use a scratch register in the assembly,+or instructions modify state that we don't need to further examine.+This state is generally referred to as being "clobbered".+We need to tell the compiler about this since it may need to save and restore this state+around 
the inline assembly block.++```rust+let ebx: u32;+let ecx: u32;++unsafe {+    asm!(+        "cpuid",+        in("eax") 4, in("ecx") 0,+        lateout("ebx") ebx, lateout("ecx") ecx,+        lateout("eax") _, lateout("edx") _+    );+}++println!(+    "L1 Cache: {}",+    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)+);+```++In the example above we use the `cpuid` instruction to get the L1 cache size.+This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache size we only care about the contents of `ebx` and `ecx`.++However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.++This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code:++```rust+// Multiply x by 6 using shifts and adds+let mut x = 4;+unsafe {+    asm!("+        mov {tmp}, {x}+        shl {tmp}, 1+        shl {x}, 2+        add {x}, {tmp}+    ", x = inout(reg) x, tmp = out(reg) _);+}+assert_eq!(x, 4 * 6);+```++## Register template modifiers++In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. 
the low 32 bits of a 64-bit register).++```rust+let mut x: u16 = 0xab;++unsafe {+    asm!("mov {0:h}, {0:b}", inout(reg_abcd) x);+}++assert_eq!(x, 0xabab);+```++In this example, we use the `reg_abcd` register class to restrict the register allocator to the 4 legacy x86 registers (`ax`, `bx`, `cx`, `dx`) of which the first two bytes can be addressed independently.++Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.+The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.++## Flags++By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However in many cases, it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.++Let's take our previous example of an `add` instruction:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!(+        "add {0}, {1}",+        inlateout(reg) a, in(reg) b,+        flags(pure, nomem, nostack)+    );+}+assert_eq!(a, 8);+```++Flags can be provided as an optional final argument to the `asm!` macro. We specified three flags here:+- `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.+- `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).+- `nostack` means that the asm code does not push any data onto the stack. 
This allows the compiler to use optimizations such as the stack red zone on x86_64 to avoid stack pointer adjustments.++These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.++See the reference for the full list of available flags and their effects.++# Reference-level explanation+[reference-level-explanation]: #reference-level-explanation++Inline assembler is implemented as an unsafe macro `asm!()`.+The first argument to this macro is a template string literal used to build the final assembly.+The following arguments specify input and output operands.+When required, flags are specified as the final argument.++The following ABNF specifies the general syntax:++```+dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"+reg_spec := <arch specific register class> / "<arch specific register name>"+operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"+reg_operand := dir_spec "(" reg_spec ")" operand_expr+operand := reg_operand / "imm" const_expr / "sym" path+flag := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn" / "nostack"+flags := "flags(" flag *("," flag) ")"+asm := "asm!(" format_string *("," [ident "="] operand) ["," flags] ")"+```++[format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax++## Template string++The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.++The assembly code syntax used is that of the GNU assembler (GAS). The only exception is on x86 where the Intel syntax is used instead of GCC's AT&T syntax.++This RFC only specifies how operands are substituted into the template string. 
Actual interpretation of the final asm string is left to the assembler.++However there is one restriction on the asm string: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.++The compiler will lint against any operands that are not used in the template string, except for operands that specify an explicit register.++[rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795++## Operand type++Several types of operands are supported:++* `in(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - The allocated register must contain the same value at the end of the asm code (except if a `lateout` is allocated to the same register).+* `out(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain an unknown value at the start of the asm code.+  - `<expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+* `lateout(<reg>) <expr>`+  - Identical to `out` except that the register allocator can reuse a register allocated to an `in`.+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.+  - `lateout` must be used instead of `out` if you are specifying the same explicit register as an `in`.+* `inout(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. 
The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - `<expr>` must be an initialized place expression, to which the contents of the allocated register is written to at the end of the asm code.+* `inout(<reg>) <in expr> => <out expr>`+  - Same as `inout` except that the initial value of the register is taken from the value of `<in expr>`.+  - `<out expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression for `<out expr>`, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+  - `<in expr>` and `<out expr>` may have different types.+* `inlateout(<reg>) <expr>` / `inlateout(<reg>) <in expr> => <out expr>`+  - Identical to `inout` except that the register allocator can reuse a register allocated to an `in` (this can happen if the compiler knows the `in` has the same initial value as the `inlateout`).+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.+* `imm <expr>`+  - `<expr>` must be an integer or floating-point constant expression.+  - The value of the expression is formatted as a string and substituted directly into the asm template string.+* `sym <path>`+  - `<path>` must refer to a `fn` or `static` defined in the current crate.+  - A mangled symbol name referring to the item is substituted into the asm template string.+  - The substituted string does not include any modifiers (e.g. GOT, PLT, relocations, etc).++## Register operands++Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. 
`"eax"`) while register classes are specified as identifiers (e.g. `reg`).++Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register for two input operands or two output operands. Additionally on ARM, it is a compile-time error to use overlapping VFP registers in input operands or in output operands.++Different register classes have different constraints on which Rust types they allow. For example, `reg` generally only allows integers and pointers, but not floats or SIMD vectors.++If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. It is a compile-time error for a value to be of a larger size than the register it is allocated in.++Here is the list of currently supported register classes:++| Architecture | Register class | Registers | LLVM constraint code | Allowed types |+| ------------ | -------------- | --------- | ----- | ------------- |+| x86 | `reg` | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `reg_abcd` | `ax`, `bx`, `cx`, `dx` | `Q` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `vreg` | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |+| x86 | `vreg_evex` | `xmm[0-31]` (AVX-512, otherwise same as `vreg`) | `v` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |+| x86 (AVX-512) | `kreg` | `k[1-7]` | `Yk` | `i16`, `i32`, `i64` |+| AArch64 | `reg` | `x[0-28]`, `x30` | `r` | `i8`, `i16`, `i32`, `i64` |+| AArch64 | `vreg` | `v[0-31]` | `w` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| AArch64 | `vreg_low` | `v[0-15]` | `x` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| AArch64 | `vreg_low8` | `v[0-7]` | `y` | 
`i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| ARM | `reg` | `r[0-10]`, `r12`, `r14` | `r` | `i8`, `i16`, `i32` |+| ARM | `vreg` | `s[0-31]`, `d[0-31]`, `q[0-15]` | `w` | `f32`, `f64`, `v64`, `v128` |+| ARM | `vreg_low` | `s[0-31]`, `d[0-15]`, `q[0-7]` | `t` | `f32`, `f64`, `v64`, `v128` |+| ARM | `vreg_low8` | `s[0-15]`, `d[0-7]`, `q[0-3]` | `x` | `f32`, `f64`, `v64`, `v128` |+| RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-31]` | `r` | `i8`, `i16`, `i32`, `i64` (RV64 only) |+| RISC-V | `vreg` | `f[0-31]` | `f` | `f32`, `f64` |++> Notes on allowed types:+> - Pointers and references are allowed where the equivalent integer type is allowed.+> - `iLEN` refers to both signed and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches.+> - Fat pointers are not allowed.+> - `vLEN` refers to a SIMD vector that is `LEN` bits wide.++Additional constraint specifications may be added in the future based on demand for additional register classes (e.g. MMX, x87, etc).++Some registers have multiple names. These are all treated by the compiler as identical to the base register name. 
Here is the list of all supported register aliases:++| Architecture | Base register | Aliases |+| ------------ | ------------- | ------- |+| x86 | `ax` | `al`, `eax`, `rax` |+| x86 | `bx` | `bl`, `ebx`, `rbx` |+| x86 | `cx` | `cl`, `ecx`, `rcx` |+| x86 | `dx` | `dl`, `edx`, `rdx` |+| x86 | `si` | `sil`, `esi`, `rsi` |+| x86 | `di` | `dil`, `edi`, `rdi` |+| x86 | `bp` | `bpl`, `ebp`, `rbp` |+| x86 | `sp` | `spl`, `esp`, `rsp` |+| x86 | `ip` | `eip`, `rip` |+| x86 | `st(0)` | `st` |+| x86 | `r[8-15]` | `r[8-15]b`, `r[8-15]w`, `r[8-15]d` |+| x86 | `xmm[0-31]` | `ymm[0-31]`, `zmm[0-31]` |+| AArch64 | `x[0-30]` | `w[0-30]` |+| AArch64 | `x29` | `fp` |+| AArch64 | `x30` | `lr` |+| AArch64 | `sp` | `wsp` |+| AArch64 | `xzr` | `wzr` |+| AArch64 | `v[0-31]` | `b[0-31]`, `h[0-31]`, `s[0-31]`, `d[0-31]`, `q[0-31]` |+| ARM | `r[0-3]` | `a[1-4]` |+| ARM | `r[4-9]` | `v[1-6]` |+| ARM | `r9` | `rfp` |+| ARM | `r10` | `sl` |+| ARM | `r11` | `fp` |+| ARM | `r12` | `ip` |+| ARM | `r13` | `sp` |+| ARM | `r14` | `lr` |+| ARM | `r15` | `pc` |+| RISC-V | `x0` | `zero` |+| RISC-V | `x1` | `ra` |+| RISC-V | `x2` | `sp` |+| RISC-V | `x3` | `gp` |+| RISC-V | `x4` | `tp` |+| RISC-V | `x[5-7]` | `t[0-2]` |+| RISC-V | `x8` | `fp`, `s0` |+| RISC-V | `x9` | `s1` |+| RISC-V | `x[10-17]` | `a[0-7]` |+| RISC-V | `x[18-27]` | `s[2-11]` |+| RISC-V | `x[28-31]` | `t[3-6]` |+| RISC-V | `f[0-7]` | `ft[0-7]` |+| RISC-V | `f[8-9]` | `fs[0-1]` |+| RISC-V | `f[10-17]` | `fa[0-7]` |+| RISC-V | `f[18-27]` | `fs[2-11]` |+| RISC-V | `f[28-31]` | `ft[8-11]` |++Some registers cannot be used for input or output operands:++| Architecture | Unsupported register | Reason |+| ------------ | -------------------- | ------ |+| All | `sp` | The stack pointer must be restored to its original value at the end of an asm code block. |+| All | `bp` (x86), `r11` (ARM), `x29` (AArch64), `x8` (RISC-V) | The frame pointer cannot be used as an input or output. 
|+| x86 | `ah`, `bh`, `ch`, `dh` | These are poorly supported by compiler backends. Use 16-bit register views (e.g. `ax`) instead. |+| x86 | `k0` | This is a constant zero register which can't be modified. |+| x86 | `ip` | This is the program counter, not a real register. |+| x86 | `mm[0-7]` | MMX registers are not currently supported (but may be in the future). |+| x86 | `st([0-7])` | x87 registers are not currently supported (but may be in the future). |+| AArch64 | `xzr` | This is a constant zero register which can't be modified. |+| ARM | `pc` | This is the program counter, not a real register. |+| RISC-V | `x0` | This is a constant zero register which can't be modified. |+| RISC-V | `gp`, `tp` | These registers are reserved and cannot be used as inputs or outputs. |++## Template modifiers++The placeholders can be augmented by modifiers which are specified after the `:` in the curly braces. These modifiers do not affect register allocation, but change the way operands are formatted when inserted into the template string. 
Only one modifier is allowed per template placeholder.++The supported modifiers are a subset of LLVM's (and GCC's) [asm template argument modifiers][llvm-argmod].++| Architecture | Register class | Modifier | Input type | Example output |+| ------------ | -------------- | -------- | ---------- | -------------- |+| x86 | `reg` | None | `i8` | `al` |+| x86 | `reg` | None | `i16` | `ax` |+| x86 | `reg` | None | `i32` | `eax` |+| x86 | `reg` | None | `i64` | `rax` |+| x86-32 | `reg_abcd` | `b` | Any | `al` |+| x86-64 | `reg` | `b` | Any | `al` |+| x86 | `reg_abcd` | `h` | Any | `ah` |+| x86 | `reg` | `w` | Any | `ax` |+| x86 | `reg` | `k` | Any | `eax` |+| x86-64 | `reg` | `q` | Any | `rax` |+| x86 | `vreg` | None | `i32`, `i64`, `f32`, `f64`, `v128` | `xmm0` |+| x86 (AVX) | `vreg` | None | `v256` | `ymm0` |+| x86 (AVX-512) | `vreg` | None | `v512` | `zmm0` |+| x86 (AVX-512) | `kreg` | None | Any | `k1` |+| AArch64 | `reg` | None | Any | `x0` |+| AArch64 | `reg` | `w` | Any | `w0` |+| AArch64 | `reg` | `x` | Any | `x0` |+| AArch64 | `vreg` | None | Any | `v0` |+| AArch64 | `vreg` | `b` | Any | `b0` |+| AArch64 | `vreg` | `h` | Any | `h0` |+| AArch64 | `vreg` | `s` | Any | `s0` |+| AArch64 | `vreg` | `d` | Any | `d0` |+| AArch64 | `vreg` | `q` | Any | `q0` |+| ARM | `reg` | None | Any | `r0` |+| ARM | `vreg` | None | `f32` | `s0` |+| ARM | `vreg` | None | `f64`, `v64` | `d0` |+| ARM | `vreg` | None | `v128` | `q0` |+| ARM | `vreg` | `e` / `f` | `v128` | `d0` / `d1` |+| RISC-V | `reg` | None | Any | `x1` |+| RISC-V | `vreg` | None | Any | `f0` |++> Notes:+> - on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register.+> - on AArch64 `reg`: a warning is emitted if the input type is smaller than 64 bits, suggesting to use the `w` modifier. 
The warning can be suppressed by explicitly using the `x` modifier.++[llvm-argmod]: http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers++## Flags++Flags are used to further influence the behavior of the inline assembly block.+Currently the following flags are defined:+- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used. A warning is emitted if this flag is used on an `asm` block with no outputs.+- `nomem`: The `asm` block does not read or write any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.+- `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.+- `preserves_flags`: The `asm` block does not modify the flags register (defined below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.+- `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code.+- `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this flag is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.++The `nomem` and `readonly` flags are mutually exclusive: it is a compile-time error to specify both. 
Specifying `pure` on an asm block with no outputs is linted against since such a block will be optimized away to nothing.++These are the flag registers which must be preserved if `preserves_flags` is set:+- x86+  - Status flags in `EFLAGS` (CF, PF, AF, ZF, SF, OF).+  - Direction flag in `EFLAGS` (DF).+  - Floating-point status word (all).+  - Floating-point exception flags in `MXCSR` (PE, UE, OE, ZE, DE, IE).+- ARM+  - Condition flags in `CPSR` (N, Z, C, V).+  - Saturation flag in `CPSR` (Q).+  - Greater than or equal flags in `CPSR` (GE).+  - Condition flags in `FPSCR` (N, Z, C, V).+  - Saturation flag in `FPSCR` (QC).+  - Floating-point exception flags in `FPSCR` (IDC, IXC, UFC, OFC, DZC, IOC).+- AArch64+  - Condition flags (`NZCV` register).+  - Floating-point status (`FPSR` register).+- RISC-V+  - Floating-point exception flags in `fcsr` (`fflags`).++> Note: As a general rule, these are the flags which are *not* preserved when performing a function call.++## Mapping to LLVM IR++The direction specification maps to an LLVM constraint specification as follows (using a `reg` operand as an example):++* `in(reg)` => `r`+* `out(reg)` => `=&r` (Rust's outputs are early-clobber outputs in LLVM/GCC terminology)+* `inout(reg)` => `=&r,0` (an early-clobber output with an input tied to it, `0` here is a placeholder for the position of the output)+* `lateout(reg)` => `=r` (Rust's late outputs are regular outputs in LLVM/GCC terminology)+* `inlateout(reg)` => `=r,0` (cf. `inout` and `lateout`)++If an `inout` is used where the output type is smaller than the input type then some special handling is needed to avoid LLVM issues. See [this bug][issue-65452].++As written this RFC requires architectures to map from Rust constraint specifications to LLVM constraint codes. 
This is in part for better readability on Rust's side and in part for independence of the backend:++* Register classes are mapped to the appropriate constraint code as per the table above.+* `imm` operands are formatted and injected directly into the asm string.+* `sym` is mapped to `s` for statics and `X` for functions.+* A register name `r1` is mapped to `{r1}`.+* Additional mappings for register classes are added as appropriate (cf. [llvm-constraint]).+* `lateout` operands with an `_` expression that are specified as an explicit register are converted to LLVM clobber constraints. For example, `lateout("r1") _` is mapped to `~{r1}` (cf. [llvm-clobber]).+* If the `nomem` flag is not set then `~{memory}` is added to the clobber list (although this is currently ignored by LLVM).+* If the `preserves_flags` flag is not set then the following are added to the clobber list:+  - (x86) `~{dirflag}~{flags}~{fpsr}`+  - (ARM/AArch64) `~{cc}`++For some operand types, we will automatically insert some modifiers into the template string.+* For `sym` and `imm` operands, we automatically insert the `c` modifier which removes target-specific modifiers from the value (e.g. `#` on ARM).+* On AArch64, we will warn if a value smaller than 64 bits is used without a modifier since this is likely a bug (it will produce `x*` instead of `w*`). Clang has this same warning.+* On ARM, we will automatically add the `P` or `q` LLVM modifier for `f64`, `v64` and `v128` passed into a `vreg`. 
This will cause those registers to be formatted as `d*` and `q*` respectively.++Additionally, the following attributes are added to the LLVM `asm` statement:++* The `nounwind` attribute is always added: unwinding from an inline asm block is not allowed (and not supported by LLVM anyway).+* If the `nomem` flag is set then the `readnone` attribute is added to the LLVM `asm` statement.+* If the `readonly` flag is set then the `readonly` attribute is added to the LLVM `asm` statement.+* If the `pure` flag is not set then the `sideeffect` flag is added to the LLVM `asm` statement.+* If the `nostack` flag is not set then the `alignstack` flag is added to the LLVM `asm` statement.+* On x86 the `inteldialect` flag is added to the LLVM `asm` statement so that the Intel syntax is used instead of the AT&T syntax.++If the `noreturn` flag is set then an `unreachable` LLVM instruction is inserted after the asm invocation.++> Note that `alignstack` is not currently supported by GCC, so we will need to implement support in GCC if Rust ever gets a GCC back-end.++[llvm-constraint]: http://llvm.org/docs/LangRef.html#supported-constraint-code-list+[llvm-clobber]: http://llvm.org/docs/LangRef.html#clobber-constraints+[issue-65452]: https://github.com/rust-lang/rust/issues/65452++# Drawbacks+[drawbacks]: #drawbacks++## Unfamiliarity++This RFC proposes a completely new inline assembly format.+It is not possible to just copy examples of GCC-style inline assembly and re-use them.+There is however a fairly trivial mapping between the GCC-style and this format that could be documented to alleviate this.++Additionally, this RFC proposes using the Intel asm syntax on x86 instead of the AT&T syntax. 
We believe this syntax will be more familiar to most users, but may be surprising for users accustomed to GCC-style asm.++The `cpuid` example above would look like this in GCC-style inline assembly:++```C+// GCC doesn't allow directly clobbering an input, we need+// to use a dummy output instead.+int ebx, ecx, discard;+asm (+    "cpuid"+    : "=a"(discard), "=b"(ebx), "=c"(ecx) // outputs+    : "a"(4), "c"(0) // inputs+    : "edx" // clobbers+);+printf("L1 Cache: %i\n", ((ebx >> 22) + 1)+    * (((ebx >> 12) & 0x3ff) + 1)+    * ((ebx & 0xfff) + 1)+    * (ecx + 1));+```++## Limited set of operand types++The proposed set of operand types is much smaller than that which is available through GCC-style inline assembly. In particular, the proposed syntax does not include any form of memory operands and is missing many register classes.++We chose to keep operand constraints as simple as possible, and in particular memory operands introduce a lot of complexity since different instructions support different addressing modes. At the same time, the exact rules for memory operands are not very well known (you are only allowed to access the data directly pointed to by the constraint) and are often gotten wrong.++If we discover that there is a demand for a new register class or special operand type, we can always add it later.++## Difficulty of support++Inline assembly is a difficult feature to implement in a compiler backend. While LLVM does support it, this may not be the case for alternative backends such as [Cranelift][cranelift] (see [this issue][cranelift-asm]).++However it is possible to implement support for inline assembly without support from the compiler backend by using an external assembler instead. 
Take the following (AArch64) asm block as an example:++```rust+unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)+{+    let c;+    asm!("<some asm code>", inout(reg) a, in("x0") b, out("x20") c);+    (a, c)+}+```++This could be expanded to an external asm file with the following contents:++```+# Function prefix directives+.section ".text.foo_inline_asm"+.globl foo_inline_asm+.p2align 2+.type foo_inline_asm, @function+foo_inline_asm:++// If necessary, save callee-saved registers to the stack here.+str x20, [sp, #-16]!++// Move the pointer to the argument out of the way since x0 is used.+mov x1, x0++// Load input values+ldr w2, [x1, #0]+ldr w0, [x1, #4]++<some asm code>++// Store output values+str w2, [x1, #0]+str w20, [x1, #8]++// If necessary, restore callee-saved registers here.+ldr x20, [sp], #16++ret++# Function suffix directives+.size foo_inline_asm, . - foo_inline_asm+```++And the following Rust code:++```rust+unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)+{+    let c;+    {+        #[repr(C)]+        struct foo_inline_asm_args {+            a: i32,+            b: i32,+            c: i32,+        }+        extern "C" {+            fn foo_inline_asm(args: *mut foo_inline_asm_args);+        }+        let mut args = foo_inline_asm_args {+            a: a,+            b: b,+            c: mem::uninitialized(),+        };+        foo_inline_asm(&mut args);+        a = args.a;+        c = args.c;+    }+    (a, c)+}+```++[cranelift]: https://cranelift.readthedocs.io/+[cranelift-asm]: https://github.com/bytecodealliance/cranelift/issues/444+
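The Rust side of the expansion above relies on `mem::uninitialized()`, which has since been deprecated in favour of `MaybeUninit`. As a rough sketch only, the same argument-block pattern could be written as follows. The struct name `FooInlineAsmArgs` and the body of the stand-in `foo_inline_asm` (which adds `b` to `a` and stores twice the sum in `c`) are assumptions made purely so the example runs without an external assembler; a real build would link the assembled file instead.

```rust
use std::mem::MaybeUninit;

/// Argument block shared with the out-of-line routine. The layout mirrors
/// `foo_inline_asm_args` from the expansion above; the name is hypothetical.
#[repr(C)]
struct FooInlineAsmArgs {
    a: i32,              // inout operand
    b: i32,              // input operand
    c: MaybeUninit<i32>, // output operand, written by the callee
}

/// Stand-in for the externally assembled `foo_inline_asm` symbol.
/// Assumption: the asm body computes `a += b` and `c = a * 2`.
unsafe extern "C" fn foo_inline_asm(args: *mut FooInlineAsmArgs) {
    // The caller guarantees that `args` points to a valid argument block.
    let args = unsafe { &mut *args };
    args.a = args.a.wrapping_add(args.b);
    args.c = MaybeUninit::new(args.a.wrapping_mul(2));
}

fn foo(a: i32, b: i32) -> (i32, i32) {
    let mut args = FooInlineAsmArgs {
        a,
        b,
        c: MaybeUninit::uninit(),
    };
    // SAFETY: `args` is a valid, exclusively borrowed argument block, and the
    // stand-in always initializes `c` before returning.
    let c = unsafe {
        foo_inline_asm(&mut args);
        args.c.assume_init()
    };
    (args.a, c)
}
```

With this stand-in, `foo(3, 4)` returns `(7, 14)`. Keeping the output field as `MaybeUninit<i32>` makes it explicit that only the callee is expected to initialize it.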

Duh, I failed at this too.

FWIW, @lokathor is much better at reading than me, and it is often the case that I miss something that's literally written "as is" in the text, and @lokathor just points me to it 😆

So I actually was searching the whole RFC for this thinking "not again..." 😆

Amanieu

comment created time in a month

issue commentrust-lang/libc

Documentation gap surrounding `errno`

This crate exposes Raw FFI bindings, nothing more, nothing less. If that's unclear, PRs welcome.

jimrandomh

comment created time in a month

issue commentrust-lang/libc

Rust Roadmap: libc v1.0 release

If somebody wants to start working on this, I can mentor. A good step would be to create a new crate in this repo for a particular platform, add that as a dependency of libc (for backwards compatibility, at least for the moment), and start moving APIs towards that crate.

coder543

comment created time in a month

Pull request review commentrust-lang/rfcs

Inline assembly

+- Feature Name: `asm`+- Start Date: (fill me in with today's date, YYYY-MM-DD)+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)++# Summary+[summary]: #summary++This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.++The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.++The transition from the existing `asm!` macro is described in RFC [2843][rfc-llvm-asm]. The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.++[rfc-llvm-asm]: https://github.com/rust-lang/rfcs/pull/2843++# Motivation+[motivation]: #motivation++In systems programming some tasks require dropping down to the assembly level. The primary reasons are for performance, precise timing, and low level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.++The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. 
For stabilization, a more user-friendly syntax that lends itself to implementation across various backends is preferable.++A collection of use cases for inline asm can be found in [this repository][catalogue].++[catalogue]: https://github.com/bjorn3/inline_asm_catalogue/++# Guide-level explanation+[guide-level-explanation]: #guide-level-explanation++Rust provides support for inline assembly via the `asm!` macro.+It can be used to embed handwritten assembly in the assembly output generated by the compiler.+Generally this should not be necessary, but might be where the required performance or timing+cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.++> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.++## Basic usage++Let us start with the simplest possible example:++```rust+unsafe {+    asm!("nop");+}+```++This will insert a NOP (no operation) instruction into the assembly generated by the compiler.+Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert+arbitrary instructions and break various invariants. The instructions to be inserted are listed+in the first argument of the `asm!` macro as a string literal.++## Inputs and outputs++Now inserting an instruction that does nothing is rather boring. Let us do something that+actually acts on data:++```rust+let x: u32;+unsafe {+    asm!("mov {}, 5", out(reg) x);+}+assert_eq!(x, 5);+```++This will write the value `5` into the `u32` variable `x`.+You can see that the string literal we use to specify instructions is actually a template string.+It is governed by the same rules as Rust [format strings][format-syntax].+The arguments that are inserted into the template however look a bit different than you may+be familiar with. First we need to specify if the variable is an input or an output of the+inline assembly. In this case it is an output. 
We declared this by writing `out`.+We also need to specify in what kind of register the assembly expects the variable.+In this case we put it in an arbitrary general purpose register by specifying `reg`.+The compiler will choose an appropriate register to insert into+the template and will read the variable from there after the inline assembly finishes executing.++Let us see another example that also uses an input:++```rust+let i: u32 = 3;+let o: u32;+unsafe {+    asm!("+        mov {0}, {1}+        add {0}, {number}+    ", out(reg) o, in(reg) i, number = imm 5);+}+assert_eq!(o, 8);+```++This will add `5` to the input in variable `i` and write the result to variable `o`.+The particular way this assembly does this is first copying the value from `i` to the output,+and then adding `5` to it.++The example shows a few things:++First, we can see that inputs are declared by writing `in` instead of `out`.++Second, one of our operands has a type we haven't seen yet, `imm`.+This tells the compiler to expand this argument to an immediate inside the assembly template.+This is only possible for constants and literals.++Third, we can specify an argument number or name, as in any format string.+For inline assembly templates this is particularly useful as arguments are often used more than once.+For more complex inline assembly using this facility is generally recommended, as it improves+readability, and allows reordering instructions without changing the argument order.++We can further refine the above example to avoid the `mov` instruction:++```rust+let mut x: u32 = 3;+unsafe {+    asm!("add {0}, {number}", inout(reg) x, number = imm 5);+}+assert_eq!(x, 8);+```++We can see that `inout` is used to specify an argument that is both input and output.+This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.++It is also possible to specify different variables for the input and output parts of an `inout` 
operand:++```rust+let x: u32 = 3;+let y: u32;+unsafe {+    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);+}+assert_eq!(y, 8);+```++## Late output operands++The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`+can be written at any time, and can therefore not share its location with any other argument.+However, to guarantee optimal performance it is important to use as few registers as possible,+so they won't have to be saved and reloaded around the inline assembly block.+To achieve this Rust provides a `lateout` specifier. This can be used on any output that is+written only after all inputs have been consumed.+There is also an `inlateout` variant of this specifier.++Here is an example where `inlateout` *cannot* be used:++```rust+let mut a = 4;+let b = 4;+let c = 4;+unsafe {+    asm!("+        add {0}, {1}+        add {0}, {2}+    ", inout(reg) a, in(reg) b, in(reg) c);+}+assert_eq!(a, 12);+```++Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`. If `inlateout` were used, `a` and `c` could be allocated to the same register, in which case the first instruction would overwrite the value of `c` and produce the wrong result.++However the following example can use `inlateout` since the output is only modified after all input registers have been read:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);+}+assert_eq!(a, 8);+```++As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.++## Explicit register operands++Some instructions require that the operands be in a specific register.+Therefore, Rust inline assembly provides some more specific constraint specifiers.+While `reg` is generally available on any architecture, these are highly architecture specific. E.g. 
for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi`+among others can be addressed by their name.++```rust+unsafe {+    asm!("out 0x64, {}", in("eax") cmd);+}+```++In this example we call the `out` instruction to output the content of the `cmd` variable+to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as an operand+we had to use the `eax` constraint specifier.++It is somewhat common that instructions have operands that are not explicitly listed in the+assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:++```rust+fn mul(a: u32, b: u32) -> u64 {+    let lo: u32;+    let hi: u32;++    unsafe {+        asm!(+            // The x86 mul instruction takes eax as an implicit input and writes+            // the 64-bit result of the multiplication to edx:eax.+            "mul {}",+            in(reg) a, in("eax") b,+            lateout("eax") lo, lateout("edx") hi+        );+    }++    // Parenthesize the shift: addition binds tighter than `<<` in Rust.+    ((hi as u64) << 32) + lo as u64+}+```++This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.+The only explicit operand is a register, that we fill from the variable `a`.+The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.+The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.+The higher 32 bits are stored in `edx` from which we fill the variable `hi`.++Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.++## Clobbered registers++In many cases inline assembly will modify state that is not needed as an output.+Usually this is either because we have to use a scratch register in the assembly,+or instructions modify state that we don't need to further examine.+This state is generally referred to as being "clobbered".+We need to tell the compiler about this since it may need to save and restore this state+around 
the inline assembly block.++```rust+let ebx: u32;+let ecx: u32;++unsafe {+    asm!(+        "cpuid",+        in("eax") 4, in("ecx") 0,+        lateout("ebx") ebx, lateout("ecx") ecx,+        lateout("eax") _, lateout("edx") _+    );+}++println!(+    "L1 Cache: {}",+    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)+);+```++In the example above we use the `cpuid` instruction to get the L1 cache size.+This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache size we only care about the contents of `ebx` and `ecx`.++However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.++This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code:++```rust+// Multiply x by 6 using shifts and adds+let mut x = 4;+unsafe {+    asm!("+        mov {tmp}, {x}+        shl {tmp}, 1+        shl {x}, 2+        add {x}, {tmp}+    ", x = inout(reg) x, tmp = out(reg) _);+}+assert_eq!(x, 4 * 6);+```++## Register template modifiers++In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. 
the low 32 bits of a 64-bit register).++```rust+let mut x: u16 = 0xab;++unsafe {+    asm!("mov {0:h}, {0:b}", inout(reg_abcd) x);+}++assert_eq!(x, 0xabab);+```++In this example, we use the `reg_abcd` register class to restrict the register allocator to the four legacy x86 registers (`ax`, `bx`, `cx`, `dx`) whose low two bytes can be addressed independently.++Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.+The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.++## Flags++By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However in many cases, it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.++Let's take our previous example of an `add` instruction:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!(+        "add {0}, {1}",+        inlateout(reg) a, in(reg) b,+        flags(pure, nomem, nostack)+    );+}+assert_eq!(a, 8);+```++Flags can be provided as an optional final argument to the `asm!` macro. We specified three flags here:+- `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.+- `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).+- `nostack` means that the asm code does not push any data onto the stack. 
This allows the compiler to use optimizations such as the stack red zone on x86_64 to avoid stack pointer adjustments.++These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.++See the reference for the full list of available flags and their effects.++# Reference-level explanation+[reference-level-explanation]: #reference-level-explanation++Inline assembler is implemented as an unsafe macro `asm!()`.+The first argument to this macro is a template string literal used to build the final assembly.+The following arguments specify input and output operands.+When required, flags are specified as the final argument.++The following ABNF specifies the general syntax:++```+dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"+reg_spec := <arch specific register class> / "<arch specific register name>"+operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"+reg_operand := dir_spec "(" reg_spec ")" operand_expr+operand := reg_operand / "imm" const_expr / "sym" path+flag := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn" / "nostack"+flags := "flags(" flag *("," flag) ")"+asm := "asm!(" format_string *("," [ident "="] operand) ["," flags] ")"+```++[format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax++## Template string++The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.++The assembly code syntax used is that of the GNU assembler (GAS). The only exception is on x86 where the Intel syntax is used instead of GCC's AT&T syntax.++This RFC only specifies how operands are substituted into the template string. 
Actual interpretation of the final asm string is left to the assembler.++However there is one restriction on the asm string: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.++The compiler will lint against any operands that are not used in the template string, except for operands that specify an explicit register.++[rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795++## Operand type++Several types of operands are supported:++* `in(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - The allocated register must contain the same value at the end of the asm code (except if a `lateout` is allocated to the same register).+* `out(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain an unknown value at the start of the asm code.+  - `<expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register are written at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+* `lateout(<reg>) <expr>`+  - Identical to `out` except that the register allocator can reuse a register allocated to an `in`.+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.+  - `lateout` must be used instead of `out` if you are specifying the same explicit register as an `in`.+* `inout(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. 
The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - `<expr>` must be an initialized place expression, to which the contents of the allocated register are written at the end of the asm code.+* `inout(<reg>) <in expr> => <out expr>`+  - Same as `inout` except that the initial value of the register is taken from the value of `<in expr>`.+  - `<out expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register are written at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression for `<out expr>`, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+  - `<in expr>` and `<out expr>` may have different types.+* `inlateout(<reg>) <expr>` / `inlateout(<reg>) <in expr> => <out expr>`+  - Identical to `inout` except that the register allocator can reuse a register allocated to an `in` (this can happen if the compiler knows the `in` has the same initial value as the `inlateout`).+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.+* `imm <expr>`+  - `<expr>` must be an integer or floating-point constant expression.+  - The value of the expression is formatted as a string and substituted directly into the asm template string.+* `sym <path>`+  - `<path>` must refer to a `fn` or `static` defined in the current crate.+  - A mangled symbol name referring to the item is substituted into the asm template string.+  - The substituted string does not include any modifiers (e.g. GOT, PLT, relocations, etc.).++## Register operands++Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. 
`"eax"`) while register classes are specified as identifiers (e.g. `reg`).++Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register in two input operands or in two output operands. Additionally on ARM, it is a compile-time error to use overlapping VFP registers in input operands or in output operands.++Different register classes have different constraints on which Rust types they allow. For example, `reg` generally only allows integers and pointers, but not floats or SIMD vectors.++If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. It is a compile-time error for a value to be of a larger size than the register it is allocated in.++Here is the list of currently supported register classes:++| Architecture | Register class | Registers | LLVM constraint code | Allowed types |+| ------------ | -------------- | --------- | ----- | ------------- |+| x86 | `reg` | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `reg_abcd` | `ax`, `bx`, `cx`, `dx` | `Q` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `vreg` | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |+| x86 | `vreg_evex` | `xmm[0-31]` (AVX-512, otherwise same as `vreg`) | `v` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |+| x86 (AVX-512) | `kreg` | `k[1-7]` | `Yk` | `i16`, `i32`, `i64` |+| AArch64 | `reg` | `x[0-28]`, `x30` | `r` | `i8`, `i16`, `i32`, `i64` |+| AArch64 | `vreg` | `v[0-31]` | `w` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| AArch64 | `vreg_low` | `v[0-15]` | `x` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| AArch64 | `vreg_low8` | `v[0-7]` | `y` | 
`i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| ARM | `reg` | `r[0-10]`, `r12`, `r14` | `r` | `i8`, `i16`, `i32` |+| ARM | `vreg` | `s[0-31]`, `d[0-31]`, `q[0-15]` | `w` | `f32`, `f64`, `v64`, `v128` |+| ARM | `vreg_low` | `s[0-31]`, `d[0-15]`, `q[0-7]` | `t` | `f32`, `f64`, `v64`, `v128` |+| ARM | `vreg_low8` | `s[0-15]`, `d[0-7]`, `q[0-3]` | `x` | `f32`, `f64`, `v64`, `v128` |+| RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-31]` | `r` | `i8`, `i16`, `i32`, `i64` (RV64 only) |+| RISC-V | `vreg` | `f[0-31]` | `f` | `f32`, `f64` |++> Notes on allowed types:+> - Pointers and references are allowed where the equivalent integer type is allowed.+> - `iLEN` refers to both signed and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches.+> - Fat pointers are not allowed.+> - `vLEN` refers to a SIMD vector that is `LEN` bits wide.++Additional constraint specifications may be added in the future based on demand for additional register classes (e.g. MMX, x87, etc.).++Some registers have multiple names. These are all treated by the compiler as identical to the base register name. 
Here is the list of all supported register aliases:++| Architecture | Base register | Aliases |+| ------------ | ------------- | ------- |+| x86 | `ax` | `al`, `eax`, `rax` |+| x86 | `bx` | `bl`, `ebx`, `rbx` |+| x86 | `cx` | `cl`, `ecx`, `rcx` |+| x86 | `dx` | `dl`, `edx`, `rdx` |+| x86 | `si` | `sil`, `esi`, `rsi` |+| x86 | `di` | `dil`, `edi`, `rdi` |+| x86 | `bp` | `bpl`, `ebp`, `rbp` |+| x86 | `sp` | `spl`, `esp`, `rsp` |+| x86 | `ip` | `eip`, `rip` |+| x86 | `st(0)` | `st` |+| x86 | `r[8-15]` | `r[8-15]b`, `r[8-15]w`, `r[8-15]d` |+| x86 | `xmm[0-31]` | `ymm[0-31]`, `zmm[0-31]` |+| AArch64 | `x[0-30]` | `w[0-30]` |+| AArch64 | `x29` | `fp` |+| AArch64 | `x30` | `lr` |+| AArch64 | `sp` | `wsp` |+| AArch64 | `xzr` | `wzr` |+| AArch64 | `v[0-31]` | `b[0-31]`, `h[0-31]`, `s[0-31]`, `d[0-31]`, `q[0-31]` |+| ARM | `r[0-3]` | `a[1-4]` |+| ARM | `r[4-9]` | `v[1-6]` |+| ARM | `r9` | `rfp` |+| ARM | `r10` | `sl` |+| ARM | `r11` | `fp` |+| ARM | `r12` | `ip` |+| ARM | `r13` | `sp` |+| ARM | `r14` | `lr` |+| ARM | `r15` | `pc` |+| RISC-V | `x0` | `zero` |+| RISC-V | `x1` | `ra` |+| RISC-V | `x2` | `sp` |+| RISC-V | `x3` | `gp` |+| RISC-V | `x4` | `tp` |+| RISC-V | `x[5-7]` | `t[0-2]` |+| RISC-V | `x8` | `fp`, `s0` |+| RISC-V | `x9` | `s1` |+| RISC-V | `x[10-17]` | `a[0-7]` |+| RISC-V | `x[18-27]` | `s[2-11]` |+| RISC-V | `x[28-31]` | `t[3-6]` |+| RISC-V | `f[0-7]` | `ft[0-7]` |+| RISC-V | `f[8-9]` | `fs[0-1]` |+| RISC-V | `f[10-17]` | `fa[0-7]` |+| RISC-V | `f[18-27]` | `fs[2-11]` |+| RISC-V | `f[28-31]` | `ft[8-11]` |++Some registers cannot be used for input or output operands:++| Architecture | Unsupported register | Reason |+| ------------ | -------------------- | ------ |+| All | `sp` | The stack pointer must be restored to its original value at the end of an asm code block. |+| All | `bp` (x86), `r11` (ARM), `x29` (AArch64), `x8` (RISC-V) | The frame pointer cannot be used as an input or output. 
|+| x86 | `ah`, `bh`, `ch`, `dh` | These are poorly supported by compiler backends. Use 16-bit register views (e.g. `ax`) instead. |+| x86 | `k0` | This is a constant zero register which can't be modified. |+| x86 | `ip` | This is the program counter, not a real register. |+| x86 | `mm[0-7]` | MMX registers are not currently supported (but may be in the future). |+| x86 | `st([0-7])` | x87 registers are not currently supported (but may be in the future). |+| AArch64 | `xzr` | This is a constant zero register which can't be modified. |+| ARM | `pc` | This is the program counter, not a real register. |+| RISC-V | `x0` | This is a constant zero register which can't be modified. |+| RISC-V | `gp`, `tp` | These registers are reserved and cannot be used as inputs or outputs. |++## Template modifiers++The placeholders can be augmented by modifiers which are specified after the `:` in the curly braces. These modifiers do not affect register allocation, but change the way operands are formatted when inserted into the template string. 
Only one modifier is allowed per template placeholder.++The supported modifiers are a subset of LLVM's (and GCC's) [asm template argument modifiers][llvm-argmod].++| Architecture | Register class | Modifier | Input type | Example output |+| ------------ | -------------- | -------- | ---------- | -------------- |+| x86 | `reg` | None | `i8` | `al` |+| x86 | `reg` | None | `i16` | `ax` |+| x86 | `reg` | None | `i32` | `eax` |+| x86 | `reg` | None | `i64` | `rax` |+| x86-32 | `reg_abcd` | `b` | Any | `al` |+| x86-64 | `reg` | `b` | Any | `al` |+| x86 | `reg_abcd` | `h` | Any | `ah` |+| x86 | `reg` | `w` | Any | `ax` |+| x86 | `reg` | `k` | Any | `eax` |+| x86-64 | `reg` | `q` | Any | `rax` |+| x86 | `vreg` | None | `i32`, `i64`, `f32`, `f64`, `v128` | `xmm0` |+| x86 (AVX) | `vreg` | None | `v256` | `ymm0` |+| x86 (AVX-512) | `vreg` | None | `v512` | `zmm0` |+| x86 (AVX-512) | `kreg` | None | Any | `k1` |+| AArch64 | `reg` | None | Any | `x0` |+| AArch64 | `reg` | `w` | Any | `w0` |+| AArch64 | `reg` | `x` | Any | `x0` |+| AArch64 | `vreg` | None | Any | `v0` |+| AArch64 | `vreg` | `b` | Any | `b0` |+| AArch64 | `vreg` | `h` | Any | `h0` |+| AArch64 | `vreg` | `s` | Any | `s0` |+| AArch64 | `vreg` | `d` | Any | `d0` |+| AArch64 | `vreg` | `q` | Any | `q0` |+| ARM | `reg` | None | Any | `r0` |+| ARM | `vreg` | None | `f32` | `s0` |+| ARM | `vreg` | None | `f64`, `v64` | `d0` |+| ARM | `vreg` | None | `v128` | `q0` |+| ARM | `vreg` | `e` / `f` | `v128` | `d0` / `d1` |+| RISC-V | `reg` | None | Any | `x1` |+| RISC-V | `vreg` | None | Any | `f0` |++> Notes:+> - on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register.+> - on AArch64 `reg`: a warning is emitted if the input type is smaller than 64 bits, suggesting to use the `w` modifier. 
The warning can be suppressed by explicitly using the `x` modifier.++[llvm-argmod]: http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers++## Flags++Flags are used to further influence the behavior of the inline assembly block.+Currently the following flags are defined:+- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used. A warning is emitted if this flag is used on an `asm` with no outputs.+- `nomem`: The `asm` block does not read or write to any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.+- `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.+- `preserves_flags`: The `asm` block does not modify the flags register (defined below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.+- `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code.+- `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this flag is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.++The `nomem` and `readonly` flags are mutually exclusive: it is a compile-time error to specify both. 
Specifying `pure` on an asm block with no outputs is linted against since such a block will be optimized away to nothing.++These flag registers which must be preserved if `preserves_flags` is set:+- x86+  - Status flags in `EFLAGS` (CF, PF, AF, ZF, SF, OF).+  - Direction flag in `EFLAGS` (DF).+  - Floating-point status word (all).+  - Floating-point exception flags in `MXCSR` (PE, UE, OE, ZE, DE, IE).+- ARM+  - Condition flags in `CPSR` (N, Z, C, V)+  - Saturation flag in `CPSR` (Q)+  - Greater than or equal flags in `CPSR` (GE).+  - Condition flags in `FPSCR` (N, Z, C, V)+  - Saturation flag in `FPSCR` (QC)+  - Floating-point exception flags in `FPSCR` (IDC, IXC, UFC, OFC, DZC, IOC).+- AArch64+  - Condition flags (`NZCV` register).+  - Floating-point status (`FPSR` register).+- RISC-V+  - Floating-point exception flags in `fcsr` (`fflags`).++> Note: As a general rule, these are the flags which are *not* preserved when performing a function call.++## Mapping to LLVM IR++The direction specification maps to a LLVM constraint specification as follows (using a `reg` operand as an example):++* `in(reg)` => `r`+* `out(reg)` => `=&r` (Rust's outputs are early-clobber outputs in LLVM/GCC terminology)+* `inout(reg)` => `=&r,0` (an early-clobber output with an input tied to it, `0` here is a placeholder for the position of the output)+* `lateout(reg)` => `=r` (Rust's late outputs are regular outputs in LLVM/GCC terminology)+* `inlateout(reg)` => `=r, 0` (cf. `inout` and `lateout`)++If an `inout` is used where the output type is smaller than the input type then some special handling is needed to avoid LLVM issues. See [this bug][issue-65452].++As written this RFC requires architectures to map from Rust constraint specifications to LLVM constraint codes. 
This is in part for better readability on Rust's side and in part for independence of the backend:++* Register classes are mapped to the appropriate constraint code as per the table above.+* `imm` operands are formatted and injected directly into the asm string.+* `sym` is mapped to `s` for statics and `X` for functions.+* a register name `r1` is mapped to `{r1}`+* additionally mappings for register classes are added as appropriate (cf. [llvm-constraint])+* `lateout` operands with an `_` expression that are specified as an explicit register are converted to LLVM clobber constraints. For example, `lateout("r1") _` is mapped to `~{r1}` (cf. [llvm-clobber]).+* If the `nomem` flag is not set then `~{memory}` is added to the clobber list. (Although this is currently ignored by LLVM)+* If the `preserves_flags` flag is not set then the following are added to the clobber list:+  - (x86) `~{dirflag}~{flags}~{fpsr}`+  - (ARM/AArch64) `~{cc}`++For some operand types, we will automatically insert some modifiers into the template string.+* For `sym` and `imm` operands, we automatically insert the `c` modifier which removes target-specific modifiers from the value (e.g. `#` on ARM).+* On AArch64, we will warn if a value smaller than 64 bits is used without a modifier since this is likely a bug (it will produce `x*` instead of `w*`). Clang has this same warning.+* On ARM, we will automatically add the `P` or `q` LLVM modifier for `f64`, `v64` and `v128` passed into a `vreg`. 
This will cause those registers to be formatted as `d*` and `q*` respectively.++Additionally, the following attributes are added to the LLVM `asm` statement:++* The `nounwind` attribute is always added: unwinding from an inline asm block is not allowed (and not supported by LLVM anyways).+* If the `nomem` flag is set then the `readnone` attribute is added to the LLVM `asm` statement.+* If the `readonly` flag is set then the `readonly` attribute is added to the LLVM `asm` statement.+* If the `pure` flag is not set then the `sideeffect` flag is added to the LLVM `asm` statement.+* If the `nostack` flag is not set then the `alignstack` flag is added to the LLVM `asm` statement.+* On x86 the `inteldialect` flag is added to the LLVM `asm` statement so that the Intel syntax is used instead of the AT&T syntax.++If the `noreturn` flag is set then an `unreachable` LLVM instruction is inserted after the asm invocation.++> Note that `alignstack` is not currently supported by GCC, so we will need to implement support in GCC if Rust ever gets a GCC back-end.++[llvm-constraint]: http://llvm.org/docs/LangRef.html#supported-constraint-code-list+[llvm-clobber]: http://llvm.org/docs/LangRef.html#clobber-constraints+[issue-65452]: https://github.com/rust-lang/rust/issues/65452++# Drawbacks+[drawbacks]: #drawbacks++## Unfamiliarity++This RFC proposes a completely new inline assembly format.+It is not possible to just copy examples of GCC-style inline assembly and re-use them.+There is however a fairly trivial mapping between the GCC-style and this format that could be documented to alleviate this.++Additionally, this RFC proposes using the Intel asm syntax on x86 instead of the AT&T syntax. 
We believe this syntax will be more familiar to most users, but may be surprising for users used to GCC-style asm.++The `cpuid` example above would look like this in GCC-style inline assembly:++```C+// GCC doesn't allow directly clobbering an input, we need+// to use a dummy output instead.+int ebx, ecx, discard;+asm (+    "cpuid"+    : "=a"(discard), "=b"(ebx), "=c"(ecx) // outputs+    : "a"(4), "c"(0) // inputs+    : "edx" // clobbers+);+printf("L1 Cache: %i\n", ((ebx >> 22) + 1)+    * (((ebx >> 12) & 0x3ff) + 1)+    * ((ebx & 0xfff) + 1)+    * (ecx + 1));+```++## Limited set of operand types++The proposed set of operand types is much smaller than that which is available through GCC-style inline assembly. In particular, the proposed syntax does not include any form of memory operands and is missing many register classes.++We chose to keep operand constraints as simple as possible, and in particular memory operands introduce a lot of complexity since different instructions support different addressing modes. At the same time, the exact rules for memory operands are not very well known (you are only allowed to access the data directly pointed to by the constraint) and are often gotten wrong.++If we discover that there is a demand for a new register class or special operand type, we can always add it later.++## Difficulty of support++Inline assembly is a difficult feature to implement in a compiler backend. While LLVM does support it, this may not be the case for alternative backends such as [Cranelift][cranelift] (see [this issue][cranelift-asm]).++However it is possible to implement support for inline assembly without support from the compiler backend by using an external assembler instead. 
Take the following (AArch64) asm block as an example:++```rust+unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)+{+    let c;+    asm!("<some asm code>", inout(reg) a, in("x0") b, out("x20") c);+    (a, c)+}+```++This could be expanded to an external asm file with the following contents:++```+# Function prefix directives+.section ".text.foo_inline_asm"+.globl foo_inline_asm+.p2align 2+.type foo_inline_asm, @function+foo_inline_asm:++// If necessary, save callee-saved registers to the stack here.+str x20, [sp, #-16]!++// Move the pointer to the argument out of the way since x0 is used.+mov x1, x0++// Load inputs values+ldr w2, [x1, #0]+ldr w0, [x1, #4]++<some asm code>++// Store output values+str w2, [x1, #0]+str w20, [x1, #8]++// If necessary, restore callee-saved registers here.+ldr x20, [sp], #16++ret++# Function suffix directives+.size foo_inline_asm, . - foo_inline_asm+```++And the following Rust code:++```rust+unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)+{+    let c;+    {+        #[repr(C)]+        struct foo_inline_asm_args {+            a: i32,+            b: i32,+            c: i32,+        }+        extern "C" {+            fn foo_inline_asm(args: *mut foo_inline_asm_args);+        }+        let mut args = foo_inline_asm_args {+            a: a,+            b: b,+            c: mem::uninitialized(),+        };+        foo_inline_asm(&mut args);+        a = args.a;+        c = args.c;+    }+    (a, c)+}+```++[cranelift]: https://cranelift.readthedocs.io/+[cranelift-asm]: https://github.com/bytecodealliance/cranelift/issues/444+
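The Rust half of the external-assembler expansion above boils down to marshalling operands through a `#[repr(C)]` argument block. The following is a minimal sketch of that pattern, not the RFC's actual lowering: it uses `MaybeUninit` for the output slot instead of `mem::uninitialized()`, the struct name `FooInlineAsmArgs` is hypothetical, and `foo_inline_asm` is a plain-Rust stand-in for the externally assembled stub (its arithmetic is invented purely so the example runs on any target).

```rust
use std::mem::MaybeUninit;

// Argument block shared with the (hypothetical) external assembly stub.
// The field order matches the offsets used by the stub: a at 0, b at 4, c at 8.
#[repr(C)]
struct FooInlineAsmArgs {
    a: i32,              // inout operand
    b: i32,              // input operand
    c: MaybeUninit<i32>, // output operand, written by the stub
}

// Stand-in for the externally assembled `foo_inline_asm` symbol.
// ASSUMPTION: the real stub would be produced by an assembler; this body
// is made up so that the marshalling code can be exercised.
unsafe extern "C" fn foo_inline_asm(args: *mut FooInlineAsmArgs) {
    let args = unsafe { &mut *args };
    args.a = args.a.wrapping_add(args.b);
    args.c = MaybeUninit::new(args.b.wrapping_mul(2));
}

fn foo(a: i32, b: i32) -> (i32, i32) {
    let mut args = FooInlineAsmArgs {
        a,
        b,
        c: MaybeUninit::uninit(),
    };
    // SAFETY: `args` is a valid, exclusive pointer for the duration of the
    // call, and the stub is required to initialize `c` before returning.
    let c = unsafe {
        foo_inline_asm(&mut args);
        args.c.assume_init()
    };
    (args.a, c)
}

fn main() {
    let (a, c) = foo(3, 4);
    println!("a = {a}, c = {c}");
}
```

Keeping the output field as `MaybeUninit<i32>` means no uninitialized integer is ever materialized on the Rust side; the value is only read after the stub has written it.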

Where does the RFC say that? (I've searched for that info a couple of times and can't find it, can you provide a link to the actual line?).

Amanieu

comment created time in a month

Pull request review commentrust-lang/rfcs

Inline assembly

+- Feature Name: `asm`+- Start Date: (fill me in with today's date, YYYY-MM-DD)+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)++# Summary+[summary]: #summary++This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.++The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.++The transition from the existing `asm!` macro is described in RFC [2843][rfc-llvm-asm]. The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.++[rfc-llvm-asm]: https://github.com/rust-lang/rfcs/pull/2843++# Motivation+[motivation]: #motivation++In systems programming some tasks require dropping down to the assembly level. The primary reasons are for performance, precise timing, and low level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.++The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. 
For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.++A collection of use cases for inline asm can be found in [this repository][catalogue].++[catalogue]: https://github.com/bjorn3/inline_asm_catalogue/++# Guide-level explanation+[guide-level-explanation]: #guide-level-explanation++Rust provides support for inline assembly via the `asm!` macro.+It can be used to embed handwritten assembly in the assembly output generated by the compiler.+Generally this should not be necessary, but might be where the required performance or timing+cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.++> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.++## Basic usage++Let us start with the simplest possible example:++```rust+unsafe {+    asm!("nop");+}+```++This will insert a NOP (no operation) instruction into the assembly generated by the compiler.+Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert+arbitrary instructions and break various invariants. The instructions to be inserted are listed+in the first argument of the `asm!` macro as a string literal.++## Inputs and outputs++Now inserting an instruction that does nothing is rather boring. Let us do something that+actually acts on data:++```rust+let x: u32;+unsafe {+    asm!("mov {}, 5", out(reg) x);+}+assert_eq!(x, 5);+```++This will write the value `5` into the `u32` variable `x`.+You can see that the string literal we use to specify instructions is actually a template string.+It is governed by the same rules as Rust [format strings][format-syntax].+The arguments that are inserted into the template however look a bit different than you may+be familiar with. First we need to specify if the variable is an input or an output of the+inline assembly. In this case it is an output. 
We declared this by writing `out`.+We also need to specify in what kind of register the assembly expects the variable.+In this case we put it in an arbitrary general purpose register by specifying `reg`.+The compiler will choose an appropriate register to insert into+the template and will read the variable from there after the inline assembly finishes executing.++Let us see another example that also uses an input:++```rust+let i: u32 = 3;+let o: u32;+unsafe {+    asm!("+        mov {0}, {1}+        add {0}, {number}+    ", out(reg) o, in(reg) i, number = imm 5);+}+assert_eq!(o, 8);+```++This will add `5` to the input in variable `i` and write the result to variable `o`.+The particular way this assembly does this is first copying the value from `i` to the output,+and then adding `5` to it.++The example shows a few things:++First we can see that inputs are declared by writing `in` instead of `out`.++Second one of our operands has a type we haven't seen yet, `imm`.+This tells the compiler to expand this argument to an immediate inside the assembly template.+This is only possible for constants and literals.++Third we can see that we can specify an argument number, or name as in any format string.+For inline assembly templates this is particularly useful as arguments are often used more than once.+For more complex inline assembly using this facility is generally recommended, as it improves+readability, and allows reordering instructions without changing the argument order.++We can further refine the above example to avoid the `mov` instruction:++```rust+let mut x: u32 = 3;+unsafe {+    asm!("add {0}, {number}", inout(reg) x, number = imm 5);+}+assert_eq!(x, 8);+```++We can see that `inout` is used to specify an argument that is both input and output.+This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.++It is also possible to specify different variables for the input and output parts of an `inout` 
operand:++```rust+let x: u32 = 3;+let y: u32;+unsafe {+    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);+}+assert_eq!(y, 8);+```++## Late output operands++The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`+can be written at any time, and can therefore not share its location with any other argument.+However, to guarantee optimal performance it is important to use as few registers as possible,+so they won't have to be saved and reloaded around the inline assembly block.+To achieve this Rust provides a `lateout` specifier. This can be used on any output that is+written only after all inputs have been consumed.+There is also a `inlateout` variant of this specifier.++Here is an example where `inlateout` *cannot* be used:++```rust+let mut a = 4;+let b = 4;+let c = 4;+unsafe {+    asm!("+        add {0}, {1}+        add {0}, {2}+    ", inout(reg) a, in(reg) b, in(reg) c);+}+assert_eq!(a, 12);+```++Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`.++However the following example can use `inlateout` since the output is only modified after all input registers have been read:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);+}+assert_eq!(a, 8);+```++As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.++## Explicit register operands++Some instructions require that the operands be in a specific register.+Therefore, Rust inline assembly provides some more specific constraint specifiers.+While `reg` is generally available on any architecture, these are highly architecture specific. E.g. 
for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi`+among others can be addressed by their name.++```rust+unsafe {+    asm!("out 0x64, {}", in("eax") cmd);+}+```++In this example we call the `out` instruction to output the content of the `cmd` variable+to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as operand+we had to use the `eax` constraint specifier.++It is somewhat common that instructions have operands that are not explicitly listed in the+assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:++```rust+fn mul(a: u32, b: u32) -> u64 {+    let lo: u32;+    let hi: u32;++    unsafe {+        asm!(+            // The x86 mul instruction takes eax as an implicit input and writes+            // the 64-bit result of the multiplication to edx:eax.+            "mul {}",+            in(reg) a, in("eax") b,+            lateout("eax") lo, lateout("edx") hi+        );+    }++    // Parenthesize the shift: `+` binds tighter than `<<` in Rust.+    ((hi as u64) << 32) + lo as u64+}+```++This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.+The only explicit operand is a register, that we fill from the variable `a`.+The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.+The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.+The higher 32 bits are stored in `edx` from which we fill the variable `hi`.++Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.++## Clobbered registers++In many cases inline assembly will modify state that is not needed as an output.+Usually this is either because we have to use a scratch register in the assembly,+or instructions modify state that we don't need to further examine.+This state is generally referred to as being "clobbered".+We need to tell the compiler about this since it may need to save and restore this state+around 
the inline assembly block.++```rust+let ebx: u32;+let ecx: u32;++unsafe {+    asm!(+        "cpuid",+        in("eax") 4, in("ecx") 0,+        lateout("ebx") ebx, lateout("ecx") ecx,+        lateout("eax") _, lateout("edx") _+    );+}++println!(+    "L1 Cache: {}",+    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)+);+```++In the example above we use the `cpuid` instruction to get the L1 cache size.+This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache size we only care about the contents of `ebx` and `ecx`.++However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.++This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code:++```rust+// Multiply x by 6 using shifts and adds+let mut x = 4;+unsafe {+    asm!("+        mov {tmp}, {x}+        shl {tmp}, 1+        shl {x}, 2+        add {x}, {tmp}+    ", x = inout(reg) x, tmp = out(reg) _);+}+assert_eq!(x, 4 * 6);+```++## Register template modifiers++In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. 
the low 32 bits of a 64-bit register).++```rust+let mut x: u16 = 0xab;++unsafe {+    asm!("mov {0:h}, {0:b}", inout(reg_abcd) x);+}++assert_eq!(x, 0xabab);+```++In this example, we use the `reg_abcd` register class to restrict the register allocator to the 4 legacy x86 registers (`ax`, `bx`, `cx`, `dx`) of which the first two bytes can be addressed independently.++Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.+The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.++## Flags++By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However in many cases, it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.++Let's take our previous example of an `add` instruction:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!(+        "add {0}, {1}",+        inlateout(reg) a, in(reg) b,+        flags(pure, nomem, nostack)+    );+}+assert_eq!(a, 8);+```++Flags can be provided as an optional final argument to the `asm!` macro. We specified three flags here:+- `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.+- `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).
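Two results from the guide-level examples in this hunk can be checked without any assembly. The sketch below emulates them in plain Rust: what `mov ah, al` does to a 16-bit value, and how the `edx`/`eax` halves produced by `mul` combine into a 64-bit result. The function names `copy_low_byte_to_high` and `combine_halves` are hypothetical helpers introduced only for this illustration.

```rust
/// Emulates `mov ah, al` on a 16-bit register value:
/// the low byte is copied into the high byte, the low byte is unchanged.
fn copy_low_byte_to_high(x: u16) -> u16 {
    let low = x & 0x00ff;
    (low << 8) | low
}

/// Combines the two 32-bit halves written by the x86 `mul` instruction
/// (high half in `edx`, low half in `eax`) into one 64-bit value.
fn combine_halves(hi: u32, lo: u32) -> u64 {
    // The shift needs its own parentheses: `+` and `|` are applied after
    // an explicitly parenthesized shift, never before it.
    ((hi as u64) << 32) | (lo as u64)
}

fn main() {
    // Matches the `assert_eq!(x, 0xabab)` in the template modifier example.
    assert_eq!(copy_low_byte_to_high(0x00ab), 0xabab);

    // Cross-check the halves against a native 64-bit multiplication.
    let (a, b) = (0xffff_ffffu32, 0xffff_ffffu32);
    let wide = (a as u64) * (b as u64);
    let hi = (wide >> 32) as u32;
    let lo = wide as u32;
    assert_eq!(combine_halves(hi, lo), wide);
}
```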

It might be worth clarifying that "accessible" here means "accessible according to Rust's aliasing rules". I.e., this is not allowed to perform any kind of access that unsafe Rust wouldn't also be permitted to do.

@RalfJung When reading or modifying inputs or outputs, which are necessarily visible to Rust, then I agree.

But in general, I'm not sure I agree: an asm! block might read or write to global state not accessible to the Rust program in an unsynchronized way, causing a data-race. In Rust, the behavior of data-races is undefined, but on hardware, this isn't necessarily the case.

Amanieu

comment created time in a month

Pull request review commentrust-lang/rfcs

Inline assembly

+- Feature Name: `asm`+- Start Date: (fill me in with today's date, YYYY-MM-DD)+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)++# Summary+[summary]: #summary++This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.++The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.++The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.++# Motivation+[motivation]: #motivation++In systems programming some tasks require dropping down to the assembly level. The primary reasons are for performance, precise timing, and low level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.++The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.++A collection of use cases for inline asm can be found in [this repository][catalogue].++[catalogue]: https://github.com/bjorn3/inline_asm_catalogue/++# Guide-level explanation+[guide-level-explanation]: #guide-level-explanation++Rust provides support for inline assembly via the `asm!` macro.+It can be used to embed handwritten assembly in the assembly output generated by the compiler.+Generally this should not be necessary, but might be where the required performance or timing+cannot be otherwise achieved. Accessing low level hardware primitives, e.g. 
in kernel code, may also demand this functionality.++> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.++## Basic usage++Let us start with the simplest possible example:++```rust+unsafe {+    asm!("nop");+}+```++This will insert a NOP (no operation) instruction into the assembly generated by the compiler.+Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert+arbitrary instructions and break various invariants. The instructions to be inserted are listed+in the first argument of the `asm!` macro as a string literal.++## Inputs and outputs++Now inserting an instruction that does nothing is rather boring. Let us do something that+actually acts on data:++```rust+let x: u32;+unsafe {+    asm!("mov {}, 5", out(reg) x);+}+assert_eq!(x, 5);+```++This will write the value `5` into the `u32` variable `x`.+You can see that the string literal we use to specify instructions is actually a template string.+It is governed by the same rules as Rust [format strings][format-syntax].+The arguments that are inserted into the template however look a bit different than you may+be familiar with. First we need to specify if the variable is an input or an output of the+inline assembly. In this case it is an output. 
We declared this by writing `out`.+We also need to specify in what kind of register the assembly expects the variable.+In this case we put it in an arbitrary general purpose register by specifying `reg`.+The compiler will choose an appropriate register to insert into+the template and will read the variable from there after the inline assembly finishes executing.++Let us see another example that also uses an input:++```rust+let i: u32 = 3;+let o: u32;+unsafe {+    asm!("+        mov {0}, {1}+        add {0}, {number}+    ", out(reg) o, in(reg) i, number = imm 5);+}+assert_eq!(o, 8);+```++This will add `5` to the input in variable `i` and write the result to variable `o`.+The particular way this assembly does this is first copying the value from `i` to the output,+and then adding `5` to it.++The example shows a few things:++First we can see that inputs are declared by writing `in` instead of `out`.++Second one of our operands has a type we haven't seen yet, `imm`.+This tells the compiler to expand this argument to an immediate inside the assembly template.+This is only possible for constants and literals.++Third we can see that we can specify an argument number, or name as in any format string.+For inline assembly templates this is particularly useful as arguments are often used more than once.+For more complex inline assembly using this facility is generally recommended, as it improves+readability, and allows reordering instructions without changing the argument order.++We can further refine the above example to avoid the `mov` instruction:++```rust+let mut x: u32 = 3;+unsafe {+    asm!("add {0}, {number}", inout(reg) x, number = imm 5);+}+assert_eq!(x, 8);+```++We can see that `inout` is used to specify an argument that is both input and output.+This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.++It is also possible to specify different variables for the input and output parts of an `inout` 
operand:++```rust+let x: u32 = 3;+let y: u32;+unsafe {+    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);+}+assert_eq!(y, 8);+```++## Late output operands++The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`+can be written at any time, and can therefore not share its location with any other argument.+However, to guarantee optimal performance it is important to use as few registers as possible,+so they won't have to be saved and reloaded around the inline assembly block.+To achieve this Rust provides a `lateout` specifier. This can be used on any output that is+written only after all inputs have been consumed.+There is also a `inlateout` variant of this specifier.++Here is an example where `inlateout` *cannot* be used:++```rust+let mut a = 4;+let b = 4;+let c = 4;+unsafe {+    asm!("+        add {0}, {1}+        add {0}, {2}+    ", inout(reg) a, in(reg) b, in(reg) c);+}+assert_eq!(a, 12);+```++Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`.++However the following example can use `inlateout` since the output is only modified after all input registers have been read:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);+}+assert_eq!(a, 8);+```++As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.++## Explicit register operands++Some instructions require that the operands be in a specific register.+Therefore, Rust inline assembly provides some more specific constraint specifiers.+While `reg` is generally available on any architecture, these are highly architecture specific. E.g. 
for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi`+among others can be addressed by their name.++```rust+unsafe {+    asm!("out 0x64, {}", in("eax") cmd);+}+```++In this example we call the `out` instruction to output the content of the `cmd` variable+to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as operand+we had to use the `eax` constraint specifier.++It is somewhat common that instructions have operands that are not explicitly listed in the+assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:++```rust+fn mul(a: u32, b: u32) -> u64 {+    let lo: u32;+    let hi: u32;++    unsafe {+        asm!(+            // The x86 mul instruction takes eax as an implicit input and writes+            // the 64-bit result of the multiplication to eax:edx.+            "mul {}",+            in(reg) a, in("eax") b,+            lateout("eax") lo, lateout("edx") hi+        );+    }++    hi as u64 << 32 + lo as u64+}+```++This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.+The only explicit operand is a register, that we fill from the variable `a`.+The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.+The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.+The higher 32 bits are stored in `edx` from which we fill the variable `hi`.++Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.++## Clobbered registers++In many cases inline assembly will modify state that is not needed as an output.+Usually this is either because we have to use a scratch register in the assembly,+or instructions modify state that we don't need to further examine.+This state is generally referred to as being "clobbered".+We need to tell the compiler about this since it may need to save and restore this state+around 
the inline assembly block.++```rust+let ebx: u32;+let ecx: u32;++unsafe {+    asm!(+        "cpuid",+        in("eax") 4, in("ecx") 0,+        lateout("ebx") ebx, lateout("ecx") ecx,+        lateout("eax") _, lateout("edx") _+    );+}++println!(+    "L1 Cache: {}",+    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)+);+```++In the example above we use the `cpuid` instruction to get the L1 cache size.+This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache size we only care about the contents of `ebx` and `ecx`.++However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.++This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code.++## Register template modifiers++In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. the low 32 bits of a 64-bit register).++```rust+let mut x: u16 = 0xab;++unsafe {+    asm!("mov {0:h} {0:b}", inout(reg_abcd) x);+}++assert_eq!(x, 0xabab);+```++In this example, we use the `reg_abcd` register class to restrict the register allocator to the 4 legacy x86 register (`ax`, `bx`, `cx`, `dx`) of which the first two bytes can be addressed independently.++Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.+The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. 
The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.++## Flags

Currently the flags/options control the "effects" that the assembly block has, so we could also call it `effects(nomem, nostack, noflags, noregs)` to state that the assembly block has no effects that modify "x", where `pure` and `readonly` are shorthands for some of the effects.
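For illustration only, here is a minimal sketch of such an annotated block in the syntax that was later stabilized, where the list is spelled `options(...)` rather than `flags(...)` or the `effects(...)` suggested above. The function name `add_asm` and the portable fallback are assumptions made for this example; they are not part of the RFC.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Adds two integers with a single `add` instruction (wraps on overflow).
#[cfg(target_arch = "x86_64")]
fn add_asm(a: u64, b: u64) -> u64 {
    let mut sum = a;
    // SAFETY: the block only reads `b` and updates `sum`; it touches no
    // memory and does not use the stack, as promised in `options`.
    unsafe {
        asm!(
            "add {0}, {1}",
            inlateout(reg) sum,
            in(reg) b,
            // No `preserves_flags`: `add` modifies the flags register.
            options(pure, nomem, nostack),
        );
    }
    sum
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn add_asm(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

Calling `add_asm(4, 4)` mirrors the RFC's `add` example. Each option is a promise made to the compiler, so listing one that the assembly does not uphold is undefined behavior.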

Amanieu

comment created time in a month

Pull request review comment rust-lang/rfcs

Inline assembly


@RalfJung

> For example, I'd say that the motivation for explicit declaration of clobbered registers is that assembly may not access (read nor write) any register not declared in its inputs and outputs. So, the cpuid example without the `lateout("eax") _, lateout("edx") _` is UB due to violating those constraints.

I think this goes in the right direction. I'll reword this as "An assembly block shall only have specified side-effects, i.e., if an assembly block has a side-effect that's not specified, the behavior is undefined".

This subtly different definition means that an assembly block can actually read from almost any register, including those that are not specified as inputs (e.g. flags registers), because doing so is not a side-effect. I say "almost any" because reading from some registers can be a side-effect (i.e. it can modify the value of that or other registers).

The same is true for writes: on RISC-V, for example, the `x0` register is always zero, so you can write to it even if it is not declared as an output or a clobber, and doing so is ok because that is not a side-effect.

What I think might be a bit confusing is the split between the declaration of "additive" and "negative" side-effects in the `asm!` block. For example, `preserves_flags` is a "negative" side-effect, stating that this assembly block does not modify (some) flag registers as a side-effect. However, `lateout(foo)` in the "clobbers" section states the additive side-effect that this `asm!` block might modify the content of the memory at `foo`.
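As a rough sketch of the two kinds of declaration side by side, the block below declares one "additive" effect (a scratch register that gets overwritten, via `out(reg) _`) and several "negative" ones (no memory access, no stack use). It assumes the later stabilized spelling `options(...)` and the `std::arch::asm` import; the function name `times_six` and the non-x86 fallback are hypothetical.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Multiplies `x` by 6 using shifts and adds (wraps on overflow).
#[cfg(target_arch = "x86_64")]
fn times_six(x: u64) -> u64 {
    let mut x = x;
    // SAFETY: every register the block writes is declared below.
    unsafe {
        asm!(
            "mov {tmp}, {x}",  // tmp = x
            "shl {tmp}, 1",    // tmp = 2 * x
            "shl {x}, 2",      // x = 4 * x
            "add {x}, {tmp}",  // x = 4 * x + 2 * x
            x = inout(reg) x,
            // Additive declaration: the scratch register is overwritten.
            tmp = out(reg) _,
            // Negative declarations: no memory access, no stack use.
            // `preserves_flags` is omitted because `shl`/`add` modify flags.
            options(pure, nomem, nostack),
        );
    }
    x
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn times_six(x: u64) -> u64 {
    x.wrapping_mul(6)
}
```

Dropping the `tmp = out(reg) _` operand and hard-coding a register name in the template instead would leave that write undeclared, which is the kind of unspecified side-effect discussed above.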

Amanieu

comment created time in a month

Pull request review comment rust-lang/rfcs

Inline assembly

+- Feature Name: `asm`
+- Start Date: (fill me in with today's date, YYYY-MM-DD)
+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)
+
+# Summary
+[summary]: #summary
+
+This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.
+
+The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.
+
+The transition from the existing `asm!` macro is described in RFC [2843][rfc-llvm-asm]. The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.
+
+[rfc-llvm-asm]: https://github.com/rust-lang/rfcs/pull/2843
+
+# Motivation
+[motivation]: #motivation
+
+In systems programming some tasks require dropping down to the assembly level. The primary reasons are for performance, precise timing, and low level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.
+
+The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.
+
+A collection of use cases for inline asm can be found in [this repository][catalogue].
+
+[catalogue]: https://github.com/bjorn3/inline_asm_catalogue/
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+Rust provides support for inline assembly via the `asm!` macro.
+It can be used to embed handwritten assembly in the assembly output generated by the compiler.
+Generally this should not be necessary, but might be where the required performance or timing
+cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.
+
+> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.
+
+## Basic usage
+
+Let us start with the simplest possible example:
+
+```rust
+unsafe {
+    asm!("nop");
+}
+```
+
+This will insert a NOP (no operation) instruction into the assembly generated by the compiler.
+Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert
+arbitrary instructions and break various invariants. The instructions to be inserted are listed
+in the first argument of the `asm!` macro as a string literal.
+
+## Inputs and outputs
+
+Now inserting an instruction that does nothing is rather boring. Let us do something that
+actually acts on data:
+
+```rust
+let x: u32;
+unsafe {
+    asm!("mov {}, 5", out(reg) x);
+}
+assert_eq!(x, 5);
+```
+
+This will write the value `5` into the `u32` variable `x`.
+You can see that the string literal we use to specify instructions is actually a template string.
+It is governed by the same rules as Rust [format strings][format-syntax].
+The arguments that are inserted into the template however look a bit different than you may
+be familiar with. First we need to specify if the variable is an input or an output of the
+inline assembly. In this case it is an output. We declared this by writing `out`.
+We also need to specify in what kind of register the assembly expects the variable.
+In this case we put it in an arbitrary general purpose register by specifying `reg`.
+The compiler will choose an appropriate register to insert into
+the template and will read the variable from there after the inline assembly finishes executing.
+
+Let us see another example that also uses an input:
+
+```rust
+let i: u32 = 3;
+let o: u32;
+unsafe {
+    asm!("
+        mov {0}, {1}
+        add {0}, {number}
+    ", out(reg) o, in(reg) i, number = imm 5);
+}
+assert_eq!(o, 8);
+```
+
+This will add `5` to the input in variable `i` and write the result to variable `o`.
+The particular way this assembly does this is first copying the value from `i` to the output,
+and then adding `5` to it.
+
+The example shows a few things:
+
+First, we can see that inputs are declared by writing `in` instead of `out`.
+
+Second, one of our operands has a type we haven't seen yet, `imm`.
+This tells the compiler to expand this argument to an immediate inside the assembly template.
+This is only possible for constants and literals.
+
+Third, we can see that we can specify an argument number, or name as in any format string.
+For inline assembly templates this is particularly useful as arguments are often used more than once.
+For more complex inline assembly using this facility is generally recommended, as it improves
+readability, and allows reordering instructions without changing the argument order.
+
+We can further refine the above example to avoid the `mov` instruction:
+
+```rust
+let mut x: u32 = 3;
+unsafe {
+    asm!("add {0}, {number}", inout(reg) x, number = imm 5);
+}
+assert_eq!(x, 8);
+```
+
+We can see that `inout` is used to specify an argument that is both input and output.
+This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.
+
+It is also possible to specify different variables for the input and output parts of an `inout` operand:
+
+```rust
+let x: u32 = 3;
+let y: u32;
+unsafe {
+    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);
+}
+assert_eq!(y, 8);
+```
+
+## Late output operands
+
+The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`
+can be written at any time, and can therefore not share its location with any other argument.
+However, to guarantee optimal performance it is important to use as few registers as possible,
+so they won't have to be saved and reloaded around the inline assembly block.
+To achieve this Rust provides a `lateout` specifier. This can be used on any output that is
+written only after all inputs have been consumed.
+There is also an `inlateout` variant of this specifier.
+
+Here is an example where `inlateout` *cannot* be used:
+
+```rust
+let mut a = 4;
+let b = 4;
+let c = 4;
+unsafe {
+    asm!("
+        add {0}, {1}
+        add {0}, {2}
+    ", inout(reg) a, in(reg) b, in(reg) c);
+}
+assert_eq!(a, 12);
+```
+
+Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`.
+
+However the following example can use `inlateout` since the output is only modified after all input registers have been read:
+
+```rust
+let mut a = 4;
+let b = 4;
+unsafe {
+    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);
+}
+assert_eq!(a, 8);
+```
+
+As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.
+
+## Explicit register operands
+
+Some instructions require that the operands be in a specific register.
+Therefore, Rust inline assembly provides some more specific constraint specifiers.
+While `reg` is generally available on any architecture, these are highly architecture specific. E.g. for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi`
+among others can be addressed by their name.
+
+```rust
+unsafe {
+    asm!("out 0x64, {}", in("eax") cmd);
+}
+```
+
+In this example we call the `out` instruction to output the content of the `cmd` variable
+to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as operand
+we had to use the `eax` constraint specifier.
+
+It is somewhat common that instructions have operands that are not explicitly listed in the
+assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:
+
+```rust
+fn mul(a: u32, b: u32) -> u64 {
+    let lo: u32;
+    let hi: u32;
+
+    unsafe {
+        asm!(
+            // The x86 mul instruction takes eax as an implicit input and writes
+            // the 64-bit result of the multiplication to eax:edx.
+            "mul {}",
+            in(reg) a, in("eax") b,
+            lateout("eax") lo, lateout("edx") hi
+        );
+    }
+
+    // `+` binds tighter than `<<` in Rust, so the shift must be parenthesized.
+    ((hi as u64) << 32) + lo as u64
+}
+```
+
+This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.
+The only explicit operand is a register, that we fill from the variable `a`.
+The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.
+The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.
+The higher 32 bits are stored in `edx` from which we fill the variable `hi`.
+
+Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.
+
+## Clobbered registers
+
+In many cases inline assembly will modify state that is not needed as an output.
+Usually this is either because we have to use a scratch register in the assembly,
+or instructions modify state that we don't need to further examine.
+This state is generally referred to as being "clobbered".
+We need to tell the compiler about this since it may need to save and restore this state
+around the inline assembly block.
+
+```rust
+let ebx: u32;
+let ecx: u32;
+
+unsafe {
+    asm!(
+        "cpuid",
+        in("eax") 4, in("ecx") 0,
+        lateout("ebx") ebx, lateout("ecx") ecx,
+        lateout("eax") _, lateout("edx") _
+    );
+}
+
+println!(
+    "L1 Cache: {}",
+    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)
+);
+```
+
+In the example above we use the `cpuid` instruction to get the L1 cache size.
+This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache size we only care about the contents of `ebx` and `ecx`.
+
+However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.
+
+This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code:
+
+```rust
+// Multiply x by 6 using shifts and adds
+let mut x = 4;
+unsafe {
+    asm!("
+        mov {tmp}, {x}
+        shl {tmp}, 1
+        shl {x}, 2
+        add {x}, {tmp}
+    ", x = inout(reg) x, tmp = out(reg) _);
+}
+assert_eq!(x, 4 * 6);
+```
+
+## Register template modifiers
+
+In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. the low 32 bits of a 64-bit register).
+
+```rust
+let mut x: u16 = 0xab;
+
+unsafe {
+    asm!("mov {0:h}, {0:b}", inout(reg_abcd) x);
+}
+
+assert_eq!(x, 0xabab);
+```
+
+In this example, we use the `reg_abcd` register class to restrict the register allocator to the 4 legacy x86 registers (`ax`, `bx`, `cx`, `dx`) of which the first two bytes can be addressed independently.
+
+Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.
+The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.
+
+## Flags
+
+By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However in many cases, it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.
+
+Let's take our previous example of an `add` instruction:
+
+```rust
+let mut a = 4;
+let b = 4;
+unsafe {
+    asm!(
+        "add {0}, {1}",
+        inlateout(reg) a, in(reg) b,
+        flags(pure, nomem, nostack)
+    );
+}
+assert_eq!(a, 8);
+```
+
+Flags can be provided as an optional final argument to the `asm!` macro. We specified three flags here:
+- `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.
+- `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).
+- `nostack` means that the asm code does not push any data onto the stack. This allows the compiler to use optimizations such as the stack red zone on x86_64 to avoid stack pointer adjustments.
+
+These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.
+
+See the reference for the full list of available flags and their effects.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+Inline assembler is implemented as an unsafe macro `asm!()`.
+The first argument to this macro is a template string literal used to build the final assembly.
+The following arguments specify input and output operands.
+When required, flags are specified as the final argument.
+
+The following ABNF specifies the general syntax:
+
+```
+dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"
+reg_spec := <arch specific register class> / "<arch specific register name>"
+operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"
+reg_operand := dir_spec "(" reg_spec ")" operand_expr
+operand := reg_operand / "imm" const_expr / "sym" path
+flag := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn" / "nostack"
+flags := "flags(" flag *["," flag] ")"
+asm := "asm!(" format_string *("," [ident "="] operand) ["," flags] ")"
+```
+
+[format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax
+
+## Template string
+
+The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.
+
+The assembly code syntax used is that of the GNU assembler (GAS). The only exception is on x86 where the Intel syntax is used instead of GCC's AT&T syntax.
+
+This RFC only specifies how operands are substituted into the template string. Actual interpretation of the final asm string is left to the assembler.
+
+However there is one restriction on the asm string: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.
+
+The compiler will lint against any operands that are not used in the template string, except for operands that specify an explicit register.
+
+[rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795
+
+## Operand type
+
+Several types of operands are supported:
+
+* `in(<reg>) <expr>`
+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.
+  - The allocated register will contain the value of `<expr>` at the start of the asm code.
+  - The allocated register must contain the same value at the end of the asm code (except if a `lateout` is allocated to the same register).
+* `out(<reg>) <expr>`
+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.
+  - The allocated register will contain an unknown value at the start of the asm code.
+  - `<expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register are written at the end of the asm code.
+  - An underscore (`_`) may be specified instead of an expression, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).
+* `lateout(<reg>) <expr>`
+  - Identical to `out` except that the register allocator can reuse a register allocated to an `in`.
+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.
+  - `lateout` must be used instead of `out` if you are specifying the same explicit register as an `in`.
+* `inout(<reg>) <expr>`
+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.
+  - The allocated register will contain the value of `<expr>` at the start of the asm code.
+  - `<expr>` must be an initialized place expression, to which the contents of the allocated register are written at the end of the asm code.
+* `inout(<reg>) <in expr> => <out expr>`
+  - Same as `inout` except that the initial value of the register is taken from the value of `<in expr>`.
+  - `<out expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register are written at the end of the asm code.
+  - An underscore (`_`) may be specified instead of an expression for `<out expr>`, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).
+  - `<in expr>` and `<out expr>` may have different types.
+* `inlateout(<reg>) <expr>` / `inlateout(<reg>) <in expr> => <out expr>`
+  - Identical to `inout` except that the register allocator can reuse a register allocated to an `in` (this can happen if the compiler knows the `in` has the same initial value as the `inlateout`).
+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.
+* `imm <expr>`
+  - `<expr>` must be an integer or floating-point constant expression.
+  - The value of the expression is formatted as a string and substituted directly into the asm template string.
+* `sym <path>`
+  - `<path>` must refer to a `fn` or `static` defined in the current crate.
+  - A mangled symbol name referring to the item is substituted into the asm template string.
+  - The substituted string does not include any modifiers (e.g. GOT, PLT, relocations, etc).
+
+## Register operands
+
+Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. 
`"eax"`) while register classes are specified as identifiers (e.g. `reg`).++Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register two input operand or two output operands. Additionally on ARM, it is a compile-time error to use overlapping VFP registers in input operands or in output operands.++Different registers classes have different constraints on which Rust types they allow. For example, `reg` generally only allows integers and pointers, but not floats or SIMD vectors.++If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. It is a compile-time error for a value to be of a larger size than the register it is allocated in.++Here is the list of currently supported register classes:++| Architecture | Register class | Registers | LLVM constraint code | Allowed types |+| ------------ | -------------- | --------- | ----- | ------------- |+| x86 | `reg` | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `reg_abcd` | `ax`, `bx`, `cx`, `dx` | `Q` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `vreg` | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |+| x86 | `vreg_evex` | `xmm[0-31]` (AVX-512, otherwise same as `vreg`) | `v` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |+| x86 (AVX-512) | `kreg` | `k[1-7]` | `Yk` | `i16`, `i32`, `i64` |+| AArch64 | `reg` | `x[0-28]`, `x30` | `r` | `i8`, `i16`, `i32`, `i64` |+| AArch64 | `vreg` | `v[0-31]` | `w` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| AArch64 | `vreg_low` | `v[0-15]` | `x` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| AArch64 | `vreg_low8` | `v[0-7]` | `y` | 
`i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| ARM | `reg` | `r[0-r10]`, `r12`, `r14` | `r` | `i8`, `i16`, `i32` |+| ARM | `vreg` | `s[0-31]`, `d[0-31]`, `q[0-15]` | `w` | `f32`, `f64`, `v64`, `v128` |+| ARM | `vreg_low` | `s[0-31]`, `d[0-15]`, `q[0-7]` | `t` | `f32`, `f64`, `v64`, `v128` |+| ARM | `vreg_low8` | `s[0-15]`, `d[0-d]`, `q[0-3]` | `x` | `f32`, `f64`, `v64`, `v128` |+| RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-31]` | `r` | `i8`, `i16`, `i32`, `i64` (RV64 only) |+| RISC-V | `vreg` | `f[0-31]` | `f` | `f32`, `f64` |++> Notes on allowed types:+> - Pointers and references are allowed where the equivalent integer type is allowed.+> - `iLEN` refers to both signed and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches.+> - Fat pointers are not allowed.+> - `vLEN` refers to a SIMD vector that is `LEN` bits wide.++Additional constraint specifications may be added in the future based on demand for additional register classes (e.g. MMX, x87, etc).++Some registers have multiple names. These are all treated by the compiler as identical to the base register name. 
Here is the list of all supported register aliases:++| Architecture | Base register | Aliases |+| ------------ | ------------- | ------- |+| x86 | `ax` | `al`, `eax`, `rax` |+| x86 | `bx` | `bl`, `ebx`, `rbx` |+| x86 | `cx` | `cl`, `ecx`, `rcx` |+| x86 | `dx` | `dl`, `edx`, `rdx` |+| x86 | `si` | `sil`, `esi`, `rsi` |+| x86 | `di` | `dil`, `edi`, `rdi` |+| x86 | `bp` | `bpl`, `ebp`, `rbp` |+| x86 | `sp` | `spl`, `esp`, `rsp` |+| x86 | `ip` | `eip`, `rip` |+| x86 | `st(0)` | `st` |+| x86 | `r[8-15]` | `r[8-15]b`, `r[8-15]w`, `r[8-15]d` |+| x86 | `xmm[0-31]` | `ymm[0-31]`, `zmm[0-31]` |+| AArch64 | `x[0-30]` | `w[0-30]` |+| AArch64 | `x29` | `fp` |+| AArch64 | `x30` | `lr` |+| AArch64 | `sp` | `wsp` |+| AArch64 | `xzr` | `wzr` |+| AArch64 | `v[0-31]` | `b[0-31]`, `h[0-31]`, `s[0-31]`, `d[0-31]`, `q[0-31]` |+| ARM | `r[0-3]` | `a[1-4]` |+| ARM | `r[4-9]` | `v[1-6]` |+| ARM | `r9` | `rfp` |+| ARM | `r10` | `sl` |+| ARM | `r11` | `fp` |+| ARM | `r12` | `ip` |+| ARM | `r13` | `sp` |+| ARM | `r14` | `lr` |+| ARM | `r15` | `pc` |+| RISC-V | `x0` | `zero` |+| RISC-V | `x1` | `ra` |+| RISC-V | `x2` | `sp` |+| RISC-V | `x3` | `gp` |+| RISC-V | `x4` | `tp` |+| RISC-V | `x[5-7]` | `t[0-2]` |+| RISC-V | `x8` | `fp`, `s0` |+| RISC-V | `x9` | `s1` |+| RISC-V | `x[10-17]` | `a[0-7]` |+| RISC-V | `x[18-27]` | `s[2-11]` |+| RISC-V | `x[28-31]` | `t[3-6]` |+| RISC-V | `f[0-7]` | `ft[0-7]` |+| RISC-V | `f[8-9]` | `fs[0-1]` |+| RISC-V | `f[10-17]` | `fa[0-7]` |+| RISC-V | `f[18-27]` | `fs[2-11]` |+| RISC-V | `f[28-31]` | `ft[8-11]` |++Some registers cannot be used for input or output operands:++| Architecture | Unsupported register | Reason |+| ------------ | -------------------- | ------ |+| All | `sp` | The stack pointer must be restored to its original value at the end of an asm code block. |+| All | `bp` (x86), `r11` (ARM), `x29` (AArch64), `x8` (RISC-V) | The frame pointer cannot be used as an input or output. 
|+| x86 | `ah`, `bh`, `ch`, `dh` | These are poorly supported by compiler backends. Use 16-bit register views (e.g. `ax`) instead. |+| x86 | `k0` | This is a constant zero register which can't be modified. |+| x86 | `ip` | This is the program counter, not a real register. |+| x86 | `mm[0-7]` | MMX registers are not currently supported (but may be in the future). |+| x86 | `st([0-7])` | x87 registers are not currently supported (but may be in the future). |+| AArch64 | `xzr` | This is a constant zero register which can't be modified. |+| ARM | `pc` | This is the program counter, not a real register. |+| RISC-V | `x0` | This is a constant zero register which can't be modified. |+| RISC-V | `gp`, `tp` | These registers are reserved and cannot be used as inputs or outputs. |++## Template modifiers++The placeholders can be augmented by modifiers which are specified after the `:` in the curly braces. These modifiers do not affect register allocation, but change the way operands are formatted when inserted into the template string. 
Only one modifier is allowed per template placeholder.++The supported modifiers are a subset of LLVM's (and GCC's) [asm template argument modifiers][llvm-argmod].++| Architecture | Register class | Modifier | Input type | Example output |+| ------------ | -------------- | -------- | ---------- | -------------- |+| x86 | `reg` | None | `i8` | `al` |+| x86 | `reg` | None | `i16` | `ax` |+| x86 | `reg` | None | `i32` | `eax` |+| x86 | `reg` | None | `i64` | `rax` |+| x86-32 | `reg_abcd` | `b` | Any | `al` |+| x86-64 | `reg` | `b` | Any | `al` |+| x86 | `reg_abcd` | `h` | Any | `ah` |+| x86 | `reg` | `w` | Any | `ax` |+| x86 | `reg` | `k` | Any | `eax` |+| x86-64 | `reg` | `q` | Any | `rax` |+| x86 | `vreg` | None | `i32`, `i64`, `f32`, `f64`, `v128` | `xmm0` |+| x86 (AVX) | `vreg` | None | `v256` | `ymm0` |+| x86 (AVX-512) | `vreg` | None | `v512` | `zmm0` |+| x86 (AVX-512) | `kreg` | None | Any | `k1` |+| AArch64 | `reg` | None | Any | `x0` |+| AArch64 | `reg` | `w` | Any | `w0` |+| AArch64 | `reg` | `x` | Any | `x0` |+| AArch64 | `vreg` | None | Any | `v0` |+| AArch64 | `vreg` | `b` | Any | `b0` |+| AArch64 | `vreg` | `h` | Any | `h0` |+| AArch64 | `vreg` | `s` | Any | `s0` |+| AArch64 | `vreg` | `d` | Any | `d0` |+| AArch64 | `vreg` | `q` | Any | `q0` |+| ARM | `reg` | None | Any | `r0` |+| ARM | `vreg` | None | `f32` | `s0` |+| ARM | `vreg` | None | `f64`, `v64` | `d0` |+| ARM | `vreg` | None | `v128` | `q0` |+| ARM | `vreg` | `e` / `f` | `v128` | `d0` / `d1` |+| RISC-V | `reg` | None | Any | `x1` |+| RISC-V | `vreg` | None | Any | `f0` |++> Notes:+> - on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register.+> - on AArch64 `reg`: a warning is emitted if the input type is smaller than 64 bits, suggesting to use the `w` modifier. 
The warning can be suppressed by explicitly using the `x` modifier.++[llvm-argmod]: http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers++## Flags++Flags are used to further influence the behavior of the inline assembly block.+Currently the following flags are defined:+- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used. A warning is emitted if this flag is used on an `asm` with no outputs.+- `nomem`: The `asm` blocks does not read or write to any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.+- `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.+- `preserves_flags`: The `asm` block does not modify the flags register (defined below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.+- `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code.+- `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this flag is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.++The `nomem` and `readonly` flags are mutually exclusive: it is a compile-time error to specify both. 
Specifying `pure` on an asm block with no outputs is linted against since such a block will be optimized away to nothing.++These flag registers which must be preserved if `preserves_flags` is set:+- x86+  - Status flags in `EFLAGS` (CF, PF, AF, ZF, SF, OF).+  - Direction flag in `EFLAGS` (DF).+  - Floating-point status word (all).+  - Floating-point exception flags in `MXCSR` (PE, UE, OE, ZE, DE, IE).+- ARM+  - Condition flags in `CPSR` (N, Z, C, V)+  - Saturation flag in `CPSR` (Q)+  - Greater than or equal flags in `CPSR` (GE).+  - Condition flags in `FPSCR` (N, Z, C, V)+  - Saturation flag in `FPSCR` (QC)+  - Floating-point exception flags in `FPSCR` (IDC, IXC, UFC, OFC, DZC, IOC).+- AArch64+  - Condition flags (`NZCV` register).+  - Floating-point status (`FPSR` register).+- RISC-V+  - Floating-point exception flags in `fcsr` (`fflags`).++> Note: As a general rule, these are the flags which are *not* preserved when performing a function call.++## Mapping to LLVM IR++The direction specification maps to a LLVM constraint specification as follows (using a `reg` operand as an example):++* `in(reg)` => `r`+* `out(reg)` => `=&r` (Rust's outputs are early-clobber outputs in LLVM/GCC terminology)+* `inout(reg)` => `=&r,0` (an early-clobber output with an input tied to it, `0` here is a placeholder for the position of the output)+* `lateout(reg)` => `=r` (Rust's late outputs are regular outputs in LLVM/GCC terminology)+* `inlateout(reg)` => `=r, 0` (cf. `inout` and `lateout`)++If an `inout` is used where the output type is smaller than the input type then some special handling is needed to avoid LLVM issues. See [this bug][issue-65452].++As written this RFC requires architectures to map from Rust constraint specifications to LLVM constraint codes. 
This is in part for better readability on Rust's side and in part for independence of the backend:++* Register classes are mapped to the appropriate constraint code as per the table above.+* `imm` operands are formatted and injected directly into the asm string.+* `sym` is mapped to `s` for statics and `X` for functions.+* a register name `r1` is mapped to `{r1}`+* additionally mappings for register classes are added as appropriate (cf. [llvm-constraint])+* `lateout` operands with an `_` expression that are specified as an explicit register are converted to LLVM clobber constraints. For example, `lateout("r1") _` is mapped to `~{r1}` (cf. [llvm-clobber]).+* If the `nomem` flag is not set then `~{memory}` is added to the clobber list. (Although this is currently ignored by LLVM)+* If the `preserves_flags` flag is not set then the following are added to the clobber list:+  - (x86) `~{dirflag}~{flags}~{fpsr}`+  - (ARM/AArch64) `~{cc}`++For some operand types, we will automatically insert some modifiers into the template string.+* For `sym` and `imm` operands, we automatically insert the `c` modifier which removes target-specific modifiers from the value (e.g. `#` on ARM).+* On AArch64, we will warn if a value smaller than 64 bits is used without a modifier since this is likely a bug (it will produce `x*` instead of `w*`). Clang has this same warning.+* On ARM, we will automatically add the `P` or `q` LLVM modifier for `f64`, `v64` and `v128` passed into a `vreg`. 
This will cause those registers to be formatted as `d*` and `q*` respectively.++Additionally, the following attributes are added to the LLVM `asm` statement:++* The `nounwind` attribute is always added: unwinding from an inline asm block is not allowed (and not supported by LLVM anyways).+* If the `nomem` flag is set then the `readnone` attribute is added to the LLVM `asm` statement.+* If the `readonly` flag is set then the `readonly` attribute is added to the LLVM `asm` statement.+* If the `pure` flag is not set then the `sideffect` flag is added the LLVM `asm` statement.+* If the `nostack` flag is not set then the `alignstack` flag is added the LLVM `asm` statement.+* On x86 the `inteldialect` flag is added the LLVM `asm` statement so that the Intel syntax is used instead of the AT&T syntax.++If the `noreturn` flag is set then an `unreachable` LLVM instruction is inserted after the asm invocation.++> Note that `alignstack` is not currently supported by GCC, so we will need to implement support in GCC if Rust ever gets a GCC back-end.++[llvm-constraint]: http://llvm.org/docs/LangRef.html#supported-constraint-code-list+[llvm-clobber]: http://llvm.org/docs/LangRef.html#clobber-constraints+[issue-65452]: https://github.com/rust-lang/rust/issues/65452++# Drawbacks+[drawbacks]: #drawbacks++## Unfamiliarity++This RFC proposes a completely new inline assembly format.+It is not possible to just copy examples of GCC-style inline assembly and re-use them.+There is however a fairly trivial mapping between the GCC-style and this format that could be documented to alleviate this.++Additionally, this RFC proposes using the Intel asm syntax on x86 instead of the AT&T syntax. 
We believe this syntax will be more familiar to most users, but may be surprising for users used to GCC-style asm.++The `cpuid` example above would look like this in GCC-sytle inline assembly:++```C+// GCC doesn't allow directly clobbering an input, we need+// to use a dummy output instead.+int ebx, ecx, discard;+asm (+    "cpuid"+    : "=a"(discard), "=b"(ebx), "=c"(ecx) // outputs+    : "a"(4), "c"(0) // inputs+    : "edx" // clobbers+);+printf("L1 Cache: %i\n", ((ebx >> 22) + 1)+    * (((ebx >> 12) & 0x3ff) + 1)+    * ((ebx & 0xfff) + 1)+    * (ecx + 1));+```++## Limited set of operand types++The proposed set of operand types is much smaller than that which is available through GCC-style inline assembly. In particular, the proposed syntax does not include any form of memory operands and is missing many register classes.++We chose to keep operand constraints as simple as possible, and in particular memory operands introduce a lot of complexity since different instruction support different addressing modes. At the same time, the exact rules for memory operands are not very well known (you are only allowed to access the data directly pointed to by the constraint) and are often gotten wrong.++If we discover that there is a demand for a new register class or special operand type, we can always add it later.++## Difficulty of support++Inline assembly is a difficult feature to implement in a compiler backend. While LLVM does support it, this may not be the case for alternative backends such as [Cranelift][cranelift] (see [this issue][cranelift-asm]).++However it is possible to implement support for inline assembly without support from the compiler backend by using an external assembler instead. 
Take the following (AArch64) asm block as an example:++```rust+unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)+{+    let c;+    asm!("<some asm code>", inout(reg) a, in("x0") b, out("x20") c);+    (a, c)+}+```++This could be expanded to an external asm file with the following contents:++```+# Function prefix directives+.section ".text.foo_inline_asm"+.globl foo_inline_asm+.p2align 2+.type foo_inline_asm, @function+foo_inline_asm:++// If necessary, save callee-saved registers to the stack here.+str x20, [sp, #-16]!++// Move the pointer to the argument out of the way since x0 is used.+mov x1, x0++// Load inputs values+ldr w2, [x1, #0]+ldr w0, [x1, #4]++<some asm code>++// Store output values+str w2, [x1, #0]+str w20, [x1, #8]++// If necessary, restore callee-saved registers here.+ldr x20, [sp], #16++ret++# Function suffix directives+.size foo_inline_asm, . - foo_inline_asm+```++And the following Rust code:++```rust+unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)+{+    let c;+    {+        #[repr(C)]+        struct foo_inline_asm_args {+            a: i32,+            b: i32,+            c: i32,+        }+        extern "C" {+            fn foo_inline_asm(args: *mut foo_inline_asm_args);+        }+        let mut args = foo_inline_asm_args {+            a: a,+            b: b,+            c: mem::uninitialized(),+        };+        foo_inline_asm(&mut args);+        a = args.a;+        c = args.c;+    }+    (a, c)+}+```++[cranelift]: https://cranelift.readthedocs.io/+[cranelift-asm]: https://github.com/bytecodealliance/cranelift/issues/444+
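
The Rust half of the expansion quoted above uses `mem::uninitialized()`, which has since been deprecated in favour of `MaybeUninit`. The following is only a sketch of the same marshalling pattern under that assumption: the `FooInlineAsmArgs` name is a renamed version of the struct in the diff, and `foo_inline_asm` here is a hypothetical pure-Rust stand-in for the externally assembled routine (it adds `b` to `a` and writes `2 * b` to `c`), so that the example can run without an assembler.

```rust
use std::mem::MaybeUninit;

// Argument block shared with the (hypothetical) external routine.
#[repr(C)]
struct FooInlineAsmArgs {
    a: i32,                // inout operand
    b: i32,                // input operand
    c: MaybeUninit<i32>,   // output operand, written by the routine
}

// Stand-in for the assembled `foo_inline_asm` symbol. In the scheme from the
// RFC this would be declared in an `extern "C"` block and defined in an
// external asm file; the body here is an assumption made for illustration.
unsafe extern "C" fn foo_inline_asm(args: *mut FooInlineAsmArgs) {
    unsafe {
        let args = &mut *args;
        args.a = args.a.wrapping_add(args.b);
        args.c = MaybeUninit::new(args.b.wrapping_mul(2));
    }
}

fn foo(a: i32, b: i32) -> (i32, i32) {
    let mut args = FooInlineAsmArgs {
        a,
        b,
        c: MaybeUninit::uninit(),
    };
    unsafe {
        foo_inline_asm(&mut args);
        // Sound only because the routine is known to initialize `c`.
        (args.a, args.c.assume_init())
    }
}
```

The point of `MaybeUninit` here is that the output slot never holds an invalid `i32` from Rust's point of view; it only becomes an `i32` once the callee has written it.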

Where did you read that?

Amanieu

comment created time in a month

Pull request review commentrust-lang/rfcs

Inline assembly

+- Feature Name: `asm`+- Start Date: (fill me in with today's date, YYYY-MM-DD)+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)++# Summary+[summary]: #summary++This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.++The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.++The transition from the existing `asm!` macro is described in RFC [2843][rfc-llvm-asm]. The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.++[rfc-llvm-asm]: https://github.com/rust-lang/rfcs/pull/2843++# Motivation+[motivation]: #motivation++In systems programming some tasks require dropping down to the assembly level. The primary reasons are for performance, precise timing, and low level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.++The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. 
For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.++A collection of use cases for inline asm can be found in [this repository][catalogue].++[catalogue]: https://github.com/bjorn3/inline_asm_catalogue/++# Guide-level explanation+[guide-level-explanation]: #guide-level-explanation++Rust provides support for inline assembly via the `asm!` macro.+It can be used to embed handwritten assembly in the assembly output generated by the compiler.+Generally this should not be necessary, but might be where the required performance or timing+cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.++> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.++## Basic usage++Let us start with the simplest possible example:++```rust+unsafe {+    asm!("nop");+}+```++This will insert a NOP (no operation) instruction into the assembly generated by the compiler.+Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert+arbitrary instructions and break various invariants. The instructions to be inserted are listed+in the first argument of the `asm!` macro as a string literal.++## Inputs and outputs++Now inserting an instruction that does nothing is rather boring. Let us do something that+actually acts on data:++```rust+let x: u32;+unsafe {+    asm!("mov {}, 5", out(reg) x);+}+assert_eq!(x, 5);+```++This will write the value `5` into the `u32` variable `x`.+You can see that the string literal we use to specify instructions is actually a template string.+It is governed by the same rules as Rust [format strings][format-syntax].+The arguments that are inserted into the template however look a bit different then you may+be familiar with. First we need to specify if the variable is an input or an output of the+inline assembly. In this case it is an output. 
We declared this by writing `out`.+We also need to specify in what kind of register the assembly expects the variable.+In this case we put it in an arbitrary general purpose register by specifying `reg`.+The compiler will choose an appropriate register to insert into+the template and will read the variable from there after the inline assembly finishes executing.++Let see another example that also uses an input:++```rust+let i: u32 = 3;+let o: u32;+unsafe {+    asm!("+        mov {0}, {1}+        add {0}, {number}+    ", out(reg) o, in(reg) i, number = imm 5);+}+assert_eq!(o, 8);+```++This will add `5` to the input in variable `i` and write the result to variable `o`.+The particular way this assembly does this is first copying the value from `i` to the output,+and then adding `5` to it.++The example shows a few things:++First we can see that inputs are declared by writing `in` instead of `out`.++Second one of our operands has a type we haven't seen yet, `imm`.+This tells the compiler to expand this argument to an immediate inside the assembly template.+This is only possible for constants and literals.++Third we can see that we can specify an argument number, or name as in any format string.+For inline assembly templates this is particularly useful as arguments are often used more than once.+For more complex inline assembly using this facility is generally recommended, as it improves+readability, and allows reordering instructions without changing the argument order.++We can further refine the above example to avoid the `mov` instruction:++```rust+let mut x: u32 = 3;+unsafe {+    asm!("add {0}, {number}", inout(reg) x, number = imm 5);+}+assert_eq!(x, 8);+```++We can see that `inout` is used to specify an argument that is both input and output.+This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.++It is also possible to specify different variables for the input and output parts of an `inout` 
operand:++```rust+let x: u32 = 3;+let y: u32;+unsafe {+    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);+}+assert_eq!(y, 8);+```++## Late output operands++The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`+can be written at any time, and can therefore not share its location with any other argument.+However, to guarantee optimal performance it is important to use as few registers as possible,+so they won't have to be saved and reloaded around the inline assembly block.+To achieve this Rust provides a `lateout` specifier. This can be used on any output that is+written only after all inputs have been consumed.+There is also a `inlateout` variant of this specifier.++Here is an example where `inlateout` *cannot* be used:++```rust+let mut a = 4;+let b = 4;+let c = 4;+unsafe {+    asm!("+        add {0}, {1}+        add {0}, {2}+    ", inout(reg) a, in(reg) b, in(reg) c);+}+assert_eq!(a, 12);+```++Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`.++However the following example can use `inlateout` since the output is only modified after all input registers have been read:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);+}+assert_eq!(a, 8);+```++As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.++## Explicit register operands++Some instructions require that the operands be in a specific register.+Therefore, Rust inline assembly provides some more specific constraint specifiers.+While `reg` is generally available on any architecture, these are highly architecture specific. E.g. 
for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi`+among others can be addressed by their name.++```rust+unsafe {+    asm!("out 0x64, {}", in("eax") cmd);+}+```++In this example we call the `out` instruction to output the content of the `cmd` variable+to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub-registers) as an operand+we had to use the `eax` constraint specifier.++It is somewhat common that instructions have operands that are not explicitly listed in the+assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:++```rust+fn mul(a: u32, b: u32) -> u64 {+    let lo: u32;+    let hi: u32;++    unsafe {+        asm!(+            // The x86 mul instruction takes eax as an implicit input and writes+            // the 64-bit result of the multiplication to edx:eax.+            "mul {}",+            in(reg) a, in("eax") b,+            lateout("eax") lo, lateout("edx") hi+        );+    }++    ((hi as u64) << 32) + lo as u64+}+```++This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.+The only explicit operand is a register that we fill from the variable `a`.+The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.+The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.+The higher 32 bits are stored in `edx` from which we fill the variable `hi`.++Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.++## Clobbered registers++In many cases inline assembly will modify state that is not needed as an output.+Usually this is either because we have to use a scratch register in the assembly,+or instructions modify state that we don't need to further examine.+This state is generally referred to as being "clobbered".+We need to tell the compiler about this since it may need to save and restore this state+around
the inline assembly block.++```rust+let ebx: u32;+let ecx: u32;++unsafe {+    asm!(+        "cpuid",+        in("eax") 4, in("ecx") 0,+        lateout("ebx") ebx, lateout("ecx") ecx,+        lateout("eax") _, lateout("edx") _+    );+}++println!(+    "L1 Cache: {}",+    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)+);+```++In the example above we use the `cpuid` instruction to get the L1 cache size.+This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache size we only care about the contents of `ebx` and `ecx`.++However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.
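
The scratch-register case mentioned in the clobber section can be sketched as follows. This is an illustration, not part of the RFC text: it uses the `asm!` syntax as it was later stabilized in `std::arch` (where the draft's `flags(...)` is spelled `options(...)`), the function name `times_six` is made up for the example, and a plain-Rust fallback is included so the snippet still builds on targets other than x86-64.

```rust
/// Multiplies `x` by 6 using shifts and an add (wrapping on overflow).
#[cfg(target_arch = "x86_64")]
fn times_six(x: u64) -> u64 {
    use std::arch::asm;

    let mut value = x;
    unsafe {
        asm!(
            "mov {tmp}, {v}",   // tmp = x
            "shl {tmp}, 1",     // tmp = 2x
            "shl {v}, 2",       // v   = 4x
            "add {v}, {tmp}",   // v   = 6x
            v = inout(reg) value,
            // Scratch register: allocated by the compiler, value discarded.
            tmp = out(reg) _,
            options(pure, nomem, nostack),
        );
    }
    value
}

/// Fallback for other architectures so the example stays portable.
#[cfg(not(target_arch = "x86_64"))]
fn times_six(x: u64) -> u64 {
    x.wrapping_mul(6)
}
```

Declaring the scratch register as `out(reg) _` rather than hard-coding a register name leaves the choice to the register allocator, which is usually preferable when the instruction does not demand a specific register.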

Is out-of-bounds access within an asm block even considered undefined behavior?

No, this is not undefined behavior, e.g.,

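// Assumes `use std::alloc::{alloc, Layout};` and `use std::mem::MaybeUninit;`.
// `simd_16byte_load` / `simd_16byte_write` are placeholder mnemonics, not real instructions.
let ptr = alloc(Layout::from_size_align(8, 16).unwrap()) as *mut f32;
assert!(ptr as usize % 16 == 0); // aligned to a 16 byte boundary
let mut arr = [MaybeUninit::<f32>::uninit(); 4];
asm!("
        simd_16byte_load xmm0, [{0}]
        simd_16byte_write [{1}], xmm0
    ",
    in(reg) ptr, in(reg) arr.as_mut_ptr(), out("xmm0") _);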

This allocates 8 bytes on the heap at a 16-byte-aligned boundary, then uses a SIMD load to read 16 bytes from the 8-byte allocation. The last 8 bytes are read out of bounds within the assembly block, and that's ok.

What's UB is an asm! block that has a side-effect but does not state it, e.g., an asm! block that modifies a register, but does not mark it as being an output or a clobber, since the compiler will generate code under the assumption that this register is not modified, yet that assumption would be incorrect.
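
To make the clobber point concrete, here is a minimal sketch of the well-behaved case. It is written against the stabilized `std::arch::asm!` syntax (which spells the RFC draft's `flags(...)` as `options(...)`) and assumes an x86-64 target; `add_via_rcx` is a hypothetical name, and the hard-coded use of `rcx` is only there to create a register the template modifies on its own.

```rust
use std::arch::asm;

/// Adds two integers, deliberately using `rcx` as a hard-coded scratch register.
/// (`add_via_rcx` is a hypothetical name used only for this sketch.)
#[cfg(target_arch = "x86_64")]
fn add_via_rcx(a: u64, b: u64) -> u64 {
    let sum: u64;
    unsafe {
        asm!(
            "mov rcx, {a}",
            "add rcx, {b}",
            "mov {sum}, rcx",
            a = in(reg) a,
            b = in(reg) b,
            // Written only after both inputs have been read, so `lateout` is fine.
            sum = lateout(reg) sum,
            // The template overwrites `rcx`, so it is declared as a discarded
            // output (a clobber). Leaving this line out is the UB case
            // described above: the compiler would assume `rcx` is untouched.
            out("rcx") _,
            options(pure, nomem, nostack),
        );
    }
    sum
}

/// Fallback so the sketch still compiles on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn add_via_rcx(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

Because `rcx` is declared as an output, the register allocator will not place `a`, `b`, or `sum` in it, and any live value the surrounding code kept in `rcx` gets saved beforehand.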

Amanieu

comment created time in a month

Pull request review commentrust-lang/rfcs

Inline assembly

+- Feature Name: `asm`+- Start Date: (fill me in with today's date, YYYY-MM-DD)+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)++# Summary+[summary]: #summary++This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.++The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.++The transition from the existing `asm!` macro is described in RFC [2843][rfc-llvm-asm]. The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.++[rfc-llvm-asm]: https://github.com/rust-lang/rfcs/pull/2843++# Motivation+[motivation]: #motivation++In systems programming some tasks require dropping down to the assembly level. The primary reasons are for performance, precise timing, and low level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.++The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. 
For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.++A collection of use cases for inline asm can be found in [this repository][catalogue].++[catalogue]: https://github.com/bjorn3/inline_asm_catalogue/++# Guide-level explanation+[guide-level-explanation]: #guide-level-explanation++Rust provides support for inline assembly via the `asm!` macro.+It can be used to embed handwritten assembly in the assembly output generated by the compiler.+Generally this should not be necessary, but might be where the required performance or timing+cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.++> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.++## Basic usage++Let us start with the simplest possible example:++```rust+unsafe {+    asm!("nop");+}+```++This will insert a NOP (no operation) instruction into the assembly generated by the compiler.+Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert+arbitrary instructions and break various invariants. The instructions to be inserted are listed+in the first argument of the `asm!` macro as a string literal.++## Inputs and outputs++Now inserting an instruction that does nothing is rather boring. Let us do something that+actually acts on data:++```rust+let x: u32;+unsafe {+    asm!("mov {}, 5", out(reg) x);+}+assert_eq!(x, 5);+```++This will write the value `5` into the `u32` variable `x`.+You can see that the string literal we use to specify instructions is actually a template string.+It is governed by the same rules as Rust [format strings][format-syntax].+The arguments that are inserted into the template however look a bit different than you may+be familiar with. First we need to specify if the variable is an input or an output of the+inline assembly. In this case it is an output. 
We declared this by writing `out`.+We also need to specify in what kind of register the assembly expects the variable.+In this case we put it in an arbitrary general purpose register by specifying `reg`.+The compiler will choose an appropriate register to insert into+the template and will read the variable from there after the inline assembly finishes executing.++Let us see another example that also uses an input:++```rust+let i: u32 = 3;+let o: u32;+unsafe {+    asm!("+        mov {0}, {1}+        add {0}, {number}+    ", out(reg) o, in(reg) i, number = imm 5);+}+assert_eq!(o, 8);+```++This will add `5` to the input in variable `i` and write the result to variable `o`.+The particular way this assembly does this is first copying the value from `i` to the output,+and then adding `5` to it.++The example shows a few things:++First, we can see that inputs are declared by writing `in` instead of `out`.++Second, one of our operands has a type we haven't seen yet, `imm`.+This tells the compiler to expand this argument to an immediate inside the assembly template.+This is only possible for constants and literals.++Third, we can see that we can specify an argument number, or name as in any format string.+For inline assembly templates this is particularly useful as arguments are often used more than once.+For more complex inline assembly using this facility is generally recommended, as it improves+readability, and allows reordering instructions without changing the argument order.++We can further refine the above example to avoid the `mov` instruction:++```rust+let mut x: u32 = 3;+unsafe {+    asm!("add {0}, {number}", inout(reg) x, number = imm 5);+}+assert_eq!(x, 8);+```++We can see that `inout` is used to specify an argument that is both input and output.+This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.++It is also possible to specify different variables for the input and output parts of an `inout` 
operand:++```rust+let x: u32 = 3;+let y: u32;+unsafe {+    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);+}+assert_eq!(y, 8);+```++## Late output operands++The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`+can be written at any time, and can therefore not share its location with any other argument.+However, to guarantee optimal performance it is important to use as few registers as possible,+so they won't have to be saved and reloaded around the inline assembly block.+To achieve this Rust provides a `lateout` specifier. This can be used on any output that is+written only after all inputs have been consumed.+There is also a `inlateout` variant of this specifier.++Here is an example where `inlateout` *cannot* be used:++```rust+let mut a = 4;+let b = 4;+let c = 4;+unsafe {+    asm!("+        add {0}, {1}+        add {0}, {2}+    ", inout(reg) a, in(reg) b, in(reg) c);+}+assert_eq!(a, 12);+```++Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`.++However the following example can use `inlateout` since the output is only modified after all input registers have been read:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);+}+assert_eq!(a, 8);+```++As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.++## Explicit register operands++Some instructions require that the operands be in a specific register.+Therefore, Rust inline assembly provides some more specific constraint specifiers.+While `reg` is generally available on any architecture, these are highly architecture specific. E.g. 
for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi`+among others can be addressed by their name.++```rust+unsafe {+    asm!("out 0x64, {}", in("eax") cmd);+}+```++In this example we call the `out` instruction to output the content of the `cmd` variable+to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as operand+we had to use the `eax` constraint specifier.++It is somewhat common that instructions have operands that are not explicitly listed in the+assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:++```rust+fn mul(a: u32, b: u32) -> u64 {+    let lo: u32;+    let hi: u32;++    unsafe {+        asm!(+            // The x86 mul instruction takes eax as an implicit input and writes+            // the 64-bit result of the multiplication to eax:edx.+            "mul {}",+            in(reg) a, in("eax") b,+            lateout("eax") lo, lateout("edx") hi+        );+    }++    // Parenthesized: `+` binds tighter than `<<` in Rust.+    ((hi as u64) << 32) + lo as u64+}+```++This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.+The only explicit operand is a register, that we fill from the variable `a`.+The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.+The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.+The higher 32 bits are stored in `edx` from which we fill the variable `hi`.++Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.++## Clobbered registers++In many cases inline assembly will modify state that is not needed as an output.+Usually this is either because we have to use a scratch register in the assembly,+or instructions modify state that we don't need to further examine.+This state is generally referred to as being "clobbered".+We need to tell the compiler about this since it may need to save and restore this state+around 
the inline assembly block.++```rust+let ebx: u32;+let ecx: u32;++unsafe {+    asm!(+        "cpuid",+        in("eax") 4, in("ecx") 0,+        lateout("ebx") ebx, lateout("ecx") ecx,+        lateout("eax") _, lateout("edx") _+    );+}++println!(+    "L1 Cache: {}",+    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)+);+```++In the example above we use the `cpuid` instruction to get the L1 cache size.+This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache size we only care about the contents of `ebx` and `ecx`.++However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.++This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code:++```rust+// Multiply x by 6 using shifts and adds+let mut x = 4;+unsafe {+    asm!("+        mov {tmp}, {x}+        shl {tmp}, 1+        shl {x}, 2+        add {x}, {tmp}+    ", x = inout(reg) x, tmp = out(reg) _);+}+assert_eq!(x, 4 * 6);+```++## Register template modifiers++In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. 
the low 32 bits of a 64-bit register).++```rust+let mut x: u16 = 0xab;++unsafe {+    asm!("mov {0:h}, {0:b}", inout(reg_abcd) x);+}++assert_eq!(x, 0xabab);+```++In this example, we use the `reg_abcd` register class to restrict the register allocator to the 4 legacy x86 registers (`ax`, `bx`, `cx`, `dx`) of which the first two bytes can be addressed independently.++Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.+The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.++## Flags++By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However in many cases, it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.++Let's take our previous example of an `add` instruction:++```rust+let mut a = 4;+let b = 4;+unsafe {+    asm!(+        "add {0}, {1}",+        inlateout(reg) a, in(reg) b,+        flags(pure, nomem, nostack)+    );+}+assert_eq!(a, 8);+```++Flags can be provided as an optional final argument to the `asm!` macro. We specified three flags here:+- `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.+- `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).+- `nostack` means that the asm code does not push any data onto the stack. 
This allows the compiler to use optimizations such as the stack red zone on x86_64 to avoid stack pointer adjustments.++These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.++See the reference for the full list of available flags and their effects.++# Reference-level explanation+[reference-level-explanation]: #reference-level-explanation++Inline assembler is implemented as an unsafe macro `asm!()`.+The first argument to this macro is a template string literal used to build the final assembly.+The following arguments specify input and output operands.+When required, flags are specified as the final argument.++The following ABNF specifies the general syntax:++```+dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"+reg_spec := <arch specific register class> / "<arch specific register name>"+operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"+reg_operand := dir_spec "(" reg_spec ")" operand_expr+operand := reg_operand / "imm" const_expr / "sym" path+flag := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn"+flags := "flags(" flag *["," flag] ")"+asm := "asm!(" format_string *("," [ident "="] operand) ["," flags] ")"+```++[format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax++## Template string++The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.++The assembly code syntax used is that of the GNU assembler (GAS). The only exception is on x86 where the Intel syntax is used instead of GCC's AT&T syntax.++This RFC only specifies how operands are substituted into the template string. 
Actual interpretation of the final asm string is left to the assembler.++However there is one restriction on the asm string: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.++The compiler will lint against any operands that are not used in the template string, except for operands that specify an explicit register.++[rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795++## Operand type++Several types of operands are supported:++* `in(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - The allocated register must contain the same value at the end of the asm code (except if a `lateout` is allocated to the same register).+* `out(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.+  - The allocated register will contain an unknown value at the start of the asm code.+  - `<expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+* `lateout(<reg>) <expr>`+  - Identical to `out` except that the register allocator can reuse a register allocated to an `in`.+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.+  - `lateout` must be used instead of `out` if you are specifying the same explicit register as an `in`.+* `inout(<reg>) <expr>`+  - `<reg>` can refer to a register class or an explicit register. 
The allocated register name is substituted into the asm template string.+  - The allocated register will contain the value of `<expr>` at the start of the asm code.+  - `<expr>` must be an initialized place expression, to which the contents of the allocated register is written to at the end of the asm code.+* `inout(<reg>) <in expr> => <out expr>`+  - Same as `inout` except that the initial value of the register is taken from the value of `<in expr>`.+  - `<out expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.+  - An underscore (`_`) may be specified instead of an expression for `<out expr>`, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).+  - `<in expr>` and `<out expr>` may have different types.+* `inlateout(<reg>) <expr>` / `inlateout(<reg>) <in expr> => <out expr>`+  - Identical to `inout` except that the register allocator can reuse a register allocated to an `in` (this can happen if the compiler knows the `in` has the same initial value as the `inlateout`).+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.+* `imm <expr>`+  - `<expr>` must be an integer or floating-point constant expression.+  - The value of the expression is formatted as a string and substituted directly into the asm template string.+* `sym <path>`+  - `<path>` must refer to a `fn` or `static` defined in the current crate.+  - A mangled symbol name referring to the item is substituted into the asm template string.+  - The substituted string does not include any modifiers (e.g. GOT, PLT, relocations, etc).++## Register operands++Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. 
`"eax"`) while register classes are specified as identifiers (e.g. `reg`).++Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register two input operand or two output operands. Additionally on ARM, it is a compile-time error to use overlapping VFP registers in input operands or in output operands.++Different registers classes have different constraints on which Rust types they allow. For example, `reg` generally only allows integers and pointers, but not floats or SIMD vectors.++If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. It is a compile-time error for a value to be of a larger size than the register it is allocated in.++Here is the list of currently supported register classes:++| Architecture | Register class | Registers | LLVM constraint code | Allowed types |+| ------------ | -------------- | --------- | ----- | ------------- |+| x86 | `reg` | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `reg_abcd` | `ax`, `bx`, `cx`, `dx` | `Q` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |+| x86 | `vreg` | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |+| x86 | `vreg_evex` | `xmm[0-31]` (AVX-512, otherwise same as `vreg`) | `v` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |+| x86 (AVX-512) | `kreg` | `k[1-7]` | `Yk` | `i16`, `i32`, `i64` |+| AArch64 | `reg` | `x[0-28]`, `x30` | `r` | `i8`, `i16`, `i32`, `i64` |+| AArch64 | `vreg` | `v[0-31]` | `w` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| AArch64 | `vreg_low` | `v[0-15]` | `x` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| AArch64 | `vreg_low8` | `v[0-7]` | `y` | 
`i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |+| ARM | `reg` | `r[0-10]`, `r12`, `r14` | `r` | `i8`, `i16`, `i32` |+| ARM | `vreg` | `s[0-31]`, `d[0-31]`, `q[0-15]` | `w` | `f32`, `f64`, `v64`, `v128` |+| ARM | `vreg_low` | `s[0-31]`, `d[0-15]`, `q[0-7]` | `t` | `f32`, `f64`, `v64`, `v128` |+| ARM | `vreg_low8` | `s[0-15]`, `d[0-7]`, `q[0-3]` | `x` | `f32`, `f64`, `v64`, `v128` |+| RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-31]` | `r` | `i8`, `i16`, `i32`, `i64` (RV64 only) |+| RISC-V | `vreg` | `f[0-31]` | `f` | `f32`, `f64` |++> Notes on allowed types:+> - Pointers and references are allowed where the equivalent integer type is allowed.+> - `iLEN` refers to both signed and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches.+> - Fat pointers are not allowed.+> - `vLEN` refers to a SIMD vector that is `LEN` bits wide.++Additional constraint specifications may be added in the future based on demand for additional register classes (e.g. MMX, x87, etc).++Some registers have multiple names. These are all treated by the compiler as identical to the base register name. 
Here is the list of all supported register aliases:++| Architecture | Base register | Aliases |+| ------------ | ------------- | ------- |+| x86 | `ax` | `al`, `eax`, `rax` |+| x86 | `bx` | `bl`, `ebx`, `rbx` |+| x86 | `cx` | `cl`, `ecx`, `rcx` |+| x86 | `dx` | `dl`, `edx`, `rdx` |+| x86 | `si` | `sil`, `esi`, `rsi` |+| x86 | `di` | `dil`, `edi`, `rdi` |+| x86 | `bp` | `bpl`, `ebp`, `rbp` |+| x86 | `sp` | `spl`, `esp`, `rsp` |+| x86 | `ip` | `eip`, `rip` |+| x86 | `st(0)` | `st` |+| x86 | `r[8-15]` | `r[8-15]b`, `r[8-15]w`, `r[8-15]d` |+| x86 | `xmm[0-31]` | `ymm[0-31]`, `zmm[0-31]` |+| AArch64 | `x[0-30]` | `w[0-30]` |+| AArch64 | `x29` | `fp` |+| AArch64 | `x30` | `lr` |+| AArch64 | `sp` | `wsp` |+| AArch64 | `xzr` | `wzr` |+| AArch64 | `v[0-31]` | `b[0-31]`, `h[0-31]`, `s[0-31]`, `d[0-31]`, `q[0-31]` |+| ARM | `r[0-3]` | `a[1-4]` |+| ARM | `r[4-9]` | `v[1-6]` |+| ARM | `r9` | `rfp` |+| ARM | `r10` | `sl` |+| ARM | `r11` | `fp` |+| ARM | `r12` | `ip` |+| ARM | `r13` | `sp` |+| ARM | `r14` | `lr` |+| ARM | `r15` | `pc` |+| RISC-V | `x0` | `zero` |+| RISC-V | `x1` | `ra` |+| RISC-V | `x2` | `sp` |+| RISC-V | `x3` | `gp` |+| RISC-V | `x4` | `tp` |+| RISC-V | `x[5-7]` | `t[0-2]` |+| RISC-V | `x8` | `fp`, `s0` |+| RISC-V | `x9` | `s1` |+| RISC-V | `x[10-17]` | `a[0-7]` |+| RISC-V | `x[18-27]` | `s[2-11]` |+| RISC-V | `x[28-31]` | `t[3-6]` |+| RISC-V | `f[0-7]` | `ft[0-7]` |+| RISC-V | `f[8-9]` | `fs[0-1]` |+| RISC-V | `f[10-17]` | `fa[0-7]` |+| RISC-V | `f[18-27]` | `fs[2-11]` |+| RISC-V | `f[28-31]` | `ft[8-11]` |++Some registers cannot be used for input or output operands:++| Architecture | Unsupported register | Reason |+| ------------ | -------------------- | ------ |+| All | `sp` | The stack pointer must be restored to its original value at the end of an asm code block. |+| All | `bp` (x86), `r11` (ARM), `x29` (AArch64), `x8` (RISC-V) | The frame pointer cannot be used as an input or output. 
|+| x86 | `ah`, `bh`, `ch`, `dh` | These are poorly supported by compiler backends. Use 16-bit register views (e.g. `ax`) instead. |+| x86 | `k0` | This is a constant zero register which can't be modified. |+| x86 | `ip` | This is the program counter, not a real register. |+| x86 | `mm[0-7]` | MMX registers are not currently supported (but may be in the future). |+| x86 | `st([0-7])` | x87 registers are not currently supported (but may be in the future). |+| AArch64 | `xzr` | This is a constant zero register which can't be modified. |+| ARM | `pc` | This is the program counter, not a real register. |+| RISC-V | `x0` | This is a constant zero register which can't be modified. |+| RISC-V | `gp`, `tp` | These registers are reserved and cannot be used as inputs or outputs. |++## Template modifiers++The placeholders can be augmented by modifiers which are specified after the `:` in the curly braces. These modifiers do not affect register allocation, but change the way operands are formatted when inserted into the template string. 
Only one modifier is allowed per template placeholder.++The supported modifiers are a subset of LLVM's (and GCC's) [asm template argument modifiers][llvm-argmod].++| Architecture | Register class | Modifier | Input type | Example output |+| ------------ | -------------- | -------- | ---------- | -------------- |+| x86 | `reg` | None | `i8` | `al` |+| x86 | `reg` | None | `i16` | `ax` |+| x86 | `reg` | None | `i32` | `eax` |+| x86 | `reg` | None | `i64` | `rax` |+| x86-32 | `reg_abcd` | `b` | Any | `al` |+| x86-64 | `reg` | `b` | Any | `al` |+| x86 | `reg_abcd` | `h` | Any | `ah` |+| x86 | `reg` | `w` | Any | `ax` |+| x86 | `reg` | `k` | Any | `eax` |+| x86-64 | `reg` | `q` | Any | `rax` |+| x86 | `vreg` | None | `i32`, `i64`, `f32`, `f64`, `v128` | `xmm0` |+| x86 (AVX) | `vreg` | None | `v256` | `ymm0` |+| x86 (AVX-512) | `vreg` | None | `v512` | `zmm0` |+| x86 (AVX-512) | `kreg` | None | Any | `k1` |+| AArch64 | `reg` | None | Any | `x0` |+| AArch64 | `reg` | `w` | Any | `w0` |+| AArch64 | `reg` | `x` | Any | `x0` |+| AArch64 | `vreg` | None | Any | `v0` |+| AArch64 | `vreg` | `b` | Any | `b0` |+| AArch64 | `vreg` | `h` | Any | `h0` |+| AArch64 | `vreg` | `s` | Any | `s0` |+| AArch64 | `vreg` | `d` | Any | `d0` |+| AArch64 | `vreg` | `q` | Any | `q0` |+| ARM | `reg` | None | Any | `r0` |+| ARM | `vreg` | None | `f32` | `s0` |+| ARM | `vreg` | None | `f64`, `v64` | `d0` |+| ARM | `vreg` | None | `v128` | `q0` |+| ARM | `vreg` | `e` / `f` | `v128` | `d0` / `d1` |+| RISC-V | `reg` | None | Any | `x1` |+| RISC-V | `vreg` | None | Any | `f0` |++> Notes:+> - on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register.+> - on AArch64 `reg`: a warning is emitted if the input type is smaller than 64 bits, suggesting to use the `w` modifier. 
The warning can be suppressed by explicitly using the `x` modifier.++[llvm-argmod]: http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers++## Flags++Flags are used to further influence the behavior of the inline assembly block.+Currently the following flags are defined:+- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used. A warning is emitted if this flag is used on an `asm` with no outputs.+- `nomem`: The `asm` block does not read or write to any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.+- `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.+- `preserves_flags`: The `asm` block does not modify the flags register (defined below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.+- `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code.+- `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this flag is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.++The `nomem` and `readonly` flags are mutually exclusive: it is a compile-time error to specify both. 
Specifying `pure` on an asm block with no outputs is linted against since such a block will be optimized away to nothing.++These flag registers which must be preserved if `preserves_flags` is set:+- x86+  - Status flags in `EFLAGS` (CF, PF, AF, ZF, SF, OF).+  - Direction flag in `EFLAGS` (DF).+  - Floating-point status word (all).+  - Floating-point exception flags in `MXCSR` (PE, UE, OE, ZE, DE, IE).+- ARM+  - Condition flags in `CPSR` (N, Z, C, V)+  - Saturation flag in `CPSR` (Q)+  - Greater than or equal flags in `CPSR` (GE).+  - Condition flags in `FPSCR` (N, Z, C, V)+  - Saturation flag in `FPSCR` (QC)+  - Floating-point exception flags in `FPSCR` (IDC, IXC, UFC, OFC, DZC, IOC).+- AArch64+  - Condition flags (`NZCV` register).+  - Floating-point status (`FPSR` register).+- RISC-V+  - Floating-point exception flags in `fcsr` (`fflags`).++> Note: As a general rule, these are the flags which are *not* preserved when performing a function call.++## Mapping to LLVM IR++The direction specification maps to a LLVM constraint specification as follows (using a `reg` operand as an example):++* `in(reg)` => `r`+* `out(reg)` => `=&r` (Rust's outputs are early-clobber outputs in LLVM/GCC terminology)+* `inout(reg)` => `=&r,0` (an early-clobber output with an input tied to it, `0` here is a placeholder for the position of the output)+* `lateout(reg)` => `=r` (Rust's late outputs are regular outputs in LLVM/GCC terminology)+* `inlateout(reg)` => `=r, 0` (cf. `inout` and `lateout`)++If an `inout` is used where the output type is smaller than the input type then some special handling is needed to avoid LLVM issues. See [this bug][issue-65452].++As written this RFC requires architectures to map from Rust constraint specifications to LLVM constraint codes. 
This is in part for better readability on Rust's side and in part for independence of the backend:++* Register classes are mapped to the appropriate constraint code as per the table above.+* `imm` operands are formatted and injected directly into the asm string.+* `sym` is mapped to `s` for statics and `X` for functions.+* a register name `r1` is mapped to `{r1}`+* additionally mappings for register classes are added as appropriate (cf. [llvm-constraint])+* `lateout` operands with an `_` expression that are specified as an explicit register are converted to LLVM clobber constraints. For example, `lateout("r1") _` is mapped to `~{r1}` (cf. [llvm-clobber]).+* If the `nomem` flag is not set then `~{memory}` is added to the clobber list. (Although this is currently ignored by LLVM)+* If the `preserves_flags` flag is not set then the following are added to the clobber list:+  - (x86) `~{dirflag}~{flags}~{fpsr}`+  - (ARM/AArch64) `~{cc}`++For some operand types, we will automatically insert some modifiers into the template string.+* For `sym` and `imm` operands, we automatically insert the `c` modifier which removes target-specific modifiers from the value (e.g. `#` on ARM).+* On AArch64, we will warn if a value smaller than 64 bits is used without a modifier since this is likely a bug (it will produce `x*` instead of `w*`). Clang has this same warning.+* On ARM, we will automatically add the `P` or `q` LLVM modifier for `f64`, `v64` and `v128` passed into a `vreg`. 
This will cause those registers to be formatted as `d*` and `q*` respectively.++Additionally, the following attributes are added to the LLVM `asm` statement:++* The `nounwind` attribute is always added: unwinding from an inline asm block is not allowed (and not supported by LLVM anyways).+* If the `nomem` flag is set then the `readnone` attribute is added to the LLVM `asm` statement.+* If the `readonly` flag is set then the `readonly` attribute is added to the LLVM `asm` statement.+* If the `pure` flag is not set then the `sideffect` flag is added the LLVM `asm` statement.+* If the `nostack` flag is not set then the `alignstack` flag is added the LLVM `asm` statement.+* On x86 the `inteldialect` flag is added the LLVM `asm` statement so that the Intel syntax is used instead of the AT&T syntax.++If the `noreturn` flag is set then an `unreachable` LLVM instruction is inserted after the asm invocation.++> Note that `alignstack` is not currently supported by GCC, so we will need to implement support in GCC if Rust ever gets a GCC back-end.++[llvm-constraint]: http://llvm.org/docs/LangRef.html#supported-constraint-code-list+[llvm-clobber]: http://llvm.org/docs/LangRef.html#clobber-constraints+[issue-65452]: https://github.com/rust-lang/rust/issues/65452++# Drawbacks+[drawbacks]: #drawbacks++## Unfamiliarity++This RFC proposes a completely new inline assembly format.+It is not possible to just copy examples of GCC-style inline assembly and re-use them.+There is however a fairly trivial mapping between the GCC-style and this format that could be documented to alleviate this.++Additionally, this RFC proposes using the Intel asm syntax on x86 instead of the AT&T syntax. 
We believe this syntax will be more familiar to most users, but may be surprising for users used to GCC-style asm.++The `cpuid` example above would look like this in GCC-sytle inline assembly:++```C+// GCC doesn't allow directly clobbering an input, we need+// to use a dummy output instead.+int ebx, ecx, discard;+asm (+    "cpuid"+    : "=a"(discard), "=b"(ebx), "=c"(ecx) // outputs+    : "a"(4), "c"(0) // inputs+    : "edx" // clobbers+);+printf("L1 Cache: %i\n", ((ebx >> 22) + 1)+    * (((ebx >> 12) & 0x3ff) + 1)+    * ((ebx & 0xfff) + 1)+    * (ecx + 1));+```++## Limited set of operand types++The proposed set of operand types is much smaller than that which is available through GCC-style inline assembly. In particular, the proposed syntax does not include any form of memory operands and is missing many register classes.++We chose to keep operand constraints as simple as possible, and in particular memory operands introduce a lot of complexity since different instruction support different addressing modes. At the same time, the exact rules for memory operands are not very well known (you are only allowed to access the data directly pointed to by the constraint) and are often gotten wrong.++If we discover that there is a demand for a new register class or special operand type, we can always add it later.++## Difficulty of support++Inline assembly is a difficult feature to implement in a compiler backend. While LLVM does support it, this may not be the case for alternative backends such as [Cranelift][cranelift] (see [this issue][cranelift-asm]).++However it is possible to implement support for inline assembly without support from the compiler backend by using an external assembler instead. 
Take the following (AArch64) asm block as an example:++```rust+unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)+{+    let c;+    asm!("<some asm code>", inout(reg) a, in("x0") b, out("x20") c);+    (a, c)+}+```++This could be expanded to an external asm file with the following contents:++```+# Function prefix directives+.section ".text.foo_inline_asm"+.globl foo_inline_asm+.p2align 2+.type foo_inline_asm, @function+foo_inline_asm:++// If necessary, save callee-saved registers to the stack here.+str x20, [sp, #-16]!++// Move the pointer to the argument out of the way since x0 is used.+mov x1, x0++// Load inputs values+ldr w2, [x1, #0]+ldr w0, [x1, #4]++<some asm code>++// Store output values+str w2, [x1, #0]+str w20, [x1, #8]++// If necessary, restore callee-saved registers here.+ldr x20, [sp], #16++ret++# Function suffix directives+.size foo_inline_asm, . - foo_inline_asm+```++And the following Rust code:++```rust+unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)+{+    let c;+    {+        #[repr(C)]+        struct foo_inline_asm_args {+            a: i32,+            b: i32,+            c: i32,+        }+        extern "C" {+            fn foo_inline_asm(args: *mut foo_inline_asm_args);+        }+        let mut args = foo_inline_asm_args {+            a: a,+            b: b,+            c: mem::uninitialized(),+        };+        foo_inline_asm(&mut args);+        a = args.a;+        c = args.c;+    }+    (a, c)+}+```++[cranelift]: https://cranelift.readthedocs.io/+[cranelift-asm]: https://github.com/bytecodealliance/cranelift/issues/444+
If `rustc` does not support inline assembly for a particular target and backend, trying to use the `asm!` macro on that particular backend and target fails to compile.

For example, suppose we actually do this, and add support for using asm! with Cranelift and x86_64. What happens if Cranelift implements aarch64 support, and somebody tries to compile a program for that target using that backend?

Should we forbid trying to do that? That is, should we forbid adding rustc support for new targets until rustc supports inline assembly for them? Or should we allow that as long as the programs being compiled do not use the asm! macro?

I think that blocking target/backend support on having inline assembly worked out imposes too high a barrier to entry for new targets. E.g., this RFC adds x86, ARM, and RISC-V support, but there are people using PowerPC and others on stable, which, according to this RFC, wouldn't initially gain any asm! macro support, yet we can't break that code.
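A minimal sketch of the second option, where a crate keeps building on targets without inline assembly support because every use of the macro is gated. It uses the syntax under which the macro was later stabilized (`std::arch::asm!`); the helper name `pause_hint` is made up for illustration. Note the assumption: gating here is by target only. There is no `cfg` for the codegen backend, so this does not by itself answer the Cranelift question.

```rust
/// Hypothetical helper: executes one `nop` through inline assembly on the
/// one target this sketch supports, and reports which path was taken.
#[cfg(target_arch = "x86_64")]
pub fn pause_hint() -> bool {
    // SAFETY: `nop` reads and writes no memory, does not touch the stack,
    // and leaves the flags unchanged, which is what the options assert.
    unsafe {
        std::arch::asm!("nop", options(nomem, nostack, preserves_flags));
    }
    true
}

/// Fallback for every other target: no `asm!` is compiled at all.
#[cfg(not(target_arch = "x86_64"))]
pub fn pause_hint() -> bool {
    std::hint::spin_loop();
    false
}
```

With this shape, a target that lacks `asm!` support only fails to compile if the program actually reaches for the macro on that target.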

Amanieu

comment created time in a month

issue commentrust-lang/wg-allocators

Decision making in the WG

cc @amanieu - they also participated in some of the discussions.

SimonSapin

comment created time in a month

issue commentrust-lang/rust

Imprecise floating point operations (fast-math)

You mention three issues in that comment:

  • lack of literal support: this is an orthogonal problem that should be solved by "user-defined literals".
  • lack of as support: From, TryFrom, round, and similar APIs are usually (always?) clearer than as,
  • bad performance: sounds like a compiler bug, did you report it? Looking at your code, your ffast_math::fff type is repr(C) but should be repr(transparent) - notice that repr(C) inhibits the Scalar and ScalarPair repr(Rust) optimizations, forcing the ffast_math::fff type to use the Aggregate ABI class.

So from the issues you mention, the one I think has most weight is the lack of "user-defined literals". I don't think one needs to solve this problem to ship a fast-math feature. Today, we would need to write let x = NonNan(1.0);. That isn't bad enough to be a blocker IMO, considering that with a "user-defined literals" feature we might need to write let x = 1.0_NonNan_f64; or similar.
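As a rough illustration of the `repr(transparent)` point, here is a sketch of a single-field wrapper. The name `NonNan` comes from the comment above, but this definition, its `new`/`get` methods, and the `same_layout_as_f64` helper are assumptions made for the example, not an existing API.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical wrapper around `f64` that rejects NaN on construction.
/// `repr(transparent)` guarantees the same layout and ABI as the field.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
pub struct NonNan(f64);

impl NonNan {
    /// Returns `None` for NaN inputs, `Some` otherwise.
    pub fn new(x: f64) -> Option<NonNan> {
        if x.is_nan() {
            None
        } else {
            Some(NonNan(x))
        }
    }

    /// Unwraps the inner value.
    pub fn get(self) -> f64 {
        self.0
    }
}

/// Checks that the wrapper has the size and alignment of a bare `f64`.
pub fn same_layout_as_f64() -> bool {
    size_of::<NonNan>() == size_of::<f64>() && align_of::<NonNan>() == align_of::<f64>()
}
```

Without a literal syntax, values are created with `NonNan::new(1.0)`, which is the small ergonomic cost discussed above.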

mpdn

comment created time in a month

issue commentrust-lang/rust

Imprecise floating point operations (fast-math)

@lxrec a perma-unstable feature is enough to allow those interested in developing a stable-rust solution to prototype one in nightly, since as mentioned, it allows people to implement types like fastf32, type wrappers like NonNan<T>, and even proc macros like #[fast_math(...)].

That would allow those interested in "fast-math" to explore the design space without having to hack on the compiler, and to submit RFCs that can be tried, since library APIs using unstable features can be exposed from libcore to stable Rust users, without having to stabilize the "perma-unstable" features themselves.

Unless we think all libraries interested in fast-math are okay with being perma-unstable libraries.

I have yet to see anybody expressing desire for this goal.

mpdn

comment created time in a month

issue commentrust-lang/wg-allocators

Change the signatures of `AllocRef` to take `self` instead of `&mut self`

This makes sense to me. Also:

A oneshot allocator is trivially implementable

This is cool.

TimDiekmann

comment created time in a month

issue commentrust-lang/libc

Documentation gap surrounding `errno`

This isn't true; it doesn't provide errno.

The first line of the readme says (emphasis mine):

libc - Raw FFI bindings...

One can't bind something that doesn't exist, and on most platforms, errno is not a symbol. The search you have performed ("errno") shows the APIs that each platform provides to access it, e.g., a #[thread_local] errno on some platforms, __errno_location on others, etc.

If you don't want to use "Raw" bindings, then libc is not for you.
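For code that does not need the raw bindings, the standard library already exposes the last OS error in a portable way. A small sketch, assuming the path below does not exist on the machine running it (the function name `open_missing` is made up):

```rust
use std::fs::File;
use std::io::ErrorKind;

/// Tries to open a path that is assumed not to exist and returns the
/// portable error kind together with the raw OS error code, if any.
pub fn open_missing() -> (ErrorKind, Option<i32>) {
    let err = File::open("/this/path/is/assumed/not/to/exist").unwrap_err();
    // `raw_os_error` carries the platform's errno-style code; how that value
    // is stored by the C library (thread-local, function call, ...) is hidden.
    (err.kind(), err.raw_os_error())
}
```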

jimrandomh

comment created time in a month

issue commentrust-lang/libc

Rust Roadmap: libc v1.0 release

Chances are libc 0.3 will be released at some point with major backward incompatible changes, like proper usage of unions, #[repr(align / packed)], etc.

I'm not sure whether libc 1.0 is something that even makes sense anymore, since many platforms supported by libc do perform backward incompatible changes to their APIs on a regular basis, and libc currently needs to adapt to that, which necessarily results in backward incompatible changes on our end.

A step towards that would be to start by removing all kernel APIs and non-stable APIs from the libc crate, and moving them to other crates that can follow a versioning scheme closer to the OS.

coder543

comment created time in a month

issue commentgnzlbg/slice_deque

Leaking tmp files in currect release (fixed in master)

Personally I'd prefer an explicit clone over just a dereference, since it is more obvious (if the * was missing it would have been an issue, but rustc may catch that one). But that is a style choice.

Here I would prefer to just use a type annotation to make things clearer (let mut fname: [u8; _] = *b"...";) since I agree with you that right now one has to think too much to follow that part of the code - or at least, I had to do that when reviewing your PR and I'm the one who wrote that code in the first place 😆

The changes from *mut to *const in #82 are also more explicit and arguably better, but don't affect correctness either.

I agree, we should make this change before the next release.


I think I will make these changes as part of #80 and try to merge that as soon as possible and do a new release.

Cocalus

comment created time in a month

pull request commentgnzlbg/slice_deque

Fix data race on older Linux kernels

Thanks, so I'm going to close this PR, but I still want to investigate the precise problem you are running into, so it would be great if you could just open an issue.

For this, it would be nice if you could try using the master branch, cleaning your tmp directory, and running your workload, and see if after your workload finished, you still have files in tmp. If that's the case, the next step would be to try to come up with a small, self-contained example that runs into this issue. (e.g. a single crate with only a dependency on slice_deque, and a small main function, that just does the bare minimum to trigger the issue).

Cocalus

comment created time in a month

Pull request review commentgnzlbg/slice_deque

Fix data race on older Linux kernels

 pub fn allocate_mirrored(size: usize) -> Result<*mut u8, AllocError> {         assert!(half_size % allocation_granularity() == 0);          // create temporary file-        let mut fname = *b"/tmp/slice_deque_fileXXXXXX\0";-        let mut fd: c_long =-            memfd_create(fname.as_mut_ptr() as *mut c_char, 0);+        let fname = b"/tmp/slice_deque_fileXXXXXX\0";

For example, this code in the playground compiles:

   let mut fname = *b"/tmp/slice_deque_fileXXXXXX\0";
   let mut fname2: [u8; 28] = fname;

So the type of fname in the current version of the code is indeed [u8; 28], which is a local copy on the stack of the string literal.
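A self-contained sketch of the same point: dereferencing the byte-string literal copies it into a local `[u8; 28]`, and writes go to that copy only. The `TEMPLATE` constant and `patched_copy` function are names invented for this example, and the loop merely stands in for what `mkstemp` does to the placeholders.

```rust
/// The literal itself lives in read-only static memory.
const TEMPLATE: &[u8; 28] = b"/tmp/slice_deque_fileXXXXXX\0";

/// Copies the template onto the stack and patches the copy.
pub fn patched_copy() -> [u8; 28] {
    // `*TEMPLATE` copies the 28 bytes into a local array.
    let mut fname: [u8; 28] = *TEMPLATE;
    // Overwrite the six `X` placeholders in the local copy only.
    for b in fname.iter_mut().filter(|b| **b == b'X') {
        *b = b'0';
    }
    fname
}
```

The literal behind `TEMPLATE` is left untouched, which is why mutating `fname` in the current code is fine.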

Cocalus

comment created time in a month

Pull request review commentgnzlbg/slice_deque

Fix data race on older Linux kernels

 pub fn allocate_mirrored(size: usize) -> Result<*mut u8, AllocError> {         assert!(half_size % allocation_granularity() == 0);          // create temporary file-        let mut fname = *b"/tmp/slice_deque_fileXXXXXX\0";-        let mut fd: c_long =-            memfd_create(fname.as_mut_ptr() as *mut c_char, 0);+        let fname = b"/tmp/slice_deque_fileXXXXXX\0";+        let mut fd: c_long = memfd_create(fname.as_ptr() as *const c_char, 0);         if fd == -1 && errno() == ENOSYS {+            // copy fname to stack variable to allow modification+            let mut fname = fname.clone();             // memfd_create is not implemented, use mkstemp instead:             fd = c_long::from(mkstemp(fname.as_mut_ptr() as *mut c_char));             // and unlink the file             if fd != -1 {-                unlink(fname.as_mut_ptr() as *mut c_char);+                unlink(fname.as_ptr() as *const c_char);

This change looks good, nice catch!

Cocalus

comment created time in a month

Pull request review commentgnzlbg/slice_deque

Fix data race on older Linux kernels

 pub fn allocate_mirrored(size: usize) -> Result<*mut u8, AllocError> {         assert!(half_size % allocation_granularity() == 0);          // create temporary file-        let mut fname = *b"/tmp/slice_deque_fileXXXXXX\0";-        let mut fd: c_long =-            memfd_create(fname.as_mut_ptr() as *mut c_char, 0);+        let fname = b"/tmp/slice_deque_fileXXXXXX\0";

If I understood the problem correctly (which I'm not sure), I think the current code might be correct, since it is just equivalent to:

        let mut fname: [u8; 28] = *b"/tmp/slice_deque_fileXXXXXX\0";

Which arguably is much clearer than how the code is currently written.

Cocalus

comment created time in a month

pull request commentgnzlbg/slice_deque

Fix data race on older Linux kernels

Ok, so I remember this code. Let me work through the bug, and let's check I've understood it.

If I understood correctly, the problem is that mkstemp replaces the XXXXXX placeholder in the string literal, and if the literal is allocated in static read-only memory, that's undefined behavior of the form "write through a read-only reference" (&str). Note that this form of UB happens even in single-threaded code, so this is worse than a data race: it is a general memory unsafety issue. Is that correct?

Cocalus

comment created time in a month

pull request commentgnzlbg/slice_deque

Fix data race on older Linux kernels

Hey @Cocalus, thank you so much for the report. I just saw it, and will try to understand what the problem is first, and once I'm done I'll get back to you. I want to resolve this as quickly as possible.

Cocalus

comment created time in a month

issue commentrust-lang/rust

Tracking issue for SIMD support

@Lokathor did that work?

I imagine that the compiler will emit two asm instructions for two loads after the call to volatile_mmio_write_to_DMA_activation(), but you are not guaranteed that those two instructions will take in total 2 cycles to complete, e.g., since there are no data-dependencies between them, and the CPU can do some instruction-level parallelism for loads, both could complete in 1 single cycle.

alexcrichton

comment created time in a month

issue commentseanmonstar/reqwest

Deadlock on drop of reqwest::blocking::Response sometimes

Sure, debugging deadlocks is hard.

I just meant to say that exposing such a feature would:

  • allow users that want to protect themselves against them and are willing to pay the runtime cost to do so (this turns deadlocks into panics, so deadlocks could be used to restart things)
  • help people file better bug reports, because one often discovers a deadlock when the application does not terminate, but I at least often have the doubt of whether "is some work still ongoing" or "is this really a deadlock". Being able to turn on a flag and reproduce with a backtrace is a huge time saver, and allows filing a more informative bug report. Although as you mention, from there to actually finding out what the problem is, the road is long.

What would be the appropriate place to discuss adding such a feature?

karlri

comment created time in a month

issue commentseanmonstar/reqwest

Deadlock on drop of reqwest::blocking::Response sometimes

parking_lot primitives have deadlock detection that can be enabled with a cargo feature, so making sure that reqwest or tokio panic on a deadlock should be "as simple" as enabling that feature. The panic will output the backtraces of all threads involved in that deadlock.

karlri

comment created time in a month

issue commentrust-lang/rust

Tracking issue for SIMD support

@eddyb volatile operations are not re-ordered across other volatile operations (but other operations can be). IIUC the use case, @thejpster wants absolutely nothing to be reordered across this nop. This particular case is probably better suited to inline assembly than to a specific compiler intrinsic.
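To make the volatile part concrete, a tiny sketch using `std::ptr::write_volatile` and `std::ptr::read_volatile`. A local variable stands in for a memory-mapped register here, which is an assumption for illustration only; the function name is made up.

```rust
use std::ptr;

/// Writes `value` with a volatile store and reads it back with a volatile
/// load. The compiler may not elide these two accesses or reorder them
/// relative to each other, but it may still move non-volatile code around them.
pub fn volatile_roundtrip(value: u32) -> u32 {
    let mut slot: u32 = 0;
    let p: *mut u32 = &mut slot;
    // SAFETY: `p` points to a live, properly aligned local `u32`.
    unsafe {
        ptr::write_volatile(p, value);
        ptr::read_volatile(p)
    }
}
```

This ordering guarantee covers only the volatile accesses themselves, which is why it is not enough for the "nothing moves across this point" use case.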

alexcrichton

comment created time in a month

issue commentrust-lang/rustup

Rustup should always clean after itself

@gnzlbg Oh dear, that really ought to be cleaning up. If you run rustup update do you get info: cleaning up downloads & tmp directories at the end, and if so, is it still full'o'crud?

Sadly I manually deleted all the files when filing this issue, so I don't have those precise files anymore to reproduce. I did run rustup update, rustup self update, etc. and none of these commands removed the files, but I don't remember if I got the info: ... part - it does not ring a bell, but I was not paying attention to that either.

I just touched some files in the directory, ran rustup update and the whole directory was deleted. Maybe rustup keeps some record of files it should not delete within ~/.rustup/downloads?

gnzlbg

comment created time in a month

issue commentrust-lang/rustup

Rustup should always clean after itself

@kinnison I have rustup 1.21.1 (7832b2ebe 2019-12-20), hope that helps.

gnzlbg

comment created time in a month

push eventrust-lang/packed_simd

Travis CI User

commit sha 04b567cc682041e3ec47c0aa515ee7f49bede3d8

Update documentation

view details

push time in a month

issue commentrust-lang/project-inline-asm

What about global_asm?

The RFC does not mention anything about global_asm!, so AFAICT, it is not proposing to do anything about it.

It probably should be stabilized along with the new asm!,

Why?

comex

comment created time in a month

issue openedrust-lang/rustup

Rustup should always clean after itself

After a successful rustup update (or maybe even after each successful rustup command), the RUSTUP_HOME directory should be in a "clean" state.

Example of a non-"clean" state:

ls .rustup/downloads
0018255b94a01c42bcf6f4c3f2b2cb008d9e870811abd7b491c1a5decc7dcef8         9704de3c1ddf5711026a6e76e4a74564314a56d192f94c6e9be73f53e128bd0a
0d1e77d383480dde5cd1ef00d4edf711ea8366e28995ff5b37856a292c0efb6d         a85620550390318cef250ef36e174c4922ab83848ab2502dfb74bfaa50b5267a
14a696cd425757956f0edbcaaa887c5e4c6b8b3248c1a5a88fbad27741411ece         a998c2af9af897c8ee288683fc4ac15222bea6ded995f0933f54f921c708053b
1642bc5f123f61c9bb8e99a2600f7adf68e6c21c960b482a0dcb6382332fff3e         a9a541657ad067c823d7436ef875ff2e6b5da67fabe38b874527d0efe8b46799
25d95e34bc33065dfd5c5452f8ebf87c63b74bc73873cd0dd76f0ab709f0a3da         b5cee0b7be3803c7c194bb6a3bfe85c5857d1696c84f1408a46d74b42471f976
279e047a5f91cde4767366c446bb38e3584cb81a531003624aa90bff4e9a78ef         b74e2a2b7192429478a819f5ddc27f17bacd75fdea62e8e941fa9adb174d678d
3633caab24d17af3a2679aefad736a9a43c40a75688ff849fcfcd6b86b5098dd         c15f37fc933c09359ae5fa8d33865379a4c7f23328fee09bb5e8871edc6e6fff
3e1d35f15a7f164f4cbfdc0f9f3797f81841498f5c9d9219373a74d62a495c89         c2ee290ad72ec327f43febc3e47e962eefa3f71bd7ea1af83b7c555995992fc7.partial
5e1e4b41fb275751115cf110c1b9b488c0b2a28d5cfced0d931e4b411111df2e.partial d2863fdab15e37a069d3372f529c1f4d4019860f272f5d437d52e5cb24f7b1e9
6641bfba31ef754b331bacc91b94ff45aba2394258fba5ead11479465ee95305         e2695c482277e33af377fce7fd8ca066ed8ab8d6ec482915ca4b774e11e519a2
7185fe0838b512eb87096b23e95b8c5a76fd0c962cbddfb2c34ed0eec0841e5a.partial e76515c5b75da508745259d145d513384fd33fdbe38918aa955e795a26db075c
7243706338030c46adbad4b387d493003e2cd01cfa5226dffdc8975bbccbd396         ec0e28cc0452b871c0b57289560dc30f49ff026d5b2513d9fc3071ef3d0d67ad
7525b45aca040b3d8c0dcb532128dfc0c09cf603746e507d2e0d940bd9e3ce20         ede4f03d16fdec8a88706fce1f52c0bfc34744dfbdc01dc0b1fa058ea71836af
8700c284bc6d61fdd4d0ef26cd68fe2b65f625701709493c106ca5f46e85d1b8         ef0e0bb8abf76b81f0999816595a56982a50791e58ee2895540969a5036a53bd
89b368af8f1a8fd94c70b249cf93dbb0ee2513585e2f84992e888fee1904049a         f20b176e10fec12b15fd7b9a172c4113150d416bf3b682e1f4e5fe7d4992d678

du -h ~/.rustup/downloads/
2.5G /Users/gnzlbg/.rustup/downloads/

That's 2.5 GB of useless files.

created time in a month

issue commenttaniarascia/takenote

[Feature] Integrate GitHub with TakeNote

Why use a gist?

I'd prefer if my notes would be sync'ed to a git repository, e.g., with one directory per note, and inside each directory one would have the markdown file for the note, attachments, and maybe a human readable toml or json file with metadata like tags.

That way, the directory history can be leveraged by applications, e.g, to show how a note changed over time.

taniarascia

comment created time in a month

issue commentnotable/notable

Mobile app

FWIW I want an iOS app, but I don't want to use a "notable cloud". Instead, I'd prefer if notable would use git for version control, and your "notable" account would just be a git repository, e.g., on github (public or private) or somewhere else. So you log in with your github account, set permissions for your private repository, every change gets committed across devices, and devices use git pull for sync.

I really care about my notes not being in a proprietary format in some proprietary cloud, so that if the "notable ecosystem" gets hit by a bus I can still move to a different ecosystem / app that supports the same "protocol": a markdown flavor for notes, some directory structure as a database, and some "git commit structure" as history.

LukeDefeo

comment created time in a month

issue commentnotable/notable

Version control

It would be awesome if one could set up the app to use a particular github (or gitlab, or whatever) repo for backup, and this automatically synced notes across devices (e.g. one modifies a note on the phone, the modification gets automatically committed to git, and the desktop app automatically pulls it).

fabiospampinato

comment created time in a month

issue commentrust-lang/rust

Imprecise floating point operations (fast-math)

@rkruppe good points about inlining in MIR and about x + y being special cased.

So IIUC, the inliner would still need to be very careful not to apply, after inlining, other optimizations that could change the fast-math settings of the inlined code. In a "fast-math" function, x + y would get fast-math flags, but would an x + y that gets inlined from a x.add(y) also get them?

The whole dilemma seems to be coming from tying fp optimizations to scopes, rather than types. If there was f32 and fastf32, then inlining wouldn't be semantics-breaking.

What fastf32 achieves is tying the fp-arithmetic constraints to the operations on the memory it wraps. We already have some core::intrinsics for fast-math. I don't know if it is a good idea, but I would be ok with exposing a perma-unstable set of intrinsics that can express all these operations, since that should be enough to write types like fastf32 and other approaches like NonNan<NoSignedZero<Associative<T>>> in Rust libraries. Maybe something like:

mod core::intrinsics { // or somewhere else
    // bitflags for fast-math:
    const NonNan: u32 = 0b1;
    const NoSignedZero: u32 = 0b10;
    const Associative: u32 = 0b100;
    ...

    // fp arithmetic intrinsics taking a const bitset of fast-math flags
    fn fp_add<T>(T, T, const fast_math_flags: u32) -> T;
    fn fp_sub<T>(T, T, const fast_math_flags: u32) -> T;
    ...
    // unary operations take a single operand
    fn fp_sqrt<T>(T, const fast_math_flags: u32) -> T;
    ...
}

Alternatively, maybe we can just extend all the current floating-point core intrinsics with a bitset. With something like default function arguments, we would just set that bitset to 0 by default, preserving current behavior.
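To illustrate what "tying the semantics to the type" could look like from a library, here is a stable-Rust sketch of a `FastF32` newtype. Everything in it is hypothetical: the type name, and the choice to use plain `+` as a placeholder where a real implementation would call a fast-math intrinsic such as the ones proposed above.

```rust
use std::ops::Add;

/// Hypothetical float type whose arithmetic would carry relaxed semantics.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FastF32(pub f32);

impl Add for FastF32 {
    type Output = FastF32;

    fn add(self, rhs: FastF32) -> FastF32 {
        // Placeholder: this is the single place where a fast-math intrinsic
        // would be called. Plain `+` keeps the sketch on stable Rust and
        // therefore has ordinary IEEE semantics.
        FastF32(self.0 + rhs.0)
    }
}
```

Because the relaxed operation lives inside `FastF32::add`, inlining that method into a caller that uses plain `f32` elsewhere would not change the meaning of the caller's own arithmetic.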

mpdn

comment created time in a month

push eventgnzlbg/static_vector

gnzlbg

commit sha 28606f87553425ab0ca14ceaf927eaf8ce414bce

Add LEWG suggestions to changelog

view details

push time in a month

issue openedbsteinb/accurate

This crate is super hard to find on crates.io

I searched for Kahan, floating point precision, sum, ... and it did not pop up.

It might be worth it to spend some time doing some crates.io SEO.

created time in 2 months

issue commentrust-lang/rust

Imprecise floating point operations (fast-math)

Enabling fast math by default is going to break a lot of floating-point code.

For example, chances are that somewhere in your floating-point program, something is computing the sum of an array of floating-point values. If that code is doing it right, it is probably going to be using a crate like accurate to compute the sum efficiently with a small error.

One of the algorithms that accurate implements is Kahan summation, which is roughly:

S = X[0]
C = 0
for i in [1..N]:
  Y = X[i] - C
  T = S + Y
  C = (T - S) - Y
  S = T

With -ffast-math, a compiler can replace T in C = (T - S) - Y with S + Y, which results in C = ((S + Y) - S) - Y, and optimize that to C = 0. Since C then always stays zero, and subtracting zero does nothing with -ffast-math, the Kahan algorithm can be further optimized to:

S = X[0]
for i in [1..N]:
  S += X[i]

which defeats the point and produces quite inaccurate results.

This optimization is not theoretical, clang performs it when -ffast-math is enabled, e.g., see: https://gcc.godbolt.org/z/8NLIdB
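For reference, a plain Rust sketch of the pseudocode above next to a naive sum. This is not the `accurate` crate's implementation; the function names are made up, and the loop starts from zero instead of `X[0]`, which is equivalent.

```rust
/// Compensated (Kahan) summation: `c` tracks the rounding error of each add.
pub fn kahan_sum(xs: &[f64]) -> f64 {
    let mut sum = 0.0_f64;
    let mut c = 0.0_f64;
    for &x in xs {
        let y = x - c; // apply the correction from the previous step
        let t = sum + y; // low-order bits of `y` may be lost here
        c = (t - sum) - y; // recover what was lost (algebraically zero)
        sum = t;
    }
    sum
}

/// Straightforward left-to-right summation.
pub fn naive_sum(xs: &[f64]) -> f64 {
    xs.iter().sum()
}

/// Sample input: 1.0 followed by ten values of 1e-16.
pub fn sample() -> Vec<f64> {
    let mut xs = vec![1.0_f64];
    xs.extend(std::iter::repeat(1e-16).take(10));
    xs
}
```

On `sample()`, the naive sum stays at exactly 1.0, because each 1e-16 is below half an ULP of 1.0 and is rounded away, while the compensated sum ends up above 1.0. Rewriting `(t - sum) - y` to zero, as fast-math permits, would make the two functions behave identically.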

Allowing users to enable -ffast-math globally is only going to destroy all properly-written floating-point code in libstd and elsewhere, which Rust users building applications with more than 200 crates are probably using without knowing.

Even allowing this at the function scope seems like a footgun, e.g., imagine a user writes:

#[math(overflow="wrap", assumptions=("algebraic", "no-nan", "finite"))]
fn foo(b1: &[f32]) -> f32 {
    accurate::kahan_sum(b1)
}

If accurate gets inlined into foo, the same algorithm-destroying optimization shown above would apply. To avoid that, we would need to prevent functions with "incompatible" #[math(...)] annotations from being inlined into each other. This means that the example used by @tkaitchuck above might not work, because the <f32 as Add<f32>>::add method does not have a #[math] annotation.

mpdn

comment created time in 2 months

startedbsteinb/accurate

started time in 2 months

issue commentrust-lang/unsafe-code-guidelines

Multiple incompatible function declarations

Re-reading my last comment, I think the most appealing solution for me would be to make extern blocks unsafe in a future edition. I really have no idea how that example could work in safe code.

gnzlbg

comment created time in 2 months

pull request commentrust-lang/rfcs

Add llvm_asm! and deprecate asm!

Personally I find the proposed new name (llvm_asm!) very confusing, as to me that would signify the insertion of LLVM assembly language snippets, rather than insertion x86/whatever assembly language snippets to be passed to LLVM...

We could call it clang_asm! or c_asm! since that's closer to what the macro actually does - it takes syntax closer to clang's C asm(...) statements and maps it to LLVM-IR inline assembler expressions.

I personally don't find the name llvm_asm! confusing but don't mind either way since the name isn't that important for something that's currently not in the path towards stabilization.

Amanieu

comment created time in 2 months

issue commentseanmonstar/reqwest

Logging requests

In reqwest, we could probably add a couple more useful logs, like around starting connections, requests, if a proxy is used...

That would be very helpful.

gnzlbg

comment created time in 2 months

issue commentrust-lang/libc

Does it make sense to support WASM?

By definition, wasm32-unknown-unknown has no libc. If you want to use emscripten or wasi you can use wasm32-unknown-emscripten or wasm32-wasi, both of which are already supported in the libc crate.

shepmaster

comment created time in 2 months

issue openedseanmonstar/reqwest

Logging requests

Is there a simple way to log requests?

For example, when I use the Python requests library:

session = requests.Session()
session.post(URL, headers=headers, params=request_params, data=form_data)

when setting the log to DEBUG I get:

2020-01-11 16:19:39,505 [DEBUG] Starting new HTTPS connection (1):foo.bar.com:443
2020-01-11 16:19:40,595 [DEBUG] https://foo.bar.com:443 "POST /baz HTTP/1.1" 200 None

which is quite useful, and surprisingly low on noise.

OTOH, when using reqwest, setting the log level to debug outputs:

[2020-01-11T15:29:19Z DEBUG hyper::client::connect::dns] resolving host="foo.bar.com"
[2020-01-11T15:29:19Z DEBUG hyper::client::connect::http] connecting to 123.456.789.123:443
[2020-01-11T15:29:19Z DEBUG hyper::client::connect::http] connected to Some(V4(123.456.789.123:443))
[2020-01-11T15:29:19Z DEBUG hyper::proto::h1::io] flushed 269 bytes
[2020-01-11T15:29:19Z DEBUG hyper::proto::h1::io] flushed 556 bytes
[2020-01-11T15:29:19Z DEBUG hyper::proto::h1::io] flushed 73 bytes
[2020-01-11T15:29:20Z DEBUG hyper::proto::h1::io] read 1369 bytes
[2020-01-11T15:29:20Z DEBUG hyper::proto::h1::io] parsed 20 headers
[2020-01-11T15:29:20Z DEBUG hyper::proto::h1::conn] incoming body is chunked encoding
[2020-01-11T15:29:20Z DEBUG hyper::proto::h1::decode] incoming chunked header: 0x29EF (10735 bytes)
[2020-01-11T15:29:20Z DEBUG cookie_store::cookie_store] inserting Set-Cookie 'Cookie { cookie_string: Some("__cfduid=sa7dsa6d86as7sda67ad6; expires=Mon, 10-Feb-20 15:29:19 GMT; path=/; domain=.foo.bar.com; HttpOnly; SameSite=Lax; Secure"), name: Indexed(0, 8), value: Indexed(9, 52), expires: None, max_age: None, domain: Some(Indexed(107, 121)), path: Some(Indexed(96, 97)), secure: Some(true), http_only: Some(true), same_site: Some(Lax) }'
[2020-01-11T15:29:20Z DEBUG cookie_store::cookie_store] inserting Set-Cookie 'Cookie { cookie_string: Some("org.fooframework.web.servlet.i18n.CookieLocaleResolver.LOCALE=en; Path=/"), name: Indexed(0, 64), value: Indexed(65, 67), expires: None, max_age: None, domain: None, path: Some(Indexed(74, 75)), secure: None, http_only: None, same_site: None }'
[2020-01-11T15:29:20Z DEBUG cookie_store::cookie_store] inserting Set-Cookie 'Cookie { cookie_string: Some("STUFF=asdasdas-das-sda-ds-das; Path=/sso/; Secure; HttpOnly"), name: Indexed(0, 7), value: Indexed(8, 44), expires: None, max_age: None, domain: None, path: Some(Indexed(51, 56)), secure: Some(true), http_only: Some(true), same_site: None }'
....dozens or so more cookies....
[2020-01-11T15:29:20Z DEBUG reqwest::async_impl::response] Response: '200 OK' for https://foo.bar.com

which is much noisier and way less useful since:

  • it does not really show the kind of request (a POST in this case) - this is pretty much the main information I cared about. The Python requests library puts it upfront, but reqwest does not show it

  • name: Indexed(0, 8) is not very useful in any kind of logs for the cookies. This might be useful for somebody working on the cookie_store library itself, but it is not useful to me as an application developer.

  • when bytes are read, flushed, etc. isn't really relevant to me either, at least not while debugging my application. It might be relevant if I were debugging a performance issue or working on hyper itself (no idea).

Some of these issues should probably be filed on downstream crates, but the logging history needs to be coherent for an application developer.

created time in 2 months

issue commentseanmonstar/reqwest

cookie jar implementation

What else is required to close this issue ?

pfernie

comment created time in 2 months

issue commentrust-lang/project-inline-asm

What state is asm allowed to modify?

To clarify, when I talk about flags I am only referring to some bits of the EFLAGS register. Specifically the status bits (CF, PF, AF, ZF, SF, OF) and the direction flag (DF). The other bits are system-level control flags which are not used by the compiler.

You also mentioned MXCSR which allows changing the default floating-point environment in the OP.

It might be worth it to more precisely spell out which "flags" are being discussed here on each architecture, and to which values they can be modified when preserves_flags is omitted (e.g. as mentioned, exiting an inline assembly block with MXCSR.RC != RN is probably instant UB).

Amanieu

comment created time in 2 months

issue commentrust-lang/project-inline-asm

What state is asm allowed to modify?

If we allow inline assembly to clobber flags, we should specify what happens with those flags when we exit the inline assembly. Are they restored to their previous value? Are they left unchanged?

If they are left unchanged, the user might expect:

asm!("modify EFLAGS to state A");
asm!("assert EFLAGS is in state A"); 

to never panic, but we cannot currently guarantee that.

Also, if they are left unchanged, then e.g. leaving an inline assembly statement with MXCSR.RC set to a value different from RN would mean that the default rounding mode is no longer round-to-nearest. So even if that inline assembly does not claim to preserve flags, the behavior would still be undefined. At that point, we would probably need to specify, for each architecture and register, which values each flag in each register is allowed to have.

An alternative would be to "restore" all flags to some safe state after an inline assembly statement without preserves_flags.
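For reference, the `asm!` macro that was later stabilized makes this a per-block choice: the compiler assumes status flags are clobbered unless the block opts in with `options(preserves_flags)`. The following is only a minimal sketch of that distinction, assuming an x86_64 target (other targets fall back to plain Rust); the function names are made up for illustration and it does not settle the questions raised above about MXCSR or the direction flag.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Adds 1 using `add`, which overwrites the status flags (CF, ZF, ...).
/// The block does not specify `preserves_flags`, so the compiler must
/// assume the flags hold arbitrary values afterwards.
#[cfg(target_arch = "x86_64")]
pub fn inc_clobbering_flags(x: u64) -> u64 {
    let mut y = x;
    unsafe {
        asm!("add {0}, 1", inout(reg) y, options(pure, nomem, nostack));
    }
    y
}

/// Adds 1 using `lea`, which leaves the status flags untouched, so the
/// block is allowed to promise `preserves_flags`.
#[cfg(target_arch = "x86_64")]
pub fn inc_preserving_flags(x: u64) -> u64 {
    let mut y = x;
    unsafe {
        asm!(
            "lea {0}, [{0} + 1]",
            inout(reg) y,
            options(pure, nomem, nostack, preserves_flags)
        );
    }
    y
}

/// Portable fallback (assumption: only used off x86_64 so the sketch compiles).
#[cfg(not(target_arch = "x86_64"))]
pub fn inc_clobbering_flags(x: u64) -> u64 {
    x.wrapping_add(1)
}

/// Portable fallback (assumption: only used off x86_64 so the sketch compiles).
#[cfg(not(target_arch = "x86_64"))]
pub fn inc_preserving_flags(x: u64) -> u64 {
    x.wrapping_add(1)
}
```

Both functions compute the same value; the only difference is what the compiler may assume about the flags after the block.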

Amanieu

comment created time in 2 months

pull request commentrust-lang/rust

Change opt-level from 2 back to 3

Would it be possible to minimize the perf regression, and open a tracking issue or file an LLVM bug?

Others

comment created time in 2 months

issue commentrust-lang/rust

RawVec stores a capacity field even if T is zero-sized

Notice that in https://github.com/rust-lang/unsafe-code-guidelines/issues/224 a different extension is discussed. That is, adding a mem::CompressedNonNull<T> type that's a ZST if T is a ZST, in which case the address is always just mem::align_of::<T>().

With both of these optimizations, Vec<T> would "semantically" still have 3 fields (ptr, size, cap), but ptr and cap would be ZSTs when T is a ZST, making Vec<ZST> 1-word wide.

A way to provide wording that guarantees that these optimizations will never happen would be to, e.g., guarantee that the size_of::<Vec<T>>() == size_of::<usize>() * 3 for all Ts. The current wording is a bit ambiguous on what's allowed or not.
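The proposed guarantee can be checked against what current compilers do. A minimal sketch, where `Zst` and `vec_words` are names invented for this illustration; the result reflects present-day behavior rather than a documented promise of the kind discussed above:

```rust
use std::mem::size_of;

/// A zero-sized element type, used only for this illustration.
pub struct Zst;

/// Size of `Vec<T>` measured in machine words (`usize`s).
pub fn vec_words<T>() -> usize {
    size_of::<Vec<T>>() / size_of::<usize>()
}
```

With today's libstd, `vec_words::<Zst>()` and `vec_words::<u64>()` both report three words, i.e. neither of the optimizations mentioned above is applied.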

gnzlbg

comment created time in 2 months

issue commentrust-lang/project-inline-asm

What state is asm allowed to modify?

EFLAGS

Notice that __writeeflags and __readeflags were deprecated because they were deemed impossible to use correctly: https://github.com/rust-lang/stdarch/issues/485

Amanieu

comment created time in 2 months

Pull request review commentrust-lang/libc

Add definitions in ucontext.h in Android for ARM, x86, and x86_64

 pub type c_char = u8;
 pub type wchar_t = u32;
+pub type greg_t = i32;
+pub type mcontext_t = sigcontext;
+
+s! {
+    pub struct sigcontext {
+        pub trap_no: ::c_ulong,
+        pub error_code: ::c_ulong,
+        pub oldmask: ::c_ulong,
+        pub arm_r0: ::c_ulong,
+        pub arm_r1: ::c_ulong,
+        pub arm_r2: ::c_ulong,
+        pub arm_r3: ::c_ulong,
+        pub arm_r4: ::c_ulong,
+        pub arm_r5: ::c_ulong,
+        pub arm_r6: ::c_ulong,
+        pub arm_r7: ::c_ulong,
+        pub arm_r8: ::c_ulong,
+        pub arm_r9: ::c_ulong,
+        pub arm_r10: ::c_ulong,
+        pub arm_fp: ::c_ulong,
+        pub arm_ip: ::c_ulong,
+        pub arm_sp: ::c_ulong,
+        pub arm_lr: ::c_ulong,
+        pub arm_pc: ::c_ulong,
+        pub arm_cpsr: ::c_ulong,
+        pub fault_address: ::c_ulong,
+    }
+}
+
+cfg_if! {
+    if #[cfg(libc_union)] {
+        s_no_extra_traits! {
+            pub struct __c_anonymous_uc_sigmask_with_padding {
+                pub uc_sigmask: ::sigset_t,
+                /* Android has a wrong (smaller) sigset_t on x86. */
+                __padding_rt_sigset: u32,
+            }
+
+            pub union __c_anonymous_uc_sigmask {
+                uc_sigmask: __c_anonymous_uc_sigmask_with_padding,
+                uc_sigmask64: ::sigset64_t,
+            }
+
+            pub struct ucontext_t {
+                pub uc_flags: ::c_ulong,
+                pub uc_link: *mut ucontext_t,
+                pub uc_stack: ::stack_t,
+                pub uc_mcontext: mcontext_t,
+                pub uc_sigmask__c_anonymous_union: __c_anonymous_uc_sigmask,
+                /* The kernel adds extra padding after uc_sigmask to match
+                 * glibc sigset_t on ARM. */
+                __padding: [c_char; 120],
+                __align: [::c_longlong; 0],

In the first place, are this line (__align: [::c_longlong; 0]) and this commit the right way to set alignment for uc_regspace: [::c_ulong; 128]? I'm new to memory alignment...

I think that's the simplest way to achieve that, simpler than by defining a new type, so if that works, I'd prefer this over defining a different type with the repr(align(N)) attribute, since it avoids the need for workarounds.
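The zero-length array trick can be illustrated in isolation. This is a small sketch with made-up struct names (none of them come from libc): a `[u64; 0]` field occupies no bytes but raises the alignment of a `repr(C)` struct to that of `u64`, which is the same effect `repr(align(N))` gives explicitly.

```rust
use std::mem::{align_of, size_of};

/// Five bytes of payload; on its own this struct has alignment 1.
#[repr(C)]
pub struct Plain {
    pub bytes: [u8; 5],
}

/// Same payload plus a zero-length `u64` array: the extra field stores
/// nothing but forces the struct's alignment up to `align_of::<u64>()`.
#[repr(C)]
pub struct AlignedByField {
    pub bytes: [u8; 5],
    _align: [u64; 0],
}

/// Same payload with the alignment requested through an attribute instead.
#[repr(C, align(8))]
pub struct AlignedByAttr {
    pub bytes: [u8; 5],
}

/// Returns `(size, alignment)` of `T` in bytes.
pub fn layout<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}
```

Comparing `layout::<Plain>()` with `layout::<AlignedByField>()` shows the alignment change; the size of the aligned variants is rounded up to a multiple of the new alignment.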

igrep

comment created time in 2 months

issue commentrust-lang/rust

travis-cargo currently cannot upload / generate code coverage information for documentation tests

I can try, but I don't think anything has happened upstream to support this? cc @QuietMisdreavus I've seen rustdoc can now output the percentage of documented APIs, but that is different from actually being able to run the doc-tests with code-coverage instrumentation, and output the instrumentation results to a meaningful place, or to provide some way that cargo-travis could use to run the doc tests with external code coverage sampling (maybe this would need a custom test runner?).

gnzlbg

comment created time in 2 months

push eventgnzlbg/sleef-sys

Johannes Schilling

commit sha 165e819d2b4284fc1287f125dc6260d17764a147

add shebang for ci/run-docker.sh

stumbled upon this when debian-packaging, there's a lint for script files without shebang (#!/some/interpreter) on top

push time in 2 months

PR merged gnzlbg/sleef-sys

add shebang for ci/run-docker.sh

stumbled upon this when debian-packaging, there's a lint for script files without shebang (#!/some/interpreter) on top

+2 -0

1 comment

1 changed file

dario23

pr closed time in 2 months

pull request commentgnzlbg/sleef-sys

add shebang for ci/run-docker.sh

Thank you!

dario23

comment created time in 2 months

issue commentrust-lang/rust

Tracking issue for Vec::remove_item

@withoutboats mentioned here:

I think we should not stabilize this just because it exists, but instead someone should come up with a well justified explanation of exactly what the best "find and remove" set of methods of vec-likes would be.

@SimonSapin also raised some more concerns here, and I raised some concerns here.

I don't see where these concerns have been resolved. From re-reading this whole discussion, it isn't clear to me what the goal of adding this method is, what problem it solves, or whether that problem is worth solving. What is more or less clear to me is that there is at least some design work to be done to ensure that libstd exposes a consistent API, and that sounds like it would warrant an RFC. The current discussion about using a _by method hints that these APIs might want to be consistent not only across the ordered collections (Vec, VecDeque, List) for which a notion of a "first" element makes sense, but also with, e.g., the _by and _by_key methods of slices.

The main argument for FCPing this appears to be that it has sat on nightly for too long without anybody complaining, and the main argument for stabilizing it has been that it has been in FCP for too long. I think it would be better to just encourage and support those who want to land this feature to write a small RFC for it. If somebody is interested in doing that, I would be able to help and give feedback.
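To make the design space concrete, here is a rough sketch of what a "find and remove" pair could look like as free functions on stable Rust. The names `remove_first` and `remove_first_by` are hypothetical and are not a proposal from this thread; they only show the equality-based form next to a predicate-based `_by` form.

```rust
/// Removes and returns the first element equal to `item`, if any.
/// Elements after it are shifted left, as with `Vec::remove`.
pub fn remove_first<T: PartialEq>(v: &mut Vec<T>, item: &T) -> Option<T> {
    let pos = v.iter().position(|x| x == item)?;
    Some(v.remove(pos))
}

/// Removes and returns the first element for which `pred` returns true.
pub fn remove_first_by<T, F>(v: &mut Vec<T>, mut pred: F) -> Option<T>
where
    F: FnMut(&T) -> bool,
{
    let pos = v.iter().position(|x| pred(x))?;
    Some(v.remove(pos))
}
```

Both helpers are a linear search followed by `Vec::remove`, so the open questions are mostly about naming and consistency across collections rather than implementation.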

madseagames

comment created time in 2 months

issue commentrust-lang/unsafe-code-guidelines

Layout of pointers to slices of ZSTs

I think libstd very clearly already promises that Vec's layout isn't clever.

@Lokathor do you have a link to where libstd makes this guarantee? AFAICT we do guarantee certain cleverness for ZSTs (e.g. that we never allocate), and this results in different behavior (e.g. the capacity of Vec::new() being different from 0). We also do provide certain "clever" layout optimizations, like Option<Vec<T>> having the same size as Vec<T>.
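Both observations can be reproduced with a few lines. A minimal sketch, where `Marker` and the helper functions are names invented for this example:

```rust
use std::mem::size_of;

/// A zero-sized type used only for this illustration.
pub struct Marker;

/// Capacity reported by an empty `Vec` of a zero-sized type.
pub fn zst_vec_capacity() -> usize {
    Vec::<Marker>::new().capacity()
}

/// Capacity reported by an empty `Vec` of a non-zero-sized type.
pub fn u32_vec_capacity() -> usize {
    Vec::<u32>::new().capacity()
}

/// True if wrapping `Vec<T>` in `Option` does not make it any larger.
pub fn option_vec_is_free<T>() -> bool {
    size_of::<Option<Vec<T>>>() == size_of::<Vec<T>>()
}
```

An empty `Vec<Marker>` reports a capacity of `usize::MAX` while an empty `Vec<u32>` reports 0, and `Option<Vec<T>>` fits in the same space as `Vec<T>` because the non-null pointer provides a niche.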

gnzlbg

comment created time in 2 months

issue commentrust-lang/unsafe-code-guidelines

Validity of function pointers

@elichai

With the proposed wording, the only value you can't assign to a function pointer is 0, all other values are ok. If you want to represent "emptiness", you probably want to use an Option<...fn ptr...> which is guaranteed to have the same size as the pointer due to its niche.


A particular example of where we need unaligned function pointers came up in libc (https://github.com/rust-lang/libc/pull/1626/files#r360995164). On Windows, libc would like to map Windows C API to:

// In C: typedef void (__CRTDECL* _crt_signal_t)(int);
type sighandler_t = Option<extern "cdecl" fn(c_int) -> ()>;
// In C: #define SIG_... ((_crt_signal_t)-i)  // where i = [0, 4]
pub const SIG_DFL: sighandler_t = None;
pub const SIG_IGN: sighandler_t = Some(1_usize as );
pub const SIG_GET: sighandler_t = Some(2_usize as );
pub const SIG_SGE: sighandler_t = Some(3_usize as );
pub const SIG_ACK: sighandler_t = Some(4_usize as );

so even if Windows requires functions to be 2-byte or 4-byte aligned, that would still require that unaligned function pointers aren't invalid (cc @retep998).
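The niche guarantee mentioned above can be shown without constructing any questionable pointer values. This sketch uses the `"C"` ABI instead of `"cdecl"` so that it builds on any target, and `Handler`, `noop`, and `raw_bits` are hypothetical names; it deliberately does not create handlers from small integers such as 1 or 2, since whether that is valid is exactly what is being discussed.

```rust
use std::mem::{size_of, transmute};
use std::os::raw::c_int;

/// A C-ABI signal handler (assumption: "C" stands in for "cdecl" here).
pub type Handler = extern "C" fn(c_int);

/// Nullable handler: `None` takes the place of the null pointer.
pub type SigHandler = Option<Handler>;

/// A handler that does nothing, used only to have a real function address.
pub extern "C" fn noop(_sig: c_int) {}

/// True if the `Option` wrapper adds no space over the bare function pointer.
pub fn same_size() -> bool {
    size_of::<SigHandler>() == size_of::<Handler>()
        && size_of::<Handler>() == size_of::<usize>()
}

/// Returns the address stored in `h`, with `None` mapping to 0.
pub fn raw_bits(h: SigHandler) -> usize {
    // SAFETY: `Option<extern "C" fn(..)>` has the same size as the function
    // pointer, and `None` is represented as the null pointer.
    unsafe { transmute::<SigHandler, usize>(h) }
}
```

`raw_bits(None)` yields 0 and `raw_bits(Some(noop as Handler))` yields the non-zero address of `noop`, which is why `SIG_DFL` can be modeled as `None`.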

RalfJung

comment created time in 2 months

issue commentrust-lang/rust

Tracking issue for Vec::remove_item

So I guess we should revert stabilization as soon as possible so it won't end up in beta?

Sounds reasonable. Maybe someone wants to volunteer to lead the process about settling on a name for this ?

madseagames

comment created time in 2 months

Pull request review commentrust-lang/rfcs

Add llvm_asm! and deprecate asm!

+- Feature Name: `llvm_asm`
+- Start Date: 2019-12-31
+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)
+
+# Summary
+[summary]: #summary
+
+Deprecate the existing `asm!` macro and provide an identical one called
+`llvm_asm!`. The feature gate is also renamed from `asm` to `llvm_asm`.
+
+Unlike `asm!`, `llvm_asm!` is not intended to ever become stable.

The only thing we need to unblock progress on the new inline assembly is to rename the asm! macro to llvm_asm!, although, as explained here, it also makes sense to rename the feature gate.

While I mostly agree with what you said, we don't really need to make any commitments right now about the future of llvm_asm!, so we can just punt that discussion.

Amanieu

comment created time in 2 months

issue commentrust-lang/unsafe-code-guidelines

Layout of pointers to slices of ZSTs

Thanks for the link to that RFC. I think most of the arguments made there apply here. This comment by @nikomatsakis is a good summary.

In the framework of custom DSTs, a "pointer to T" (T sized or DST) is composed of a thin pointer to the start of the object and metadata

This would require a more "general" and complicated custom DST framework to support these cases, which is unlikely since the current proposal is already complicated enough.


That RFC discussion suggests that a better solution might be to offer this via a library type, e.g., maybe something like a core::ptr::CompressedNonNull<T> that's similar to a *mut T but always non-null and, if T is a ZST, does not store a pointer address.

That would allow, e.g., a type like Vec<T>, to be potentially smaller when T is a ZST (e.g. 1 word instead of 3 words) if that makes sense for the type (this might not make sense for Vec because libstd might want to promise that its layout isn't "clever").

gnzlbg

comment created time in 2 months

issue commentrust-lang/unsafe-code-guidelines

Layout of pointers to slices of ZSTs

Note that multi-trait objects might be larger than two usizes, so the rule is already not as simple as:

one usize for normal pointers, and two for slice and trait object pointers.

(and this is without taking into account custom DSTs, which can be arbitrarily large)
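For comparison, the simple rule does describe the pointer types that exist today. A short sketch, with `ptr_words` and `trait_object_words` as names made up for this example; the numbers reflect what current rustc produces and say nothing about multi-trait objects or custom DSTs, which are not available:

```rust
use std::mem::size_of;

/// Size of a raw pointer to `T`, measured in machine words (`usize`s).
/// `T` may be unsized, in which case the pointer carries metadata.
pub fn ptr_words<T: ?Sized>() -> usize {
    size_of::<*const T>() / size_of::<usize>()
}

/// Size of a pointer to a single-trait object, in machine words.
pub fn trait_object_words() -> usize {
    ptr_words::<dyn std::fmt::Debug>()
}
```

Thin pointers such as `*const u8` take one word, while pointers to slices, `str`, and single-trait objects take two (address plus length or vtable).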

gnzlbg

comment created time in 2 months
