@ZuseZ4 ZuseZ4 commented Jan 5, 2026

This intrinsic helps support AMD and NVIDIA libraries such as rocBLAS and cuBLAS.
Their functions must be called from the host, but they take a mixture of host and device pointers.
The offload_args intrinsic maps our host allocations to device allocations and transfers memory as required.
It reuses the infrastructure we already have for the main offload intrinsic.
Unlike the main offload intrinsic, it already works fully with std. I also got it to work with a single cargo invocation:
RUSTFLAGS="-L native=/opt/rocm-6.4.0/lib -l dylib=rocblas -l dylib=amdhip64 -l dylib=omp -l dylib=omptarget -Zoffload=Args -Zunstable-options" cargo +offload run -r
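To illustrate the host/device pointer mix described above, here is a minimal pure-Rust sketch. It does not use the offload_args intrinsic or any real GPU API. `MockDevice`, `map_to`, `map_from`, `mock_library_scal`, and `scal_on_device` are all hypothetical names invented for this example, and the "device" is only a separate set of buffers in host memory. It only shows the shape of the work: map a host allocation to a device allocation, call a library routine that takes a host scalar and a device buffer, then copy the result back.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for device memory: each "device allocation" is a
/// separate buffer, addressed by an opaque handle instead of a host pointer.
#[derive(Default)]
struct MockDevice {
    buffers: HashMap<usize, Vec<f32>>, // device handle -> device-side storage
    next_handle: usize,
}

impl MockDevice {
    /// Allocate a device buffer and copy the host data into it (host-to-device).
    fn map_to(&mut self, host: &[f32]) -> usize {
        let handle = self.next_handle;
        self.next_handle += 1;
        self.buffers.insert(handle, host.to_vec());
        handle
    }

    /// Copy the device data back into the host slice and free the device
    /// buffer (device-to-host).
    fn map_from(&mut self, handle: usize, host: &mut [f32]) {
        let dev = self.buffers.remove(&handle).expect("unknown device handle");
        host.copy_from_slice(&dev);
    }
}

/// Hypothetical stand-in for a vendor library routine (think of a BLAS-style
/// `scal`): it is called from the host, takes `alpha` as a host value, and
/// takes `x` as a device handle.
fn mock_library_scal(dev: &mut MockDevice, alpha: f32, x: usize) {
    for v in dev.buffers.get_mut(&x).expect("unknown device handle") {
        *v *= alpha;
    }
}

/// The wrapper pattern: map the host argument to the device, call the library
/// with the device handle plus the host scalar, then transfer the result back.
fn scal_on_device(dev: &mut MockDevice, alpha: f32, x: &mut [f32]) {
    let d_x = dev.map_to(x);
    mock_library_scal(dev, alpha, d_x);
    dev.map_from(d_x, x);
}
```

Calling `scal_on_device(&mut dev, 2.0, &mut x)` with `x = [1.0, 2.0, 3.0]` leaves `x` scaled in place and no buffers left on the mock device. In the PR, the intrinsic performs the mapping and transfer steps, so this manual bookkeeping is only a mental model.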

To be cleaned up.

TODO: handle mangled function names. Done.

I updated compiler/rustc_monomorphize/src/collector/autodiff.rs, so it now works without no_mangle; without this change, the function would not be codegen'ed. It also works without lto=fat if we only have main.rs.
If we define things in lib.rs and call them from main.rs, it currently trips the verifier. I guess that should be easy to fix:

thread 'rustc' (494962) panicked at compiler/rustc_monomorphize/src/collector.rs:468:13:
assertion failed: tcx.should_codegen_locally(instance)
stack backtrace:

cc @kevinsala @Sa4dUs

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 5, 2026
@ZuseZ4 ZuseZ4 added the F-gpu_offload `#![feature(gpu_offload)]` label Jan 5, 2026
@rust-log-analyzer

This comment has been minimized.

@ZuseZ4 ZuseZ4 force-pushed the offload-host-intrinsic branch from 020f669 to 555131e Compare January 5, 2026 15:21
@ZuseZ4 ZuseZ4 mentioned this pull request Jan 6, 2026
5 tasks
@ZuseZ4 ZuseZ4 force-pushed the offload-host-intrinsic branch 2 times, most recently from 020f669 to 0228337 Compare January 14, 2026 00:46