Skip to content

Conversation

@Sa4dUs
Copy link
Contributor

@Sa4dUs Sa4dUs commented Dec 22, 2025

This PR adds scalar support to the offload feature. The scalar management has two main parts:

On the host side, each scalar arg is casted to ix type, zero extended to i64 and passed to the kernel like that.
On the device, the each scalar arg (i64 at that point), is truncated to ix and then casted to the original type.

r? @ZuseZ4

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 22, 2025
@ZuseZ4
Copy link
Member

ZuseZ4 commented Dec 30, 2025

@Sa4dUs, what's the current state of this? Given the omp/ol expectation of scalars being 64 bit integers I guess this code only works for i64 and usize and everything else will need the casts (or only works by chance)?
I'm fine with having casts in a follow-up PR, but I guess we should start having more test files in which we check that all the different argument types are being handled correctly, so at least an i64 test for this one?

@Sa4dUs
Copy link
Contributor Author

Sa4dUs commented Dec 30, 2025

it works for all 64-bit scalars. for f64 it's not what omp expects but somehow it just works, not sure it it will break in some weird edge case

}
};

let gep = builder.inbounds_gep(cx.type_f32(), gep_base, &[i32_0]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is generally to move away from typed geps here, e.g.
https://www.npopov.com/2025/01/05/This-year-in-LLVM-2024.html
llvm/llvm-project#68882

Since we only index at 0 it doesn't differ (4*0=0), but can you change it from f32 to i8 to not confuse other readers about the meaning?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also wondering if our index logic here is correct in general. It's derived from what clang generates, but that makes it more verbose than it needs to be, I'm just testing if further simplifications here help with the benchmark behaving weird.

@rust-bors
Copy link
Contributor

rust-bors bot commented Jan 2, 2026

☔ The latest upstream changes made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Copy link
Collaborator

bors commented Jan 3, 2026

☔ The latest upstream changes (presumably #150606) made this pull request unmergeable. Please resolve the merge conflicts.

@Sa4dUs Sa4dUs force-pushed the offload-bench-fix branch from 0330a0e to ea71b7b Compare January 3, 2026 17:03
@ZuseZ4 ZuseZ4 mentioned this pull request Jan 6, 2026
5 tasks
@Sa4dUs Sa4dUs force-pushed the offload-bench-fix branch from ea71b7b to 8b7e6b6 Compare January 7, 2026 13:28
@Sa4dUs Sa4dUs force-pushed the offload-bench-fix branch 2 times, most recently from 5189bbb to 45ccec5 Compare January 14, 2026 15:30
@Sa4dUs Sa4dUs changed the title WIP Offload Benchmarks Fixes Add scalar support for offload Jan 14, 2026
@Sa4dUs Sa4dUs force-pushed the offload-bench-fix branch from 45ccec5 to 38f7fc7 Compare January 14, 2026 15:38
@Sa4dUs Sa4dUs marked this pull request as ready for review January 14, 2026 17:56
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 14, 2026
@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jan 14, 2026
@rustbot
Copy link
Collaborator

rustbot commented Jan 14, 2026

r? @jdonszelmann

rustbot has assigned @jdonszelmann.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot assigned ZuseZ4 and unassigned jdonszelmann Jan 14, 2026
llvm::set_value_name(a0, CString::new("dyn_ptr").unwrap().as_bytes());

let bb =
unsafe { llvm::LLVMAppendBasicBlockInContext(cx.llcx(), new_fn, c"entry".as_ptr()) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use cx.append_block to make this safe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as i'm using SimpleCx and SBuilder i don't have access to that safe abstractions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, we can't change the trait to accept a simpleCx, but those functions should work with a SimpleCx, right? In that case you can duplicate the function, see e.g. memset which we have twice. Not ideal, but at least no unsafe.

let num_bits = scalar_width(cx, old_ty);

let trunc = unsafe {
llvm::LLVMBuildTrunc(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cx.trunc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

let target_symbol = symbol_name_for_instance_in_crate(tcx, fn_target, LOCAL_CRATE);

let sig = tcx.fn_sig(fn_target.def_id()).skip_binder().skip_binder();
let sig = tcx.fn_sig(fn_target.def_id()).instantiate_identity();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was meant to give the compiler enough info to infer types, but it seems that just using skip_binder() works. i had left it while i was trying to make that work

@Sa4dUs Sa4dUs force-pushed the offload-bench-fix branch from 38f7fc7 to 5e980e1 Compare January 14, 2026 19:42
@rust-log-analyzer

This comment has been minimized.

@Sa4dUs Sa4dUs force-pushed the offload-bench-fix branch from 5e980e1 to 272a1a6 Compare January 15, 2026 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants