I was trying out SnoopPrecompile.jl with CUDA.jl, on Julia 1.9, doing some minimal kernel compilation during precompilation:
@precompile_setup let
    @precompile_all_calls begin
        target = PTXCompilerTarget(; cap=v"7.5.0")
        params = CUDACompilerParams()
        job = CompilerJob(target, FunctionSpec(identity, Tuple{Nothing}, true), params)
        GPUCompiler.code_native(devnull, job)
    end
end
This results in an LLVM-related abort when Julia writes out the package image:
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.membar.sys
[53782] signal (6.-6): Aborted
in expression starting at none:0
unknown function (ip: 0x7fd1bcc9564c)
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
_ZN4llvm18report_fatal_errorERKNS_5TwineEb at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel15CannotYetSelectEPNS_6SDNodeE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel16SelectCodeCommonEPNS_6SDNodeEPKhj at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel6SelectEPN4llvm6SDNodeE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel22DoInstructionSelectionEv at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel17CodeGenAndEmitDAGEv at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE.part.975 at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel20runOnMachineFunctionERN4llvm15MachineFunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
operator() at /home/tim/Julia/src/julia/src/aotcompile.cpp:698
jl_dump_native_impl at /home/tim/Julia/src/julia/src/aotcompile.cpp:710
ijl_write_compiler_output at /home/tim/Julia/src/julia/src/precompile.c:126
ijl_atexit_hook at /home/tim/Julia/src/julia/src/init.c:258
jl_repl_entrypoint at /home/tim/Julia/src/julia/src/jlapi.c:718
main at /home/tim/Julia/src/julia/cli/loader_exe.c:59
unknown function (ip: 0x7fd1bcc3028f)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
_start at /build/glibc/src/glibc/csu/../sysdeps/x86_64/start.S:115
Allocations: 66463518 (Pool: 66446875; Big: 16643); GC: 87
It looks like Julia is trying to generate host-native code for GPU-only functionality here. After discussing this with @vchuravy, we think this happens because SnoopPrecompile.jl tracks all code that gets inferred, which includes GPU code, and queues it up for precompilation. Setting verbose[] = true does indeed show that it compiles GPU-only functionality:
MethodInstance for CUDA.signal_exception()
That function is implemented here: https://github.com/JuliaGPU/CUDA.jl/blob/3d1670c9fe0bd12fb5d44e8427ab50d5f85a3d6a/src/device/runtime.jl#L35-L47. It calls threadfence_system(), which in turn is implemented using the llvm.nvvm.membar.sys intrinsic. The backtrace shows the X86 instruction selector failing on that intrinsic.
I guess we should somehow keep this code out of the pkgimage, for now. Normally we avoid polluting host caches with GPU code by using a custom AbstractInterpreter and registering it with codegen through the lookup codegen parameter. Maybe some property derived from this needs to be added to the data in Core.Compiler.Timings._timings so that SnoopPrecompile can decide to skip this code?
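To make that last idea concrete, here is a rough sketch of the filtering step I have in mind. Everything in it is hypothetical: InferenceRecord, its native field, and host_only are stand-ins made up for illustration, and nothing like them exists in Core.Compiler.Timings or SnoopPrecompile today. The assumption is only that each recorded entry could carry a flag saying whether it was inferred by the default (host) interpreter.

```julia
# Hypothetical stand-in for one entry of the inference timing data.
# Neither this type nor its fields exist in Julia or SnoopPrecompile.
struct InferenceRecord
    name::String   # printed form of the MethodInstance
    native::Bool   # assumed flag: true if inferred by the default (host) interpreter
end

# Keep only the records that are safe to queue for host precompilation.
host_only(records) = filter(r -> r.native, records)

records = [
    InferenceRecord("Base.identity(::Nothing)", true),
    InferenceRecord("CUDA.signal_exception()", false),  # inferred by a GPU interpreter
]

queued = host_only(records)
foreach(r -> println("queueing ", r.name), queued)
```

With a flag like this, CUDA.signal_exception() would never reach the host codegen, so the llvm.nvvm.membar.sys intrinsic would not end up in the pkgimage.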