[Inference] limit inference timing recording to NativeInterpreter only #49391
Alternatively we can do something like:
diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index 1eec73d043..eb88a2081e 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -205,8 +205,7 @@ __set_measure_typeinf(onoff::Bool) = __measure_typeinf__[] = onoff
const __measure_typeinf__ = fill(false)
# Wrapper around _typeinf that optionally records the exclusive time for each invocation.
-function typeinf(interp::AbstractInterpreter, frame::InferenceState)
- interp = switch_from_irinterp(interp)
+function typeinf(interp::NativeInterpreter, frame::InferenceState)
if __measure_typeinf__[]
Timings.enter_new_timer(frame)
v = _typeinf(interp, frame)
@@ -216,6 +215,8 @@ function typeinf(interp::AbstractInterpreter, frame::InferenceState)
return _typeinf(interp, frame)
end
end
+# disable recording timings for external `AbstractInterpreter`s
+typeinf(interp::AbstractInterpreter, frame::InferenceState) = _typeinf(interp, frame)
function finish!(interp::AbstractInterpreter, caller::InferenceResult)
# If we didn't transform the src for caching, we may have to transform
@@ -242,6 +243,7 @@ function finish!(interp::AbstractInterpreter, caller::InferenceResult)
end
function _typeinf(interp::AbstractInterpreter, frame::InferenceState)
+ interp = switch_from_irinterp(interp)
typeinf_nocycle(interp, frame) || return false # frame is now part of a higher cycle
# with no active ip's, frame is done
 frames = frame.callers_in_cycle
I thought about that, but I actually want to be able to see how much time it took to infer CUDA code.

GPUCompiler can implement its own measurement mechanism for that? I'm not opposed to the current approach, but it seems somewhat risky to leave

The major reason would be that it would be great if SnoopCompile could be extended to custom abstract interpreters.

I don't know why that would be the case? I would say it is correct? You might have arbitrary interleavings?

It's possible that there are type unstable parts within some external
Yeah, I agree. But would it be cleaner if an external `AbstractInterpreter` used SnoopCompile utilities like this?
const gpu_timings = CC.Timings.Timing[]
function CC.typeinf(interp::GPUInterpreter, frame::InferenceState)
[ push timing into gpu_timings ]
@invoke CC.typeinf(interp::AbstractInterpreter, frame::InferenceState)
end
macro snoopi_deep_gpu(ex) _snoopi_deep_gpu(ex) end
function _snoopi_deep_gpu(cmd::Expr)
return quote
start_gpu_deep_timing()
try
$(esc(cmd))
finally
stop_gpu_deep_timing()
end
CC.Timings.InferenceTimingNode(gpu_timings[1])
end
end
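The sketch above leaves `[ push timing into gpu_timings ]` and the start/stop helpers abstract. The dispatch trick it relies on can be shown self-contained: a subtype overrides the entry point to record a timing, then uses `@invoke` to fall through to the generic (untimed) method. All names below (`AbstractInterp`, `GPUInterp`, `infer`) are hypothetical stand-ins for the `Core.Compiler` types, not the real API:

```julia
# Self-contained model of the @invoke pattern from the snippet above.
abstract type AbstractInterp end
struct GPUInterp <: AbstractInterp end

const gpu_timings = Float64[]

# Stands in for the generic, untimed CC._typeinf entry point.
infer(interp::AbstractInterp, frame) = frame

# The GPU interpreter wraps the generic method with its own timing.
function infer(interp::GPUInterp, frame)
    t0 = time_ns()
    result = @invoke infer(interp::AbstractInterp, frame)
    push!(gpu_timings, (time_ns() - t0) / 1e9)
    return result
end

infer(GPUInterp(), :frame)  # records one timing into gpu_timings
```

The key point is that `@invoke` forces dispatch to the `AbstractInterp` method, so the override composes with, rather than replaces, the shared inference path.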
You would want to do both at the same time though...
…On Tue, Apr 18, 2023, 11:56 Shuhei Kadowaki wrote:
For me that's the next frontier in TTFX. Oceananigans has 10+ minutes of compilation time.
Leaving a comment from slack for future reference:

Shuhei Kadowaki: E.g. julia/base/compiler/typeinfer.jl lines 165 to 166 in a34261f use `_timings[end]`, and currently we may get an inference graph like [inference graph screenshot], which seems to be a bit tricky to handle? If we isolate and
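The concern here is that the timing machinery keeps a single global stack and charges each finished frame to whatever frame happens to be on top, so interleaved inference from two interpreters gets attributed to the wrong parent. A minimal, self-contained model of that single-stack assumption (hypothetical names, not the actual `Core.Compiler.Timings` code):

```julia
# Mini-model of a global timing stack: enter pushes a frame, leave pops
# it and charges it as a child of whatever is currently on top.
struct Frame
    name::Symbol
    children::Vector{Frame}
end

const stack = [Frame(:ROOT, Frame[])]

enter(name::Symbol) = push!(stack, Frame(name, Frame[]))
function leave()
    child = pop!(stack)
    push!(stack[end].children, child)  # attributed to the current top
    return child
end

# If a native frame and a GPU frame interleave, the GPU frame ends up
# recorded as a child of the native frame:
enter(:native_f); enter(:gpu_kernel); leave(); leave()
stack[1].children[1].children[1].name  # :gpu_kernel, under :native_f
```

With a single stack there is no way to tell afterwards which interpreter produced `:gpu_kernel`, which is why isolating recording per interpreter (or restricting it to `NativeInterpreter`) was proposed.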
Allows consumers to separate native inference from inference coming from a custom AbstractInterpreter.

Co-authored-by: Shuhei Kadowaki <40514306+aviatesk@users.noreply.github.com>
This reverts commit 062fa0a.
aff54d5 to
c2411ce
Compare
…nly (#49391)

The logic of `Core.Compiler.Timings` assumes that the whole recorded inference graph is constructed by the same interpreter, thus we should limit the inference timing recording to `NativeInterpreter` only. External `AbstractInterpreter`s can implement their own recording logic, likely by reusing the existing `Core.Compiler.Timings` utilities, in a way that does not interfere with the recording for the native compilation pipeline.

Co-authored-by: Shuhei Kadowaki <aviatesk@gmail.com>

(cherry picked from commit 3db036e)
…reters (#54069)

Partially reverts #49391.

PrecompileTools uses the timing infrastructure to snoop on the inference process. The reason for #49391 was that this could lead to accidental pollution of the caches with foreign results (JuliaDebug/SnoopCompile.jl#338). After #52233 and especially #53336 we now filter results by cache owner and don't try to cache foreign code using the native pipeline.

Motivated by JuliaGPU/GPUCompiler.jl#567, which demonstrated that a foreign code instance would not be cached without PrecompileTools.
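The cache-owner filtering mentioned above can be sketched as follows. This is a hedged illustration of Julia 1.11+ compiler internals (`Core.Compiler.cache_owner` exists there but is not stable API; `GPUCacheOwner` and `MyGPUInterp` are hypothetical names):

```julia
# Sketch: an external interpreter declares a distinct cache owner, so
# the code instances it produces are tagged with that owner and never
# mixed into the native cache (whose owner is `nothing`).
const CC = Core.Compiler

struct GPUCacheOwner end

struct MyGPUInterp <: CC.AbstractInterpreter
    # a real interpreter would also carry inference/optimization params,
    # a world age, and the other state the AbstractInterpreter API needs
end

# Non-nothing owner => results are kept out of the native cache.
CC.cache_owner(::MyGPUInterp) = GPUCacheOwner()
```

Because foreign results are now distinguishable by owner, timing can be re-enabled for all interpreters without the cache-pollution risk that motivated #49391.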
…reters (#54069) Partially reverts #49391. (cherry picked from commit c0611e8)
Allows consumers to separate native inference from inference coming
from a custom AbstractInterpreter.
x-ref: JuliaDebug/SnoopCompile.jl#338
I chose `typeof(interp)` since the interp may be arbitrarily heavy and we might not want to hold on to it.
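The point can be illustrated with a small sketch (the `Timing` struct and field names below are hypothetical, not the actual `Core.Compiler.Timings.Timing` layout):

```julia
# Illustrative only: storing the interpreter's *type* rather than the
# instance keeps the timing record lightweight, so a heavy interpreter
# (large caches, world-age state, GPU context) can still be collected
# once inference finishes.
struct Timing
    interp_type::Type   # typeof(interp), not interp itself
    mi_info::Symbol     # placeholder for the inferred method info
    start_time::UInt64
end

make_timing(interp, mi::Symbol) = Timing(typeof(interp), mi, time_ns())
```

Consumers such as SnoopCompile can still group timings by interpreter type, which is all the "native vs. custom" separation requires.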