Conversation
@nanosoldier

The package evaluation job you requested has completed - possible new issues were detected.
The high-level goal sounds good, though I didn't give a close review. A couple of quick observations: we appear to need the isinferred bit back in CodeInfo, so that it doesn't try to infer these twice at the toplevel (maybe we already did bring that back because of breakage though?); and the direct access of mi.cache.debuginfo seemed awkward; can we adjust that code to have direct access to the CodeInstance instead of just the MethodInstance?

(Also, it looks like you may have pushed some extra commits here that are unrelated, from your other PR?)
This fixes #54360 by moving the interpreter's cache of `@generated` results from `mi.uninferred` into `mi.cache` with a separate cache owner to partition the cache from regular inference results. There are two other uses of the `mi.uninferred` field:

1. As the place to store uninferred code for temporary top-level thunks.
2. As an uncompressed copy of `m->source`, to avoid having to re-uncompress it every time in the interpreter.

In this PR, use case 1 is changed to use the same mechanism as generated functions. Use case 2 is changed to just uncompress the source in place in `m->source`. As a result, the `uninferred` field is unused and removed.

Note that I'm planning a somewhat larger refactor of `MethodInstance` in the immediate future, so this might be a somewhat short-lived representation, but that change should hopefully be largely transparent to users of the wrappers introduced here.
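To illustrate the owner-partitioning idea, here is a minimal sketch (not the PR's actual code) of looking up an entry for one particular cache owner by walking the `CodeInstance` list hanging off a `MethodInstance`. The `InterpreterToken` sentinel and the `lookup_owned` helper are hypothetical names for illustration:

```julia
# Hypothetical owner sentinel; the real interpreter uses its own token.
struct InterpreterToken end

# Walk mi.cache (a linked list of CodeInstance via the `next` field)
# and return the first entry whose `owner` matches, or `nothing`.
function lookup_owned(mi::Core.MethodInstance, owner)
    isdefined(mi, :cache) || return nothing
    ci = mi.cache
    while ci isa Core.CodeInstance
        ci.owner === owner && return ci
        ci = isdefined(ci, :next) ? ci.next : nothing
    end
    return nothing
end
```

Entries with a different owner (e.g. regular inference results, which use `nothing`) are simply skipped, which is what keeps the interpreter's cached generator output from being confused with optimized code.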
Fixed, sorry about the confusion.
The caches are partitioned by owner; they don't get updated in place. Is that what you were asking?

We could redesign how toplevel thunks work to not have a full MethodInstance (they currently corrupt the uninferred source that's in them anyway, so it's not great), but that's a much larger change.
Okay, I just skimmed, but now I see there is a different owner, so it can distinguish whether the inferred object is actually already optimized or not, and avoid accidentally trying to optimize it twice. Seems reasonable enough, assuming we don't delete the owner field. I suppose we can convert that to a boolean flag if so, later?
The follow-on PR does delete it, but the general cache-partition idea is maintained, and we'll always need some version of it for external abstract interpreters, so I think it's reasonably safe; but even if not, yes, a boolean flag would be fine.
In Cassette-like systems, where inference has to infer many calls of `@generated` functions and the generated function involves complex code transformations, the overhead from code generation itself can become significant. This is because the results of code generation are not cached, leading to duplicated code generation in the following contexts:

- `method_for_inference_heuristics` for regular inference on cached `@generated` function calls (since `method_for_inference_limit_heuristics` isn't stored in cached optimized sources, but is attached to generated unoptimized sources).
- `retrieval_code_info` for constant propagation on cached `@generated` function calls.

Having said that, caching unoptimized sources generated by `@generated` functions is not a good tradeoff in the general case, considering the memory space consumed (and the image bloat). The code generation for generators like `GeneratedFunctionStub` produced by the front end is generally very simple, and the first duplicated code generation mentioned above does not occur for `GeneratedFunctionStub`. So this unoptimized-source caching should be enabled in an opt-in manner.

Based on this idea, this commit defines the trait `abstract type Core.CachedGenerator` as an interface for external systems to opt in. If the generator is a subtype of this trait, inference caches the generated unoptimized code, sacrificing memory space to improve the performance of subsequent inferences. Specifically, the mechanism for caching the unoptimized source uses the infrastructure already implemented in #54362. Thanks to #54362, the cache for generated functions is now partitioned by world age, so even if the unoptimized source is cached, the existing invalidation system will invalidate it as expected.

In JuliaDebug/CassetteOverlay.jl#56, the following benchmark results showed that an approximately 1.5-3x inference speedup is achieved by opting into this feature:

## Setup

```julia
using CassetteOverlay, BaseBenchmarks, BenchmarkTools

@MethodTable table;
pass = @overlaypass table;
BaseBenchmarks.load!("inference");
benchfunc1() = sin(42)
benchfunc2(xs, x) = findall(>(x), abs.(xs))
interp = BaseBenchmarks.InferenceBenchmarks.InferenceBenchmarker()

# benchmark inference on entire call graphs from scratch
@benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call pass(benchfunc1)
@benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call pass(benchfunc2, rand(10), 0.5)

# benchmark inference on the call graphs with most of them cached
@benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call interp=interp pass(benchfunc1)
@benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call interp=interp pass(benchfunc2, rand(10), 0.5)
```

## Benchmark inference on entire call graphs from scratch

> on master

```
julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call pass(benchfunc1)
BenchmarkTools.Trial: 61 samples with 1 evaluation.
 Range (min … max):  78.574 ms … 87.653 ms  ┊ GC (min … max): 0.00% … 8.81%
 Time  (median):     83.149 ms              ┊ GC (median):    4.85%
 Time  (mean ± σ):   82.138 ms ±  2.366 ms  ┊ GC (mean ± σ):  3.36% ± 2.65%

  ▂ ▂▂  █ ▂ █ ▅ ▅
  █▅██▅█▅▁█▁▁█▁▁▁▁▅▁▁▁▁▁▁▁▁▅▁▁▅██▅▅█████████▁█▁▅▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
  78.6 ms         Histogram: frequency by time         86.8 ms <

 Memory estimate: 52.32 MiB, allocs estimate: 1201192.

julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call pass(benchfunc2, rand(10), 0.5)
BenchmarkTools.Trial: 4 samples with 1 evaluation.
 Range (min … max):  1.345 s …  1.369 s  ┊ GC (min … max): 2.45% … 3.39%
 Time  (median):     1.355 s             ┊ GC (median):    2.98%
 Time  (mean ± σ):   1.356 s ± 9.847 ms  ┊ GC (mean ± σ):  2.96% ± 0.41%

  █                   █     █                            █
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.35 s         Histogram: frequency by time        1.37 s <

 Memory estimate: 637.96 MiB, allocs estimate: 15159639.
```

> with this PR

```
julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call pass(benchfunc1)
BenchmarkTools.Trial: 230 samples with 1 evaluation.
 Range (min … max):  19.339 ms … 82.521 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.938 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.665 ms ±  4.666 ms  ┊ GC (mean ± σ):  6.72% ± 8.80%

  ▃▇█▇▄                     ▂▂▃▃▄
  █████▇█▇▆▅▅▆▅▅▁▅▁▁▁▁▁▁▁▁▁██████▆▁█▁▅▇▆▁▅▁▁▅▁▅▁▁▁▁▁▁▅▁▁▁▁▁▁▅ ▆
  19.3 ms      Histogram: log(frequency) by time      29.4 ms <

 Memory estimate: 28.67 MiB, allocs estimate: 590138.

julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call pass(benchfunc2, rand(10), 0.5)
BenchmarkTools.Trial: 14 samples with 1 evaluation.
 Range (min … max):  354.585 ms … 390.400 ms  ┊ GC (min … max): 0.00% … 7.01%
 Time  (median):     368.778 ms               ┊ GC (median):    3.74%
 Time  (mean ± σ):   368.824 ms ±   8.853 ms  ┊ GC (mean ± σ):  3.70% ± 1.89%

  ▃          █
  ▇▁▁▁▁▁▁▁▁▁▁█▁▇▇▁▁▁▁▇▁▁▁▁█▁▁▁▁▇▁▁▇▁▁▇▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
  355 ms          Histogram: frequency by time          390 ms <

 Memory estimate: 227.86 MiB, allocs estimate: 4689830.
```

## Benchmark inference on the call graphs with most of them cached

> on master

```
julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call interp=interp pass(benchfunc1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  45.166 μs …  9.799 ms  ┊ GC (min … max): 0.00% … 98.96%
 Time  (median):     46.792 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   48.339 μs ± 97.539 μs  ┊ GC (mean ± σ):  2.01% ± 0.99%

  ▁▂▄▆▆▇███▇▆▅▄▃▄▄▂▂▂▁▁▁ ▁▁▂▂▁  ▁   ▂▁ ▁                    ▃
  ▃▇██████████████████████▇████████████████▇█▆▇▇▆▆▆▅▆▆▆▇▆▅▅▅▆ █
  45.2 μs      Histogram: log(frequency) by time        55 μs <

 Memory estimate: 25.27 KiB, allocs estimate: 614.

julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call interp=interp pass(benchfunc2, rand(10), 0.5)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  303.375 μs …  16.582 ms  ┊ GC (min … max): 0.00% … 97.38%
 Time  (median):     317.625 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   338.772 μs ± 274.164 μs  ┊ GC (mean ± σ):  5.44% ± 7.56%

    ▃▆██▇▅▂▁
  ▂▂▄▅██████████▇▆▅▅▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▁▂▁▁▂▁▂▂ ▃
  303 μs          Histogram: frequency by time          394 μs <

 Memory estimate: 412.80 KiB, allocs estimate: 6224.
```

> with this PR

```
julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call interp=interp pass(benchfunc1)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.444 μs …  1.808 ms  ┊ GC (min … max): 0.00% … 99.01%
 Time  (median):     5.694 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.228 μs ± 25.393 μs  ┊ GC (mean ± σ):  5.73% ± 1.40%

   ▄█▇▄
  ▁▂▄█████▇▄▃▃▃▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  5.44 μs        Histogram: frequency by time        7.47 μs <

 Memory estimate: 8.72 KiB, allocs estimate: 196.

julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call interp=interp pass(benchfunc2, rand(10), 0.5)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  211.000 μs …  36.187 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     223.000 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   280.025 μs ± 750.097 μs  ┊ GC (mean ± σ):  6.86% ± 7.16%

  █▆▄▂▁                                                        ▁
  ███████▇▇▇▆▆▆▅▆▅▅▅▅▅▄▅▄▄▄▅▅▁▄▅▃▄▄▄▃▄▄▃▅▄▁▁▃▄▁▃▁▁▁▃▄▃▁▃▁▁▁▃▃▁▃ █
  211 μs       Histogram: log(frequency) by time        1.46 ms <

 Memory estimate: 374.17 KiB, allocs estimate: 5269.
```
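As a rough illustration of the opt-in (a sketch, not code from this PR): an external system makes its generator type a subtype of `Core.CachedGenerator`, and inference consults that trait before caching the generated source. The `MyOverlayGenerator` type and the `wants_cached_source` helper below are hypothetical names:

```julia
# Hypothetical generator type opting into unoptimized-source caching
# by subtyping the trait introduced in this commit.
struct MyOverlayGenerator <: Core.CachedGenerator
    world::UInt  # fields are illustrative only
end

# Hypothetical helper mirroring the trait check inference performs.
wants_cached_source(gen) = gen isa Core.CachedGenerator

@assert wants_cached_source(MyOverlayGenerator(Base.get_world_counter()))
```

Generators that do not subtype the trait (such as the front end's `GeneratedFunctionStub`) keep the old behavior, so the memory cost is only paid where a system explicitly asks for it.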