[AOTI Eager] Add dynamic shapes support to AOTIPythonKernelHolder#176018
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176018
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit eadb030 with merge base 1342f81 (UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
903a14d to a1617e3
…76018)

Summary:
Add a `dynamic` parameter to `AOTIPythonKernelHolder` so that the C++ dispatch path can request dynamic-shape compilation from the Python compile backend. When `dynamic_=true`, the holder passes `dynamic=True` to `aoti_compile_with_persistent_cache`, and the in-memory cache uses rank/dtype/device matching instead of exact size/stride matching. This allows a single compiled kernel to serve multiple input shapes.

Changes:
- `AOTIPythonKernelHolder`: added `dynamic_` member, forwarded to `produce_aoti_kernel_lib`
- `AOTIKernelMetadata`: added `is_dynamic_` flag; `check()` uses `dynamic_check()` when set
- `TensorMetadata::dynamic_check()`: matches by dtype/device/rank, skips exact sizes
- `ParameterMetadata::dynamic_check()`: delegates to `TensorMetadata::dynamic_check()` for tensor params

Test Plan:
```
buck test fbcode//caffe2/test/cpp/aoti_eager:kernel_meta_info_test
```

Differential Revision: D94301187
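To make the two cache-matching modes concrete, here is a minimal C++ sketch. `TensorMetadataSketch`, `exact_check()`, and the field layout are hypothetical simplifications, not the real `TensorMetadata` in PyTorch. The sketch assumes only the behavior described in the summary: dynamic matching compares dtype, device, and rank, and ignores concrete sizes and strides.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for TensorMetadata. Field names are
// illustrative only and do not mirror the PyTorch source.
struct TensorMetadataSketch {
  std::string dtype;             // e.g. "float32"
  std::string device;            // e.g. "cpu", "cuda:0"
  std::vector<int64_t> sizes;    // concrete shape seen at compile time
  std::vector<int64_t> strides;  // concrete strides seen at compile time

  // Exact (static) matching: dtype, device, sizes and strides must all agree.
  bool exact_check(const TensorMetadataSketch& other) const {
    return dtype == other.dtype && device == other.device &&
           sizes == other.sizes && strides == other.strides;
  }

  // Dynamic matching: dtype, device and rank (number of dimensions) must
  // agree; concrete sizes and strides are ignored.
  bool dynamic_check(const TensorMetadataSketch& other) const {
    return dtype == other.dtype && device == other.device &&
           sizes.size() == other.sizes.size();
  }
};
```

Under this rule, a kernel compiled for a `[32, 32]` float32 input is reused for a `[64, 16]` float32 input on the same device. A rank-3 input or a float16 input still misses the cache.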
a1617e3 to bc186e6
@StellarrZ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D94301187.
bc186e6 to 6898147
6898147 to 0ac306d
0ac306d to 1a5288d
1a5288d to 9e81984
9e81984 to 636f481
636f481 to 911158c
911158c to a9c0ed8
a9c0ed8 to cf1249d
3743410 to c077155
c077155 to 9d097b4
9d097b4 to f1aa5ca
f1aa5ca to 96050ea
96050ea to 4228b81
4228b81 to 675f686
675f686 to a5cdfff
a5cdfff to 7921dfc
7921dfc to a56f4aa
a56f4aa to bb4b597
Summary:
Populate the C++ in-memory cache after first compilation so subsequent same-shape dispatches resolve entirely in C++ without calling back into Python.

Root cause: `AOTIPythonKernelHolder::cache_miss()` compiled and loaded the kernel but never added it to `aoti_kernel_cache_`. Every dispatch re-entered `produce_aoti_kernel_lib` → acquired the GIL → called Python `_compile_afg` → parsed JSON → loaded AFG from disk.

Fix: after `cache_miss()` creates the kernel runner, populate `aoti_kernel_cache_` with the input metadata and runner so subsequent calls with matching shapes hit `cache_hit()` directly.

Performance (`aten.bitwise_not`, shape [32,32], 100k iterations):
- Before: 34,260 us/call (disk round-trip on every dispatch)
- After: 21.5 us/call (in-memory cache hit, 1,593x faster)

Test Plan:
```
buck run fbcode//mode/opt fbcode//mtia/host_runtime/aoti_eager:test_aoti_eager_dispatch_bin -- TestAotiEagerDispatch.test_inmemory_cache_reuse
```

Differential Revision: D93171553
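The fix described above amounts to "store what `cache_miss()` built". Below is a minimal sketch that assumes a simple linear-scan cache keyed on input metadata. `KernelHolderSketch`, `InputMeta`, `KernelRunner`, and the `compile_` callback are hypothetical stand-ins for `AOTIPythonKernelHolder`, its input metadata, the loaded kernel runner, and the `produce_aoti_kernel_lib` round-trip. The sketch only illustrates that a repeated shape takes the hit path instead of recompiling.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins: the real holder keys on full input metadata and
// stores AOTI kernel runners, not std::function objects.
using InputMeta = std::vector<int64_t>;        // e.g. the input's sizes
using KernelRunner = std::function<int(int)>;  // a loaded, callable kernel
using CompileFn = std::function<KernelRunner(const InputMeta&)>;

class KernelHolderSketch {
 public:
  explicit KernelHolderSketch(CompileFn compile) : compile_(std::move(compile)) {}

  int call(const InputMeta& meta, int arg) {
    // cache_hit path: a matching entry is found without leaving C++.
    for (const auto& entry : cache_) {
      if (entry.first == meta) return entry.second(arg);
    }
    return cache_miss(meta, arg);
  }

  int compile_count() const { return compile_count_; }

 private:
  int cache_miss(const InputMeta& meta, int arg) {
    ++compile_count_;
    // Expensive step; in the real path this round-trips through Python and disk.
    KernelRunner runner = compile_(meta);
    // The fix: remember the runner so the next call with this metadata hits.
    cache_.emplace_back(meta, runner);
    return runner(arg);
  }

  CompileFn compile_;
  std::vector<std::pair<InputMeta, KernelRunner>> cache_;
  int compile_count_ = 0;
};
```

Without the `emplace_back` line, every call would fall through to `cache_miss()`, which mirrors the per-dispatch recompilation cost reported in the performance numbers.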
…76018)

Summary:
Pull Request resolved: #176018

Add a `dynamic` parameter to `AOTIPythonKernelHolder` so that the C++ dispatch path can request dynamic-shape compilation from the Python compile backend. When `dynamic_=true`, the holder passes `dynamic=True` to `aoti_compile_with_persistent_cache`, and the in-memory cache uses rank/dtype/device matching instead of exact size/stride matching. This allows a single compiled kernel to serve multiple input shapes.

Changes:
- `AOTIPythonKernelHolder`: added `dynamic_` member, forwarded to `produce_aoti_kernel_lib`
- `AOTIKernelMetadata`: added `is_dynamic_` flag; `check()` uses `dynamic_check()` when set
- `TensorMetadata::dynamic_check()`: matches by dtype/device/rank, skips exact sizes
- `ParameterMetadata::dynamic_check()`: delegates to `TensorMetadata::dynamic_check()` for tensor params

Test Plan:
```
buck test fbcode//caffe2/test/cpp/aoti_eager:kernel_meta_info_test
```

Reviewed By: arui-meta

Differential Revision: D94301187
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Merge failed
Reason: 1 jobs have failed, first few of them are: inductor / inductor-cpu-test / test (cpu_inductor_timm, 1, 2, linux.2xlarge.amx)
Details for Dev Infra team
Raised by workflow job
@pytorchbot merge |
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)
|
|
Summary:
Add a `dynamic` parameter to `AOTIPythonKernelHolder` so that the C++ dispatch path can request dynamic-shape compilation from the Python compile backend. When `dynamic_=true`, the holder passes `dynamic=True` to `aoti_compile_with_persistent_cache`, and the in-memory cache uses rank/dtype/device matching instead of exact size/stride matching. This allows a single compiled kernel to serve multiple input shapes.

Changes:
- `AOTIPythonKernelHolder`: added `dynamic_` member, forwarded to `produce_aoti_kernel_lib`
- `AOTIKernelMetadata`: added `is_dynamic_` flag; `check()` uses `dynamic_check()` when set
- `TensorMetadata::dynamic_check()`: matches by dtype/device/rank, skips exact sizes
- `ParameterMetadata::dynamic_check()`: delegates to `TensorMetadata::dynamic_check()` for tensor params

Test Plan: `buck test fbcode//caffe2/test/cpp/aoti_eager:kernel_meta_info_test`
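The delegation chain in the Changes list can be sketched as follows: a kernel-level `check()` that selects exact or dynamic matching from a flag, and a parameter-level `dynamic_check()` that forwards tensor parameters to the tensor check. All type names here (`TensorMeta`, `ParamMeta`, `KernelMetaSketch`) are hypothetical. The rule that non-tensor parameters still require an exact match when the flag is set is an assumption of this sketch, not something the PR states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical simplified tensor metadata; not the PyTorch definition.
struct TensorMeta {
  std::string dtype;
  std::string device;
  std::vector<int64_t> sizes;

  bool operator==(const TensorMeta& o) const {
    return dtype == o.dtype && device == o.device && sizes == o.sizes;
  }
  // Rank/dtype/device only; concrete sizes are ignored.
  bool dynamic_check(const TensorMeta& o) const {
    return dtype == o.dtype && device == o.device && sizes.size() == o.sizes.size();
  }
};

// A parameter is either a tensor or (in this sketch) an integer scalar.
struct ParamMeta {
  std::variant<TensorMeta, int64_t> value;

  bool operator==(const ParamMeta& o) const { return value == o.value; }

  // Tensor params delegate to TensorMeta::dynamic_check(); other params
  // still need an exact match (assumption of this sketch).
  bool dynamic_check(const ParamMeta& o) const {
    const auto* a = std::get_if<TensorMeta>(&value);
    const auto* b = std::get_if<TensorMeta>(&o.value);
    if (a && b) return a->dynamic_check(*b);
    return value == o.value;
  }
};

// Kernel-level metadata: the flag picks the matching mode for every param.
struct KernelMetaSketch {
  std::vector<ParamMeta> params;
  bool is_dynamic = false;

  bool check(const std::vector<ParamMeta>& inputs) const {
    if (inputs.size() != params.size()) return false;
    for (std::size_t i = 0; i < params.size(); ++i) {
      bool ok = is_dynamic ? params[i].dynamic_check(inputs[i])
                           : params[i] == inputs[i];
      if (!ok) return false;
    }
    return true;
  }
};
```

Keeping the flag on the kernel-level metadata, as the PR does with `is_dynamic_`, means a single cache can hold both static and dynamic entries, each matched by its own rule.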
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo