[AOTI Eager] Add dynamic shapes support to AOTIPythonKernelHolder#176018

Closed
StellarrZ wants to merge 3 commits into main from export-D94301187

@StellarrZ
Contributor

StellarrZ commented Feb 27, 2026

Summary:
Add a dynamic parameter to AOTIPythonKernelHolder so that the C++ dispatch path can request dynamic-shape compilation from the Python compile backend. When dynamic_=true, the holder passes dynamic=True to aoti_compile_with_persistent_cache, and the in-memory cache uses rank/dtype/device matching instead of exact size/stride matching. This allows a single compiled kernel to serve multiple input shapes.

Changes:

  • AOTIPythonKernelHolder: added dynamic_ member, forwarded to produce_aoti_kernel_lib
  • AOTIKernelMetadata: added is_dynamic_ flag, check() uses dynamic_check() when set
  • TensorMetadata::dynamic_check(): matches by dtype/device/rank, skips exact sizes
  • ParameterMetadata::dynamic_check(): delegates to TensorMetadata::dynamic_check() for tensor params
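
To make the difference between the two matching modes concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `TensorMeta`, its string-typed `dtype`/`device` fields, and the free functions `exact_check`, `dynamic_check`, and `check` are hypothetical stand-ins that only mirror the behavior described in the list above (exact size/stride matching versus dtype/device/rank matching).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the real tensor metadata; field names are assumptions.
struct TensorMeta {
  std::string dtype;             // e.g. "float32"
  std::string device;            // e.g. "cpu"
  std::vector<int64_t> sizes;    // extent of each dimension
  std::vector<int64_t> strides;  // stride of each dimension
};

// Static matching: dtype, device, exact sizes and exact strides must all agree.
bool exact_check(const TensorMeta& cached, const TensorMeta& input) {
  return cached.dtype == input.dtype && cached.device == input.device &&
         cached.sizes == input.sizes && cached.strides == input.strides;
}

// Dynamic matching: dtype, device and rank must agree; the extents may differ.
bool dynamic_check(const TensorMeta& cached, const TensorMeta& input) {
  return cached.dtype == input.dtype && cached.device == input.device &&
         cached.sizes.size() == input.sizes.size();
}

// Selects the matching mode from a flag, as the is_dynamic_ description suggests.
bool check(bool is_dynamic, const TensorMeta& cached, const TensorMeta& input) {
  return is_dynamic ? dynamic_check(cached, input) : exact_check(cached, input);
}
```

Under these assumptions, a kernel cached for a `[32, 32]` float tensor would also match a `[64, 16]` float tensor in dynamic mode, but not a rank-1 tensor and not any differently shaped tensor in static mode.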

Test Plan:

buck test fbcode//caffe2/test/cpp/aoti_eager:kernel_meta_info_test

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

@pytorch-bot
pytorch-bot bot commented Feb 27, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176018

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit eadb030 with merge base 1342f81:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-codesync meta-codesync bot force-pushed the export-D94301187 branch from 903a14d to a1617e3 Compare March 2, 2026 17:15
meta-codesync bot pushed a commit that referenced this pull request Mar 2, 2026
meta-codesync bot pushed a commit that referenced this pull request Mar 2, 2026
StellarrZ added a commit that referenced this pull request Mar 2, 2026
@meta-codesync
meta-codesync bot commented Mar 2, 2026

@StellarrZ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D94301187.

@meta-codesync meta-codesync bot force-pushed the export-D94301187 branch from bc186e6 to 6898147 Compare March 4, 2026 15:43
meta-codesync bot pushed a commit that referenced this pull request Mar 4, 2026
meta-codesync bot pushed a commit that referenced this pull request Mar 4, 2026
StellarrZ added a commit that referenced this pull request Mar 4, 2026
StellarrZ added a commit that referenced this pull request Mar 4, 2026
@meta-codesync meta-codesync bot force-pushed the export-D94301187 branch from 0ac306d to 1a5288d Compare March 4, 2026 18:24
meta-codesync bot pushed a commit that referenced this pull request Mar 4, 2026
StellarrZ added a commit that referenced this pull request Mar 4, 2026
@meta-codesync meta-codesync bot force-pushed the export-D94301187 branch from 9e81984 to 636f481 Compare March 4, 2026 19:17
meta-codesync bot pushed a commit that referenced this pull request Mar 4, 2026
StellarrZ added a commit that referenced this pull request Mar 4, 2026
@meta-codesync meta-codesync bot force-pushed the export-D94301187 branch from 911158c to a9c0ed8 Compare March 4, 2026 20:04
meta-codesync bot pushed a commit that referenced this pull request Mar 4, 2026
StellarrZ added a commit that referenced this pull request Mar 4, 2026
StellarrZ added a commit that referenced this pull request Mar 5, 2026
meta-codesync bot pushed a commit that referenced this pull request Mar 5, 2026
@meta-codesync meta-codesync bot force-pushed the export-D94301187 branch from c077155 to 9d097b4 Compare March 5, 2026 22:58
StellarrZ added a commit that referenced this pull request Mar 5, 2026
@meta-codesync meta-codesync bot force-pushed the export-D94301187 branch from f1aa5ca to 96050ea Compare March 6, 2026 00:36
meta-codesync bot pushed a commit that referenced this pull request Mar 6, 2026
StellarrZ added a commit that referenced this pull request Mar 6, 2026
meta-codesync bot pushed a commit that referenced this pull request Mar 6, 2026
@meta-codesync meta-codesync bot force-pushed the export-D94301187 branch from 4228b81 to 675f686 Compare March 6, 2026 06:00
StellarrZ added a commit that referenced this pull request Mar 6, 2026
meta-codesync bot pushed a commit that referenced this pull request Mar 6, 2026
@meta-codesync meta-codesync bot force-pushed the export-D94301187 branch from a5cdfff to 7921dfc Compare March 6, 2026 15:47
StellarrZ added a commit that referenced this pull request Mar 6, 2026
@meta-codesync meta-codesync bot force-pushed the export-D94301187 branch from a56f4aa to bb4b597 Compare March 6, 2026 16:25
Summary:
Populate the C++ in-memory cache after first compilation so subsequent same-shape dispatches resolve entirely in C++ without calling back into Python.

Root cause: `AOTIPythonKernelHolder::cache_miss()` compiled and loaded the kernel but never added it to `aoti_kernel_cache_`. Every dispatch re-entered `produce_aoti_kernel_lib` → acquired the GIL → called Python `_compile_afg` → parsed JSON → loaded AFG from disk.

Fix: after `cache_miss()` creates the kernel runner, populate `aoti_kernel_cache_` with the input metadata and runner so subsequent calls with matching shapes hit `cache_hit()` directly.
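
The shape of the fix can be sketched in isolation. The snippet below is an illustrative model, not the actual `AOTIPythonKernelHolder` code: `KernelHolder`, `Runner`, `Shape`, `call`, and `compile_count` are hypothetical names, the "compile" step is an injected callback standing in for the expensive Python round-trip, and the cache is keyed on exact shape only.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using Shape = std::vector<int64_t>;
using Runner = std::function<int(int)>;  // stand-in for a compiled kernel runner

// Hypothetical holder: looks up a runner by input shape, compiling on a miss.
class KernelHolder {
 public:
  explicit KernelHolder(std::function<Runner(const Shape&)> compile)
      : compile_(std::move(compile)) {}

  int call(const Shape& shape, int arg) {
    // Cache hit: a previously stored runner serves the call directly.
    for (const auto& entry : cache_) {
      if (entry.first == shape) return entry.second(arg);
    }
    // Cache miss: pay the expensive compile step once...
    Runner runner = compile_(shape);
    ++compile_count_;
    // ...and store the result so the next call with this shape is a hit.
    // Omitting this line reproduces the "recompile on every dispatch" bug.
    cache_.emplace_back(shape, runner);
    return runner(arg);
  }

  int compile_count() const { return compile_count_; }

 private:
  std::function<Runner(const Shape&)> compile_;
  std::vector<std::pair<Shape, Runner>> cache_;
  int compile_count_ = 0;
};
```

In this model, calling the holder twice with the same shape triggers the compile callback only once; a new shape triggers a second compile.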

Performance (`aten.bitwise_not`, shape [32,32], 100k iterations):
- Before: 34,260 us/call (disk round-trip on every dispatch)
- After: 21.5 us/call (in-memory cache hit, 1,593x faster)

Test Plan:
```
buck run fbcode//mode/opt fbcode//mtia/host_runtime/aoti_eager:test_aoti_eager_dispatch_bin -- TestAotiEagerDispatch.test_inmemory_cache_reuse
```

Differential Revision: D93171553
…76018)

Summary:
Pull Request resolved: #176018

Add a `dynamic` parameter to `AOTIPythonKernelHolder` so that the C++ dispatch path can request dynamic-shape compilation from the Python compile backend. When `dynamic_=true`, the holder passes `dynamic=True` to `aoti_compile_with_persistent_cache`, and the in-memory cache uses rank/dtype/device matching instead of exact size/stride matching. This allows a single compiled kernel to serve multiple input shapes.

Changes:
- `AOTIPythonKernelHolder`: added `dynamic_` member, forwarded to `produce_aoti_kernel_lib`
- `AOTIKernelMetadata`: added `is_dynamic_` flag, `check()` uses `dynamic_check()` when set
- `TensorMetadata::dynamic_check()`: matches by dtype/device/rank, skips exact sizes
- `ParameterMetadata::dynamic_check()`: delegates to `TensorMetadata::dynamic_check()` for tensor params

Test Plan:
```
buck test fbcode//caffe2/test/cpp/aoti_eager:kernel_meta_info_test
```

Reviewed By: arui-meta

Differential Revision: D94301187
@facebook-github-tools
@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: inductor / inductor-cpu-test / test (cpu_inductor_timm, 1, 2, linux.2xlarge.amx)

Details for Dev Infra team Raised by workflow job

@huydhn
Contributor

huydhn commented Mar 13, 2026

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@facebook-github-tools
@pytorchbot merge -i

(Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)

@pytorch-bot
pytorch-bot bot commented Mar 13, 2026

-i flag is only allowed for users with write permissions
