PyTorch API for Single Request Prefill Operator #17

Merged

yzh119 merged 1 commit into flashinfer-ai:main on Nov 27, 2023
Conversation
yzh119 approved these changes on Nov 27, 2023
yzh119 pushed a commit that referenced this pull request on Dec 6, 2025
## 📌 Description

We currently have unit tests failing as:

```
==========================================
Running: pytest --continue-on-collection-errors -s --junitxml=/junit/tests/comm/test_trtllm_mnnvl_allreduce.py.xml "tests/comm/test_trtllm_mnnvl_allreduce.py"
==========================================
Abort(1090447) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:
MPIR_Init_thread(192)........:
MPID_Init(1665)..............:
MPIDI_OFI_mpi_init_hook(1586):
(unknown)(): Unknown error class
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1090447
:
system msg for write_line failure : Bad file descriptor
Abort(1090447) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:
MPIR_Init_thread(192)........:
MPID_Init(1665)..............:
MPIDI_OFI_mpi_init_hook(1586):
...
Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, cuda.bindings._bindings.cydriver, cuda.bindings.cydriver, cuda.bindings.driver, tvm_ffi.core, markupsafe._speedups, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, mpi4py.MPI (total: 22)
!!!!!!! Segfault encountered !!!!!!!
...
❌ FAILED: tests/comm/test_trtllm_mnnvl_allreduce.py
```

These tests should be skipped in a single-GPU environment, but instead they fail, which indicates they are crashing at MPI module load time.

The current `dockerfile.cuXXX` installs MPI via `RUN conda install -n py312 -y mpi4py`. Inspecting the Docker build logs, [a month ago (Nov. 4)](https://github.com/flashinfer-ai/flashinfer/actions/runs/19084098717/job/54520197904#step:6:802) the solver was installing:

```
#17 13.68 mpi-1.0.1     | mpich            6 KB  conda-forge
#17 13.68 mpi4py-4.1.1  | py312hd0af0b3_100  866 KB  conda-forge
#17 13.68 mpich-4.3.2   | h79b1c89_100     5.4 MB  conda-forge
```

[but yesterday](https://github.com/flashinfer-ai/flashinfer/actions/runs/19960576464/job/57239792717#step:6:673) it installed:

```
#17 13.59 impi_rt-2021.13.1 | ha770c72_769    41.7 MB  conda-forge
#17 13.59 mpi-1.0           | impi              6 KB  conda-forge
#17 13.59 mpi4py-4.1.1      | py312h18f78f0_102  864 KB  conda-forge
```

`mpich` and `impi` are two different implementations of the MPI standard: MPICH and Intel MPI. This silent switch of implementations is the suspected cause of the MPI load failures.

This PR pins the MPI implementation via `RUN conda install -n py312 -y mpi4py mpich`. With the change, the build ([build log](https://github.com/flashinfer-ai/flashinfer/actions/runs/19976372640/job/57293423165?pr=2182#step:6:436)) produces:

```
#15 14.63 mpi-1.0.1     | mpich            6 KB  conda-forge
#15 14.63 mpi4py-4.1.1  | py312hd0af0b3_102  865 KB  conda-forge
#15 14.63 mpich-4.3.2   | h79b1c89_100     5.4 MB  conda-forge
```

which matches what we had before.

## 🔍 Related Issues

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes
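The description notes that these tests should skip on single-GPU hosts but instead segfault, because the broken MPI runtime is loaded the moment `mpi4py.MPI` is imported. One common defensive pattern (a sketch only; `require_multi_gpu` is a hypothetical helper, not FlashInfer's actual test code) is to evaluate the skip condition before anything touches MPI, and defer the `mpi4py` import into the test body:

```python
# Sketch: guard an MPI collective test so that single-GPU hosts skip it
# before mpi4py (and hence the MPI runtime) is ever imported.
# `require_multi_gpu` is a hypothetical helper, not FlashInfer's API.
import pytest


def require_multi_gpu(available_gpus: int, needed: int = 2):
    """Return a skipif marker that fires when too few GPUs are available."""
    return pytest.mark.skipif(
        available_gpus < needed,
        reason=f"needs {needed} GPUs, found {available_gpus}",
    )


# Usage (in practice the count would come from torch.cuda.device_count()):
# @require_multi_gpu(torch.cuda.device_count())
# def test_trtllm_mnnvl_allreduce():
#     from mpi4py import MPI  # imported only when the test actually runs
#     ...
```

Deferring the import means a broken MPI installation surfaces as a clean skip on single-GPU CI rather than a hard crash at collection time.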
wangbo981016 pushed a commit to meituan-longcat/flashinfer that referenced this pull request on Feb 5, 2026
(flashinfer-ai#17) Co-authored-by: yangxurui <yangxurui@meituan.com>
BingooYang pushed a commit to BingooYang/flashinfer that referenced this pull request on Mar 13, 2026
add prefill kernel pytorch api
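The merged commit exposes a PyTorch API for the single-request prefill operator named in the PR title. As a reference for what such an operator computes, here is a minimal NumPy sketch of single-request causal prefill attention; the shapes (`q: [qo_len, num_heads, head_dim]`, `k`/`v`: `[kv_len, num_heads, head_dim]`) and the function name are illustrative assumptions, not FlashInfer's actual signature:

```python
# Reference sketch of single-request causal prefill attention.
# Shapes and names are illustrative, not the FlashInfer API.
import numpy as np


def single_prefill_reference(q, k, v, causal=True):
    qo_len, num_heads, head_dim = q.shape
    kv_len = k.shape[0]
    scale = 1.0 / np.sqrt(head_dim)

    # [num_heads, qo_len, kv_len] attention scores
    scores = np.einsum("qhd,khd->hqk", q, k) * scale

    if causal:
        # Queries are aligned to the end of the KV cache, so query i may
        # attend to keys up to index i + (kv_len - qo_len).
        offset = kv_len - qo_len
        mask = np.arange(kv_len)[None, :] > (np.arange(qo_len)[:, None] + offset)
        scores = np.where(mask[None, :, :], -np.inf, scores)

    # Numerically stable softmax over the key axis
    scores -= scores.max(axis=-1, keepdims=True)
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)

    return np.einsum("hqk,khd->qhd", p, v)
```

A fused kernel computes the same result without materializing the full `[num_heads, qo_len, kv_len]` score matrix; the sketch is only a correctness reference for the math.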