
PyTorch API for Single Request Prefill Operator #17

Merged
yzh119 merged 1 commit into flashinfer-ai:main from Yifei-Zuo:main on Nov 27, 2023
Conversation

@Yifei-Zuo
Contributor

add prefill kernel pytorch api
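
For context, here is a minimal usage sketch of the operator this PR exposes, written against the present-day flashinfer Python API; the exact function and argument names at the time of this PR may have differed, and the shapes below are illustrative only.

```python
import torch
import flashinfer

# Single-request prefill: one query segment attends over the full KV cache.
# Layout "NHD": [seq_len, num_heads, head_dim]; shapes here are illustrative.
qo_len, kv_len, num_heads, head_dim = 128, 1024, 32, 128
q = torch.randn(qo_len, num_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_heads, head_dim, dtype=torch.float16, device="cuda")

# Causal prefill attention for a single request (no batching, no paged cache).
o = flashinfer.single_prefill_with_kv_cache(q, k, v, causal=True)
print(o.shape)  # torch.Size([128, 32, 128])
```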

@yzh119 yzh119 changed the title from "prefill python api" to "PyTorch API for Single Request Prefill Operator" on Nov 27, 2023
@yzh119 yzh119 merged commit 7cae480 into flashinfer-ai:main Nov 27, 2023
yzh119 pushed a commit that referenced this pull request Dec 6, 2025
<!-- .github/pull_request_template.md -->

## 📌 Description
We currently have unit tests failing as follows:
```
==========================================
Running: pytest --continue-on-collection-errors -s --junitxml=/junit/tests/comm/test_trtllm_mnnvl_allreduce.py.xml "tests/comm/test_trtllm_mnnvl_allreduce.py"
==========================================
Abort(1090447) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:
MPIR_Init_thread(192)........:
MPID_Init(1665)..............:
MPIDI_OFI_mpi_init_hook(1586):
(unknown)(): Unknown error class
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1090447
:
system msg for write_line failure : Bad file descriptor
Abort(1090447) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:
MPIR_Init_thread(192)........:
MPID_Init(1665)..............:
MPIDI_OFI_mpi_init_hook(1586):
...
Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, cuda.bindings._bindings.cydriver, cuda.bindings.cydriver, cuda.bindings.driver, tvm_ffi.core, markupsafe._speedups, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, mpi4py.MPI (total: 22)
!!!!!!! Segfault encountered !!!!!!!
...

❌ FAILED: tests/comm/test_trtllm_mnnvl_allreduce.py
```
These tests should be skipped in a single-GPU environment, but they fail instead, which indicates that the failure happens at MPI module load time.
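
For context, importing `mpi4py.MPI` calls `MPI_Init_thread` at import time by default, so a crash there fires during test collection, before any pytest skip guard can run. A hypothetical sketch of the ordering (not the actual test file):

```python
import pytest

# mpi4py initializes MPI on import by default (mpi4py.rc.initialize = True),
# so the PMPI_Init_thread abort above happens on this line, during collection.
from mpi4py import MPI

# A module-level guard like this never gets a chance to run if the import aborts.
if MPI.COMM_WORLD.Get_size() < 2:
    pytest.skip("requires multiple MPI ranks / GPUs", allow_module_level=True)
```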

The current `dockerfile.cuXXX` installs MPI via `RUN conda install -n py312 -y mpi4py`. Inspecting the docker build logs:

[A month ago (Nov. 4)](https://github.com/flashinfer-ai/flashinfer/actions/runs/19084098717/job/54520197904#step:6:802), the build installed:
```
#17 13.68     mpi-1.0.1                  |            mpich           6 KB  conda-forge
#17 13.68     mpi4py-4.1.1               |py312hd0af0b3_100         866 KB  conda-forge
#17 13.68     mpich-4.3.2                |     h79b1c89_100         5.4 MB  conda-forge
```
By contrast, [yesterday](https://github.com/flashinfer-ai/flashinfer/actions/runs/19960576464/job/57239792717#step:6:673) it installed:

```
#17 13.59     impi_rt-2021.13.1          |     ha770c72_769        41.7 MB  conda-forge
#17 13.59     mpi-1.0                    |             impi           6 KB  conda-forge
#17 13.59     mpi4py-4.1.1               |py312h18f78f0_102         864 KB  conda-forge
```

`mpich` and `impi` are two different implementations of the MPI standard: MPICH and Intel MPI. This implementation switch is the suspected cause of the MPI load failures.
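
To confirm which backend `mpi4py` is linked against, one can query the library banner via mpi4py's standard API (a quick diagnostic sketch):

```python
# Prints the linked MPI library's version banner, e.g.
# "MPICH Version: 4.3.2 ..." vs. "Intel(R) MPI Library 2021.13 ...".
from mpi4py import MPI

print(MPI.Get_library_version())
```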

This PR pins the MPI implementation via `RUN conda install -n py312 -y mpi4py mpich`. With that change, the build produces ([build log](https://github.com/flashinfer-ai/flashinfer/actions/runs/19976372640/job/57293423165?pr=2182#step:6:436)):

```
#15 14.63     mpi-1.0.1                  |            mpich           6 KB  conda-forge
#15 14.63     mpi4py-4.1.1               |py312hd0af0b3_102         865 KB  conda-forge
#15 14.63     mpich-4.3.2                |     h79b1c89_100         5.4 MB  conda-forge
```

which matches what was installed before.

<!-- What does this PR do? Briefly describe the changes and why they’re
needed. -->

## 🔍 Related Issues

<!-- Link any related issues here -->

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->
wangbo981016 pushed a commit to meituan-longcat/flashinfer that referenced this pull request Feb 5, 2026 (flashinfer-ai#17)

Co-authored-by: yangxurui <yangxurui@meituan.com>
BingooYang pushed a commit to BingooYang/flashinfer that referenced this pull request Mar 13, 2026