Add daily lib integration test #2601
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2601
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Cancelled Job
As of commit a8161ba with merge base 376d6d2: CANCELLED JOB - The following job was cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
pip install .
# to not interfere with pytorch version
git clone https://github.com/vllm-project/vllm.git
cd vllm
Is this installing vllm inside the torchao directory? If so, maybe we can change it to install the two side by side instead:
root
  /torchao
  /vllm
oh OK, makes sense
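The side-by-side layout suggested above can be sketched as a small path helper. This is only an illustration: `sibling_checkout_dir` is a hypothetical name, not something in the PR, and the commented usage assumes the step starts inside the torchao checkout.

```shell
# Hypothetical helper: resolve a checkout location next to a repo, not inside it.
sibling_checkout_dir() {
  repo_dir="$1"   # path to the torchao checkout
  name="$2"       # name of the sibling checkout, e.g. vllm
  # Resolve the parent of the repo so the clone lands at <parent>/<name>
  parent="$(cd "$repo_dir/.." && pwd)"
  printf '%s/%s\n' "$parent" "$name"
}

# Possible usage in the workflow step (needs network, so not run here):
#   pip install .
#   vllm_dir="$(sibling_checkout_dir "$PWD" vllm)"
#   git clone https://github.com/vllm-project/vllm.git "$vllm_dir"
#   cd "$vllm_dir"
```

Passing an explicit target directory to git clone keeps the vllm sources out of the torchao tree regardless of which directory the step is running in.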
include:
  - name: SM-89
    runs-on: linux.g6.4xlarge.experimental.nvidia.gpu
    torch-lib-spec: '--pre fbgemm-gpu-genai --index-url https://download.pytorch.org/whl/nightly/cu126'
Seems like this is testing nightly pytorch + latest torchao + latest vllm?
How about switching to stable pytorch + latest torchao + stable vllm, so that torchao is the only component that changes often? Otherwise I think the results will be noisy.
Let me switch a bit later, or add the stable-version test later. We are actively adding new things right now, so we need the fixes from fbgemm-gpu-genai.
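One way to keep both options open is to select the package spec by channel. The sketch below is hypothetical: `torch_lib_spec` is an invented helper, the nightly spec is copied from the workflow matrix in this PR, and the stable spec assumes fbgemm-gpu-genai is available from the non-nightly cu126 index, which the thread does not confirm.

```shell
# Hypothetical helper: print the pip arguments for a given channel.
torch_lib_spec() {
  case "$1" in
    nightly)
      # Spec taken from the workflow matrix in this PR
      printf '%s\n' '--pre fbgemm-gpu-genai --index-url https://download.pytorch.org/whl/nightly/cu126'
      ;;
    stable)
      # Assumption: stable wheels are served from the non-nightly cu126 index
      printf '%s\n' 'fbgemm-gpu-genai --index-url https://download.pytorch.org/whl/cu126'
      ;;
    *)
      printf 'unknown channel: %s\n' "$1" >&2
      return 1
      ;;
  esac
}

# Possible usage (left unquoted on purpose so the spec splits into arguments):
#   pip install $(torch_lib_spec nightly)
```

With this shape, adding the stable-version job later would only need a second matrix entry that passes a different channel name.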
Summary:
* We are separating the integration tests since they tend to be noisier than the other tests
* Run them daily instead of on every PR to reduce the cost of running the tests

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:
609b808 to a8161ba
git submodule update --init --recursive
python use_existing_torch.py
pip install -r requirements/build.txt
pip install --no-build-isolation -e .
Per my comment in #2463 (comment), please don't land this PR while it builds vLLM from source with pip install --no-build-isolation -e . because that build kills our H100 cluster. I have created an issue on our end to get better isolation for this case: https://github.com/pytorch-labs/pytorch-gha-infra/issues/766.
Alternatively, we could consider using the vLLM Docker image to avoid building it altogether: #2610
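A rough sketch of the Docker route follows, under several assumptions: that vllm/vllm-openai is the prebuilt image meant here, that the latest tag and the /workspace/torchao mount path are acceptable, and that torchao can be installed on top of the image with pip install . as in the workflow. The function names are hypothetical, and DRY_RUN defaults to 1 so the commands are printed rather than executed.

```shell
# Sketch only: with DRY_RUN=1 (the default) commands are printed, not run.
DRY_RUN="${DRY_RUN:-1}"

# Assumption: image name and tag are illustrative, not taken from the PR.
VLLM_IMAGE="${VLLM_IMAGE:-vllm/vllm-openai:latest}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: install torchao inside the prebuilt vLLM image.
test_in_vllm_image() {
  torchao_dir="$1"
  run docker pull "$VLLM_IMAGE"
  # Mount the torchao checkout and install it over the prebuilt vLLM;
  # the entrypoint is overridden so the container runs a shell command.
  run docker run --rm --gpus all \
    -v "$torchao_dir:/workspace/torchao" \
    --entrypoint bash "$VLLM_IMAGE" \
    -c "cd /workspace/torchao && pip install ."
}

test_in_vllm_image "$PWD"
```

Because vLLM arrives prebuilt in the image, the source build that the comment above objects to would not run on the CI machines at all.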
Summary:
Test Plan:
CI
Reviewers:
Subscribers:
Tasks:
Tags: