
python312Packages.triton*: 3.0.0 -> 3.1.0 #349159

Merged
GaetanLepage merged 1 commit into NixOS:master from GaetanLepage:openai-triton
Oct 17, 2024

Conversation

@GaetanLepage
Contributor

@GaetanLepage GaetanLepage commented Oct 16, 2024

Things done

Diff: triton-lang/triton@91f24d8...cf34004

Changelog: one day maybe

cc @SomeoneSerge

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added the 6.topic: python Python is a high-level, general-purpose programming language. label Oct 16, 2024
@DerDennisOP
Contributor

lgtm

@ofborg ofborg bot added 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-linux: 101-500 This PR causes between 101 and 500 packages to rebuild on Linux. labels Oct 17, 2024
@GaetanLepage
Contributor Author

It builds lol. First try X)

@SomeoneSerge
Contributor

❯ nom build -I nixpkgs=https://github.com/GaetanLepage/nixpkgs/archive/openai-triton.tar.gz -f "<nixpkgs>" python3Packages.triton.gpuCheck
...
triton-pytest> FAILED language/test_line_info.py::test_line_info[dot_combine] - subprocess.CalledProcessError: Command '['/nix/store/97ba8v5ffpm7r7z8grpaqj...
triton-pytest> FAILED language/test_subprocess.py::test_print[device_print_large-int32] - assert False
triton-pytest> FAILED operators/test_flash_attention.py::test_op[True-True-dtype0-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED operators/test_flash_attention.py::test_op[True-True-dtype1-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED operators/test_flash_attention.py::test_op[True-False-dtype0-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED operators/test_flash_attention.py::test_op[True-False-dtype1-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED operators/test_flash_attention.py::test_op[False-True-dtype0-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED operators/test_flash_attention.py::test_op[False-True-dtype1-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED operators/test_flash_attention.py::test_op[False-False-dtype0-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED operators/test_flash_attention.py::test_op[False-False-dtype1-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED tools/test_aot.py::test_compile_link_matmul_no_specialization - subprocess.CalledProcessError: Command '['./test', '/build/tmpd7xmo9c2/a.cs...
triton-pytest> FAILED tools/test_aot.py::test_compile_link_matmul - subprocess.CalledProcessError: Command '['./test', '/build/tmph4nb393y/a.cs...
triton-pytest> FAILED tools/test_aot.py::test_launcher_has_no_available_kernel - AssertionError: assert -11 == -6
triton-pytest> FAILED tools/test_aot.py::test_compile_link_autotune_matmul - subprocess.CalledProcessError: Command '['./test_0', '/build/tmp0g352yw6/a....
triton-pytest> === 23 failed, 9134 passed, 2196 skipped, 181 warnings in 1224.10s (0:20:24) ===
error: builder for '/nix/store/gvcldv6vz16xkr2cpjrwp3a76isg00ph-triton-pytest-3.1.0.drv' failed with exit code 1;
       last 10 log lines:
       > FAILED operators/test_flash_attention.py::test_op[True-False-dtype1-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
       > FAILED operators/test_flash_attention.py::test_op[False-True-dtype0-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
       > FAILED operators/test_flash_attention.py::test_op[False-True-dtype1-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
       > FAILED operators/test_flash_attention.py::test_op[False-False-dtype0-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
       > FAILED operators/test_flash_attention.py::test_op[False-False-dtype1-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
       > FAILED tools/test_aot.py::test_compile_link_matmul_no_specialization - subprocess.CalledProcessError: Command '['./test', '/build/tmpd7xmo9c2/a.cs...
       > FAILED tools/test_aot.py::test_compile_link_matmul - subprocess.CalledProcessError: Command '['./test', '/build/tmph4nb393y/a.cs...
       > FAILED tools/test_aot.py::test_launcher_has_no_available_kernel - AssertionError: assert -11 == -6
       > FAILED tools/test_aot.py::test_compile_link_autotune_matmul - subprocess.CalledProcessError: Command '['./test_0', '/build/tmp0g352yw6/a....
       > === 23 failed, 9134 passed, 2196 skipped, 181 warnings in 1224.10s (0:20:24) ===
       For full logs, run 'nix log /nix/store/gvcldv6vz16xkr2cpjrwp3a76isg00ph-triton-pytest-3.1.0.drv'.
┏━ 1 Errors: 
┃ error: builder for '/nix/store/gvcldv6vz16xkr2cpjrwp3a76isg00ph-triton-pytest-3.1.0.drv' failed with exit code 1;
┣━ Dependency Graph showing 1 of 2 roots:
┃          ┌─ ✔ triton-llvm-19.1.0-rc1 ⏱ 55m57s
┃       ┌─ ✔ python3.12-triton-3.1.0 ⏱ 5m0s
┃       ├─ ✔ magma-2.7.2 ⏱ 1h10m27s
┃    ┌─ ✔ python3.12-torch-2.4.1 ⏱ 1h30m28s
┃ ┌─ ✔ python3-3.12.6-env ⏱ 2s
┃ ├─ ⏸ source
┃ ⚠ triton-pytest-3.1.0 failed with exit code 1 after ⏱ 20m27s in checkPhase

No regression compared to the previous PR.

@GaetanLepage
Contributor Author

No regression compared to the previous PR.

Is it sarcastic, or was this test really already broken?

@SomeoneSerge
Contributor

Is it sarcastic, or was this test really already broken?

An understandable confusion! Yes it was broken, which I just took as a given since we've never run the pytest suite before ("pytorch is our test"):

... #328247 (comment)
I currently observe about 20 tests failing and spitting out junk: https://gist.github.com/SomeoneSerge/f9bd9ececc3a16438bd087edadf0fef4

The "OutOfResources" failures can probably be put in disabledTests, but IIRC a few tests spat out outright gibberish.
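To decide which failures are candidates for disabledTests, the pytest short summary can be grouped by error type. The following is only a rough sketch, not something used in this PR: the helper name `group_failures` and the two bucket labels are made up for illustration, and it assumes the log lines have the `FAILED <node id> - <reason>` shape shown in the output above.

```python
import re
from collections import defaultdict

# Matches pytest short-summary lines such as
# "triton-pytest> FAILED path/test_x.py::test_name[param] - SomeError: message".
# Assumes the node id contains no spaces, as in the log above.
FAILED_RE = re.compile(r"FAILED (?P<node>\S+) - (?P<reason>.*)$")


def group_failures(log_text):
    """Bucket failing test names into 'OutOfResources' and 'other' (hypothetical labels)."""
    groups = defaultdict(set)
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match is None:
            continue
        # Keep only the function name: drop the file path and the [params] suffix.
        name = match.group("node").partition("::")[2].split("[", 1)[0]
        kind = "OutOfResources" if "OutOfResources" in match.group("reason") else "other"
        groups[kind].add(name)
    # Sets remove the duplicates that come from the repeated "last 10 log lines".
    return {kind: sorted(names) for kind, names in groups.items()}
```

Feeding it the saved build log (for example the output of `nix log` on the failed derivation) gives one list per bucket. Note that a bare name such as `test_op` would probably match every parametrization of that test, so the resulting list still needs a manual review before anything is disabled.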

@DerDennisOP
Contributor

DerDennisOP commented Oct 17, 2024

Is it sarcastic, or was this test really already broken?

An understandable confusion! Yes it was broken, which I just took as a given since we've never run the pytest suite before ("pytorch is our test"):

... #328247 (comment)
I currently observe about 20 tests failing and spitting out junk: https://gist.github.com/SomeoneSerge/f9bd9ececc3a16438bd087edadf0fef4

The "OutOfResources" failures can probably be put in disabledTests, but IIRC a few tests spat out outright gibberish.

The "OutOfResources" tests pass for me, so I guess you need more resources. When I build on a 64C/128T CPU with a GPU passed into the Nix builder, it works.

@SomeoneSerge
Contributor

The "OutOfResources" tests pass for me, so I guess you need more resources. When I build on a 64C/128T CPU with a GPU passed into the Nix builder, it works.

🥲 24G VRAM too little, rtx 3090 too old

May I ask what your GPU is?

@DerDennisOP
Contributor

The "OutOfResources" tests pass for me, so I guess you need more resources. When I build on a 64C/128T CPU with a GPU passed into the Nix builder, it works.

🥲 24G VRAM too little, rtx 3090 too old

May I ask what your GPU is?

Specs:
CPU: AMD Epyc 7702P (64C/128T)
RAM: 512 GB (ECC)
STORAGE: 38 TB
GPU: 1x H100
NETWORKING: 600 Mbit/s

@GaetanLepage
Contributor Author

Specs:
CPU: AMD Epyc 7702P (64C/128T)
RAM: 512 GB (ECC)
STORAGE: 38 TB
GPU: 1x H100
NETWORKING: 600 Mbit/s

Not bad X)

@SomeoneSerge
Contributor

GPU: 1x H100

Maybe we should consider marking tests with a grading of "system features", e.g. based on vram size (e.g. "cuda-80gb"). We're not doing that yet because there would be many orthogonal dimensions, like cuda capabilities. An alternative is to hard-code GPU names for derivations where it matters
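To make the "many orthogonal dimensions" concern concrete, here is a purely illustrative sketch of how graded feature tags could be derived from a GPU's properties. Nothing like this exists in nixpkgs as far as this thread shows: the function name `gpu_feature_tags`, the VRAM tiers, and every tag format other than "cuda-80gb" are assumptions made up for the example.

```python
def gpu_feature_tags(vram_gib, compute_capability, tiers=(16, 24, 40, 80)):
    """Return hypothetical system-feature tags for one GPU.

    vram_gib: VRAM size in GiB.
    compute_capability: (major, minor) tuple, e.g. (8, 6).
    tiers: assumed VRAM thresholds; a GPU gets every tier it satisfies.
    """
    tags = ["cuda"]
    # One dimension: graded VRAM size, e.g. "cuda-80gb".
    tags += [f"cuda-{tier}gb" for tier in tiers if vram_gib >= tier]
    # A second, orthogonal dimension: compute capability, e.g. "cuda-sm86".
    major, minor = compute_capability
    tags.append(f"cuda-sm{major}{minor}")
    return tags


# An RTX 3090 (24 GiB, compute capability 8.6) versus a GPU with
# 80 GiB and compute capability 9.0 (the H100's VRAM size is assumed here).
print(gpu_feature_tags(24, (8, 6)))
print(gpu_feature_tags(80, (9, 0)))
```

Even with only two dimensions the tag set grows quickly, which is the trade-off against simply hard-coding GPU names for the derivations where it matters.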

@DerDennisOP
Contributor

GPU: 1x H100

Maybe we should consider marking tests with a grading of "system features", e.g. based on vram size (e.g. "cuda-80gb"). We're not doing that yet because there would be many orthogonal dimensions, like cuda capabilities. An alternative is to hard-code GPU names for derivations where it matters

In my nix-ai project, I'm using hard-coded GPU names. It's not optimal, but it's the best solution I could find.

@GaetanLepage GaetanLepage merged commit baaa9d5 into NixOS:master Oct 17, 2024
@GaetanLepage GaetanLepage deleted the openai-triton branch October 17, 2024 21:34


3 participants