rope cache optim for jit prune in llm.py #11678
Merged
geohot merged 5 commits into tinygrad:master on Aug 28, 2025
This branch is currently behind tinygrad/master. The line count difference bot is disabled.
geohot reviewed Aug 22, 2025
test/unit/test_llm_tokenizer.py (outdated)
def test_llama_repeat(self): self._test_coding(self.llama_tok, "00000000000000000", [ 931, 931, 931, 931, 931, 410 ])
def test_llama_pat(self): self._test_coding(self.llama_tok, "today\n \n", [ 31213, 14211 ])

def test_apply_rope(self):
Collaborator
This isn't a tokenizer test, and I'm also not sure what it's really testing.
Collaborator
Can you test that it's pruned?
This reverts commit 69ede54.
Contributor
Author
Yes, I added the tests in 0f2494b, as I think that's a more appropriate place for them than the LLM tokenizer tests.
Contributor
Author
@geohot, does this need more work or is it good to go? Thanks!
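The PR title refers to caching RoPE (rotary position embedding) data so that repeated JIT steps do not recompute it. The actual change lives in tinygrad's llm.py and is not shown here. As a minimal pure-Python sketch, assuming hypothetical helpers `rope_tables` and `apply_rope` and the interleaved (even, odd) pairing convention, the idea is to build the cos/sin tables once per configuration and reuse them:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def rope_tables(head_dim: int, max_context: int, theta: float = 10000.0):
  # Hypothetical helper: build cos/sin tables once per (head_dim, max_context, theta).
  # lru_cache returns the same tuple object on repeat calls, so nothing is recomputed.
  assert head_dim % 2 == 0, "RoPE rotates pairs of dimensions"
  freqs = [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]
  cos = tuple(tuple(math.cos(pos * f) for f in freqs) for pos in range(max_context))
  sin = tuple(tuple(math.sin(pos * f) for f in freqs) for pos in range(max_context))
  return cos, sin

def apply_rope(x: list[float], pos: int, max_context: int = 2048) -> list[float]:
  # Rotate each (even, odd) pair of x by the angle pos * freq, reading the cached tables.
  cos, sin = rope_tables(len(x), max_context)
  out = []
  for i in range(len(x) // 2):
    a, b = x[2 * i], x[2 * i + 1]
    c, s = cos[pos][i], sin[pos][i]
    out += [a * c - b * s, a * s + b * c]
  return out
```

Because the tables depend only on `(head_dim, max_context, theta)` and not on the per-step input, a tracing JIT can in principle treat them as constants instead of recomputing them each step, which is what makes that work a candidate for pruning.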
liej6799 added a commit to liej6799/tinygrad that referenced this pull request on Sep 6, 2025
commit c6c16b294616447238d5d19974bceca52c9f2a40
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Sat Sep 6 04:16:12 2025 +0200
`var_vals` uses str for var (#12011)
* var_vals is str,int
* remove imports
* remove print
* fix test
* change var_vals in hcq
* update test_hcq
* fix multitensor _device_num var
* fix syminfer test
* shorten line
* p.vars stays list[Variable]
* shorten line
* vars is back to tuple[Variable, ...]
* change var_vals in extra
* change var_vals from shapetracker
* var_vals is str:int
* fix signature
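The commit above switches `var_vals` to a `str -> int` mapping, keyed by variable name rather than by the variable object. A minimal sketch of converting an object-keyed mapping into that shape, assuming a hypothetical `Variable` dataclass (a name plus an inclusive range) that stands in for tinygrad's own symbolic variable:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variable:
  # Hypothetical stand-in for a symbolic variable: a name plus an inclusive range.
  name: str
  vmin: int
  vmax: int

def to_str_var_vals(var_vals: dict[Variable, int]) -> dict[str, int]:
  # Key by variable name instead of by Variable object, checking each value is in range.
  out: dict[str, int] = {}
  for v, val in var_vals.items():
    if not v.vmin <= val <= v.vmax:
      raise ValueError(f"{v.name}={val} outside [{v.vmin}, {v.vmax}]")
    if v.name in out and out[v.name] != val:
      raise ValueError(f"conflicting values for {v.name}")
    out[v.name] = val
  return out
```

One plausible benefit of plain string keys is that callers no longer need to hold the exact same variable object to look up a bound value.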
commit 8658a971970859c6311924de08a13a429a560bfa
Author: George Hotz <geohot@gmail.com>
Date: Fri Sep 5 17:20:24 2025 -0700
hotfix: name the shift rewrite better + no ctx there
commit 6ef3270fc80cae1db234dc040f7398b9d721bb65
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Fri Sep 5 18:59:54 2025 -0700
fix opt gate (#12050)
commit 66c5206b4226a70a2b6d814713252ebe8462777e
Author: George Hotz <geohot@gmail.com>
Date: Fri Sep 5 18:24:00 2025 -0700
hotfix: minimal scheduler copy
commit 478e7587557b2c65ed288255c00664e0efc6742e
Author: George Hotz <geohot@gmail.com>
Date: Fri Sep 5 18:21:55 2025 -0700
Revert "fix scheduler copy (#12048)"
This reverts commit 51b7c407887d9e7ded6f1d792e5745a1a59b8b9c.
commit 51b7c407887d9e7ded6f1d792e5745a1a59b8b9c
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Fri Sep 5 17:17:49 2025 -0700
fix scheduler copy (#12048)
* fix scheduler copy
* hand coded opt only runs once
commit 0123c394e5d47fa90d170d44c74bb8d38fd6e301
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Fri Sep 5 16:39:20 2025 -0700
early simplify_merge_adjacent (#12045)
* do simplify_merge_adjacent before schedule
* do simplify_merge_adjacent before schedule
* disable that slow test
commit 8423c06144963d4288d3690264f0bdcc195a7191
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Fri Sep 5 16:08:28 2025 -0700
delete unused bufs_from_lin (#12044)
commit 38dcadf07b4d1ac89015fff7ee24c59c429eeca1
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Fri Sep 5 15:52:07 2025 -0700
delete kernel.py (#12040)
* delete kernel.py
* delete that file
* rip and tear
* don't test search
* imports
* fix torch frontend
* not a part of regen
commit ee4f696086a1f1c07fdc987fdd9d95ba5d905139
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Fri Sep 5 15:31:30 2025 -0700
delete more tests (#12043)
* delete more tests
* delete and simplify
* flaky on windows
* a few more, those remained
commit 12c7b1bb01d3da5df048d9ef59baba7412dba495
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Fri Sep 5 15:13:14 2025 -0700
cleanup lin tests without Kernel (#12041)
* cleanup lin tests without Kernel
* no kernel.py there
* remove that test
commit 8435d2d23bde3328a797c597557bb63abc102c5a
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Sat Sep 6 00:05:45 2025 +0200
fix openpilot speed regression (#12039)
* set local_size=None if special.arg[0]=='i'
* add cast back
commit e00858a2c3fba03c4e239c03a24563d8da0a34d6
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Fri Sep 5 14:46:33 2025 -0700
only POSTOPT (#12038)
commit 433581f8edcf140a7ae1151955dc55e99d754d72
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Fri Sep 5 14:34:05 2025 -0700
make POSTOPT=2 the default (#12034)
* make POSTOPT=2 the default
* more matching tc
* fix winograd
* fix that test
* add matvec to Scheduler
* flip tc sort order
* similar speed
* fix beam on image
* disable slow tests
* slow
commit 3b41a04b96485f1812930adb720d9af1b249209a
Author: chenyu <chenyu@fastmail.com>
Date: Fri Sep 5 16:20:03 2025 -0400
remove test_openpilot in test_onnx (#12037)
openpilot is tested in compile3
commit 290521f68e4eca2a847c3f6446d05a5d19086518
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Fri Sep 5 20:33:26 2025 +0200
add check for z3>=4.12.4 (#12035)
commit 870f63d9cc9010369323eac199b95262d9afd6c2
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Fri Sep 5 10:36:55 2025 -0700
add WARP axistype, fix postopt bugs (#12033)
* postopt is 83% match
* warp is bright CYAN
* beautiful mnist beam works
* fix shutdown bug
commit 4c2d4f683a2d0e0e87bfa3ff6466b1b522315377
Author: chenyu <chenyu@fastmail.com>
Date: Fri Sep 5 12:19:44 2025 -0400
lower universal_test_unary cos domain (#12032)
flaky
commit a340723bf1834556fc18e87f638b9e80cefb2d73
Author: chenyu <chenyu@fastmail.com>
Date: Fri Sep 5 11:52:02 2025 -0400
SKIP_SLOW_TEST=1 for nv CI (#12031)
commit ce7163e9b4be894b42f39e5e647027ad6e8bce6f
Author: chenyu <chenyu@fastmail.com>
Date: Fri Sep 5 11:35:26 2025 -0400
clean up skip slow tests in PYTHON (#12028)
skip with SKIP_SLOW_TEST and decorators
commit f08299d2ecd861f73b5dcda9003b9cfde5bab145
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Fri Sep 5 18:29:03 2025 +0300
viz: small profiler resizing improvements (#12026)
* switch to ResizeObserver
* set a fixed size for device-list
* less
* height from devices
* int
* side rect, more const
commit 5dcc4c7f1b75d8cf355812d6aa46d3f6d2d19a48
Author: chenyu <chenyu@fastmail.com>
Date: Fri Sep 5 11:28:40 2025 -0400
skip test_linalg in windows unit test (#12030)
commit f8e2dd4dd18f5805ed69a4c7f7841f2cc0822c2d
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Fri Sep 5 07:40:29 2025 -0700
investigate opts mismatches (#12020)
commit e0da6441717cb9ee8b15e9424de31bf0d560f854
Author: chenyu <chenyu@fastmail.com>
Date: Fri Sep 5 10:10:28 2025 -0400
lower sample count in test_multinomial (#12027)
commit 9b6f1b86cb0ca1b445063350e16417e6f900ed45
Author: chenyu <chenyu@fastmail.com>
Date: Fri Sep 5 09:48:39 2025 -0400
add Tensor.maximum in test_dtype_alu (#12025)
works except nan
commit 3e1c04bcdf3186996104a267b437205c89ea0c9a
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Fri Sep 5 16:04:35 2025 +0300
jit: noopt for copy buffers (#12023)
commit ab413ce72f2ae01c23c61456097c8962614d3eef
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Fri Sep 5 14:25:38 2025 +0300
viz: give tooltips a max-width (#12022)
* viz: give tooltips a max-width
* better
commit f461ccf407a19fc41aefacf6aa83636629109ebc
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Fri Sep 5 14:14:22 2025 +0300
exclude op2 nan lt in test_dtype_alu (#12024)
failure: https://github.com/tinygrad/tinygrad/actions/runs/17490320000/job/49679581331?pr=12022#step:6:125
commit 4fcea8493dca14e7de349159f18a6cf12116fbe2
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Fri Sep 5 13:06:33 2025 +0300
viz: add label to tooltip (#12021)
commit 2b5a73ac651eba1388b791275d10ce23ced74690
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Sep 4 20:44:05 2025 -0700
improve test_linearizer (#12016)
* improve test_linearizer
* tweaks
* simpler
* get_prg
* that one doesn't have to return
* fix postopt bugs
* fix rng
commit 7f3df6ea21235fe4dae8ccea784a72b73b6914e5
Author: chenyu <chenyu@fastmail.com>
Date: Thu Sep 4 23:38:37 2025 -0400
exclude nan in test_dtype_alu lt (#12019)
commit f5404ca53c82e2221db90158cf6ce1eb3475046f
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Fri Sep 5 03:44:02 2025 +0200
Divmod combine - associative variations (#12017)
* add rule and test
* more rules and tests
* add all four variations
* fix test
* test fixed!
* adjust comment
* add new variations
* disable intel tensor core ops count test for bigger_matmul_half
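The identity behind divmod combining is that, for any nonzero integer divisor c, `(x // c) * c + x % c == x`; Python's floor division guarantees this, and so does C's truncating division. The "associative variations" in the commit title are the same sum with an extra term bracketed differently, which a pattern matcher sees as distinct trees. The four forms below are an illustration, not necessarily the four rules the commit adds, and the sketch checks only the arithmetic, not tinygrad's rewrite machinery:

```python
def divmod_variants(x: int, c: int, y: int) -> list[int]:
  # Each entry is a differently bracketed tree that should simplify to x + y (c != 0).
  return [
    (x // c) * c + x % c + y,     # base form plus an extra term
    ((x // c) * c + y) + x % c,   # extra term bracketed with the quotient part
    (x % c + y) + (x // c) * c,   # extra term bracketed with the remainder part
    (y + (x // c) * c) + x % c,   # extra term first
  ]
```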
commit 677220ae7e7de619a474ce6bd68afb21a5f7a0c6
Author: chenyu <chenyu@fastmail.com>
Date: Thu Sep 4 19:58:27 2025 -0400
test_tensor_data to unit/ (#12013)
commit 431666da740021e04af8c12feca0597acd7ded8d
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Sep 4 16:55:56 2025 -0700
POSTOPT=2 work (#12012)
* POSTOPT=2 work
* bugfixes
* add chain in one place
* tensor cores match
* better hcopt check
* match from old
* Change POSTOPT ContextVar value to 0
* we didn't need to check that
commit 30eb42a69e0d46fed176558214bfa1ea7a0c7f94
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Sep 4 14:28:58 2025 -0700
fix POSTOPT pad (#11999)
* fix POSTOPT=1
* fix some tests
* Revert "fix some tests"
This reverts commit 8ee058e206c4ee2758b71e015f6535bffa27269b.
* fix padding restrictions
* cuda has two tensor cores
* Set POSTOPT ContextVar to 0 in helpers.py
commit da61b40604dadae662fecd7078308ee49a3342f9
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Thu Sep 4 23:59:32 2025 +0300
some viz tests don't need track_rewrites (#12010)
commit be364a1adb75c5f43c12c33936ba3eff965db50f
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Thu Sep 4 23:29:56 2025 +0300
viz: add default tracing group (#12009)
This enables seeing rewrites in unit tests like `VIZ=1 python3 test/test_uop_graph.py TestUOpGraph.test_in_bounds_access_gated_local` that call graph_rewrite directly.
`@track_rewrites` keeps existing as an optional helper to organize larger traces.
commit 52166fd7eb5f753863e2cc77922d93737c53e960
Author: chenyu <chenyu@fastmail.com>
Date: Thu Sep 4 16:22:33 2025 -0400
smaller test_ops inputs (#12007)
commit dc8501af30528a5d0f5a1d9126ae3545b1bb3e6e
Author: chenyu <chenyu@fastmail.com>
Date: Thu Sep 4 16:14:55 2025 -0400
clean up wino tests (#12008)
removed the one that tests hcopt and added one for backward kernel counts
commit 8c720e87605d3c9f3d9bd384c2e7df262aa8d8da
Author: chenyu <chenyu@fastmail.com>
Date: Thu Sep 4 15:09:17 2025 -0400
less iterations for symbolic double for loops (#12006)
commit 70ce29b6300e193934a62b9bf20afd0680cd0ac4
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Sep 4 11:48:40 2025 -0700
test pyrender (#12005)
* test pyrender
* make them print
* switch to pyrendered
commit 560df206cc927a97f94abfec140b9a686d802ed8
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Sep 4 11:47:56 2025 -0700
split tc test (#12003)
* split tc test
* split hand coded opts
* remove some skipped tests
* skips on emulated
commit 4996bb668bf0ca38e7ff94b54dc853d49fecfe8d
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Thu Sep 4 21:34:48 2025 +0300
load all traces before asserting in test_viz (#12004)
commit 9dee724fc45aa37111cbb4622e68d2fba09357a4
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Sep 4 11:15:43 2025 -0700
make EMULATE a context var (#12002)
* make EMULATE a context var
* fix test amx
commit 09106e4aae09b3fae3148c20ebed41f493dde9aa
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Sep 4 10:53:07 2025 -0700
refactor and split test_linearizer (#12001)
* refactor and split test_linearizer
* forget that file
* imports
* remove from docs
* test gen float4
commit fb71d1e5fd2aa9a44a80a27e127c4b2ba83acdcf
Author: chenyu <chenyu@fastmail.com>
Date: Thu Sep 4 11:19:49 2025 -0400
delete some test_search tests (#11998)
TC_SEARCH_OVER_SHAPE was removed so should the tests
commit ca7574cb2de5a71af0fea0c4a64f13c72e644a2b
Author: chenyu <chenyu@fastmail.com>
Date: Thu Sep 4 10:06:04 2025 -0400
ci set PYTHONPATH for all (#11997)
commit e213b858100a9aa34afa8d50a76033d7897efec3
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Thu Sep 4 14:58:13 2025 +0300
cpu: add thread_id to worker (#11995)
commit 35f37a64a95b2e0d47764e0da04bc5fb129ba714
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Thu Sep 4 14:56:41 2025 +0300
viz: remove useless ctx.save and restore calls (#11996)
It's a UI no-op since we always set the styles right before drawing.
commit 572a3c15c6fc78911649a2ee174a3624556728ba
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Thu Sep 4 09:31:44 2025 +0200
Move Ops.SPECIAL arg to src (#11918)
* initial moving bound to src
* arg to src
* remove import
* fixup linearizer
* arg to src
* fix test_uop_graph
* fix more tests
* fix python renderer
* get const value from const uop
* ssimplify uop estimates
* fix webgpu locals
* fix old test
* gate Ops.SPECIAL in linearizer
* use ssimplify() for local/global_size
* remove toposort gate_parents_instead_of_self
* fix rendering in comment
* cleanup
* rename and add comments
* add BottomUpGate with test
commit 5cf42dc4db466ffcfd55e1bc43e5e1d83f6942e7
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Wed Sep 3 19:23:30 2025 -0700
add Scheduler to replace Kernel with POSTOPT=2 (#11924)
* ** simple kernel to replace Kernel for postopt
* support old
* fix beam
* beaming
* beam on old
* bring tensor cores back
* raise
* postbeam
* test ops passes on mac
* skip that
* postopt default
* gate that
* fix tensor cores
* a few test fixes
* dsp fix
* tc fix
* loop
* support swap
* test_gemv
* fix beam for variable
* test opts from high level stuff
* range annoying
* compile slow
* metal slow
* better beam
* no POSTBEAM
* fix nolocals
* hc opt mostly works
* put that back
* lil
* some work
* fix that
* POSTOPT 2
* fix tests
* no postopt 2
* work
* back
* padded tensor cores
* shift_to
* postopt 0 passes?
* write PADTO
* fix padded tensor cores
* compare hcopt
* 18000 lines
* should pass tests
* fix rangeify
* put types back
commit b13e0714635aafdc5599b7ddfcfa0e40f2b2a91f
Author: chenyu <chenyu@fastmail.com>
Date: Wed Sep 3 21:47:32 2025 -0400
move test_winograd to unit test (#11993)
commit edc8b9985389893802ceba3a2591362e52a41c88
Author: chenyu <chenyu@fastmail.com>
Date: Wed Sep 3 21:18:14 2025 -0400
more tests that pass PTX now (#11992)
commit ed2f45712b8597f5ea217e8a3ae28b2b428ae6e6
Author: chenyu <chenyu@fastmail.com>
Date: Wed Sep 3 20:45:19 2025 -0400
remove skip PTX in test_arange (#11991)
all passes now
commit a5f2b4872abda7cdcc54870883fb8a9ccab3d673
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Wed Sep 3 17:05:10 2025 -0700
use_tensor_cores is a heuristic (#11989)
* use_tensor_cores is a heuristic
* context
commit 63e930fec3fe7148b92bac3c360b3ec9fc01ba42
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Wed Sep 3 16:39:33 2025 -0700
apply_tensor_cores is a heuristic (#11988)
* apply_tensor_cores is a heuristic
* delete extra_opts
commit d0e739453ebbca697d8b9aa3e0738c34e4242a5d
Author: chenyu <chenyu@fastmail.com>
Date: Wed Sep 3 15:40:20 2025 -0400
update many einsum tests (#11981)
correct the exception testing, and raise ValueError instead of assert when checking args
commit 55e4bdd353e83e411b99e7d2e9680ab41ac984d4
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Wed Sep 3 10:46:17 2025 -0700
split_uop is a method (#11984)
commit 1877eddde4979cadfa9469937257b699ac6bf6b4
Author: ttomsa <tomasvsilva8@gmail.com>
Date: Wed Sep 3 18:04:23 2025 +0100
broadcast for upat (#11940)
commit 5ed262982af6c1a621dabc5cee96a8b755a9a122
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Wed Sep 3 09:59:10 2025 -0700
remove some tc hacks from BEAM (#11980)
* remove some tc hacks from BEAM
* cosmetic changes
* revert that
commit 6d53cac45716e62619d083029063805a141ae164
Author: b1tg <33436708+b1tg@users.noreply.github.com>
Date: Thu Sep 4 00:10:42 2025 +0800
dtype fuzz: log need input > 0 (#11979)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
commit 68e83b850f31b7735e0735a6a51771905cdd21f5
Author: Jordan Chalupka <9794216+jordan-chalupka@users.noreply.github.com>
Date: Wed Sep 3 10:06:20 2025 -0400
nbytes should raise an exception when size is unlimited (#11928)
* nbytes should raise an exception when size is unlimited
* adding a test
commit 86e908db5769715780d7246cc0815f87cd9902da
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Wed Sep 3 11:05:04 2025 +0200
cast parents of int64 alu to int32 if possible (#11977)
* add overflows helper
* add rules
* x -> y
* check overflow of u too
* cleaner
* use alu instead of replace to preserve vectorization
* just one rule
* add test
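The commit above narrows int64 ALU ops to int32 when value bounds allow it, adding an overflow helper and checking the result as well as the inputs ("check overflow of u too"). A minimal sketch of that bounds check for an add, assuming each value is tracked as an inclusive `(vmin, vmax)` pair; `overflows` and `can_narrow_add` here are hypothetical and do not mirror tinygrad's signatures:

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def overflows(vmin: int, vmax: int, lo: int = INT32_MIN, hi: int = INT32_MAX) -> bool:
  # True if any value in [vmin, vmax] falls outside the target dtype's range.
  return vmin < lo or vmax > hi

def can_narrow_add(a: tuple[int, int], b: tuple[int, int]) -> bool:
  # An int64 add can run in int32 only if both inputs AND the result bounds fit.
  res = (a[0] + b[0], a[1] + b[1])
  return not any(overflows(*r) for r in (a, b, res))
```

Checking only the inputs would not be enough: two int32-sized operands can still produce a sum that needs 64 bits.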
commit 033184b3cb6a05d24b8691e882e388f995ed3e84
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Wed Sep 3 08:08:46 2025 +0200
parse_valid with non const rhs (#11957)
* const to using vmin/vmax
* add test
* convert to int
* remove left over part of and
commit 53eff8970a5785aa5cee588d4264cccf27c63378
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Wed Sep 3 07:07:54 2025 +0200
add Ops.GEP to _min_max (#11976)
commit d1d0960e6eae870af5f23ef57bea815041a2fac8
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Wed Sep 3 06:24:40 2025 +0200
remove intermediate cast using bounds - weaker pattern (#11974)
commit 8a2846b31a47cae96f49f5e4671f54aa58293e00
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Wed Sep 3 01:44:26 2025 +0200
assert embedding input is integer dtype (#11963)
* cast embedding input
* raise error if not using int for index embedding
commit d16cc6c0123c81db77e1f7c1032ade88edfa5f05
Author: wozeparrot <wozeparrot@gmail.com>
Date: Tue Sep 2 15:47:48 2025 -0700
feat: resume ckpt (#11970)
commit 1b73993521737b54c936593b583bfa169e8384a6
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Tue Sep 2 15:44:01 2025 -0700
pyrender to render uops (#11968)
* pyrender to render uops
* new pyrender style
* pyrender works
* list str
* store render
commit e921fb44ee46ae18377f7cb3b40f0311950d092f
Author: chenyu <chenyu@fastmail.com>
Date: Tue Sep 2 18:29:00 2025 -0400
clean up testnvidia env (#11969)
commit 69dd1817d011d70c1949098ad0ec52ad65d2e3d3
Author: chenyu <chenyu@fastmail.com>
Date: Tue Sep 2 17:18:44 2025 -0400
raise RuntimeError in merge_dicts instead of assert [pr] (#11965)
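Replacing `assert` with an explicit exception matters because Python strips `assert` statements when run with `-O`, so an assertion-based check can silently disappear. The commit confirms that a `merge_dicts` function exists, but the body and error message below are an illustrative sketch, not tinygrad's code:

```python
from collections.abc import Iterable, Mapping
from typing import TypeVar

K = TypeVar("K")
V = TypeVar("V")

def merge_dicts(ds: Iterable[Mapping[K, V]]) -> dict[K, V]:
  # Merge mappings, raising (not asserting) if one key maps to two different values.
  # Unlike `assert`, this check still runs under `python -O`.
  out: dict[K, V] = {}
  for d in ds:
    for k, v in d.items():
      if k in out and out[k] != v:
        raise RuntimeError(f"cannot merge, {k} has conflicting values {out[k]} and {v}")
      out[k] = v
  return out
```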
commit f750c1596536bac906099170a5a4b12223373443
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Tue Sep 2 23:44:00 2025 +0300
viz: add python marker (#11952)
* viz: add python marker
* remove duplicate
commit 550cf2ca7fe1c0fdfebd658dba5c9e479b22d8c0
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Tue Sep 2 13:34:17 2025 -0700
tests from postopt (#11964)
* tests from postopt
* reraise is fine
commit b977ec0813cbb6f892ff1af7a766a9417bde1d71
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Tue Sep 2 19:30:45 2025 +0300
viz: axes domains cleanup (#11962)
commit 897254ad6caa10dc4db4d012f88de47600c4656b
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Tue Sep 2 15:22:44 2025 +0300
ci: add dev<->cpu copy speeds (#11959)
commit 74040663bfae71aeb8aed6a11a842273963976d5
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Mon Sep 1 16:35:43 2025 -0700
make ptrdtype a UOp property (#11955)
commit 0dfca4e74bdb93b00e8c2fbbb2e715a45659902c
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Mon Sep 1 16:24:35 2025 -0700
add failing test for rangeify setitem (#11954)
commit 7c21271a5f456fa1adb801d69a1765fa6ee660e5
Author: wozeparrot <wozeparrot@gmail.com>
Date: Mon Sep 1 14:53:07 2025 -0700
feat: end_lr envvar (#11953)
commit 6a40216724d6dbcbd1c7bbbc233f24c185850a49
Author: chenyu <chenyu@fastmail.com>
Date: Mon Sep 1 10:52:26 2025 -0400
correct bf16 fuzz input in test_dtype_alu (#11933)
it was using float16 inputs, now it's uint16 then convert to bf16
commit 965ea59b16679793b8f48368ac24c4a0ef587e71
Author: chenyu <chenyu@fastmail.com>
Date: Mon Sep 1 10:03:17 2025 -0400
test_dtype_alu use AMD_LLVM from helpers (#11950)
commit a9f07c31bce2208246ed2bd1cfe5d91b6ea6efcc
Author: b1tg <33436708+b1tg@users.noreply.github.com>
Date: Mon Sep 1 21:31:14 2025 +0800
fix amd llvm sqrt (#11936)
* fix amd llvm sqrt
* lint
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
commit 0a53e72f709baa46ead764364070eb85f631fa88
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Mon Sep 1 14:32:25 2025 +0300
viz: fix trace duration in python test decoder (#11949)
commit 27c9ed5a844717e325fb22c65ad8aecf71a653ec
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Mon Sep 1 14:16:47 2025 +0300
viz: more consistent naming of events (#11948)
* s/shapes/events in test_viz
* s/bufs/events in the memory packer
commit c7bb561ef9c853924dd0dbdaff1036831a906d5c
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Mon Sep 1 11:29:31 2025 +0300
remu: add v_rsq_f32_e32 instruction (#11947)
https://github.com/tinygrad/tinygrad/pull/11936 introduces a change to
the AMD LLVM renderer that outputs this instruction. Adding both 32 and
64 bit variants.
commit d9560a631c2f934c739a26d3eb243e9d68c9c1b4
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Mon Sep 1 05:56:49 2025 +0200
remove cast between ints if safe (#11946)
commit a19d689481b19b05ee7a2c9625708eeb7fb174ee
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Mon Sep 1 03:24:07 2025 +0200
fix vec dtype _min_max (#11944)
commit f32f3464d67942a6907eadadf3e0869691f96bf7
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Mon Sep 1 00:51:24 2025 +0200
Can safe cast from certain ints to floats (#11941)
* add rule
* add some tests
* prevent infinite loop with bfloat16
* add some ints to double and float can_safe_cast
* add tests
commit 1c6e43c2034aba45a9e3afe554d1545825434711
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Mon Sep 1 00:36:29 2025 +0200
Double cast is one cast if intermediate cast is safe (#11939)
* add rule
* add some tests
* prevent infinite loop with bfloat16
* prevent more infinite rewrite
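The cast commits above (#11946, #11941, #11939) rest on one idea: a cast is "safe" if every value of the source dtype survives exactly, and when the first cast in `x.cast(mid).cast(dst)` is safe, the pair can collapse to `x.cast(dst)`. For integer-to-float casts, a float with p significand bits (counting the implicit bit) represents every integer n with |n| <= 2**p exactly. A minimal sketch with dtypes as plain strings; `can_safe_cast` and `collapse_double_cast` here are hypothetical stand-ins, not tinygrad's implementations:

```python
# (min, max) for some integer dtypes; significand bits (incl. implicit bit) for floats.
INT_RANGES = {"int8": (-2**7, 2**7 - 1), "uint8": (0, 2**8 - 1), "int16": (-2**15, 2**15 - 1),
              "uint16": (0, 2**16 - 1), "int32": (-2**31, 2**31 - 1), "int64": (-2**63, 2**63 - 1)}
FLOAT_SIGNIFICAND = {"bfloat16": 8, "float16": 11, "float32": 24, "float64": 53}

def can_safe_cast(src: str, dst: str) -> bool:
  # True if every value of integer dtype `src` is exactly representable in `dst`.
  lo, hi = INT_RANGES[src]
  if dst in INT_RANGES:
    dlo, dhi = INT_RANGES[dst]
    return dlo <= lo and hi <= dhi
  # For these float types the exponent range covers |n| <= 2**p, so only the
  # significand width limits exactness.
  return max(-lo, hi) <= 2 ** FLOAT_SIGNIFICAND[dst]

def collapse_double_cast(src: str, mid: str, dst: str) -> list[str]:
  # x.cast(mid).cast(dst) -> x.cast(dst) when src -> mid loses nothing.
  return [src, dst] if can_safe_cast(src, mid) else [src, mid, dst]
```

For example, int16 does not fit float16 (2**15 > 2**11), and int32 does not fit float32, which is why `int32 -> float32 -> int64` must keep its intermediate cast.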
commit 7e68045fb2129f56b4e25e28ce2882ca28874976
Author: wozeparrot <wozeparrot@gmail.com>
Date: Sun Aug 31 13:41:47 2025 -0700
feat: small llama3 training (#11829)
commit 020abe05568ba7a431669478eee64b69b15cf61f
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Sun Aug 31 18:39:13 2025 +0300
hcq: finalize without synchronization when in error state (#11872)
* hcq: finalize without synchronization when in error state
* ooops
* fix
* fix
* fix
commit 2004c9757d650bd152d1255c2ff440b0066ebf8e
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Sun Aug 31 18:24:44 2025 +0300
tracing: add default clock (#11935)
commit c1eeb3b99cac6cf9ce295bb354dcc81e95c0f424
Author: b1tg <33436708+b1tg@users.noreply.github.com>
Date: Sun Aug 31 23:15:47 2025 +0800
only skip AMD_LLVM (#11934)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
commit 75d380a77c21174183301e8e18ef37832a912207
Author: b1tg <33436708+b1tg@users.noreply.github.com>
Date: Sun Aug 31 21:37:17 2025 +0800
fix transcendentals in python renderer (#11932)
* fix transcendentals in python renderer
* add test
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
commit 61e4dc6ad5db451e646b31e3b0272fe8f440940d
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Sun Aug 31 07:01:29 2025 +0200
render special arg in cstyle if arg is UOp (#11931)
commit d3252ccd85d31a76e7da73e9b54ea2ba1fbf1c7c
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Sun Aug 31 06:54:39 2025 +0200
fix special vmax when arg is UOp (#11930)
commit 0bacd9fc9bba946ea726f569162d44ce8e410144
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Sun Aug 31 00:28:52 2025 +0300
viz: give disassembly its own node (#11927)
commit af89be317eb78f43acb2b6f9589b80a94bb857a7
Author: chenyu <chenyu@fastmail.com>
Date: Sat Aug 30 17:16:08 2025 -0400
relax rtol for bfloat16 test_dtype_alu (#11926)
commit 632c2fb119dc54bfebdda58f0df4692bc0c443b3
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Sat Aug 30 12:05:44 2025 -0700
lowerer works on rangeified + print exception (#11925)
commit c27b99d68f5d070aa1e789aa2ad30aaf5cbc00ff
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Sat Aug 30 20:01:47 2025 +0300
viz: refactor to indexed rewrite traces (#11923)
commit 9aff00a6ea34b87c9982667c021416672dbedea3
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Sat Aug 30 18:13:47 2025 +0300
switch viz command line args to pathlib (#11922)
commit c86ee5bfafee7d946989447fd4046b532e5133b2
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Sat Aug 30 18:12:30 2025 +0300
viz: canonicalize device name colors (#11921)
commit a4f05ebd1a4cc4e5334e40d6eb095321bd32a938
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Sat Aug 30 17:24:19 2025 +0300
ci: rebuild gpuocelot with boost libs (#11920)
commit bf0d055b398f4466961473e8fca123bd5b9376d8
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Sat Aug 30 16:04:58 2025 +0300
viz: color by name (#11919)
commit 0bc34c000f49a1ddb0895f10e5a29322b795d8cd
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Sat Aug 30 08:37:35 2025 +0200
simplify range mod its own upper bound (#11917)
* add rules
* add tests
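The rule in #11917 rests on a bounds fact: if a value x satisfies 0 <= vmin and vmax < n, then `x % n == x`, so the mod can be dropped. A loop index over a range of n is the typical case, since it runs from 0 to n - 1. A minimal sketch, with `mod_is_noop` and `simplify_mod` as hypothetical helpers:

```python
def mod_is_noop(vmin: int, vmax: int, n: int) -> bool:
  # For x in [vmin, vmax], x % n == x exactly when x lies in [0, n).
  # The rewrite x % n -> x is therefore valid if the whole interval sits inside [0, n).
  return n > 0 and 0 <= vmin and vmax < n

def simplify_mod(vmin: int, vmax: int, n: int) -> str:
  # Return the simplified expression as a string for illustration.
  return "x" if mod_is_noop(vmin, vmax, n) else f"x % {n}"
```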
commit 561318fea7a53b35427d5e9063f1a476f50734f5
Author: chenyu <chenyu@fastmail.com>
Date: Fri Aug 29 20:26:36 2025 -0400
Tensor.cos in test_dtype_alu (#11916)
* Tensor.cos in test_dtype_alu
* need this fix anyway
commit 08380217538576ea61bef21d89a318096b8741cc
Author: NoahKusaba <97474796+NoahKusaba@users.noreply.github.com>
Date: Fri Aug 29 19:34:16 2025 -0400
remove np from beautiful_cifar (#10988)
* remove np from beautiful_cifar
* remove np from cifar
* rename variable and rename tensor.arange to just tensor.randperm
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
commit cf9d8c814229225f0f9a9c0f96df8dfcdaa544a7
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Sat Aug 30 01:38:06 2025 +0300
ci: pin boost for macos runners (#11910)
commit c6e342cdac66602c00f362cbedef7ee0a5e0468f
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Sat Aug 30 00:44:49 2025 +0300
mockgpu: no hang if gpuocelot failed (#11915)
commit 26d03a86a18e589012f3093233dfbb6982575813
Author: chenyu <chenyu@fastmail.com>
Date: Fri Aug 29 17:11:59 2025 -0400
test_symbolic_ops.py cleanup (#11895)
commit b2cc06218a14e0ed2c8ee4891e395d998e06d6fb
Author: b1tg <33436708+b1tg@users.noreply.github.com>
Date: Sat Aug 30 03:18:02 2025 +0800
python bfloat16 (#11912)
* python bf16
* _to_torch_storage_type
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
commit afad7d0cd15a73c728f2a8fa95bf2cdc817a96a4
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Fri Aug 29 09:52:07 2025 -0700
remove dtype from range, it will be dtypes.index soon [pr] (#11914)
* remove dtype from range, it will be dtypes.index soon [pr]
* a few more
commit 30e72d58206a32184dc547042b392c8b5a80dd56
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Fri Aug 29 15:31:00 2025 +0300
multi device and copy tracing for NULL device (#11913)
* add device name to NULL programs
* trace transfers
commit d8e1e4dc61fd5ff9f291bbeb0d2ca022199da0f9
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Fri Aug 29 14:09:33 2025 +0300
tracing: show NULL programs (#11911)
commit 75678b2cbe9793f36b032fe8e6bc59dc27b06015
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Fri Aug 29 09:56:27 2025 +0300
amd: retire pm4 xcc sync (#11835)
* amd: aql default when several xccs
* amd: retire pm4 xcc sync
* remove more
* more
* more
commit 394c2d1db114dcbe80262f8ddb76c3b5947af7bf
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Aug 28 15:12:47 2025 -0700
update Kernel API in tests + move optimize_local_size (#11907)
commit fa695ac1ce4e1c0db945d97509e29fcb0acad04e
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Thu Aug 28 23:29:43 2025 +0300
ci: mac gpuocelot (#11906)
* gm
* fix?
* ops
* imp
* xx
* add file
commit b9b438c516e132e0b1900159fd5df56fbff7769a
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Aug 28 12:34:52 2025 -0700
small updates from postopt (#11903)
* tests from postopt
* modernize
* skip lin tests
* that's fixed?
* skip, not failure
commit bb55a3001f6b895970d04ef2ce00991f26abcef2
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Thu Aug 28 22:17:20 2025 +0300
nv: flush reset message (#11897)
commit e8289c75b13173ab951f5d360c6731d2f19fe67c
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Thu Aug 28 21:20:15 2025 +0300
ci: do not reinstall existing pkgs in macos (#11900)
commit 134cf5690464a1e7fff510db9a7eec7b0677cf3d
Author: chenyu <chenyu@fastmail.com>
Date: Thu Aug 28 13:11:10 2025 -0400
update cache name for gpuocelot (#11896)
commit ea1be2e4cdee9b44cdd4ba0e1265eaa570cd638a
Author: Ben Waldron <140399313+ben-waldron-1@users.noreply.github.com>
Date: Thu Aug 28 16:30:49 2025 +0000
[bounty] Remove using reshape to register symbolic shape (#11771)
* Modify tests and start work towards removing symbolic reshape
* Refactor symbolic reshape
* fix small error
* much cleaner + fix more tests
* Can remove this now
* Update test_symbolic_ops and test_tiny
* Couple more tests
* Unused import
* More tests and add EXPAND to Tensor.empty
* Fix test beam search
* all int
* Fix rangeify by adding shrink
* Remove OOB check and so fix test_symbolic_jit
* test_symbolic_jit doesn't need OOB Context anymore either
* Should remove that test now
* Cleanups part 1
* fix linters
* Final cleanups
* Don't reassign inside for loop
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
commit 53853ae49bbb2ed530660d35df161662f983f91b
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Thu Aug 28 18:58:16 2025 +0300
viz: switch to Path2D (#11892)
commit 874c1db4afc333dccac7788d91a15b687c759203
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Thu Aug 28 18:41:46 2025 +0300
am: init support for aql (#11888)
commit 17ecaf4682fa3bf5960bd5291d5fa27e4af29c0b
Author: Ben Waldron <140399313+ben-waldron-1@users.noreply.github.com>
Date: Thu Aug 28 15:38:27 2025 +0000
Add test_variable_empty (#11889)
* Add test_variable_empty
* Move test and add TODO
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
commit 54be477152e5ef9070f705ab5f4b4c96805de856
Author: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Date: Thu Aug 28 17:31:29 2025 +0200
rope cache optim for jit prune in llm.py (#11678)
* rope cache optim for jit prune
* rope test
* tests in test attention
* Revert "rope test"
This reverts commit 69ede543d02558620ff08fde55d6e50995fca690.
* lint
commit 5f8fe9a331b787196bfba373eec928ad1fb50831
Author: quortus <156855065+quortus@users.noreply.github.com>
Date: Thu Aug 28 16:33:20 2025 +0200
Replace ASSIGN with STORE in test_linearizer (#11821)
commit 4e8370309cd665968b986dfb729eb4ee1693c885
Author: geohotstan <135171913+geohotstan@users.noreply.github.com>
Date: Thu Aug 28 22:17:35 2025 +0800
Support onnx If OP (#11648)
* start
* tiny clean up
* whoops, didn't mean to accidentally fix this
* fix .to(device), kinda hacky and this fix makes it slower?
* merge properly
* FINALLY figured out slowness, also hack pylint for now
* add DEBUGONNX print for subgraph
* oops
* WOOOOOOOO SHAPE CACHE 50% SPEED INCREASE
* small fix, but maybe all deterministic Tensor creation in fp should be cached
* cache condition
* sliiiightly cleaner
* better abstraction?
* remove sam from model_benchmark
* remove shape cache speed up for now
* less lines
* isinstance fix
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
commit 6d6f0dada7d611a6ad30d6ae67e9e3612311f747
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Aug 28 07:02:31 2025 -0700
support for tuple ranges (#11890)
* support for tuple ranges
* breaks it
commit 60dd9a162c92ce1e2e8db34c2367b2f7bc3eebde
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Thu Aug 28 14:07:18 2025 +0300
memory: tiny tlsf cleanup (#11887)
commit beb5982165104af9369493105b003e8e5b44b976
Author: chenyu <chenyu@fastmail.com>
Date: Wed Aug 27 19:59:17 2025 -0400
FUSE_ATTENTION (#11884)
commit cb5295168d4f5814f6dfa7d3f9e2fdfd5a4ef120
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Wed Aug 27 15:22:59 2025 -0700
postrange boilerplate work (#11881)
commit fd579433bcd66a2170a5f47c54e17405e9e70360
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Wed Aug 27 14:52:24 2025 -0700
pre expander shouldn't go in gpudims (#11880)
commit 44816218b55c2c0c02e8e2225aa67e9f81970bc8
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Wed Aug 27 23:54:27 2025 +0300
memplan: fix large buffers planning (#11878)
* memplan: fix large buffers planning
* fix
* fix dsp
commit 400636675279e218269da693fe509d626409787d
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Wed Aug 27 22:36:14 2025 +0300
Revert "memplan: fix large buffers planning (#11876)" (#11877)
This reverts commit 7f90497efcb5f40f13c88b0bd6c61975945ee137.
commit 7f90497efcb5f40f13c88b0bd6c61975945ee137
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Wed Aug 27 22:04:15 2025 +0300
memplan: fix large buffers planning (#11876)
* memplan: fix large buffers planning
* fix
commit e4afdf9ea18ded914ac2d3ecd5313e37744e4823
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Wed Aug 27 11:42:41 2025 -0700
improve DEBUG=2 string with TB/s and TFLOPS [pr] (#11875)
commit e9789d8a707f25b956d2b71dce1fb3397875e30d
Author: Jordan Chalupka <9794216+jordan-chalupka@users.noreply.github.com>
Date: Wed Aug 27 13:56:56 2025 -0400
Add mxfp4 support (#11873)
* bump ggml url
* map mxfp4 to tensor
* tests
commit 884eb53e892046e449f380c86a91ec1ac50813cd
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Wed Aug 27 15:50:43 2025 +0300
tracing: fix types (#11871)
* tracing: fix types
* /profiler isn't a thing
* return list
commit d39365809ae344f91b6c41993b2e72866bda7416
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Wed Aug 27 03:38:15 2025 +0200
add ctx to z3_renderer arg (#11867)
* add ctx to z3_renderer arg
* update symbolic fuzzer
* rewrite u1,u2,u3
* update fuzz_fast_idiv
* remove imports
commit 24c00a40612f5266f9e189617c41c82487dbe6bb
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Tue Aug 26 15:57:50 2025 -0700
darken hex on viz (#11865)
* darken hex on viz
* more readable
commit f38e4af22615500fb72c876b08823cbe7100581c
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Wed Aug 27 01:30:29 2025 +0300
viz: add custom zoom filter (#11861)
commit 62df6c39aff0cbb2c02bb4e473bde652a5dc0f54
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Wed Aug 27 01:26:45 2025 +0300
amd: correct handling of relocations (#11863)
* amd: correct handling of relocations
* ops
* add
commit d261458ecd49562c7bac4e293a625b55cc2dc6fb
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Tue Aug 26 14:32:12 2025 -0700
add colors to range (#11860)
commit 7dfc7e4abcc5204fa741ab2bc1f50a11001fe3fa
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Tue Aug 26 22:58:05 2025 +0200
uops_to_z3 helper(#11859)
commit 1bbb578afd39465132a167e1c0d443705c4a64f4
Author: chenyu <chenyu@fastmail.com>
Date: Tue Aug 26 16:03:03 2025 -0400
named expression for POW and MAX gradient (#11858)
commit 7028cb41677b66a9b80ef05ecbd87b309dee7393
Author: chenyu <chenyu@fastmail.com>
Date: Tue Aug 26 15:26:47 2025 -0400
clean up TestBitcastConstFolding (#11856)
commit d4154e0349f3854bca7583275c3c79e239e5d55f
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Tue Aug 26 12:05:48 2025 -0700
split devectorizing of buf/index (#11855)
commit b268755d51d73543ee7bb7586bbae847ff39cd5c
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Tue Aug 26 11:56:16 2025 -0700
small changes from postopt (#11854)
commit a3aeef45ccac58a17df1893e3b91159930235ec4
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Tue Aug 26 19:27:05 2025 +0200
associative variation of where branch-merging (#11851)
* add rule and test
* change comment
commit aabe7756bee49d9206f1fcda650940d2c04625dd
Author: chenyu <chenyu@fastmail.com>
Date: Tue Aug 26 13:22:30 2025 -0400
fix type in fold_bitcast [pr] (#11853)
commit 4785cd959ab7e609021f6223501d3360fc0e0380
Author: Jordan Chalupka <9794216+jordan-chalupka@users.noreply.github.com>
Date: Tue Aug 26 12:49:51 2025 -0400
[TYPED=1] cvar should allow dtype as a tuple (#11770)
* cvar dtype:DType|tuple[DType, ...]|None=None
* fmt
* add a test
* list typeguard as a dep for CI
* extra step to install mypy
* fix venv
* ci fixes
* mv typeguard to testing install group
* simpler TYPED=1 test
* add typeguard to lint group
commit b111076301d9d5d4223ccf566b0f1da968793929
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Tue Aug 26 19:25:42 2025 +0300
viz: fixup click on overlay rect (#11850)
commit 1dd613cb898b4bd633aedbbbc0b1e536b2ca53f3
Author: b1tg <33436708+b1tg@users.noreply.github.com>
Date: Wed Aug 27 00:16:10 2025 +0800
test float_to_bf16 round-to-even behavior (#11849)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
commit 409399c609b2ca48a5bdb1c6d2b1ec127d6bc76e
Author: b1tg <33436708+b1tg@users.noreply.github.com>
Date: Tue Aug 26 23:42:25 2025 +0800
fix nan in float_to_bf16 (#11843)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
commit 43d5d66d34a5f0ec2842a93720c065e62b3ba0af
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Tue Aug 26 18:31:52 2025 +0300
viz: add UOp ports to edges (#11847)
* viz: add UOp ports to edges
* one edge label
* g.tag styling
* replace with NodeList
commit f28f613f85e4b204e9f9af0f4cd9684793257073
Author: chenyu <chenyu@fastmail.com>
Date: Tue Aug 26 11:14:06 2025 -0400
improved float_to_bf16 (#11848)
round instead of truncate
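Two commits above touch float_to_bf16: one switches truncation to rounding, the other fixes NaN handling. Both correspond to a common bit-level float32-to-bfloat16 conversion. The sketch below is plain Python, not tinygrad's implementation, and assumes round-to-nearest-even semantics. `float_to_bf16_bits` and `bf16_bits_to_float` are hypothetical helper names.

```python
import struct

def float_to_bf16_bits(x: float) -> int:
  # reinterpret the float32 encoding of x as an unsigned 32-bit integer
  # (struct raises OverflowError if x is finite but too large for float32)
  u = struct.unpack("<I", struct.pack("<f", x))[0]
  if (u & 0x7fffffff) > 0x7f800000:
    # NaN: handle before rounding, since the add below could carry into the sign/exponent,
    # and set a mantissa bit so a NaN whose payload is only in the dropped bits stays NaN
    return (u >> 16) | 0x0040
  # round to nearest, ties to even: add 0x7fff plus the lowest kept bit, then drop 16 bits
  u += 0x7fff + ((u >> 16) & 1)
  return (u >> 16) & 0xffff

def bf16_bits_to_float(b: int) -> float:
  # bf16 is the top half of a float32, so widening back is exact
  return struct.unpack("<f", struct.pack("<I", (b & 0xffff) << 16))[0]
```

For example, `1.0 + 2**-8` is an exact tie and rounds down to the even encoding `0x3f80`. `1.0 + 3 * 2**-8` rounds up to `0x3f82`, where plain truncation would give `0x3f81`.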
commit afe14ccbfa86f92dca9437c49005a25250cca30c
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Tue Aug 26 15:16:36 2025 +0300
amd: aql default when several xccs (#11832)
commit 3674c0754ecd2fae4cf8470d83d69fa486d1fd40
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Tue Aug 26 14:56:13 2025 +0300
viz: small uop click changes (#11846)
* also highlight self
* can always unselect by clicking outside
* less layout
commit f2a3c2737272dfd19d8f3301e669b2a627299ffd
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Tue Aug 26 13:29:59 2025 +0300
viz: g.edges() once (#11845)
commit b0df3e62a829ea19193cc5ee108ba398bdf10aa1
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Tue Aug 26 09:03:09 2025 +0300
viz: light up srcs and paths on UOp click (#11844)
* viz: light up srcs and paths on UOp click
* safari doesn't have context-stroke
* safari also has a bug
* safari acceptance
commit 6236749867601244c6b16d2fa360642517ecdf9b
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Tue Aug 26 07:55:34 2025 +0300
viz: move rect styles to classes (#11842)
* viz: move rect styles to classes
* add rect
commit 81ffa07439d29c0eb06145a38a4041901a94228f
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Tue Aug 26 07:00:43 2025 +0300
viz: pass through nodes without a link (#11841)
commit 265d287615a607e0544b3281a1ad831fae63827c
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Tue Aug 26 05:21:06 2025 +0200
add decomp for !x&!y -> !(x|y) (#11836)
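This rewrite is De Morgan's law. It also saves one operation: two NOTs and an AND become one OR and one NOT. The identity can be checked exhaustively without tinygrad's pattern matcher, both for booleans and bitwise on small unsigned integers. This is a standalone sanity check, and the function names are hypothetical.

```python
from itertools import product

def check_demorgan_bool() -> bool:
  # !x & !y == !(x | y) for every boolean assignment
  return all(((not x) and (not y)) == (not (x or y)) for x, y in product([False, True], repeat=2))

def check_demorgan_bits(width: int = 8) -> bool:
  # the same identity bitwise on width-bit unsigned integers: ~x & ~y == ~(x | y)
  mask = (1 << width) - 1
  return all(((~x & mask) & (~y & mask)) == (~(x | y) & mask)
             for x in range(1 << width) for y in range(1 << width))
```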
commit 337e979a599d91f98782fa5db96a049d2327cb7c
Author: chenyu <chenyu@fastmail.com>
Date: Mon Aug 25 22:08:26 2025 -0400
call dtypes.as_const in Tensor(list) (#11840)
commit 215818379b602dbadebfd7a2eb54dabb165cab00
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Mon Aug 25 18:03:00 2025 -0700
new (post) group for reduce (#11837)
* new (post) group for reduce
* fixes
* leave if
* fix locals
* size
* no vectorized buf
* image fixes
* don't track that
* fix ptx
* name buffer with reduce range
* remove unused in lowerer
* yay DEFINE_REG refactor
commit ac3449b0c87fbf80237dc45d40d053b40e32cf39
Author: chenyu <chenyu@fastmail.com>
Date: Mon Aug 25 19:03:41 2025 -0400
truncate_fp16 cleanup (#11838)
native `@` is default
commit e146418f6566301ffe8ebea093c0081409241c8d
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Mon Aug 25 15:56:42 2025 +0300
hotfix: profiler content-type is application/octet-stream (#11831)
commit a1f68230603e56669636f1864592129783c58a75
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Mon Aug 25 14:49:33 2025 +0300
viz: memory layout in client side (#11830)
* viz: memory layout in client side
* update test_viz
commit a6dbb0905836e0569f7bf3dc27b20a9bac55542e
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Sun Aug 24 17:37:07 2025 -0700
changes for postrange (#11828)
commit 27701ef82344a70e6cb53cf909db1809e1d8d47c
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Sun Aug 24 14:03:12 2025 -0700
add locals support to rangeify (#11826)
commit a286a1a6f7e2c761ee09ec9949799e1a78da4c6f
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Sun Aug 24 20:04:25 2025 +0200
Fast idiv try removing factors of two before cast (#11824)
* try removing factors of two
* dont return if None
* add test
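Fast division by a constant replaces `x // d` with a multiply and a shift. Stripping factors of two first relies on `x // (d_odd * 2**k) == (x >> k) // d_odd`. Both the odd divisor and the shifted input range are smaller, so the multiplier and the product `x * m` tend to need fewer bits, which decides whether a cast to a wider integer type is needed.

The sketch below shows this generic multiply-shift technique under that assumption. `magic_unsigned` and `make_fast_div` are hypothetical names, and the exact condition tinygrad uses may differ.

```python
def magic_unsigned(d: int, max_x: int) -> tuple[int, int]:
  """Smallest shift s, with m = ceil(2**s / d), such that (x * m) >> s == x // d for 0 <= x <= max_x."""
  assert d > 0 and max_x >= 0
  s = 0
  while True:
    m = -(-(1 << s) // d)        # ceil(2**s / d)
    err = m * d - (1 << s)       # 0 <= err < d
    if max_x * err < (1 << s):   # sufficient for exactness: x * err / 2**s < 1 for every x <= max_x
      return m, s
    s += 1

def make_fast_div(d: int, max_x: int, strip_twos: bool = True):
  """Return f with f(x) == x // d for 0 <= x <= max_x, using only shifts and one multiply."""
  k = (d & -d).bit_length() - 1 if strip_twos else 0  # number of trailing zero bits in d
  # x // (d_odd * 2**k) == (x >> k) // d_odd, and (x >> k) never exceeds max_x >> k
  m, s = magic_unsigned(d >> k, max_x >> k)
  return lambda x: ((x >> k) * m) >> s
```

For `d = 24` and `max_x = 2**16`, `magic_unsigned(24, 2**16)` gives `(43691, 20)`, so the largest product needs 32 bits. After stripping the factor 8, `magic_unsigned(3, 2**13)` gives `(10923, 15)`, and the largest product fits in 27 bits.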
commit a03b930339bc1c44678da49b8a33d7f20616b239
Author: George Hotz <geohot@gmail.com>
Date: Sun Aug 24 10:25:14 2025 -0700
hotfix: green v2 in docs
commit 6540bb32a6d69f85a145da4a95ad531dda33ce50
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Sun Aug 24 10:23:25 2025 -0700
move into codegen late [pr] (#11823)
commit bba088ef1152358f1e57c201130090e55b2e2801
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Sun Aug 24 19:53:00 2025 +0300
amd aql queue (#11708)
* amd aql queue
* xcc
* fiz
* aql better
* llvm
* no for aql
* wrap
* is_sql
* am support
* complete
* fix
* mypy
* minor
commit 1fa09d9edea13f3510d49b0c5231f97eb378eccd
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Sun Aug 24 09:41:34 2025 -0700
BLOCK_REORDER is context var, heuristic cleanups [pr] (#11819)
* BLOCK_REORDER is context var, heuristic cleanups [pr]
* split get opt and do opt
* oops, should be on
commit 8b18cc2a94f87e6752d093d8ea94a8c04400c491
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Sun Aug 24 19:37:31 2025 +0300
viz memory layout cleanup (#11820)
* rename to dtype_size
* cleanr memory shape creator
commit dd691145736ae3ff576c5890f3e5c5dd26b9c45b
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Sun Aug 24 18:11:24 2025 +0200
Revert "Better div nesting (#11811)" (#11818)
This reverts commit 952f729b072eaf1b6f1a8f2e993f60d8dd913045.
commit e19f90133081bb2eb2e63f8964d7ae7e2165c9f8
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Sun Aug 24 18:03:45 2025 +0300
amd: rptr/wptr in create_queue (#11817)
commit d71444857e27fb3478a988d1f980401c31004f3e
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Sun Aug 24 17:48:40 2025 +0300
amd: apply relocs for kernel_code_entry_byte_offset for AMD_LLVM (#11816)
* amd: apply relocs for kernel_code_entry_byte_offset for AMD_LLVM
* fix
commit 44bc7dc73d7d03a909f0cc5c792c3cdd2621d787
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Sat Aug 23 19:55:41 2025 -0700
remove KernelInfo from GROUP_REDUCE (#11814)
commit 229adfb7c3cc7ae1e7f28502ff1e3a10b271018b
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Sat Aug 23 19:37:10 2025 -0700
Revert "remove KernelInfo from gpudims (#11809)" (#11813)
This reverts commit 846753f343551ff9f648c00e7b4006ad5bc4d9c5.
commit 952f729b072eaf1b6f1a8f2e993f60d8dd913045
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Sun Aug 24 04:17:40 2025 +0200
Better div nesting (#11811)
* remove check
* use fold_divmod_congruence instead of simplify
* adjust tests
* shorten line
commit e652062f9221df4f8e6a2065190dc3792bb3c4ab
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Sun Aug 24 02:59:02 2025 +0200
tweak divmod_folding condition (#11810)
commit 846753f343551ff9f648c00e7b4006ad5bc4d9c5
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Sat Aug 23 16:32:45 2025 -0700
remove KernelInfo from gpudims (#11809)
* remove KernelInfo from gpudims
* that's good in there
commit 07d4ed7e4cf48ba9103fdbd0ef873398f18d83e0
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Sun Aug 24 01:15:04 2025 +0200
one more symbolic add variation (#11807)
commit 759ebea4ebd56296e0ef85a2a13e763423d24103
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Sun Aug 24 02:12:12 2025 +0300
viz: reflect timeline API boundary in names (#11808)
* define shapes once
* depth isn't an event property
* update server naming
commit 132f09fab7ec431f410b1f79b22ab1fb922a3fb2
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Sat Aug 23 15:49:17 2025 -0700
global/locals from AxisType in range (#11806)
commit 0d86288bd7bff683b5e0896a779443184afd9e34
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Sun Aug 24 01:44:40 2025 +0300
viz: calculate timeline fixed points in client side (#11805)
* viz: calculate timeline fixed points in client side
* 26 bytes / event
* math
commit a75da499512d8a03ad2eb658a190e4f71e91f605
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Sat Aug 23 14:44:48 2025 -0700
use AxisType for UPCAST/UNROLL (#11800)
* use AxisType for UPCAST/UNROLL
* fixes
* fix the bug
* fix hack
* bad test
* flaky test
commit 2407fecdae1bc6c59eca147d099173471f2abd52
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Sat Aug 23 23:50:21 2025 +0300
viz bytepack format (#11792)
* viz bytepack format
Training a 1B llama yields ~20M profiler events.
With JSON serialization, the browser tries to load 6GB to memory. This OOMs since each tab is limited to <3-4GB memory usage. Using a packed format, we only need ~600MB.
**Design decisions:**
- Timestamps are in microseconds relative to start time. They're stored in u32, which can express up to ~1 hr of trace events.
- Strings (kernel names, metadata, etc) are deduped.
- Buffer sizes are in u64 nbytes.
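A minimal standard-library sketch can make these design decisions concrete. The record layout below is an assumption for illustration, not the actual viz format: a u32 string index, a u32 start offset and a u32 duration, plus a length-prefixed JSON string table. The sketch also omits the u64 buffer-size records. A u32 counter of microseconds tops out at 2**32 µs, about 71.6 minutes, which is where a limit of roughly one hour comes from.

```python
import json
import struct

# hypothetical fixed-size record: u32 string index, u32 start offset (us), u32 duration (us)
EVENT_FMT = "<III"
U32_MAX = 2**32 - 1

def pack_events(events: list[tuple[str, int, int]]) -> bytes:
  """events are (name, start_us, end_us) with absolute microsecond timestamps."""
  t0 = min((st for _, st, _ in events), default=0)  # timestamps become relative to the trace start
  strings: dict[str, int] = {}
  body = bytearray()
  for name, st, en in events:
    idx = strings.setdefault(name, len(strings))  # each distinct name is stored only once
    rel, dur = st - t0, en - st
    if not (0 <= rel <= U32_MAX and 0 <= dur <= U32_MAX):
      raise ValueError("offset does not fit in u32 microseconds (about 71.6 minutes)")
    body += struct.pack(EVENT_FMT, idx, rel, dur)
  table = json.dumps(list(strings)).encode()  # deduped string table as a JSON array
  return struct.pack("<I", len(table)) + table + bytes(body)

def unpack_events(blob: bytes) -> list[tuple[str, int, int]]:
  (n,) = struct.unpack_from("<I", blob, 0)
  strings = json.loads(blob[4:4 + n])
  return [(strings[i], rel, rel + dur) for i, rel, dur in struct.iter_unpack(EVENT_FMT, blob[4 + n:])]
```

Each event then costs 12 fixed bytes, and a repeated kernel name is paid for only once in the string table.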
More optimization possible:
- The string lookup is a JSON dumped array, we can compress this.
- Can store less for memory by moving the layout to client.
**Results**
| Workload | Events | JSON size | bytepack size |
|----------------|---------|-------------|-------------|
| DP=8 llama 1B train (`command: [1]`) | 24M | 5.8GB | 640MB |
| examples/beautiful_mnist.py | 16K | 3.7MB | 745KB |
| examples/gpt2.py | 55K | 12.54MB | 1.40MB |
`[1]`: `VIZ=1 FAKEDATA=1 OFFLOAD_OPTIM=1 DP=8 BS=8 GRADIENT_ACC_STEPS=2 BLOCK_REORDER=0 LR=3e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=8192 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py`
* python reference decoder
* 27 bytes / event, 1hr hard limit
commit b12d1d866c0fb1bae73f384b12ac2bb3f0cbc459
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Sat Aug 23 23:35:27 2025 +0300
count bytes per kernel in test_viz (#11801)
Currently at ~100 bytes/kernel with JSON.
commit 6a50ab6b8755fdd3de565e66017d85154ca7d456
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Sat Aug 23 22:25:51 2025 +0200
adjust idiv min_max (#11802)
* change div min_max
* add tests
commit 9d4cccd0f95d86f12ce2411a79ff3dd9c055f15b
Author: chenyu <chenyu@fastmail.com>
Date: Sat Aug 23 15:11:17 2025 -0400
test_dtype_alu cleanups (#11799)
commit aefabaf77496abccc4d8174b03a36b697538c3d3
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Sat Aug 23 11:15:00 2025 -0700
add AxisType to range (#11798)
* add AxisType to range
* missed them
* fix that test
* fix that test
commit b9758304245e52bffea909da64a7a9b5891910af
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Sat Aug 23 19:20:29 2025 +0300
add profile loader helper in test_viz (#11797)
commit 7123df39285186d537b2c657a8e722b4f5c2230b
Author: chenyu <chenyu@fastmail.com>
Date: Sat Aug 23 11:52:29 2025 -0400
Use Tensor.logaddexp to implement Tensor.softplus (#11796)
instead of piecewise linear, numerical is handled by logaddexp. jax does this and i think it's more elegant than torch's approach
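The identity behind this change is softplus(x) = log(1 + e^x) = logaddexp(x, 0). logaddexp stays finite because it factors out the larger argument. The scalar pure-Python sketch below illustrates that math and is not tinygrad's Tensor method. The `beta` scaling follows the usual softplus definition and is an assumption here.

```python
import math

def logaddexp(a: float, b: float) -> float:
  """log(exp(a) + exp(b)) computed without overflowing exp."""
  hi, lo = max(a, b), min(a, b)
  if hi == lo:                 # also covers inf/inf and -inf/-inf, where lo - hi would be nan
    return hi + math.log(2.0)
  return hi + math.log1p(math.exp(lo - hi))  # lo - hi < 0, so exp cannot overflow

def softplus(x: float, beta: float = 1.0) -> float:
  """log(1 + exp(beta * x)) / beta, written as logaddexp(beta * x, 0) / beta."""
  return logaddexp(beta * x, 0.0) / beta
```

`softplus(1000.0)` returns `1000.0`, whereas the naive `math.log(1 + math.exp(1000.0))` raises `OverflowError`.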
commit aaea6b97adcb98b811323c95c3b4c8cb2a99d81e
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Sat Aug 23 17:34:07 2025 +0300
viz memory: compute nbytes (#11795)
* viz memory: compute nbytes
* local map
commit 58653b5eaeebeb3fbadfe556de3be8e2170b21e4
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Sat Aug 23 16:19:44 2025 +0300
viz: store memory scale (#11794)
commit fb8ee02424d8e27dd28ba92eb28c89c37cb62c21
Author: chenyu <chenyu@fastmail.com>
Date: Sat Aug 23 09:15:00 2025 -0400
Tensor.logaddexp (#11793)
commit 5a6817d5f87649fd6d229ebee2f993ed40a333ed
Author: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Date: Sat Aug 23 05:56:19 2025 +0200
Fix z3 rendering of floats in indexing (#11740)
* Fix floating point comparison in indexing
* wrap in noop
* update tests
* improve rules for loading and comparing floats
* add test cast to bool
commit 4267c45db35f2bb9ffb871146c22340c7179c2b5
Author: chenyu <chenyu@fastmail.com>
Date: Fri Aug 22 23:13:45 2025 -0400
non-supported dtype in transcendental (#11754)
* non-supported dtype in transcendental
`CPU=1 python3 test/test_dtype_alu.py TestDTypeALU.test_bfloat16_unary` works
* test
* works on real mac
commit e39b25cd36ad9bbd59921f36f19dee46b0976089
Author: chenyu <chenyu@fastmail.com>
Date: Fri Aug 22 20:16:34 2025 -0400
upcast float exp to at least float32 (#11758)
* upcast float exp to at least float32
* unlucky seed
commit b057a90d493664d37558eb6c5447bc5bd5c15009
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Fri Aug 22 20:08:58 2025 +0300
memory: rename is_huge_page -> is_page (#11786)
commit 38f0fa7bde5d25b3df4c0ecfa6a883e5985dfd6a
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Fri Aug 22 20:00:48 2025 +0300
viz: only send trace duration (#11789)
* viz: only send trace duration
* can unwrap
commit 1c81ec924832122d7712add6b065c90baa0133d5
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Fri Aug 22 19:47:49 2025 +0300
viz: rename to start/end timestamp (#11788)
commit 9ff03680ba047a21b843a4e9299ed43e68730701
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Fri Aug 22 19:30:21 2025 +0300
viz: store relative timestamps (#11787)
* viz: store relative timestamps
* err
* update test
commit 698392334f76c84dcd08b6099bcdc202b74bdfef
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Fri Aug 22 18:21:32 2025 +0300
system: message for eaccess as well (#11785)
commit 1e679bd789ec266c0c7d6843e7d6cdc950df135d
Author: geohotstan <135171913+geohotstan@users.noreply.github.com>
Date: Fri Aug 22 20:31:24 2025 +0800
fix max_unpool2d inf (#11784)
* start
* add regression test for maxunpool2d
commit 9832599c9e3e8f06daa281097b7c7d1288b33f01
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Aug 21 22:39:35 2025 -0700
test_vmap + permute isn't a sint (#11783)
* test_vmap + permute isn't a sint
* order
commit bb8de51e5f9c5b9be7823d41ed83541f412b2b43
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Aug 21 20:04:45 2025 -0700
remove unused early cleanups + contig w range [pr] (#11780)
* remove unused early cleanups [pr]
* contiguous with range
* woah, this works
commit 91a4de4ca72fcf63c9010e709292b25912408177
Author: chenyu <chenyu@fastmail.com>
Date: Thu Aug 21 21:55:32 2025 -0400
fix getitem with inf in tensor (#11781)
commit 66e9d54eed766626ddde4f53319fbdc5ca0925c7
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Aug 21 16:53:58 2025 -0700
RANGEIFY=2 is partial contig (#11777)
commit 8de6db15ac42e5213556f48fa05bb71202836f17
Author: Jordan Chalupka <9794216+jordan-chalupka@users.noreply.github.com>
Date: Thu Aug 21 18:37:50 2025 -0400
exclude .git from ruff (#11773)
commit 5954a0975fefc690be04c15ac7a0be3f32189075
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Aug 21 15:15:54 2025 -0700
fix some assigns on rangeify (#11774)
* fix some assigns
* llvm test
* more tests
* upd test
commit 2e0eb885490b87d4c69f6018e5fa029b4706cd02
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Fri Aug 22 00:18:45 2025 +0300
viz: add metadata to UOp tracing (#11772)
* viz: add metadata to UOp tracing
* place after tag
* optional field
* err, refcount of root must be 0
commit d6f9606e936f3799d22f44bbb118128056b905b3
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Aug 21 11:15:09 2025 -0700
small cleanups to rangeify (#11769)
commit bd4a9473b0bc8bc008569e8bc0b1ba1824fad159
Author: uuuvn <83587632+uuuvn@users.noreply.github.com>
Date: Thu Aug 21 17:51:49 2025 +0000
Multihost exception handling (#11729)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
commit a2c7b807e0d77db34c233a0e669e31b79f349097
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Thu Aug 21 10:10:56 2025 -0700
don't bufferize 0s (#11766)
commit 9eff7cd1d8e24e114b712b3531c439a85c70af2c
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Thu Aug 21 18:28:13 2025 +0300
am: support 64bit discovery (#11768)
commit 56cd47a159df9f63a4b77f8e0dc4c0a80c76cc40
Author: b1tg <33436708+b1tg@users.noreply.github.com>
Date: Thu Aug 21 21:33:28 2025 +0800
fix amd llvm bf16 tc (#11713)
* fix amd llvm bf16 tc
* is_cdna
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
commit a04464811145b950a330abc20d9cf82afcdffa90
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Wed Aug 20 20:55:49 2025 -0700
rangeify load cleanups + multi support (#11765)
* use the old buf_uop + cleanups
* simpler handling of load
* everything needed for multi too
commit 9f94c25a254efce7161a56254d7f82cd7b05b764
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Wed Aug 20 18:35:42 2025 -0700
fix symbolic usage. use shrink, not reshape (#11762)
* fix test_var
* revert those things
* fix the ones in test tiny
* use better syntax
* it's the same, but that's clearer
* fix pad
commit 5276fbc9c574ff8b964c5368f2b4f6f9db728d2e
Author: chenyu <chenyu@fastmail.com>
Date: Wed Aug 20 20:35:40 2025 -0400
fix gather with inf values (#11760)
(mask * x) is wrong because 0*inf is nan. i feel we have a lot of those still...
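The failure mode described here is easy to reproduce without tinygrad. Suppose a gather is written as a one-hot mask multiplied by the values and then summed. It returns NaN as soon as any masked-out element is inf, because 0 * inf is NaN in IEEE 754. Selecting with a where-style conditional avoids the multiply. The sketch below is plain Python, and the function names are hypothetical.

```python
import math

def gather_mul(x: list[float], idx: int) -> float:
  # one-hot mask times values, then sum: 0.0 * inf is nan and poisons the result
  return sum((1.0 if i == idx else 0.0) * v for i, v in enumerate(x))

def gather_where(x: list[float], idx: int) -> float:
  # select instead of multiply: masked-out entries contribute an exact 0.0
  return sum(v if i == idx else 0.0 for i, v in enumerate(x))

x = [1.0, math.inf, 3.0]
print(gather_mul(x, 2))    # nan
print(gather_where(x, 2))  # 3.0
```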
commit b979162c5dea9cdbc4e7e1e3bdb1cfbab1449e2c
Author: wozeparrot <wozeparrot@gmail.com>
Date: Wed Aug 20 19:56:35 2025 -0400
llama3 eval train (#11706)
commit dbd3b67657b03fa6237116467bef0aac4d9d9dcd
Author: chenyu <chenyu@fastmail.com>
Date: Wed Aug 20 19:55:50 2025 -0400
clamp GRAD_CLIP_NORM in llama (#11761)
commit 9635592141e9e01623cd4bd92470f2286a9caeac
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Wed Aug 20 14:22:44 2025 -0700
** rangeify, try 3 (#11683)
* ** rangeify, try 3
* bring that over
* bufferize, don't use contig tag
* work
* ish
* fix rangeify
* flash attention is back
* fix rangeify tests
* stuff passes
* fix test_log_softmax
* more stuff passes
* progress children
* new endrange solution
* progress
* progress counter
* basic assign
* contigs only
* symbolic in schedule
* unbind_kernel
* late children
* ops fixed
* beautiful mnist is close
* that seems to work
* mnist works
* improve names
* fix bmnist
* no pcontig
* testing backward
* work
* clone movement ops
* new_range helper
* MBLOCK/MERGE
* ops tests pass
* revert mblock stuff
* cleanups...but it breaks ops
* remove reindex
* hack for relu
* disable the hacks
* more hacks
* upd
* mostly works with cleanups disabled
* ndr
* ops tests pass
* terrible hacks for indexing to work
* context mismatch
* pcontig
* split pcontig v contig
* z3 trunc
* null
* no fuse in rangeify
* ops test passes
* lnorm
* fix assign
* nd rangeify
* both should work
* tests for rangeify
* cleanups
* stores pass the pointer through
* disable pcontig for now
* PARTIAL_CONTIG is a flag
commit d7553721d1f216fd0780513a9a68bb7c53b33bdb
Author: chenyu <chenyu@fastmail.com>
Date: Wed Aug 20 14:36:18 2025 -0400
clean up test_dtype_alu (#11757)
remove the check that looks into schedule, only test if output matches
commit 5f08a3e9289b0cf8fe7fa8f48e03d3d1966961a4
Author: chenyu <chenyu@fastmail.com>
Date: Wed Aug 20 12:18:35 2025 -0400
hotfix: cast half to float in Tensor.tolist (#11755)
workaround for python < 3.12
commit de4cb722a4ec68e9002eae3903d0b91f9e98b7c1
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Wed Aug 20 18:39:51 2025 +0300
viz: add metadata and var_vals tracing (#11753)
* viz: add metadata and var_vals tracing
* add test_trace_metadata
* set TRACEMETA=1
commit 6589c9e643cc5f25d6b39fb8b5d48421f5a654e5
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Wed Aug 20 17:50:51 2025 +0300
hcq: better errors for ifaces (#11751)
* hcq: better errors for ifaces
* fix linter
* typo
* space
commit be7b0b69709ea8f223b9dc3bee85063e18759100
Author: chenyu <chenyu@fastmail.com>
Date: Wed Aug 20 10:29:36 2025 -0400
TRANSCENDENTAL_SUPPORTED_DTYPES->TRANSCENDENTAL_DTYPES (#11752)
commit 220a2a88d7e87b2648839c3044065d8c8de82ccd
Author: ttomsa <tomasvsilva8@gmail.com>
Date: Wed Aug 20 14:35:10 2025 +0100
a*(1/b) -> a/b on LLVM, CPU (#11743)
* add fdiv rewrite
* :)
* use float_lop
* use reciprocal()
* revert
* move to decompositions
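This rewrite is not bit-for-bit neutral. `a / b` is a single correctly rounded operation, while `a * (1 / b)` rounds twice. Python floats are IEEE 754 doubles, so the quick check below is only an analogy for the float32 kernels this commit targets, not a test of them.

```python
def two_step(a: float, b: float) -> float:
  return a * (1.0 / b)   # reciprocal rounded, then product rounded: two roundings

def one_step(a: float, b: float) -> float:
  return a / b           # single correctly rounded division

mismatches = [n for n in range(1, 100) if two_step(float(n), float(n)) != one_step(float(n), float(n))]
print(two_step(49.0, 49.0))  # 0.9999999999999999
```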
commit 12ab3f8b06ad594c6ab0c39c5e247f34db2001f4
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Tue Aug 19 22:21:07 2025 -0700
correct row_count in process replay (#11748)
commit 8af8808c61a97f1a7b4fb79b190bd517e43bba8b
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Tue Aug 19 21:21:07 2025 -0700
cleanup tests, bump caches (#11746)
commit 00391db628f365e9c02fec3b8d48688e839d6a30
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Tue Aug 19 20:18:45 2025 -0700
no ast for mem estimate (#11744)
* no ast for mem estimate
* skip for webgpu
commit dd413e12086e7013eaed9627c85f78576651e572
Author: chenyu <chenyu@fastmail.com>
Date: Tue Aug 19 16:21:28 2025 -0700
remove a Ops.REDUCE check in reduce_collapse [pr] (#11734)
commit 70c3f1fb290da9af7dfcc25c3f6126567c831ba4
Author: ttomsa <tomasvsilva8@gmail.com>
Date: Wed Aug 20 00:08:16 2025 +0100
x.where(False, True) -> !x (#11738)
* add pat
* add test
commit 1d307f568c609db0966d61eddbf90e680fb9fbc1
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Tue Aug 19 16:02:20 2025 -0700
move device tests to test/device + test cleanups (#11735)
* move device tests to test/device
* test speedups
* test device
* linalg to unit
* upd
* so pytest just works
* more divide and skip
* speed
* test devectorize
* add pillow
commit bcc7623025d39f4994eab0394beb83662d879ec8
Author: wozeparrot <wozeparrot@gmail.com>
Date: Tue Aug 19 17:08:56 2025 -0400
feat: bump version to 0.11.0 (#11736)
commit 8c987b3293f424b92588cec49ec8687273d1dc86
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Tue Aug 19 23:30:50 2025 +0300
DISABLE_FAST_IDIV is a context var [pr] (#11733)
commit bf467c623d6c0928134e371d2ec168098235b75c
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Tue Aug 19 12:51:54 2025 -0700
changes from rangeify + better NullRenderer (#11732)
* changes from rangeify + better NullRenderer
* fix test
commit 02353588cb22e4bc66501750d1e61e7656a6604c
Author: chenyu <chenyu@fastmail.com>
Date: Tue Aug 19 09:25:58 2025 -0700
small getitem cleanup (#11730)
commit 712a5c651a863ab01ce6f93e84092d1b79b9089a
Author: chenyu <chenyu@fastmail.com>
Date: Tue Aug 19 05:07:38 2025 -0700
minor Tensor.triu cleanup (#11728)
less confusing dtype
commit 9c9e337c7815513dcb84582a7f040a5048bea972
Author: nimlgen <138685161+nimlgen@users.noreply.github.com>
Date: Tue Aug 19 15:06:09 2025 +0300
amd: parse soc enums (#11727)
* amd: parse soc enums
* remove from mock
* fix
* minimal amd_gpu
commit 57ad69160a8d86bd698042f53af8da7724bc3705
Author: qazal <77887910+Qazalin@users.noreply.github.com>
Date: Tue Aug 19 08:03:29 2025 +0300
viz: inline memory shape spec (#11725)
commit c5b52e9321e02e25302abe419847b9dfba72413b
Author: chenyu <chenyu@fastmail.com>
Date: Mon Aug 18 20:34:42 2025 -0700
onnx RotaryEmbedding cleanup (#11724)
commit 31619774a9dfe96e5449cfc0b31a87f483d31a55
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Mon Aug 18 19:44:35 2025 -0700
Revert "Revert "fix the misused cast in amd llvm tc (#11711)" (#11715)" (#11723)
This reverts commit ca28db5a97af52b03bd56fa0aa4fe45a18457a59.
commit 2ea54d733778308ff9c920002e80165d63191367
Author: George Hotz <72895+geohot@users.noreply.github.com>
Date: Mon Aug 18 17:49:45 2025 -0700
improve syntax of UPats using f [pr] (#11717)
Co-authored-by: chenyu <chenyu@fastmail.com>
commit b67345caa3fa2cdcbe8e1669a12ae3ec643bef1a
Author…

I worked on addressing the #Note/TODO comments to optimize RoPE so that the JIT prunes the RoPE cache computation outside the kernel. The JIT can now optimize and prune redundant RoPE computation because it sees a consistent tensor shape, T + start_pos. I think the computation pattern and precision are now much better. I tried to keep the math and code as compact as possible, in as few lines as possible.
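A RoPE cache usually works as follows. The cos/sin table is computed once for every position up to the maximum context. Each step then only reads the rows for positions start_pos .. start_pos+T-1, so the table itself can be treated as a constant rather than recomputed.

The pure-Python sketch below (no tinygrad) illustrates that idea only. The interleaved pairing of dimensions (2i, 2i+1), `theta=10000.0`, and both function names are assumptions and need not match llm.py.

```python
import math

def precompute_rope_table(head_dim: int, max_context: int, theta: float = 10000.0):
  # computed once: row p holds (cos, sin) of p * freq_i for each rotated pair i
  freqs = [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
  return [[(math.cos(p * f), math.sin(p * f)) for f in freqs] for p in range(max_context)]

def apply_rope(x: list[list[float]], table, start_pos: int) -> list[list[float]]:
  # x holds T rows of head_dim values; row t sits at absolute position start_pos + t,
  # so only table[start_pos : start_pos + T] is read on this step
  assert start_pos + len(x) <= len(table), "positions exceed the precomputed context"
  out = []
  for vec, cs in zip(x, table[start_pos:start_pos + len(x)]):
    rotated = []
    for i, (c, s) in enumerate(cs):
      a, b = vec[2 * i], vec[2 * i + 1]
      rotated += [a * c - b * s, a * s + b * c]  # 2-D rotation of the pair
    out.append(rotated)
  return out
```

Rotation preserves vector norms. After rotation, the dot product of a query and a key depends only on the difference between their positions, which makes a convenient property to test a cache against.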