Commit f5d1c41
hexagon: dma optimizations (mostly fixing regressions) (#21137)
* hex-fa: add simple dma cache for Mask
I noticed that we were refetching the mask rows over and over.
This simple cache avoids that.
* hex-dma: unset in-order desc bit which caused significant perf regression
We don't rely on true in-order processing of the DMA descriptors anywhere.
It turns out this mode caused a significant regression of around 3-4 TPS during token generation.
* hex-rope: update comment to clarify that we don't need in-order DMA completions

1 parent 2405d59 commit f5d1c41
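The "simple dma cache for Mask" item can be pictured with a minimal sketch. This is not the code from the commit: `mask_cache`, `mask_cache_get`, `fake_dma_fetch`, the slot count, and the row size limit are all hypothetical names and values, and a plain `memcpy` stands in for queuing and completing a real DMA transfer. It also assumes the mask data is read-only while the cache is live and that every row has the same size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MASK_CACHE_SLOTS   4    /* hypothetical: number of rows kept locally */
#define MASK_ROW_BYTES_MAX 256  /* hypothetical: largest row the cache accepts */

typedef struct {
    const void *src[MASK_CACHE_SLOTS];                  /* source address per slot, NULL if empty */
    uint8_t     data[MASK_CACHE_SLOTS][MASK_ROW_BYTES_MAX];
    unsigned    next;                                   /* round-robin eviction cursor */
    unsigned    fetches;                                /* count of simulated transfers */
} mask_cache;

static void mask_cache_init(mask_cache *c) {
    for (unsigned i = 0; i < MASK_CACHE_SLOTS; i++) {
        c->src[i] = NULL;
    }
    c->next    = 0;
    c->fetches = 0;
}

/* Stand-in for a real DMA transfer (assumption: the real path is asynchronous). */
static void fake_dma_fetch(void *dst, const void *src, size_t n) {
    memcpy(dst, src, n);
}

/* Return a local copy of the row at `src`, fetching it only on a miss. */
static const uint8_t *mask_cache_get(mask_cache *c, const void *src, size_t row_bytes) {
    assert(row_bytes <= MASK_ROW_BYTES_MAX);

    for (unsigned i = 0; i < MASK_CACHE_SLOTS; i++) {
        if (c->src[i] == src) {
            return c->data[i];  /* hit: no transfer needed */
        }
    }

    /* Miss: overwrite the oldest slot and record the new source address. */
    unsigned slot = c->next;
    c->next = (c->next + 1) % MASK_CACHE_SLOTS;
    fake_dma_fetch(c->data[slot], src, row_bytes);
    c->src[slot] = src;
    c->fetches++;
    return c->data[slot];
}
```

With a cache like this, asking for the same row twice costs one transfer and one lookup rather than two transfers. The cache is keyed on the source address alone, so it is only valid while the underlying mask data does not change.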
3 files changed, +74 -17 lines changed
File tree: ggml/src/ggml-hexagon/htp
Diff 1 of 3 (file name and line contents not captured): 5 hunks, +8 -4. Added new lines 349-351, 396, 559, 689, 710-711; removed original lines 392, 394, 557, 687.
Diff 2 of 3 (file name and line contents not captured): 5 hunks, +64 -11. Added new lines 146, 154-159, 182, 204-209, 226, 228, 320-367; removed original lines 146, 154-155, 178, 200-201, 218-222.
Diff 3 of 3 (file name and line contents not captured): 1 hunk, +2 -2. Original lines 336-337 replaced by new lines 336-337.