ggml : add ggml_build_forward_select #18550
Conversation
Just want to make sure I understand how this is used - it would still be two separate graphs, they'd just be able to reuse allocations (i.e. ggml-alloc would decide they match)? I think ggml_can_fuse and ggml_can_fuse_subgraph would need to be updated to make sure all nodes are computed. And any backend-specific fusion logic.
Yes, for example the graph when the input is token ids (
Not yet sure that it's really necessary to do so - at least I can't think of a fail case so far. Note that the
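The fusion concern raised above can be illustrated with a toy check. This is a minimal sketch with made-up names (`toy_can_fuse`, `TOY_FLAG_COMPUTE`), not the real `ggml_can_fuse` implementation: the idea is simply that a candidate group may only be fused when every node in it is flagged for computation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (not the real ggml API): each node carries a flags
 * bitfield, and a fusion candidate group is only valid when every
 * node in it is marked for computation. */
#define TOY_FLAG_COMPUTE (1 << 0)

struct toy_node {
    int32_t flags;
};

/* Hypothetical helper mirroring the idea that the fusion checks would
 * additionally have to verify the compute flag on all candidate nodes. */
static bool toy_can_fuse(const struct toy_node *nodes, int n) {
    for (int i = 0; i < n; i++) {
        if ((nodes[i].flags & TOY_FLAG_COMPUTE) == 0) {
            return false; // a skipped node invalidates the fused group
        }
    }
    return true;
}
```

A group containing any non-computed node is rejected, which matches the requirement that fusion must never assume a node's result exists when that node was skipped.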
We would need to check how this behaves with CUDA graphs, since inherently the computation is changing.
cc: @AlekseiNikiforovIBM @Andreas-Krebbel Give us a week or so to check on this :)
For CUDA graphs I think adding a check for flags in
LGTM
taronaeo left a comment:
Ack for IBM zDNN backend :)
Review comment on ggml/src/ggml-vulkan/ggml-vulkan.cpp (outdated):

```cpp
if ((cgraph->nodes[i]->flags & GGML_TENSOR_FLAG_COMPUTE) == 0) {
    continue;
}
```
If the last node or nodes are not flagged, the loop would end without the final command submission. This would need some way to ensure a final submit if submitted_nodes > 0.
Moved the check over here: eafbd13
Hm, just noticed this is maybe not enough. Will take some extra look.
Maybe update this logic:

```cpp
// If the last op in the cgraph isn't backend GPU, the command buffer doesn't get closed properly
while (last_node > 0 && ggml_vk_is_empty(cgraph->nodes[last_node])) {
    last_node -= 1;
}
```
Waiting for one final ack on the Vulkan changes and will proceed to merge.
reeselevine left a comment:
WebGPU update looks good to me; we always do a final submission if commands > 0, so there shouldn't be a problem like the one described in the Vulkan backend comment above.
@jeffbolznv This should be enough I think: 3646af9. Or maybe I am missing something regarding "And any backend-specific fusion logic."
ggml_can_fuse_subgraph_ext needs the same check. All Vulkan fusions go through either ggml_can_fuse or ggml_can_fuse_subgraph, so we shouldn't need anything else for ggml-vulkan's fusion.
0cc4m left a comment:
The Vulkan changes are fine now, thank you.
target #18547
alt #18549

- Add new GGML_TENSOR_FLAG_COMPUTE flag, indicating that a tensor in the graph must be computed
- Add new ggml_build_forward_select() call: all provided tensors are built forward into the graph, but only tensors[idx] and its ancestry are marked for computing via the new flag value

This new logic allows us to construct graphs that compute different things but at the same time have the same topology. This is needed to avoid unwanted graph reallocations (#17617).

TODOs:
- Enable -DGGML_SCHED_NO_REALLOC=ON for server CI (next PR)
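The select-and-mark idea from the description can be sketched as a toy model. All names here (`toy_tensor`, `toy_mark_compute`, `toy_build_forward_select`) are invented for illustration and this is not the ggml implementation: every output is part of the same graph topology, but only the selected output and its ancestry receive the compute flag.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_FLAG_COMPUTE (1 << 0)

/* Toy tensor: up to two parents, plus a flags bitfield. */
struct toy_tensor {
    struct toy_tensor *src[2]; // parents, NULL if unused
    int32_t            flags;
};

/* Recursively mark a tensor and its ancestry for computation. */
static void toy_mark_compute(struct toy_tensor *t) {
    if (t == NULL || (t->flags & TOY_FLAG_COMPUTE)) {
        return; // already marked, or no parent here
    }
    t->flags |= TOY_FLAG_COMPUTE;
    for (int i = 0; i < 2; i++) {
        toy_mark_compute(t->src[i]);
    }
}

/* Hypothetical analogue of ggml_build_forward_select(): all n outputs
 * belong to the graph (shared topology), but only outputs[idx] and its
 * ancestry are flagged for computing. */
static void toy_build_forward_select(struct toy_tensor **outputs, int n, int idx) {
    (void) n; // in the real API, all n outputs are expanded into the graph
    toy_mark_compute(outputs[idx]);
}
```

Because the unselected branches stay in the graph with identical structure, an allocator that plans by topology sees the same graph regardless of which output is selected, which is the property used to avoid the reallocations tracked in #17617.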