ggml : extend ggml_can_fuse to work with non-sequential nodes#16123
ggml : extend ggml_can_fuse to work with non-sequential nodes#16123
Conversation
jeffbolznv
left a comment
There was a problem hiding this comment.
Fwiw, with the sorting I have in the vulkan backend, I try to separate empty nodes from ninety nodes, which makes it less important to be able to skip.
ggml/src/ggml-impl.h
Outdated
| static inline bool ggml_can_fuse_ext(const struct ggml_cgraph * cgraph, const int * node_idxs, const enum ggml_op * ops, int num_ops) { | ||
| for (int i = 0; i < num_ops; ++i) { | ||
| struct ggml_tensor * node = cgraph->nodes[node_idx + i]; | ||
| if (node_idxs[i] + num_ops > cgraph->n_nodes) { |
There was a problem hiding this comment.
Why add num_ops here? Also, should be >=, i think.
ggml/src/ggml-impl.h
Outdated
| return ggml_can_fuse(cgraph, node_idx, ops.begin(), (int)ops.size()); | ||
| } | ||
|
|
||
| inline bool ggml_can_fuse(const struct ggml_cgraph * cgraph, std::initializer_list<int> node_idx, std::initializer_list<enum ggml_op> ops) { |
There was a problem hiding this comment.
I'm not totally clear on how you foresee this function being used. I assume the caller needs to look through nodes and find the ops its interested in (skipping empty nodes), so I dont think it'll be able to use an initializer list for the node idxs.
There was a problem hiding this comment.
The idea is to create a list of indices that excludes the empty nodes in idxs. And then use this like so:
// non-empty node indices
std::vector<int> idxs;
bool can_fuse(int i0, int i1, std::initializer_list<enum ggml_op> ops) const {
assert(use_fusion);
assert(i0 >= 0 && i0 < n_nodes());
assert(i1 >= 0 && i1 < n_nodes());
assert(ops.size() == 2);
return ggml_can_fuse(gf, { idxs[i0], idxs[i1] }, ops);
}Btw, I really need an API to tell if 2 ops are fusable - don't think there is any benefit to support more than a pair of ops. The assumption is that if we can fuse N ops, then for sure we can fuse N-1 ops. So at step N we only need to check if ops N and N+1 are fusable - no need to iterate again over all previous ops.
There was a problem hiding this comment.
If you have to rebuild the initializer_list from the vector, why not just pass an int* instead?
There was a problem hiding this comment.
Right, I can do that. Removed the extra overload.
Yes, I noticed that works, but I'm not really confident if this won't break any assumption in the future. Specifically, this case: If I just move all the adds together and put the views after them, everything seems to work. But I think I also saw that depending on the order of the views, the allocr can do different things. For example, at some point I was thinking, let's make the |
* origin/master: (39 commits) ci : disable AMD workflows + update NVIDIA workflows (ggml-org#16200) ci : enable Vulkan workflow on Mac (ggml-org#16194) ggml-cpu: Respect cpumask settings (ggml-org#16164) ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (ggml-org#15928) zdnn: refactor codebase + add docs (ggml-org#16178) codeowners : add @danbev to model-conversion example [no ci] (ggml-org#16190) devops: add s390x containers (ggml-org#15915) ggml-cpu : fix typo in gemm comments [no ci] (ggml-org#16189) feat: Add conversion support in GraniteHybrid for non-hybrid (all attn) (ggml-org#16177) clang-tidy : disable warning about performance enum size (ggml-org#16127) ggml : implement set_rows with i32 index (ggml-org#16159) codeowners : update + cleanup (ggml-org#16174) common : enable `--offline` mode without curl support (ggml-org#16137) webui : fix handling incomplete chunks (ggml-org#16107) embedding : fix typos in README (ggml-org#16171) common : remove unused local variables (ggml-org#16140) ggml : extend ggml_can_fuse to work with non-sequential nodes (ggml-org#16123) ggml : add ggml_op_is_empty (ggml-org#16122) codeowners : update ownership for @ngxson and @allozuar (ggml-org#16128) Vulkan: add conv_transpose_2d operation (ggml-org#16022) ...
…rg#16123) * ggml : extend ggml_can_fuse to work with non-sequential nodes in the graph * cont : fix wrong bounds check condition * cont : remove unnecessary overload
…rg#16123) * ggml : extend ggml_can_fuse to work with non-sequential nodes in the graph * cont : fix wrong bounds check condition * cont : remove unnecessary overload
…rg#16123) * ggml : extend ggml_can_fuse to work with non-sequential nodes in the graph * cont : fix wrong bounds check condition * cont : remove unnecessary overload
* ggml : extend ggml_can_fuse to work with non-sequential nodes in the graph * cont : fix wrong bounds check condition * cont : remove unnecessary overload
In some cases, we might want to fuse ops that are not sequential in the graph. For example, when there are intermediate view ops in-between the fusable ops.
Sample usage: #16102