@max-krasnyansky I am using the graph-profiler branch but I'm unsure how to trigger and get the profiling details. Any docs, commands or references would be appreciated. Thanks.
Sorry for the delay. Here is how to build (arm64-ubuntu)
And here is how to run
This will get you the output I included in the PR
Hi, I am also trying to find out how to profile properly with llama.cpp. In my case, I would like to know the performance beyond the node level. For example, I would like to know the aggregated time of all nodes generated by
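One possible way to get a figure above the node level is to post-process the per-node timings and sum them by a naming convention. The sketch below assumes each node carries a name with a shared prefix per group; `node_time` and `sum_usec_by_prefix` are hypothetical names for illustration and are not part of ggml or of this branch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical per-node record; NOT the profiler's actual data layout.
struct node_time {
    const char *name;   // node name, assumed to share a prefix within a group
    uint64_t    usec;   // measured time for this node, in microseconds
};

// Sum the time of every node whose name starts with `prefix`.
static uint64_t sum_usec_by_prefix(const struct node_time *nodes, size_t n,
                                   const char *prefix) {
    const size_t plen = strlen(prefix);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (strncmp(nodes[i].name, prefix, plen) == 0) {
            total += nodes[i].usec;
        }
    }
    return total;
}
```

This only works if nodes are named consistently when the graph is built, so it is a post-processing convenience rather than a change to the profiler itself.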
I think a good approach can be that for each
For that, I'm thinking it might make sense to insert dummy graph nodes that record profiling data.
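A minimal sketch of the marker-node idea, using a toy graph rather than ggml's real structures: a marker node does no work and only stores the clock value when the executor reaches it, so the time of a region is the difference between two markers. All types and functions here (`toy_node`, `toy_clock`, `toy_graph_compute`, `toy_region_ticks`) are hypothetical, and the clock is simulated to keep the example deterministic.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical toy types for illustration; these are NOT ggml structures.
enum toy_op { TOY_OP_COMPUTE, TOY_OP_PROF_MARK };

struct toy_node {
    enum toy_op op;
    const char *label;   // region name for markers, op name otherwise
    uint64_t    cost;    // simulated cost of a compute node, in ticks
    uint64_t    t_mark;  // timestamp written when a marker node executes
};

// Simulated clock; real code would read a monotonic timer instead.
struct toy_clock { uint64_t now; };

// Walk the graph in order: markers record the time, compute nodes advance it.
static void toy_graph_compute(struct toy_node *nodes, size_t n,
                              struct toy_clock *clk) {
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].op == TOY_OP_PROF_MARK) {
            nodes[i].t_mark = clk->now;   // marker: record time, do no work
        } else {
            clk->now += nodes[i].cost;    // stand-in for running the op
        }
    }
}

// Elapsed ticks between the marker nodes at indices `begin` and `end`.
static uint64_t toy_region_ticks(const struct toy_node *nodes,
                                 size_t begin, size_t end) {
    return nodes[end].t_mark - nodes[begin].t_mark;
}
```

In a multi-threaded executor the marker would also need a defined point at which it is sampled (for example, by one thread after a barrier); the sketch ignores that.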
Hi, I was looking for a tool exactly like this to dump the actual graph operators during execution. Very useful. One piece of feedback: I needed the following change to compile in my environment (Windows 11 x64 + VS2022).
diff --git a/ggml/src/ggml-cpu/CMakeLists.txt b/ggml/src/ggml-cpu/CMakeLists.txt
index 2cc42d4b0..acfa79fff 100644
--- a/ggml/src/ggml-cpu/CMakeLists.txt
+++ b/ggml/src/ggml-cpu/CMakeLists.txt
@@ -583,6 +583,10 @@ function(ggml_add_cpu_backend_variant_impl tag_name)
         list(APPEND GGML_CPU_SOURCES ${GGML_KLEIDIAI_SOURCES})
     endif()
+    if (GGML_GRAPH_PROFILER)
+        target_link_libraries(${GGML_CPU_NAME} PRIVATE ggml-base)
+    endif()
+
     message(STATUS "Adding CPU backend variant ${GGML_CPU_NAME}: ${ARCH_FLAGS} ${ARCH_DEFINITIONS}")
     target_sources(${GGML_CPU_NAME} PRIVATE ${GGML_CPU_SOURCES})
     target_compile_options(${GGML_CPU_NAME} PRIVATE ${ARCH_FLAGS})
diff --git a/ggml/src/ggml-profile.h b/ggml/src/ggml-profile.h
index 3f8fecc08..f63f019ce 100644
--- a/ggml/src/ggml-profile.h
+++ b/ggml/src/ggml-profile.h
@@ -77,11 +77,11 @@ static inline void ggml_graph_profile_event(const struct ggml_cgraph *cg, enum g
 #else
-void ggml_graph_profile_init(struct ggml_cgraph *cg, int n_threads);
-void ggml_graph_profile_start(struct ggml_cgraph *cg, int n_threads);
-void ggml_graph_profile_finish(struct ggml_cgraph *cg, int n_threads);
-void ggml_graph_profile_free(struct ggml_cgraph *cg);
-void ggml_graph_profile_event(const struct ggml_cgraph *cg, enum ggml_profile_event e, int node_n, int ith);
+GGML_API void ggml_graph_profile_init(struct ggml_cgraph *cg, int n_threads);
+GGML_API void ggml_graph_profile_start(struct ggml_cgraph *cg, int n_threads);
+GGML_API void ggml_graph_profile_finish(struct ggml_cgraph *cg, int n_threads);
+GGML_API void ggml_graph_profile_free(struct ggml_cgraph *cg);
+GGML_API void ggml_graph_profile_event(const struct ggml_cgraph *cg, enum ggml_profile_event e, int node_n, int ith);
 #endif // GGML_GRAPH_PROFILER
Here is an attempt at reintroducing the original whole-graph profiler (LLAMA_PERF) with some additional features.
Not ready to merge into master, but useful for profiling different models (on CPU).
Features:
Known issues:
ggml_init_param.graph_profile or it'll be moved into the backend params
If there is interest, it should be easy to extend to other backends, where they could update per-node/per-thread ggml_profile_timing data (they'd have to collect it on the accelerator and then export it into this common format).
See original PR #9647 for additional details.
Details
Example of the terminal output
Same example in rendered Markdown