ggml : cgraph export/import/eval example + GPU support #108
Conversation
@ggerganov I'm a bit curious/interested in this approach; I like that you are trying to separate ggml and the GPU implementation layer like this. I'd be keen to make a quick attempt at executing the ggml graph output you have here using WebGPU from Zig, but I'm not sure exactly how to piece that output together (or even read it, necessarily), so I wonder if you'd consider adding a C example or something that executes it on the CPU and validates the results it gets, so I could better understand how it works?
@slimsag Will try to prioritise this soon and finalize the export format + a CPU and/or Metal example.
Netron supports many formats of exported graphs already. I think GGML could be easily added. |
Force-pushed from 6264c52 to eed3eac
A bit of slow progress here, but I think it is starting to work out.
I've been waiting for this for months. Nothing has been as easy to use as llama.cpp.
Ok, I'm finally at the interesting part. I have the …
Regarding the memory mapping, it looks like I need to use MTLHeap to map the …. Everything should go into a single ….
Even though that command buffer takes multiple milliseconds, it won't cause a UI hitch. The Apple GPU can execute two separate command buffers concurrently from different ….
This is now working as expected and can serve as a proof-of-concept for offloading a ggml computation graph to the GPU. Before merging this, I will move the new import / export functions to the core ggml library. After merging, the next step will be to implement LLaMA inference with the same approach.
This is the first step towards full GPU and custom hardware inference support (see ggml-org/llama.cpp#915)
The idea is to be able to export the ggml computation graphs (ggml_cgraph) into standalone .ggml files. These files can later be imported by a separate application and evaluated based on the available hardware / framework (CUDA, Metal, WebGPU, etc.). The computation graph contains everything necessary to perform the inference.
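For readers who want to follow along in C, here is a minimal sketch of the export side. It assumes an export entry point of roughly the form ggml_graph_export(cgraph, fname) and uses a toy graph instead of the MNIST model; the exact names and signatures may differ from what lands in the library.

```c
// Sketch: build a tiny graph and serialize it to a standalone .ggml file.
// ggml_graph_export() is assumed to take the graph and an output path;
// the real API may differ.
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,   // scratch memory for tensors + graph
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };

    struct ggml_context * ctx = ggml_init(params);

    // toy graph: y = a*x + b
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor * y = ggml_add(ctx, ggml_mul(ctx, a, x), b);

    struct ggml_cgraph gf = ggml_build_forward(y);

    // write ops, shapes and tensor data to disk
    ggml_graph_export(&gf, "toy.ggml");

    ggml_free(ctx);
    return 0;
}
```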
As an example, we export the MNIST computation graph from the mnist example into the file mnist.ggml. Next, using the mnist-cpu tool, we load the graph and re-evaluate it on the CPU using ggml_graph_compute(). Alternatively, we can run it on the Apple Silicon GPU using Metal.
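Here is a corresponding sketch of the import-and-evaluate side on the CPU. It assumes an import entry point of roughly the form ggml_graph_import(fname, &ctx_data, &ctx_eval), a tensor named "input" (a hypothetical name) for feeding new data, and the classic ggml_graph_compute(ctx, cgraph) call, so treat it as an illustration of the flow rather than the exact API.

```c
// Sketch: load an exported .ggml graph and re-evaluate it on the CPU.
// ggml_graph_import() is assumed to return the graph and fill two contexts
// (tensor data + graph metadata); the real API may differ.
#include "ggml.h"
#include <stdio.h>

int main(void) {
    struct ggml_context * ctx_data = NULL;  // tensor data loaded from the file
    struct ggml_context * ctx_eval = NULL;  // graph / op metadata

    struct ggml_cgraph gf = ggml_graph_import("mnist.ggml", &ctx_data, &ctx_eval);

    // look up the input tensor by name and overwrite it with new data
    struct ggml_tensor * input = ggml_graph_get_tensor(&gf, "input");  // hypothetical name
    // ... fill input->data with a 28x28 digit image ...

    // separate context that provides working memory for the evaluation
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx_work = ggml_init(params);

    ggml_graph_compute(ctx_work, &gf);

    // the last node of the graph holds the result
    struct ggml_tensor * probs = gf.nodes[gf.n_nodes - 1];
    printf("output elements: %lld\n", (long long) ggml_nelements(probs));

    ggml_free(ctx_work);
    ggml_free(ctx_data);
    ggml_free(ctx_eval);
    return 0;
}
```

The Metal path would consume the same imported graph; only the evaluation step changes from ggml_graph_compute() to a backend-specific implementation.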
Here is a sample run:
$ dot -Tpng mnist.dot -o mnist.dot.png && open mnist.dot.png

CPU (via ggml)
Metal