Author's Latest Posts


Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues


This blog post explains the cross-NUMA memory access issue that occurs when you run llama.cpp in Neoverse. It also introduces a proof-of-concept patch that addresses this issue and can provide up to a 55% performance increase for text generation when you run the llama3_Q4_0 model on the ZhuFeng Neoverse system. Cross-NUMA memory access problem In llama.cpp, performance drops when the number o... » read more

BOLT Optimization Technology Could Bring Obvious Performance Uplift On Arm Server


BOLT is a post-link optimization technology which builds on LLVM framework, which leverages perf tool to collection sampling data and convert the executable into an optimized version. After evaluating BOLT on several workloads such as MySQL, Redis, memcached and nginx on Arm server, we could see obvious performance uplift. This blog post illustrates the methods used to enable BOLT and per... » read more