ggml-cpu: Respect cpumask settings with OpenMP #16164
Merged
ggerganov merged 1 commit into ggml-org:master on Sep 23, 2025
slaren
approved these changes
Sep 22, 2025
On CPUs with heterogeneous cores, it is useful to set CPU affinity to force the use of performance cores. This eliminates the stalls caused by the speed mismatch between the different core types.
llama.cpp has a series of options for this purpose, named "--cpu-mask", "--cpu-range", etc. However, these options seem to affect only the (old?) ggml internal thread pool implementation, and have no effect when OpenMP is used for threading, which is now enabled by default.
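To make the idea of a CPU mask concrete, here is a minimal sketch of turning a mask string into one boolean per logical CPU. It assumes the mask is written as a hexadecimal bitmask in which bit i selects logical CPU i; that format, and the helper name `parse_hex_cpu_mask`, are assumptions for illustration and not the parser llama.cpp actually uses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of llama.cpp): parse a hex string such as
// "0xF0" or "f0" into a per-CPU boolean mask, where bit i maps to out[i].
// Returns false on malformed input and leaves `out` untouched in that case.
static bool parse_hex_cpu_mask(const std::string & text, std::vector<bool> & out) {
    std::size_t start = 0;
    if (text.size() >= 2 && text[0] == '0' && (text[1] == 'x' || text[1] == 'X')) {
        start = 2; // skip an optional "0x" prefix
    }
    const std::size_t ndigits = text.size() - start;
    if (ndigits == 0) {
        return false;
    }

    std::vector<bool> mask(ndigits * 4, false);
    for (std::size_t i = 0; i < ndigits; ++i) {
        // Walk from the least significant (rightmost) hex digit.
        const char c = text[text.size() - 1 - i];
        int v;
        if      (c >= '0' && c <= '9') v = c - '0';
        else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
        else return false;

        for (int b = 0; b < 4; ++b) {
            mask[i * 4 + b] = ((v >> b) & 1) != 0;
        }
    }
    out = mask;
    return true;
}
```

With this sketch, "0xF0" selects logical CPUs 4 through 7 and leaves 0 through 3 unselected.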
This pull request restores support for these options when OpenMP is used, making it easier to set CPU affinity through CLI arguments.
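As a rough illustration of the general technique (not the code in this PR), the sketch below has every thread of an OpenMP team pin itself to a given mask. It is Linux-only, relies on `sched_setaffinity()`, and the names `pin_current_thread` and `pin_openmp_team` are hypothetical. Whether the OpenMP runtime reuses the same worker threads in later parallel regions is implementation-dependent, so code that relies on this would probably need to apply the mask inside the region that does the actual work.

```cpp
#include <sched.h>   // sched_setaffinity, cpu_set_t (Linux/glibc; assumes the
                     // C++ compiler predefines _GNU_SOURCE, as g++ and clang++ do)
#include <cstddef>
#include <vector>

// Hypothetical helper: pin the calling thread to the CPUs whose entry in
// `mask` is true. An all-false mask leaves the affinity unchanged.
static bool pin_current_thread(const std::vector<bool> & mask) {
    cpu_set_t set;
    CPU_ZERO(&set);
    bool any = false;
    const std::size_t limit = static_cast<std::size_t>(CPU_SETSIZE);
    for (std::size_t i = 0; i < mask.size() && i < limit; ++i) {
        if (mask[i]) {
            CPU_SET(i, &set);
            any = true;
        }
    }
    if (!any) {
        return true; // nothing requested: keep the scheduler's choice
    }
    // A pid of 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Hypothetical helper: apply the same mask from every thread in an OpenMP
// team and return how many threads failed. If the file is built without
// OpenMP, the pragma is ignored and only the calling thread is pinned.
static int pin_openmp_team(const std::vector<bool> & mask) {
    int failures = 0;
    #pragma omp parallel reduction(+ : failures)
    {
        if (!pin_current_thread(mask)) {
            failures += 1;
        }
    }
    return failures;
}
```

A per-thread mask (one entry per worker, so each thread lands on a different core) would follow the same pattern, with each thread choosing its own mask by thread index.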
Side note: I also think we should have a better default CPU affinity assignment. Right now the related code seems a bit messy. We have two functions, cpu_get_num_physical_cores() and cpu_get_num_math(), that serve almost the same purpose, while the latter is only implemented on x86 Linux. Since we do not set affinity, we rely on the OS task scheduler to do the right thing. Unfortunately, at least on my machine (13700K on Windows 11), the OS persistently uses E-cores when no explicit affinity is set. So I do think we should set affinity by default, at least on heterogeneous CPUs. This PR can serve as a basis for that goal by allowing manual settings first.