Fix LoongArch test-quantize-fns f16 and q4_0 failed when use LSX#16958

Merged
ggerganov merged 2 commits intoggml-org:masterfrom
MQ-mengqing:a_fix
Nov 3, 2025
Conversation

@MQ-mengqing (Contributor) commented Nov 3, 2025

The LoongArch code mistakenly used {__lasx_xv,__lsx_v}replgr2vr_w(), which operate only on integers. When passed a float, they round it to an integer, causing errors.

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Nov 3, 2025
HmnSn commented Nov 3, 2025

I tested it on a 3A6000 with -DGGML_LSX=ON -DGGML_LASX=OFF and with -DGGML_LASX=ON. It works well!

gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Nov 3, 2025
* origin/master: (169 commits)
opencl: support imrope (ggml-org#16914)
fix: Viewing multiple PDF attachments (ggml-org#16974)
model-conversion : pass config to from_pretrained (ggml-org#16963)
server : add props.model_alias (ggml-org#16943)
ggml: CUDA: add head size 72 for flash-attn (ggml-org#16962)
mtmd: add --image-min/max-tokens (ggml-org#16921)
mtmd: pad mask for qwen2.5vl (ggml-org#16954)
ggml : LoongArch fixes (ggml-org#16958)
sync: minja (glm 4.6 & minmax m2 templates) (ggml-org#16949)
SYCL: optimized repeat_back kernel (3× fewer asm instructions, 2× faster) (ggml-org#16869)
feat(webui): improve LaTeX rendering with currency detection (ggml-org#16508)
test-backend-ops : fix segfault in moe-expert-reduce test in support mode and coverage (ggml-org#16936)
ci : disable failing riscv cross build (ggml-org#16952)
model: add Janus Pro for image understanding (ggml-org#16906)
clip : use FA (ggml-org#16837)
server : support unified cache across slots (ggml-org#16736)
common : move gpt-oss reasoning processing to init params (ggml-org#16937)
docs: remove llama_sampler_accept reference in sampling sample usage (ggml-org#16920)
CUDA: add FLOOR, CEIL, ROUND, TRUNC unary ops (ggml-org#16917)
devops: fix failing s390x docker build (ggml-org#16918)
...
@MQ-mengqing MQ-mengqing deleted the a_fix branch November 4, 2025 00:42
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
* Fix test-quantize-fns f16 and q4_0 failed when use LSX

* Fix LoongArch set float intrinsic when use LSX/LASX
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
* Fix test-quantize-fns f16 and q4_0 failed when use LSX

* Fix LoongArch set float intrinsic when use LSX/LASX

Labels

ggml changes relating to the ggml tensor library for machine learning

3 participants