devops: add s390x & ppc64le CI by taronaeo · Pull Request #15925 · ggml-org/llama.cpp

taronaeo · 2025-09-10T13:05:13Z

Introduce s390x & ppc64le CI using IBM Actions on POWER and Z Runner images.

TODO:

Fix s390x build warnings (ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl #15928)
Turn LLAMA_FATAL_WARNINGS back ON
Fix tokeniser differences between Little-Endian vs. Big-Endian
(OPTIONAL) Turn on ppc64le CI

Edit: I have to use this PR to test the GitHub Actions workflows, because only llama.cpp is authorised to use the s390x and ppc64le images. Unfortunately, I have nowhere else to develop this and then submit the PR once it's ready.

CISC · 2025-09-10T13:12:15Z

But it's not a cross build is it? Should be in build.yml.

taronaeo · 2025-09-10T13:13:42Z

Nope, native build. I'm still testing things out, hence the draft haha. Thanks for the heads up!

CISC · 2025-09-10T18:16:41Z

I think you will need to resolve the warnings and revert before undrafting, the build is done with LLAMA_FATAL_WARNINGS=ON for a reason.

This looks like an actual bug (fixed in #15928):
https://github.com/ggml-org/llama.cpp/actions/runs/17615331967/job/50046823487#step:5:87
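According to its title, #15928 fixes an uninitialized is_on_grid in quantize_row_iq3_xxs_impl. The sketch below is not that code. count_on_grid_sketch and N_GROUPS are hypothetical, and the function only illustrates the general pattern that GCC's -Wuninitialized / -Wmaybe-uninitialized warnings catch: a flag array that is written on some paths but read on all of them. With LLAMA_FATAL_WARNINGS=ON, such a warning fails the build. Giving every element a defined starting value is the usual fix.

```c
#include <assert.h>
#include <stdbool.h>

#define N_GROUPS 4  /* hypothetical group count */

/* Hypothetical example, not the ggml code: marks each group whose value is
   non-negative as "on grid", then counts the marked groups. */
static int count_on_grid_sketch(const int *vals, int n) {
    /* The fix: give every flag a defined value up front. Without
       "= { false }", entries skipped below (negative values, or k >= n)
       would later be read uninitialised, which is what GCC warns about. */
    bool is_on_grid[N_GROUPS] = { false };

    for (int k = 0; k < n && k < N_GROUPS; ++k) {
        if (vals[k] >= 0) {
            is_on_grid[k] = true;
        }
    }

    int count = 0;
    for (int k = 0; k < N_GROUPS; ++k) {
        count += is_on_grid[k] ? 1 : 0;
    }
    return count;
}
```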

This might be a bug, as it suggests these loops are no-ops:
https://github.com/ggml-org/llama.cpp/actions/runs/17615204324/job/50046429801#step:5:121

llama.cpp/ggml/src/ggml-cpu/vec.h

Lines 677 to 684 in 6ab397e

    
    for (int i = 0; i < np; i += GGML_F32_STEP) {
        for (int j = 0; j < GGML_F32_ARR; j++) {
            ay[j] = GGML_F32_VEC_LOAD(y + i + j*GGML_F32_EPR);
            ay[j] = GGML_F32_VEC_MUL(ay[j], vx);
            GGML_F32_VEC_STORE(y + i + j*GGML_F32_EPR, ay[j]);
        }
    }

llama.cpp/ggml/src/ggml-cpu/vec.h

Lines 740 to 747 in 6ab397e

    
    for (int i = 0; i < np; i += GGML_F16_STEP) {
        for (int j = 0; j < GGML_F16_ARR; j++) {
            ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);
            ay[j] = GGML_F16_VEC_MUL(ay[j], vx);
            GGML_F16_VEC_STORE(y + i + j*GGML_F16_EPR, ay, j);
        }
    }
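To help judge whether the "no-op loop" warning points at a real bug, here is a scalar sketch of what the two vec.h loops above are meant to compute. They scale y in place by a scalar, one register (EPR elements) at a time and ARR registers per step. SKETCH_EPR, SKETCH_ARR and vec_scale_f32_sketch are hypothetical stand-ins for the per-architecture GGML_F32_EPR / GGML_F32_ARR macros. The tail loop assumes the usual leftover handling after such a blocked loop, which the excerpt does not show. In this sketch, elements below np are touched only by the blocked loop, so if that loop really did nothing for some macro values, those elements would never be scaled. The exact warning text is only in the linked CI log, and the sketch makes no claim about its cause.

```c
#include <assert.h>

/* Hypothetical stand-ins for the SIMD layout macros:
   EPR  = elements per register, ARR = registers per step,
   STEP = elements consumed per outer-loop iteration. */
#define SKETCH_EPR  4
#define SKETCH_ARR  4
#define SKETCH_STEP (SKETCH_EPR * SKETCH_ARR)

/* Scalar model of the blocked loop: y[i] *= v for all i in [0, n). */
static void vec_scale_f32_sketch(int n, float *y, float v) {
    /* Largest multiple of STEP not exceeding n (STEP is a power of two). */
    const int np = n & ~(SKETCH_STEP - 1);

    for (int i = 0; i < np; i += SKETCH_STEP) {
        for (int j = 0; j < SKETCH_ARR; j++) {
            /* One "register": EPR consecutive elements at y + i + j*EPR,
               matching the addressing in the vec.h excerpt. */
            float *reg = y + i + j * SKETCH_EPR;
            for (int k = 0; k < SKETCH_EPR; k++) {
                reg[k] *= v;
            }
        }
    }

    /* Leftover elements that do not fill a whole step. */
    for (int i = np; i < n; i++) {
        y[i] *= v;
    }
}
```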

taronaeo · 2025-09-10T18:31:52Z

I think you will need to resolve the warnings and revert before undrafting, the build is done with LLAMA_FATAL_WARNINGS=ON for a reason.

Yep, will take note of that before marking as ready. I wanted to progress past the compile stage to fix the endianness issues with the test models first because that is the problematic part.

Also, the problem with GCC on s390x and ppc64le is that it is a little stricter: things that would be fine on x86 or ARM are reported as warnings on s390x and ppc64le and, in this case, treated as fatal.

I suppose we can either have the code owner fix the warnings on the affected lines or, not ideally, stop treating warnings as fatal for s390x and ppc64le.

CISC · 2025-09-10T19:02:04Z

Maybe just convert and overwrite the vocab GGUFs for BE tests?

CISC · 2025-09-10T19:05:41Z

You need the .out files too.

taronaeo · 2025-09-10T19:11:47Z

It's 3 AM on my side now, will continue this tomorrow 🙂

CISC · 2025-09-10T19:16:34Z

Looks like there's an endianness issue in the WPM and BPE tokenizers.

CISC · 2025-09-10T19:25:08Z

If you add vocab GGUF conversion instead of duplicating files, test-tokenizers-ggml-vocabs could also do that, so preferable IMO.

taronaeo · 2025-09-11T13:29:35Z

If you add vocab GGUF conversion instead of duplicating files

You mean use the gguf-py/gguf/scripts/gguf_convert_endian.py script to convert the GGUF files during CI tests right? That's a good approach but I didn't want to increase the CI runtime because our s390x and ppc64le runners are pretty limited (supplied by IBM Actions team directly).

I don't think storage is an issue here also right? But if this is a concern, do let me know and we can go the conversion route during CI run.

test-tokenizers-ggml-vocabs could also do that, so preferable IMO.

I didn't quite get this part. Can you explain further?

CISC · 2025-09-11T13:37:59Z

If you add vocab GGUF conversion instead of duplicating files

You mean use the gguf-py/gguf/scripts/gguf_convert_endian.py script to convert the GGUF files during CI tests right? That's a good approach but I didn't want to increase the CI runtime because our s390x and ppc64le runners are pretty limited (supplied by IBM Actions team directly).

Yes, shouldn't add much to runtime as they are only small vocab files.

I don't think storage is an issue here also right? But if this is a concern, do let me know and we can go the conversion route during CI run.

Generally we don't want to bloat git with more binaries than necessary, that's why new vocab files are being stored on HF.

test-tokenizers-ggml-vocabs could also do that, so preferable IMO.

I didn't quite get this part. Can you explain further?

This test downloads extra vocab files from HF. It's preferable that you run this test rather than skip it, but these files are little-endian only, so they must be converted.
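At its core, converting little-endian files for a big-endian host means reversing the byte order of every multi-byte scalar. The sketch below shows only that primitive for 32-bit values. bswap32_sketch, bswap32_array_sketch and load_le32_sketch are hypothetical names. A real GGUF converter such as gguf_convert_endian.py also has to know the width of every header field, metadata value and tensor element type, so it cannot simply swap a whole file in 4-byte chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of one 32-bit value. */
static uint32_t bswap32_sketch(uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}

/* Byte-swap n consecutive 32-bit elements in place (e.g. an F32 tensor).
   memcpy keeps each access valid regardless of buf's alignment. */
static void bswap32_array_sketch(void *buf, size_t n) {
    unsigned char *p = buf;
    for (size_t i = 0; i < n; i++) {
        uint32_t v;
        memcpy(&v, p + 4 * i, sizeof v);
        v = bswap32_sketch(v);
        memcpy(p + 4 * i, &v, sizeof v);
    }
}

/* Decode a little-endian uint32 from raw bytes, independent of host order. */
static uint32_t load_le32_sketch(const unsigned char *b) {
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

Reading with explicit shifts, as in load_le32_sketch, avoids the need to swap at all. Swapping in place is what lets existing files be reused unchanged by code that reads values in native order.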

CISC · 2025-09-11T15:59:40Z

Don't try to install all the repo requirements just for gguf_convert_endian.py; pip install gguf, or even just numpy and tqdm, is enough.

CISC · 2025-09-11T16:06:18Z

It's just gguf, not gguf-py. :)

taronaeo · 2025-09-11T16:09:04Z

It's just gguf, not gguf-py. :)

I was trying to target the relative path so it installs from local. Should I prefer installing via pypi instead?

CISC · 2025-09-11T16:14:45Z

It's just gguf, not gguf-py. :)

I was trying to target the relative path so it installs from local. Should I prefer installing via pypi instead?

Ah, no, that's fine, it'll install the right dependencies then.

CISC · 2025-09-11T16:33:47Z

Here's a nice trick to check if the system is big-endian, which can be added to test-tokenizers-repo.sh; it returns 1 for big-endian and 0 for little-endian:

echo -n I | od -to2 | head -n1 | cut -f2 -d" " | cut -c1

taronaeo · 2025-09-11T16:36:20Z

Taking a quick glance at the tokeniser tests, it's clear that Little-Endian and Big-Endian models process these a little differently (i.e., space vs. no space before text)

- 6: src: 'Hello, y'all! How are you 😁 ?我想在apple工作1314151天～'
+ 6: res: 'Hello, y'all! How are you 😁?我想在apple工作1314151天～'
6: tok: 15496 11 331 6 439 0 1374 389 345 30325 223 5633 22755 239 46349 111 28839 101 18040 32432 98 43291 1485 1415 24309 25465 171 121 252

For emojis, I'm not sure why it can't produce the correct emojis, even though I've had no issues with them in my Z & LinuxONE demos. I suppose it's okay for me to edit the .out file to match the Big-Endian results, right? :)

taronaeo · 2025-09-11T16:38:50Z

Here's a nice trick to check if the system is big-endian, which can be added to test-tokenizers-repo.sh; it returns 1 for big-endian and 0 for little-endian:
echo -n I | od -to2 | head -n1 | cut -f2 -d" " | cut -c1

I suppose this is to separate the Little-Endian and Big-Endian .inp and .out files right?

Edit: Doesn't work :(

[taronaeo@aqlinux2 ~]$ echo -n I | od -to2 | head -n1 | cut -f2 -d" " | cut -c1
0

CISC · 2025-09-11T16:43:58Z

I suppose this is to separate the Little-Endian and Big-Endian .inp and .out files right?

For converting the downloaded files.

Edit: Doesn't work :(

Dang, I was guessing when I inverted the result, and guessed wrong. Change -c1 to -c6 and it should return 0 for big-endian (and 1 for little-endian).
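Here is why -c1 could not work. echo -n I emits the single byte 0x49, and od pads it with a zero byte to fill one 2-byte unit. -to2 then prints that unit as a 6-digit octal number in host byte order. A little-endian host reads the unit as 0x0049 and prints 000111, while a big-endian host reads 0x4900 and prints 044400. Both begin with 0, so only a later column, such as the 6th, tells them apart. The same probe in C looks like this, with host_is_big_endian_sketch as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian host, 0 on a little-endian one.
   Stores the bytes {0x49, 0x00} ('I' plus od's zero padding) and reads
   them back as one 16-bit integer in host byte order. */
static int host_is_big_endian_sketch(void) {
    const unsigned char bytes[2] = { 0x49, 0x00 };
    uint16_t unit;
    memcpy(&unit, bytes, sizeof unit);
    /* Little-endian: 0x0049 (octal 000111); big-endian: 0x4900 (octal 044400). */
    return unit == 0x4900;
}
```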

CISC · 2025-09-11T16:46:33Z

Taking a quick glance at the tokeniser tests, it's clear that Little-Endian and Big-Endian models process these a little differently (i.e., space vs. no space before text)
- 6: src: 'Hello, y'all! How are you 😁 ?我想在apple工作1314151天～'
+ 6: res: 'Hello, y'all! How are you 😁?我想在apple工作1314151天～'
6: tok: 15496 11 331 6 439 0 1374 389 345 30325 223 5633 22755 239 46349 111 28839 101 18040 32432 98 43291 1485 1415 24309 25465 171 121 252 
For emojis, I'm not sure why it can't produce the correct emojis, even though I've had no issues with them in my Z & LinuxONE demos. I suppose it's okay for me to edit the .out file to match the Big-Endian results, right? :)

No, this looks like an endian-bug in the tokenizer, the output must be equal.

Edit: It would be interesting to know what transformers produces, though. Have you run convert_hf_to_gguf_update.py and looked at ggml-vocab-phi-3.gguf.out, for example?

taronaeo · 2025-09-11T16:49:56Z

Hmm, I've added this check though. I think it's enough to prevent the endianness conversion script from running on LE systems.

https://github.com/ggml-org/llama.cpp/blob/ec1993d165b8b582668bda4cc99086f420b1b89b/.github/workflows/build.yml#L228

I was thinking your check can be added for the .inp and .out files though, since the tokeniser will be different between LE and BE

CISC · 2025-09-11T16:54:21Z

Hmm, I've added this check though. I think it's enough to prevent the endianness conversion script from running on LE systems.

Yeah, but test-tokenizers-repo.sh downloads more, see:
https://github.com/ggml-org/llama.cpp/actions/runs/17650662673/job/50160591872?pr=15925#step:8:4454

I was thinking your check can be added for the .inp and .out files though, since the tokeniser will be different between LE and BE

They should not be different, this is a bug.

CISC · 2025-09-11T16:59:03Z

For emojis, I'm not sure why it can't produce the correct emojis, even though I've had no issues with them in my Z & LinuxONE demos.

The emoji issue seems to be specific to the WPM tokenizer, so you probably just haven't used any models with that.

taronaeo · 2025-09-11T17:27:48Z

Taking a quick glance at the tokeniser tests, it's clear that Little-Endian and Big-Endian models process these a little differently (i.e., space vs. no space before text)
- 6: src: 'Hello, y'all! How are you 😁 ?我想在apple工作1314151天～'
+ 6: res: 'Hello, y'all! How are you 😁?我想在apple工作1314151天～'
6: tok: 15496 11 331 6 439 0 1374 389 345 30325 223 5633 22755 239 46349 111 28839 101 18040 32432 98 43291 1485 1415 24309 25465 171 121 252 
For emojis, I'm not sure why it can't produce the correct emojis, even though I've had no issues with them in my Z & LinuxONE demos. I suppose it's okay for me to edit the .out file to match the Big-Endian results, right? :)
No, this looks like an endian-bug in the tokenizer, the output must be equal.

Edit: It would be interesting to know what transformers produces, though. Have you run convert_hf_to_gguf_update.py and looked at ggml-vocab-phi-3.gguf.out, for example?

Got the same tokeniser result using Big-Endian model.

Steps taken:

python3 convert_hf_to_gguf_update.py
python3 convert_hf_to_gguf.py models/tokenizers/phi-3/ --outfile models/ggml-vocab-phi-3.gguf --vocab-only --bigendian

➜  llama.cpp git:(feat/s390x-ci) diff '/Users/taronaeo/Documents/llama.cpp/models/ggml-vocab-phi-3.gguf.out' '/Users/taronaeo/Downloads/ggml-vocab-phi-3.gguf.out'
➜  llama.cpp git:(feat/s390x-ci) echo $?
0

Big-Endian Phi-3 Tokeniser

 474 287 29871 29946 29871 30226 7378
 11585 7810 295

 259
 1678
 268
 29871 12
 29871 13
 29871 13 13
 29871 13 13 13
 29871 12 13
 15043 3186
 29871 15043 3186
 15043 2787
 29871 15043 2787
 29871 15043 2787 29991
 15043 29892 3186 29991
 29871 15043 29892 3186 29991
 29871 445 338 29871 243 162 169 156 29889 8223
 281 29900 29946 29947 29871 29955 9161 13535 18031 2176 6905
 1538 4851 665 1386 29713 1305
 29871 31849 31324 31934 228 162 142 228 161 146 228 162 133 228 161 153 228 161 186 31708 228 162 132 31708 228 161 165 31324 228 161 136 228 161 132 228 161 158 228 161 136 228 162 132 228 161 140
 29871 243 162 157 131 313 8945 29897 29871 243 162 155 185 30722 243 162 143 174 30598 313 20787 953 3848 275 16125 630 29897 29871 31681 313 6194 953 29877 2397 393 756 967 1914 5993 29897
 15043
 29871 15043
 259 15043
 1678 15043
 268 15043
 268 15043 13 1678 15043
 29871 313
 29871 13 353
 525 3152
 15043 29892 343 29915 497 29991 1128 526 366 29871 243 162 155 132 1577 30672 31522 30505 11548 31041 30732 29896 29941 29896 29946 29896 29945 29896 30408 30739
 1738 6824 21004
 29871 29941
 29871 29941 29941
 29871 29941 29941 29941
 29871 29941 29941 29941 29941
 29871 29941 29941 29941 29941 29941
 29871 29941 29941 29941 29941 29941 29941
 29871 29941 29941 29941 29941 29941 29941 29941
 29871 29941 29941 29941 29941 29941 29941 29941 29941
 29871 29941 29941 29941 29941 29941 29941 29941 29941 29941
 315 228 190 176 29874 10630 30529 29873
 29871 2313 3163
 29871 13 29871 13 13 29871 13 13 13 29871 12 29871 12 12 29871 12 13 259 13 1678 13 268 13 418 13 243 162 157 131 313 8945 29897 29871 243 162 155 185 30722 243 162 143 174 30598 313 20787 953 3848 275 16125 630 29897 29871 31681 29871 243 162 169 156 243 162 169 156 29871 29941 29871 29941 29941 29871 29941 29941 29941 29871 29941 29941 29941 29941 29871 29941 29941 29941 29941 29941 29871 29941 29941 29941 29941 29941 29941 29871 29941 29941 29941 29941 29941 29941 29941 29871 29941 29941 29941 29941 29941 29941 29941 29941 29871 29941 29889 29941 29871 29941 636 29941 29871 29941 856 29941 29871 31849 31324 31934 228 162 142 228 161 146 228 162 133 228 161 153 228 161 186 31708 228 162 132 31708 228 161 165 31324 228 161 136 243 162 155 132 1577 30672 31522 30505 11548 31041 30732 29896 29941 29896 29946 29896 29945 29896 30408 30739 448 23648 2751 25512 1538 4851 665 1386 29713 1305 14550 4907 11120 16159 16159 16159 15945 15945 3045 636 6824 6824 6824 8773 8773 8773 306 29915 345 1063 525 29873 1025 540 29915 29879 727 29892 525 1525 366 1854 29973 525 29924 451 1854 306 29915 645 1207 372 29892 525 29928 366 763 777 23429 29973 1334 29915 29963 29872 263 29915 29880 29931

taronaeo · 2025-09-11T17:37:18Z

Hmm, I've added this check though. I think it's enough to prevent the endianness conversion script from running on LE systems.

Yeah, but test-tokenizers-repo.sh downloads more, see: https://github.com/ggml-org/llama.cpp/actions/runs/17650662673/job/50160591872?pr=15925#step:8:4454

Got it. I was looking at the test-tokenizer-0 results and I didn't scroll down to check that portion.

With regard to the tokeniser, I'm a little stumped now. If the generated .out file is the same as the Little-Endian variant, as you said, the test results should match the Little-Endian ones 1-to-1. I'm not an expert on this; any idea how we can move forward?

CISC · 2025-09-11T19:41:16Z

With regard to the tokeniser, I'm a little stumped now. If the generated .out file is the same as the Little-Endian variant, as you said, the test results should match the Little-Endian ones 1-to-1. I'm not an expert on this; any idea how we can move forward?

Someone will have to debug the tokenizers in question in llama-vocab.cpp and figure out where it goes wrong.
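One class of bug to look for there is code that depends on memory layout. In C and C++, the order in which bit-fields are allocated within a storage unit is implementation-defined. One of the commits later squashed into this PR says "Bitfields are allocated in different order on s390x". Reinterpreting a flags struct as a uint16_t, for example via memcpy or a union, can therefore produce different integers on different ABIs. The sketch below uses hypothetical names and only four flags. It keeps the bit-fields for storage but converts to and from the integer with explicit masks, so the encoding is the same on every host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag set; a real codepoint-flags struct has more fields. */
typedef struct {
    unsigned int is_number      : 1;
    unsigned int is_letter      : 1;
    unsigned int is_whitespace  : 1;
    unsigned int is_punctuation : 1;
} cpt_flags_sketch;

/* Fixed bit positions in the serialized uint16_t. */
enum {
    CPT_NUMBER      = 1u << 0,
    CPT_LETTER      = 1u << 1,
    CPT_WHITESPACE  = 1u << 2,
    CPT_PUNCTUATION = 1u << 3,
};

/* Endian- and ABI-independent: never relies on bit-field layout. */
static uint16_t cpt_flags_to_u16(cpt_flags_sketch f) {
    uint16_t v = 0;
    if (f.is_number)      v |= CPT_NUMBER;
    if (f.is_letter)      v |= CPT_LETTER;
    if (f.is_whitespace)  v |= CPT_WHITESPACE;
    if (f.is_punctuation) v |= CPT_PUNCTUATION;
    return v;
}

static cpt_flags_sketch cpt_flags_from_u16(uint16_t v) {
    cpt_flags_sketch f;
    f.is_number      = (v & CPT_NUMBER)      != 0;
    f.is_letter      = (v & CPT_LETTER)      != 0;
    f.is_whitespace  = (v & CPT_WHITESPACE)  != 0;
    f.is_punctuation = (v & CPT_PUNCTUATION) != 0;
    return f;
}
```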

CISC · 2025-09-12T07:03:08Z

Got the same tokeniser result using Big-Endian model.

Great, but that was just SPM, which appears to work fine on big-endian, so we need to test some others. Check bert-bge (WPM) and t5 (UGM, though I think this one is skipped?) too. If you remove the chkhsh entry for f.ex. seed-coder (BPE) in convert_hf_to_gguf.py and run the update script again, it should download and generate files to test with.

taronaeo · 2025-09-13T14:38:53Z

Ah, I have a flight tomorrow and will be on vacation from 14 to 21 September. I'll come back to this the week after, or I'll check if @AlekseiNikiforovIBM is able to continue this whilst I'm away :)


AlekseiNikiforovIBM · 2025-09-26T13:42:45Z

I've rebased this PR since #16275 was merged, removed the miniaudio-related commits since #16212 was merged, and removed the commits 'devops: add s390x to build-linux-cross' and 'Revert "devops: add s390x to build-linux-cross"', since together they change nothing.


ggerganov · 2025-09-26T14:06:00Z

@taronaeo It's up: https://huggingface.co/ggml-org/models/tree/main/tinyllamas

Would it be possible to upload stories15M-be.Q4_0.gguf there too?

Yes, it's uploaded now.

CISC · 2025-09-26T16:11:46Z

@AlekseiNikiforovIBM Tests passing, wait for @taronaeo to merge or shall I merge when CI is done?

taronaeo · 2025-09-26T17:44:02Z

Feel free to merge this, I didn't have the time to check the CI results :)

taronaeo · 2025-09-26T18:01:08Z

CI / ggml-ci-x64-cpu-amx is failing the same way as in other PRs.
The CI / ggml-ci-arm64-cpu-high-perf-sve, CI / ggml-ci-x64-nvidia-vulkan-cm2 and CI / macOS-latest-cmake-arm64 failures do not seem related to this PR.

CI / ubuntu-22-cmake-vulkan (pull_request) This I'm unsure though:

	 27 - test-thread-safety (ILLEGAL)                      main
	 29 - test-opt (ILLEGAL)                                main
	 31 - test-backend-ops (ILLEGAL)                        main
	 34 - test-barrier (ILLEGAL)                            main
	 35 - test-quantize-fns (ILLEGAL)                       main
	 36 - test-quantize-perf (ILLEGAL)                      main
	 37 - test-rope (ILLEGAL)                               main

CISC · 2025-09-26T18:02:01Z

CI / ubuntu-22-cmake-vulkan (pull_request) This I'm unsure though:

It's "OK", it's a corrupt ccache.

taronaeo · 2025-09-26T18:03:14Z

CI / ubuntu-22-cmake-vulkan (pull_request) This I'm unsure though:

It's "OK", it's a corrupt ccache.

Gotcha. Will proceed to merge then.

* devops: move s390x and ppc64le ci build
  we have access to ubuntu-24.04-s390x and ppc64le images now
* devops: disable ppc64le for now since they have compiler errors
* devops: stop warnings as errors
* devops: switch to non-macro flag
* devops: going the llama macro route
* devops: add big-endian gguf test models
* devops: disable ppc64le to test s390x, check test build
* devops: dup .gguf.inp files for big-endian tests
* devops: dup .gguf.out files for big-endian too
* devops: add python setup and endian byteswap
* devops: pooring thing does not have s390x python3
* devops: add missing rust compiler for s390x
* devops: try rust actions runner
* Revert "devops: try rust actions runner"
  This reverts commit 3f8db04356033d6c1d7eccc75ca396bc5298250c.
* devops: try a different path for rust
* devops: dump home directory and user info
* devops: install gguf-py only
* devops: missed relative path
* devops: remove big-endian files since local swapping is working
* devops: revert test-tokenizer-0 cmakelists
* Fix unicode flags conversion from and to uint16_t
  Bitfields are allocated in different order on s390x
* Simplify byteswap command
* Add byteswapping and git-lfs for test-tokenizers-ggml-vocabs
* Fix endianness detection in vocab loader
* Disable test-thread-safety on s390x
  In this test a model is downloaded, then immediately loaded to check if more downloads are needed, and then used for test. There is no clean way to separate all those steps to add byteswapping between them, so just skip this test.
* Fix q8_0 test in test-quantize-fns
  vec_signed uses unexpected rounding mode. Explicitly use different rounding function.
* devops: add big-endian stories260K
* devops: add s390x test-eval-callback
* devops: fix test does not exist
* devops: fix model not found llama-eval-callback
* Fix q3_K dot product error in test-quantize-fns on s390x
  Array q8bytes had only 4 elements allocated, but 8 elements accessed. This lead to write out of bounds and later read of overwritten values out of bounds and incorrect result.
* devops: re-enable ppc64le for testing
* devops: activate test-thread-safety for s390x
* devops: disable ppc64le tests
  for some reason it keeps failing test-thread-safety tests and I do not have a machine that is able to replicate the tests.
* devops: LLAMA_FATAL_WARNINGS=ON
* Correct repository URL for s390x for test-thread-safety model
* Fix fs_get_cache_directory
  Ensure it works even if both XDG_CACHE_HOME and HOME are unset. This might happen in containers.
* Re-enable CI for ppc64le
* Fortify ggml_rope_impl
  Only memcpy data from sections argument if it's non-NULL.
* Add TODO in struct unicode_cpt_flags to reimplement it in endian-independent way
* Update URL for big-endian model
* Update .github/workflows/build.yml
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update remaining mentions of BE models to ggml-org/models repo

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com>
Co-authored-by: Aleksei Nikiforov <103434461+AlekseiNikiforovIBM@users.noreply.github.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
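The fs_get_cache_directory entry in the commit message above is about falling back gracefully when environment variables are missing. Below is a sketch of that kind of resolution logic, written as a pure function so it can be tested without touching the real environment. The function names, the llama.cpp subdirectory and the final fallback to a relative .cache directory are assumptions for illustration, not necessarily what llama.cpp does.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical resolver: writes the cache directory into out and returns
   0 on success, -1 if out is too small. xdg and home may be NULL or empty
   (i.e. unset), which can happen in minimal containers. */
static int resolve_cache_dir_sketch(const char *xdg, const char *home,
                                    char *out, size_t out_size) {
    int n;
    if (xdg != NULL && xdg[0] != '\0') {
        n = snprintf(out, out_size, "%s/llama.cpp/", xdg);
    } else if (home != NULL && home[0] != '\0') {
        n = snprintf(out, out_size, "%s/.cache/llama.cpp/", home);
    } else {
        /* Neither variable is set: use a relative path rather than
           building a path from a NULL getenv() result. */
        n = snprintf(out, out_size, ".cache/llama.cpp/");
    }
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Convenience wrapper that reads the real environment. */
static int get_cache_dir_sketch(char *out, size_t out_size) {
    return resolve_cache_dir_sketch(getenv("XDG_CACHE_HOME"), getenv("HOME"),
                                    out, out_size);
}
```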

CISC · 2025-11-06T08:36:53Z

@taronaeo I'm not sure why, but the build has started failing:
https://github.com/ggml-org/llama.cpp/actions/runs/19126157232/job/54656632702

AlekseiNikiforovIBM · 2025-11-06T09:26:42Z

I will take a look.


github-actions bot added the devops improvements to build systems and github actions label Sep 10, 2025

github-actions bot added the testing Everything test related label Sep 10, 2025

taronaeo and others added 8 commits September 26, 2025 15:38

devops: disable ppc64le tests

5310bbb

for some reason it keeps failing test-thread-safety tests and I do not have a machine that is able to replicate the tests. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

devops: LLAMA_FATAL_WARNINGS=ON

41fb59a

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

Correct repository URL for s390x for test-thread-safety model

5b79489

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

Fix fs_get_cache_directory

6d8d61a

Ensure it works even if both XDG_CACHE_HOME and HOME are unset. This might happen in containers. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

Re-enable CI for ppc64le

4aa2c9e

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

Fortify ggml_rope_impl

461add2

Only memcpy data from sections argument if it's non-NULL. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

Add TODO in struct unicode_cpt_flags to reimplement it in endian-independent way

6dbeda3

Update URL for big-endian model

e33d9ee

AlekseiNikiforovIBM force-pushed the feat/s390x-ci branch from ecf6aaa to e33d9ee September 26, 2025 13:40

Update .github/workflows/build.yml

f3ada9e

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

ggerganov approved these changes Sep 26, 2025


tests/CMakeLists.txt (outdated)

examples/eval-callback/CMakeLists.txt (outdated)

Update remaining mentions of BE models to ggml-org/models repo

8ab0491

AlekseiNikiforovIBM force-pushed the feat/s390x-ci branch from 1fd003c to 8ab0491 September 26, 2025 14:25

taronaeo merged commit 624207e into ggml-org:master Sep 26, 2025
62 of 67 checks passed

CISC mentioned this pull request Oct 3, 2025

Add Trigger at PR creation #15386

Merged

AlekseiNikiforovIBM mentioned this pull request Nov 6, 2025

[BUG] x86_64 binaries in /usr/local/bin in ubuntu-24.04-s390x image IBM/actionspz#60

Closed
