Add Vector-Length-Specific RISC-V Vector extension support using generic backend by camel-cdr · Pull Request #2593 · simdjson/simdjson

camel-cdr · 2026-01-18T19:52:54Z

This PR adds a new "rvv_vls" backend, which uses the RISC-V Vector extension.

It supports VLEN=128, VLEN=256 and VLEN=512, but requires globally compiling for the exact vector length of the hardware.
This is done by adding -march=rv64gcv_zvl<VLEN>b -mrvv-vector-bits=zvl to the CXXFLAGS.
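To illustrate the "exact vector length" requirement, here is a minimal sketch, not taken from the PR. It relies on the `__riscv_v_fixed_vlen` macro, which compilers define when `-mrvv-vector-bits` fixes the vector length. The helper `vls_build_matches` is a hypothetical name, and how the hardware VLEN is obtained is left to the caller.

```cpp
#include <cassert>

// VLEN in bits that this translation unit was compiled for, or 0 when the
// build is not vector-length-specific (macro absent, e.g. on x86).
constexpr unsigned compiled_fixed_vlen_bits() {
#if defined(__riscv_v_fixed_vlen)
  return __riscv_v_fixed_vlen;
#else
  return 0;
#endif
}

// Hypothetical helper: a VLS build is only usable when the VLEN it was
// compiled for equals the VLEN of the hardware it runs on.
inline bool vls_build_matches(unsigned compiled_bits, unsigned hardware_bits) {
  assert(hardware_bits != 0);  // caller must supply the real hardware VLEN
  return compiled_bits != 0 && compiled_bits == hardware_bits;
}
```

For example, `vls_build_matches(compiled_fixed_vlen_bits(), 256)` is only true for a binary built with zvl256b and -mrvv-vector-bits=zvl.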

Due to limitations in current toolchains, runtime dispatch isn't possible when there is only one translation unit, as is the case in the amalgamation.

I expect there will be a compiler switch in the future to make the codegen for the fixed length RVV binary compatible with larger VLENs (only using the lower part of the vector registers), but this isn't currently supported.

The plan is to later add a fully scalable Vector-Length-Agnostic "rvv" backend with proper runtime dispatch.
This requires writing an entirely new backend without using the existing generic infrastructure, so it will take more work to do. (It's on the list of projects I'm currently working on.)
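For readers unfamiliar with the distinction, the sketch below shows what a vector-length-agnostic loop looks like. It is only an illustration and not the planned "rvv" backend: `count_byte` is a hypothetical function, the RVV path uses the standard `__riscv_`-prefixed intrinsics from `<riscv_vector.h>`, and a scalar fallback lets it build on targets without RVV.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__riscv_v_intrinsic)
#include <riscv_vector.h>
#endif

// Counts how many bytes in [data, data + len) equal `needle`.
inline std::size_t count_byte(const std::uint8_t* data, std::size_t len,
                              std::uint8_t needle) {
  assert(data != nullptr || len == 0);
  std::size_t count = 0;
#if defined(__riscv_v_intrinsic)
  // Vector-length-agnostic loop: vsetvl reports how many elements the
  // hardware processes this iteration, so no VLEN is baked into the binary.
  while (len > 0) {
    std::size_t vl = __riscv_vsetvl_e8m1(len);
    vuint8m1_t chunk = __riscv_vle8_v_u8m1(data, vl);
    vbool8_t hits = __riscv_vmseq_vx_u8m1_b8(chunk, needle, vl);
    count += __riscv_vcpop_m_b8(hits, vl);
    data += vl;
    len -= vl;
  }
#else
  // Scalar fallback so the sketch also builds on targets without RVV.
  for (std::size_t i = 0; i < len; ++i) count += (data[i] == needle);
#endif
  return count;
}
```

The existing generic infrastructure assumes a register width known at compile time, which is why a loop of this shape does not fit into it directly.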

Benchmark on the SpacemiT-X60, rvv-vls variant compiled with clang-19 and -march=rv64gcv_zba_zbb_zbs_zbc_zicond_zvl256b -mrvv-vector-bits=zvl:

$ ./build/benchmark/bench_ondemand --benchmark_min_time=30 --benchmark_filter=\<simdjson_ondemand\>
simdjson::dom implementation:      fallback
simdjson::ondemand implementation (stage 1): fallback
simdjson::ondemand implementation (stage 2): fallback
----------------------------------------------------------------------------------------------------------
Benchmark                                                Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------------
json2msgpack<simdjson_ondemand>/manual_time        7926653 ns      8671429 ns         5301 best_branch_miss=0 best_bytes_per_sec=81.6701M best_cache_miss=0 best_cache_ref=0 best_cycles=163.166k best_cycles_per_byte=0.258372 best_docs_per_sec=129.324 best_frequency=21.1013M best_instructions=21.267k best_instructions_per_byte=0.0336762 best_instructions_per_cycle=0.13034 best_items_per_sec=129.324 branch_miss=0 bytes=631.515k bytes_per_second=75.9791Mi/s cache_miss=0 cache_ref=0 cycles=113.048k cycles_per_byte=0.179011 docs_per_sec=126.157/s frequency=14.2617M/s instructions=27.8688k instructions_per_byte=0.0441301 instructions_per_cycle=0.246522 items=1 items_per_second=126.157/s [BEST: throughput=  0.08 GB/s doc_throughput=   129 docs/s instructions=       21267 cycles=      163166 branch_miss=       0 cache_miss=       0 cache_ref=         0 items=         1 avg_time=   7926652 ns]
partial_tweets<simdjson_ondemand>/manual_time      5388798 ns      5760000 ns         7797 best_branch_miss=0 best_bytes_per_sec=119.771M best_cache_miss=0 best_cache_ref=0 best_cycles=100.816k best_cycles_per_byte=0.159641 best_docs_per_sec=189.657 best_frequency=19.1205M best_instructions=25.757k best_instructions_per_byte=0.040786 best_instructions_per_cycle=0.255485 best_items_per_sec=18.9657k branch_miss=0 bytes=631.515k bytes_per_second=111.761Mi/s cache_miss=0 cache_ref=0 cycles=108.964k cycles_per_byte=0.172544 docs_per_sec=185.57/s frequency=20.2205M/s instructions=26.6578k instructions_per_byte=0.0422124 instructions_per_cycle=0.244647 items=100 items_per_second=18.557k/s [BEST: throughput=  0.12 GB/s doc_throughput=   189 docs/s instructions=       25757 cycles=      100816 branch_miss=       0 cache_miss=       0 cache_ref=         0 items=       100 avg_time=   5388797 ns]
distinct_user_id<simdjson_ondemand>/manual_time    5192376 ns      5590659 ns         8096 best_branch_miss=0 best_bytes_per_sec=124.37M best_cache_miss=0 best_cache_ref=0 best_cycles=91.159k best_cycles_per_byte=0.14435 best_docs_per_sec=196.939 best_frequency=17.9528M best_instructions=24.485k best_instructions_per_byte=0.0387718 best_instructions_per_cycle=0.268597 best_items_per_sec=22.648k branch_miss=0 bytes=631.515k bytes_per_second=115.989Mi/s cache_miss=0 cache_ref=0 cycles=121.281k cycles_per_byte=0.192047 docs_per_sec=192.59/s frequency=23.3575M/s instructions=28.1257k instructions_per_byte=0.0445369 instructions_per_cycle=0.231906 items=115 items_per_second=22.1479k/s [BEST: throughput=  0.12 GB/s doc_throughput=   196 docs/s instructions=       24485 cycles=       91159 branch_miss=       0 cache_miss=       0 cache_ref=         0 items=       115 avg_time=   5192375 ns]
find_tweet<simdjson_ondemand>/manual_time          4619933 ns      5022548 ns         9090 best_branch_miss=0 best_bytes_per_sec=139.359M best_cache_miss=0 best_cache_ref=0 best_cycles=80.198k best_cycles_per_byte=0.126993 best_docs_per_sec=220.674 best_frequency=17.6976M best_instructions=22.981k best_instructions_per_byte=0.0363903 best_instructions_per_cycle=0.286553 best_items_per_sec=220.674 branch_miss=0 bytes=631.515k bytes_per_second=130.361Mi/s cache_miss=0 cache_ref=0 cycles=113.802k cycles_per_byte=0.180205 docs_per_sec=216.453/s frequency=24.6329M/s instructions=28.2055k instructions_per_byte=0.0446633 instructions_per_cycle=0.247847 items=1 items_per_second=216.453/s [BEST: throughput=  0.14 GB/s doc_throughput=   220 docs/s instructions=       22981 cycles=       80198 branch_miss=       0 cache_miss=       0 cache_ref=         0 items=         1 avg_time=   4619932 ns]
top_tweet<simdjson_ondemand>/manual_time           5221692 ns      5669724 ns         8055 best_branch_miss=0 best_bytes_per_sec=124.351M best_cache_miss=0 best_cache_ref=0 best_cycles=233.206k best_cycles_per_byte=0.36928 best_docs_per_sec=196.908 best_frequency=45.9202M best_instructions=41.065k best_instructions_per_byte=0.0650262 best_instructions_per_cycle=0.176089 best_items_per_sec=196.908 branch_miss=0 bytes=631.515k bytes_per_second=115.338Mi/s cache_miss=0 cache_ref=0 cycles=139.233k cycles_per_byte=0.220475 docs_per_sec=191.509/s frequency=26.6643M/s instructions=29.9431k instructions_per_byte=0.0474146 instructions_per_cycle=0.215057 items=1 items_per_second=191.509/s [BEST: throughput=  0.12 GB/s doc_throughput=   196 docs/s instructions=       41065 cycles=      233206 branch_miss=       0 cache_miss=       0 cache_ref=         0 items=         1 avg_time=   5221691 ns]
Creating a source file spanning 134087 KB
kostya<simdjson_ondemand>/manual_time           1173527116 ns   1225406185 ns           36 best_branch_miss=0 best_bytes_per_sec=117.286M best_cache_miss=0 best_cache_ref=0 best_cycles=9.5114M best_cycles_per_byte=0.069272 best_docs_per_sec=0.854201 best_frequency=8.12464M best_instructions=1.4845M best_instructions_per_byte=0.0108117 best_instructions_per_cycle=0.156076 best_items_per_sec=447.847k branch_miss=0 bytes=137.305M bytes_per_second=111.582Mi/s cache_miss=0 cache_ref=0 cycles=9.72474M cycles_per_byte=0.0708258 docs_per_sec=0.852132/s frequency=8.28676M/s instructions=1.51865M instructions_per_byte=0.0110604 instructions_per_cycle=0.156164 items=524.288k items_per_second=446.763k/s [BEST: throughput=  0.12 GB/s doc_throughput=     0 docs/s instructions=     1484503 cycles=     9511396 branch_miss=       0 cache_miss=       0 cache_ref=         0 items=    524288 avg_time=1173527115 ns]
Creating a source file spanning 44921 KB
large_random<simdjson_ondemand>/manual_time      711698319 ns    729061562 ns           59 best_branch_miss=0 best_bytes_per_sec=65.1227M best_cache_miss=0 best_cache_ref=0 best_cycles=6.30771M best_cycles_per_byte=0.137128 best_docs_per_sec=1.41575 best_frequency=8.93014M best_instructions=926.008k best_instructions_per_byte=0.0201311 best_instructions_per_cycle=0.146806 best_items_per_sec=1.41575M branch_miss=0 bytes=45.9988M bytes_per_second=61.6383Mi/s cache_miss=0 cache_ref=0 cycles=6.26053M cycles_per_byte=0.136102 docs_per_sec=1.40509/s frequency=8.7966M/s instructions=919.83k instructions_per_byte=0.0199968 instructions_per_cycle=0.146925 items=1M items_per_second=1.40509M/s [BEST: throughput=  0.07 GB/s doc_throughput=     1 docs/s instructions=      926008 cycles=     6307712 branch_miss=       0 cache_miss=       0 cache_ref=         0 items=   1000000 avg_time= 711698318 ns]

$ ./build/benchmark/bench_ondemand --benchmark_min_time=30 --benchmark_filter=\<simdjson_ondemand\>
simdjson::dom implementation:      rvv_vls
simdjson::ondemand implementation (stage 1): rvv_vls
simdjson::ondemand implementation (stage 2): rvv_vls
----------------------------------------------------------------------------------------------------------
Benchmark                                                Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------------
json2msgpack<simdjson_ondemand>/manual_time        3382191 ns      4134661 ns        12424 best_branch_miss=0 best_bytes_per_sec=192.675M best_cache_miss=0 best_cache_ref=0 best_cycles=51.866k best_cycles_per_byte=0.0821295 best_docs_per_sec=305.1 best_frequency=15.8243M best_instructions=16.684k best_instructions_per_byte=0.026419 best_instructions_per_cycle=0.321675 best_items_per_sec=305.1 branch_miss=0 bytes=631.515k bytes_per_second=178.068Mi/s cache_miss=0 cache_ref=0 cycles=85.1543k cycles_per_byte=0.134841 docs_per_sec=295.666/s frequency=25.1773M/s instructions=21.965k instructions_per_byte=0.0347815 instructions_per_cycle=0.257944 items=1 items_per_second=295.666/s [BEST: throughput=  0.19 GB/s doc_throughput=   305 docs/s instructions=       16684 cycles=       51866 branch_miss=       0 cache_miss=       0 cache_ref=         0 items=         1 avg_time=   3382190 ns]
partial_tweets<simdjson_ondemand>/manual_time      1862772 ns      2238208 ns        22554 best_branch_miss=0 best_bytes_per_sec=351.579M best_cache_miss=0 best_cache_ref=0 best_cycles=83.173k best_cycles_per_byte=0.131704 best_docs_per_sec=556.723 best_frequency=46.3043M best_instructions=21.848k best_instructions_per_byte=0.0345962 best_instructions_per_cycle=0.262681 best_items_per_sec=55.6723k branch_miss=0 bytes=631.515k bytes_per_second=323.314Mi/s cache_miss=0 cache_ref=0 cycles=86.0076k cycles_per_byte=0.136192 docs_per_sec=536.834/s frequency=46.1718M/s instructions=21.9277k instructions_per_byte=0.0347224 instructions_per_cycle=0.254951 items=100 items_per_second=53.6834k/s [BEST: throughput=  0.35 GB/s doc_throughput=   556 docs/s instructions=       21848 cycles=       83173 branch_miss=       0 cache_miss=       0 cache_ref=         0 items=       100 avg_time=   1862771 ns]
distinct_user_id<simdjson_ondemand>/manual_time    1768015 ns      2179034 ns        23779 best_branch_miss=0 best_bytes_per_sec=371.549M best_cache_miss=0 best_cache_ref=0 best_cycles=53.074k best_cycles_per_byte=0.0840423 best_docs_per_sec=588.345 best_frequency=31.2258M best_instructions=17.778k best_instructions_per_byte=0.0281514 best_instructions_per_cycle=0.334966 best_items_per_sec=67.6597k branch_miss=0 bytes=631.515k bytes_per_second=340.642Mi/s cache_miss=0 cache_ref=0 cycles=93.2712k cycles_per_byte=0.147694 docs_per_sec=565.606/s frequency=52.7548M/s instructions=22.582k instructions_per_byte=0.0357584 instructions_per_cycle=0.242111 items=115 items_per_second=65.0447k/s [BEST: throughput=  0.37 GB/s doc_throughput=   588 docs/s instructions=       17778 cycles=       53074 branch_miss=       0 cache_miss=       0 cache_ref=         0 items=       115 avg_time=   1768014 ns]
find_tweet<simdjson_ondemand>/manual_time          1220429 ns      1653801 ns        34444 best_branch_miss=0 best_bytes_per_sec=538.587M best_cache_miss=0 best_cache_ref=0 best_cycles=48.469k best_cycles_per_byte=0.0767504 best_docs_per_sec=852.849 best_frequency=41.3368M best_instructions=18.21k best_instructions_per_byte=0.0288354 best_instructions_per_cycle=0.375704 best_items_per_sec=852.849 branch_miss=0 bytes=631.515k bytes_per_second=493.482Mi/s cache_miss=0 cache_ref=0 cycles=76.2932k cycles_per_byte=0.12081 docs_per_sec=819.384/s frequency=62.5135M/s instructions=19.823k instructions_per_byte=0.0313895 instructions_per_cycle=0.259826 items=1 items_per_second=819.384/s [BEST: throughput=  0.54 GB/s doc_throughput=   852 docs/s instructions=       18210 cycles=       48469 branch_miss=       0 cache_miss=       0 cache_ref=         0 items=         1 avg_time=   1220428 ns]
top_tweet<simdjson_ondemand>/manual_time           1770737 ns      2241024 ns        23771 best_branch_miss=0 best_bytes_per_sec=372.902M best_cache_miss=0 best_cache_ref=0 best_cycles=80.992k best_cycles_per_byte=0.12825 best_docs_per_sec=590.488 best_frequency=47.8248M best_instructions=19.938k best_instructions_per_byte=0.0315717 best_instructions_per_cycle=0.246172 best_items_per_sec=590.488 branch_miss=0 bytes=631.515k bytes_per_second=340.118Mi/s cache_miss=0 cache_ref=0 cycles=83.7522k cycles_per_byte=0.132621 docs_per_sec=564.737/s frequency=47.2979M/s instructions=20.3413k instructions_per_byte=0.0322103 instructions_per_cycle=0.242874 items=1 items_per_second=564.737/s [BEST: throughput=  0.37 GB/s doc_throughput=   590 docs/s instructions=       19938 cycles=       80992 branch_miss=       0 cache_miss=       0 cache_ref=         0 items=         1 avg_time=   1770736 ns]
Creating a source file spanning 134087 KB
kostya<simdjson_ondemand>/manual_time            485458298 ns    537202012 ns           87 best_branch_miss=0 best_bytes_per_sec=284.104M best_cache_miss=0 best_cache_ref=0 best_cycles=4.34635M best_cycles_per_byte=0.0316547 best_docs_per_sec=2.06914 best_frequency=8.99322M best_instructions=635.983k best_instructions_per_byte=4.6319m best_instructions_per_cycle=0.146326 best_items_per_sec=1.08483M branch_miss=0 bytes=137.305M bytes_per_second=269.733Mi/s cache_miss=0 cache_ref=0 cycles=4.38624M cycles_per_byte=0.0319453 docs_per_sec=2.05991/s frequency=9.03527M/s instructions=637.71k instructions_per_byte=4.64448m instructions_per_cycle=0.145389 items=524.288k items_per_second=1.07999M/s [BEST: throughput=  0.28 GB/s doc_throughput=     2 docs/s instructions=      635983 cycles=     4346349 branch_miss=       0 cache_miss=       0 cache_ref=         0 items=    524288 avg_time= 485458298 ns]
Creating a source file spanning 44921 KB
large_random<simdjson_ondemand>/manual_time      390553309 ns    407920899 ns          108 best_branch_miss=0 best_bytes_per_sec=118.347M best_cache_miss=0 best_cache_ref=0 best_cycles=3.311M best_cycles_per_byte=0.0719802 best_docs_per_sec=2.57283 best_frequency=8.51864M best_instructions=503.369k best_instructions_per_byte=0.0109431 best_instructions_per_cycle=0.152029 best_items_per_sec=2.57283M branch_miss=0 bytes=45.9988M bytes_per_second=112.322Mi/s cache_miss=0 cache_ref=0 cycles=3.45591M cycles_per_byte=0.0751305 docs_per_sec=2.56047/s frequency=8.84876M/s instructions=519.673k instructions_per_byte=0.0112975 instructions_per_cycle=0.150372 items=1M items_per_second=2.56047M/s [BEST: throughput=  0.12 GB/s doc_throughput=     2 docs/s instructions=      503369 cycles=     3311003 branch_miss=       0 cache_miss=       0 cache_ref=         0 items=   1000000 avg_time= 390553309 ns]
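The two runs are easier to compare as ratios. The sketch below copies the "Time" column from both runs and divides fallback by rvv_vls; the struct and function names are made up for this illustration.

```cpp
#include <cassert>
#include <cstdio>

struct BenchRow {
  const char* name;
  double fallback_ns;  // "Time" column of the fallback run
  double rvv_vls_ns;   // "Time" column of the rvv_vls run
};

// Values copied from the two benchmark runs above.
constexpr BenchRow kRows[] = {
    {"json2msgpack", 7926653.0, 3382191.0},
    {"partial_tweets", 5388798.0, 1862772.0},
    {"distinct_user_id", 5192376.0, 1768015.0},
    {"find_tweet", 4619933.0, 1220429.0},
    {"top_tweet", 5221692.0, 1770737.0},
    {"kostya", 1173527116.0, 485458298.0},
    {"large_random", 711698319.0, 390553309.0},
};

// Speedup of rvv_vls over fallback for one benchmark.
inline double speedup(const BenchRow& row) {
  assert(row.rvv_vls_ns > 0);
  return row.fallback_ns / row.rvv_vls_ns;
}

// Prints one "name speedup" line per benchmark.
inline void print_speedups() {
  for (const BenchRow& row : kRows) {
    std::printf("%-18s %.2fx\n", row.name, speedup(row));
  }
}
```

On these numbers the rvv_vls backend is roughly 1.8x (large_random) to 3.8x (find_tweet) faster than fallback.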

lemire · 2026-01-18T20:01:46Z

@camel-cdr This looks amazing.

camel-cdr · 2026-01-18T21:01:06Z

I'm not sure why the workflows are failing; it worked locally and on hardware. I'll look at them tomorrow.

lemire · 2026-01-18T21:47:17Z

Running tests again

lemire · 2026-01-19T00:19:53Z

It seems that at least libyyjson.a was compiled for x64.

[100%] Linking CXX executable bench_ondemand
/usr/bin/riscv64-linux-gnu-ld: ../dependencies/libyyjson.a(yyjson.c.o): Relocations in generic ELF (EM: 62)
/usr/bin/riscv64-linux-gnu-ld: ../dependencies/libyyjson.a(yyjson.c.o): Relocations in generic ELF (EM: 62)
/usr/bin/riscv64-linux-gnu-ld: ../dependencies/libyyjson.a(yyjson.c.o): Relocations in generic ELF (EM: 62)
/usr/bin/riscv64-linux-gnu-ld: ../dependencies/libyyjson.a(yyjson.c.o): Relocations in generic ELF (EM: 62)
/usr/bin/riscv64-linux-gnu-ld: ../dependencies/libyyjson.a: error adding symbols: file in wrong format

If the problem is limited to this one library, we can skip it. But maybe this reflects another issue.
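The "EM: 62" in the linker message is the ELF `e_machine` value for x86-64, whereas RISC-V objects carry 243. As a hedged illustration, the sketch below reads that field from the first bytes of an object file. The names are hypothetical, and it expects an object extracted from the archive, not the `.a` file itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// e_machine values from the ELF specification.
constexpr int kEmX86_64 = 62;  // the "EM: 62" in the linker message
constexpr int kEmRiscV = 243;

// Returns the e_machine field of an ELF header, or -1 if `hdr` is not ELF.
inline int elf_machine(const std::uint8_t* hdr, std::size_t len) {
  if (hdr == nullptr || len < 20) return -1;
  if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F') return -1;
  // hdr[5] is EI_DATA: 1 = little-endian, 2 = big-endian.
  if (hdr[5] != 1 && hdr[5] != 2) return -1;
  // e_machine is the 16-bit field at byte offset 18.
  return hdr[5] == 1 ? (hdr[18] | (hdr[19] << 8)) : ((hdr[18] << 8) | hdr[19]);
}

// Small self-check with a hand-built little-endian ELF64 header.
inline void elf_machine_self_check() {
  std::uint8_t hdr[20] = {0x7f, 'E', 'L', 'F', 2, 1};
  hdr[18] = 62;
  assert(elf_machine(hdr, sizeof(hdr)) == kEmX86_64);
}
```

In practice, running `readelf -h` on the extracted object reports the same field as "Machine".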

camel-cdr · 2026-01-19T19:00:51Z

@lemire All tests are passing on CI as well now.

I needed to disable simdjson_force_implementation_error when cross-compiling (CMAKE_CROSSCOMPILING_EMULATOR). The test succeeds on hardware.

lemire · 2026-01-19T20:12:33Z

tests/dom/CMakeLists.txt

-  if(CMAKE_HOST_SYSTEM_PROCESSOR STREQUAL x86_64 OR CMAKE_HOST_SYSTEM_PROCESSOR STREQUAL amd64)
+  if((CMAKE_HOST_SYSTEM_PROCESSOR STREQUAL x86_64 OR CMAKE_HOST_SYSTEM_PROCESSOR STREQUAL amd64) AND NOT DEFINED CMAKE_CROSSCOMPILING_EMULATOR)
    add_test(
      NAME simdjson_force_implementation_error


That looks fine to me.

lemire · 2026-01-19T20:17:04Z

@camel-cdr It is not a requirement, but have you considered adding a little bit of documentation? Maybe here:

https://github.com/simdjson/simdjson/blob/master/doc/implementation-selection.md

and possibly here:

https://github.com/simdjson/simdjson/blob/master/HACKING.md

This is optional and can be done as a future PR.

I think your work is a big deal, so maybe it is worth documenting.

lemire

I think we can merge this, it looks excellent, but I will wait to see if @camel-cdr is interested in adding some words of documentation regarding RISC-V support. (optionally)

camel-cdr · 2026-01-20T15:37:19Z

Yeah, I'll write an explanation.

lemire · 2026-01-21T15:13:46Z

@camel-cdr I'll merge this now, and I invite you to produce a pull request with some documentation regarding this work you did.

camel-cdr mentioned this pull request Jan 18, 2026

benchmark simdjson on X100 sanderjo/SpacemiT-K3-X100-A100#2

Closed

camel-cdr force-pushed the master branch 2 times, most recently from c074be4 to ebbf345 on January 18, 2026 20:42

camel-cdr force-pushed the master branch 2 times, most recently from 305e10e to 912c9eb on January 18, 2026 21:37

camel-cdr force-pushed the master branch 4 times, most recently from d309254 to c30d272 on January 19, 2026 17:52

add rvv-vls backend

e8bbd24

camel-cdr force-pushed the master branch from c30d272 to e8bbd24 on January 19, 2026 18:28

lemire reviewed Jan 19, 2026

lemire approved these changes Jan 20, 2026

lemire merged commit db93de2 into simdjson:master Jan 21, 2026
85 checks passed

vielmetti mentioned this pull request Jan 22, 2026

RISC-V 64 - Vector Support #1868

Closed

BrewTestBot mentioned this pull request Feb 19, 2026

simdjson 4.3.0 Homebrew/homebrew-core#268291

Merged

lemire merged 1 commit into simdjson:master from camel-cdr:master