@erikcorry
Collaborator

I don't know if it's faster yet.

@erikcorry
Collaborator Author

It's now faster on my machine.

@lemire
Member

lemire commented Nov 17, 2025

@erikcorry Thank you. I will review today.

@lemire
Member

lemire commented Nov 18, 2025

I'll merge, but I will later change this to memcpy.

      uint32_t straddle1 =
          *reinterpret_cast<const uint32_t*>(in + pos + 1 * N - 1);
      uint32_t straddle2 =
          *reinterpret_cast<const uint32_t*>(in + pos + 2 * N - 1);

As far as I can tell, this can lead to unaligned loads, which is UB. It should be safe but will trigger sanitizer warnings. (A memcpy won't affect the perf.)
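For illustration, a minimal sketch of what a memcpy-based load could look like. The helper name `load_u32_unaligned` is hypothetical and not taken from the simdutf code base, and the sketch assumes the pointer refers to at least four readable bytes. Compilers typically lower a fixed-size `std::memcpy` like this to a single load instruction, which is consistent with the expectation that performance is unaffected.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of simdutf): reads four bytes starting at
// an arbitrary address without requiring 4-byte alignment.
// The caller must guarantee that p points to at least 4 readable bytes.
inline uint32_t load_u32_unaligned(const void* p) {
  uint32_t value;
  // Copying through memcpy avoids dereferencing a misaligned uint32_t*,
  // so alignment sanitizers have nothing to report.
  std::memcpy(&value, p, sizeof(value));
  return value;
}
```

Under these assumptions, the first load in the snippet above could be written as `uint32_t straddle1 = load_u32_unaligned(in + pos + 1 * N - 1);`, and likewise for `straddle2`.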

@lemire merged commit b42b794 into simdutf:utf16_to_utf8_length_replacement Nov 18, 2025
19 checks passed
lemire added a commit that referenced this pull request Nov 18, 2025
* init

* adding tests.

* initial impl.

* adding comment.

* format

* haswell and westmere

* implemented icelake

* speeding up icelake

* done with icelake

* better documentation.

* fixing portability issue with Windows

* got the name of the intrinsic wrong.

* saving.

* applying an optimization.

* optimized icelake.

* fixing other missed opportunities

* fixing the cast

* Update scripts/README_ADD_FUNCTION.md

Co-authored-by: Paul Dreik <github@pauldreik.se>

* Update CONTRIBUTING.md

Co-authored-by: Paul Dreik <github@pauldreik.se>

* Update CONTRIBUTING.md

Co-authored-by: Paul Dreik <github@pauldreik.se>

* Update scripts/README_ADD_FUNCTION.md

Co-authored-by: Paul Dreik <github@pauldreik.se>

* correcting feature check.

* fixing big-endian issue

* lint.

* typo

* Alternative strategy for UTF-8 length from malformed UTF-16 (#857)

* Alternative strategy for UTF-8 length from malformed UTF-16

* Don't expect any surrogates, skip work in this case

* correct the memcpy

* lint

* more testing and fixing a bug in generic and arm impl.

* adding alignment (workaround for bug in some versions of gcc).

---------

Co-authored-by: Daniel Lemire <dlemire@lemire.me>
Co-authored-by: Paul Dreik <github@pauldreik.se>
Co-authored-by: Erik Corry <erik@arbat.com>