Skip to content

Add support for validating and escaping UTF-8 strings.#453

Merged
LinZhihao-723 merged 13 commits into
y-scope:mainfrom
LinZhihao-723:escape_utf8_str
Jun 27, 2024
Merged

Add support for validating and escaping UTF-8 strings.#453
LinZhihao-723 merged 13 commits into
y-scope:mainfrom
LinZhihao-723:escape_utf8_str

Conversation

@LinZhihao-723

Copy link
Copy Markdown
Member

Description

Motivation

When outputting a C++ string to a human-readable (printable) format, we must ensure the string is UTF8 valid, and escape special characters such as the new line character \n or the quote mark ". This is required if we want to serialize string data into a JSON format.
There are third-party libraries that already implement such a validation + escaping mechanism. For example, nlohmann JSON provides string dump to execute the validation and escape. However, using external libraries have the following problems:

  1. Run time overhead. We need to convert a string view into their representation. In our previous benchmark, using nlohmann JSON for escaping purpose can introduce significant overhead.
  2. Library dependencies in ffi libraries: We need extra dependencies on our ffi libraries.

Therefore, we decide to implement our own implementation for validating and escaping UTF8 strings.

Implementation

The functionalities can be breakdown as follows:

  • Validate UTF8 encoding:

    • Validate UTF8 byte sequence
    • Validate code point ranges
  • Escape special characters

    • Escape printable characters by add ''
    • Escape control characters in the form of \\u00xx

    This PR adds a util function to implement the functionalities listed above.

Validation performed

  • Passed clang-tidy checks
  • Added unit tests to cover the functionalities discussed above. Details can be found in the unit test implementations.

Comment thread components/core/src/clp/ffi/utils.cpp Outdated
Comment thread components/core/src/clp/ffi/utils.cpp Outdated
Comment thread components/core/src/clp/ffi/utils.cpp Outdated
Comment thread components/core/tests/test-ffi_utils.cpp Outdated
Comment thread components/core/tests/test-ffi_utils.cpp Outdated
Comment thread components/core/tests/test-ffi_utils.cpp Outdated
Comment thread components/core/tests/test-ffi_utils.cpp Outdated
Comment thread components/core/src/clp/ffi/utils.cpp Outdated
Comment thread components/core/src/clp/ffi/utils.cpp Outdated
Comment thread components/core/src/clp/ffi/utils.cpp Outdated
Comment thread components/core/src/clp/ffi/utils.cpp Outdated
Comment thread components/core/tests/test-ffi_utils.cpp Outdated
Comment thread components/core/tests/test-ffi_utils.cpp Outdated
Comment thread components/core/tests/test-ffi_utils.cpp Outdated
Comment thread components/core/src/clp/ffi/utils.hpp Outdated
Comment thread components/core/src/clp/ffi/utils.hpp Outdated
Comment thread components/core/tests/test-ffi_utils.cpp Outdated
Comment thread components/core/tests/test-ffi_utils.cpp
Comment thread components/core/tests/test-ffi_utils.cpp Outdated
Comment thread components/core/tests/test-ffi_utils.cpp Outdated
Comment thread components/core/tests/test-ffi_utils.cpp
LinZhihao-723 and others added 3 commits June 26, 2024 01:16
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. A few remaining renaming suggestions.

Comment thread components/core/src/clp/utf8_utils.hpp Outdated
Comment thread components/core/src/clp/utf8_utils.hpp Outdated

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring fixes

Comment thread components/core/src/clp/ffi/utils.hpp Outdated
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
@kirkrodrigues kirkrodrigues changed the title Add support for validating and escaping UTF8 strings. Add support for validating and escaping UTF-8 strings. Jun 27, 2024
@LinZhihao-723 LinZhihao-723 merged commit 249816b into y-scope:main Jun 27, 2024
jackluo923 pushed a commit to jackluo923/clp that referenced this pull request Dec 4, 2024
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants