Skip to content

Rename rapidjson helper functions; remove in situ parsing#114017

Open
GrabYourPitchforks wants to merge 4 commits intodotnet:mainfrom
GrabYourPitchforks:rapidjson
Open

Rename rapidjson helper functions; remove in situ parsing#114017
GrabYourPitchforks wants to merge 4 commits intodotnet:mainfrom
GrabYourPitchforks:rapidjson

Conversation

@GrabYourPitchforks
Copy link
Copy Markdown
Member

@GrabYourPitchforks GrabYourPitchforks commented Mar 28, 2025

Since rapidjson is a disallowed deserializer within Microsoft, signing off on .NET 10 will require us to attest that every call site into rapidjson processes only fully trusted input.

This is straightforward enough for a point-in-time assessment, but it does represent ongoing cost, and we don't want to risk introducing new call sites into this logic where we can't guarantee that the input is trustworthy. To facilitate this, I recommend renaming our rapidjson wrapping utility methods to parse_fully_trusted_raw_data and parse_fully_trusted_file, which clearly indicate at the invocation site that the caller passes only fully trustworthy data. This should reduce the risk of us violating the trust contract going forward and should simplify future attestations.

Additionally, we received internal bug reports of in situ parsing causing AVs when parsing improperly formatted JSON payloads. rapidjson's in situ parser requires a mutable null-terminated string; and since we use mmap files, we can't guarantee the presence of a null terminator. This means that if the file length happens to exactly match the page size of the underlying OS, and if the underlying JSON blob has an error (like missing the closing bracket), rapidjson will try to read into unmapped memory and crash the process rather than report an actionable error.

This is resolved by passing an explicit length to the parse routine. However, since ParseInsitu doesn't provide an overload that takes an explicit length, we should fall back to normal parsing.

Note to reviewers: If we really do want to enable in-situ parsing, we could always create our own GenericInsituStringStream-like type which takes a length and call ParseStream, passing kParseInsituFlag as a template arg. We can't use MemoryStream directly since it's read-only.

(Jeff, this is what I pinged you about via IM.)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review any files in this pull request.

Files not reviewed (7)
  • src/native/corehost/comhost/clsidmap.cpp: Language not supported
  • src/native/corehost/fxr/sdk_resolver.cpp: Language not supported
  • src/native/corehost/fxr/standalone/hostpolicy_resolver.cpp: Language not supported
  • src/native/corehost/hostpolicy/deps_format.cpp: Language not supported
  • src/native/corehost/json_parser.cpp: Language not supported
  • src/native/corehost/json_parser.h: Language not supported
  • src/native/corehost/runtime_config.cpp: Language not supported

@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

@GrabYourPitchforks
Copy link
Copy Markdown
Member Author

Rebased on latest main to resolve merge conflicts.

Copilot AI review requested due to automatic review settings March 24, 2026 19:01
@GrabYourPitchforks GrabYourPitchforks changed the title Rename rapidjson helper functions Rename rapidjson helper functions; remove in situ parsing Mar 24, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (4)

src/native/corehost/json_parser.cpp:114

  • In parse_fully_trusted_file, m_data = (char*)pal::mmap_read(...) casts away const from a read-only mapping. Even though current parsing is non-mutating, this makes it easy to accidentally introduce writes later (which would fault on Unix/Windows). Prefer keeping the mapping pointer const throughout and only using mutable mappings when truly required.
    if (m_data == nullptr)
    {
        m_data = (char*)pal::mmap_read(path, &m_size);

        if (m_data == nullptr)
        {
            trace::error(_X("Cannot use file stream for [%s]: %s"), path.c_str(), pal::strerror(errno).c_str());
            return false;
        }

src/native/corehost/json_parser.cpp:113

  • The error text "Cannot use file stream for [...]" is misleading here since the code path is attempting a memory-map (mmap_read), not a file stream. Consider updating the message to reflect the actual operation (mapping the file) to make diagnostics clearer.
        if (m_data == nullptr)
        {
            trace::error(_X("Cannot use file stream for [%s]: %s"), path.c_str(), pal::strerror(errno).c_str());
            return false;

src/native/corehost/json_parser.cpp:36

  • get_line_column_from_offset can read past the end of the buffer: when i == size - 1 (possible when offset == size), the data[i + 1] access in the CRLF check is out-of-bounds. Add a bounds check (e.g., ensure i + 1 < size) before reading data[i + 1] so malformed/edge inputs don’t trigger an OOB read while formatting parse errors.
void get_line_column_from_offset(const char* data, size_t size, size_t offset, int *line, int *column)
{
    assert(offset <= size);

    *line = *column = 1;

    for (size_t i = 0; i < offset; i++)
    {
        (*column)++;

        if (data[i] == '\n')
        {
            (*line)++;
            *column = 1;
        }
        else if (data[i] == '\r' && data[i + 1] == '\n')
        {

src/native/corehost/json_parser.h:52

  • pal::mmap_read returns a const void* (read-only mapping), but m_data is a mutable char* and the code casts away const when assigning the mapping. Since parsing no longer uses in-situ mutation, consider making m_data a const char* and updating parse_fully_trusted_raw_data to take const char* to avoid misleading mutability and accidental writes to a read-only mapping.
        bool parse_fully_trusted_raw_data(char* data, size_t size, const pal::string_t& context);
        bool parse_fully_trusted_file(const pal::string_t& path);

        json_parser_t()
            : m_data(nullptr)
            , m_bundle_location(nullptr) {}

        ~json_parser_t();

    private:
        char* m_data; // The memory mapped bytes of the file
        size_t m_size; // Size of the mapped memory

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/native/corehost/json_parser.cpp:40

  • In get_line_column_from_offset, the CRLF detection now checks (i + 1) < offset instead of guarding against the end of the buffer. This can under-count newlines when the error offset points at the \n of a CRLF pair, and it’s not necessary for bounds safety (the real bound is size). Consider changing the condition to guard with size so CRLF is recognized when present while still avoiding an out-of-bounds read.
        else if (data[i] == '\r' && (i + 1) < offset && data[i + 1] == '\n')
        {
            (*line)++;
            *column = 1;

            i++; // Discard carriage return

Comment thread src/native/corehost/json_parser.cpp
Comment thread src/native/corehost/json_parser.cpp
@pavelsavara
Copy link
Copy Markdown
Member

pavelsavara commented Mar 25, 2026

I will comment from browser/wasm perspective.

  • I believe that main use-case for json parser in coreCLR codebase is to parse runtimeconfig.json
  • in browser the host is statically linked with the runtime, the the part that chooses runtime version doesn't apply.
  • so we only need to know values of configProperties and that's is (string,string)[] list of pair of strings.
  • I think it's like that for all "mobile" targets.
  • in browserhost for CoreCLR we use browser json parser to parse much larger manifest, and configProperties are part of that. The JS side will transform that directly into appctx_keys, appctx_values C UTF8 char**
  • in mono for browser it's also like that

for (const [key, value] of runtimeConfigProperties.entries()) {
const keyPtr = _ems_.dotnetBrowserUtilsExports.stringToUTF8Ptr(key);
const valuePtr = _ems_.dotnetBrowserUtilsExports.stringToUTF8Ptr(value);
_ems_.dotnetApi.setHeapU32((appctx_keys as any) + (propertyCount * sizeOfPtr), keyPtr);
_ems_.dotnetApi.setHeapU32((appctx_values as any) + (propertyCount * sizeOfPtr), valuePtr);
propertyCount++;
buffers.push(keyPtr as any);
buffers.push(valuePtr as any);
}

const runtimeConfigProperties = new Map<string, string>();
if (runtimeHelpers.config.runtimeConfig?.runtimeOptions?.configProperties) {
for (const [key, value] of Object.entries(runtimeHelpers.config.runtimeConfig?.runtimeOptions?.configProperties)) {
runtimeConfigProperties.set(key, "" + value);
}
}
runtimeConfigProperties.set("APP_CONTEXT_BASE_DIRECTORY", "/");
runtimeConfigProperties.set("RUNTIME_IDENTIFIER", "browser-wasm");
const propertyCount = runtimeConfigProperties.size;
const buffers:VoidPtr[] = [];
const appctx_keys = malloc(4 * runtimeConfigProperties.size) as any as CharPtrPtr;
const appctx_values = malloc(4 * runtimeConfigProperties.size) as any as CharPtrPtr;

In the past we avoided need for json parser in the VM, by converting into very simple binary file at compile time.
For browser I got rid of that in #115113

But I still see the same idea on apple/android

if (BundledRuntimeConfig?.ItemSpec != null)
{
dataSymbol = BundledRuntimeConfig.GetMetadata("DataSymbol");
if (string.IsNullOrEmpty(dataSymbol))
{
throw new LogAsErrorException($"'{nameof(BundledRuntimeConfig)}' does not contain 'DataSymbol' metadata.");
}
dataLenSymbol = BundledRuntimeConfig.GetMetadata("DataLenSymbol");
if (string.IsNullOrEmpty(dataLenSymbol))
{
throw new LogAsErrorException($"'{nameof(BundledRuntimeConfig)}' does not contain 'DataLenSymbol' metadata.");
}
externBundledResourcesSymbols.AppendLine();
externBundledResourcesSymbols.AppendLine($"extern uint8_t {dataSymbol}[];");
externBundledResourcesSymbols.AppendLine($"extern const uint32_t {dataLenSymbol};");
}

In the future, we may still need this compile time trick for WASI, but just for the configProperties part.

So my angle is, let's delete it, it makes coreCLR binary larger download for the browser.
(I realize normal OS story is different)

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Mar 25, 2026

it makes coreCLR binary larger download for the browser.

This parser should not be linked into coreclr binary, on any OS. Do you see it compiled into coreclr binary on browser?

@pavelsavara
Copy link
Copy Markdown
Member

This parser should not be linked into coreclr binary, on any OS. Do you see it compiled into coreclr binary on browser?

host and vm are the same binary for wasm, but I stand corrected, we are NOT building/linking it in the browser.

if(NOT CLR_CMAKE_TARGET_BROWSER)
add_library(fxr_resolver INTERFACE)
target_sources(fxr_resolver INTERFACE fxr_resolver.cpp)
target_include_directories(fxr_resolver INTERFACE fxr)
add_compile_definitions(RAPIDJSON_HAS_CXX17)
if ((NOT DEFINED CLR_CMAKE_USE_SYSTEM_RAPIDJSON) OR (NOT CLR_CMAKE_USE_SYSTEM_RAPIDJSON))
include_directories(${CLR_SRC_NATIVE_DIR}/external/)
endif()

I don't know about other single-file hosts.

@GrabYourPitchforks
Copy link
Copy Markdown
Member Author

GrabYourPitchforks commented Mar 27, 2026

Elinor requested offline that I run the startup perf benchmarks. Should get to that next week.

Edit: No measurable change. 60 ~ 62 ms on my machine to launch emptycsconsoletemplate, regardless of whether the runtime comes from main or from this PR's branch. I also confirmed the perf scaffolding drops a runtimeconfig.json file alongside the executable, which I tweaked to target net11:

{
  "runtimeOptions": {
    "tfm": "net11.0",
    "framework": {
      "name": "Microsoft.NETCore.App",
      "version": "11.0.0"
    },
    "configProperties": {
      "System.Reflection.Metadata.MetadataUpdater.IsSupported": false,
      "System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization": false
    }
  }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants