Rename rapidjson helper functions; remove in situ parsing#114017
Rename rapidjson helper functions; remove in situ parsing#114017GrabYourPitchforks wants to merge 4 commits intodotnet:mainfrom
Conversation
There was a problem hiding this comment.
Copilot wasn't able to review any files in this pull request.
Files not reviewed (7)
- src/native/corehost/comhost/clsidmap.cpp: Language not supported
- src/native/corehost/fxr/sdk_resolver.cpp: Language not supported
- src/native/corehost/fxr/standalone/hostpolicy_resolver.cpp: Language not supported
- src/native/corehost/hostpolicy/deps_format.cpp: Language not supported
- src/native/corehost/json_parser.cpp: Language not supported
- src/native/corehost/json_parser.h: Language not supported
- src/native/corehost/runtime_config.cpp: Language not supported
|
Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov |
cd59749 to
b1f578c
Compare
|
Rebased on latest main to resolve merge conflicts. |
b1f578c to
dfb6b96
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (4)
src/native/corehost/json_parser.cpp:114
- In
parse_fully_trusted_file,m_data = (char*)pal::mmap_read(...)casts awayconstfrom a read-only mapping. Even though current parsing is non-mutating, this makes it easy to accidentally introduce writes later (which would fault on Unix/Windows). Prefer keeping the mapping pointerconstthroughout and only using mutable mappings when truly required.
if (m_data == nullptr)
{
m_data = (char*)pal::mmap_read(path, &m_size);
if (m_data == nullptr)
{
trace::error(_X("Cannot use file stream for [%s]: %s"), path.c_str(), pal::strerror(errno).c_str());
return false;
}
src/native/corehost/json_parser.cpp:113
- The error text
"Cannot use file stream for [...]"is misleading here since the code path is attempting a memory-map (mmap_read), not a file stream. Consider updating the message to reflect the actual operation (mapping the file) to make diagnostics clearer.
if (m_data == nullptr)
{
trace::error(_X("Cannot use file stream for [%s]: %s"), path.c_str(), pal::strerror(errno).c_str());
return false;
src/native/corehost/json_parser.cpp:36
get_line_column_from_offsetcan read past the end of the buffer: wheni == size - 1(possible whenoffset == size), thedata[i + 1]access in the CRLF check is out-of-bounds. Add a bounds check (e.g., ensurei + 1 < size) before readingdata[i + 1]so malformed/edge inputs don’t trigger an OOB read while formatting parse errors.
void get_line_column_from_offset(const char* data, size_t size, size_t offset, int *line, int *column)
{
assert(offset <= size);
*line = *column = 1;
for (size_t i = 0; i < offset; i++)
{
(*column)++;
if (data[i] == '\n')
{
(*line)++;
*column = 1;
}
else if (data[i] == '\r' && data[i + 1] == '\n')
{
src/native/corehost/json_parser.h:52
pal::mmap_readreturns aconst void*(read-only mapping), butm_datais a mutablechar*and the code casts awayconstwhen assigning the mapping. Since parsing no longer uses in-situ mutation, consider makingm_dataaconst char*and updatingparse_fully_trusted_raw_datato takeconst char*to avoid misleading mutability and accidental writes to a read-only mapping.
bool parse_fully_trusted_raw_data(char* data, size_t size, const pal::string_t& context);
bool parse_fully_trusted_file(const pal::string_t& path);
json_parser_t()
: m_data(nullptr)
, m_bundle_location(nullptr) {}
~json_parser_t();
private:
char* m_data; // The memory mapped bytes of the file
size_t m_size; // Size of the mapped memory
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/native/corehost/json_parser.cpp:40
- In get_line_column_from_offset, the CRLF detection now checks
(i + 1) < offsetinstead of guarding against the end of the buffer. This can under-count newlines when the error offset points at the\nof a CRLF pair, and it’s not necessary for bounds safety (the real bound issize). Consider changing the condition to guard withsizeso CRLF is recognized when present while still avoiding an out-of-bounds read.
else if (data[i] == '\r' && (i + 1) < offset && data[i + 1] == '\n')
{
(*line)++;
*column = 1;
i++; // Discard carriage return
|
I will comment from browser/wasm perspective.
runtime/src/native/libs/Common/JavaScript/host/host.ts Lines 44 to 52 in 5caa14b runtime/src/mono/browser/runtime/startup.ts Lines 518 to 530 in 5caa14b In the past we avoided need for json parser in the VM, by converting into very simple binary file at compile time. But I still see the same idea on apple/android runtime/src/tasks/LibraryBuilder/LibraryBuilder.cs Lines 293 to 308 in 5caa14b In the future, we may still need this compile time trick for WASI, but just for the So my angle is, let's delete it, |
This parser should not be linked into coreclr binary, on any OS. Do you see it compiled into coreclr binary on browser? |
host and vm are the same binary for wasm, but I stand corrected, we are NOT building/linking it in the browser. runtime/src/native/corehost/CMakeLists.txt Lines 82 to 91 in 9d8d3e2 I don't know about other single-file hosts. |
|
Elinor requested offline that I run the startup perf benchmarks. Should get to that next week. Edit: No measurable change. 60 ~ 62 ms on my machine to launch emptycsconsoletemplate, regardless of whether the runtime comes from main or from this PR's branch. I also confirmed the perf scaffolding drops a runtimeconfig.json file alongside the executable, which I tweaked to target net11: {
"runtimeOptions": {
"tfm": "net11.0",
"framework": {
"name": "Microsoft.NETCore.App",
"version": "11.0.0"
},
"configProperties": {
"System.Reflection.Metadata.MetadataUpdater.IsSupported": false,
"System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization": false
}
}
} |
Since rapidjson is a disallowed deserializer within Microsoft, signing off on .NET 10 will require us to attest that every call site into rapidjson processes only fully trusted input.
This is straightforward enough for a point-in-time assessment, but it does represent ongoing cost, and we don't want to risk introducing new call sites into this logic where we can't guarantee that the input is trustworthy. To facilitate this, I recommend renaming our rapidjson wrapping utility methods to
parse_fully_trusted_raw_dataandparse_fully_trusted_file, which clearly indicate at the invocation site that the caller passes only fully trustworthy data. This should reduce the risk of us violating the trust contract going forward and should simplify future attestations.Additionally, we received internal bug reports of in situ parsing causing AVs when parsing improperly formatted JSON payloads. rapidjson's in situ parser requires a mutable null-terminated string; and since we use mmap files, we can't guarantee the presence of a null terminator. This means that if the file length happens to exactly match the page size of the underlying OS, and if the underlying JSON blob has an error (like missing the closing bracket), rapidjson will try to read into unmapped memory and crash the process rather than report an actionable error.
This is resolved by passing an explicit length to the parse routine. However, since
ParseInsitudoesn't provide an overload that takes an explicit length, we should fall back to normal parsing.Note to reviewers: If we really do want to enable in-situ parsing, we could always create our own
GenericInsituStringStream-like type which takes a length and callParseStream, passingkParseInsituFlagas a template arg. We can't useMemoryStreamdirectly since it's read-only.(Jeff, this is what I pinged you about via IM.)