Fix skill frontmatter loading multi-codepoint UTF-8 graphemes at chunk boundary #57466
Merged
rtfeldman merged 3 commits intoMay 22, 2026
Merged
Conversation
Adds a test for `load_skill_frontmatter` which reproduces a bug when a multi-byte grapheme in the skill body crosses the 4096-byte chunk read boundary. As commited, the test fails with message `"SKILL.md is not valid UTF-8: incomplete utf-8 byte sequence from index 4095"`.
…anges the idea with extracting this constant is to keep the test relevant if the chunk size ever changes in the future.
rtfeldman
approved these changes
May 22, 2026
rtfeldman
left a comment
Contributor
There was a problem hiding this comment.
Thanks @roboticsdude60! 🎉
pull Bot
pushed a commit
to kp-forks/zed
that referenced
this pull request
May 27, 2026
…7510) Follow-up to zed-industries#57466 simplifying `load_skill_frontmatter` rather than fixing a separate bug. zed-industries#57466 fixed the chunked-read loop so it truncates accumulated bytes at the frontmatter boundary, preventing `str::from_utf8` from failing on a multi-byte grapheme split across chunks. That fix is correct, but the chunked read it's working around isn't pulling its weight: `MAX_SKILL_FILE_SIZE` is 100KB and there's already a metadata pre-check, so "stop reading early once we've seen the closing `---`" saves at most ~25 pages per file while forcing the use of `open_sync`, a hand-rolled loop, and a long comment about `smol::unblock` vs the GPUI test scheduler. This PR: - replaces the chunked `open_sync` + read loop with a single `fs.load(...)` call (the same primitive `read_skill_body` already uses); - deletes `closing_delimiter_end`, `SKILL_READ_CHUNK_SIZE`, the `std::io::{self, Read}` import, and the `Parking forbidden` paragraph; - tightens the metadata pre-check so a metadata error (permissions, I/O, etc.) bails out instead of silently falling through to a blind read; - removes the now-obsolete `test_load_skill_frontmatter_with_emoji_at_chunk_boundary` test — by construction, a single full read can't split a UTF-8 sequence at any boundary. Closes AI-303 Release Notes: - N/A
MartinYe1234
pushed a commit
that referenced
this pull request
May 27, 2026
…pick to preview) (#57831) Cherry-pick of #57510 to preview ---- Follow-up to #57466 simplifying `load_skill_frontmatter` rather than fixing a separate bug. the frontmatter boundary, preventing `str::from_utf8` from failing on a multi-byte grapheme split across chunks. That fix is correct, but the chunked read it's working around isn't pulling its weight: `MAX_SKILL_FILE_SIZE` is 100KB and there's already a metadata pre-check, so "stop reading early once we've seen the closing `---`" saves at most ~25 pages per file while forcing the use of `open_sync`, a hand-rolled loop, and a long comment about `smol::unblock` vs the GPUI test scheduler. This PR: - replaces the chunked `open_sync` + read loop with a single `fs.load(...)` call (the same primitive `read_skill_body` already uses); - deletes `closing_delimiter_end`, `SKILL_READ_CHUNK_SIZE`, the `std::io::{self, Read}` import, and the `Parking forbidden` paragraph; - tightens the metadata pre-check so a metadata error (permissions, I/O, etc.) bails out instead of silently falling through to a blind read; - removes the now-obsolete `test_load_skill_frontmatter_with_emoji_at_chunk_boundary` test — by construction, a single full read can't split a UTF-8 sequence at any boundary. Closes AI-303 Release Notes: - N/A
TomPlanche
pushed a commit
to TomPlanche/zed
that referenced
this pull request
Jun 2, 2026
…k boundary (zed-industries#57466) Self-Review Checklist: - [X] I've reviewed my own diff for quality, security, and reliability - [X] Unsafe blocks (if any) have justifying comments - [X] The content is consistent with the [UI/UX checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist) - [X] Tests cover the new/changed behavior - [X] Performance impact has been considered and is acceptable Closes zed-industries#57463 Release Notes: - Fixed utf-8 parsing issues when loading skill frontmatter with multi-codepoint graphemes (such as an emoji) crossing file chunk load boundaries.
TomPlanche
pushed a commit
to TomPlanche/zed
that referenced
this pull request
Jun 2, 2026
…7510) Follow-up to zed-industries#57466 simplifying `load_skill_frontmatter` rather than fixing a separate bug. zed-industries#57466 fixed the chunked-read loop so it truncates accumulated bytes at the frontmatter boundary, preventing `str::from_utf8` from failing on a multi-byte grapheme split across chunks. That fix is correct, but the chunked read it's working around isn't pulling its weight: `MAX_SKILL_FILE_SIZE` is 100KB and there's already a metadata pre-check, so "stop reading early once we've seen the closing `---`" saves at most ~25 pages per file while forcing the use of `open_sync`, a hand-rolled loop, and a long comment about `smol::unblock` vs the GPUI test scheduler. This PR: - replaces the chunked `open_sync` + read loop with a single `fs.load(...)` call (the same primitive `read_skill_body` already uses); - deletes `closing_delimiter_end`, `SKILL_READ_CHUNK_SIZE`, the `std::io::{self, Read}` import, and the `Parking forbidden` paragraph; - tightens the metadata pre-check so a metadata error (permissions, I/O, etc.) bails out instead of silently falling through to a blind read; - removes the now-obsolete `test_load_skill_frontmatter_with_emoji_at_chunk_boundary` test — by construction, a single full read can't split a UTF-8 sequence at any boundary. Closes AI-303 Release Notes: - N/A
This was referenced Jun 3, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Self-Review Checklist:
Closes #57463
Release Notes: