Fix skill frontmatter loading multi-codepoint UTF-8 graphemes at chunk boundary by roboticsdude60 · Pull Request #57466 · zed-industries/zed

roboticsdude60 · 2026-05-22T05:39:23Z

Self-Review Checklist:

I've reviewed my own diff for quality, security, and reliability
Unsafe blocks (if any) have justifying comments
The content is consistent with the UI/UX checklist
Tests cover the new/changed behavior
Performance impact has been considered and is acceptable

Release Notes:

Fixed utf-8 parsing issues when loading skill frontmatter with multi-codepoint graphemes (such as an emoji) crossing file chunk load boundaries.

Adds a test for `load_skill_frontmatter` which reproduces a bug when a multi-byte grapheme in the skill body crosses the 4096-byte chunk read boundary. As commited, the test fails with message `"SKILL.md is not valid UTF-8: incomplete utf-8 byte sequence from index 4095"`.

…anges the idea with extracting this constant is to keep the test relevant if the chunk size ever changes in the future.

rtfeldman

Thanks @roboticsdude60! 🎉

…7510) Follow-up to zed-industries#57466 simplifying `load_skill_frontmatter` rather than fixing a separate bug. zed-industries#57466 fixed the chunked-read loop so it truncates accumulated bytes at the frontmatter boundary, preventing `str::from_utf8` from failing on a multi-byte grapheme split across chunks. That fix is correct, but the chunked read it's working around isn't pulling its weight: `MAX_SKILL_FILE_SIZE` is 100KB and there's already a metadata pre-check, so "stop reading early once we've seen the closing `---`" saves at most ~25 pages per file while forcing the use of `open_sync`, a hand-rolled loop, and a long comment about `smol::unblock` vs the GPUI test scheduler. This PR: - replaces the chunked `open_sync` + read loop with a single `fs.load(...)` call (the same primitive `read_skill_body` already uses); - deletes `closing_delimiter_end`, `SKILL_READ_CHUNK_SIZE`, the `std::io::{self, Read}` import, and the `Parking forbidden` paragraph; - tightens the metadata pre-check so a metadata error (permissions, I/O, etc.) bails out instead of silently falling through to a blind read; - removes the now-obsolete `test_load_skill_frontmatter_with_emoji_at_chunk_boundary` test — by construction, a single full read can't split a UTF-8 sequence at any boundary. Closes AI-303 Release Notes: - N/A

…pick to preview) (#57831) Cherry-pick of #57510 to preview ---- Follow-up to #57466 simplifying `load_skill_frontmatter` rather than fixing a separate bug. the frontmatter boundary, preventing `str::from_utf8` from failing on a multi-byte grapheme split across chunks. That fix is correct, but the chunked read it's working around isn't pulling its weight: `MAX_SKILL_FILE_SIZE` is 100KB and there's already a metadata pre-check, so "stop reading early once we've seen the closing `---`" saves at most ~25 pages per file while forcing the use of `open_sync`, a hand-rolled loop, and a long comment about `smol::unblock` vs the GPUI test scheduler. This PR: - replaces the chunked `open_sync` + read loop with a single `fs.load(...)` call (the same primitive `read_skill_body` already uses); - deletes `closing_delimiter_end`, `SKILL_READ_CHUNK_SIZE`, the `std::io::{self, Read}` import, and the `Parking forbidden` paragraph; - tightens the metadata pre-check so a metadata error (permissions, I/O, etc.) bails out instead of silently falling through to a blind read; - removes the now-obsolete `test_load_skill_frontmatter_with_emoji_at_chunk_boundary` test — by construction, a single full read can't split a UTF-8 sequence at any boundary. Closes AI-303 Release Notes: - N/A

…k boundary (zed-industries#57466) Self-Review Checklist: - [X] I've reviewed my own diff for quality, security, and reliability - [X] Unsafe blocks (if any) have justifying comments - [X] The content is consistent with the [UI/UX checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist) - [X] Tests cover the new/changed behavior - [X] Performance impact has been considered and is acceptable Closes zed-industries#57463 Release Notes: - Fixed utf-8 parsing issues when loading skill frontmatter with multi-codepoint graphemes (such as an emoji) crossing file chunk load boundaries.

…7510) Follow-up to zed-industries#57466 simplifying `load_skill_frontmatter` rather than fixing a separate bug. zed-industries#57466 fixed the chunked-read loop so it truncates accumulated bytes at the frontmatter boundary, preventing `str::from_utf8` from failing on a multi-byte grapheme split across chunks. That fix is correct, but the chunked read it's working around isn't pulling its weight: `MAX_SKILL_FILE_SIZE` is 100KB and there's already a metadata pre-check, so "stop reading early once we've seen the closing `---`" saves at most ~25 pages per file while forcing the use of `open_sync`, a hand-rolled loop, and a long comment about `smol::unblock` vs the GPUI test scheduler. This PR: - replaces the chunked `open_sync` + read loop with a single `fs.load(...)` call (the same primitive `read_skill_body` already uses); - deletes `closing_delimiter_end`, `SKILL_READ_CHUNK_SIZE`, the `std::io::{self, Read}` import, and the `Parking forbidden` paragraph; - tightens the metadata pre-check so a metadata error (permissions, I/O, etc.) bails out instead of silently falling through to a blind read; - removes the now-obsolete `test_load_skill_frontmatter_with_emoji_at_chunk_boundary` test — by construction, a single full read can't split a UTF-8 sequence at any boundary. Closes AI-303 Release Notes: - N/A

roboticsdude60 added 3 commits May 21, 2026 21:41

Truncate accumulated bytes to frontmatter boundary after chunk read

810f06d

extract chunk size to constant, keeping test relevant if size ever ch…

c8153c0

…anges the idea with extracting this constant is to keep the test relevant if the chunk size ever changes in the future.

cla-bot Bot added the cla-signed The user has signed the Contributor License Agreement label May 22, 2026

zed-community-bot Bot added the first contribution the author's first pull request to Zed. NOTE: the label application is automated via github actions label May 22, 2026

roboticsdude60 changed the title ~~Fix skill chunk boundary utf8~~ Fix skill frontmatter loading multi-codepoint utf-8 graphemes at chunk boundary May 22, 2026

maxdeviant changed the title ~~Fix skill frontmatter loading multi-codepoint utf-8 graphemes at chunk boundary~~ Fix skill frontmatter loading multi-codepoint UTF-8 graphemes at chunk boundary May 22, 2026

rtfeldman enabled auto-merge May 22, 2026 15:15

rtfeldman approved these changes May 22, 2026

View reviewed changes

rtfeldman added this pull request to the merge queue May 22, 2026

Merged via the queue into zed-industries:main with commit d3e27ab May 22, 2026
46 checks passed

rtfeldman mentioned this pull request May 22, 2026

agent_skills: Read SKILL.md in one shot via fs.load #57510

Merged

rtfeldman mentioned this pull request May 27, 2026

agent_skills: Read SKILL.md in one shot via fs.load (#57510) (cherry-pick to preview) #57831

Merged

roboticsdude60 deleted the fix-skill-chunk-boundary-utf8 branch May 28, 2026 16:34

This was referenced Jun 3, 2026

Review upstream Zed v1.5.3 neozed-industries/neo-zed#41

Closed

Review upstream Zed v1.5.4 neozed-industries/neo-zed#42

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix skill frontmatter loading multi-codepoint UTF-8 graphemes at chunk boundary #57466

Fix skill frontmatter loading multi-codepoint UTF-8 graphemes at chunk boundary #57466
rtfeldman merged 3 commits into
zed-industries:mainfrom
roboticsdude60:fix-skill-chunk-boundary-utf8

roboticsdude60 commented May 22, 2026

Uh oh!

rtfeldman left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

roboticsdude60 commented May 22, 2026

Uh oh!

rtfeldman left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants