
git: Compress diff for commit message generation #42835

Merged
osyvokon merged 6 commits into zed-industries:main from 11happy:commit_message
Nov 20, 2025

@11happy
Contributor

@11happy 11happy commented Nov 16, 2025

This PR compresses the diff used for commit message generation so that it stays within a 20,000-byte cap. It does so by:

  • Truncating every line to 256 characters
  • Iteratively removing the last hunk from each file until the total size is <= 20000 bytes.
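
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `FileDiff`, `truncate_lines`, `total_size`, and `drop_trailing_hunks` are hypothetical names, the diff is assumed to be already split into per-file headers and hunks, and the round-robin order (one hunk per file per pass) is only one reading of "removing last hunks from each file".

```rust
// Hypothetical sketch; only the two limits come from the PR description.
const MAX_DIFF_BYTES: usize = 20_000;
const MAX_LINE_CHARS: usize = 256;

/// One file's section of a diff: its header lines plus a list of hunks.
struct FileDiff {
    header: String,
    hunks: Vec<String>,
}

/// Truncate every line to at most `max_chars` characters, keeping line endings.
fn truncate_lines(text: &str, max_chars: usize) -> String {
    let mut out = String::with_capacity(text.len());
    for line in text.split_inclusive('\n') {
        let (body, ending) = match line.strip_suffix('\n') {
            Some(body) => (body, "\n"),
            None => (line, ""),
        };
        // Cut on a char boundary so multi-byte UTF-8 is never split.
        let end = body
            .char_indices()
            .nth(max_chars)
            .map_or(body.len(), |(index, _)| index);
        out.push_str(&body[..end]);
        out.push_str(ending);
    }
    out
}

/// Total size in bytes of all headers and hunks.
fn total_size(files: &[FileDiff]) -> usize {
    files
        .iter()
        .map(|f| f.header.len() + f.hunks.iter().map(String::len).sum::<usize>())
        .sum()
}

/// Drop the last hunk of each file in turn until the total fits in `max_bytes`.
fn drop_trailing_hunks(files: &mut [FileDiff], max_bytes: usize) {
    let mut size = total_size(files);
    while size > max_bytes {
        let mut removed_any = false;
        for file in files.iter_mut() {
            if size <= max_bytes {
                break;
            }
            if let Some(hunk) = file.hunks.pop() {
                size -= hunk.len();
                removed_any = true;
            }
        }
        if !removed_any {
            break; // Only headers remain; nothing more can be dropped.
        }
    }
}
```

A caller would run `truncate_lines(&diff, MAX_LINE_CHARS)` first and then `drop_trailing_hunks(&mut files, MAX_DIFF_BYTES)`. Tracking a running `size` avoids re-measuring the whole diff after each removal.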

Closes #34486

Release Notes:

  • Improved: Compress large diffs for commit message generation (thanks @11happy)

@cla-bot added the cla-signed label Nov 16, 2025
@maxdeviant maxdeviant changed the title git: diff compressed efficiently for commit message generation git: Compress diff for commit message generation Nov 16, 2025
@cole-miller cole-miller assigned osyvokon and unassigned cole-miller Nov 17, 2025
11happy and others added 3 commits November 18, 2025 20:55
Signed-off-by: 11happy <soni5happy@gmail.com>

feat: implement diff compress

Signed-off-by: 11happy <soni5happy@gmail.com>
Diffy lines already include the newline character, so calls like
`.join("\n")` result in doubled newlines, breaking the patch correctness
and making `compress_commit_diff` return an empty string.
@osyvokon
Contributor

Thanks for implementing this!

I've added some tests, and they didn't pass initially. It turns out that diffy lines already contain a trailing newline character, so things like `.join("\n")` and `writeln!()` add extra newlines and break the patch. I've pushed a fix, and the tests are green now.
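
To illustrate the doubled-newline problem without pulling in diffy, here is a small sketch that simulates lines which keep their trailing `\n` by using `str::split_inclusive`. The function names are hypothetical and this is not the code from the PR.

```rust
/// Lines that keep their trailing '\n', standing in for the lines diffy yields.
fn lines_with_newlines(hunk: &str) -> Vec<&str> {
    hunk.split_inclusive('\n').collect()
}

/// Buggy: inserts a second '\n' between lines that already end with one.
fn rebuild_with_join(lines: &[&str]) -> String {
    lines.join("\n")
}

/// Fixed: the lines carry their own terminators, so plain concatenation suffices.
fn rebuild_with_concat(lines: &[&str]) -> String {
    lines.concat()
}
```

For the hunk `"@@ -1,2 +1,2 @@\n-old\n+new\n"`, `rebuild_with_concat` returns the input unchanged, while `rebuild_with_join` puts an empty line after the hunk header and after `-old`, which no longer matches the line counts in the header.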

It's too bad that diffy doesn't support mutating hunks. This makes the code overly complex and less efficient, since we're reformatting the patch on every calculate_size() call. I know I was the one who pointed you to a reference implementation that uses diffy, but now I think it's not a good fit.

I think we can get rid of the dependency on diffy. Instead, let's just do simple diff string parsing. Essentially, we need two levels of parsing:

  • Split by file (we already do this)
  • Split by hunk headers (which always start with @@ )

Then we can keep a list of hunks as a `Vec<String>`. This way we don't have to reconstruct the patch, and computing lengths becomes much easier.
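
A rough sketch of that two-level split might look like the following. It assumes git-style output where each file section starts with a `diff --git ` line; `FileSection`, `split_diff`, and `total_len` are hypothetical names for illustration, not the identifiers used in the merged change.

```rust
/// One file's part of a unified diff (hypothetical type for illustration).
#[derive(Debug, Default)]
struct FileSection {
    /// Everything before the first hunk: `diff --git`, `index`, `---`, `+++` lines.
    header: String,
    /// Each hunk starts at its `@@ ` header and keeps its original newlines.
    hunks: Vec<String>,
}

/// Split a diff by file, then by hunk header, without reformatting any line.
fn split_diff(diff: &str) -> Vec<FileSection> {
    let mut files: Vec<FileSection> = Vec::new();
    for line in diff.split_inclusive('\n') {
        // Level 1: a new file section begins at each `diff --git ` line.
        if line.starts_with("diff --git ") || files.is_empty() {
            files.push(FileSection::default());
        }
        let file = files.last_mut().expect("a section was pushed above");
        // Level 2: a new hunk begins at each `@@ ` header.
        if line.starts_with("@@ ") {
            file.hunks.push(String::new());
        }
        match file.hunks.last_mut() {
            Some(hunk) => hunk.push_str(line),
            None => file.header.push_str(line),
        }
    }
    files
}

/// Size of the diff in bytes: a sum of string lengths, with no re-rendering.
fn total_len(files: &[FileSection]) -> usize {
    files
        .iter()
        .map(|f| f.header.len() + f.hunks.iter().map(String::len).sum::<usize>())
        .sum()
}
```

Because lines inside a hunk always begin with a space, `+`, `-`, or `\`, they cannot be mistaken for a `diff --git ` or `@@ ` header. Dropping a hunk is then just a `pop()` on `hunks`.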

Are you willing to implement this change or should I go ahead? We can also pair on it, if you like.

@11happy
Contributor Author

11happy commented Nov 18, 2025

> Are you willing to implement this change or should I go ahead? We can also pair on it, if you like.

Yes, I am willing to implement this. I'll ask for a pairing session if I run into any difficulties.
Thank you for your time : )

Signed-off-by: 11happy <soni5happy@gmail.com>
Signed-off-by: 11happy <soni5happy@gmail.com>
@11happy
Contributor Author

11happy commented Nov 20, 2025

@osyvokon would appreciate your review!
Thank you : )

@osyvokon
Contributor

Looks much better now, thanks! I've made a few nit renames while reviewing this. Ready to merge.

@osyvokon osyvokon enabled auto-merge (squash) November 20, 2025 11:23
@osyvokon osyvokon merged commit 9094eb8 into zed-industries:main Nov 20, 2025
24 checks passed
mikayla-maki pushed a commit that referenced this pull request Nov 20, 2025
This PR compresses diff capped at 20000 bytes by:
- Truncation of all lines to 256 chars
- Iteratively removing last hunks from each file until size <= 20000
bytes.


Closes #34486

Release Notes:

- Improved: Compress large diffs for commit message generation (thanks
@11happy)

---------

Signed-off-by: 11happy <soni5happy@gmail.com>
Co-authored-by: Oleksiy Syvokon <oleksiy@zed.dev>
@JosephTLyons JosephTLyons moved this to 🚢 Shipped by Community in Git board Nov 26, 2025
11happy added a commit to 11happy/zed that referenced this pull request Dec 1, 2025

Development

Successfully merging this pull request may close these issues.

Generate commit message exceeding token limit

3 participants