chore(deps): update `chroma`, `regexp2` v2, replace `dimiro1/reply` #37858
chroma v2.25.0 requires regexp2/v2, so move our single regexp2 call site (models/issues/pull.go) to the /v2 import and drop the renovate pin that held regexp2 on v1 (linux/386 build issue, fixed in v2). regexp2 v1 stays as an indirect dep via github.com/dimiro1/reply. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
Drop the unmaintained github.com/dimiro1/reply, the last consumer of regexp2 v1 in our own code, and strip incoming-mail reply quotes, signatures and forwarded headers with a small stdlib-regexp parser (no lookahead, RE2-only). Covers the common mail-client formats and languages; bottom posting and forwarded bodies are out of scope. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
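As a rough illustration of what an RE2-only, top-posting reply extractor can look like, here is a minimal sketch. It is not the code added in this PR: the name `stripReply` and the single signature pattern are assumptions made for the example, and the real parser handles many more formats.

```go
package main

import (
	"regexp"
	"strings"
)

// signatureRe matches the conventional "-- " signature delimiter line.
// Plain RE2 syntax: no lookahead and no backreferences.
var signatureRe = regexp.MustCompile(`^--\s*$`)

// stripReply keeps the text above the first quote or signature boundary.
// Hypothetical helper for illustration only.
func stripReply(body string) string {
	lines := strings.Split(strings.ReplaceAll(body, "\r\n", "\n"), "\n")
	kept := make([]string, 0, len(lines))
	for _, line := range lines {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, ">") || signatureRe.MatchString(trimmed) {
			break // top posting assumed: everything below is history
		}
		kept = append(kept, line)
	}
	return strings.TrimSpace(strings.Join(kept, "\n"))
}
```

Because the sketch stops at the first boundary, a reply written below the quoted text (bottom posting) would be lost, which matches the scope limit stated in the commit message.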
Pull request overview
This PR updates Go dependencies to newer major versions and removes an unmaintained incoming-mail reply parsing dependency by replacing it with an internal, stdlib-only reply extractor.
Changes:
- Bump `github.com/alecthomas/chroma/v2` to `v2.25.0` and migrate `github.com/dlclark/regexp2` usage to the `/v2` module path.
- Remove `github.com/dimiro1/reply` and introduce an internal plain-text email reply extractor for inbound mail processing.
- Update Renovate constraints and dependency metadata files (`go.mod`, `go.sum`, `assets/go-licenses.json`) accordingly.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| services/mailer/incoming/reply.go | Adds internal reply/signature/forward-header stripping logic using stdlib regexes. |
| services/mailer/incoming/incoming.go | Switches inbound mail body processing from reply.FromText to extractReply. |
| services/mailer/incoming/incoming_test.go | Adds unit tests covering many common reply parsing formats/localizations. |
| renovate.json5 | Removes the regexp2 v1 pin now that /v2 is used. |
| models/issues/pull.go | Updates the regexp2 import to github.com/dlclark/regexp2/v2. |
| go.mod | Updates direct requirements (chroma, regexp2/v2) and removes dimiro1/reply. |
| go.sum | Updates checksums for the bumped/removed modules. |
| assets/go-licenses.json | Updates recorded license entries to reflect dependency changes. |
Licensing-wise LGTM. I have not looked at the logic yet; I just wanted to check for potentially missing attribution, but I see no similarities to the library, so I think we're good.
It's partially based on dimiro1/reply, with various improvements. I could add a small attribution.
Trim leading whitespace in headerBlock so indented forwarded header blocks are detected, consistent with the other boundary checks. Note in the doc comment that extractReply is based on dimiro1/reply (MIT). Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
Maybe it's better to fork and maintain https://github.com/dimiro1/reply
There are a few similar modules, but none are comprehensive and all have their own class of bugs. I think it's best to maintain it in-repo for now; at least, I wouldn't commit to maintaining such a lib.
Address review feedback:
- compile the reply patterns via sync.OnceValue so they are built on first use rather than at startup
- normalize line endings with util.NormalizeEOL
- recognize common CJK mobile signatures, kept specific enough to avoid matching ordinary prose

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
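The lazy-compilation point can be sketched as follows, assuming Go 1.21 or newer for `sync.OnceValue`. The struct, the pattern, and the helper names are hypothetical, and `normalizeEOL` is only a local stand-in for `util.NormalizeEOL`, whose exact behavior is not shown in this conversation.

```go
package main

import (
	"regexp"
	"strings"
	"sync"
)

// replyPatterns groups the compiled patterns (hypothetical layout).
type replyPatterns struct {
	signature *regexp.Regexp
}

// getReplyPatterns compiles the patterns on first call and caches the result,
// so nothing is compiled at program startup.
var getReplyPatterns = sync.OnceValue(func() *replyPatterns {
	return &replyPatterns{
		signature: regexp.MustCompile(`^--\s*$`),
	}
})

// normalizeEOL converts CRLF and lone CR to LF (stand-in for util.NormalizeEOL).
func normalizeEOL(s string) string {
	s = strings.ReplaceAll(s, "\r\n", "\n")
	return strings.ReplaceAll(s, "\r", "\n")
}

// isSignatureLine reports whether a line is a "-- " signature delimiter.
func isSignatureLine(line string) bool {
	return getReplyPatterns().signature.MatchString(strings.TrimSpace(line))
}
```

Compared with package-level `regexp.MustCompile` variables, this defers the compile cost to the first incoming mail, which matters only if the feature is rarely used.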
Signed-off-by: wxiaoguang <wxiaoguang@gmail.com>
Detect forwarded-mail header blocks via a data-driven marker list spanning more locales (incl. Chinese/Japanese/Korean and the CJK fullwidth colon), replacing the header regexes. Inline the single-use boundary helper so the patterns are fetched once per call. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
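A data-driven marker list of this kind might look like the sketch below. The marker subset, the function names, and the "two consecutive header lines" rule are assumptions for illustration; the list in the PR spans more locales and its block rule may differ.

```go
package main

import "strings"

// headerMarkers is an illustrative subset of header labels in a few locales.
var headerMarkers = []string{
	"from", "sent", "to", "subject", "date", // English
	"von", "gesendet", "betreff", // German
	"发件人", "主题", // Chinese
	"差出人", "件名", // Japanese
	"보낸 사람", "제목", // Korean
}

// isHeaderLine reports whether the line has the form "<marker>:" followed by
// anything, accepting both the ASCII colon and the CJK fullwidth colon.
func isHeaderLine(line string) bool {
	line = strings.TrimSpace(line) // indented header blocks are accepted too
	idx := strings.IndexAny(line, ":：")
	if idx <= 0 {
		return false
	}
	label := strings.ToLower(strings.TrimSpace(line[:idx]))
	for _, m := range headerMarkers {
		if label == m {
			return true
		}
	}
	return false
}

// isHeaderBlock treats two consecutive header lines as a forwarded header block.
func isHeaderBlock(lines []string) bool {
	return len(lines) >= 2 && isHeaderLine(lines[0]) && isHeaderLine(lines[1])
}
```

Adding a locale then means appending strings to the list rather than editing a regular expression.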
Some more CJK fixes in 9bbc5a8.
Worth noting this does not pass all upstream tests.
I thought it did. I will check again; maybe the last change affected that.
Sorry, the conflict was caused by the import path change.
* origin/main:
  chore: remove mssql `x509negativeserial` workaround (go-gitea#37853)
  [skip ci] Updated translations via Crowdin
Align the new file with the repo-wide module rename merged from main. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
Currently it's passing 48/60 of those tests; the remaining 12 cases cannot be fixed without introducing regressions. I will check what can be improved, but full coverage is neither desirable nor beneficial here.
Recognize "Name <email> wrote/schrieb …" attribution lines (the email must immediately precede the verb, so prose like "Bob <b@x> and he wrote" is untouched) and strip the trailing "ᐧ" marker some mobile clients add. Extract the verb list to a constant. Adds regression tests including a guard that an email mention in prose is preserved. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
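The "email must immediately precede the verb" rule can be expressed in RE2 without lookahead. The sketch below is an assumption about the shape of such a pattern, with a two-verb list and hypothetical names; it is not the regex from the PR.

```go
package main

import (
	"regexp"
	"strings"
)

// attributionVerbs is an illustrative verb list kept in one constant.
const attributionVerbs = `wrote|schrieb`

// emailAttributionRe requires "<email>" to be followed directly by a verb,
// with only whitespace in between.
var emailAttributionRe = regexp.MustCompile(
	`<[^<>\s]+@[^<>\s]+>\s+(?:` + attributionVerbs + `)\b`)

// isEmailAttribution reports whether the line looks like "Name <email> wrote".
func isEmailAttribution(line string) bool {
	return emailAttributionRe.MatchString(line)
}

// stripTrailingMarker removes the trailing "ᐧ" marker some mobile clients add.
func stripTrailingMarker(body string) string {
	body = strings.TrimSuffix(strings.TrimSpace(body), "ᐧ")
	return strings.TrimSpace(body)
}
```

With this pattern, `Bob <b@x> wrote:` is treated as an attribution, while `Bob <b@x> and he wrote` is left alone because other words sit between the address and the verb.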
Recognize the verb that ends a quoted-history attribution in Chinese (写道/寫道), Japanese (書きました) and Korean (작성), matched without a preceding word-separator since CJK scripts are unspaced. Adds tests. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
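One reason the CJK verbs need separate handling is that `\b` in Go's `regexp` package is an ASCII word boundary, so it cannot be used to delimit CJK words, and CJK text often has no space before the verb anyway. A minimal sketch, assuming the verb ends the line (optionally followed by an ASCII or fullwidth colon); the pattern and function name are hypothetical:

```go
package main

import "regexp"

// cjkAttributionRe matches a line ending in one of the attribution verbs from
// the commit message, with no word separator required before the verb.
var cjkAttributionRe = regexp.MustCompile(`(?:写道|寫道|書きました|작성)\s*[:：]?\s*$`)

// endsWithCJKAttribution reports whether the line ends a quoted-history
// attribution in Chinese, Japanese, or Korean.
func endsWithCJKAttribution(line string) bool {
	return cjkAttributionRe.MatchString(line)
}
```

Anchoring at the end of the line is what keeps a sentence such as `他写道了一封信` from being cut, since the verb there is followed by more text.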
- attribution lead-word branch now requires a day number or weekday, so prose like "On the 2024 roadmap … at 10:00" is not mistaken for an attribution and cut
- CJK mobile signatures now require a device name, so "发自我的内心" / "会議から送信" / "회사에서 보냄" are kept
- simplify: drop the single-use quote regex (use strings.HasPrefix) and the per-line double TrimSpace in header detection

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
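To illustrate the tightened lead-word rule, the following sketch accepts an "On …" line only when the lead word is followed by a weekday or a one- or two-digit day number and the line ends with the verb. The pattern is an English-only assumption written for this example, not the expression used in the PR.

```go
package main

import "regexp"

// onAttributionRe requires "On" + weekday or day number, and a trailing "wrote".
// A four-digit year right after "On" does not satisfy `\d{1,2}\b`.
var onAttributionRe = regexp.MustCompile(
	`^On\s+(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)(?:[a-z]*day)?\b|\d{1,2}\b).*\bwrote:?\s*$`)

// isOnAttribution reports whether the line is an "On <date> … wrote:" attribution.
func isOnAttribution(line string) bool {
	return onAttributionRe.MatchString(line)
}
```

For example, `On Mon, Jan 1, 2024 at 10:00 AM Bob <b@x> wrote:` is recognized, while `On the 2024 roadmap … at 10:00` is kept as ordinary prose.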
The dimiro1 suite is now at 51/60; this is as far as it can go without regressions.
* origin/main:
  [skip ci] Updated translations via Crowdin
  chore: Move gitea sdk from code.gitea.io/sdk/gitea -> gitea.dev/sdk (go-gitea#37855)
  chore(deps): update `chroma`, `regexp2` v2, replace `dimiro1/reply` (go-gitea#37858)
  chore: clarify SSH clone URL related config options (go-gitea#37877)
  chore: remove mssql `x509negativeserial` workaround (go-gitea#37853)
  [skip ci] Updated translations via Crowdin
  chore: Move import path from code.gitea.io/gitea to gitea.dev (go-gitea#37873)

# Conflicts:
#   renovate.json5
- Update `github.com/alecthomas/chroma/v2` to `v2.25.0`.
- Update `github.com/dlclark/regexp2` to `/v2` (incorporates chore(deps): update `regexp2` to v2 #37664); drop the renovate pin.
- Replace `github.com/dimiro1/reply` (the last consumer of `regexp2` v1 in our own code) with a small built-in reply parser for incoming mail. `github.com/dlclark/regexp2` v1 is gone from the binary with this.
This PR was written with the help of Claude Opus 4.7