Docs: add zh-CN entrypoint translations #6300
scripts/docs-i18n/process.go (lines 15-21)
```go
absPath, err := filepath.Abs(filePath)
if err != nil {
	return err
}
if !strings.HasPrefix(absPath, docsRoot) {
	return fmt.Errorf("file %s not under docs root %s", absPath, docsRoot)
}
```
[P1] `strings.HasPrefix(absPath, docsRoot)` is not a safe “is under directory” check: it will accept paths like `/docs-root-evil/...` when `docsRoot` is `/docs-root`, and it can also behave unexpectedly when the two paths differ in separators (for example, a trailing separator on `docsRoot`). Depending on how the tool is invoked, this can let inputs outside the docs root pass the guard. Prefer `filepath.Rel(docsRoot, absPath)` and reject the result if it is `..`, starts with `..` followed by a path separator, or is absolute.
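One way to implement the `filepath.Rel` check the review suggests is sketched below. `isUnderDir` is a hypothetical helper, not code from the PR; it assumes both paths can be made absolute with `filepath.Abs` and it does not resolve symlinks (that would additionally need `filepath.EvalSymlinks`).

```go
package main

import (
	"path/filepath"
	"strings"
)

// isUnderDir reports whether path is root itself or lies inside it.
// Hypothetical helper: both paths are made absolute (and cleaned) first;
// symlinks are NOT resolved.
func isUnderDir(root, path string) (bool, error) {
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return false, err
	}
	absPath, err := filepath.Abs(path)
	if err != nil {
		return false, err
	}
	rel, err := filepath.Rel(absRoot, absPath)
	if err != nil {
		// Rel fails when no relative path exists (e.g. different Windows volumes).
		return false, nil
	}
	// "..notes.md" is a legal name inside root, so ".." must be a whole path element.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) || filepath.IsAbs(rel) {
		return false, nil
	}
	return true, nil
}
```

With `/docs-root` as the root, `/docs-root/guide/intro.md` passes, while `/docs-root-evil/x.md` and `/docs-root/../etc/passwd` are rejected because their relative paths begin with `../`.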
scripts/docs-i18n/tm.go (lines 44-52)
```go
scanner := bufio.NewScanner(file)
for scanner.Scan() {
	line := strings.TrimSpace(scanner.Text())
	if line == "" {
		continue
	}
	var entry TMEntry
	if err := json.Unmarshal([]byte(line), &entry); err != nil {
		return nil, fmt.Errorf("translation memory decode failed: %w", err)
```
[P1] `bufio.Scanner` has a default token limit of 64 KiB (`bufio.MaxScanTokenSize`). Since TM entries can include long `text`/`translated` fields, a longer line makes `Scan` stop and `scanner.Err()` return `bufio.ErrTooLong`, so `LoadTranslationMemory` fails on real-world docs. Reading lines with a `bufio.Reader` (`ReadString`/`ReadBytes`) or raising the limit via `scanner.Buffer` would avoid hard failures on large segments.
scripts/docs-i18n/html_translate.go (lines 160-166)
```go
	return stack[:len(stack)-1]
}
for i := len(stack) - 2; i >= 0; i-- {
	if stack[i] == tag {
		return stack[:i]
	}
}
```
[P1] When a closing tag matches an element below the top of the stack, `popTag` returns `stack[:i]`: this drops the matching tag *and* every unclosed tag above it, while keeping any tags below it. For nested or mismatched HTML, this can leave the stack in an inconsistent state and cause later text to be translated or skipped incorrectly (e.g., prematurely leaving a `code`/`pre` context). Decide explicitly, based on the intended behavior, whether a close tag that doesn't match the top of the stack should implicitly close the intervening elements (`stack[:i]`) or be ignored.
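The sketch below shows one possible policy, not the PR's actual fix: a close tag that matches deeper in the stack implicitly closes the elements above it, except that a non-protected close tag never pops through a protected element such as `code` or `pre`, and a close tag with no match is ignored. `popTagGuarded` and the `protected` set are hypothetical.

```go
package main

// popTagGuarded applies a closing tag to a stack of open element names.
// Hypothetical policy: a match at the top is popped; a match deeper down
// implicitly closes everything above it, unless that would close a
// protected (no-translate) element via a non-protected tag, in which case
// the close tag is ignored. A close tag with no match is also ignored.
func popTagGuarded(stack []string, tag string, protected map[string]bool) []string {
	for i := len(stack) - 1; i >= 0; i-- {
		if stack[i] == tag {
			// Drop the matching element and anything opened after it.
			return stack[:i]
		}
		if protected[stack[i]] && !protected[tag] {
			// Don't let e.g. a stray </div> end a <pre> or <code> context.
			return stack
		}
	}
	// No open element with this name: treat the close tag as stray.
	return stack
}
```

This errs toward keeping a no-translate context open: with `[div pre code]` on the stack, a stray `</div>` is ignored, while `</pre>` still closes both `code` and `pre`.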
Greptile fixes are in.
Let me know if you want me to re-run the translator outputs after these changes.
* Docs: add zh-CN entrypoint translations
* Docs: harden docs-i18n parsing
* Docs: add zh-CN entrypoint translations
* Docs: harden docs-i18n parsing

(cherry picked from commit 0e0e395)
Greptile Overview
Greptile Summary
This PR adds a Go-based documentation i18n pipeline (`scripts/docs-i18n`) that segments Markdown/HTML, preserves placeholders, consults a glossary + translation memory, and emits translated docs under `docs/zh-CN/`. It also updates `docs/docs.json` to include a topbar link to the new zh-CN entrypoint pages and adds i18n asset files (glossary + TM).

The overall approach fits the repo by keeping translation artifacts under `docs/.i18n/` and generating localized Markdown into a language-prefixed docs tree, while leaving existing English content intact.

Confidence Score: 3/5

… `bufio.Scanner` limits, and HTML tag stack handling may lead to incorrect translate/skip decisions on mismatched HTML blocks.
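The summary says the pipeline preserves placeholders during translation, but the excerpts here don't show how. The following is only a minimal Go sketch of one common approach, assuming inline code spans are the protected construct; `maskInlineCode`, `unmask`, and the `__PHn__` token format are hypothetical, not the tool's API.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// inlineCode matches single-backtick Markdown code spans on one line.
// Assumption: the real tool may protect more constructs (URLs, links, HTML).
var inlineCode = regexp.MustCompile("`[^`\n]+`")

// maskInlineCode swaps each code span for a numbered placeholder so the
// translator cannot alter it, returning the original spans in order.
func maskInlineCode(s string) (string, []string) {
	var saved []string
	masked := inlineCode.ReplaceAllStringFunc(s, func(m string) string {
		saved = append(saved, m)
		return fmt.Sprintf("__PH%d__", len(saved)-1)
	})
	return masked, saved
}

// unmask restores the placeholders and fails if any was dropped by the translator.
func unmask(s string, saved []string) (string, error) {
	for i, orig := range saved {
		ph := fmt.Sprintf("__PH%d__", i)
		if !strings.Contains(s, ph) {
			return "", fmt.Errorf("placeholder %s missing from translation", ph)
		}
		s = strings.Replace(s, ph, orig, 1)
	}
	return s, nil
}
```

Verifying that every placeholder survives lets a pipeline reject a translation that dropped or mangled protected text instead of writing corrupted docs.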