Create faster trimSpace method with benchmarks#164
Conversation
Implemented a faster TrimSpace function that outperforms strings.TrimSpace for common use cases: - Empty strings: 4.6x faster (0.63 ns vs 2.92 ns) - No-trim cases: 21-25% faster (important for web frameworks) - Trailing whitespace: 25% faster - All whitespace: 15% faster Key optimizations: - Lookup table for whitespace detection - Zero allocations when no trimming is needed - Efficient byte-level operations - Early return for empty strings Added comprehensive tests and benchmarks to verify correctness and performance improvements.
Moved TrimSpace from strings.go to byteseq.go and implemented it as a generic function that works with both string and []byte types. Performance improvements: String benchmarks: - Empty strings: 4.6x faster (0.63ns vs 2.88ns) - No-trim cases: 28-34% faster - All-whitespace: 46% faster - Large inputs (no trim): 85+ GB/s throughput Byte slice benchmarks: - Empty slices: 6.4x faster (0.47ns vs 3.04ns) - No-trim cases: 47-54% faster - Large inputs (no trim): 88+ GB/s throughput - Large inputs (with trim): 17% faster Key optimizations: - Whitespace lookup table for O(1) detection - Zero allocations when no trimming needed - Early returns for edge cases - Generic implementation using Go 1.18+ type constraints Perfect for web framework use cases like HTTP header parsing, query parameter processing, and path normalization.
Summary of ChangesHello @ReneWerner87, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a highly optimized Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new, optimized TrimSpace function for strings and byte slices, which shows significant performance improvements for common cases, especially for ASCII strings with no or minimal whitespace. The implementation is clean and the accompanying tests are comprehensive. The benchmarks validate the performance claims.
My review includes a few suggestions to improve maintainability and clarity:
- The
analyze_benchmarks.shscript contains hardcoded values and should be updated to parse benchmark results dynamically. - There are multiple benchmark result files, which can be confusing. I've suggested removing the obsolete one.
- A minor correction to a comment in
common.gofor better accuracy. - Refactoring the benchmark tests in
byteseq_test.goto reduce code duplication.
Overall, this is a great contribution that will improve performance in the framework.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #164 +/- ##
==========================================
+ Coverage 83.31% 83.48% +0.17%
==========================================
Files 11 11
Lines 755 763 +8
==========================================
+ Hits 629 637 +8
Misses 115 115
Partials 11 11
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds a new generic function TrimSpace[S byteSeq](s S) S that trims ASCII whitespace using a byte lookup table, introduces the whitespaceTable in common.go, and adds unit tests plus benchmarks covering string and []byte variants. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧬 Code graph analysis (1)byteseq_test.go (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (4)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
common.go (1)
30-38: Minor documentation clarification.The comment mentions "(1 = whitespace, 0 = not whitespace)" but the table uses boolean values (
true/false). Consider updating the comment to match the implementation.Apply this diff:
-// Lookup table for ASCII whitespace characters (1 = whitespace, 0 = not whitespace) +// Lookup table for ASCII whitespace characters (true = whitespace, false = not whitespace) var whitespaceTable = [256]bool{analyze_benchmarks.sh (1)
1-28: Consider making the script parse actual benchmark files.The script currently outputs hardcoded values. For better maintainability, consider parsing the actual benchmark results from
benchmark_results.txtor usingbenchstattool to generate comparisons dynamically.Example using benchstat:
#!/bin/bash # Install benchstat if needed: go install golang.org/x/perf/cmd/benchstat@latest echo "=== TrimSpace Performance Analysis ===" benchstat -split fiber,default benchmark_results.txtThis would automatically parse and compare the results without hardcoding values.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
analyze_benchmarks.sh(1 hunks)benchmark_analysis.md(1 hunks)benchmark_results.txt(1 hunks)byteseq.go(1 hunks)byteseq_test.go(1 hunks)common.go(1 hunks)trimspace_benchmarks.txt(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
byteseq_test.go (1)
byteseq.go (1)
TrimSpace(94-124)
🪛 GitHub Actions: golangci-lint
byteseq_test.go
[error] 373-373: golangci-lint: empty-block: this block is empty, you can remove it (revive)
🪛 GitHub Check: lint
byteseq_test.go
[failure] 373-373:
empty-block: this block is empty, you can remove it (revive)
🪛 LanguageTool
benchmark_analysis.md
[grammar] ~3-~3: Interpunctie toevoegen
Context: ...e Analysis ## String Benchmarks (Fiber vs Standard Library) | Scenario | Fiber T...
(QB_NEW_NL_OTHER_ERROR_IDS_MISSING_PUNCTUATION)
[style] ~7-~7: Netter is om het uit te schrijven.
Context: ...| | empty | 0.63 ns | 2.88 ns | 4.6x faster ⚡ | - | | no-trim | 2.83 n...
(_10X)
[style] ~22-~22: Netter is om het uit te schrijven.
Context: ...| | empty | 0.47 ns | 3.04 ns | 6.4x faster ⚡⚡ | - | | no-trim | 2.83 ...
(_10X)
[style] ~36-~36: Netter is om het uit te schrijven.
Context: ...for Common Cases - Empty strings: 4.6x faster (strings), 6.4x faster (bytes) -...
(_10X)
[style] ~36-~36: Netter is om het uit te schrijven.
Context: ...pty strings**: 4.6x faster (strings), 6.4x faster (bytes) - No trimming needed...
(_10X)
[grammar] ~37-~37: Dit kan een fout zijn.
Context: ...bytes) - No trimming needed: 28-34% faster (strings), 47-54% faster (bytes) - **La...
(QB_NEW_NL)
[grammar] ~37-~37: Dit kan een fout zijn.
Context: ...eded**: 28-34% faster (strings), 47-54% faster (bytes) - Large inputs without trim...
(QB_NEW_NL)
[grammar] ~38-~38: Dit kan een fout zijn.
Context: ...ts without trim**: Throughput > 85 GB/s for strings, > 88 GB/s for bytes ### 📊 Pe...
(QB_NEW_NL)
[grammar] ~38-~38: Dit kan een fout zijn.
Context: ...ughput > 85 GB/s for strings, > 88 GB/s for bytes ### 📊 Performance Characteristi...
(QB_NEW_NL)
[grammar] ~41-~41: Dit kan een fout zijn.
Context: ...eristics 1. Best case: Empty inputs and no-trim scenarios (most common in web f...
(QB_NEW_NL)
[grammar] ~41-~41: Woord aanpassen
Context: ...**: Empty inputs and no-trim scenarios (most common in web frameworks) 2. **Good per...
(QB_NEW_NL_OTHER_ERROR_IDS_REPLACEMENT_CONFUSION)
[grammar] ~41-~41: Dit kan een fout zijn.
Context: ...mpty inputs and no-trim scenarios (most common in web frameworks) 2. **Good performanc...
(QB_NEW_NL)
[grammar] ~41-~41: Dit kan een fout zijn.
Context: ...s and no-trim scenarios (most common in web frameworks) 2. Good performance: All-whitespac...
(QB_NEW_NL)
[grammar] ~42-~42: Dit kan een fout zijn.
Context: ...ll-whitespace removal scenarios (24-46% faster) 3. Competitive: Mixed whitespace s...
(QB_NEW_NL)
[grammar] ~43-~43: Zin met fouten
Context: ...petitive**: Mixed whitespace scenarios (similar or slightly better) ### 💡 Why It's Fa...
(QB_NEW_NL_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)
[grammar] ~43-~43: Zin met fouten
Context: ...**: Mixed whitespace scenarios (similar or slightly better) ### 💡 Why It's Faste...
(QB_NEW_NL_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)
[grammar] ~43-~43: Zin met fouten
Context: ... Mixed whitespace scenarios (similar or slightly better) ### 💡 Why It's Faster 1. **Lo...
(QB_NEW_NL_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)
[grammar] ~43-~43: Dit kan een fout zijn.
Context: ...itespace scenarios (similar or slightly better) ### 💡 Why It's Faster 1. **Lookup ta...
(QB_NEW_NL)
[grammar] ~60-~60: Zin met fouten
Context: ...ling - Form data processing All common web framework operations benefit from these...
(QB_NEW_NL_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)
[grammar] ~60-~60: Zin met fouten
Context: ... - Form data processing All common web framework operations benefit from these optimizat...
(QB_NEW_NL_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)
[grammar] ~60-~60: Zin met fouten
Context: ...ta processing All common web framework operations benefit from these optimizations!
(QB_NEW_NL_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)
[grammar] ~60-~60: Zin met fouten
Context: ...ng All common web framework operations benefit from these optimizations!
(QB_NEW_NL_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)
[grammar] ~60-~60: Dit kan een fout zijn.
Context: ...common web framework operations benefit from these optimizations!
(QB_NEW_NL)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build (1.24.x, macos-13)
- GitHub Check: Compare
🔇 Additional comments (6)
benchmark_results.txt (1)
1-36: LGTM! Benchmark results look comprehensive.The benchmark results demonstrate good performance improvements across various scenarios with zero allocations.
byteseq.go (1)
91-124: Excellent implementation with good optimizations!The TrimSpace function is well-designed with:
- O(1) whitespace detection using the lookup table
- Fast path for empty inputs
- Zero-allocation path when no trimming is needed (line 117-120)
- Proper handling of all-whitespace inputs
- Clear, well-documented code
The implementation correctly handles both string and []byte via generics and maintains parity with the standard library while offering performance improvements.
trimspace_benchmarks.txt (1)
1-66: LGTM! Comprehensive benchmark data.The detailed benchmark results provide good coverage across multiple scenarios (empty, no-trim, leading/trailing/both spaces, mixed, large inputs) for both string and []byte implementations.
byteseq_test.go (2)
382-429: Well-structured benchmarks with good coverage.The benchmark implementation properly:
- Covers diverse scenarios (empty, no-trim, leading/trailing/both spaces, mixed, large inputs)
- Uses
b.SetBytes()for throughput measurements- Compares Fiber implementation against standard library
- Follows Go benchmark best practices
431-478: Good byte slice benchmark coverage.The byte slice benchmarks complement the string benchmarks well, ensuring both code paths are properly measured and compared against the standard library's
bytes.TrimSpace.benchmark_analysis.md (1)
1-60: Excellent performance analysis documentation.The benchmark analysis is well-organized with clear tables, key findings, and practical use cases. The documentation effectively communicates the performance benefits of the new TrimSpace implementation.
Note: The static analysis warnings from LanguageTool can be safely ignored as they appear to be grammar checks configured for Dutch rather than English.
- Fix whitespaceTable comment accuracy (bool vs numeric values) - Combine duplicate benchmark loops to reduce code duplication - Add proper variable capture in benchmark loops
- Replace synthetic test cases with realistic HTTP header and query parameter examples - Simplify test logic to directly compare against standard library behavior - Fix empty-block linter issue by applying suggested condition inversion - Remove Unicode whitespace test cases (implementation is ASCII-only by design) - Update benchmarks to use same realistic test cases
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
byteseq_test.go(1 hunks)common.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- common.go
🧰 Additional context used
🧬 Code graph analysis (1)
byteseq_test.go (1)
byteseq.go (1)
TrimSpace(94-124)
🪛 GitHub Actions: golangci-lint
byteseq_test.go
[error] 342-342: golangci-lint: QF1001 could apply De Morgan's law (staticcheck)
🪛 GitHub Check: lint
byteseq_test.go
[failure] 342-342:
QF1001: could apply De Morgan's law (staticcheck)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build (1.24.x, windows-latest)
- GitHub Check: Compare
🔇 Additional comments (2)
byteseq_test.go (2)
368-390: Well-structured benchmarks following best practices.The benchmark properly captures the range variable and combines both fiber and default benchmarks in a single loop, addressing the feedback from the previous review. The use of
SetBytesenables meaningful throughput reporting.
412-434: Consistent benchmark structure for byte slices.The byte slice benchmarks mirror the string benchmark structure and properly incorporate the previous review feedback by combining the loops into a single iteration.
- Simplifies logic by using bytes.Equal which treats nil and empty slices as equal - Eliminates conditional logic and potential linter issues - Cleaner and more idiomatic Go code
|
/gemini review |
|
@codex review |
There was a problem hiding this comment.
Code Review
This pull request introduces an optimized TrimSpace function for strings and byte slices, focusing on performance for ASCII whitespace. The implementation is sound and uses a lookup table for efficiency. The accompanying tests and benchmarks are thorough. My review focuses on improving the correctness and maintainability of the tests. Specifically, I suggest making the unit tests for TrimSpace independent of the standard library's TrimSpace due to differences in whitespace handling, and I recommend refactoring the benchmark tests to reduce code duplication.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…pacity - Cache length in variable to avoid multiple len(s) calls - Fix whitespace-only case to return zero-capacity slice (s[length:length]) - Matches bytes.TrimSpace behavior for append safety - Prevents unexpected buffer mutations when appending to trimmed result
- Define trimSpaceBenchmarkCases at package level - Reuse same test cases for both string and []byte benchmarks - Convert to []byte where needed in Benchmark_TrimSpaceBytes - Improves maintainability by centralizing test case definitions
Summary by CodeRabbit
New Features
Tests
Benchmarks