Skip to content

refactor(regex): replace string manipulation with Pattern matching for improved readability#359

Merged
balazs-szucs merged 6 commits into
grimmory-tools:developfrom
balazs-szucs:minor-refactor
Apr 8, 2026
Merged

refactor(regex): replace string manipulation with Pattern matching for improved readability#359
balazs-szucs merged 6 commits into
grimmory-tools:developfrom
balazs-szucs:minor-refactor

Conversation

@balazs-szucs

@balazs-szucs balazs-szucs commented Apr 3, 2026

Copy link
Copy Markdown
Contributor

Description

Linked Issue: Fixes #

Changes

I can't read REGEX patterns very well, this helps me to understand code better. Mostly was done via static analyzer though.

Summary by CodeRabbit

Refactor

  • Centralized trailing-slash trimming for provider URLs to ensure consistent issuer/discovery handling.
  • Replaced repeatedly compiled regexes with precompiled patterns for filename matching, duplicate detection, audio track title parsing, and identifier extraction to improve performance and consistency.
  • Switched to a more appropriate collection construction for file-format filtering for steadier, slightly more efficient file processing.

@coderabbitai

coderabbitai Bot commented Apr 3, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: aac81524-72d1-491b-b114-41a540cf3e26

📥 Commits

Reviewing files that changed from the base of the PR and between 2e1b14e and a62bd64.

📒 Files selected for processing (1)
  • booklore-api/src/main/java/org/booklore/util/FileUtils.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • booklore-api/src/main/java/org/booklore/util/FileUtils.java
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test Suite / Backend Tests

📝 Walkthrough

Walkthrough

Replaced inline regex/string operations with shared utilities and precompiled regex patterns: added FileUtils.trimTrailingSlashes, introduced cached Pattern constants for normalization and ISBN extraction, and replaced a HashSet creation with EnumSet.copyOf for allowed-format filtering.

Changes

Cohort / File(s) Summary
OIDC / File util
booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java, booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java, booklore-api/src/main/java/org/booklore/util/FileUtils.java
Replaced inline trailing-slash trimming with FileUtils.trimTrailingSlashes(...); added the new public utility method. Inspect issuer/discovery URL normalization and null handling.
String / filename normalization
booklore-api/src/main/java/org/booklore/service/book/DuplicateDetectionService.java, booklore-api/src/main/java/org/booklore/service/reader/AudioFileUtilityService.java
Added precompiled Pattern constants and replaced repeated replaceAll calls with Pattern.matcher(...).replaceAll(...) for filename/track-title normalization.
ISBN extraction
booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java
Added NON_ISBN_CHAR_PATTERN and use it to strip non-ISBN chars, then .toUpperCase(); ensure ISBN10/13 extraction logic and 'X' handling preserved.
Collections optimization
booklore-api/src/main/java/org/booklore/service/library/LibraryFileHelper.java
Switched construction from new HashSet<>(allowedFormats) to EnumSet.copyOf(allowedFormats) when building allowed-format set; verify behavior for null/empty inputs remains unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

enhancement

Poem

🐇 I trimmed the tail of every slash,
Patted patterns warm in a cached little stash,
Enums lined up neat and spry,
Filenames hop and ISBNs fly,
Off to the library—a joyful dash!


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The PR description contains the required template sections but lacks detail in the Changes section, providing only a brief personal note instead of a substantive summary of the refactoring changes. Expand the Changes section to detail the refactored files, the regex patterns that were extracted as constants, and the performance/readability benefits achieved by pre-compiling patterns.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The PR title follows conventional commit format with type(scope): description and accurately reflects the main refactoring work to replace inline string manipulation with compiled Pattern matching.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java`:
- Line 40: The NON_ISBN_CHAR_PATTERN in PhysicalBookService currently only
preserves uppercase 'X' so lowercase 'x' gets stripped; update the pattern to
allow 'x' as well (for example use "[^0-9Xx]" or compile with CASE_INSENSITIVE)
and then ensure the ISBN extraction/normalization methods in the class (the ISBN
extraction/normalization logic used around the existing extraction methods)
convert the final check-digit to uppercase before length/check-digit validation
so both 'x' and 'X' are accepted.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 18d9be79-4618-45a6-8b54-decb0bc32e16

📥 Commits

Reviewing files that changed from the base of the PR and between 77ff10a and 87fa8d8.

📒 Files selected for processing (6)
  • booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java
  • booklore-api/src/main/java/org/booklore/service/book/DuplicateDetectionService.java
  • booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java
  • booklore-api/src/main/java/org/booklore/service/library/LibraryFileHelper.java
  • booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java
  • booklore-api/src/main/java/org/booklore/service/reader/AudioFileUtilityService.java
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Packaging Smoke Test
🧰 Additional context used
📓 Path-based instructions (1)
booklore-api/src/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

booklore-api/src/**/*.java: Use 4-space indentation in Java files and match surrounding Java style
Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce @Autowired field injection
Use MapStruct for entity/DTO mapping

Files:

  • booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java
  • booklore-api/src/main/java/org/booklore/service/reader/AudioFileUtilityService.java
  • booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java
  • booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java
  • booklore-api/src/main/java/org/booklore/service/library/LibraryFileHelper.java
  • booklore-api/src/main/java/org/booklore/service/book/DuplicateDetectionService.java
🧠 Learnings (2)
📚 Learning: 2026-03-24T18:46:47.249Z
Learnt from: pedronave
Repo: grimmory-tools/grimmory PR: 173
File: booklore-api/src/main/java/org/booklore/service/hardcover/HardcoverSyncService.java:188-199
Timestamp: 2026-03-24T18:46:47.249Z
Learning: In grimmory-tools/grimmory (HardcoverSyncService.java), when a `hardcoverBookId` is stored in book metadata but is malformed (non-numeric), the preferred behavior is to return null and skip the sync rather than falling back to ISBN-based lookup. The reasoning is that an explicitly set bookId represents deliberate user intent, and silently resolving via ISBN could sync progress to a different book than intended.

Applied to files:

  • booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java
📚 Learning: 2026-03-24T18:46:10.869Z
Learnt from: pedronave
Repo: grimmory-tools/grimmory PR: 173
File: booklore-api/src/main/java/org/booklore/service/hardcover/HardcoverSyncService.java:220-225
Timestamp: 2026-03-24T18:46:10.869Z
Learning: In the Hardcover GraphQL API, empty strings (`""`) are not valid ISBN values and querying with `_eq: ""` returns no results. Therefore, binding a missing ISBN to `""` in `_or` edition filters in HardcoverSyncService.java is safe and will not accidentally match unrelated editions or books.

Applied to files:

  • booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java
🪛 GitHub Check: CodeQL
booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java

[failure] 42-42: Polynomial regular expression used on uncontrolled data
This regular expression that depends on a user-provided value may run slow on strings with many repetitions of '/'.

🔇 Additional comments (6)
booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java (1)

23-24: Static analysis false positive: The regex /+$ is safe.

CodeQL flags "Polynomial regular expression used on uncontrolled data" but this is a false positive. The pattern /+$ is linear — it simply matches one or more trailing / characters at the end of the string. There are no nested quantifiers, alternation with overlap, or other constructs that could cause catastrophic backtracking (ReDoS).

The worst-case time complexity is O(n) where n is the number of trailing slashes, which is benign.

Also applies to: 42-42

booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java (1)

44-44: Clean refactor using precompiled Pattern constant.

The introduction of TRAILING_SLASH_PATTERN improves readability and avoids repeated regex compilation. The pattern /+$ is linear and safe from catastrophic backtracking.

Note: Line 107 will throw NullPointerException if expectedIssuer is null, whereas Line 108 correctly handles a null issuer. Since this is a private method and issuerUri flows from configuration, this is likely acceptable, but consider adding a null guard for defensive consistency.

Also applies to: 107-108

booklore-api/src/main/java/org/booklore/service/library/LibraryFileHelper.java (1)

151-159: Good use of EnumSet.copyOf for enum-based filtering.

The change from HashSet to EnumSet is appropriate for BookFileType enums and provides better performance characteristics. The guard on line 152 (allowedFormats.isEmpty()) ensures we never call EnumSet.copyOf() with an empty collection, which would otherwise throw IllegalArgumentException.

booklore-api/src/main/java/org/booklore/service/reader/AudioFileUtilityService.java (1)

26-26: LGTM!

The precompiled SEPARATOR_PATTERN with [_-]+ correctly collapses consecutive underscores/hyphens into a single space. The implementation preserves existing behavior while improving readability.

Also applies to: 70-76

booklore-api/src/main/java/org/booklore/service/book/DuplicateDetectionService.java (2)

30-32: Pattern constants look good.

The three patterns correctly handle the normalization pipeline: underscores/hyphens to spaces, strip non-alphanumeric, collapse whitespace. The escaping of hyphen (\\-) in UNDERSCORE_HYPHEN_PATTERN is unnecessary when hyphen is at the end of a character class, but it's harmless and doesn't affect correctness.


279-282: LGTM - Normalization chain correctly refactored.

The sequence (lowercase → replace separators → remove non-alphanumeric → collapse whitespace → trim) preserves the original behavior. The intermediate double spaces from single-char replacement (e.g., foo__barfoo bar) are correctly collapsed by WHITESPACE_PATTERN in the final step.

Comment thread booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java (1)

104-112: ⚠️ Potential issue | 🟡 Minor

Behavioral change in null handling for security-sensitive validation.

The original replaceAll("/+$", "") would throw NullPointerException if expectedIssuer was null. The new FileUtils.trimTrailingSlashes(expectedIssuer) returns "" for null input.

This means if expectedIssuer is null (misconfiguration) and issuer claim is also null or empty, validation would silently pass instead of failing. In security-sensitive code, fail-fast is preferable.

🛡️ Proposed fix to fail fast on null expectedIssuer
 private void validateIssuer(JWTClaimsSet claims, String expectedIssuer) {
+    if (expectedIssuer == null || expectedIssuer.isBlank()) {
+        throw ApiError.OIDC_INVALID_TOKEN.createException("Expected issuer not configured");
+    }
     String issuer = claims.getIssuer();
     String normalizedExpected = FileUtils.trimTrailingSlashes(expectedIssuer);
     String normalizedActual = issuer != null ? FileUtils.trimTrailingSlashes(issuer) : "";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java`
around lines 104 - 112, The validateIssuer method now silently accepts a null
expectedIssuer due to FileUtils.trimTrailingSlashes returning ""—make it fail
fast: in validateIssuer(JWTClaimsSet claims, String expectedIssuer) first check
if expectedIssuer is null or empty and throw an appropriate ApiError (e.g.,
ApiError.OIDC_INVALID_TOKEN.createException with a clear message like "ID token
expected issuer not configured") before calling FileUtils.trimTrailingSlashes;
then proceed to normalize issuer from claims and compare as currently
implemented so misconfiguration cannot pass validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In
`@booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java`:
- Around line 104-112: The validateIssuer method now silently accepts a null
expectedIssuer due to FileUtils.trimTrailingSlashes returning ""—make it fail
fast: in validateIssuer(JWTClaimsSet claims, String expectedIssuer) first check
if expectedIssuer is null or empty and throw an appropriate ApiError (e.g.,
ApiError.OIDC_INVALID_TOKEN.createException with a clear message like "ID token
expected issuer not configured") before calling FileUtils.trimTrailingSlashes;
then proceed to normalize issuer from claims and compare as currently
implemented so misconfiguration cannot pass validation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2979412b-270d-4442-888c-7a2fa5d6ab17

📥 Commits

Reviewing files that changed from the base of the PR and between 87fa8d8 and 2e1b14e.

📒 Files selected for processing (4)
  • booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java
  • booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java
  • booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java
  • booklore-api/src/main/java/org/booklore/util/FileUtils.java
🚧 Files skipped from review as they are similar to previous changes (2)
  • booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java
  • booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Packaging Smoke Test
🧰 Additional context used
📓 Path-based instructions (1)
booklore-api/src/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

booklore-api/src/**/*.java: Use 4-space indentation in Java files and match surrounding Java style
Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce @Autowired field injection
Use MapStruct for entity/DTO mapping

Files:

  • booklore-api/src/main/java/org/booklore/util/FileUtils.java
  • booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java
🧠 Learnings (1)
📚 Learning: 2026-03-26T01:46:48.863Z
Learnt from: CR
Repo: grimmory-tools/grimmory PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-26T01:46:48.863Z
Learning: Applies to booklore-api/src/**/*.java : Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce `Autowired` field injection

Applied to files:

  • booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java
🔇 Additional comments (2)
booklore-api/src/main/java/org/booklore/util/FileUtils.java (1)

290-303: Implementation is correct and efficient.

The while-loop approach to trim trailing slashes is clear and avoids regex overhead. The null-safety returning "" is a reasonable choice for a utility method.

One minor organizational note: this method is used for URL/URI normalization (OIDC issuer comparison), which is semantically unrelated to file operations. Consider placing it in a StringUtils or UriUtils class if one exists, or creating one. This is purely optional.

[approve_code_changes, suggest_optional_refactor]

booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java (1)

20-20: Import is appropriate for the refactor.

The import aligns with the PR objective of consolidating string normalization utilities.

@balazs-szucs balazs-szucs merged commit 60440ec into grimmory-tools:develop Apr 8, 2026
14 checks passed
zachyale pushed a commit to zachyale/grimmory that referenced this pull request Apr 17, 2026
…r improved readability (grimmory-tools#359)

* refactor(regex): replace string manipulation with Pattern matching for improved readability

* refactor(duplicate-detection): improve filename normalization using regex patterns for clarity

* refactor(oidc): replace regex trailing slash handling with utility method

* refactor(isbn): enhance ISBN extraction by normalizing case in cleaned output
zachyale pushed a commit to zachyale/grimmory that referenced this pull request Apr 17, 2026
…r improved readability (grimmory-tools#359)

* refactor(regex): replace string manipulation with Pattern matching for improved readability

* refactor(duplicate-detection): improve filename normalization using regex patterns for clarity

* refactor(oidc): replace regex trailing slash handling with utility method

* refactor(isbn): enhance ISBN extraction by normalizing case in cleaned output
zachyale pushed a commit that referenced this pull request Apr 17, 2026
…r improved readability (#359)

* refactor(regex): replace string manipulation with Pattern matching for improved readability

* refactor(duplicate-detection): improve filename normalization using regex patterns for clarity

* refactor(oidc): replace regex trailing slash handling with utility method

* refactor(isbn): enhance ISBN extraction by normalizing case in cleaned output
zachyale pushed a commit that referenced this pull request Apr 22, 2026
…r improved readability (#359)

* refactor(regex): replace string manipulation with Pattern matching for improved readability

* refactor(duplicate-detection): improve filename normalization using regex patterns for clarity

* refactor(oidc): replace regex trailing slash handling with utility method

* refactor(isbn): enhance ISBN extraction by normalizing case in cleaned output
dsmouse pushed a commit to dsmouse/grimmory that referenced this pull request May 11, 2026
…r improved readability (grimmory-tools#359)

* refactor(regex): replace string manipulation with Pattern matching for improved readability

* refactor(duplicate-detection): improve filename normalization using regex patterns for clarity

* refactor(oidc): replace regex trailing slash handling with utility method

* refactor(isbn): enhance ISBN extraction by normalizing case in cleaned output
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants