refactor(regex): replace string manipulation with Pattern matching for improved readability by balazs-szucs · Pull Request #359 · grimmory-tools/grimmory

balazs-szucs · 2026-04-03T11:57:36Z

Description

Linked Issue: Fixes #

Changes

I can't read REGEX patterns very well, this helps me to understand code better. Mostly was done via static analyzer though.

Summary by CodeRabbit

Refactor

Centralized trailing-slash trimming for provider URLs to ensure consistent issuer/discovery handling.
Replaced repeatedly compiled regexes with precompiled patterns for filename matching, duplicate detection, audio track title parsing, and identifier extraction to improve performance and consistency.
Switched to a more appropriate collection construction for file-format filtering for steadier, slightly more efficient file processing.

…r improved readability

…egex patterns for clarity

coderabbitai · 2026-04-03T11:57:55Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: aac81524-72d1-491b-b114-41a540cf3e26

📥 Commits

Reviewing files that changed from the base of the PR and between 2e1b14e and a62bd64.

📒 Files selected for processing (1)

booklore-api/src/main/java/org/booklore/util/FileUtils.java

🚧 Files skipped from review as they are similar to previous changes (1)

booklore-api/src/main/java/org/booklore/util/FileUtils.java

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Test Suite / Backend Tests

📝 Walkthrough

Walkthrough

Replaced inline regex/string operations with shared utilities and precompiled regex patterns: added FileUtils.trimTrailingSlashes, introduced cached Pattern constants for normalization and ISBN extraction, and replaced a HashSet creation with EnumSet.copyOf for allowed-format filtering.

Changes

Cohort / File(s)	Summary
OIDC / File util `booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java`, `booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java`, `booklore-api/src/main/java/org/booklore/util/FileUtils.java`	Replaced inline trailing-slash trimming with `FileUtils.trimTrailingSlashes(...)`; added the new public utility method. Inspect issuer/discovery URL normalization and null handling.
String / filename normalization `booklore-api/src/main/java/org/booklore/service/book/DuplicateDetectionService.java`, `booklore-api/src/main/java/org/booklore/service/reader/AudioFileUtilityService.java`	Added precompiled `Pattern` constants and replaced repeated `replaceAll` calls with `Pattern.matcher(...).replaceAll(...)` for filename/track-title normalization.
ISBN extraction `booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java`	Added `NON_ISBN_CHAR_PATTERN` and use it to strip non-ISBN chars, then `.toUpperCase()`; ensure ISBN10/13 extraction logic and 'X' handling preserved.
Collections optimization `booklore-api/src/main/java/org/booklore/service/library/LibraryFileHelper.java`	Switched construction from `new HashSet<>(allowedFormats)` to `EnumSet.copyOf(allowedFormats)` when building allowed-format set; verify behavior for null/empty inputs remains unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

enhancement

Poem

🐇 I trimmed the tail of every slash,
Patted patterns warm in a cached little stash,
Enums lined up neat and spry,
Filenames hop and ISBNs fly,
Off to the library—a joyful dash!

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The PR description contains the required template sections but lacks detail in the Changes section, providing only a brief personal note instead of a substantive summary of the refactoring changes.	Expand the Changes section to detail the refactored files, the regex patterns that were extracted as constants, and the performance/readability benefits achieved by pre-compiling patterns.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title follows conventional commit format with type(scope): description and accurately reflects the main refactoring work to replace inline string manipulation with compiled Pattern matching.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…thod

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java`:
- Line 40: The NON_ISBN_CHAR_PATTERN in PhysicalBookService currently only
preserves uppercase 'X' so lowercase 'x' gets stripped; update the pattern to
allow 'x' as well (for example use "[^0-9Xx]" or compile with CASE_INSENSITIVE)
and then ensure the ISBN extraction/normalization methods in the class (the ISBN
extraction/normalization logic used around the existing extraction methods)
convert the final check-digit to uppercase before length/check-digit validation
so both 'x' and 'X' are accepted.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 18d9be79-4618-45a6-8b54-decb0bc32e16

📥 Commits

Reviewing files that changed from the base of the PR and between 77ff10a and 87fa8d8.

📒 Files selected for processing (6)

booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java
booklore-api/src/main/java/org/booklore/service/book/DuplicateDetectionService.java
booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java
booklore-api/src/main/java/org/booklore/service/library/LibraryFileHelper.java
booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java
booklore-api/src/main/java/org/booklore/service/reader/AudioFileUtilityService.java

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Packaging Smoke Test

🧰 Additional context used

📓 Path-based instructions (1)

booklore-api/src/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

booklore-api/src/**/*.java: Use 4-space indentation in Java files and match surrounding Java style
Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce @Autowired field injection
Use MapStruct for entity/DTO mapping

Files:

booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java
booklore-api/src/main/java/org/booklore/service/reader/AudioFileUtilityService.java
booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java
booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java
booklore-api/src/main/java/org/booklore/service/library/LibraryFileHelper.java
booklore-api/src/main/java/org/booklore/service/book/DuplicateDetectionService.java

🧠 Learnings (2)

📚 Learning: 2026-03-24T18:46:47.249Z

Learnt from: pedronave
Repo: grimmory-tools/grimmory PR: 173
File: booklore-api/src/main/java/org/booklore/service/hardcover/HardcoverSyncService.java:188-199
Timestamp: 2026-03-24T18:46:47.249Z
Learning: In grimmory-tools/grimmory (HardcoverSyncService.java), when a `hardcoverBookId` is stored in book metadata but is malformed (non-numeric), the preferred behavior is to return null and skip the sync rather than falling back to ISBN-based lookup. The reasoning is that an explicitly set bookId represents deliberate user intent, and silently resolving via ISBN could sync progress to a different book than intended.

Applied to files:

booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java

📚 Learning: 2026-03-24T18:46:10.869Z

Learnt from: pedronave
Repo: grimmory-tools/grimmory PR: 173
File: booklore-api/src/main/java/org/booklore/service/hardcover/HardcoverSyncService.java:220-225
Timestamp: 2026-03-24T18:46:10.869Z
Learning: In the Hardcover GraphQL API, empty strings (`""`) are not valid ISBN values and querying with `_eq: ""` returns no results. Therefore, binding a missing ISBN to `""` in `_or` edition filters in HardcoverSyncService.java is safe and will not accidentally match unrelated editions or books.

Applied to files:

booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java

🪛 GitHub Check: CodeQL

booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java

[failure] 42-42: Polynomial regular expression used on uncontrolled data
This regular expression that depends on a user-provided value may run slow on strings with many repetitions of '/'.

🔇 Additional comments (6)

booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java (1)

23-24: Static analysis false positive: The regex /+$ is safe.

CodeQL flags "Polynomial regular expression used on uncontrolled data" but this is a false positive. The pattern /+$ is linear — it simply matches one or more trailing / characters at the end of the string. There are no nested quantifiers, alternation with overlap, or other constructs that could cause catastrophic backtracking (ReDoS).

The worst-case time complexity is O(n) where n is the number of trailing slashes, which is benign.

Also applies to: 42-42

booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java (1)

44-44: Clean refactor using precompiled Pattern constant.

The introduction of TRAILING_SLASH_PATTERN improves readability and avoids repeated regex compilation. The pattern /+$ is linear and safe from catastrophic backtracking.

Note: Line 107 will throw NullPointerException if expectedIssuer is null, whereas Line 108 correctly handles a null issuer. Since this is a private method and issuerUri flows from configuration, this is likely acceptable, but consider adding a null guard for defensive consistency.

Also applies to: 107-108

booklore-api/src/main/java/org/booklore/service/library/LibraryFileHelper.java (1)

151-159: Good use of EnumSet.copyOf for enum-based filtering.

The change from HashSet to EnumSet is appropriate for BookFileType enums and provides better performance characteristics. The guard on line 152 (allowedFormats.isEmpty()) ensures we never call EnumSet.copyOf() with an empty collection, which would otherwise throw IllegalArgumentException.

booklore-api/src/main/java/org/booklore/service/reader/AudioFileUtilityService.java (1)

26-26: LGTM!

The precompiled SEPARATOR_PATTERN with [_-]+ correctly collapses consecutive underscores/hyphens into a single space. The implementation preserves existing behavior while improving readability.

Also applies to: 70-76

booklore-api/src/main/java/org/booklore/service/book/DuplicateDetectionService.java (2)

30-32: Pattern constants look good.

The three patterns correctly handle the normalization pipeline: underscores/hyphens to spaces, strip non-alphanumeric, collapse whitespace. The escaping of hyphen (\\-) in UNDERSCORE_HYPHEN_PATTERN is unnecessary when hyphen is at the end of a character class, but it's harmless and doesn't affect correctness.

279-282: LGTM - Normalization chain correctly refactored.

The sequence (lowercase → replace separators → remove non-alphanumeric → collapse whitespace → trim) preserves the original behavior. The intermediate double spaces from single-char replacement (e.g., foo__bar → foo bar) are correctly collapsed by WHITESPACE_PATTERN in the final step.

…d output

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java (1)
104-112: ⚠️ Potential issue | 🟡 Minor

Behavioral change in null handling for security-sensitive validation.

The original replaceAll("/+$", "") would throw NullPointerException if expectedIssuer was null. The new FileUtils.trimTrailingSlashes(expectedIssuer) returns "" for null input.

This means if expectedIssuer is null (misconfiguration) and issuer claim is also null or empty, validation would silently pass instead of failing. In security-sensitive code, fail-fast is preferable.
🛡️ Proposed fix to fail fast on null expectedIssuer
 private void validateIssuer(JWTClaimsSet claims, String expectedIssuer) {
+    if (expectedIssuer == null || expectedIssuer.isBlank()) {
+        throw ApiError.OIDC_INVALID_TOKEN.createException("Expected issuer not configured");
+    }
     String issuer = claims.getIssuer();
     String normalizedExpected = FileUtils.trimTrailingSlashes(expectedIssuer);
     String normalizedActual = issuer != null ? FileUtils.trimTrailingSlashes(issuer) : "";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java`
around lines 104 - 112, The validateIssuer method now silently accepts a null
expectedIssuer due to FileUtils.trimTrailingSlashes returning ""—make it fail
fast: in validateIssuer(JWTClaimsSet claims, String expectedIssuer) first check
if expectedIssuer is null or empty and throw an appropriate ApiError (e.g.,
ApiError.OIDC_INVALID_TOKEN.createException with a clear message like "ID token
expected issuer not configured") before calling FileUtils.trimTrailingSlashes;
then proceed to normalize issuer from claims and compare as currently
implemented so misconfiguration cannot pass validation.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In
`@booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java`:
- Around line 104-112: The validateIssuer method now silently accepts a null
expectedIssuer due to FileUtils.trimTrailingSlashes returning ""—make it fail
fast: in validateIssuer(JWTClaimsSet claims, String expectedIssuer) first check
if expectedIssuer is null or empty and throw an appropriate ApiError (e.g.,
ApiError.OIDC_INVALID_TOKEN.createException with a clear message like "ID token
expected issuer not configured") before calling FileUtils.trimTrailingSlashes;
then proceed to normalize issuer from claims and compare as currently
implemented so misconfiguration cannot pass validation.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2979412b-270d-4442-888c-7a2fa5d6ab17

📥 Commits

Reviewing files that changed from the base of the PR and between 87fa8d8 and 2e1b14e.

📒 Files selected for processing (4)

booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java
booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java
booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java
booklore-api/src/main/java/org/booklore/util/FileUtils.java

🚧 Files skipped from review as they are similar to previous changes (2)

booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java
booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Packaging Smoke Test

🧰 Additional context used

📓 Path-based instructions (1)

booklore-api/src/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

booklore-api/src/**/*.java: Use 4-space indentation in Java files and match surrounding Java style
Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce @Autowired field injection
Use MapStruct for entity/DTO mapping

Files:

booklore-api/src/main/java/org/booklore/util/FileUtils.java
booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java

🧠 Learnings (1)

📚 Learning: 2026-03-26T01:46:48.863Z

Learnt from: CR
Repo: grimmory-tools/grimmory PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-26T01:46:48.863Z
Learning: Applies to booklore-api/src/**/*.java : Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce `Autowired` field injection

Applied to files:

booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java

🔇 Additional comments (2)

booklore-api/src/main/java/org/booklore/util/FileUtils.java (1)

290-303: Implementation is correct and efficient.

The while-loop approach to trim trailing slashes is clear and avoids regex overhead. The null-safety returning "" is a reasonable choice for a utility method.

One minor organizational note: this method is used for URL/URI normalization (OIDC issuer comparison), which is semantically unrelated to file operations. Consider placing it in a StringUtils or UriUtils class if one exists, or creating one. This is purely optional.

[approve_code_changes, suggest_optional_refactor]

booklore-api/src/main/java/org/booklore/config/security/oidc/OidcTokenValidator.java (1)

20-20: Import is appropriate for the refactor.

The import aligns with the PR objective of consolidating string normalization utilities.

…r improved readability (grimmory-tools#359) * refactor(regex): replace string manipulation with Pattern matching for improved readability * refactor(duplicate-detection): improve filename normalization using regex patterns for clarity * refactor(oidc): replace regex trailing slash handling with utility method * refactor(isbn): enhance ISBN extraction by normalizing case in cleaned output

…r improved readability (#359) * refactor(regex): replace string manipulation with Pattern matching for improved readability * refactor(duplicate-detection): improve filename normalization using regex patterns for clarity * refactor(oidc): replace regex trailing slash handling with utility method * refactor(isbn): enhance ISBN extraction by normalizing case in cleaned output

…r improved readability (grimmory-tools#359) * refactor(regex): replace string manipulation with Pattern matching for improved readability * refactor(duplicate-detection): improve filename normalization using regex patterns for clarity * refactor(oidc): replace regex trailing slash handling with utility method * refactor(isbn): enhance ISBN extraction by normalizing case in cleaned output

balazs-szucs and others added 3 commits April 3, 2026 13:54

refactor(regex): replace string manipulation with Pattern matching fo…

08d9422

…r improved readability

refactor(duplicate-detection): improve filename normalization using r…

ce556fc

…egex patterns for clarity

Merge branch 'develop' into minor-refactor

87fa8d8

github-advanced-security AI found potential problems Apr 3, 2026

View reviewed changes

Comment thread booklore-api/src/main/java/org/booklore/service/oidc/OidcDiagnosticService.java Fixed

coderabbitai Bot added backend chore labels Apr 3, 2026

refactor(oidc): replace regex trailing slash handling with utility me…

3edb3c3

…thod

coderabbitai Bot reviewed Apr 3, 2026

View reviewed changes

Comment thread booklore-api/src/main/java/org/booklore/service/book/PhysicalBookService.java Outdated

refactor(isbn): enhance ISBN extraction by normalizing case in cleane…

2e1b14e

…d output

coderabbitai Bot reviewed Apr 3, 2026

View reviewed changes

Merge branch 'develop' into minor-refactor

a62bd64

coderabbitai Bot added enhancement and removed backend chore labels Apr 3, 2026

balazs-szucs requested a review from zachyale April 3, 2026 20:30

zachyale approved these changes Apr 8, 2026

View reviewed changes

balazs-szucs merged commit 60440ec into grimmory-tools:develop Apr 8, 2026
14 checks passed

This was referenced Apr 27, 2026

refactor(metadata): avoid polynomial regexes in ComicvineBookParser #930

Merged

refactor(regex): replace string manipulation with regex patterns for improved clarity and performance #869

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(regex): replace string manipulation with Pattern matching for improved readability#359

refactor(regex): replace string manipulation with Pattern matching for improved readability#359
balazs-szucs merged 6 commits into
grimmory-tools:developfrom
balazs-szucs:minor-refactor

balazs-szucs commented Apr 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 3, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested labels

Poem

Pre-merge checks failed

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

balazs-szucs commented Apr 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes

Summary by CodeRabbit

Refactor

Uh oh!

coderabbitai Bot commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Poem

Pre-merge checks failed

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

balazs-szucs commented Apr 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 3, 2026 •

edited

Loading