refactor(regex): replace string manipulation with regex patterns for improved clarity and performance by balazs-szucs · Pull Request #869 · grimmory-tools/grimmory

balazs-szucs · 2026-04-25T16:04:11Z

Description

Linked Issue: Fixes #

Changes

Summary by CodeRabbit

Refactor
- Centralized and reused regular-expression handling across backend features (CORS/origin handling, auth origin resolution, filenames/ZIP sanitization, href normalization, metadata extraction/parsing/writing, audiobook/audio parsing, EPUB CFI cleanup).
- No user-facing behavior changes; improves performance, consistency and maintainability.
Tests
- Updated internal test regex handling to match the refactor.

…improved clarity and performance

coderabbitai · 2026-04-25T16:04:24Z

Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6f6e1356-96ff-4ab1-9021-71bda7ced036

📥 Commits

Reviewing files that changed from the base of the PR and between 2ce740a and d61ea8f.

📒 Files selected for processing (1)

backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Test Suite / Frontend Tests
GitHub Check: Test Suite / Backend Tests
GitHub Check: Analyze (java-kotlin)
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Frontend Lint Threshold Check

🧰 Additional context used

📓 Path-based instructions (3)

backend/src/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

backend/src/**/*.java: Use 4-space indentation and match surrounding Java style in backend code
Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce @Autowired field injection in backend code
Use MapStruct for entity/DTO mapping in backend code
Keep JPA entities on the *Entity suffix in backend code

Files:

backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java

**/*

⚙️ CodeRabbit configuration file

**/*: This project is being developed using current and future-facing technologies:

Java 25 with --enable-preview (preview features are INTENTIONAL and encouraged)

Spring Boot 4 (latest major version, check APIs accordingly)

Jackson 3 (new package: tools.jackson.* instead of com.fasterxml.jackson.*)

Hibernate 7.3.x (Jakarta Persistence 3.2, new APIs; avoid deprecated Hibernate 5/6 patterns)

Angular 21 (signals-based reactivity, no NgModules unless legacy)

Grimmory Internal Tools

epub4j and pdfium4j are our own internal tools developed by the Grimmory team.

Always verify behavior and API changes against the upstream repositories:

epub4j: https://github.com/grimmory-tools/epub4j

pdfium4j: https://github.com/grimmory-tools/pdfium4j

If you encounter issues with these libraries, check if a fix exists in the upstream grimmory-tools organization.

Metadata Standards and Compliance

For all metadata writing and parsing logic, double-check against Dublin Core and ANSI standards to ensure perfect official compliance.

We strictly follow the widespread and official XML-compliant methods for EPUB2, EPUB3, CBX, and PDF formats.

General Java and Spring rules

ALWAYS prefer modern, idiomatic Java 25 constructs over legacy patterns.

Preview features (--enable-preview) are enabled and intentional; do NOT flag them as risky unless there is a concrete runtime issue.

Prefer: records, sealed classes/interfaces, pattern matching (switch expressions, instanceof), structured concurrency (StructuredTaskScope), scoped values, string templates, unnamed patterns/variables.

Prefer virtual threads (Thread.ofVirtual(), Executors.newVirtualThreadPerTaskExecutor()) over platform threads for I/O-bound work.

Prefer the new Sequenced Collections API (SequencedCollection, SequencedMap) where applicable.

Prefer var for local variables when the type is obvious from context.

Use stream().toList() instead of stream().collect(Collectors.toList()) for imm...

Files:

backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java

**/service/**/*.java

⚙️ CodeRabbit configuration file

**/service/**/*.java: Spring Framework 7 service layer review:

Flag missing @Transactional on methods that perform multiple writes.

Prefer constructor injection over @Autowired field injection. Use @AllArgsConstructor.

Use ApiError enum for throwing exceptions.

Flag checked exceptions swallowed silently; must log or rethrow.

Flag Thread.sleep(); prefer Duration-based overloads or ScheduledExecutorService.

Prefer virtual threads (Thread.ofVirtual()) for I/O-bound operations.

Flag mutable shared state in singleton beans.

Files:

backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java

🧠 Learnings (4)

📚 Learning: 2026-04-22T01:56:39.495Z

Learnt from: CR
Repo: grimmory-tools/grimmory PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-22T01:56:39.495Z
Learning: Applies to backend/src/**/*.java : Use 4-space indentation and match surrounding Java style in backend code

Applied to files:

backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java

📚 Learning: 2026-04-14T22:11:02.436Z

Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 513
File: booklore-api/src/main/java/org/booklore/service/reader/AudiobookReaderService.java:38-39
Timestamp: 2026-04-14T22:11:02.436Z
Learning: In `grimmory-tools/grimmory`, for the `AudiobookReaderService` pattern of wrapping a DB lookup + file I/O under a single `Transactional` boundary, the preferred refactoring approach is **Option B**: extract the `BookFileEntity` lookup into a dedicated `Service` bean (e.g., `AudiobookEntityLoader`) annotated with `Transactional(readOnly = true)`, so that the proxy always narrows the transaction to the DB access only and all filesystem I/O runs outside the persistence context. Do NOT use `TransactionTemplate` (Option A) for this pattern in this codebase.

Applied to files:

backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java

📚 Learning: 2026-04-10T08:15:37.436Z

Learnt from: imnotjames
Repo: grimmory-tools/grimmory PR: 449
File: booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java:139-145
Timestamp: 2026-04-10T08:15:37.436Z
Learning: When using Spring `ContentDisposition.builder(...).filename(name, StandardCharsets.UTF_8).build()` (i.e., explicitly providing UTF-8), the resulting header value should include both the quoted `filename="=?UTF-8?..."` and the RFC 5987 `filename*=` parameters. In this case, any extra ASCII fallback computation (e.g., deriving an ASCII `fallbackFilename` via `NON_ASCII_PATTERN` and calling `.filename(fallbackFilename)`) is likely redundant—prefer calling only `.filename(fallbackName?, StandardCharsets.UTF_8)` as appropriate and let Spring handle the UTF-8 header parameters. Verify by comparing the emitted header for `filename` and `filename*` before deciding to keep an ASCII fallback.

Applied to files:

backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java

📚 Learning: 2026-04-14T12:43:08.698Z

Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 502
File: booklore-api/src/main/java/org/booklore/service/reader/ChapterCacheService.java:0-0
Timestamp: 2026-04-14T12:43:08.698Z
Learning: For this codebase (booklore-api), target Java 25 with `--enable-preview`, so `_` is intentionally used as an unnamed/ignored variable (e.g., lambda parameter or pattern variable) per Java’s preview feature JEP 456. Do not flag `_` in those contexts as an invalid/reserved identifier; only flag it if it’s used in a non-supported position (e.g., where an unnamed variable is not applicable for the Java preview rules).

Applied to files:

backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java

🔇 Additional comments (2)

backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java (2)

36-39: Field indentation inconsistency already flagged.

The indentation mismatch between lines 36-37 (4 spaces) and lines 38-39 (2 spaces) has already been raised by previous reviewers.

388-388: Effective refactoring with measurable performance benefit.

The change from inline String.replaceAll("[^0-9]", "") to the precompiled NON_DIGIT_PATTERN.matcher(channels).replaceAll("") is correct and beneficial. Since parseChannels is called within the track-processing loop (line 160) for folder-based audiobooks, precompiling the pattern avoids repeated regex compilation when processing multiple audio files.

Walkthrough

Replaced numerous inline regular-expression usages with precompiled private static java.util.regex.Pattern constants across multiple backend Java classes; no public APIs or control flow logic were changed.

Changes

Cohort / File(s)	Summary
Configuration `backend/src/main/java/org/booklore/config/WebSocketConfig.java`, `backend/src/main/java/org/booklore/config/security/oidc/OidcAuthService.java`	Introduce precompiled Patterns for origin splitting and trailing-slash stripping; replace inline regex calls.
Book & Download `backend/src/main/java/org/booklore/service/book/BookDownloadService.java`	Add ASCII-only and non-alphanumeric Patterns for filename charset detection and ZIP filename sanitization; swap inline matches/replacements for matcher-based calls.
Kobo / EPUB / Reader `backend/src/main/java/org/booklore/service/kobo/KoboEpubUtils.java`, `backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java`, `backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java`	Precompile patterns for href fragment/leading-slash removal, CFI cleanup, and non-digit stripping; replace inline replaceFirst/replaceAll and replaceAll-based channel parsing.
Metadata Extractors `backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java`, `backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java`, `backend/src/main/java/org/booklore/service/metadata/extractor/PdfMetadataExtractor.java`	Add compiled Patterns for chapter-ID detection, digit stripping, ISBN cleaning/validation, and year detection; convert inline regex usage to Pattern constants.
Metadata Parsers & Writers `backend/src/main/java/org/booklore/service/metadata/parser/GoogleParser.java`, `backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java`, `backend/src/main/java/org/booklore/service/metadata/writer/PdfMetadataWriter.java`	Extract category-splitting, cover-image URL cleanup, calibre prefix removal, whitespace and numeric normalization into static Pattern constants; replace inline regex calls.
Tests `backend/src/test/java/org/booklore/service/metadata/writer/CbxComicInfoComplianceTest.java`	Refactor test regex to use a precompiled Pattern for empty-line/format validation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

refactor(regex): replace string manipulation with Pattern matching for improved readability #359 — similar cross-cutting refactor replacing inline regex and string-manipulation with precompiled Pattern usage, including issuer/trailing-slash normalization.

Suggested reviewers

zachyale
imajes

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name	Status	Explanation	Resolution
Description check	❌ Error	The description uses the required template structure but lacks implementation details in the Changes section, leaving it incomplete beyond the template headings.	Provide meaningful content in the Changes section describing the refactoring scope, affected files, and the rationale for using precompiled patterns.
Docstring Coverage	⚠️ Warning	Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title follows conventional commit format with 'refactor(regex)' prefix and clearly describes the main refactoring objective across all changed files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

# Conflicts: # backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java

…gex definitions

…g trailing zeros pattern

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java (1)

35-42: Group new constants outside the “URL Patterns” section.

BOOKLORE_TAG_PATTERN, ISBN13_PATTERN, and ISBN_CLEANER_PATTERN are placed under the // URL Patterns comment but aren't URL patterns. Consider moving them above the comment (next to the other general patterns) or adding their own header for clarity.

♻️ Suggested grouping

     private static final Pattern LEADING_ZEROS_PATTERN = Pattern.compile("^0+");
     private static final Pattern COMMA_SEMICOLON_PATTERN = Pattern.compile("[,;]");
     private static final Pattern BOOKLORE_NOTE_PATTERN = Pattern.compile("\\[BookLore:([^\\]]+)\\]\\s*(.*)");
+    private static final Pattern BOOKLORE_TAG_PATTERN = Pattern.compile("\\[BookLore:[^\\]]+\\][^\\n]*(\n|$)");
     private static final Pattern WEB_SPLIT_PATTERN = Pattern.compile("[,;\\s]+");
+    private static final Pattern ISBN13_PATTERN = Pattern.compile("\\d{13}");
+    private static final Pattern ISBN_CLEANER_PATTERN = Pattern.compile("[- ]");

     // URL Patterns
     private static final Pattern GOODREADS_URL_PATTERN = Pattern.compile("goodreads\\.com/book/show/(\\d+)(?:-[\\w-]+)?");
     private static final Pattern AMAZON_URL_PATTERN = Pattern.compile("amazon\\.com/dp/([A-Z0-9]{10})");
     private static final Pattern COMICVINE_URL_PATTERN = Pattern.compile("comicvine\\.gamespot\\.com/issue/(?:[^/]+/)?([\\w-]+)");
     private static final Pattern HARDCOVER_URL_PATTERN = Pattern.compile("hardcover\\.app/books/([\\w-]+)");
-    private static final Pattern BOOKLORE_TAG_PATTERN = Pattern.compile("\\[BookLore:[^\\]]+\\][^\\n]*(\n|$)");
-    private static final Pattern ISBN13_PATTERN = Pattern.compile("\\d{13}");
-    private static final Pattern ISBN_CLEANER_PATTERN = Pattern.compile("[- ]");

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java`
around lines 35 - 42, The constants BOOKLORE_TAG_PATTERN, ISBN13_PATTERN, and
ISBN_CLEANER_PATTERN are misgrouped under the URL patterns comment; move these
three fields out from under the "// URL Patterns" section and place them either
above that comment with the other general patterns or create a new
comment/header (e.g., "// Misc/Tag and ISBN Patterns") and place
BOOKLORE_TAG_PATTERN, ISBN13_PATTERN, and ISBN_CLEANER_PATTERN there so grouping
matches intent and improves readability in CbxMetadataExtractor.

backend/src/main/java/org/booklore/config/WebSocketConfig.java (1)

28-28: LGTM — precompiled split pattern is a faithful drop-in.

Pattern.compile("\\s*,\\s*") and Arrays.stream(COMMA_SPLIT_PATTERN.split(allowedOrigins)) preserve the prior String.split semantics exactly.

Optional micro-refactor: Pattern.splitAsStream avoids the intermediate array.

♻️ Optional simplification

-            String[] origins = Arrays.stream(COMMA_SPLIT_PATTERN.split(allowedOrigins))
-                    .filter(s -> !s.isEmpty())
-                    .toArray(String[]::new);
+            String[] origins = COMMA_SPLIT_PATTERN.splitAsStream(allowedOrigins)
+                    .filter(s -> !s.isEmpty())
+                    .toArray(String[]::new);

Also applies to: 72-74

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@backend/src/main/java/org/booklore/config/WebSocketConfig.java` at line 28,
Replace uses of Arrays.stream(COMMA_SPLIT_PATTERN.split(allowedOrigins)) with
the Pattern API's streaming method to avoid the intermediate array: use
COMMA_SPLIT_PATTERN.splitAsStream(allowedOrigins) wherever you currently split
allowedOrigins (and the other occurrences referenced around lines 72-74). Update
the code paths that construct origin lists to consume the Stream directly (same
mapping/filtering/collect operations) so behavior remains identical while
eliminating the temporary array allocation.

backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java (1)

42-43: Consider consolidating the duplicated NON_DIGIT_PATTERN constant.

The exact same pattern Pattern.compile("[^0-9]") is declared in AudioMetadataService.java (line 36). Both classes also have very similar parseChannels logic. If further duplicates appear elsewhere in the refactor, consider promoting NON_DIGIT_PATTERN (and possibly parseChannels) to a shared utility (e.g., an AudioFileUtilityService helper or a regex-constants holder) to avoid diverging definitions over time. Not blocking for this PR.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java`
around lines 42 - 43, The NON_DIGIT_PATTERN regex is duplicated between
AudiobookMetadataExtractor and AudioMetadataService (and their parseChannels
logic is similar); extract and centralize this constant (and optionally the
parseChannels method) into a shared utility class (e.g., AudioFileUtils or
RegexConstants) and replace the local declarations in AudiobookMetadataExtractor
(remove NON_DIGIT_PATTERN and update parseChannels to call the shared utility)
and in AudioMetadataService to reference the centralized symbol, ensuring both
classes import and use the single source to prevent divergence.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java`:
- Around line 36-39: The four field declarations (NON_DIGIT_PATTERN,
audioFileUtility, audiobookMetadataExtractor, bookFileRepository) have
inconsistent indentation; update their leading spaces so they are all aligned
with 4-space indentation like the surrounding Java style, making the block
visually consistent (adjust indent of the two 2-space lines to 4 spaces so all
four fields align).

---

Nitpick comments:
In `@backend/src/main/java/org/booklore/config/WebSocketConfig.java`:
- Line 28: Replace uses of
Arrays.stream(COMMA_SPLIT_PATTERN.split(allowedOrigins)) with the Pattern API's
streaming method to avoid the intermediate array: use
COMMA_SPLIT_PATTERN.splitAsStream(allowedOrigins) wherever you currently split
allowedOrigins (and the other occurrences referenced around lines 72-74). Update
the code paths that construct origin lists to consume the Stream directly (same
mapping/filtering/collect operations) so behavior remains identical while
eliminating the temporary array allocation.

In
`@backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java`:
- Around line 42-43: The NON_DIGIT_PATTERN regex is duplicated between
AudiobookMetadataExtractor and AudioMetadataService (and their parseChannels
logic is similar); extract and centralize this constant (and optionally the
parseChannels method) into a shared utility class (e.g., AudioFileUtils or
RegexConstants) and replace the local declarations in AudiobookMetadataExtractor
(remove NON_DIGIT_PATTERN and update parseChannels to call the shared utility)
and in AudioMetadataService to reference the centralized symbol, ensuring both
classes import and use the single source to prevent divergence.

In
`@backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java`:
- Around line 35-42: The constants BOOKLORE_TAG_PATTERN, ISBN13_PATTERN, and
ISBN_CLEANER_PATTERN are misgrouped under the URL patterns comment; move these
three fields out from under the "// URL Patterns" section and place them either
above that comment with the other general patterns or create a new
comment/header (e.g., "// Misc/Tag and ISBN Patterns") and place
BOOKLORE_TAG_PATTERN, ISBN13_PATTERN, and ISBN_CLEANER_PATTERN there so grouping
matches intent and improves readability in CbxMetadataExtractor.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 7baf2fdc-5798-4d98-a6e9-c791603074f4

📥 Commits

Reviewing files that changed from the base of the PR and between 3fda81b and e7c4cc9.

📒 Files selected for processing (13)

backend/src/main/java/org/booklore/config/WebSocketConfig.java
backend/src/main/java/org/booklore/config/security/oidc/OidcAuthService.java
backend/src/main/java/org/booklore/service/book/BookDownloadService.java
backend/src/main/java/org/booklore/service/kobo/KoboEpubUtils.java
backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java
backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java
backend/src/main/java/org/booklore/service/metadata/extractor/PdfMetadataExtractor.java
backend/src/main/java/org/booklore/service/metadata/parser/GoogleParser.java
backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java
backend/src/main/java/org/booklore/service/metadata/writer/PdfMetadataWriter.java
backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java
backend/src/test/java/org/booklore/service/metadata/writer/CbxComicInfoComplianceTest.java

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Analyze (java-kotlin)
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Test Suite / Backend Tests
GitHub Check: Test Suite / Frontend Tests
GitHub Check: Frontend Lint Threshold Check

🧰 Additional context used

📓 Path-based instructions (2)

backend/src/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

backend/src/**/*.java: Use 4-space indentation and match surrounding Java style in backend code
Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce @Autowired field injection in backend code
Use MapStruct for entity/DTO mapping in backend code
Keep JPA entities on the *Entity suffix in backend code

Files:

backend/src/main/java/org/booklore/service/metadata/extractor/PdfMetadataExtractor.java
backend/src/test/java/org/booklore/service/metadata/writer/CbxComicInfoComplianceTest.java
backend/src/main/java/org/booklore/config/security/oidc/OidcAuthService.java
backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java
backend/src/main/java/org/booklore/config/WebSocketConfig.java
backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java
backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java
backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
backend/src/main/java/org/booklore/service/kobo/KoboEpubUtils.java
backend/src/main/java/org/booklore/service/book/BookDownloadService.java
backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java
backend/src/main/java/org/booklore/service/metadata/parser/GoogleParser.java
backend/src/main/java/org/booklore/service/metadata/writer/PdfMetadataWriter.java

backend/src/test/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

Prefer focused unit tests; use @SpringBootTest only when the Spring context is required in backend code

Files:

backend/src/test/java/org/booklore/service/metadata/writer/CbxComicInfoComplianceTest.java

🧠 Learnings (6)

📚 Learning: 2026-04-10T08:15:37.436Z

Learnt from: imnotjames
Repo: grimmory-tools/grimmory PR: 449
File: booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java:139-145
Timestamp: 2026-04-10T08:15:37.436Z
Learning: When using Spring `ContentDisposition.builder(...).filename(name, StandardCharsets.UTF_8).build()` (i.e., explicitly providing UTF-8), the resulting header value should include both the quoted `filename="=?UTF-8?..."` and the RFC 5987 `filename*=` parameters. In this case, any extra ASCII fallback computation (e.g., deriving an ASCII `fallbackFilename` via `NON_ASCII_PATTERN` and calling `.filename(fallbackFilename)`) is likely redundant—prefer calling only `.filename(fallbackName?, StandardCharsets.UTF_8)` as appropriate and let Spring handle the UTF-8 header parameters. Verify by comparing the emitted header for `filename` and `filename*` before deciding to keep an ASCII fallback.

Applied to files:

backend/src/main/java/org/booklore/service/metadata/extractor/PdfMetadataExtractor.java
backend/src/test/java/org/booklore/service/metadata/writer/CbxComicInfoComplianceTest.java
backend/src/main/java/org/booklore/config/security/oidc/OidcAuthService.java
backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java
backend/src/main/java/org/booklore/config/WebSocketConfig.java
backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java
backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java
backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
backend/src/main/java/org/booklore/service/kobo/KoboEpubUtils.java
backend/src/main/java/org/booklore/service/book/BookDownloadService.java
backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java
backend/src/main/java/org/booklore/service/metadata/parser/GoogleParser.java
backend/src/main/java/org/booklore/service/metadata/writer/PdfMetadataWriter.java

📚 Learning: 2026-04-14T12:43:08.698Z

Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 502
File: booklore-api/src/main/java/org/booklore/service/reader/ChapterCacheService.java:0-0
Timestamp: 2026-04-14T12:43:08.698Z
Learning: For this codebase (booklore-api), target Java 25 with `--enable-preview`, so `_` is intentionally used as an unnamed/ignored variable (e.g., lambda parameter or pattern variable) per Java’s preview feature JEP 456. Do not flag `_` in those contexts as an invalid/reserved identifier; only flag it if it’s used in a non-supported position (e.g., where an unnamed variable is not applicable for the Java preview rules).

Applied to files:

backend/src/main/java/org/booklore/service/metadata/extractor/PdfMetadataExtractor.java
backend/src/test/java/org/booklore/service/metadata/writer/CbxComicInfoComplianceTest.java
backend/src/main/java/org/booklore/config/security/oidc/OidcAuthService.java
backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java
backend/src/main/java/org/booklore/config/WebSocketConfig.java
backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java
backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java
backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
backend/src/main/java/org/booklore/service/kobo/KoboEpubUtils.java
backend/src/main/java/org/booklore/service/book/BookDownloadService.java
backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java
backend/src/main/java/org/booklore/service/metadata/parser/GoogleParser.java
backend/src/main/java/org/booklore/service/metadata/writer/PdfMetadataWriter.java

📚 Learning: 2026-04-02T09:25:37.417Z

Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 334
File: booklore-api/src/main/java/org/booklore/service/reader/EpubReaderService.java:402-407
Timestamp: 2026-04-02T09:25:37.417Z
Learning: When reviewing epub-related file processing code in this project, remember that grimmory-tools/grimmory uses the `org.grimmory.epub4j` library (from grimmory-tools/epub4j) and that it is a grimmory-maintained fork fundamentally different from `io.documentnode.epub4j`. Do not assume the same API behavior, limitations, or semantics as `io.documentnode.epub4j`. Before flagging issues or proposing changes, verify the current API/capabilities in the grimmory-tools/epub4j repository.

Applied to files:

backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java
backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java
backend/src/main/java/org/booklore/service/kobo/KoboEpubUtils.java

📚 Learning: 2026-04-02T09:12:50.790Z

Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 334
File: booklore-api/build.gradle.kts:0-0
Timestamp: 2026-04-02T09:12:50.790Z
Learning: In `grimmory-tools/grimmory`, nightcompress (`com.github.gotson.nightcompress:nightcompress`) is the preferred library for archive/unarchiving operations (including RAR and other formats). Flag any PR changes that remove or replace nightcompress with an alternative (e.g., junrar, commons-compress, JNI-based libs, or standard Java ZipInputStream/ZipFile) for unarchiving purposes. Nightcompress is considered better than JNI Java alternatives and generally better than standard Java unzip. Exceptions are acceptable when nightcompress cannot handle a specific format, but such trade-offs should be made explicit and confirmed by the maintainer.

Applied to files:

backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java

📚 Learning: 2026-04-14T22:11:02.436Z

Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 513
File: booklore-api/src/main/java/org/booklore/service/reader/AudiobookReaderService.java:38-39
Timestamp: 2026-04-14T22:11:02.436Z
Learning: In `grimmory-tools/grimmory`, for the `AudiobookReaderService` pattern of wrapping a DB lookup + file I/O under a single `Transactional` boundary, the preferred refactoring approach is **Option B**: extract the `BookFileEntity` lookup into a dedicated `Service` bean (e.g., `AudiobookEntityLoader`) annotated with `Transactional(readOnly = true)`, so that the proxy always narrows the transaction to the DB access only and all filesystem I/O runs outside the persistence context. Do NOT use `TransactionTemplate` (Option A) for this pattern in this codebase.

Applied to files:

backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java

📚 Learning: 2026-03-24T18:46:47.249Z

Learnt from: pedronave
Repo: grimmory-tools/grimmory PR: 173
File: booklore-api/src/main/java/org/booklore/service/hardcover/HardcoverSyncService.java:188-199
Timestamp: 2026-03-24T18:46:47.249Z
Learning: In grimmory-tools/grimmory (HardcoverSyncService.java), when a `hardcoverBookId` is stored in book metadata but is malformed (non-numeric), the preferred behavior is to return null and skip the sync rather than falling back to ISBN-based lookup. The reasoning is that an explicitly set bookId represents deliberate user intent, and silently resolving via ISBN could sync progress to a different book than intended.

Applied to files:

backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java
backend/src/main/java/org/booklore/service/metadata/writer/PdfMetadataWriter.java

🔇 Additional comments (21)

backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java (2)

26-32: Good refactor to precompile and reuse the CFI cleanup regex.

This is a clean performance/readability win, and the static Pattern constant is correctly scoped.

194-199: Normalization behavior is preserved while eliminating repeated regex compilation.

The null guard remains intact and using CFI_CLEANER_PATTERN.matcher(cfi).replaceFirst("!") keeps semantics consistent.

backend/src/main/java/org/booklore/service/metadata/writer/PdfMetadataWriter.java (4)

26-26: Good regex precompilation for repeated usage paths.

Moving these expressions to static Pattern constants reduces repeated compilation and keeps intent clear.

Also applies to: 33-36

197-199: Author normalization update is clean and behavior-preserving.

Using WHITESPACE_PATTERN.matcher(...).replaceAll(" ").trim() is a solid replacement for inline regex here.

414-415: Series number formatting refactor looks correct.

The new matcher-based cleanup for trailing zeros and trailing dot keeps output semantics intact while improving efficiency.

435-437: Goodreads numeric-part validation is correctly migrated.

NUMERIC_PATTERN.matcher(numericPart).matches() is equivalent to the prior string regex check and improves reuse.

backend/src/main/java/org/booklore/service/metadata/parser/GoogleParser.java (4)

44-47: Good refactor to precompiled regex constants.

Line 44–Line 47 cleanly centralize regex patterns and avoid repeated compilation in hot paths.

415-418: Category split change is solid.

Line 415 keeps behavior intact while reusing the shared pattern; nice cleanup.

478-481: URL normalization refactor looks correct.

Line 478 and Line 481 are straightforward substitutions to shared matchers with unchanged intent.

547-549: Date parsing update is clean and equivalent.

Line 547 keeps the full-date gate explicit and readable by reusing the precompiled pattern.

backend/src/main/java/org/booklore/service/metadata/extractor/PdfMetadataExtractor.java (2)

38-38: LGTM! Clean pattern declaration.

The precompiled YEAR_PATTERN follows the established pattern in this file and improves performance by avoiding repeated regex compilation during date parsing.

555-555: LGTM! Correct usage of precompiled pattern.

The refactored code correctly uses the precompiled YEAR_PATTERN, maintaining the same behavior while improving performance by avoiding repeated regex compilation.

backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java (1)

43-43: Excellent refactoring—precompiled pattern improves both performance and clarity.

The introduction of the CALIBRE_PREFIX_PATTERN constant and its use at Line 1116 is a solid improvement over the inline regex string. By precompiling the pattern once as a static constant, the regex engine no longer needs to recompile it on each invocation of cleanupCalibreArtifacts, reducing overhead. The constant name is also self-documenting, making the intent clear to future maintainers.

Also applies to: 58-58, 1116-1116

backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java (2)

123-131: LGTM — GTIN normalization and validation are correct.

Stripping [- ] then matching \d{13} with Pattern.matches() (anchored full match) is the right pattern, and logging the original gtin on rejection preserves useful context.

286-292: LGTM — BookLore tag stripping behavior preserved.

BOOKLORE_TAG_PATTERN consumes the full line up to \n or end-of-string, so CRLF endings are handled correctly (the trailing \r is consumed by [^\n]*). The fallback to cleaned notes only when no Summary/Description exists is sound.

backend/src/main/java/org/booklore/config/security/oidc/OidcAuthService.java (1)

42-42: LGTM — equivalent semantics, less recompilation.

TRAILING_SLASH_PATTERN.matcher(origin).replaceAll("") is a faithful replacement for the previous inline replaceAll("/+$", ""), and origin equality comparison in validateRedirectUri continues to work as before.

Also applies to: 159-159

backend/src/main/java/org/booklore/service/book/BookDownloadService.java (1)

46-47: LGTM — both regex sites are behavior-preserving.

ASCII_ONLY_PATTERN.matcher(filename).matches() is equivalent to the prior filename.matches("\\p{ASCII}*") (both require a full match), so the US_ASCII vs UTF-8 charset selection is unchanged.

NON_ALPHANUMERIC_PATTERN.matcher(bookTitle).replaceAll("_") faithfully reproduces the previous replaceAll("[^a-zA-Z0-9\\-_]", "_") for the ZIP filename sanitization.

Promoting both regexes to precompiled constants avoids repeated compilation on hot download paths.

Also applies to: 143-144, 188-188

backend/src/main/java/org/booklore/service/kobo/KoboEpubUtils.java (1)

14-15: LGTM — anchor and leading-slash regexes preserved.

ANCHOR_PATTERN.matcher(href).replaceFirst("") and LEADING_SLASH_PATTERN.matcher(decodedHref).replaceFirst("") are direct equivalents of the prior inline replaceFirst calls. Using replaceFirst is correct here since both patterns are anchored / single-occurrence.

Minor nit (no change needed): the trailing $ in "#.*$" is redundant given the greedy .*, but it preserves the original literal exactly.

Also applies to: 30-31, 46-46

backend/src/test/java/org/booklore/service/metadata/writer/CbxComicInfoComplianceTest.java (1)

26-26: LGTM! Clean refactor with minor performance benefit.

The change correctly replaces the inline regex with a precompiled Pattern constant:

Behavioral equivalence is maintained (String.matches(regex) → Pattern.matcher(str).matches())

Avoids recompiling the regex on each test execution

The descriptive constant name EMPTY_LINES_PATTERN improves code clarity

Aligns well with the PR's goal of standardizing on precompiled patterns

Also applies to: 37-37, 122-122

backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java (1)

201-220: LGTM — semantics preserved.

parseChannels and extractChapterTitle behavior is unchanged; the precompiled patterns are equivalent to the previous inline regexes and avoid recompilation on every call. Thread-safety is fine since Pattern is immutable and a new Matcher is created per invocation.

Also applies to: 326-338

backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java (1)

383-392: LGTM.

Refactor is semantically equivalent to the prior String.replaceAll("[^0-9]", ""). (Aside: this parseChannels doesn't handle "5.1"/"surround" like its counterpart in AudiobookMetadataExtractor — pre-existing and out of scope here, but worth aligning eventually.)

…improved clarity and performance (grimmory-tools#869)

refactor(regex): replace string manipulation with regex patterns for …

cfc8b80

…improved clarity and performance

Merge branch 'develop' into pattern-variables

1c015ae

# Conflicts: # backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java

coderabbitai Bot requested a review from zachyale April 25, 2026 16:06

balazs-szucs added 2 commits April 25, 2026 18:06

refactor(EpubCfiService): simplify CFI patterns by removing unused re…

bf66e2c

…gex definitions

refactor(PdfMetadataWriter): enhance decimal formatting by introducin…

e7c4cc9

…g trailing zeros pattern

coderabbitai Bot added backend enhancement labels Apr 25, 2026

coderabbitai Bot reviewed Apr 25, 2026

View reviewed changes

Comment thread backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java Outdated

balazs-szucs added 2 commits April 26, 2026 22:06

Merge branch 'develop' into pattern-variables

f27e272

Merge branch 'develop' into pattern-variables

2ce740a

imnotjames approved these changes Apr 30, 2026

View reviewed changes

Comment thread backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java Outdated

refactor(AudioMetadataService): adjust formatting for consistency

d61ea8f

coderabbitai Bot requested a review from imajes April 30, 2026 06:25

balazs-szucs enabled auto-merge (squash) April 30, 2026 06:26

balazs-szucs merged commit b22fa38 into grimmory-tools:develop Apr 30, 2026
14 of 15 checks passed

coderabbitai Bot mentioned this pull request May 4, 2026

perf(api): initialize StringBuilder with specified capacities for performance #1087

Closed

7 tasks

dsmouse pushed a commit to dsmouse/grimmory that referenced this pull request May 11, 2026

refactor(regex): replace string manipulation with regex patterns for …

e703e04

…improved clarity and performance (grimmory-tools#869)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(regex): replace string manipulation with regex patterns for improved clarity and performance#869

refactor(regex): replace string manipulation with regex patterns for improved clarity and performance#869
balazs-szucs merged 7 commits into
grimmory-tools:developfrom
balazs-szucs:pattern-variables

balazs-szucs commented Apr 25, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 25, 2026 •

edited

Loading

Review failed

Grimmory Internal Tools

Metadata Standards and Compliance

General Java and Spring rules

Pre-merge checks failed

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

balazs-szucs commented Apr 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Grimmory Internal Tools

Metadata Standards and Compliance

General Java and Spring rules

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks failed

❌ Failed checks (1 error, 1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

balazs-szucs commented Apr 25, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 25, 2026 •

edited

Loading