Skip to content

refactor(regex): replace string manipulation with regex patterns for improved clarity and performance#869

Merged
balazs-szucs merged 7 commits into
grimmory-tools:developfrom
balazs-szucs:pattern-variables
Apr 30, 2026
Merged

refactor(regex): replace string manipulation with regex patterns for improved clarity and performance#869
balazs-szucs merged 7 commits into
grimmory-tools:developfrom
balazs-szucs:pattern-variables

Conversation

@balazs-szucs

@balazs-szucs balazs-szucs commented Apr 25, 2026

Copy link
Copy Markdown
Contributor

Description

Linked Issue: Fixes #

Changes

Summary by CodeRabbit

  • Refactor
    • Centralized and reused regular-expression handling across backend features (CORS/origin handling, auth origin resolution, filenames/ZIP sanitization, href normalization, metadata extraction/parsing/writing, audiobook/audio parsing, EPUB CFI cleanup).
    • No user-facing behavior changes; improves performance, consistency and maintainability.
  • Tests
    • Updated internal test regex handling to match the refactor.

@coderabbitai

coderabbitai Bot commented Apr 25, 2026

Copy link
Copy Markdown
Contributor

Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6f6e1356-96ff-4ab1-9021-71bda7ced036

📥 Commits

Reviewing files that changed from the base of the PR and between 2ce740a and d61ea8f.

📒 Files selected for processing (1)
  • backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Suite / Frontend Tests
  • GitHub Check: Test Suite / Backend Tests
  • GitHub Check: Analyze (java-kotlin)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Frontend Lint Threshold Check
🧰 Additional context used
📓 Path-based instructions (3)
backend/src/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

backend/src/**/*.java: Use 4-space indentation and match surrounding Java style in backend code
Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce @Autowired field injection in backend code
Use MapStruct for entity/DTO mapping in backend code
Keep JPA entities on the *Entity suffix in backend code

Files:

  • backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
**/*

⚙️ CodeRabbit configuration file

**/*: This project is being developed using current and future-facing technologies:

  • Java 25 with --enable-preview (preview features are INTENTIONAL and encouraged)
  • Spring Boot 4 (latest major version, check APIs accordingly)
  • Jackson 3 (new package: tools.jackson.* instead of com.fasterxml.jackson.*)
  • Hibernate 7.3.x (Jakarta Persistence 3.2, new APIs; avoid deprecated Hibernate 5/6 patterns)
  • Angular 21 (signals-based reactivity, no NgModules unless legacy)

Grimmory Internal Tools

Metadata Standards and Compliance

  • For all metadata writing and parsing logic, double-check against Dublin Core and ANSI standards to ensure perfect official compliance.
  • We strictly follow the widespread and official XML-compliant methods for EPUB2, EPUB3, CBX, and PDF formats.

General Java and Spring rules

  • ALWAYS prefer modern, idiomatic Java 25 constructs over legacy patterns.
  • Preview features (--enable-preview) are enabled and intentional; do NOT flag them as risky unless there is a concrete runtime issue.
  • Prefer: records, sealed classes/interfaces, pattern matching (switch expressions, instanceof), structured concurrency (StructuredTaskScope), scoped values, string templates, unnamed patterns/variables.
  • Prefer virtual threads (Thread.ofVirtual(), Executors.newVirtualThreadPerTaskExecutor()) over platform threads for I/O-bound work.
  • Prefer the new Sequenced Collections API (SequencedCollection, SequencedMap) where applicable.
  • Prefer var for local variables when the type is obvious from context.
  • Use stream().toList() instead of stream().collect(Collectors.toList()) for imm...

Files:

  • backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
**/service/**/*.java

⚙️ CodeRabbit configuration file

**/service/**/*.java: Spring Framework 7 service layer review:

  • Flag missing @Transactional on methods that perform multiple writes.
  • Prefer constructor injection over @Autowired field injection. Use @AllArgsConstructor.
  • Use ApiError enum for throwing exceptions.
  • Flag checked exceptions swallowed silently; must log or rethrow.
  • Flag Thread.sleep(); prefer Duration-based overloads or ScheduledExecutorService.
  • Prefer virtual threads (Thread.ofVirtual()) for I/O-bound operations.
  • Flag mutable shared state in singleton beans.

Files:

  • backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
🧠 Learnings (4)
📚 Learning: 2026-04-22T01:56:39.495Z
Learnt from: CR
Repo: grimmory-tools/grimmory PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-22T01:56:39.495Z
Learning: Applies to backend/src/**/*.java : Use 4-space indentation and match surrounding Java style in backend code

Applied to files:

  • backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
📚 Learning: 2026-04-14T22:11:02.436Z
Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 513
File: booklore-api/src/main/java/org/booklore/service/reader/AudiobookReaderService.java:38-39
Timestamp: 2026-04-14T22:11:02.436Z
Learning: In `grimmory-tools/grimmory`, for the `AudiobookReaderService` pattern of wrapping a DB lookup + file I/O under a single `Transactional` boundary, the preferred refactoring approach is **Option B**: extract the `BookFileEntity` lookup into a dedicated `Service` bean (e.g., `AudiobookEntityLoader`) annotated with `Transactional(readOnly = true)`, so that the proxy always narrows the transaction to the DB access only and all filesystem I/O runs outside the persistence context. Do NOT use `TransactionTemplate` (Option A) for this pattern in this codebase.

Applied to files:

  • backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
📚 Learning: 2026-04-10T08:15:37.436Z
Learnt from: imnotjames
Repo: grimmory-tools/grimmory PR: 449
File: booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java:139-145
Timestamp: 2026-04-10T08:15:37.436Z
Learning: When using Spring `ContentDisposition.builder(...).filename(name, StandardCharsets.UTF_8).build()` (i.e., explicitly providing UTF-8), the resulting header value should include both the quoted `filename="=?UTF-8?..."` and the RFC 5987 `filename*=` parameters. In this case, any extra ASCII fallback computation (e.g., deriving an ASCII `fallbackFilename` via `NON_ASCII_PATTERN` and calling `.filename(fallbackFilename)`) is likely redundant—prefer calling only `.filename(fallbackName?, StandardCharsets.UTF_8)` as appropriate and let Spring handle the UTF-8 header parameters. Verify by comparing the emitted header for `filename` and `filename*` before deciding to keep an ASCII fallback.

Applied to files:

  • backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
📚 Learning: 2026-04-14T12:43:08.698Z
Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 502
File: booklore-api/src/main/java/org/booklore/service/reader/ChapterCacheService.java:0-0
Timestamp: 2026-04-14T12:43:08.698Z
Learning: For this codebase (booklore-api), target Java 25 with `--enable-preview`, so `_` is intentionally used as an unnamed/ignored variable (e.g., lambda parameter or pattern variable) per Java’s preview feature JEP 456. Do not flag `_` in those contexts as an invalid/reserved identifier; only flag it if it’s used in a non-supported position (e.g., where an unnamed variable is not applicable for the Java preview rules).

Applied to files:

  • backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
🔇 Additional comments (2)
backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java (2)

36-39: Field indentation inconsistency already flagged.

The indentation mismatch between lines 36-37 (4 spaces) and lines 38-39 (2 spaces) has already been raised by previous reviewers.


388-388: Effective refactoring with measurable performance benefit.

The change from inline String.replaceAll("[^0-9]", "") to the precompiled NON_DIGIT_PATTERN.matcher(channels).replaceAll("") is correct and beneficial. Since parseChannels is called within the track-processing loop (line 160) for folder-based audiobooks, precompiling the pattern avoids repeated regex compilation when processing multiple audio files.


Walkthrough

Replaced numerous inline regular-expression usages with precompiled private static java.util.regex.Pattern constants across multiple backend Java classes; no public APIs or control flow logic were changed.

Changes

Cohort / File(s) Summary
Configuration
backend/src/main/java/org/booklore/config/WebSocketConfig.java, backend/src/main/java/org/booklore/config/security/oidc/OidcAuthService.java
Introduce precompiled Patterns for origin splitting and trailing-slash stripping; replace inline regex calls.
Book & Download
backend/src/main/java/org/booklore/service/book/BookDownloadService.java
Add ASCII-only and non-alphanumeric Patterns for filename charset detection and ZIP filename sanitization; swap inline matches/replacements for matcher-based calls.
Kobo / EPUB / Reader
backend/src/main/java/org/booklore/service/kobo/KoboEpubUtils.java, backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java, backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
Precompile patterns for href fragment/leading-slash removal, CFI cleanup, and non-digit stripping; replace inline replaceFirst/replaceAll and replaceAll-based channel parsing.
Metadata Extractors
backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java, backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java, backend/src/main/java/org/booklore/service/metadata/extractor/PdfMetadataExtractor.java
Add compiled Patterns for chapter-ID detection, digit stripping, ISBN cleaning/validation, and year detection; convert inline regex usage to Pattern constants.
Metadata Parsers & Writers
backend/src/main/java/org/booklore/service/metadata/parser/GoogleParser.java, backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java, backend/src/main/java/org/booklore/service/metadata/writer/PdfMetadataWriter.java
Extract category-splitting, cover-image URL cleanup, calibre prefix removal, whitespace and numeric normalization into static Pattern constants; replace inline regex calls.
Tests
backend/src/test/java/org/booklore/service/metadata/writer/CbxComicInfoComplianceTest.java
Refactor test regex to use a precompiled Pattern for empty-line/format validation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • zachyale
  • imajes

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
Description check ❌ Error The description uses the required template structure but lacks implementation details in the Changes section, leaving it incomplete beyond the template headings. Provide meaningful content in the Changes section describing the refactoring scope, affected files, and the rationale for using precompiled patterns.
Docstring Coverage ⚠️ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title follows conventional commit format with 'refactor(regex)' prefix and clearly describes the main refactoring objective across all changed files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

# Conflicts:
#	backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java
@coderabbitai coderabbitai Bot requested a review from zachyale April 25, 2026 16:06

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java (1)

35-42: Group new constants outside the “URL Patterns” section.

BOOKLORE_TAG_PATTERN, ISBN13_PATTERN, and ISBN_CLEANER_PATTERN are placed under the // URL Patterns comment but aren't URL patterns. Consider moving them above the comment (next to the other general patterns) or adding their own header for clarity.

♻️ Suggested grouping
     private static final Pattern LEADING_ZEROS_PATTERN = Pattern.compile("^0+");
     private static final Pattern COMMA_SEMICOLON_PATTERN = Pattern.compile("[,;]");
     private static final Pattern BOOKLORE_NOTE_PATTERN = Pattern.compile("\\[BookLore:([^\\]]+)\\]\\s*(.*)");
+    private static final Pattern BOOKLORE_TAG_PATTERN = Pattern.compile("\\[BookLore:[^\\]]+\\][^\\n]*(\n|$)");
     private static final Pattern WEB_SPLIT_PATTERN = Pattern.compile("[,;\\s]+");
+    private static final Pattern ISBN13_PATTERN = Pattern.compile("\\d{13}");
+    private static final Pattern ISBN_CLEANER_PATTERN = Pattern.compile("[- ]");

     // URL Patterns
     private static final Pattern GOODREADS_URL_PATTERN = Pattern.compile("goodreads\\.com/book/show/(\\d+)(?:-[\\w-]+)?");
     private static final Pattern AMAZON_URL_PATTERN = Pattern.compile("amazon\\.com/dp/([A-Z0-9]{10})");
     private static final Pattern COMICVINE_URL_PATTERN = Pattern.compile("comicvine\\.gamespot\\.com/issue/(?:[^/]+/)?([\\w-]+)");
     private static final Pattern HARDCOVER_URL_PATTERN = Pattern.compile("hardcover\\.app/books/([\\w-]+)");
-    private static final Pattern BOOKLORE_TAG_PATTERN = Pattern.compile("\\[BookLore:[^\\]]+\\][^\\n]*(\n|$)");
-    private static final Pattern ISBN13_PATTERN = Pattern.compile("\\d{13}");
-    private static final Pattern ISBN_CLEANER_PATTERN = Pattern.compile("[- ]");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java`
around lines 35 - 42, The constants BOOKLORE_TAG_PATTERN, ISBN13_PATTERN, and
ISBN_CLEANER_PATTERN are misgrouped under the URL patterns comment; move these
three fields out from under the "// URL Patterns" section and place them either
above that comment with the other general patterns or create a new
comment/header (e.g., "// Misc/Tag and ISBN Patterns") and place
BOOKLORE_TAG_PATTERN, ISBN13_PATTERN, and ISBN_CLEANER_PATTERN there so grouping
matches intent and improves readability in CbxMetadataExtractor.
backend/src/main/java/org/booklore/config/WebSocketConfig.java (1)

28-28: LGTM — precompiled split pattern is a faithful drop-in.

Pattern.compile("\\s*,\\s*") and Arrays.stream(COMMA_SPLIT_PATTERN.split(allowedOrigins)) preserve the prior String.split semantics exactly.

Optional micro-refactor: Pattern.splitAsStream avoids the intermediate array.

♻️ Optional simplification
-            String[] origins = Arrays.stream(COMMA_SPLIT_PATTERN.split(allowedOrigins))
-                    .filter(s -> !s.isEmpty())
-                    .toArray(String[]::new);
+            String[] origins = COMMA_SPLIT_PATTERN.splitAsStream(allowedOrigins)
+                    .filter(s -> !s.isEmpty())
+                    .toArray(String[]::new);

Also applies to: 72-74

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/src/main/java/org/booklore/config/WebSocketConfig.java` at line 28,
Replace uses of Arrays.stream(COMMA_SPLIT_PATTERN.split(allowedOrigins)) with
the Pattern API's streaming method to avoid the intermediate array: use
COMMA_SPLIT_PATTERN.splitAsStream(allowedOrigins) wherever you currently split
allowedOrigins (and the other occurrences referenced around lines 72-74). Update
the code paths that construct origin lists to consume the Stream directly (same
mapping/filtering/collect operations) so behavior remains identical while
eliminating the temporary array allocation.
backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java (1)

42-43: Consider consolidating the duplicated NON_DIGIT_PATTERN constant.

The exact same pattern Pattern.compile("[^0-9]") is declared in AudioMetadataService.java (line 36). Both classes also have very similar parseChannels logic. If further duplicates appear elsewhere in the refactor, consider promoting NON_DIGIT_PATTERN (and possibly parseChannels) to a shared utility (e.g., an AudioFileUtilityService helper or a regex-constants holder) to avoid diverging definitions over time. Not blocking for this PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java`
around lines 42 - 43, The NON_DIGIT_PATTERN regex is duplicated between
AudiobookMetadataExtractor and AudioMetadataService (and their parseChannels
logic is similar); extract and centralize this constant (and optionally the
parseChannels method) into a shared utility class (e.g., AudioFileUtils or
RegexConstants) and replace the local declarations in AudiobookMetadataExtractor
(remove NON_DIGIT_PATTERN and update parseChannels to call the shared utility)
and in AudioMetadataService to reference the centralized symbol, ensuring both
classes import and use the single source to prevent divergence.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java`:
- Around line 36-39: The four field declarations (NON_DIGIT_PATTERN,
audioFileUtility, audiobookMetadataExtractor, bookFileRepository) have
inconsistent indentation; update their leading spaces so they are all aligned
with 4-space indentation like the surrounding Java style, making the block
visually consistent (adjust indent of the two 2-space lines to 4 spaces so all
four fields align).

---

Nitpick comments:
In `@backend/src/main/java/org/booklore/config/WebSocketConfig.java`:
- Line 28: Replace uses of
Arrays.stream(COMMA_SPLIT_PATTERN.split(allowedOrigins)) with the Pattern API's
streaming method to avoid the intermediate array: use
COMMA_SPLIT_PATTERN.splitAsStream(allowedOrigins) wherever you currently split
allowedOrigins (and the other occurrences referenced around lines 72-74). Update
the code paths that construct origin lists to consume the Stream directly (same
mapping/filtering/collect operations) so behavior remains identical while
eliminating the temporary array allocation.

In
`@backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java`:
- Around line 42-43: The NON_DIGIT_PATTERN regex is duplicated between
AudiobookMetadataExtractor and AudioMetadataService (and their parseChannels
logic is similar); extract and centralize this constant (and optionally the
parseChannels method) into a shared utility class (e.g., AudioFileUtils or
RegexConstants) and replace the local declarations in AudiobookMetadataExtractor
(remove NON_DIGIT_PATTERN and update parseChannels to call the shared utility)
and in AudioMetadataService to reference the centralized symbol, ensuring both
classes import and use the single source to prevent divergence.

In
`@backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java`:
- Around line 35-42: The constants BOOKLORE_TAG_PATTERN, ISBN13_PATTERN, and
ISBN_CLEANER_PATTERN are misgrouped under the URL patterns comment; move these
three fields out from under the "// URL Patterns" section and place them either
above that comment with the other general patterns or create a new
comment/header (e.g., "// Misc/Tag and ISBN Patterns") and place
BOOKLORE_TAG_PATTERN, ISBN13_PATTERN, and ISBN_CLEANER_PATTERN there so grouping
matches intent and improves readability in CbxMetadataExtractor.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 7baf2fdc-5798-4d98-a6e9-c791603074f4

📥 Commits

Reviewing files that changed from the base of the PR and between 3fda81b and e7c4cc9.

📒 Files selected for processing (13)
  • backend/src/main/java/org/booklore/config/WebSocketConfig.java
  • backend/src/main/java/org/booklore/config/security/oidc/OidcAuthService.java
  • backend/src/main/java/org/booklore/service/book/BookDownloadService.java
  • backend/src/main/java/org/booklore/service/kobo/KoboEpubUtils.java
  • backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java
  • backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java
  • backend/src/main/java/org/booklore/service/metadata/extractor/PdfMetadataExtractor.java
  • backend/src/main/java/org/booklore/service/metadata/parser/GoogleParser.java
  • backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java
  • backend/src/main/java/org/booklore/service/metadata/writer/PdfMetadataWriter.java
  • backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
  • backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java
  • backend/src/test/java/org/booklore/service/metadata/writer/CbxComicInfoComplianceTest.java
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Analyze (java-kotlin)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Test Suite / Backend Tests
  • GitHub Check: Test Suite / Frontend Tests
  • GitHub Check: Frontend Lint Threshold Check
🧰 Additional context used
📓 Path-based instructions (2)
backend/src/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

backend/src/**/*.java: Use 4-space indentation and match surrounding Java style in backend code
Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce @Autowired field injection in backend code
Use MapStruct for entity/DTO mapping in backend code
Keep JPA entities on the *Entity suffix in backend code

Files:

  • backend/src/main/java/org/booklore/service/metadata/extractor/PdfMetadataExtractor.java
  • backend/src/test/java/org/booklore/service/metadata/writer/CbxComicInfoComplianceTest.java
  • backend/src/main/java/org/booklore/config/security/oidc/OidcAuthService.java
  • backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java
  • backend/src/main/java/org/booklore/config/WebSocketConfig.java
  • backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java
  • backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java
  • backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
  • backend/src/main/java/org/booklore/service/kobo/KoboEpubUtils.java
  • backend/src/main/java/org/booklore/service/book/BookDownloadService.java
  • backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java
  • backend/src/main/java/org/booklore/service/metadata/parser/GoogleParser.java
  • backend/src/main/java/org/booklore/service/metadata/writer/PdfMetadataWriter.java
backend/src/test/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

Prefer focused unit tests; use @SpringBootTest only when the Spring context is required in backend code

Files:

  • backend/src/test/java/org/booklore/service/metadata/writer/CbxComicInfoComplianceTest.java
🧠 Learnings (6)
📚 Learning: 2026-04-10T08:15:37.436Z
Learnt from: imnotjames
Repo: grimmory-tools/grimmory PR: 449
File: booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java:139-145
Timestamp: 2026-04-10T08:15:37.436Z
Learning: When using Spring `ContentDisposition.builder(...).filename(name, StandardCharsets.UTF_8).build()` (i.e., explicitly providing UTF-8), the resulting header value should include both the quoted `filename="=?UTF-8?..."` and the RFC 5987 `filename*=` parameters. In this case, any extra ASCII fallback computation (e.g., deriving an ASCII `fallbackFilename` via `NON_ASCII_PATTERN` and calling `.filename(fallbackFilename)`) is likely redundant—prefer calling only `.filename(fallbackName?, StandardCharsets.UTF_8)` as appropriate and let Spring handle the UTF-8 header parameters. Verify by comparing the emitted header for `filename` and `filename*` before deciding to keep an ASCII fallback.

Applied to files:

  • backend/src/main/java/org/booklore/service/metadata/extractor/PdfMetadataExtractor.java
  • backend/src/test/java/org/booklore/service/metadata/writer/CbxComicInfoComplianceTest.java
  • backend/src/main/java/org/booklore/config/security/oidc/OidcAuthService.java
  • backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java
  • backend/src/main/java/org/booklore/config/WebSocketConfig.java
  • backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java
  • backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java
  • backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
  • backend/src/main/java/org/booklore/service/kobo/KoboEpubUtils.java
  • backend/src/main/java/org/booklore/service/book/BookDownloadService.java
  • backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java
  • backend/src/main/java/org/booklore/service/metadata/parser/GoogleParser.java
  • backend/src/main/java/org/booklore/service/metadata/writer/PdfMetadataWriter.java
📚 Learning: 2026-04-14T12:43:08.698Z
Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 502
File: booklore-api/src/main/java/org/booklore/service/reader/ChapterCacheService.java:0-0
Timestamp: 2026-04-14T12:43:08.698Z
Learning: For this codebase (booklore-api), target Java 25 with `--enable-preview`, so `_` is intentionally used as an unnamed/ignored variable (e.g., lambda parameter or pattern variable) per Java’s preview feature JEP 456. Do not flag `_` in those contexts as an invalid/reserved identifier; only flag it if it’s used in a non-supported position (e.g., where an unnamed variable is not applicable for the Java preview rules).

Applied to files:

  • backend/src/main/java/org/booklore/service/metadata/extractor/PdfMetadataExtractor.java
  • backend/src/test/java/org/booklore/service/metadata/writer/CbxComicInfoComplianceTest.java
  • backend/src/main/java/org/booklore/config/security/oidc/OidcAuthService.java
  • backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java
  • backend/src/main/java/org/booklore/config/WebSocketConfig.java
  • backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java
  • backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java
  • backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
  • backend/src/main/java/org/booklore/service/kobo/KoboEpubUtils.java
  • backend/src/main/java/org/booklore/service/book/BookDownloadService.java
  • backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java
  • backend/src/main/java/org/booklore/service/metadata/parser/GoogleParser.java
  • backend/src/main/java/org/booklore/service/metadata/writer/PdfMetadataWriter.java
📚 Learning: 2026-04-02T09:25:37.417Z
Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 334
File: booklore-api/src/main/java/org/booklore/service/reader/EpubReaderService.java:402-407
Timestamp: 2026-04-02T09:25:37.417Z
Learning: When reviewing epub-related file processing code in this project, remember that grimmory-tools/grimmory uses the `org.grimmory.epub4j` library (from grimmory-tools/epub4j) and that it is a grimmory-maintained fork fundamentally different from `io.documentnode.epub4j`. Do not assume the same API behavior, limitations, or semantics as `io.documentnode.epub4j`. Before flagging issues or proposing changes, verify the current API/capabilities in the grimmory-tools/epub4j repository.

Applied to files:

  • backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java
  • backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java
  • backend/src/main/java/org/booklore/service/kobo/KoboEpubUtils.java
📚 Learning: 2026-04-02T09:12:50.790Z
Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 334
File: booklore-api/build.gradle.kts:0-0
Timestamp: 2026-04-02T09:12:50.790Z
Learning: In `grimmory-tools/grimmory`, nightcompress (`com.github.gotson.nightcompress:nightcompress`) is the preferred library for archive/unarchiving operations (including RAR and other formats). Flag any PR changes that remove or replace nightcompress with an alternative (e.g., junrar, commons-compress, JNI-based libs, or standard Java ZipInputStream/ZipFile) for unarchiving purposes. Nightcompress is considered better than JNI Java alternatives and generally better than standard Java unzip. Exceptions are acceptable when nightcompress cannot handle a specific format, but such trade-offs should be made explicit and confirmed by the maintainer.

Applied to files:

  • backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java
📚 Learning: 2026-04-14T22:11:02.436Z
Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 513
File: booklore-api/src/main/java/org/booklore/service/reader/AudiobookReaderService.java:38-39
Timestamp: 2026-04-14T22:11:02.436Z
Learning: In `grimmory-tools/grimmory`, for the `AudiobookReaderService` pattern of wrapping a DB lookup + file I/O under a single `Transactional` boundary, the preferred refactoring approach is **Option B**: extract the `BookFileEntity` lookup into a dedicated `Service` bean (e.g., `AudiobookEntityLoader`) annotated with `Transactional(readOnly = true)`, so that the proxy always narrows the transaction to the DB access only and all filesystem I/O runs outside the persistence context. Do NOT use `TransactionTemplate` (Option A) for this pattern in this codebase.

Applied to files:

  • backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java
📚 Learning: 2026-03-24T18:46:47.249Z
Learnt from: pedronave
Repo: grimmory-tools/grimmory PR: 173
File: booklore-api/src/main/java/org/booklore/service/hardcover/HardcoverSyncService.java:188-199
Timestamp: 2026-03-24T18:46:47.249Z
Learning: In grimmory-tools/grimmory (HardcoverSyncService.java), when a `hardcoverBookId` is stored in book metadata but is malformed (non-numeric), the preferred behavior is to return null and skip the sync rather than falling back to ISBN-based lookup. The reasoning is that an explicitly set bookId represents deliberate user intent, and silently resolving via ISBN could sync progress to a different book than intended.

Applied to files:

  • backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java
  • backend/src/main/java/org/booklore/service/metadata/writer/PdfMetadataWriter.java
🔇 Additional comments (21)
backend/src/main/java/org/booklore/util/koreader/EpubCfiService.java (2)

26-32: Good refactor to precompile and reuse the CFI cleanup regex.

This is a clean performance/readability win, and the static Pattern constant is correctly scoped.


194-199: Normalization behavior is preserved while eliminating repeated regex compilation.

The null guard remains intact and using CFI_CLEANER_PATTERN.matcher(cfi).replaceFirst("!") keeps semantics consistent.

backend/src/main/java/org/booklore/service/metadata/writer/PdfMetadataWriter.java (4)

26-26: Good regex precompilation for repeated usage paths.

Moving these expressions to static Pattern constants reduces repeated compilation and keeps intent clear.

Also applies to: 33-36


197-199: Author normalization update is clean and behavior-preserving.

Using WHITESPACE_PATTERN.matcher(...).replaceAll(" ").trim() is a solid replacement for inline regex here.


414-415: Series number formatting refactor looks correct.

The new matcher-based cleanup for trailing zeros and trailing dot keeps output semantics intact while improving efficiency.


435-437: Goodreads numeric-part validation is correctly migrated.

NUMERIC_PATTERN.matcher(numericPart).matches() is equivalent to the prior string regex check and improves reuse.

backend/src/main/java/org/booklore/service/metadata/parser/GoogleParser.java (4)

44-47: Good refactor to precompiled regex constants.

Line 44–Line 47 cleanly centralize regex patterns and avoid repeated compilation in hot paths.


415-418: Category split change is solid.

Line 415 keeps behavior intact while reusing the shared pattern; nice cleanup.


478-481: URL normalization refactor looks correct.

Line 478 and Line 481 are straightforward substitutions to shared matchers with unchanged intent.


547-549: Date parsing update is clean and equivalent.

Line 547 keeps the full-date gate explicit and readable by reusing the precompiled pattern.

backend/src/main/java/org/booklore/service/metadata/extractor/PdfMetadataExtractor.java (2)

38-38: LGTM! Clean pattern declaration.

The precompiled YEAR_PATTERN follows the established pattern in this file and improves performance by avoiding repeated regex compilation during date parsing.


555-555: LGTM! Correct usage of precompiled pattern.

The refactored code correctly uses the precompiled YEAR_PATTERN, maintaining the same behavior while improving performance by avoiding repeated regex compilation.

backend/src/main/java/org/booklore/service/metadata/writer/EpubMetadataWriter.java (1)

43-43: Excellent refactoring—precompiled pattern improves both performance and clarity.

The introduction of the CALIBRE_PREFIX_PATTERN constant and its use at Line 1116 is a solid improvement over the inline regex string. By precompiling the pattern once as a static constant, the regex engine no longer needs to recompile it on each invocation of cleanupCalibreArtifacts, reducing overhead. The constant name is also self-documenting, making the intent clear to future maintainers.

Also applies to: 58-58, 1116-1116

backend/src/main/java/org/booklore/service/metadata/extractor/CbxMetadataExtractor.java (2)

123-131: LGTM — GTIN normalization and validation are correct.

Stripping [- ] then matching \d{13} with Pattern.matches() (anchored full match) is the right pattern, and logging the original gtin on rejection preserves useful context.


286-292: LGTM — BookLore tag stripping behavior preserved.

BOOKLORE_TAG_PATTERN consumes the full line up to \n or end-of-string, so CRLF endings are handled correctly (the trailing \r is consumed by [^\n]*). The fallback to cleaned notes only when no Summary/Description exists is sound.

backend/src/main/java/org/booklore/config/security/oidc/OidcAuthService.java (1)

42-42: LGTM — equivalent semantics, less recompilation.

TRAILING_SLASH_PATTERN.matcher(origin).replaceAll("") is a faithful replacement for the previous inline replaceAll("/+$", ""), and origin equality comparison in validateRedirectUri continues to work as before.

Also applies to: 159-159

backend/src/main/java/org/booklore/service/book/BookDownloadService.java (1)

46-47: LGTM — both regex sites are behavior-preserving.

  • ASCII_ONLY_PATTERN.matcher(filename).matches() is equivalent to the prior filename.matches("\\p{ASCII}*") (both require a full match), so the US_ASCII vs UTF-8 charset selection is unchanged.
  • NON_ALPHANUMERIC_PATTERN.matcher(bookTitle).replaceAll("_") faithfully reproduces the previous replaceAll("[^a-zA-Z0-9\\-_]", "_") for the ZIP filename sanitization.

Promoting both regexes to precompiled constants avoids repeated compilation on hot download paths.

Also applies to: 143-144, 188-188

backend/src/main/java/org/booklore/service/kobo/KoboEpubUtils.java (1)

14-15: LGTM — anchor and leading-slash regexes preserved.

ANCHOR_PATTERN.matcher(href).replaceFirst("") and LEADING_SLASH_PATTERN.matcher(decodedHref).replaceFirst("") are direct equivalents of the prior inline replaceFirst calls. Using replaceFirst is correct here since both patterns are anchored / single-occurrence.

Minor nit (no change needed): the trailing $ in "#.*$" is redundant given the greedy .*, but it preserves the original literal exactly.

Also applies to: 30-31, 46-46

backend/src/test/java/org/booklore/service/metadata/writer/CbxComicInfoComplianceTest.java (1)

26-26: LGTM! Clean refactor with minor performance benefit.

The change correctly replaces the inline regex with a precompiled Pattern constant:

  • Behavioral equivalence is maintained (String.matches(regex)Pattern.matcher(str).matches())
  • Avoids recompiling the regex on each test execution
  • The descriptive constant name EMPTY_LINES_PATTERN improves code clarity
  • Aligns well with the PR's goal of standardizing on precompiled patterns

Also applies to: 37-37, 122-122

backend/src/main/java/org/booklore/service/metadata/extractor/AudiobookMetadataExtractor.java (1)

201-220: LGTM — semantics preserved.

parseChannels and extractChapterTitle behavior is unchanged; the precompiled patterns are equivalent to the previous inline regexes and avoid recompilation on every call. Thread-safety is fine since Pattern is immutable and a new Matcher is created per invocation.

Also applies to: 326-338

backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java (1)

383-392: LGTM.

Refactor is semantically equivalent to the prior String.replaceAll("[^0-9]", ""). (Aside: this parseChannels doesn't handle "5.1"/"surround" like its counterpart in AudiobookMetadataExtractor — pre-existing and out of scope here, but worth aligning eventually.)

Comment thread backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java Outdated
Comment thread backend/src/main/java/org/booklore/service/reader/AudioMetadataService.java Outdated
@coderabbitai coderabbitai Bot requested a review from imajes April 30, 2026 06:25
@balazs-szucs balazs-szucs enabled auto-merge (squash) April 30, 2026 06:26
@balazs-szucs balazs-szucs merged commit b22fa38 into grimmory-tools:develop Apr 30, 2026
14 of 15 checks passed
dsmouse pushed a commit to dsmouse/grimmory that referenced this pull request May 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants