Skip to content

fix(bookdrop): clean bookdrop metadata before persisting#462

Merged
balazs-szucs merged 4 commits into
grimmory-tools:developfrom
imnotjames:fix/452/bookdrop-clean-metadata
Apr 11, 2026
Merged

fix(bookdrop): clean bookdrop metadata before persisting#462
balazs-szucs merged 4 commits into
grimmory-tools:developfrom
imnotjames:fix/452/bookdrop-clean-metadata

Conversation

@imnotjames

@imnotjames imnotjames commented Apr 11, 2026

Copy link
Copy Markdown
Contributor

Description

updates the bookdrop initial metadata to truncate values - similar to the FileProcessor - so that data suggested in bookdrop metadata are of valid lengths

Linked Issue: Fixes #452

Changes

  • adds a cleanInitialMetadata method for truncating initial

Summary by CodeRabbit

  • Bug Fixes
    • Metadata extraction now cleans and truncates long text and identifier fields (titles, descriptions, publishers, ISBNs, language, external IDs) and treats null extractor results gracefully so stored original metadata is consistent.
  • Tests
    • Added and updated tests to verify metadata truncation and correct serialization behavior during initial metadata handling.

@coderabbitai

coderabbitai Bot commented Apr 11, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Introduced metadata sanitization in BookdropMetadataService: extracted BookMetadata is passed through a new cleaner that clones and truncates several text and identifier fields (title, subtitle, description, publisher, seriesName, language, ASIN/ISBNs/external IDs) before serialization/persistence.

Changes

Cohort / File(s) Summary
Metadata Cleaning Logic
booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
Added cleanInitialMetadata(BookMetadata) that returns null for null input or a cloned BookMetadata with truncated text and identifier fields. Updated extractInitialMetadata(...) to use the cleaner; added static import for FileService.truncate.
Test Coverage
booklore-api/src/test/java/org/booklore/service/BookdropMetadataServiceTest.java
Relaxed a mock to use a generic matcher in attachInitialMetadata_shouldHandleNullCoverGracefully. Added attachInitialMetadata_shouldTruncateFields to capture serialized metadata and assert truncation of asin, isbn10, isbn13, and language before serialization.

Sequence Diagram(s)

sequenceDiagram
    participant Extractor as MetadataExtractor
    participant Service as BookdropMetadataService
    participant Ser as ObjectMapper
    participant DB as Persistence

    Extractor->>Service: extractMetadata(file)
    Service->>Service: cleanInitialMetadata(extracted)  %% truncates fields
    Service->>Ser: writeValueAsString(cleanedMetadata)
    Service->>DB: attachInitialMetadata(cleanedMetadata)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I found a long ASIN in a heap,
I nibbled and trimmed so the DB can sleep.
Titles and codes now fit quite neat,
I hop away pleased on tiny feet —
Metadata tidy, my work complete.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description check ✅ Passed The PR description includes both required template sections: a description explaining what the PR does and a linked issue reference (Fixes #452), though the Changes section is incomplete but present.
Linked Issues check ✅ Passed The code changes directly address issue #452 by implementing metadata truncation in BookdropMetadataService to prevent 'Data too long' database errors when persisting long ASIN and other metadata values.
Out of Scope Changes check ✅ Passed All changes are focused on the bookdrop metadata cleaning functionality required by issue #452; both the production code and test updates are directly related to truncating metadata fields.
Title check ✅ Passed The pull request title follows the conventional commit format with a type prefix (fix) and a clear, descriptive subject that accurately summarizes the main change of adding metadata cleaning/truncation before persistence.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@imnotjames

imnotjames commented Apr 11, 2026

Copy link
Copy Markdown
Contributor Author

Bookdrop metadata shows a UUID ASIN truncated:

image

@imnotjames imnotjames force-pushed the fix/452/bookdrop-clean-metadata branch from f3f522a to c801156 Compare April 11, 2026 01:35
@imnotjames imnotjames marked this pull request as ready for review April 11, 2026 01:37

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
booklore-api/src/test/java/org/booklore/service/BookdropMetadataServiceTest.java (1)

136-161: Consider extending truncation assertions to all cleaned ID fields

The new test is solid for ASIN/ISBN/language. Adding assertions for other cleaned external IDs (e.g., doubanId, audibleId) would better protect against metadata-loss regressions in the cleaner.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@booklore-api/src/test/java/org/booklore/service/BookdropMetadataServiceTest.java`
around lines 136 - 161, The test attachInitialMetadata_shouldTruncateFields
should also assert truncation/cleaning for other external ID fields to prevent
regressions: after capturing the BookMetadata via the existing
ArgumentCaptor<BookMetadata> (argument) and obtaining actual, add assertions for
fields like getDoubanId() and getAudibleId() (and any other cleaned ID getters
present on BookMetadata) to verify they match expected truncated/cleaned values
based on the input metadata used in metadataExtractorFactory.extractMetadata;
keep the same mocking flow (when(...).thenReturn(metadata)) and verify calls as
already implemented.
booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java (1)

136-179: Reduce drift risk by centralizing truncation limits

The cleaner hardcodes many max lengths inline. Consider extracting named constants (or reusing a shared sanitizer aligned with FileProcessor) to keep schema-limit updates consistent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java`
around lines 136 - 179, The builder in BookdropMetadataService uses many inline
numeric truncation limits (e.g., truncate(extracted.getTitle(), 1000),
truncate(..., 10), truncate(..., 5000)) which risks drift; refactor by
introducing descriptive constants (e.g., MAX_TITLE_LEN, MAX_SUBTITLE_LEN,
MAX_DESCRIPTION_LEN, MAX_ISBN_LEN, etc.) at the top of BookdropMetadataService
or by calling the shared sanitizer utilities used by FileProcessor, then replace
each numeric literal in the BookMetadata.builder() calls with the appropriate
constant or shared sanitizer method to centralize schema limits and keep them
consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java`:
- Around line 154-163: cleanInitialMetadata drops doubanId and audibleId from
the cleaned object, losing extracted identifiers; update the builder chain in
cleanInitialMetadata to include .doubanId(truncate(extracted.getDoubanId(),
100)) and .audibleId(truncate(extracted.getAudibleId(), 100)) (matching the
existing pattern that uses truncate and extracted getters) so those fields are
preserved during metadata cleaning.

---

Nitpick comments:
In
`@booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java`:
- Around line 136-179: The builder in BookdropMetadataService uses many inline
numeric truncation limits (e.g., truncate(extracted.getTitle(), 1000),
truncate(..., 10), truncate(..., 5000)) which risks drift; refactor by
introducing descriptive constants (e.g., MAX_TITLE_LEN, MAX_SUBTITLE_LEN,
MAX_DESCRIPTION_LEN, MAX_ISBN_LEN, etc.) at the top of BookdropMetadataService
or by calling the shared sanitizer utilities used by FileProcessor, then replace
each numeric literal in the BookMetadata.builder() calls with the appropriate
constant or shared sanitizer method to centralize schema limits and keep them
consistent.

In
`@booklore-api/src/test/java/org/booklore/service/BookdropMetadataServiceTest.java`:
- Around line 136-161: The test attachInitialMetadata_shouldTruncateFields
should also assert truncation/cleaning for other external ID fields to prevent
regressions: after capturing the BookMetadata via the existing
ArgumentCaptor<BookMetadata> (argument) and obtaining actual, add assertions for
fields like getDoubanId() and getAudibleId() (and any other cleaned ID getters
present on BookMetadata) to verify they match expected truncated/cleaned values
based on the input metadata used in metadataExtractorFactory.extractMetadata;
keep the same mocking flow (when(...).thenReturn(metadata)) and verify calls as
already implemented.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5978bad1-1ffc-4cc2-b428-6c0314cc7b35

📥 Commits

Reviewing files that changed from the base of the PR and between 66dcc44 and c801156.

📒 Files selected for processing (2)
  • booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
  • booklore-api/src/test/java/org/booklore/service/BookdropMetadataServiceTest.java
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test Suite / Backend Tests
🧰 Additional context used
📓 Path-based instructions (2)
booklore-api/src/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

booklore-api/src/**/*.java: Use 4-space indentation in Java files and match surrounding Java style
Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce @Autowired field injection
Use MapStruct for entity/DTO mapping

Files:

  • booklore-api/src/test/java/org/booklore/service/BookdropMetadataServiceTest.java
  • booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
booklore-api/src/test/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

Prefer focused unit tests; use @SpringBootTest only when the Spring context is required

Files:

  • booklore-api/src/test/java/org/booklore/service/BookdropMetadataServiceTest.java
🧠 Learnings (3)
📓 Common learnings
Learnt from: pedronave
Repo: grimmory-tools/grimmory PR: 173
File: booklore-api/src/main/java/org/booklore/service/hardcover/HardcoverSyncService.java:188-199
Timestamp: 2026-03-24T18:46:47.249Z
Learning: In grimmory-tools/grimmory (HardcoverSyncService.java), when a `hardcoverBookId` is stored in book metadata but is malformed (non-numeric), the preferred behavior is to return null and skip the sync rather than falling back to ISBN-based lookup. The reasoning is that an explicitly set bookId represents deliberate user intent, and silently resolving via ISBN could sync progress to a different book than intended.
Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 372
File: frontend/src/app/features/book/service/app-books-api.service.ts:150-178
Timestamp: 2026-04-04T14:03:28.295Z
Learning: In `frontend/src/app/features/book/service/app-books-api.service.ts`, the `summaryToBook` function intentionally uses `as unknown as Book` to cast a partial object to the `Book` type. The returned object omits some required `Book` fields (e.g. `metadata.bookId`, `pdfProgress.page`) and includes an extra `bookFiles: []` property not in the `Book` interface. This is by design — consumers of `AppBooksApiService.books()` (e.g. `BookCardComponent`) only access the subset of properties that are populated from `AppBookSummary`, so no runtime issues occur. Do not flag this as a type safety error.
📚 Learning: 2026-03-24T18:46:47.249Z
Learnt from: pedronave
Repo: grimmory-tools/grimmory PR: 173
File: booklore-api/src/main/java/org/booklore/service/hardcover/HardcoverSyncService.java:188-199
Timestamp: 2026-03-24T18:46:47.249Z
Learning: In grimmory-tools/grimmory (HardcoverSyncService.java), when a `hardcoverBookId` is stored in book metadata but is malformed (non-numeric), the preferred behavior is to return null and skip the sync rather than falling back to ISBN-based lookup. The reasoning is that an explicitly set bookId represents deliberate user intent, and silently resolving via ISBN could sync progress to a different book than intended.

Applied to files:

  • booklore-api/src/test/java/org/booklore/service/BookdropMetadataServiceTest.java
  • booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
📚 Learning: 2026-04-10T08:15:37.436Z
Learnt from: imnotjames
Repo: grimmory-tools/grimmory PR: 449
File: booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java:139-145
Timestamp: 2026-04-10T08:15:37.436Z
Learning: When using Spring `ContentDisposition.builder(...).filename(name, StandardCharsets.UTF_8).build()` (i.e., explicitly providing UTF-8), the resulting header value should include both the quoted `filename="=?UTF-8?..."` and the RFC 5987 `filename*=` parameters. In this case, any extra ASCII fallback computation (e.g., deriving an ASCII `fallbackFilename` via `NON_ASCII_PATTERN` and calling `.filename(fallbackFilename)`) is likely redundant—prefer calling only `.filename(fallbackName?, StandardCharsets.UTF_8)` as appropriate and let Spring handle the UTF-8 header parameters. Verify by comparing the emitted header for `filename` and `filename*` before deciding to keep an ASCII fallback.

Applied to files:

  • booklore-api/src/test/java/org/booklore/service/BookdropMetadataServiceTest.java
  • booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
🔇 Additional comments (2)
booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java (1)

186-186: Good sanitization hook point

Applying cleanup at Line 186 ensures originalMetadata is normalized before serialization/persistence.

booklore-api/src/test/java/org/booklore/service/BookdropMetadataServiceTest.java (1)

127-127: Nice improvement to mock stubbing robustness

Using any() here removes brittle coupling to a specific BookMetadata instance and keeps the test focused on behavior.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java (1)

136-165: Centralize max-length limits to avoid drift.

This block hardcodes many size limits. Consider moving them to shared constants (or a shared sanitizer) so Bookdrop and other ingestion paths stay consistent as schema/provider rules evolve.

Refactor sketch
+private static final int MAX_TITLE_LENGTH = 1000;
+private static final int MAX_SUBTITLE_LENGTH = 1000;
+private static final int MAX_DESCRIPTION_LENGTH = 5000;
+private static final int MAX_PUBLISHER_LENGTH = 1000;
+private static final int MAX_LANGUAGE_LENGTH = 10;
+private static final int MAX_ISBN10_LENGTH = 10;
+private static final int MAX_ISBN13_LENGTH = 13;
+private static final int MAX_ASIN_LENGTH = 10;
+private static final int MAX_EXTERNAL_ID_LENGTH = 100;
...
-                .title(truncate(extracted.getTitle(), 1000))
+                .title(truncate(extracted.getTitle(), MAX_TITLE_LENGTH))
...
-                .isbn10(truncate(extracted.getIsbn10(), 10))
+                .isbn10(truncate(extracted.getIsbn10(), MAX_ISBN10_LENGTH))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java`
around lines 136 - 165, The builder in BookdropMetadataService that calls
truncate(...) with many hardcoded numeric limits (e.g., title, subtitle,
description, isbn10, isbn13, goodreadsId, etc.) should use centralized constants
so limits don't drift; create or use a shared constants holder (e.g.,
FieldLengthLimits or MetadataSanitizer.MAX_TITLE_LENGTH) and replace the literal
numbers in BookMetadata.builder(...) truncate calls with those constants, update
the truncate(...) signature if needed to accept the constant, and refactor other
ingestion paths to import the same constants so all providers use the same
max-length rules.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java`:
- Around line 136-165: The builder in BookdropMetadataService that calls
truncate(...) with many hardcoded numeric limits (e.g., title, subtitle,
description, isbn10, isbn13, goodreadsId, etc.) should use centralized constants
so limits don't drift; create or use a shared constants holder (e.g.,
FieldLengthLimits or MetadataSanitizer.MAX_TITLE_LENGTH) and replace the literal
numbers in BookMetadata.builder(...) truncate calls with those constants, update
the truncate(...) signature if needed to accept the constant, and refactor other
ingestion paths to import the same constants so all providers use the same
max-length rules.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 07826e1b-e19a-4171-9787-cd579dac9019

📥 Commits

Reviewing files that changed from the base of the PR and between c801156 and a84ae6e.

📒 Files selected for processing (1)
  • booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test Suite / Frontend Tests
  • GitHub Check: Test Suite / Backend Tests
  • GitHub Check: Analyze (java-kotlin)
🧰 Additional context used
📓 Path-based instructions (1)
booklore-api/src/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

booklore-api/src/**/*.java: Use 4-space indentation in Java files and match surrounding Java style
Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce @Autowired field injection
Use MapStruct for entity/DTO mapping

Files:

  • booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: pedronave
Repo: grimmory-tools/grimmory PR: 173
File: booklore-api/src/main/java/org/booklore/service/hardcover/HardcoverSyncService.java:188-199
Timestamp: 2026-03-24T18:46:47.249Z
Learning: In grimmory-tools/grimmory (HardcoverSyncService.java), when a `hardcoverBookId` is stored in book metadata but is malformed (non-numeric), the preferred behavior is to return null and skip the sync rather than falling back to ISBN-based lookup. The reasoning is that an explicitly set bookId represents deliberate user intent, and silently resolving via ISBN could sync progress to a different book than intended.
📚 Learning: 2026-03-24T18:46:47.249Z
Learnt from: pedronave
Repo: grimmory-tools/grimmory PR: 173
File: booklore-api/src/main/java/org/booklore/service/hardcover/HardcoverSyncService.java:188-199
Timestamp: 2026-03-24T18:46:47.249Z
Learning: In grimmory-tools/grimmory (HardcoverSyncService.java), when a `hardcoverBookId` is stored in book metadata but is malformed (non-numeric), the preferred behavior is to return null and skip the sync rather than falling back to ISBN-based lookup. The reasoning is that an explicitly set bookId represents deliberate user intent, and silently resolving via ISBN could sync progress to a different book than intended.

Applied to files:

  • booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
📚 Learning: 2026-04-04T15:36:56.558Z
Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 372
File: booklore-api/src/main/java/org/booklore/app/service/AppBookService.java:357-359
Timestamp: 2026-04-04T15:36:56.558Z
Learning: In `booklore-api/src/main/java/org/booklore/app/service/AppBookService.java`, `shelfId` and `magicShelfId` are mutually exclusive navigation contexts in `getFilterOptions`. A request will never supply both at the same time, so the `else if (shelfId != null)` branch after the `magicBookIds != null` check is intentional and correct — there is no missing shelf-∩-magicShelf intersection case. Do not flag this pattern as a bug.

Applied to files:

  • booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
📚 Learning: 2026-04-10T08:15:37.436Z
Learnt from: imnotjames
Repo: grimmory-tools/grimmory PR: 449
File: booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java:139-145
Timestamp: 2026-04-10T08:15:37.436Z
Learning: When using Spring `ContentDisposition.builder(...).filename(name, StandardCharsets.UTF_8).build()` (i.e., explicitly providing UTF-8), the resulting header value should include both the quoted `filename="=?UTF-8?..."` and the RFC 5987 `filename*=` parameters. In this case, any extra ASCII fallback computation (e.g., deriving an ASCII `fallbackFilename` via `NON_ASCII_PATTERN` and calling `.filename(fallbackFilename)`) is likely redundant—prefer calling only `.filename(fallbackName?, StandardCharsets.UTF_8)` as appropriate and let Spring handle the UTF-8 header parameters. Verify by comparing the emitted header for `filename` and `filename*` before deciding to keep an ASCII fallback.

Applied to files:

  • booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
🔀 Multi-repo context grimmory-tools/grimmory-docs

[::grimmory-tools/grimmory-docs::] docs/audit-logs.md:123 — notes "Descriptions longer than 1024 characters are truncated." (indicates existing truncation policy for some metadata fields)

[::grimmory-tools/grimmory-docs::] docs/magic-shelf.md:134-136 — documents ISBN-13/ISBN-10 fields and names (isbn13, isbn10); line 205 — lists asin, isbn13, isbn10, language, and many external IDs as recognized metadata fields (shows these identifiers are first-class and surface in UI/filtering)

[::grimmory-tools/grimmory-docs::] docs/metadata/metadata-settings.md:62,75,134 — references provider identifiers including Amazon ASIN and related provider metadata (confirms ASIN is a configured provider identifier)

[::grimmory-tools/grimmory-docs::] docs/metadata/sidecar-files.md:27,41 — examples and mention of external identifiers (ASIN included) in sidecar files (implies persisted metadata flows into file-sidecar usage)

[::grimmory-tools/grimmory-docs::] docs/bookdrop.md & docs/bookdrop-advanced.md (multiple lines) — describe Bookdrop workflow, referencing ASIN and identifier extraction/features and placeholders including {ASIN} (confirms Bookdrop metadata is used in import/finalize flows and filename patterns)

Summary: Documentation shows ASIN, isbn10, isbn13, language and other external IDs are first-class metadata fields used across Bookdrop, sidecar files, filename patterns, and UI filters; there is at least one documented truncation policy (description → 1024 chars). This PR's truncation of initial Bookdrop metadata (ASIN/ISBN/language/etc.) aligns with existing concerns in docs and is relevant to import/finalize flows and sidecar/filename usage.

🔇 Additional comments (2)
booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java (2)

131-165: Good sanitization coverage for the issue path.

Lines 131–165 correctly null-guard and truncate the high-risk metadata fields before persistence, including ASIN/ISBN and external IDs, which matches the Bookdrop finalize failure scenario.


192-192: Nice boundary placement for cleaning.

Line 192 applies cleaning immediately after extraction, so persisted originalMetadata is consistently sanitized for downstream finalize/import flow.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java (1)

142-164: Centralize truncation limits to avoid drift across metadata flows.

The inline numeric limits are correct but scattered and easy to desync from other sanitization paths (e.g., FileProcessor/schema constraints). Prefer named constants (or a shared metadata sanitizer/constraints class) to keep behavior consistent long-term.

Proposed refactor
+    private static final int MAX_TITLE_LENGTH = 1000;
+    private static final int MAX_SUBTITLE_LENGTH = 1000;
+    private static final int MAX_DESCRIPTION_LENGTH = 5000;
+    private static final int MAX_PUBLISHER_LENGTH = 1000;
+    private static final int MAX_SERIES_NAME_LENGTH = 1000;
+    private static final int MAX_LANGUAGE_LENGTH = 10;
+    private static final int MAX_ISBN10_LENGTH = 10;
+    private static final int MAX_ISBN13_LENGTH = 13;
+    private static final int MAX_ASIN_LENGTH = 10;
+    private static final int MAX_AUDIBLE_ID_LENGTH = 10;
+    private static final int MAX_EXTERNAL_ID_LENGTH = 100;
...
-                .title(truncate(extracted.getTitle(), 1000))
-                .subtitle(truncate(extracted.getSubtitle(), 1000))
-                .description(truncate(extracted.getDescription(), 5000))
-                .publisher(truncate(extracted.getPublisher(), 1000))
+                .title(truncate(extracted.getTitle(), MAX_TITLE_LENGTH))
+                .subtitle(truncate(extracted.getSubtitle(), MAX_SUBTITLE_LENGTH))
+                .description(truncate(extracted.getDescription(), MAX_DESCRIPTION_LENGTH))
+                .publisher(truncate(extracted.getPublisher(), MAX_PUBLISHER_LENGTH))
...
-                .seriesName(truncate(extracted.getSeriesName(), 1000))
-                .language(truncate(extracted.getLanguage(), 10))
+                .seriesName(truncate(extracted.getSeriesName(), MAX_SERIES_NAME_LENGTH))
+                .language(truncate(extracted.getLanguage(), MAX_LANGUAGE_LENGTH))
...
-                .isbn10(truncate(extracted.getIsbn10(), 10))
-                .isbn13(truncate(extracted.getIsbn13(), 13))
+                .isbn10(truncate(extracted.getIsbn10(), MAX_ISBN10_LENGTH))
+                .isbn13(truncate(extracted.getIsbn13(), MAX_ISBN13_LENGTH))
...
-                .asin(truncate(extracted.getAsin(), 10))
-                .audibleId(truncate(extracted.getAudibleId(), 10))
-                .goodreadsId(truncate(extracted.getGoodreadsId(), 100))
+                .asin(truncate(extracted.getAsin(), MAX_ASIN_LENGTH))
+                .audibleId(truncate(extracted.getAudibleId(), MAX_AUDIBLE_ID_LENGTH))
+                .goodreadsId(truncate(extracted.getGoodreadsId(), MAX_EXTERNAL_ID_LENGTH))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java`
around lines 142 - 164, Replace the scattered numeric literals used in
truncate(...) calls in BookdropMetadataService (e.g., title, subtitle,
description, publisher, seriesName, language, isbn10, isbn13, asin, audibleId,
goodreadsId, hardcoverId, hardcoverBookId, googleId, comicvineId,
lubimyczytacId, ranobedbId, doubanId) with shared named constants: either add
private static final constants in BookdropMetadataService or, preferably,
reference a new centralized MetadataConstraints (or MetadataSanitizer) class
that exposes descriptive constants like MAX_TITLE_LENGTH,
MAX_DESCRIPTION_LENGTH, MAX_ISBN10_LENGTH, MAX_EXTERNAL_ID_LENGTH,
MAX_LANGUAGE_LENGTH, etc.; update all truncate(...) calls to use those constants
so truncation limits are centralized and consistent with other sanitation/schema
flows (and add unit tests if present to validate the constant values).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java`:
- Around line 142-164: Replace the scattered numeric literals used in
truncate(...) calls in BookdropMetadataService (e.g., title, subtitle,
description, publisher, seriesName, language, isbn10, isbn13, asin, audibleId,
goodreadsId, hardcoverId, hardcoverBookId, googleId, comicvineId,
lubimyczytacId, ranobedbId, doubanId) with shared named constants: either add
private static final constants in BookdropMetadataService or, preferably,
reference a new centralized MetadataConstraints (or MetadataSanitizer) class
that exposes descriptive constants like MAX_TITLE_LENGTH,
MAX_DESCRIPTION_LENGTH, MAX_ISBN10_LENGTH, MAX_EXTERNAL_ID_LENGTH,
MAX_LANGUAGE_LENGTH, etc.; update all truncate(...) calls to use those constants
so truncation limits are centralized and consistent with other sanitation/schema
flows (and add unit tests if present to validate the constant values).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0e5ec8ea-8e95-4c0e-85a9-d87af2f019af

📥 Commits

Reviewing files that changed from the base of the PR and between a84ae6e and 837e058.

📒 Files selected for processing (1)
  • booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Analyze (java-kotlin)
  • GitHub Check: Test Suite / Frontend Tests
  • GitHub Check: Test Suite / Backend Tests
🧰 Additional context used
📓 Path-based instructions (1)
booklore-api/src/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

booklore-api/src/**/*.java: Use 4-space indentation in Java files and match surrounding Java style
Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce @Autowired field injection
Use MapStruct for entity/DTO mapping

Files:

  • booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
🧠 Learnings (3)
📓 Common learnings
Learnt from: pedronave
Repo: grimmory-tools/grimmory PR: 173
File: booklore-api/src/main/java/org/booklore/service/hardcover/HardcoverSyncService.java:188-199
Timestamp: 2026-03-24T18:46:47.249Z
Learning: In grimmory-tools/grimmory (HardcoverSyncService.java), when a `hardcoverBookId` is stored in book metadata but is malformed (non-numeric), the preferred behavior is to return null and skip the sync rather than falling back to ISBN-based lookup. The reasoning is that an explicitly set bookId represents deliberate user intent, and silently resolving via ISBN could sync progress to a different book than intended.
📚 Learning: 2026-03-24T18:46:47.249Z
Learnt from: pedronave
Repo: grimmory-tools/grimmory PR: 173
File: booklore-api/src/main/java/org/booklore/service/hardcover/HardcoverSyncService.java:188-199
Timestamp: 2026-03-24T18:46:47.249Z
Learning: In grimmory-tools/grimmory (HardcoverSyncService.java), when a `hardcoverBookId` is stored in book metadata but is malformed (non-numeric), the preferred behavior is to return null and skip the sync rather than falling back to ISBN-based lookup. The reasoning is that an explicitly set bookId represents deliberate user intent, and silently resolving via ISBN could sync progress to a different book than intended.

Applied to files:

  • booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
📚 Learning: 2026-04-10T08:15:37.436Z
Learnt from: imnotjames
Repo: grimmory-tools/grimmory PR: 449
File: booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java:139-145
Timestamp: 2026-04-10T08:15:37.436Z
Learning: When using Spring `ContentDisposition.builder(...).filename(name, StandardCharsets.UTF_8).build()` (i.e., explicitly providing UTF-8), the resulting header value should include both the quoted `filename="=?UTF-8?..."` and the RFC 5987 `filename*=` parameters. In this case, any extra ASCII fallback computation (e.g., deriving an ASCII `fallbackFilename` via `NON_ASCII_PATTERN` and calling `.filename(fallbackFilename)`) is likely redundant—prefer calling only `.filename(fallbackName?, StandardCharsets.UTF_8)` as appropriate and let Spring handle the UTF-8 header parameters. Verify by comparing the emitted header for `filename` and `filename*` before deciding to keep an ASCII fallback.

Applied to files:

  • booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
🔀 Multi-repo context grimmory-tools/grimmory-docs

Linked repositories findings

grimmory-tools/grimmory-docs

  • docs/tools/audit-logs.md:123 — Documents that "Descriptions longer than 1024 characters are truncated." This shows existing truncation policies for metadata fields. [::grimmory-tools/grimmory-docs::docs/tools/audit-logs.md:123]

  • docs/metadata/metadata-settings.md:62,64,75,134 — Lists provider identifiers including Amazon ASIN and explicitly names Amazon ASIN among provider identifiers; shows ASIN is a first-class provider field exposed in settings. [::grimmory-tools/grimmory-docs::docs/metadata/metadata-settings.md:62,64,75,134]

  • docs/metadata/sidecar-files.md:27,41–42,53,57–70 — Sidecar files include external identifiers (ASIN, Goodreads ID, Google ID, etc.) and sidecar behavior is configurable; implying Bookdrop metadata flows into sidecars and persisted metadata truncation could affect exported sidecar content. [::grimmory-tools/grimmory-docs::docs/metadata/sidecar-files.md:27,41-42,53,57-70]

  • docs/bookdrop.md & docs/bookdrop-advanced.md — (multiple references) Describe the Bookdrop workflow and how extracted metadata (including ASIN and other identifiers) are used during finalize/import flows. This confirms Bookdrop metadata is used in finalize/file renaming and persists into the library. [::grimmory-tools/grimmory-docs::docs/bookdrop.md][::grimmory-tools/grimmory-docs::docs/bookdrop-advanced.md]

  • docs/metadata/file-naming-patterns.md:3,80–99 — File naming patterns use metadata placeholders (e.g., {ASIN}); truncated or cleaned metadata can change filenames/moves applied during finalize/import. [::grimmory-tools/grimmory-docs::docs/metadata/file-naming-patterns.md:3,80-99]

  • docs/magic-shelf.md:134–136,205 — Magic Shelf field lists include isbn13, isbn10, and asin; many UI features and filters depend on these metadata keys. Truncating these fields affects discoverability, filters, and shelves using those keys. [::grimmory-tools/grimmory-docs::docs/magic-shelf.md:134-136,205]

Summary assessment: Documentation confirms ASIN/isbn/language and other external IDs are first-class metadata used across Bookdrop finalize flows, sidecar exports, file-naming patterns, UI filters, and Magic Shelves. The PR's behavior to truncate/clean initial Bookdrop metadata before persisting aligns with existing truncation policies and is relevant for preventing DB column-length errors and for downstream behavior (sidecars, filename patterns, shelves). [::grimmory-tools/grimmory-docs::]

🔇 Additional comments (1)
booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java (1)

173-173: Sanitization is applied at the right persistence boundary.

Calling cleanInitialMetadata(...) directly in extractInitialMetadata(...) ensures oversized extracted values are normalized before originalMetadata is serialized/saved. Nice fix for the reported ASIN length failure path.

@imnotjames imnotjames changed the title fix(api): clean bookdrop metadata before persisting fix(bookdrop): clean bookdrop metadata before persisting Apr 11, 2026

@balazs-szucs balazs-szucs left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed this much better, thank you!

@balazs-szucs balazs-szucs merged commit b045191 into grimmory-tools:develop Apr 11, 2026
15 checks passed
@imnotjames imnotjames deleted the fix/452/bookdrop-clean-metadata branch April 11, 2026 23:13
@coderabbitai coderabbitai Bot mentioned this pull request Apr 15, 2026
1 task
zachyale pushed a commit to zachyale/grimmory that referenced this pull request Apr 17, 2026
…ols#462)

* fix(api): clean bookdrop metadata before persisting

* test: add tests for behaviors

* fix(api): add missing ratings + IDs

* fix(api): use `toBuilder()` to reduce risk of dropping fields
zachyale pushed a commit to zachyale/grimmory that referenced this pull request Apr 17, 2026
…ols#462)

* fix(api): clean bookdrop metadata before persisting

* test: add tests for behaviors

* fix(api): add missing ratings + IDs

* fix(api): use `toBuilder()` to reduce risk of dropping fields
zachyale pushed a commit that referenced this pull request Apr 17, 2026
* fix(api): clean bookdrop metadata before persisting

* test: add tests for behaviors

* fix(api): add missing ratings + IDs

* fix(api): use `toBuilder()` to reduce risk of dropping fields
zachyale pushed a commit that referenced this pull request Apr 22, 2026
* fix(api): clean bookdrop metadata before persisting

* test: add tests for behaviors

* fix(api): add missing ratings + IDs

* fix(api): use `toBuilder()` to reduce risk of dropping fields
dsmouse pushed a commit to dsmouse/grimmory that referenced this pull request May 11, 2026
…ols#462)

* fix(api): clean bookdrop metadata before persisting

* test: add tests for behaviors

* fix(api): add missing ratings + IDs

* fix(api): use `toBuilder()` to reduce risk of dropping fields
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Long ASIN results in error to add in library

2 participants