Skip to content

fix(metadata): read proper description field for LubimyCzytac#877

Merged
imnotjames merged 3 commits into
grimmory-tools:developfrom
imnotjames:fix/855/lubimyczytac-descriptions
Apr 25, 2026
Merged

fix(metadata): read proper description field for LubimyCzytac#877
imnotjames merged 3 commits into
grimmory-tools:developfrom
imnotjames:fix/855/lubimyczytac-descriptions

Conversation

@imnotjames

@imnotjames imnotjames commented Apr 25, 2026

Copy link
Copy Markdown
Contributor

Description

it seems the description field has changed and now can be targeted with an ID attribute instead. without this, the retrieval was pulling a review for the description.

Linked Issue: Fixes #855

Changes

  • updates the LubimyCzytac metadata extractor to use the expected field
  • adds a test to validate the metadata parser operation
  • drops disabled "integration" test

Summary by CodeRabbit

  • Bug Fixes

    • Improved reliability of book description extraction from LubimyCzytac pages by changing how the description is located and parsed.
  • Tests

    • Replaced network-dependent integration tests with deterministic, fixture-based parser tests and stronger assertions to ensure stable, repeatable verification of parsed metadata.

@coderabbitai

coderabbitai Bot commented Apr 25, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: c628de01-4aa0-4799-8c94-04c845fe5356

📥 Commits

Reviewing files that changed from the base of the PR and between e3207cf and aa3ff74.

📒 Files selected for processing (1)
  • backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java
📜 Recent review details
🧰 Additional context used
🔀 Multi-repo context grimmory-tools/grimmory-docs

Findings for grimmory-tools/grimmory-docs

  • docs/metadata/lubimyczytac.md — Documents LubimyCzytac provider and explicitly lists "description" among returned fields; includes a "Web Scraping" note warning that site markup changes may break parsing. [::grimmory-tools/grimmory-docs::docs/metadata/lubimyczytac.md]

  • docs/metadata/metadata-settings.md — References LubimyCzytac in provider list and sidecar fields (mentions LubimyCzytac ID and rating); no selector-level or parser implementation details. [::grimmory-tools/grimmory-docs::docs/metadata/metadata-settings.md]

  • docs/magic-shelf.md — References provider-specific keys including lubimyczytacRating and lubimyczytacId and lists description as a valid metadata field for rules; no dependency on the parser's internal CSS selector. [::grimmory-tools/grimmory-docs::docs/magic-shelf.md]

Conclusion: Documentation already expects a "description" field from LubimyCzytac and warns about scraping fragility. The parser change to target "#book-description" aligns with docs and appears not to require changes in docs or other repos.


📝 Walkthrough

Walkthrough

Book description extraction now targets #book-description instead of .collapse-content. The network-dependent integration test was replaced by a fixture-based unit test that statically mocks Jsoup.connect(...) and asserts exact parsed metadata fields from local HTML fixtures.

Changes

Cohort / File(s) Summary
Parser Fix
backend/src/main/java/org/booklore/service/metadata/parser/LubimyCzytacParser.java
Changed CSS selector used to extract book description from .collapse-content to #book-description.
Test Refactor
backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java
Replaced disabled network-dependent test with a deterministic fixture-based test: reads local search and book HTML fixtures, uses MockedStatic<Jsoup> to stub Jsoup.connect(...) for two URL prefixes, returns mocked Connection whose get() parses fixtures, and performs strict assertions on title, ISBNs, LubimyCzytac ID, author(s), and description content/length. Setup/teardown updated and Mockito extension added.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~18 minutes

Poem

🐰 I hopped through nodes and found the tag,
Swapped collapse for description — no more gag.
Fixtures snug, no network snow,
Parsed the book and watched data grow.
📚🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title follows conventional commit format with 'fix' type and descriptive subject about the LubimyCzytac description field update.
Description check ✅ Passed The PR description includes all required sections: clear explanation of what was changed, linked issue reference, and itemized list of changes made.
Linked Issues check ✅ Passed The code changes directly address the linked issue #855 by fixing the DOM selector to target the correct description field ID instead of the review content.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing the LubimyCzytac metadata parser and adding corresponding tests, with no unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@imnotjames imnotjames force-pushed the fix/855/lubimyczytac-descriptions branch from 343a5f9 to d3d9139 Compare April 25, 2026 18:16

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java (2)

185-187: Brittle exact-length assertion may fight future fixture maintenance.

assertEquals(827, result.getDescription().length()) couples the test to the precise whitespace/HTML structure of example-book.html. Any unrelated trim/normalize change in the parser (or a fixture re-save that touches whitespace) will require updating the magic number. Consider replacing with a loose lower-bound check (e.g., assertTrue(result.getDescription().length() > 500)) plus the existing prefix assertion, which is what the comment "in the right ballpark" suggests anyway.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java`
around lines 185 - 187, The test in LubimyCzytacParserTest is too brittle
because it asserts an exact description length with assertEquals(827,
result.getDescription().length()); change this to a loose bound check to
tolerate harmless whitespace/HTML changes: remove or replace the assertEquals on
result.getDescription().length() with an assertion like
assertTrue(result.getDescription().length() > 500) (or another sensible lower
bound) while keeping the existing prefix check assertEquals("Jedno z
najoryginalniejszych", result.getDescription().substring(0, 28)) to ensure
content is in the right ballpark.

191-195: Replace assert with an explicit null check.

JVM assertions are off by default; if lubimyczytac/example-search.html or lubimyczytac/example-book.html is ever missing from the test classpath, you'll get a confusing NPE on is.readAllBytes() rather than a clear failure pointing at the missing fixture. Surefire typically enables -ea, but relying on that is fragile — prefer Objects.requireNonNull(is, filename) or an explicit throw new FileNotFoundException(filename).

🔧 Proposed fix
     private String readResourceFile(String filename) throws IOException {
         try (InputStream is = getClass().getClassLoader().getResourceAsStream(filename)) {
-            assert is != null;
-
-            return new String(is.readAllBytes());
+            if (is == null) {
+                throw new IOException("Test fixture not found on classpath: " + filename);
+            }
+            return new String(is.readAllBytes(), StandardCharsets.UTF_8);
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java`
around lines 191 - 195, Replace the fragile JVM assertion in the
resource-loading block with an explicit null check: in the method that reads the
test fixture (where InputStream is =
getClass().getClassLoader().getResourceAsStream(filename) is used in
LubimyCzytacParserTest), detect if is == null and throw a clear exception (e.g.,
new FileNotFoundException(filename) or use Objects.requireNonNull(is, filename))
before calling is.readAllBytes(), so missing test resources produce an explicit,
informative failure rather than a NPE.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java`:
- Around line 190-196: The readResourceFile method in LubimyCzytacParserTest
uses new String(is.readAllBytes()) which relies on the platform default charset;
change it to decode bytes explicitly as UTF-8 (e.g., use new
String(is.readAllBytes(), StandardCharsets.UTF_8) or wrap the stream in an
InputStreamReader/BufferedReader with StandardCharsets.UTF_8) so the Polish HTML
fixture is read deterministically; update imports to include StandardCharsets if
needed and keep the method signature and exception behavior unchanged.

---

Nitpick comments:
In
`@backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java`:
- Around line 185-187: The test in LubimyCzytacParserTest is too brittle because
it asserts an exact description length with assertEquals(827,
result.getDescription().length()); change this to a loose bound check to
tolerate harmless whitespace/HTML changes: remove or replace the assertEquals on
result.getDescription().length() with an assertion like
assertTrue(result.getDescription().length() > 500) (or another sensible lower
bound) while keeping the existing prefix check assertEquals("Jedno z
najoryginalniejszych", result.getDescription().substring(0, 28)) to ensure
content is in the right ballpark.
- Around line 191-195: Replace the fragile JVM assertion in the resource-loading
block with an explicit null check: in the method that reads the test fixture
(where InputStream is =
getClass().getClassLoader().getResourceAsStream(filename) is used in
LubimyCzytacParserTest), detect if is == null and throw a clear exception (e.g.,
new FileNotFoundException(filename) or use Objects.requireNonNull(is, filename))
before calling is.readAllBytes(), so missing test resources produce an explicit,
informative failure rather than a NPE.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 812a3b51-2ee3-4437-a21b-f273c03a4ed1

📥 Commits

Reviewing files that changed from the base of the PR and between e7bc5fe and 343a5f9.

📒 Files selected for processing (4)
  • backend/src/main/java/org/booklore/service/metadata/parser/LubimyCzytacParser.java
  • backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java
  • backend/src/test/resources/lubimyczytac/example-book.html
  • backend/src/test/resources/lubimyczytac/example-search.html
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Suite / Backend Tests
  • GitHub Check: Test Suite / Frontend Tests
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (java-kotlin)
  • GitHub Check: Frontend Lint Threshold Check
🧰 Additional context used
📓 Path-based instructions (2)
backend/src/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

backend/src/**/*.java: Use 4-space indentation and match surrounding Java style in backend code
Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce @Autowired field injection in backend code
Use MapStruct for entity/DTO mapping in backend code
Keep JPA entities on the *Entity suffix in backend code

Files:

  • backend/src/main/java/org/booklore/service/metadata/parser/LubimyCzytacParser.java
  • backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java
backend/src/test/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

Prefer focused unit tests; use @SpringBootTest only when the Spring context is required in backend code

Files:

  • backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java
🧠 Learnings (2)
📚 Learning: 2026-04-10T08:15:37.436Z
Learnt from: imnotjames
Repo: grimmory-tools/grimmory PR: 449
File: booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java:139-145
Timestamp: 2026-04-10T08:15:37.436Z
Learning: When using Spring `ContentDisposition.builder(...).filename(name, StandardCharsets.UTF_8).build()` (i.e., explicitly providing UTF-8), the resulting header value should include both the quoted `filename="=?UTF-8?..."` and the RFC 5987 `filename*=` parameters. In this case, any extra ASCII fallback computation (e.g., deriving an ASCII `fallbackFilename` via `NON_ASCII_PATTERN` and calling `.filename(fallbackFilename)`) is likely redundant—prefer calling only `.filename(fallbackName?, StandardCharsets.UTF_8)` as appropriate and let Spring handle the UTF-8 header parameters. Verify by comparing the emitted header for `filename` and `filename*` before deciding to keep an ASCII fallback.

Applied to files:

  • backend/src/main/java/org/booklore/service/metadata/parser/LubimyCzytacParser.java
  • backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java
📚 Learning: 2026-04-14T12:43:08.698Z
Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 502
File: booklore-api/src/main/java/org/booklore/service/reader/ChapterCacheService.java:0-0
Timestamp: 2026-04-14T12:43:08.698Z
Learning: For this codebase (booklore-api), target Java 25 with `--enable-preview`, so `_` is intentionally used as an unnamed/ignored variable (e.g., lambda parameter or pattern variable) per Java’s preview feature JEP 456. Do not flag `_` in those contexts as an invalid/reserved identifier; only flag it if it’s used in a non-supported position (e.g., where an unnamed variable is not applicable for the Java preview rules).

Applied to files:

  • backend/src/main/java/org/booklore/service/metadata/parser/LubimyCzytacParser.java
  • backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java
🔀 Multi-repo context grimmory-tools/grimmory-docs

Findings

grimmory-tools/grimmory-docs

  • docs/metadata/lubimyczytac.md — provider guide describing fields returned (title, authors, description, cover image, publisher, published date, page count, language, ISBNs, categories, tags, series), notes that LubimyCzytac has no API and Grimmory uses web scraping which "may occasionally break if the LubimyCzytac website structure changes." [::grimmory-tools/grimmory-docs::docs/metadata/lubimyczytac.md]

  • docs/metadata/metadata-settings.md — lists Lubimyczytac as a provider and mentions Lubimyczytac ID and Rating as metadata fields. [::grimmory-tools/grimmory-docs::docs/metadata/metadata-settings.md]

  • docs/metadata/metadata-fetch-configuration.md — references Lubimyczytac in provider lists and fetch configuration. [::grimmory-tools/grimmory-docs::docs/metadata/metadata-fetch-configuration.md]

  • docs/magic-shelf.md — includes lubimyczytacRating and lubimyczytacId among metadata keys used in UI/configuration. [::grimmory-tools/grimmory-docs::docs/magic-shelf.md]

Relevance

  • Documentation explicitly documents that Grimmory scrapes LubimyCzytac and warns that site structure changes can break parsing; this PR fixing the selector from ".collapse-content" to "#book-description" aligns with that risk and should be reflected in docs/known-issues if desired. [::grimmory-tools/grimmory-docs::docs/metadata/lubimyczytac.md]
🔇 Additional comments (2)
backend/src/main/java/org/booklore/service/metadata/parser/LubimyCzytacParser.java (1)

235-241: Selector change LGTM — targeted ID is more robust than the previous class.

Switching to #book-description correctly addresses the issue where .collapse-content was the first match for review content rather than the book description. The surrounding null/blank guards remain intact, so behavior on missing elements is preserved. Worth noting (per linked docs warning that LubimyCzytac scraping "may occasionally break if the LubimyCzytac website structure changes") that an ID-based selector is also more stable than a generic class selector.

backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java (1)

198-212: The static-mock pattern is safe — the parser's call path is fully mocked.

The parser invokes only Jsoup.connect(url).userAgent(...).timeout(...).get() (lines 151–154) with no direct calls to Jsoup.parse() or other static methods. The test mocks the entire chain (mockConnection for userAgent, timeout, and get), and Connection.get() returns a pre-built Document, so no internal re-parsing can occur. The explicit mock of Jsoup.parse(response) with thenCallRealMethod() is only used by the test itself (line 208), not by the parser. Jsoup 1.22.2 confirms the API surface matches the mocking strategy.

it seems the description field has changed and now can be targeted with
an ID attribute instead.  without this, the retrieval was pulling a
review for the description.
@imnotjames imnotjames force-pushed the fix/855/lubimyczytac-descriptions branch from d3d9139 to 04c0077 Compare April 25, 2026 18:20

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java (2)

146-153: Mismatched Book.title vs. request title is confusing.

Book is built with "Wiedźmin" while the FetchMetadataRequest uses "Sklepy cynamonowe". The parser drives its search off the request title, so the test still works, but the leftover "Wiedźmin" reads like a copy/paste from earlier iterations and could mislead future readers debugging the fixture flow.

✏️ Proposed cleanup
         Book book = Book.builder()
-            .title("Wiedźmin")
+            .title("Sklepy cynamonowe")
             .build();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java`
around lines 146 - 153, In LubimyCzytacParserTest update the test fixture so the
Book's title matches the FetchMetadataRequest title to avoid confusion: change
the Book built via Book.builder().title("Wiedźmin") to use the same title string
as the request (e.g., "Sklepy cynamonowe") or otherwise use a clearly labeled
placeholder; ensure the variables referenced (Book.builder(), the Book instance,
and FetchMetadataRequest.builder() / request) are consistent so readers see that
the parser is driven by request.title.

198-212: Reusing a single parsed Document across all book-detail fetches.

mockResponse parses response once and stubs mockConnection.get() to always return that same Document instance. In parseBookDetails, the loop in fetchMetadata invokes fetchWithRetry for every URL returned from search, so every iteration receives the same Document. That works for the current single-result assertions, but:

  • results will contain duplicates (one per search hit, capped at MAX_RESULTS), which is misleading if anyone later asserts on results.size().
  • Jsoup's Document is mutable; if any future parser logic mutates the DOM (e.g., .remove(), .unwrap()), state would leak across iterations and tests.

Consider parsing the fixture fresh on each get() call (e.g., have the stub return Jsoup.parse(response) lazily via thenAnswer) so each invocation gets an independent Document.

♻️ Proposed refactor
-        // There may be a better way to get the parse to work here.
-        // However, this was the quickest and simplest way I could find.
-        mockJsoup.when(() -> Jsoup.parse(response)).thenCallRealMethod();
-        Document document = Jsoup.parse(response);
-
-        when(mockConnection.get()).thenReturn(document);
+        // Re-parse on each call so iterations don't share mutable DOM state.
+        mockJsoup.when(() -> Jsoup.parse(response)).thenCallRealMethod();
+        when(mockConnection.get()).thenAnswer(inv -> Jsoup.parse(response));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java`
around lines 198 - 212, The test reuses a single mutable Jsoup Document by
stubbing mockConnection.get() to always return the same parsed instance in
mockResponse, causing duplicated/tainted state across fetchWithRetry iterations
invoked by fetchMetadata/parseBookDetails; change the stub to lazily return a
fresh parse each call (replace when(mockConnection.get()).thenReturn(document)
with a thenAnswer that returns Jsoup.parse(response) on each invocation) and
remove or adjust the earlier mockJsoup.parse stubbing so each get() yields an
independent Document.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java`:
- Around line 146-153: In LubimyCzytacParserTest update the test fixture so the
Book's title matches the FetchMetadataRequest title to avoid confusion: change
the Book built via Book.builder().title("Wiedźmin") to use the same title string
as the request (e.g., "Sklepy cynamonowe") or otherwise use a clearly labeled
placeholder; ensure the variables referenced (Book.builder(), the Book instance,
and FetchMetadataRequest.builder() / request) are consistent so readers see that
the parser is driven by request.title.
- Around line 198-212: The test reuses a single mutable Jsoup Document by
stubbing mockConnection.get() to always return the same parsed instance in
mockResponse, causing duplicated/tainted state across fetchWithRetry iterations
invoked by fetchMetadata/parseBookDetails; change the stub to lazily return a
fresh parse each call (replace when(mockConnection.get()).thenReturn(document)
with a thenAnswer that returns Jsoup.parse(response) on each invocation) and
remove or adjust the earlier mockJsoup.parse stubbing so each get() yields an
independent Document.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 5c087d43-bf85-413a-8830-2d88a857cf2f

📥 Commits

Reviewing files that changed from the base of the PR and between 343a5f9 and d3d9139.

📒 Files selected for processing (4)
  • backend/src/main/java/org/booklore/service/metadata/parser/LubimyCzytacParser.java
  • backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java
  • backend/src/test/resources/lubimyczytac/example-book.html
  • backend/src/test/resources/lubimyczytac/example-search.html
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Analyze (java-kotlin)
  • GitHub Check: Test Suite / Backend Tests
🧰 Additional context used
📓 Path-based instructions (2)
backend/src/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

backend/src/**/*.java: Use 4-space indentation and match surrounding Java style in backend code
Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce @Autowired field injection in backend code
Use MapStruct for entity/DTO mapping in backend code
Keep JPA entities on the *Entity suffix in backend code

Files:

  • backend/src/main/java/org/booklore/service/metadata/parser/LubimyCzytacParser.java
  • backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java
backend/src/test/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

Prefer focused unit tests; use @SpringBootTest only when the Spring context is required in backend code

Files:

  • backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java
🧠 Learnings (3)
📚 Learning: 2026-04-10T08:15:37.436Z
Learnt from: imnotjames
Repo: grimmory-tools/grimmory PR: 449
File: booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java:139-145
Timestamp: 2026-04-10T08:15:37.436Z
Learning: When using Spring `ContentDisposition.builder(...).filename(name, StandardCharsets.UTF_8).build()` (i.e., explicitly providing UTF-8), the resulting header value should include both the quoted `filename="=?UTF-8?..."` and the RFC 5987 `filename*=` parameters. In this case, any extra ASCII fallback computation (e.g., deriving an ASCII `fallbackFilename` via `NON_ASCII_PATTERN` and calling `.filename(fallbackFilename)`) is likely redundant—prefer calling only `.filename(fallbackName?, StandardCharsets.UTF_8)` as appropriate and let Spring handle the UTF-8 header parameters. Verify by comparing the emitted header for `filename` and `filename*` before deciding to keep an ASCII fallback.

Applied to files:

  • backend/src/main/java/org/booklore/service/metadata/parser/LubimyCzytacParser.java
  • backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java
📚 Learning: 2026-04-14T12:43:08.698Z
Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 502
File: booklore-api/src/main/java/org/booklore/service/reader/ChapterCacheService.java:0-0
Timestamp: 2026-04-14T12:43:08.698Z
Learning: For this codebase (booklore-api), target Java 25 with `--enable-preview`, so `_` is intentionally used as an unnamed/ignored variable (e.g., lambda parameter or pattern variable) per Java’s preview feature JEP 456. Do not flag `_` in those contexts as an invalid/reserved identifier; only flag it if it’s used in a non-supported position (e.g., where an unnamed variable is not applicable for the Java preview rules).

Applied to files:

  • backend/src/main/java/org/booklore/service/metadata/parser/LubimyCzytacParser.java
  • backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java
📚 Learning: 2026-04-23T14:49:01.089Z
Learnt from: balazs-szucs
Repo: grimmory-tools/grimmory PR: 790
File: backend/src/main/java/org/booklore/nativelib/NativeLibraries.java:71-77
Timestamp: 2026-04-23T14:49:01.089Z
Learning: In grimmory-tools/grimmory, the `catch (Throwable)` in `NativeLibraries.runProbe()` (`backend/src/main/java/org/booklore/nativelib/NativeLibraries.java`) is intentional design: any failure during native library probing (including fatal JVM errors) is treated as "library unavailable" rather than propagating. Do not flag this as an issue in future reviews.

Applied to files:

  • backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java
🔀 Multi-repo context grimmory-tools/grimmory-docs

Findings for grimmory-tools/grimmory-docs

  • docs/metadata/lubimyczytac.md documents LubimyCzytac as a provider, lists description as one of the returned fields, and explicitly warns that Grimmory uses web scraping and that parsing "may occasionally break if the LubimyCzytac website structure changes." This confirms docs already acknowledge fragility of HTML selectors and is relevant context for the parser selector change. [::grimmory-tools/grimmory-docs::docs/metadata/lubimyczytac.md]

  • docs/metadata/metadata-settings.md mentions Lubimyczytac provider fields (Lubimyczytac ID, Lubimyczytac Rating) and lists the provider among available metadata sources; no code-level consumers or references to specific DOM structure or selectors. [::grimmory-tools/grimmory-docs::docs/metadata/metadata-settings.md]

  • docs/magic-shelf.md references lubimyczytacRating and lubimyczytacId as metadata keys used for filtering; nothing depends on the parser's internal HTML selectors or description extraction specifics. [::grimmory-tools/grimmory-docs::docs/magic-shelf.md]

Conclusion: Documentation acknowledges scraping fragility and lists description as a provided field; no other repositories or docs appear to depend on the previous ".collapse-content" selector. The change in the parser (targeting "#book-description") is consistent with recorded risks in docs; you may optionally update docs/metadata/lubimyczytac.md to note the recent parser fix, but no cross-repo code changes are required.

🔇 Additional comments (2)
backend/src/test/java/org/booklore/service/metadata/parser/LubimyCzytacParserTest.java (1)

190-196: Specify UTF-8 explicitly when reading the HTML fixture.

new String(is.readAllBytes()) decodes with the JVM default charset, which is platform-dependent. The Polish fixture contains UTF-8 multi-byte characters (ą/ę/ł/ź/ć), and a non-UTF-8 default will silently corrupt bytes and can flip the length() == 827 assertion or the substring comparison, producing flaky CI failures on contributor machines.

🔧 Proposed fix
-import java.io.IOException;
-import java.io.InputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.charset.StandardCharsets;
     private String readResourceFile(String filename) throws IOException {
         try (InputStream is = getClass().getClassLoader().getResourceAsStream(filename)) {
             assert is != null;

-            return new String(is.readAllBytes());
+            return new String(is.readAllBytes(), StandardCharsets.UTF_8);
         }
     }
backend/src/main/java/org/booklore/service/metadata/parser/LubimyCzytacParser.java (1)

235-241: LGTM — selector change correctly targets the description.

Switching from .collapse-content to #book-description aligns with the actual DOM structure (the fixture confirms the element is <div id="book-description" class="book__description ...">), so the parser will now return the book's description rather than review content. The surrounding null/blank guards are unchanged, so behavior degrades gracefully if the selector ever stops matching again.

Worth noting (no change required here): the linked docs in grimmory-tools/grimmory-docs (docs/metadata/lubimyczytac.md) already warn that LubimyCzytac scraping "may occasionally break if the LubimyCzytac website structure changes," so this is a known fragility class. Optionally, you could update those docs to mention the recent fix.

@imnotjames imnotjames merged commit 79455ff into grimmory-tools:develop Apr 25, 2026
17 of 18 checks passed
@imnotjames imnotjames deleted the fix/855/lubimyczytac-descriptions branch April 28, 2026 20:46
dsmouse pushed a commit to dsmouse/grimmory that referenced this pull request May 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Lubimyczytac pulls reviews for the book description

2 participants