Releases: lncrawl/lightnovel-crawler

v4.8.0

13 Jun 20:48

Added

  • Job notifications: JobNotificationService dispatches email on job state changes (pending → running → success/failure) via a background TaskManager; triggered from handler helpers (_set_running, _set_success, etc.)
  • Docker healthcheck — server container now exposes a /health probe
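
A minimal sketch of how state-change notifications of this kind could be wired. It assumes a plain in-process callback and does not reproduce the project's JobNotificationService or TaskManager; every name here (Job, Status, set_status, record) is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILURE = "failure"


# Allowed transitions, mirroring pending -> running -> success/failure.
ALLOWED = {
    Status.PENDING: {Status.RUNNING},
    Status.RUNNING: {Status.SUCCESS, Status.FAILURE},
    Status.SUCCESS: set(),
    Status.FAILURE: set(),
}


@dataclass
class Job:
    id: str
    status: Status = Status.PENDING


def set_status(job: Job, new: Status,
               notify: Callable[[Job, Status, Status], None]) -> None:
    """Apply a legal transition, then hand the change to the notifier."""
    old = job.status
    if new not in ALLOWED[old]:
        raise ValueError(f"illegal transition {old.value} -> {new.value}")
    job.status = new
    notify(job, old, new)


# Stand-in for an email sender: collect messages in a list (assumption).
outbox: List[str] = []


def record(job: Job, old: Status, new: Status) -> None:
    outbox.append(f"job {job.id}: {old.value} -> {new.value}")


job = Job(id="demo")
set_status(job, Status.RUNNING, record)
set_status(job, Status.SUCCESS, record)
```

In a real server the notifier would likely enqueue work on a background task rather than send mail inline, so that a slow mail server cannot stall the job itself.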

Changed

  • Job runner refactored into typed handlers: JobRunner now dispatches via a _HANDLER_REGISTRY of BaseHandler/BatchHandler subclasses; each job type has its own module under scheduler/handlers/
  • Web app synced before Docker build: lncrawl-web artifacts are pulled in as part of the Docker build step
  • crawler_version stamped on novel/chapter updates — upserts now use a merge strategy to preserve existing data

Fixed

  • Server hangup — root cause of hang addressed (event lock contention / crawler resource leak)
  • Server crash — crawler resource leak on shutdown fixed; Docker healthcheck added
  • crawl.py (#3030) — regression in CLI crawl flow corrected
  • Torproxy — re-enabled after an unintended regression

Full Changelog: v4.7.0...v4.8.0

v4.7.0

12 Jun 17:00

Added

  • Background search jobs — novel search is now a proper background job with two new JobType values:
    • SEARCH_SOURCE — searches a single crawlable source; trigger via POST /api/job/create/search-sources?domain=…
    • SEARCH_ALL_SOURCES — fans out across every searchable source, spawning one SEARCH_SOURCE child per source; idempotent on retry
    • JobRunner handles execution: results stored in job.extra, matched URLs create a NOVEL_BATCH child job
    • New Alembic migration (add_jobtype) for PostgreSQL compatibility
  • PAUSED job status — new JobStatus.PAUSED enum value for finer job lifecycle control
  • Per-tier search-job rate limiting: BASIC users are capped at 1 concurrent search while the general active-job quota remains independent; search query length validated (2–50 chars); results sorted by match ratio
  • NovelFire search: SEARCH capability added to NovelFireCrawler (#3009)
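
As an illustration of the query validation and ratio-based ordering mentioned above, here is a small sketch. The 2 and 50 character bounds come from the release note; the whitespace stripping, the use of difflib.SequenceMatcher as the similarity measure, and the function names are assumptions, since the notes do not say how the match ratio is computed.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

MIN_LEN, MAX_LEN = 2, 50  # bounds quoted in the release note


def validate_query(query: str) -> str:
    """Return the trimmed query, or raise if its length is out of bounds."""
    query = query.strip()
    if not (MIN_LEN <= len(query) <= MAX_LEN):
        raise ValueError(
            f"query must be {MIN_LEN}-{MAX_LEN} characters, got {len(query)}"
        )
    return query


def rank_results(query: str, titles: List[str]) -> List[Tuple[str, float]]:
    """Score each title against the query and sort best match first."""
    q = query.lower()
    scored = [(t, SequenceMatcher(None, q, t.lower()).ratio()) for t in titles]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, ranking the titles "Overgeared", "Solo Leveling", and "Solo Farming" against the query "solo leveling" puts the exact title first, because an identical string has a ratio of 1.0.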

Changed

  • Removed vendored lncrawl/cloudscraper — the embedded Cloudflare-bypass fork (v1/v2/v3 handlers, captcha integrations, JS interpreters, 7 913-line browsers.json) has been removed; HTTP scraping is now delegated to the lncrawl-scraper package
  • JavaScript engine replaced: PyExecJs → quickjs → exejs — lighter dependency, no Node.js or external runtime required
  • Proxy support in scraper — the lncrawl-scraper integration now supports proxies; build-essentials added to Docker base image (#3014)
  • BrowserTemplate merged into SoupTemplate: BrowserTemplate is integrated directly into the soup template hierarchy rather than being a standalone class; all browser-based sources refactored accordingly
  • Job service hardening — event locking and improved update logic in JobService; request timeouts in the scraper adjusted
  • Docker improvements — faster image builds; updated compose.yml and server-compose files; fixed unintended root access in server-compose
  • truyenfull: updated domain and search URL; base_url changed to a list to support multiple domains (#3010)
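
To show why a list-valued base_url helps a source that lives on several domains, here is a hedged sketch of host matching against such a list. The helper name and the example domains are hypothetical, and the project's actual lookup logic may differ.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def match_base_url(url: str, base_urls: Iterable[str]) -> Optional[str]:
    """Return the first base URL whose host equals the host of `url`."""
    host = (urlsplit(url).hostname or "").lower()
    for base in base_urls:
        base_host = (urlsplit(base).hostname or "").lower()
        if host and host == base_host:
            return base
    return None


# Hypothetical domains for illustration only.
BASE_URLS = ["https://old.example.com/", "https://new.example.org/"]
```

With this shape, a novel URL on either domain resolves to the same crawler, so a site that moves domains only needs a new entry in the list.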

Fixed

  • Security — path-traversal / static-file exposure vulnerability fixed in app.py and staticfiles.py (#3005)
  • katreadingcafe — chapter link validation logic corrected to filter out non-chapter URLs (#3026)
  • Race condition — parallel search result aggregation could yield inconsistent data under concurrent writes; fixed with proper locking
  • EPUB + NovelFire (#2993):
    • Duplicate chapter title and serial number removed from chapter body content
    • download_chapter_body header extraction improved (regex + text normalisation)
    • EPUB serial heading logic refactored
  • Source loading on restart — a failure loading one source no longer aborts the full reload cycle
  • Cover download — full error stack trace suppressed for non-critical cover fetch failures
  • PyInstaller packaging (setup_pyi) — fixed a regression in frozen-binary builds
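
For readers unfamiliar with the class of bug behind the path-traversal fix listed above, the following sketch shows one common guard: resolve the requested path and refuse anything that lands outside the static root. It is a generic illustration, not the patch applied in app.py or staticfiles.py, and resolve_static is a hypothetical name.

```python
from pathlib import Path


def resolve_static(root: Path, requested: str) -> Path:
    """Resolve `requested` under `root`, rejecting paths that escape it."""
    base = root.resolve()
    # Strip leading slashes so an absolute-looking request stays under root.
    candidate = (base / requested.lstrip("/")).resolve()
    if candidate != base and base not in candidate.parents:
        raise PermissionError(f"refusing to serve {requested!r}")
    return candidate
```

Resolving before the containment check matters: a check on the raw string would miss sequences such as "css/../../secret.txt", which only reveal their target once ".." segments are collapsed.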

Full Changelog: v4.6.0...v4.7.0

v4.6.0

29 May 14:54

What's Changed

New Features

  • Novel Recommendations — the server now suggests related novels based on what you're reading
  • Machine Translation — full translation service with multiple backends (Bing, Google, Lingva, Baidu) with automatic failover; translates chapter content, chapter titles, and artifacts (EPUB/etc.)
  • Granular translation job types — translation tasks are now split per-resource (chapter, volume, title) instead of one monolithic TRANSLATION job, giving finer progress tracking
  • Referral / invite system — users can invite others via email with a referral link
  • Expanded browser detection — Brave, Vivaldi, Yandex, and Whale are now recognized for app-mode launching alongside Chrome/Edge
  • More supported translation languages
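
The automatic failover between translation backends can be pictured with a sketch like the one below. The backends here are plain callables standing in for real services; the function name, the (name, callable) pairing, and the error handling are assumptions, not the project's translation service.

```python
from typing import Callable, List, Sequence, Tuple

# A backend is a (name, function) pair; the function takes text and a
# target language code and returns the translated text.
Backend = Tuple[str, Callable[[str, str], str]]


def translate_with_failover(text: str, target: str,
                            backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, translation)."""
    errors: List[str] = []
    for name, translate in backends:
        try:
            return name, translate(text, target)
        except Exception as exc:  # any backend error triggers failover
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stand-in backends for demonstration (hypothetical).
def flaky(text: str, target: str) -> str:
    raise ConnectionError("timeout")


def echo(text: str, target: str) -> str:
    return f"[{target}] {text}"
```

Calling translate_with_failover("hello", "es", [("first", flaky), ("second", echo)]) skips the failing backend and returns the result from the second one, along with its name so the caller can log which service answered.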

Improvements

  • Browser automation migrated from Selenium to nodriver for more reliable JS-rendered site scraping
  • Switched to a custom caching layer instead of cachetools for better control
  • Announcement banners improved in the web UI
  • Chapter body cleaning improved when downloading
  • User activity tracking added (page visits, static file downloads)
  • Webview fallback now shows just the terminal when no app-mode browser is found
  • Tightened API access control; auth guards now use Security() instead of Depends()
  • Removed initial content when a language is pre-defined
  • Invitation email subject line updated
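
The notes do not describe the custom caching layer, so the following is only a guess at the kind of small cache that can replace a cachetools dependency: a dictionary with per-entry expiry. The class name, the time-to-live policy, and the injectable clock are all assumptions.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny time-to-live cache; entries expire `ttl` seconds after being set."""

    def __init__(self, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so tests can control time
        self._data: Dict[Hashable, Tuple[float, Any]] = {}

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self._clock() + self._ttl, value)

    def get(self, key: Hashable, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Drop the stale entry lazily, on access.
            del self._data[key]
            return default
        return value
```

Owning a class like this gives direct control over eviction and expiry rules, which is presumably the "better control" the note refers to.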

Bug Fixes

  • Fixed SQLite compatibility issue with migrations (batch_alter_table)
  • Fixed Calibre-based artifact generation when using translations
  • Fixed searching regression
  • Fixed chapter fetch/translate functions not passing user ID correctly
  • Fixed select_descendants typo in security module (#2966)
  • Fixed invalid URL exceptions crashing fetch_chapter and fetch_image
  • Fixed ensure_load crashing when sync thread was already cleaned up
  • Fixed app startup issues

Source Updates

  • wtr-lab.com — multiple fixes and updates
  • novelfire.py — several iterative fixes
  • Chapter title tag removal extended to <h4> elements
  • More sources flagged as rejected/inactive in the index

Internal / Infrastructure

  • lncrawl-web is no longer a git submodule; web build artifacts are bundled directly
  • Removed deprecated fetch-novel API endpoint (replaced by fetch-novels)
  • Python 3.15 excluded from psycopg test matrix (not yet supported upstream)
  • server-compose.yml updated

Full diff: v4.5.0...v4.6.0

v4.5.0

20 May 21:00

What's Changed

Bug Fixes

  • Fix crash when downloading novels with more than 9 volumes (#2970)
  • Fix artifact download failing with 400 Bad Request when filename contains % (#2963)
  • Fix PostgreSQL database connection broken since v4.2.1 (#2981)
  • Fix storage path directory not being created before writing URL in _build_url
  • Add MIME type handling for file responses in the web server
  • Fix browser detection on Flatpak environments
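
The fix for filenames containing % concerns a general URL-encoding pitfall: a literal percent sign in a path starts an escape sequence unless it is itself encoded as %25. The sketch below shows the encoding step with the standard library. It does not claim to be the change made in #2963, and artifact_url is a hypothetical helper.

```python
from urllib.parse import quote, unquote


def artifact_url(base: str, filename: str) -> str:
    """Join a base URL and a filename, percent-encoding the filename."""
    # safe="" also encodes "/" so the filename stays a single path segment.
    return base.rstrip("/") + "/" + quote(filename, safe="")


url = artifact_url("https://example.com/files", "100% Luck.epub")
# The server side can recover the original name with unquote().
original = unquote(url.rsplit("/", 1)[1])
```

Without the encoding step, a server may try to decode "% L" as an escape sequence and reject the request as malformed, which matches the 400 Bad Request symptom described above.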

New Features

  • Windows installer: Added Inno Setup-based installer (lncrawl.exe) for proper install/uninstall on Windows
  • Fallback browser window: When Chrome/Edge is not found, a tkinter window with the app icon is used as fallback
  • Faster Windows startup: Switched to --onedir mode on Windows (vs --onefile on Mac/Linux) for quicker launch
  • Added explicit app subcommand to CLI for launching the webview directly
  • Improved URL building in webview server

Internal Changes

  • Refactored LSP session management and source synchronization logic
  • Enhanced LSP configuration and logging; updated dependencies
  • Fixed ruff format command syntax in lint workflow

Full Changelog: v4.4.0...v4.5.0

v4.4.2

20 May 19:34

Generate source index

v4.4.1

16 May 07:49

Generate source index

v4.4.0

16 May 00:27

What's Changed

New Features

  • LSP server: Implemented a built-in Python Language Server (pylsp) for source code editing, with improved readiness checks and restart logic
  • Source management API: Added API endpoints for source code retrieval, management, and live testing directly from the web UI
  • GitHub integration: Added GitHub token management and enhanced GitHubClient for fetching/editing remote source files; added remote edit link per source
  • Source testing for admins: Admin role check and expanded source testing functionality; non-admins receive a proper error when attempting to run modified source code
  • Domain endpoint: New endpoint to retrieve a source item by domain; extract_host utility for reliable domain extraction in novel creation
  • PageSoup.prettify: Added prettify method to PageSoup for cleaner debug output in crawler tests
  • dev Makefile target: New make dev target added; watch dependency updated
  • Pyright type checking: Added Pyright static analysis to the lint CI workflow

Bug Fixes

  • Fixed app launch inside the webview (#2942 — also fixes webview not starting on Windows, and UV path in Makefile on Windows)
  • Fixed empty chapter bodies produced by the NovelFull template
  • Fixed novelbin and related NovelFull-based sources
  • Fixed chapter list and chapter body parsing in the novelight source
  • Fixed executor initialization in CentralNovelCrawler
  • Fixed port extraction in extract_host when the port value is None
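
A host-extraction helper that tolerates a missing port might look like the sketch below. It is not the project's extract_host: the name extract_host_sketch, the lowercase normalization, and the choice to append the port when one is present are assumptions.

```python
from urllib.parse import urlsplit


def extract_host_sketch(url: str) -> str:
    """Return "host" or "host:port" for a URL, tolerating a missing scheme."""
    # urlsplit only fills in the network location when it sees "//".
    parts = urlsplit(url if "://" in url else "//" + url)
    host = (parts.hostname or "").lower()
    port = parts.port  # None when the URL carries no explicit port
    return host if port is None else f"{host}:{port}"
```

The key detail is the explicit None check: formatting the port unconditionally would produce strings such as "example.com:None" for the common case of a URL without a port.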

Improvements

  • Faster startup: Refactored initialization path to make CLI/server startup significantly faster
  • Chapter sync: ChapterService.sync now preserves is_done flag and merges extras rather than overwriting
  • BrowserTemplate: Fallback browser now runs in headless mode
  • TaskManager: Refactored to manage progress bars internally; removed unused proxy module
  • EPUB metadata: Corrected group position handling in EPUB metadata (#2905)
  • TextCleaner / Webfic: Enhanced text cleaning and Webfic source processing
  • Crawler versioning: Updated versioning logic; process_info now captures commit time
  • PR models: Refactored PR creation models, added PR fetch endpoint, improved error handling and formatting
  • Type hints: Improved type hint consistency across models, config, json_tools, and scripts
  • User index: Optimized user index file handling
  • History limit: Added configurable history limit to project setup
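
The chapter sync behaviour described above (keep is_done, merge extras instead of overwriting) can be sketched with plain dictionaries. The field names is_done and extra follow the release note's wording, but the record layout and the merge_chapter function are hypothetical and not taken from ChapterService.

```python
def merge_chapter(existing: dict, incoming: dict) -> dict:
    """Merge a freshly crawled chapter record into the stored one."""
    # Incoming scalar fields (title, url, ...) win over stored ones.
    merged = {**existing, **incoming}
    # A chapter that was already downloaded stays marked as done.
    merged["is_done"] = bool(existing.get("is_done") or incoming.get("is_done"))
    # Extras are combined key by key; incoming keys win on conflict.
    merged["extra"] = {**existing.get("extra", {}), **incoming.get("extra", {})}
    return merged


stored = {"id": 1, "title": "Old", "is_done": True, "extra": {"a": 1}}
crawled = {"id": 1, "title": "New", "is_done": False, "extra": {"b": 2}}
result = merge_chapter(stored, crawled)
```

A plain overwrite would reset is_done to False here and drop the "a" key, forcing an already downloaded chapter to be fetched again.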

Source Updates

  • royalroad.com — updated (×2)
  • novelcool.com — updated (×2)
  • freewebnovel — updated
  • asianovel.net — updated

Dependency Updates

  • pyease-grpc → 1.8.0
  • mako → 1.3.12 (#2950)
  • Added urllib3 version constraint
  • Updated Dockerfile to sync all extras and groups during build
  • Updated license metadata in pyproject.toml

Full Changelog: v4.3.2...v4.4.0

v4.3.2

06 May 14:26

Full Changelog: v4.3.1...v4.3.2

v4.3.1

06 May 11:35

  • Updated the version from 4.3.0 to 4.3.1.
  • Modified the WebView initialization to persist cookies and storage under the APP_DIR, improving user experience and data management.

Full Changelog: v4.3.0...v4.3.1

v4.3.0

06 May 06:47

What's Changed

  • Refactor core components and enhance crawling functionality by @dipu-bd in #2910
  • Bump mako from 1.3.10 to 1.3.11 by @dependabot[bot] in #2927
  • Bump cryptography from 46.0.6 to 46.0.7 by @dependabot[bot] in #2918
  • fix: fenrirealm.com crawler broken after site migration to SvelteKit by @pathsny in #2928
  • fix: update skydemonorder crawler for Livewire migration by @josegonzalez in #2935
  • Bump lxml from 6.0.2 to 6.1.0 by @dependabot[bot] in #2931
  • Update server configuration and improve database handling
    • Changed the default server port from 8080 to 8181 in the Docker Compose configuration and server command.
    • Enhanced the database connection handling by using engine.begin() for transactions.
    • Updated the database schema verification method to improve clarity and logging.
    • Refactored EPUB generation logic to ensure proper item addition to the book structure.
    • Adjusted the HTML parsing logic in the Freewebnovel template for better selector usage.

Full Changelog: v4.2.1...v4.3.0