Releases: lncrawl/lightnovel-crawler
v4.8.0
Added
- Job notifications: `JobNotificationService` dispatches email on job state changes (pending → running → success/failure) via a background `TaskManager`; triggered from handler helpers (`_set_running`, `_set_success`, etc.)
- Docker healthcheck: server container now exposes a `/health` probe
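
The notification flow described above can be sketched roughly as follows. This is only an illustration: `JobState`, `Notifier`, and `Job.set_state` are hypothetical names, a thread pool stands in for the project's background `TaskManager`, and an in-memory outbox stands in for email delivery.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILURE = "failure"


class Notifier:
    """Hypothetical stand-in for a notification service."""

    def __init__(self) -> None:
        self.outbox: list[str] = []
        # A single worker keeps messages in submission order.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def notify(self, job_id: str, old: JobState, new: JobState) -> None:
        # Dispatch in the background so the job runner is not blocked.
        self._pool.submit(self._send, f"job {job_id}: {old.value} -> {new.value}")

    def _send(self, message: str) -> None:
        # A real service would send an email here (assumption).
        self.outbox.append(message)

    def close(self) -> None:
        # Wait for all queued notifications to be delivered.
        self._pool.shutdown(wait=True)


class Job:
    def __init__(self, job_id: str, notifier: Notifier) -> None:
        self.id = job_id
        self.state = JobState.PENDING
        self._notifier = notifier

    def set_state(self, new: JobState) -> None:
        old, self.state = self.state, new
        if old is not new:  # notify only on an actual transition
            self._notifier.notify(self.id, old, new)


notifier = Notifier()
job = Job("42", notifier)
job.set_state(JobState.RUNNING)
job.set_state(JobState.SUCCESS)
notifier.close()
```

Keeping delivery off the caller's thread means a slow mail server cannot stall job execution, which is presumably why the release routes notifications through a background task manager.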
Changed
- Job runner refactored into typed handlers: `JobRunner` now dispatches via a `_HANDLER_REGISTRY` of `BaseHandler`/`BatchHandler` subclasses; each job type has its own module under `scheduler/handlers/`
- Web app synced before Docker build: `lncrawl-web` artifacts are pulled in as part of the Docker build step
- `crawler_version` stamped on novel/chapter updates: upserts now use a merge strategy to preserve existing data
Fixed
- Server hangup: root cause of hang addressed (event lock contention / crawler resource leak)
- Server crash: crawler resource leak on shutdown fixed; Docker healthcheck added
- `crawl.py` (#3030): regression in CLI crawl flow corrected
- Torproxy: re-enabled after an unintended regression
Full Changelog: v4.7.0...v4.8.0
v4.7.0
Added
- Background search jobs: novel search is now a proper background job with two new `JobType` values:
  - `SEARCH_SOURCE`: searches a single crawlable source; trigger via `POST /api/job/create/search-sources?domain=…`
  - `SEARCH_ALL_SOURCES`: fans out across every searchable source, spawning one `SEARCH_SOURCE` child per source; idempotent on retry
  - `JobRunner` handles execution: results stored in `job.extra`, matched URLs create a `NOVEL_BATCH` child job
  - New Alembic migration (`add_jobtype`) for PostgreSQL compatibility
- PAUSED job status: new `JobStatus.PAUSED` enum value for finer job lifecycle control
- Per-tier search-job rate limiting: `BASIC` users are capped at 1 concurrent search while the general active-job quota remains independent; search query length validated (2–50 chars); results sorted by match ratio
- NovelFire search: `SEARCH` capability added to `NovelFireCrawler` (#3009)
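
The three search rules listed above (query length bounds, a per-tier concurrency cap, and ordering by match ratio) can be illustrated with the sketch below. The function names are hypothetical, stripping whitespace before validation is an assumption, and `difflib.SequenceMatcher` is used as a stand-in because the notes do not say how the match ratio is computed.

```python
from difflib import SequenceMatcher
from typing import List

MIN_QUERY_LEN, MAX_QUERY_LEN = 2, 50  # bounds taken from the release note


def validate_query(query: str) -> str:
    """Reject queries outside the 2-50 character range."""
    query = query.strip()  # assumption: surrounding whitespace is ignored
    if not MIN_QUERY_LEN <= len(query) <= MAX_QUERY_LEN:
        raise ValueError("search query must be 2-50 characters long")
    return query


def can_start_search(tier: str, running_searches: int) -> bool:
    """Check the per-tier cap on concurrent search jobs."""
    # Only the BASIC cap is documented; other tiers are uncapped here (assumption).
    limits = {"BASIC": 1}
    limit = limits.get(tier)
    return limit is None or running_searches < limit


def sort_by_match(query: str, titles: List[str]) -> List[str]:
    """Order titles from best to worst match against the query."""

    def ratio(title: str) -> float:
        return SequenceMatcher(None, query.lower(), title.lower()).ratio()

    return sorted(titles, key=ratio, reverse=True)
```

Because the search cap is checked separately from the general active-job quota, a `BASIC` user who is already searching can still queue other kinds of jobs.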
Changed
- Removed vendored `lncrawl/cloudscraper`: the embedded Cloudflare-bypass fork (v1/v2/v3 handlers, captcha integrations, JS interpreters, 7,913-line `browsers.json`) has been removed; HTTP scraping is now delegated to the `lncrawl-scraper` package
- JavaScript engine replaced: PyExecJs → quickjs → exejs; lighter dependency, no Node.js or external runtime required
- Proxy support in scraper: the `lncrawl-scraper` integration now supports proxies; `build-essentials` added to Docker base image (#3014)
- BrowserTemplate merged into SoupTemplate: `BrowserTemplate` is integrated directly into the soup template hierarchy rather than being a standalone class; all browser-based sources refactored accordingly
- Job service hardening: event locking and improved update logic in `JobService`; request timeouts in the scraper adjusted
- Docker improvements: faster image builds; updated `compose.yml` and server-compose files; fixed unintended root access in `server-compose`
- truyenfull: updated domain and search URL; `base_url` changed to a list to support multiple domains (#3010)
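
The `base_url` change in the last item can be pictured with a small sketch: a source that declares several base URLs should match a novel URL on any of them. The helper names and the example domains are hypothetical, and the project's actual matching logic may differ.

```python
from typing import List, Union
from urllib.parse import urlsplit


def normalize_base_urls(base_url: Union[str, List[str]]) -> List[str]:
    """Accept either a single base URL or a list of them."""
    return [base_url] if isinstance(base_url, str) else list(base_url)


def handles(base_url: Union[str, List[str]], url: str) -> bool:
    """Return True if the URL's host matches any declared base URL."""
    host = urlsplit(url).hostname
    return any(urlsplit(base).hostname == host for base in normalize_base_urls(base_url))
```

Normalizing to a list first keeps older sources that still declare a single string working unchanged.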
Fixed
- Security: path-traversal / static-file exposure vulnerability fixed in `app.py` and `staticfiles.py` (#3005)
- katreadingcafe: chapter link validation logic corrected to filter out non-chapter URLs (#3026)
- Race condition: parallel search result aggregation could yield inconsistent data under concurrent writes; fixed with proper locking
- EPUB + NovelFire (#2993):
  - Duplicate chapter title and serial number removed from chapter body content
  - `download_chapter_body` header extraction improved (regex + text normalisation)
  - EPUB serial heading logic refactored
- Source loading on restart: a failure loading one source no longer aborts the full reload cycle
- Cover download: full error stack trace suppressed for non-critical cover fetch failures
- PyInstaller packaging (`setup_pyi`): fixed a regression in frozen-binary builds
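
For readers unfamiliar with the path-traversal class of bug named in the security fix, a common guard is to resolve the requested path and verify that it still lies under the static root. The sketch below shows that general technique only; `resolve_static` is a hypothetical helper and is not the code from `app.py` or `staticfiles.py`.

```python
from pathlib import Path


def resolve_static(root: Path, requested: str) -> Path:
    """Map a request path to a file under root, refusing anything outside it."""
    root = root.resolve()
    # Strip leading slashes so an absolute request path cannot replace the root.
    candidate = (root / requested.lstrip("/")).resolve()
    # resolve() collapses ".." segments, so escapes become visible here.
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise PermissionError(f"path escapes static root: {requested!r}")
    return candidate
```

Checking the resolved path, rather than scanning the raw string for `..`, also covers encoded or symlinked escapes that a string check would miss.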
New Contributors
- @GabrielCWT made their first contribution in #3009
- @augustanational made their first contribution in #3010
- @templeofshadow made their first contribution in #3026
Full Changelog: v4.6.0...v4.7.0
v4.6.0
What's Changed
New Features
- Novel Recommendations: the server now suggests related novels based on what you're reading
- Machine Translation: full translation service with multiple backends (Bing, Google, Lingva, Baidu) with automatic failover; translates chapter content, chapter titles, and artifacts (EPUB/etc.)
- Granular translation job types: translation tasks are now split per-resource (chapter, volume, title) instead of one monolithic `TRANSLATION` job, giving finer progress tracking
- Referral / invite system: users can invite others via email with a referral link
- Expanded browser detection: Brave, Vivaldi, Yandex, and Whale are now recognized for app-mode launching alongside Chrome/Edge
- More supported translation languages
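
Automatic failover across translation backends, as mentioned above, usually means trying each backend in order and moving on when one fails. The sketch below assumes a hypothetical `Translator` callable signature of `(text, target_language) -> str`; it does not reflect the project's actual backend interface.

```python
from typing import Callable, List, Sequence

# Hypothetical backend signature: (text, target_language) -> translated text.
Translator = Callable[[str, str], str]


def translate_with_failover(text: str, target: str, backends: Sequence[Translator]) -> str:
    """Return the first successful translation, trying backends in order."""
    errors: List[Exception] = []
    for backend in backends:
        try:
            return backend(text, target)
        except Exception as exc:  # any backend failure triggers failover
            errors.append(exc)
    # Every backend failed: surface the last error as the cause.
    raise RuntimeError(
        f"all {len(backends)} translation backends failed"
    ) from (errors[-1] if errors else None)
```

The order of `backends` doubles as a priority list, so the preferred service goes first and the rest act as fallbacks.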
Improvements
- Browser automation migrated from Selenium to nodriver for more reliable JS-rendered site scraping
- Switched to a custom caching layer instead of `cachetools` for better control
- Announcement banners improved in the web UI
- Chapter body cleaning improved when downloading
- User activity tracking added (page visits, static file downloads)
- Webview fallback now shows just the terminal when no app-mode browser is found
- Tightened API access control; auth guards now use `Security()` instead of `Depends()`
- Removed initial content when a language is pre-defined
- Invitation email subject line updated
Bug Fixes
- Fixed SQLite compatibility issue with migrations (`batch_alter_table`)
- Fixed Calibre-based artifact generation when using translations
- Fixed searching regression
- Fixed chapter fetch/translate functions not passing user ID correctly
- Fixed `select_descendants` typo in security module (#2966)
- Fixed invalid URL exceptions crashing `fetch_chapter` and `fetch_image`
- Fixed `ensure_load` crashing when sync thread was already cleaned up
- Fixed app startup issues
Source Updates
- wtr-lab.com: multiple fixes and updates
- novelfire.py: several iterative fixes
- Chapter title tag removal extended to `<h4>` elements
- More sources flagged as rejected/inactive in the index
Internal / Infrastructure
- `lncrawl-web` is no longer a git submodule; web build artifacts are bundled directly
- Removed deprecated `fetch-novel` API endpoint (replaced by `fetch-novels`)
- Python 3.15 excluded from psycopg test matrix (not yet supported upstream)
- `server-compose.yml` updated
Full diff: v4.5.0...v4.6.0
v4.5.0
What's Changed
Bug Fixes
- Fix crash when downloading novels with more than 9 volumes (#2970)
- Fix artifact download failing with `400 Bad Request` when filename contains `%` (#2963)
- Fix PostgreSQL database connection broken since v4.2.1 (#2981)
- Fix storage path directory not being created before writing URL in `_build_url`
- Add MIME type handling for file responses in the web server
- Fix browser detection on Flatpak environments
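
The release notes do not state the root cause of the `400 Bad Request` fix above. One common way a `%` in a filename breaks a download URL is a missing percent-encoding step, since a bare `%` starts an escape sequence. The sketch below illustrates that general point under that assumption; `artifact_url` and the example host are hypothetical.

```python
from urllib.parse import quote, unquote


def artifact_url(base: str, filename: str) -> str:
    """Build a download URL with the filename percent-encoded."""
    # safe="" also encodes "/" so the filename stays a single path segment.
    return f"{base.rstrip('/')}/{quote(filename, safe='')}"


url = artifact_url("https://host.example/artifacts", "100% Done.epub")
# Decoding the last path segment recovers the original filename.
decoded = unquote(url.rsplit("/", 1)[1])
```

Here the literal `%` becomes `%25`, so the server can decode the path unambiguously.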
New Features
- Windows installer: Added Inno Setup-based installer (`lncrawl.exe`) for proper install/uninstall on Windows
- Fallback browser window: When Chrome/Edge is not found, a tkinter window with the app icon is used as fallback
- Faster Windows startup: Switched to `--onedir` mode on Windows (vs `--onefile` on Mac/Linux) for quicker launch
- Added explicit `app` subcommand to CLI for launching the webview directly
- Improved URL building in webview server
New Sources
- Added novelfrance.fr (#2946)
Updated Sources
- Updated wattpad.com (#2983)
Internal Changes
- Refactored LSP session management and source synchronization logic
- Enhanced LSP configuration and logging; updated dependencies
- Fixed `ruff` format command syntax in lint workflow
Full Changelog: v4.4.0...v4.5.0
v4.4.2
Generate source index
v4.4.1
v4.4.0
What's Changed
New Features
- LSP server: Implemented a built-in Python Language Server (`pylsp`) for source code editing, with improved readiness checks and restart logic
- Source management API: Added API endpoints for source code retrieval, management, and live testing directly from the web UI
- GitHub integration: Added GitHub token management and enhanced `GitHubClient` for fetching/editing remote source files; added remote edit link per source
- Source testing for admins: Admin role check and expanded source testing functionality; non-admins receive a proper error when attempting to run modified source code
- Domain endpoint: New endpoint to retrieve a source item by domain; `extract_host` utility for reliable domain extraction in novel creation
- `PageSoup.prettify`: Added `prettify` method to `PageSoup` for cleaner debug output in crawler tests
- `dev` Makefile target: New `make dev` target added; `watch` dependency updated
- Pyright type checking: Added Pyright static analysis to the lint CI workflow
Bug Fixes
- Fixed app launch inside the webview (#2942; also fixes webview not starting on Windows, and UV path in Makefile on Windows)
- Fixed empty chapter bodies produced by the NovelFull template
- Fixed `novelbin` and related NovelFull-based sources
- Fixed chapter list and chapter body parsing in the `novelight` source
- Fixed executor initialization in `CentralNovelCrawler`
- Fixed port extraction in `extract_host` when the port value is `None`
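
The last fix concerns URLs without an explicit port, where the parsed port is `None`. The sketch below is a hypothetical reimplementation (named `extract_host_sketch` to avoid confusion with the project's `extract_host`) built on the standard library's `urlsplit`; the output format, including whether the port is kept at all, is an assumption.

```python
from urllib.parse import urlsplit


def extract_host_sketch(url: str) -> str:
    """Return 'host' or 'host:port' for a URL or bare domain."""
    if "://" not in url:
        url = "//" + url  # lets urlsplit treat a bare domain as the netloc
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lowercased by urlsplit
    port = parts.port  # None when the URL carries no explicit port
    return host if port is None else f"{host}:{port}"
```

Handling the `None` case explicitly avoids producing strings such as `example.com:None` when the port is formatted blindly.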
Improvements
- Faster startup: Refactored initialization path to make CLI/server startup significantly faster
- Chapter sync: `ChapterService.sync` now preserves `is_done` flag and merges `extras` rather than overwriting
- BrowserTemplate: Fallback browser now runs in headless mode
- TaskManager: Refactored to manage progress bars internally; removed unused proxy module
- EPUB metadata: Corrected group position handling in EPUB metadata (#2905)
- TextCleaner / Webfic: Enhanced text cleaning and Webfic source processing
- Crawler versioning: Updated versioning logic; `process_info` now captures commit time
- PR models: Refactored PR creation models, added PR fetch endpoint, improved error handling and formatting
- Type hints: Improved type hint consistency across models, config, `json_tools`, and scripts
- User index: Optimized user index file handling
- History limit: Added configurable history limit to project setup
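
The chapter sync behaviour described above (keep `is_done`, merge `extras`) can be sketched with plain dictionaries. The `merge_chapter` helper and the record layout are hypothetical; the real `ChapterService.sync` works on database models, and its conflict rules may differ.

```python
def merge_chapter(existing: dict, incoming: dict) -> dict:
    """Merge a freshly crawled chapter record into the stored one."""
    # Incoming fields win for ordinary attributes such as the title.
    merged = {**existing, **incoming}
    # Never let a re-sync reset a chapter that was already downloaded.
    merged["is_done"] = bool(existing.get("is_done")) or bool(incoming.get("is_done"))
    # Merge extras key by key; incoming values win on conflict (assumption).
    merged["extras"] = {**existing.get("extras", {}), **incoming.get("extras", {})}
    return merged


stored = {"id": 1, "title": "Ch 1", "is_done": True, "extras": {"lang": "en"}}
crawled = {"id": 1, "title": "Chapter 1", "is_done": False, "extras": {"views": 10}}
result = merge_chapter(stored, crawled)
```

A plain overwrite would have reset `is_done` to `False` and dropped the `lang` key, forcing the chapter to be downloaded again.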
Source Updates
- `royalroad.com`: updated (×2)
- `novelcool.com`: updated (×2)
- `freewebnovel`: updated
- `asianovel.net`: updated
Dependency Updates
- `pyease-grpc` → `1.8.0`
- `mako` → `1.3.12` (#2950)
- Added `urllib3` version constraint
- Updated Dockerfile to sync all extras and groups during build
- Updated license metadata in `pyproject.toml`
Full Changelog: v4.3.2...v4.4.0
v4.3.2
Full Changelog: v4.3.1...v4.3.2
v4.3.1
- Updated the version from 4.3.0 to 4.3.1.
- Modified the WebView initialization to persist cookies and storage under the APP_DIR, improving user experience and data management.
Full Changelog: v4.3.0...v4.3.1
v4.3.0
What's Changed
- Refactor core components and enhance crawling functionality by @dipu-bd in #2910
- Bump mako from 1.3.10 to 1.3.11 by @dependabot[bot] in #2927
- Bump cryptography from 46.0.6 to 46.0.7 by @dependabot[bot] in #2918
- fix: fenrirealm.com crawler broken after site migration to SvelteKit by @pathsny in #2928
- fix: update skydemonorder crawler for Livewire migration by @josegonzalez in #2935
- Bump lxml from 6.0.2 to 6.1.0 by @dependabot[bot] in #2931
- Update server configuration and improve database handling
- Changed the default server port from 8080 to 8181 in the Docker Compose configuration and server command.
- Enhanced the database connection handling by using `engine.begin()` for transactions.
- Updated the database schema verification method to improve clarity and logging.
- Refactored EPUB generation logic to ensure proper item addition to the book structure.
- Adjusted the HTML parsing logic in the Freewebnovel template for better selector usage.
New Contributors
Full Changelog: v4.2.1...v4.3.0