Fixes Epub.py and NovelFire.py #2993

Merged
dipu-bd merged 26 commits into lncrawl:dev from TheMr-Fool:dev on May 31, 2026

TheMr-Fool (Contributor) commented on May 22, 2026

Removed the # serial-number heading from chapters that already have a number in their title. Specifically, the original code always added

#{chapter.serial}

above every chapter title. I changed it so that line is only added when the chapter title contains no digits. So:

"Chapter 1 Sunny" → has a number → no #1 added
"Sunny" → no number → #1 gets added

That way, sites whose titles already include the chapter number won't get a duplicate #1, while sites whose titles are just plain names still show the serial number.
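The proposed conditional can be sketched as a standalone function. `serial_heading_for` is a hypothetical name, and it takes the title and serial directly instead of the `chapter` object used in epub.py; the regex and the `<h4>` markup follow the snippet quoted in the review thread.

```python
import re


def serial_heading_for(title: str, serial: int) -> str:
    """Hypothetical helper mirroring the proposed epub.py conditional."""
    # re.search finds a digit anywhere in the title, not only a leading number
    if re.search(r"\d", title):
        return ""  # title already carries a number: skip the "#N" heading
    return f'<h4 style="opacity: 0.8">#{serial}</h4>'


for title in ["Chapter 1 Sunny", "Sunny"]:
    print(repr(title), "->", repr(serial_heading_for(title, 1)))
# 'Chapter 1 Sunny' -> ''
# 'Sunny' -> '<h4 style="opacity: 0.8">#1</h4>'
```

Note that any digit counts, so a title such as "Volume 2: Sunny" also drops the heading.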

The actual NovelFire fix: duplicate titles are now removed (tested: it won't work if the title repeats 3 times). The previous code only worked for some.
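A minimal sketch of stripping repeated titles from the top of a chapter body, assuming the body has already been reduced to a list of paragraph strings (the real crawler works on parsed HTML). `normalize_title` and `strip_leading_titles` are hypothetical names; this is not the merged novelfire.py code.

```python
import re
from typing import List


def normalize_title(text: str) -> str:
    # Collapse runs of whitespace and ignore case, so "chapter 1  sunny"
    # compares equal to "Chapter 1 Sunny".
    return re.sub(r"\s+", " ", text).strip().casefold()


def strip_leading_titles(paragraphs: List[str], title: str) -> List[str]:
    """Drop blank paragraphs and title repeats from the top of the body."""
    wanted = normalize_title(title)
    start = 0
    # Keep skipping while the current top paragraph is empty or repeats the title
    while start < len(paragraphs) and (
        not paragraphs[start].strip() or normalize_title(paragraphs[start]) == wanted
    ):
        start += 1
    return paragraphs[start:]


body = ["Chapter 1 Sunny", "chapter 1  sunny", "", "Chapter 1 Sunny", "It was a bright morning."]
print(strip_leading_titles(body, "Chapter 1 Sunny"))  # ['It was a bright morning.']
```

Because the loop runs until the first paragraph that is neither blank nor a title repeat, the number of leading repeats does not matter; a later paragraph that happens to equal the title is left alone.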

TheMr-Fool and others added 19 commits May 21, 2026 17:41
Added logic to remove chapter number and duplicate title from chapter content.
Removed the chapter serial display from the HTML output.
Refactor NovelFireCrawler to streamline chapter downloading and novel information extraction.
Reordered import statement and added serial heading to chapter content.
Added a normalization function to standardize text for fuzzy matching, improving chapter title comparison.
Removed the _normalize function and its usage for chapter title normalization.
Refactor download_chapter_body to remove leading chapter titles.
Refactor download_chapter_body to improve header handling and add regex checks for chapter titles.
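The commit messages mention regex checks for chapter-title headers, but the pattern itself is not shown on this page. As an illustration only, a hypothetical `CHAPTER_HEADER` pattern that recognises lines such as "Chapter 12: The Gate" and splits off the number and title could look like this:

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern (not the one in novelfire.py): "Chapter", a number,
# an optional ":", "-" or "." separator, then whatever title text follows.
CHAPTER_HEADER = re.compile(r"^\s*chapter\s+(\d+)\b\s*[:\-.]?\s*(.*)$", re.IGNORECASE)


def parse_chapter_header(line: str) -> Optional[Tuple[int, str]]:
    """Return (number, title) when the line looks like a chapter header, else None."""
    match = CHAPTER_HEADER.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


print(parse_chapter_header("Chapter 12: The Gate"))  # (12, 'The Gate')
print(parse_chapter_header("chapter 1 Sunny"))       # (1, 'Sunny')
print(parse_chapter_header("The chapter ended."))    # None
```

Anchoring the match at the start of the line keeps ordinary sentences that merely contain the word "chapter" from being treated as headers.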
Review thread on lncrawl/services/binder/epub.py (outdated):
```python
serial_heading = (
    ""
    if re.search(r"\d", chapter.title)
    else f'<h4 style="opacity: 0.8">#{chapter.serial}</h4>'
)
```

Collaborator:

You can set the aria-hidden="true" attribute on the h4 instead of hiding it, since your goal is just to keep TTS from reading it aloud.

Contributor Author:

```python
serial_heading = (
    ""
    if re.search(r"\d", chapter.title)
    else f'#{chapter.serial}'
)
```

Collaborator:

I want to show the serial even if the chapter title contains it. Otherwise this breaks consistency: some chapters will have the serial and some won't.

Contributor Author:

Gotcha, that's fine. I can always just edit them out on my own.

dipu-bd (Collaborator) commented on May 30, 2026:

This has been stale for a while. As I requested, keep the serial but put the aria-hidden="true" attribute on the h4 tag.

Alternatively, you can revert your changes to this file, and we can merge the NovelFire fix.
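The version requested here (always emit the serial, but mark it aria-hidden) might look like the sketch below. `chapter_header_html` is a hypothetical helper, and wrapping the title in `<h1>` is an assumption about the surrounding markup rather than what epub.py emits. Support for `aria-hidden` in a given reader's read-aloud feature varies, so the TTS benefit is likely rather than guaranteed.

```python
from html import escape


def chapter_header_html(serial: int, title: str) -> str:
    """Hypothetical helper: serial heading on every chapter, hidden from ARIA-aware TTS."""
    # Always emitted, so every chapter looks the same; aria-hidden asks
    # assistive technology to skip the element while it stays visible.
    serial_heading = f'<h4 aria-hidden="true" style="opacity: 0.8">#{serial}</h4>'
    # Titles come from scraped pages, so escape &, <, > and quotes.
    return serial_heading + f"<h1>{escape(title)}</h1>"


print(chapter_header_html(1, "Chapter 1 Sunny"))
```

Unlike the digit check, this keeps the serial on every chapter, which addresses the consistency concern raised earlier in the thread.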

Contributor Author:

done

TheMr-Fool requested a review from dipu-bd on May 25, 2026 21:39

TheMr-Fool (Contributor Author) left a comment:

done

dipu-bd merged commit 4870870 into lncrawl:dev on May 31, 2026
6 checks passed
dipu-bd added a commit that referenced this pull request on Jun 12, 2026
* Enhance chapter body download by cleaning content

Added logic to remove chapter number and duplicate title from chapter content.

* Update novelfire.py

* Update novelfire.py

* Update novelfire.py

* Remove chapter serial from EPUB header

* Update novelfire.py

* Remove chapter serial from HTML output

Removed the chapter serial display from the HTML output.

* Update novelfire.py

Refactor NovelFireCrawler to streamline chapter downloading and novel information extraction.

* Update novelfire.py

* Update chapter title removal logic in download_chapter_body

* Update chapter title removal to handle h4 tags

* Refactor epub.py for import order and chapter heading

Reordered import statement and added serial heading to chapter content.

* Implement text normalization for chapter title matching

Added a normalization function to standardize text for fuzzy matching, improving chapter title comparison.

* Remove unused _normalize function and related code

Removed the _normalize function and its usage for chapter title normalization.

* Update novelfire.py

* Refactor chapter body download function

Refactor download_chapter_body to remove leading chapter titles.

* Update novelfire.py

* Refactor download_chapter_body for header extraction

Refactor download_chapter_body to improve header handling and add regex checks for chapter titles.

* Fix comments and update logger info formatting

* Refactor serial heading logic in epub.py

* Update epub.py

* Update novelfire.py

* Update novelfire.py

* Update novelfire.py

* Update epub.py

---------

Co-authored-by: Sudipto Chandra <dipu.sudipta@gmail.com>

Labels

None yet

Projects

None yet

2 participants