Fixes Epub.py and NovelFire.py #2993
Added logic to remove chapter number and duplicate title from chapter content.
Removed the chapter serial display from the HTML output.
Refactor NovelFireCrawler to streamline chapter downloading and novel information extraction.
Reordered import statement and added serial heading to chapter content.
Added a normalization function to standardize text for fuzzy matching, improving chapter title comparison.
Removed the _normalize function and its usage for chapter title normalization.
Refactor download_chapter_body to remove leading chapter titles.
Refactor download_chapter_body to improve header handling and add regex checks for chapter titles.
serial_heading = (
    ""
    if re.search(r"\d", chapter.title)
    else f'<h4 style="opacity: 0.8">#{chapter.serial}</h4>'
You can set the aria-hidden="true" attribute on the h4 instead of hiding it, since your goal is just to keep TTS from reading it out loud.
serial_heading = (
    ""
    if re.search(r"\d", chapter.title)
    else f'<h4 style="opacity: 0.8">#{chapter.serial}</h4>')
I want to show the serial even if the chapter title contains it. Otherwise this breaks consistency: some chapters will have the serial and some won't.
Got ya, that's fine. I can always just edit them out on my own.
This has been stale for a while. As I requested, keep the serial but put the aria-hidden="true" attribute on the h4 tag.
Or you can remove the changes to that file, and we can merge the novelfire fix.
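A minimal sketch of the requested variant, for illustration only: the serial heading is always emitted, but marked aria-hidden="true" so assistive technology can skip it. The function name `serial_heading_hidden` is hypothetical and not taken from epub.py, and whether a particular TTS reader honors aria-hidden is an assumption that would need testing.

```python
def serial_heading_hidden(serial: int) -> str:
    """Build the '#<serial>' heading for every chapter.

    Hypothetical helper: the heading stays visible, and aria-hidden="true"
    asks assistive technology not to announce it.
    """
    return f'<h4 aria-hidden="true" style="opacity: 0.8">#{serial}</h4>'


if __name__ == "__main__":
    print(serial_heading_hidden(12))
```

This keeps every chapter visually consistent, which is the concern raised in the review, while leaving the read-aloud behavior to the reader software.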
* Enhance chapter body download by cleaning content
  Added logic to remove chapter number and duplicate title from chapter content.
* Update novelfire.py
* Update novelfire.py
* Update novelfire.py
* Remove chapter serial from EPUB header
* Update novelfire.py
* Remove chapter serial from HTML output
  Removed the chapter serial display from the HTML output.
* Update novelfire.py
  Refactor NovelFireCrawler to streamline chapter downloading and novel information extraction.
* Update novelfire.py
* Update chapter title removal logic in download_chapter_body
* Update chapter title removal to handle h4 tags
* Refactor epub.py for import order and chapter heading
  Reordered import statement and added serial heading to chapter content.
* Implement text normalization for chapter title matching
  Added a normalization function to standardize text for fuzzy matching, improving chapter title comparison.
* Remove unused _normalize function and related code
  Removed the _normalize function and its usage for chapter title normalization.
* Update novelfire.py
* Refactor chapter body download function
  Refactor download_chapter_body to remove leading chapter titles.
* Update novelfire.py
* Refactor download_chapter_body for header extraction
  Refactor download_chapter_body to improve header handling and add regex checks for chapter titles.
* Fix comments and update logger info formatting
* Refactor serial heading logic in epub.py
* Update epub.py
* Update novelfire.py
* Update novelfire.py
* Update novelfire.py
* Update epub.py
---------
Co-authored-by: Sudipto Chandra <dipu.sudipta@gmail.com>
Removed the # serial number heading from chapters that already have a number in their title. Specifically:
The original code always added a #{chapter.serial} heading above every chapter title. Changed it so it only adds that line if the chapter title contains no numbers. So:
"Chapter 1 Sunny" → has a number → no #1 added
"Sunny" → no number → #1 gets added
That way sites where the title already includes the chapter number won't get the duplicate #1, but sites where the title is just a plain name still get the serial number shown.
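The rule above can be sketched as a small standalone function. This mirrors the `re.search(r"\d", chapter.title)` check from the diff; the function name `serial_heading_for` and its plain `title`/`serial` parameters are assumptions made so the example runs without the project's chapter object.

```python
import re


def serial_heading_for(title: str, serial: int) -> str:
    """Return the '#<serial>' heading only when the title has no digit.

    Hypothetical wrapper around the check shown in the diff.
    """
    if re.search(r"\d", title):
        # The title already carries a number, so skip the serial heading.
        return ""
    return f'<h4 style="opacity: 0.8">#{serial}</h4>'


if __name__ == "__main__":
    print(repr(serial_heading_for("Chapter 1 Sunny", 1)))
    print(repr(serial_heading_for("Sunny", 1)))
```

Note that the check looks for any digit, so a title such as "The 3 Kings" would also suppress the serial even though the digit is not a chapter number.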
Actual fix to NovelFire now removes duplicate titles (won't work if the title repeats 3 times, tested); the last code only worked for some
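For illustration, here is one way leading duplicate titles could be stripped. This is not the code in novelfire.py: it is a sketch that assumes the chapter body is already split into a list of paragraph strings, and the names `_norm` and `strip_leading_titles` are hypothetical. It loops instead of checking a fixed number of lines, so it would also cover a title repeated three times.

```python
import re
from typing import List


def _norm(text: str) -> str:
    # Lowercase and collapse every run of non-alphanumeric ASCII into one space.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def strip_leading_titles(paragraphs: List[str], title: str) -> List[str]:
    """Drop leading paragraphs that repeat the chapter title.

    Assumption: a paragraph counts as a duplicate when it equals the title
    after normalization; other paragraphs are left untouched.
    """
    target = _norm(title)
    i = 0
    # An empty target (e.g. a non-ASCII title) disables stripping entirely.
    while target and i < len(paragraphs) and _norm(paragraphs[i]) == target:
        i += 1
    return paragraphs[i:]


if __name__ == "__main__":
    body = ["Chapter 1 Sunny", "Chapter 1: Sunny", "chapter 1 - sunny", "It was dawn."]
    print(strip_leading_titles(body, "Chapter 1 Sunny"))
```

Exact matching after normalization is deliberately strict: a first paragraph that merely starts with the title is kept, which avoids deleting real content.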