BK-2435 ContentCleanUpPlugin sometimes removes valid content #842

Merged
eos87 merged 1 commit into booktype:master from ride90:BK-2435 on Jan 30, 2018
ride90 commented on Jan 30, 2018

Hi @eos87,
This PR is about how sweet lxml is :)

for elem in body.xpath("//*"):

I would expect lxml to walk only through the elements inside body (I hope you expected that too), but it does not. The leading // makes the XPath absolute, so the search starts from the document root (the html element) no matter which element it is called on, which is wrong for us. See the screenshot from my debugger:
[screenshot: 2018-01-30 at 12:09:47]
It shows what body.xpath("//*") returns in our context.
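To make the difference concrete, here is a minimal sketch (not the plugin's actual code). The sample markup and variable names are made up for illustration, and it uses the XML parser from lxml.etree only to keep the tree predictable. It compares the absolute expression with a relative one and with iterdescendants():

```python
from lxml import etree

# Hypothetical sample document; the real plugin works on book chapter HTML.
doc = etree.fromstring(
    "<html><head><title>t</title></head>"
    "<body><p>one <a/>two</p></body></html>"
)
body = doc.find("body")

# "//*" is an absolute path: it matches every element in the document,
# even though it is called on body.
absolute = [el.tag for el in body.xpath("//*")]

# ".//*" is relative to the context node, so only body's descendants match.
relative = [el.tag for el in body.xpath(".//*")]

# iterdescendants() walks the same subtree without using XPath at all.
descendants = [el.tag for el in body.iterdescendants()]

print(absolute)     # ['html', 'head', 'title', 'body', 'p', 'a']
print(relative)     # ['p', 'a']
print(descendants)  # ['p', 'a']
```

Either the relative expression or iterdescendants() should keep the loop inside body; which one the merged commit uses is not shown in this conversation.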

parent.remove(elem)

Here we see another lxml feature. When we execute this command, lxml removes the element, say an empty <a>, together with the text that directly follows it. lxml stores that text as the element's tail, even though it logically belongs to the parent. See the screenshot:
[screenshot: 2018-01-30 at 15:00:39]
In this context elem is a reference to an empty <a> tag inside a <p>. lxml removes the empty tag and the text that follows it within the parent <p>.
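One way to avoid losing that text is to move the tail onto the previous sibling (or onto the parent's text) before removing the element. The sketch below assumes that approach; remove_keep_tail is a hypothetical helper name, not necessarily what the merged commit does.

```python
from lxml import etree


def remove_keep_tail(elem):
    """Remove elem from its parent but keep the text that follows it.

    Hypothetical helper for illustration only.
    """
    parent = elem.getparent()
    if parent is None:
        return  # elem is the root, nothing to remove it from
    tail = elem.tail or ""
    if tail:
        prev = elem.getprevious()
        if prev is not None:
            # Append the text to the previous sibling's tail.
            prev.tail = (prev.tail or "") + tail
        else:
            # No previous sibling: the text joins the parent's leading text.
            parent.text = (parent.text or "") + tail
    parent.remove(elem)


# Plain remove(): the tail text "after" disappears with the empty <a>.
naive = etree.fromstring("<p>before <a/>after</p>")
naive.remove(naive.find("a"))
naive_result = etree.tostring(naive, encoding="unicode")

# Tail-preserving removal keeps "after" inside the <p>.
safe = etree.fromstring("<p>before <a/>after</p>")
remove_keep_tail(safe.find("a"))
safe_result = etree.tostring(safe, encoding="unicode")

print(naive_result)  # <p>before </p>
print(safe_result)   # <p>before after</p>
```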

Please test this PR against the cases ContentCleanUpPlugin was developed for 🍺 🍺 🍺

@eos87 eos87 merged commit 6cd7290 into booktype:master Jan 30, 2018