Skip to content

Add option to escape unescaped ampersands#2218

Merged
gazpachoking merged 18 commits intoFlexget:developfrom
asm0dey:patch-1
Oct 10, 2018
Merged

Add option to escape unescaped ampersands#2218
gazpachoking merged 18 commits intoFlexget:developfrom
asm0dey:patch-1

Conversation

@asm0dey
Copy link
Copy Markdown
Contributor

@asm0dey asm0dey commented Oct 7, 2018

Motivation for changes:

Right now some trackers (e.g. newstudio) do not eascape ampersands with &

Detailed changes:

This commits adds logics to fix that. Option is disabled by default because it adds some processing overhead and needs to go thru whole XML content and then rebuild new XML.

Config usage if relevant (new plugin or updated schema):

tasks:
  Newstudio:
    rss:
      url: '{? newstudio.address ?}'
      escape: yes

Right now some trackers (e.g. newstudio) do not eascape ampersands with `&`
This commits adds logics to fix that. Option is disabled by default because it adds some processing overhead and needs to go thru whole XML content and then rebuild new XML.
elif not in_cdata_block and char == '<' and content[idx:idx+9] == '<![CDATA[':
in_cdata_block = True
elif in_cdata_block and char == ']' and content[idx:idx+3] == ']]>':
should_change = False
Copy link
Copy Markdown
Contributor

@cvium cvium Oct 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really do anything (you probably forgot to change it). You probably want to do it like this instead, since it becomes a little more complex to skip ahead 3 characters while enumerating:

elif in_cdata_block and char == '>' and content[idx-2:idx+1] == ']]>':
     in_cdata_block= False

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for comment and finding performance issue

char = '&amp;'
elif not in_cdata_block and char == '<' and content[idx:idx+9] == '<![CDATA[':
in_cdata_block = True
elif in_cdata_block and char == ']' and content[idx-2:idx+1] == ']]>':
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be char == '>'

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's better to test on ']' because it can be seen in XML not such often as '>'?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're already inside a CDATA block though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's common practice to put XML inside CDATA

@cvium
Copy link
Copy Markdown
Contributor

cvium commented Oct 8, 2018

It would be nice to have a couple of tests though.

@asm0dey
Copy link
Copy Markdown
Contributor Author

asm0dey commented Oct 8, 2018

I would love to write some, but have no idea how to do it

@asm0dey
Copy link
Copy Markdown
Contributor Author

asm0dey commented Oct 8, 2018

I can write tests in naive form, but don't know how to write them in python

we should test that

escape('') == ''
escape('<![CDATA[]]>') == '<![CDATA[]]>'
escape('<![CDATA[&]]>') == ''<![CDATA[&]]>
escape('&amp;') == '&amp;'
escape('&gt;') == '&gt;'
escape('&&gt;&') == '&amp;&gt;&amp;'
escape('<![CDATA[&amp;]]>') == ''<![CDATA[&amp;]]>

And test should work for those strings, which are used inside flexget

@cvium
Copy link
Copy Markdown
Contributor

cvium commented Oct 8, 2018

@asm0dey
Copy link
Copy Markdown
Contributor Author

asm0dey commented Oct 8, 2018

I do not understand ho they work :( How do they get content on localhost?

@cvium
Copy link
Copy Markdown
Contributor

cvium commented Oct 8, 2018

@cvium
Copy link
Copy Markdown
Contributor

cvium commented Oct 8, 2018

You can just create a new test XML-file and write a few tests in the same vein as the existing ones.

Copy link
Copy Markdown
Member

@gazpachoking gazpachoking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should have more luck if you deal only with bytes in this function. Left some notes on how to accomplish that.

@asm0dey
Copy link
Copy Markdown
Contributor Author

asm0dey commented Oct 9, 2018

@gazpachoking is there anything else I can do to get this commit merged?

@gazpachoking gazpachoking merged commit 80fdc98 into Flexget:develop Oct 10, 2018
@asm0dey asm0dey deleted the patch-1 branch October 10, 2018 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants