Conversation
|
You want to dynamically enable |
|
I am just looking for more fine-grained control over the parser and no changes to the tokenizer are required. I updated the parser to allow the option to recognize self-closing tags and CDATA while still in "HTML mode". The issue was that "xmlMode" was a single option that was being used to control a lot of different parsing behaviors. In my case, I have HTML files that may include XML constructs such as self-closing tags and CDATA sections. The changes I made will not impact existing code at all because there will only be an impact on parsing if the new options are enabled. As a separate fix, I fixed the bug where the "lowerCaseTags" and "lowerCaseAttributeNames" options were not working as expected. I pushed to the same branch so that commit was merged into this Pull Request. Please let me know if you need more clarification. Thanks for looking into this Pull Request |
|
What I was saying is this: If you don't want dynamic |
|
I'll make the change and add a new commit. Thanks. |
|
I have updated my Pull Request with the suggested change. Please review and let me know if you find any issues. Thanks. |
lib/Parser.js
Outdated
There was a problem hiding this comment.
This fails if options is undefined (have a look at the failing test). Also, I would prefer a solution that accepts all truthy values, not only true (the property should be a bool, though). The in operator is probably the better choice here as well.
|
I think we are close. I updated the code to handle the case where options is null and verified that tests passed. I also updated the code to allow truthy values for the |
|
I reviewed the changes independently and they look good. I like the fine-grain control over XML-like parsing behavior instead of having only a single "xmlMode" flag. The only line that didn't need to be changed was "this._options = options || {};" (which was split into two lines). Are there any other changes that need to be made before this can be merged? |
|
I don't really understand why the |
|
The reason var parser = new htmlparser.Parser(handlers, {
lowerCaseTags: false
});The user would expect the ("lowerCaseTags" in options) === trueThe following expression will result in the correct boolean value: (!!options.lowerCaseTags) === falseFor your second point, why would you want to avoid polymorphic |
lib/Parser.js
Outdated
There was a problem hiding this comment.
this._lowerCaseAttributeNames = "lowerCaseAttributeNames" in this._options ? !!this._options.lowerCaseAttributeNames : !this._options.xmlMode;|
I updated the code to use the Feel free to merge my change and make any additional syntactic changes if you think they're necessary. My end-goal is to have additional control over parsing behavior in HTML parsing mode via the new options. Thanks. |
lib/Parser.js
Outdated
There was a problem hiding this comment.
Just move this to a single line & I'll merge.
|
It would also be great if you could update the wiki page to reflect these changes. |
|
Please review the latest change with the options init code merged into a single line. |
|
Also, I think it would be beneficial to move the documentation on the Wiki into the README so that the documentation is versioned with the code. Thoughts? Would you like me to make that change? |
|
Merged and done! Thanks a lot, @patrick-steele-idem! |
|
It would probably be beneficial to move the documentation to the module, but eg. the cheerio docs link to the page & I doubt it's worth the effort. |
To allow a hybrid approach to parsing HTML code that may have certain XML constructs, a few more options have been added:
truethen self-closing tags will result in the tag being closed even ifxmlModeis not set totruetruethen CDATA text will result in thecontextevent being fired even ifxmlModeis not set totrueAlso, the
lowerCaseTagsandlowerCaseAttributeNamesoptions have been fixed so that case conversion can be disabled in non-XML mode.