fix: Extra content at the end of the document #161
## Why?

XML with multiple root elements is invalid.

See: ruby#160 (comment)
641d9d1 to 4e9de51
## Why?

XML declaration must be the first item.

https://www.w3.org/TR/2006/REC-xml11-20060816/#document

```
[1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog

```
[22] prolog ::= XMLDecl Misc* (doctypedecl Misc*)?
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl

```
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
```

See: ruby#161 (comment)
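To illustrate the rule quoted above, here is a minimal sketch of a check for an XML declaration that is not at the very start of the input. The helper name `misplaced_xml_declaration?` is hypothetical and this is not how REXML implements the check. It is a plain text search, so it assumes the input has no comments or CDATA sections that happen to contain `<?xml `.

```ruby
# Matches the start of an XML declaration: '<?xml' followed by whitespace.
# A processing instruction such as '<?xml-stylesheet' does not match.
XML_DECL_START = /<\?xml\s/

# Hypothetical helper: true when an XML declaration appears anywhere
# other than offset 0 (leading whitespace before it counts as misplaced).
def misplaced_xml_declaration?(source)
  index = source.index(XML_DECL_START)
  !index.nil? && index > 0
end
```

With this sketch, `<a/><?xml version="1.0"?>` is reported as misplaced, while a document that starts with the declaration, or has none at all, is not.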
        @source.position -= "<".bytesize
      end
      if @tags.empty? and @have_root
        if text.strip != ""
`strip` allocates a new string. Can we avoid it?
For example: `/\A\s*\z/.match?(text)`
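A small sketch of the two approaches side by side, for comparison. The helper names `blank_via_strip?` and `blank_via_match?` are made up for illustration and do not exist in REXML. Both give the same answer for ordinary whitespace-only and non-blank text.

```ruby
# Anchored pattern: the whole string must be zero or more whitespace characters.
BLANK_PATTERN = /\A\s*\z/

# String#strip returns a new String, so every call allocates.
def blank_via_strip?(text)
  text.strip == ""
end

# Regexp#match? returns only true or false and creates no MatchData object.
def blank_via_match?(text)
  BLANK_PATTERN.match?(text)
end
```

The condition in the diff would then read `unless /\A\s*\z/.match?(text)` rather than `if text.strip != ""`, assuming `text` is always a String at that point.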
## Why?

XML with additional content at the end of the document is invalid.

https://www.w3.org/TR/2006/REC-xml11-20060816/#document

```
[1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Misc

```
[27] Misc ::= Comment | PI | S
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PI

```
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PITarget

```
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
```
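The productions above say that only `Misc*` (comments, processing instructions, whitespace) may follow the root element. A rough sketch of that rule as a standalone check follows. The helper `misc_only?` and the patterns are hypothetical simplifications, not the REXML implementation: `Name` is approximated loosely and `Char` restrictions are ignored.

```ruby
# Comment: '<!--' ... '-->' with no '--' inside the body (simplified).
COMMENT = /<!--(?:(?!--).)*-->/m

# PI: '<?' target, optional data, '?>'. A target that is exactly "xml"
# (any letter case) is rejected, as in the PITarget production.
PI = /<\?(?![xX][mM][lL](?:\s|\?>))[^\s?<>]+(?:\s(?:(?!\?>).)*)?\?>/m

# Misc* : any sequence of whitespace, comments, and processing instructions.
MISC_ONLY = /\A(?:\s|#{COMMENT}|#{PI})*\z/

# Hypothetical helper: true when the text found after the root element's
# end tag is allowed by the Misc* production.
def misc_only?(trailing)
  MISC_ONLY.match?(trailing)
end
```

Under these assumptions, `misc_only?("\n<!-- c -->\n<?pi data?>\n")` holds, while trailing text, a second element, or a late `<?xml ...?>` declaration does not.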
4e9de51 to c094825
Thanks.
@naitoh @kou After this change, parsing of The main cause is this if statement at lib/rexml/parsers/baseparser.rb:498
@kou Well, I use a socket to get XML messages from the server and parse them using PullParser. Each message is complete and valid. Before the change it worked like a charm. Now it doesn't work anymore.
OK. Could you open a new issue for it?