This may be related to #100 (at least, the issue looks similar):
If you parse an empty string, the .plain_text() version of it is, of course, an empty string. That's good.
>>> parsed = wikitextparser.parse('')
>>> parsed.plain_text()
''
But if you parse text with elements and then remove all those elements, so that you're left with an empty string, .plain_text() throws an exception. That's not so good.
>>> parsed = wikitextparser.parse('<ref>Test</ref>')
>>> for ref in parsed.get_tags('ref'): del ref[:]
...
>>> parsed
WikiText('')
>>> parsed.string
''
>>> parsed.plain_text()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[...]/wikitextparser/_wikitext.py", line 590, in plain_text
    lst = list(parsed.string)
               ^^^^^^^^^^^^^
  File "[...]/wikitextparser/_wikitext.py", line 399, in string
    start, end, _, _ = self._span_data
                       ^^^^^^^^^^^^^^^
AttributeError: 'WikiText' object has no attribute '_span_data'
I poked around the code a little bit; it looks like the warning on _del_update that "this function can cause data loss in self._type_to_spans" isn't kidding. The _type_to_spans on parse('') has [0, 0, None, bytearray(b'')] in its "WikiText" list; in the object that was parsed and then had everything removed, the "WikiText" list is empty. That means the for span_data in tts[self._type] loop in plain_text() iterates over an empty list, so the code falls out of that whole initial if-block without ever setting _span_data on the new parsed object it created.
The fix might be as simple as adding an else block to that for loop that sets some sort of default. Or it might mean ensuring that deleting the only object inside a WikiText object doesn't leave its _type_to_spans['WikiText'] empty.
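To make the first idea concrete, here is a minimal, self-contained sketch of the for/else fallback. It is not wikitextparser's actual code: ToyNode, find_span, and the "start is 0" matching rule are hypothetical stand-ins, and only the names _type_to_spans, _type, and _span_data and the [0, 0, None, bytearray(b'')] entry are taken from what I saw above.

```python
class ToyNode:
    """Hypothetical stand-in for a parsed object; NOT wikitextparser's real class."""

    def __init__(self, type_to_spans, type_name="WikiText"):
        self._type_to_spans = type_to_spans
        self._type = type_name

    def find_span(self):
        # Look for this node's own span entry. The matching rule here
        # (start offset == 0) is a simplification made up for this sketch.
        for span_data in self._type_to_spans[self._type]:
            if span_data[0] == 0:
                break
        else:
            # Runs only if the loop never hit `break`, e.g. when the list
            # is empty. Fall back to a default zero-length span and record it.
            span_data = [0, 0, None, bytearray()]
            self._type_to_spans[self._type].append(span_data)
        self._span_data = span_data
        return span_data


# A "healthy" object, like the one produced by parse('').
healthy = ToyNode({"WikiText": [[0, 0, None, bytearray(b"")]]})
healthy.find_span()

# An object whose "WikiText" list was emptied by deletions.
emptied = ToyNode({"WikiText": []})
emptied.find_span()  # _span_data is set here too, thanks to the else block
```

Whether a default span is actually safe for the real library, or whether the deletion path should preserve the entry instead, is something the maintainers would know better than I do.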