I am using html2text to reduce the output of plain_text() to actual plain text. It works well, but table markup like the {| ... |} block below is not being removed. I'm not sure whether this is a problem with wikitextparser, with html2text, or user error. Here is a minimal reproducible example, including my workaround hack:
[feacluster@micro wikipedia]$ cat test.py
text = """
==Comparison of green, teal, blue and ultramarine ==
{| class="wikitable sortable" style="width:100%"
|-
!Name
!width=100|Color
!HEX Code
!Red
!Green
!Blue
!Hue
!Sat
!Lum
|[[Ultramarine]] (Electric Ultramarine)
|style = "background-color: #3f00ff; color: #ffffff"|
|#3F00FF
|63
|0
|255
|255°
|100%
|100%
|}
"""
import re

import wikitextparser as wtp
from html2text import html2text as htt

text = wtp.parse(text).plain_text()  # strip wiki markup
text = htt(text)                     # strip any remaining HTML
print(text)
text = re.sub(r'\{[^}]*\}', '', text)  # workaround: erase everything in curly braces
print(text)
[feacluster@micro wikipedia]$ python3 test.py
==Comparison of green, teal, blue and ultramarine == {| class="wikitable
sortable" style="width:100%" |- !Name !width=100|Color !HEX Code !Red !Green
!Blue !Hue !Sat !Lum |Ultramarine (Electric Ultramarine) |style = "background-
color: #3f00ff; color: #ffffff"| |#3F00FF |63 |0 |255 |255° |100% |100% |}
==Comparison of green, teal, blue and ultramarine ==
[feacluster@micro wikipedia]$
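A possible way to make the workaround less brittle, sketched here under stated assumptions: strip the tables while the text still has its line breaks (html2text joins everything onto a few wrapped lines, as the output above shows), and match the table delimiters line by line instead of using {[^}]*}, which stops at the first closing brace and would also erase ordinary braces in the article text. strip_wikitables is a hypothetical helper, not part of wikitextparser or html2text, and it assumes every table opens with {| and closes with |} at the start of a line, which is the usual wikitext layout.

```python
import re

# Hypothetical helper: not part of wikitextparser or html2text.
# Assumption: each table opens with "{|" and closes with "|}" at the start
# of a line (optionally indented), the usual wikitext convention.
_TABLE_OPEN = re.compile(r"\s*\{\|")
_TABLE_CLOSE = re.compile(r"\s*\|\}")


def strip_wikitables(text: str) -> str:
    """Drop every line that belongs to a wikitext table, including nested tables."""
    kept = []
    depth = 0  # number of tables we are currently inside
    for line in text.splitlines():
        if _TABLE_OPEN.match(line):
            depth += 1  # entering a (possibly nested) table
            continue
        if depth and _TABLE_CLOSE.match(line):
            depth -= 1  # leaving the innermost open table
            continue
        if depth == 0:
            kept.append(line)  # ordinary text outside any table
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "==Heading==\n{| class=\"wikitable\"\n|-\n!Name\n|}\nAfter the table."
    print(strip_wikitables(sample))
```

In the script above it could be applied to the raw wikitext first, e.g. text = wtp.parse(strip_wikitables(text)).plain_text(), before calling htt(text). Tables written entirely on one line, or with {| and |} in the middle of a line, are not handled by this sketch.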