Hi Pygments team
I'll start by saying that I use pygmentize for probably more than I should, but it's of huge value to me. I currently have less aliased to catfilter, which runs pygmentize based on some manual properties (e.g., forcing certain lint formats based on file extension).
I noticed that when format autodetection is on, the filter runs slowly:
time pygmentize -l py generate.py > /dev/null
0.07s user 0.00s system 99% cpu 0.075 total
time pygmentize generate.py >/dev/null
0.41s user 0.03s system 99% cpu 0.436 total
I asked a junior recruit, @Blackwolf499, to see if he could figure out what was taking so long. He determined with strace that a huge number of libraries were being scanned / imported from the Python path. I have a fast SSD, but I suspect this is an IO issue caused by that scan.
Together, we then stepped through the code and found that the slowdown comes from https://github.com/pygments/pygments/blob/master/pygments/plugin.py#L49 , which subsequently calls pkg_resources (possibly part of the standard Python library, I don't know). We were able to derive a PoC which reproduces the issue:
import pkg_resources

group_name = 'pygments.lexers'
for i in pkg_resources.iter_entry_points(group_name):  # This line costs about 130ms
    print(i)
    print(i.load())  # This line costs about 230ms
The timings above are from my 8-core i9 with a Samsung 2TB Pro SSD, with ipython, ipython3 and ipythonconsole installed.
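To make the per-line figures in the PoC easier to reproduce, here is a minimal timing sketch. The helper names `timed` and `measure_entry_points` are made up for illustration and are not part of Pygments or pkg_resources. It times the `pkg_resources` import as its own stage, on the assumption that the import may itself do noticeable work; if the module is already imported, that stage will read close to zero.

```python
import time


def timed(fn):
    """Call fn() once and return (result, elapsed_milliseconds)."""
    start = time.perf_counter()
    result = fn()
    return result, (time.perf_counter() - start) * 1000.0


def measure_entry_points(group_name="pygments.lexers"):
    """Time the import, enumeration and loading stages separately.

    Returns a dict of stage name -> milliseconds, or None if
    pkg_resources is not installed.
    """
    try:
        # Timed separately because the import may itself be expensive.
        pkg_resources, import_ms = timed(lambda: __import__("pkg_resources"))
    except ImportError:
        return None

    # Stage 2: enumerate the entry points registered under the group.
    entry_points, scan_ms = timed(
        lambda: list(pkg_resources.iter_entry_points(group_name))
    )

    def load_all():
        loaded = []
        for ep in entry_points:
            try:
                loaded.append(ep.load())
            except Exception as exc:
                # A broken plugin should not abort the measurement.
                print(f"could not load {ep}: {exc}")
        return loaded

    # Stage 3: actually import the plugin modules.
    _, load_ms = timed(load_all)
    return {"import": import_ms, "enumerate": scan_ms, "load": load_ms}


if __name__ == "__main__":
    print(measure_entry_points())
```

Running it in a fresh interpreter each time gives the most comparable numbers, since a warm filesystem cache and already-imported modules will shrink all three stages.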
I'm unclear on exactly why a filesystem scan has to occur. I suspect it is needed to scan and load the lexers, which must be loaded in order to auto-determine which linting format to use. As such, I'm unsure how to go about fixing this, and I think it might require more context/knowledge about Pygments than I have.
I'm probably not in a position to recommend solutions, but one or more of the following might help:
- some form of caching
- the ability to disable plugin scanning, or to hardcode plugin paths (e.g., some form of manually managed caching, such as pygmentize --plugin-scan > plugins.txt; pygmentize --plugin-scan-path plugins.txt test.py)
- reviewing the actual purpose / necessity of the pkg_resources dependency - I have no understanding of why it's used, but possibly alternatives are faster?
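As a sketch of the last option, entry points can also be enumerated with the standard library's `importlib.metadata` (available since Python 3.8). The function name `iter_plugin_entry_points` is hypothetical, and this is not a claim about how Pygments does or should implement plugin discovery. Whether it is actually faster on a given machine would need to be measured.

```python
from importlib import metadata  # standard library since Python 3.8


def iter_plugin_entry_points(group_name):
    """Return a list of entry points registered under group_name."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the result supports select(group=...).
        return list(eps.select(group=group_name))
    # Python 3.8 / 3.9: the result is a dict of group name -> entry points.
    return list(eps.get(group_name, []))


if __name__ == "__main__":
    for ep in iter_plugin_entry_points("pygments.lexers"):
        # Only names are printed here; ep.load() would import the plugin.
        print(ep.name, "->", ep.value)
```

Enumerating only reads the installed package metadata. The plugin modules are imported only when `ep.load()` is called, so loading could be deferred until a plugin lexer is actually needed.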
I don't have a lot of time personally, but I would really appreciate this being fixed, as it would save me a small amount of time and a huge amount of frustration. Happy to throw a $50 donation or something at the project to see it fixed if that helps :).