-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add dependency for JS parser #2534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2534 +/- ##
=======================================
Coverage 52.56% 52.56%
=======================================
Files 239 239
Lines 15118 15118
=======================================
Hits 7947 7947
Misses 7171 7171 |
|
Thanks for bringing this up for discussion @vstavrinov! I'm not super familiar with the various js parser options for Python, so I'd just like to make sure we do our due diligence in selecting one:
|
|
They still accept pull-requests. If there are no maintenance we can take it over and make our own fork when it will makes sense. Other issues we have already discussed. You can add more details if consider it worth of that. And finally, feel free to propose an alternative and we will discuss Your proposal here. P.S. As for esprima itself, it looks like standard solution (I have already mentioned this), concern may touch it's python implementation only. I think it worth to take care about this issue. |
|
is there any real use case for a popular plugin? |
|
I suggested that a utility function for parsing things that are not quite JSON objects more reliable {
test: "foo",
}This is a simple case, but there are some more tricky cases where regex becomes quite fragile. |
|
@back-to, It will come when You add the dependency. Who wants to waste time without perspective to be accepted? I have already passed this way, You know for example ovvatv but not only that. And why "popular" only? It could be almost any one of Your choice as often as js comes into game. |
|
Here is new plugin as example of esprima usage: |
|
One more example: https://github.com/vstavrinov/streamlink/blob/master/src/streamlink/plugins/arconai.py This is specific case, where regex based solution seems impossible. Even using html/js parsers insufficient. The problem is that the request of the same html page return different js containing the same type of chunks of target url in different fields of the structure. Yes, it is possbile to create different patterns according to every structure, but their exhaustive list is unknown, so there always are chance that next request fails. This is interesting but it turned out stream url could be assembled from the javascript structure by means of js beautifier independently of alternating structure. So it is more than only formatter while the parsers used only to locate approriate data. Ultimately workflow is as follow: html parser -> js beautifier -> js parser. The story conclusion may differ, but there are no guarantee it is unique. Thus it may considered as yet another reason using regular tools for js handling. |
|
I have had a play around with _js_to_json_re = partial(re.compile(r'(\w+):(["\']|\d?\.?\d+,|true|false|\[|{)').sub, r'"\1":\2')Using Obviously adding this dependency is some work, but I think it worthwhile. It doesn't have any other dependencies so that is a plus. My only concern is that the developer hasn't made any commits for a while... |
|
You should also take a look at Kronuz/esprima-python#11 |
|
@brouxco yeah that is a bit weird, it could be used to host the malicious script on GitHub. The file referred to in that PR is not actually included in the distribution, it only appears to be used for testing. Weird none the less. Unfortunately esprima does seem to be the best JavaScript lexer for python and it would be a shame for that file to mess it up. |
|
Yeah. The fact that the project does not look very active is concerning too as you mentioned. I don't know if you had a look at ytdl-org/youtube-dl#11292 but they are discussing this since November 2016. Adding a dependency could help as long as it is maintained. Quoting: ytdl-org/youtube-dl#11292 (comment)
Maybe this is what we should try to do if we deem that having a JS parser is really necessary. |
|
The thing with esprima is that it’s a pure python implementation, so there are no dependencies to manage and it appears to be a pretty good le we for modern JavaScript. It should be sufficient for most of Streamlink’s requirements, and I don't think we should be adding anything that might be able to actually execute JavaScript. |
| 'backports.shutil_which;python_version<"3.3"', | ||
| 'backports.shutil_get_terminal_size;python_version<"3.3"' | ||
| 'backports.shutil_get_terminal_size;python_version<"3.3"', | ||
| 'esprima' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should pin to this specific version, or possible a smaller range of versions.
| 'esprima' | |
| 'esprima==4.0.1' |
|
@beardypig Esprima is a dead project, why are we even discussing usage of this again? Almost no one is pushing for having a JS parser like this, doesn't seem worth adding. |
|
@gravyboat it would help with some things, regex is pretty ugly for some things. It doesn't have to be a fully fledged all singing, all dancing JS parser, we don't want to execute JS - just extract useful bits of information slightly easier. I created a function for transforming JavaScript objects not in JSON to Python dicts (beardypig@edce224) as an example use case. Do you mean that |
|
@beardypig Yes I am specifically referring to esprima. I have no issue with using some kind of (light and actively maintained/secure) js parser and I like the util you added. |
|
As far as I understand |
|
And it is not dead. The author responds to pull requests. So python-esprima is managed. |
|
One more new plugin using js parser: https://github.com/vstavrinov/streamlink/blob/master/src/streamlink/plugins/ukraina24.py Unfortunately, they all are out of upstream. |
|
Closing as there hasn't been any movement or a decision in months. |
Introducing esprima as JS parser. Inspired by discussions in #2425 and #2505
Source code:
https://github.com/Kronuz/esprima-python
Packages for nodejs exists in Debian and other distro while python-esprima should be expected as well.
Here is usage example. The key idea is to iterate list until find type / name pair: