Skip to content

Conversation

@vstavrinov
Copy link
Contributor

Introducing esprima as JS parser. Inspired by discussions in #2425 and #2505
Source code:
https://github.com/Kronuz/esprima-python
Packages for nodejs exists in Debian and other distro while python-esprima should be expected as well.
Here is usage example. The key idea is to iterate list until find type / name pair:

import esprima
js = '''
function a(b) {
    b["c"] = "d"
    return b
}

class e {
    constructor() {
        this.f = {g: "h", c: "k"}
    }
    m(n) {
        var p = this.f
        p["g"] = "r"
        return n(p)
    }
}

s = new e
console.log(s.f)
console.log(s.m(a))

'''
data = ""
parse = esprima.parseScript(js, { "tolerant": True })
body = parse.body
for node in body:
    if node.type == "ClassDeclaration" and node.id.name == "e":
        for node in node.body.body:
            if node.type == "MethodDefinition" and node.key.name == "m":
                for node in node.value.body.body:
                    if node.type == "ExpressionStatement" and \
                        node.expression.left.object.name == "p" and \
                        node.expression.left.property.value == "g":
                        data = node.expression.right.value
                        break
                break
        break

print("data = " + data)

@codecov
Copy link

codecov bot commented Jul 8, 2019

Codecov Report

Merging #2534 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #2534   +/-   ##
=======================================
  Coverage   52.56%   52.56%           
=======================================
  Files         239      239           
  Lines       15118    15118           
=======================================
  Hits         7947     7947           
  Misses       7171     7171

@gravyboat
Copy link
Member

Thanks for bringing this up for discussion @vstavrinov! I'm not super familiar with the various js parser options for Python, so I'd just like to make sure we do our due diligence in selecting one:

  • Looking at the repo there hasn't been an update in almost a year. They also don't allow the creation of issues, is this still actively maintained?
  • Are there any alternatives? What makes esprima the best? Is there anything with a higher adoption rate, better OS support, more active development, etc?
  • How many distros support esprima with official packages for python-esprima?

@vstavrinov
Copy link
Contributor Author

@gravyboat:

They still accept pull-requests. If there are no maintenance we can take it over and make our own fork when it will makes sense.

Other issues we have already discussed. You can add more details if consider it worth of that.

And finally, feel free to propose an alternative and we will discuss Your proposal here.

P.S. As for esprima itself, it looks like standard solution (I have already mentioned this), concern may touch it's python implementation only. I think it worth to take care about this issue.

@back-to
Copy link
Collaborator

back-to commented Jul 20, 2019

is there any real use case for a popular plugin?
(if possible already a finished plugin)

@beardypig
Copy link
Member

I suggested that a utility function for parsing things that are not quite JSON objects more reliable

{
	test: "foo",
}

This is a simple case, but there are some more tricky cases where regex becomes quite fragile.

@vstavrinov
Copy link
Contributor Author

@back-to, It will come when You add the dependency. Who wants to waste time without perspective to be accepted? I have already passed this way, You know for example ovvatv but not only that. And why "popular" only? It could be almost any one of Your choice as often as js comes into game.

@vstavrinov
Copy link
Contributor Author

Here is new plugin as example of esprima usage:
https://github.com/vstavrinov/streamlink/blob/maverick/src/streamlink/plugins/kanalukraina.py

@vstavrinov
Copy link
Contributor Author

One more example:

https://github.com/vstavrinov/streamlink/blob/master/src/streamlink/plugins/arconai.py

This is specific case, where regex based solution seems impossible. Even using html/js parsers insufficient. The problem is that the request of the same html page return different js containing the same type of chunks of target url in different fields of the structure. Yes, it is possbile to create different patterns according to every structure, but their exhaustive list is unknown, so there always are chance that next request fails.

This is interesting but it turned out stream url could be assembled from the javascript structure by means of js beautifier independently of alternating structure. So it is more than only formatter while the parsers used only to locate approriate data. Ultimately workflow is as follow: html parser -> js beautifier -> js parser.

The story conclusion may differ, but there are no guarantee it is unique. Thus it may considered as yet another reason using regular tools for js handling.

@beardypig
Copy link
Member

beardypig commented Apr 3, 2020

I have had a play around with esprima and it seems pretty good for parsing JavaScript objects where regular JSON parsing would fail and modifying with regex is clunky. For example this regex from the bloomberg plugin:

_js_to_json_re = partial(re.compile(r'(\w+):(["\']|\d?\.?\d+,|true|false|\[|{)').sub, r'"\1":\2')

Using esprima I wrote a function that will properly parse the JSON like object in to a Python object. Something that I would like to include in this PR.

Obviously adding this dependency is some work, but I think it worthwhile. It doesn't have any other dependencies so that is a plus. My only concern is that the developer hasn't made any commits for a while...

@brouxco
Copy link
Contributor

brouxco commented Apr 3, 2020

You should also take a look at Kronuz/esprima-python#11

@beardypig
Copy link
Member

@brouxco yeah that is a bit weird, it could be used to host the malicious script on GitHub. The file referred to in that PR is not actually included in the distribution, it only appears to be used for testing. Weird none the less.

Unfortunately esprima does seem to be the best JavaScript lexer for python and it would be a shame for that file to mess it up.

@brouxco
Copy link
Contributor

brouxco commented Apr 4, 2020

Yeah. The fact that the project does not look very active is concerning too as you mentioned.

I don't know if you had a look at ytdl-org/youtube-dl#11292 but they are discussing this since November 2016.
The corresponding pull request is ytdl-org/youtube-dl#11272 (no activity since June 2018)

Adding a dependency could help as long as it is maintained.
It does not look like any JS parser in python is actively maintained. (see other JS parser projects: https://github.com/PiotrDabkowski/pyjsparser, https://github.com/rspivak/slimit)

Quoting: ytdl-org/youtube-dl#11292 (comment)

Python wrappers for those existing engines [SpiderMonkey of Gecko, V8 of Blink, JavascriptCore of Webkit and ChakraCore of IE/Edge, and maybe other JS engines used in popular browsers] are much easier to write and maintain in comparison with a new Python implementation from scratch.

Maybe this is what we should try to do if we deem that having a JS parser is really necessary.

@beardypig
Copy link
Member

The thing with esprima is that it’s a pure python implementation, so there are no dependencies to manage and it appears to be a pretty good le we for modern JavaScript. It should be sufficient for most of Streamlink’s requirements, and I don't think we should be adding anything that might be able to actually execute JavaScript.

'backports.shutil_which;python_version<"3.3"',
'backports.shutil_get_terminal_size;python_version<"3.3"'
'backports.shutil_get_terminal_size;python_version<"3.3"',
'esprima'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should pin to this specific version, or possible a smaller range of versions.

Suggested change
'esprima'
'esprima==4.0.1'

@gravyboat
Copy link
Member

gravyboat commented Apr 28, 2020

@beardypig Esprima is a dead project, why are we even discussing usage of this again? Almost no one is pushing for having a JS parser like this, doesn't seem worth adding.

@beardypig
Copy link
Member

@gravyboat it would help with some things, regex is pretty ugly for some things. It doesn't have to be a fully fledged all singing, all dancing JS parser, we don't want to execute JS - just extract useful bits of information slightly easier. I created a function for transforming JavaScript objects not in JSON to Python dicts (beardypig@edce224) as an example use case.

Do you mean that python-esprima is dead project?

@gravyboat
Copy link
Member

@beardypig Yes I am specifically referring to esprima. I have no issue with using some kind of (light and actively maintained/secure) js parser and I like the util you added.

@beardypig
Copy link
Member

As far as I understand python-esprima is following the release cycle of esprima which the latest version is 4.0.1. python-esprima is a pure python package, with no external dependencies - pretty light.
It doesn't seem any less secure that any other dependency we have, the "shellcode" that was included in the tests (not in the package) was explained, as has since been removed.
I dunno, I like it ...

@vstavrinov
Copy link
Contributor Author

And it is not dead. The author responds to pull requests. So python-esprima is managed.

@vstavrinov
Copy link
Contributor Author

One more new plugin using js parser:

https://github.com/vstavrinov/streamlink/blob/master/src/streamlink/plugins/ukraina24.py

Unfortunately, they all are out of upstream.

@gravyboat
Copy link
Member

Closing as there hasn't been any movement or a decision in months.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants