AST for JavaScript developers

9 min read · Apr 19, 2018

TL;DR This article is based on a talk I gave recently at the Stockholm ReactJS Meetup. You can check out the slides here: https://www.slideshare.net/BohdanLiashenko/ast-for-javascript-developers

Why Abstract Syntax Tree?

If you check the devDependencies of any modern project, you will see how much the list has grown over the last few years. We have whole groups of tools there: JavaScript transpiling, code minification, CSS pre-processors, ESLint, Prettier, etc. These are JavaScript modules we don’t ship to production, but they play a very important role in our development process. All these tools, one way or another, are built on top of AST processing.

Here is the plan for what I’ll talk about. We will start with what an AST is and how to build one from plain code. Then we will briefly touch on some of the most popular use cases and tools built on top of AST processing. And I am planning to finish by talking about my project js2flowchart, which will be a good demo of what you can build while working with an AST. So, stay with me and let’s get started.

What is Abstract Syntax Tree?

It is a hierarchical program representation that presents source code structure according to the grammar of a programming language; each AST node corresponds to an item of the source code.

Alright. Let’s look at an example.

This is the main idea: from plain text, we get a tree-like data structure. Items in the code map to nodes in the tree.
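To make this concrete, here is a small sketch of what such a tree can look like as plain data. The AST below is written by hand in a simplified ESTree-like shape (real parsers attach more fields, such as source positions), and `collectNodeTypes` is a helper name made up for this illustration.

```javascript
// Hand-written, simplified ESTree-style AST for the code:
//   const answer = 6 * 7;
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "const",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "answer" },
          init: {
            type: "BinaryExpression",
            operator: "*",
            left: { type: "Literal", value: 6 },
            right: { type: "Literal", value: 7 },
          },
        },
      ],
    },
  ],
};

// Walk the tree depth-first and collect the type of every node we meet.
function collectNodeTypes(node, types = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectNodeTypes(child, types));
  } else if (node && typeof node === "object") {
    if (typeof node.type === "string") types.push(node.type);
    Object.values(node).forEach((child) => collectNodeTypes(child, types));
  }
  return types;
}

console.log(collectNodeTypes(ast).join(", "));
```

Every item of the one-line program (the declaration, the variable name, the multiplication, both numbers) shows up as its own node.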

How do we get an AST from plain code? Well, we know compilers have been doing that for a long time. Let’s just look at a typical compiler.

Fortunately, we don’t need to go through all its phases, converting high-level language code to bits. We are interested in lexical and syntax analysis only. These two steps play the main role in generating an AST from code.

First step. The lexical analyzer, also called a scanner, reads a stream of characters (our code) and combines them into tokens using defined rules. It also removes whitespace characters, comments, etc. In the end, the entire string of code is split into a list of tokens.

When the lexical analyzer reads the source code, it scans it character by character; when it encounters whitespace, an operator symbol, or a special symbol, it decides that a word is complete.
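The scanning loop described above can be sketched in a few lines. This is a toy scanner, not the one any real parser uses: the token types (`number`, `name`, `operator`, `punctuation`) are names chosen for this example, and it only skips whitespace, not comments.

```javascript
// Minimal scanner sketch: reads the code character by character and
// groups characters into tokens, skipping whitespace along the way.
function tokenize(code) {
  const tokens = [];
  let i = 0;
  while (i < code.length) {
    const ch = code[i];
    if (/\s/.test(ch)) {
      i++; // whitespace ends the current word and is thrown away
      continue;
    }
    if (/[0-9]/.test(ch)) {
      let value = "";
      while (i < code.length && /[0-9]/.test(code[i])) value += code[i++];
      tokens.push({ type: "number", value });
      continue;
    }
    if (/[A-Za-z_$]/.test(ch)) {
      let value = "";
      while (i < code.length && /[A-Za-z0-9_$]/.test(code[i])) value += code[i++];
      tokens.push({ type: "name", value });
      continue;
    }
    if ("+-*/=".includes(ch)) {
      tokens.push({ type: "operator", value: ch });
      i++;
      continue;
    }
    if ("();".includes(ch)) {
      tokens.push({ type: "punctuation", value: ch });
      i++;
      continue;
    }
    throw new SyntaxError(`Unexpected character "${ch}" at position ${i}`);
  }
  return tokens;
}

const tokens = tokenize("total = (a + 42) * 2;");
console.log(tokens.map((token) => `${token.type}:${token.value}`).join(" "));
```

The result is a flat list: no nesting yet, just the words of the program in order.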

Second step. The syntax analyzer, also called a parser, takes the plain list of tokens produced by lexical analysis and turns it into a tree representation, validating the language syntax and throwing syntax errors if it finds any.

While generating a tree, some parsers omit unnecessary tokens (redundant parentheses, for example), so they create an ‘Abstract Syntax Tree’: it does not match the code 100%, but it is enough to know how to deal with it. On the other hand, parsers that fully cover all code structure generate a tree called a ‘Concrete Syntax Tree’.
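Here is a minimal sketch of that second step for a tiny arithmetic language (numbers, `+ - * /` and parentheses). It is a hand-rolled recursive descent parser written for this illustration, not how Babylon or any production parser is implemented. Note how parentheses steer the parsing but leave no node of their own in the resulting tree, which is what makes it ‘abstract’.

```javascript
// Tiny regex-based scanner, just enough to feed the parser below.
function tokenize(code) {
  return code.match(/\d+|[-+*/()]/g) || [];
}

// Recursive descent parser: turns the flat token list into a tree.
function parse(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  // expression := term (("+" | "-") term)*
  function parseExpression() {
    let node = parseTerm();
    while (peek() === "+" || peek() === "-") {
      node = { type: "BinaryExpression", operator: next(), left: node, right: parseTerm() };
    }
    return node;
  }

  // term := factor (("*" | "/") factor)*
  function parseTerm() {
    let node = parseFactor();
    while (peek() === "*" || peek() === "/") {
      node = { type: "BinaryExpression", operator: next(), left: node, right: parseFactor() };
    }
    return node;
  }

  // factor := number | "(" expression ")"
  function parseFactor() {
    const token = next();
    if (token === undefined) throw new SyntaxError("Unexpected end of input");
    if (token === "(") {
      const inner = parseExpression();
      if (next() !== ")") throw new SyntaxError('Expected ")"');
      return inner; // the parentheses themselves are dropped
    }
    if (/^\d+$/.test(token)) return { type: "Literal", value: Number(token) };
    throw new SyntaxError(`Unexpected token ${token}`);
  }

  const ast = parseExpression();
  if (pos < tokens.length) throw new SyntaxError(`Unexpected token ${peek()}`);
  return ast;
}

console.log(JSON.stringify(parse(tokenize("(1 + 2) * 3")), null, 2));
```

With this sketch, `(1 + 2) * 3` and `((1 + 2)) * 3` produce the same tree, while an unbalanced input such as `(1 + 2` throws a `SyntaxError`.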

Want to learn more about compilers?

The-super-tiny-compiler. You can start with this repo. It’s a super-simplified example of all the major pieces of a compiler, written in JavaScript. It has something like 200 lines of actual code, and the idea behind it is to compile Lisp-style function calls into C-style ones. All the code is covered with comments and explanations.

https://github.com/jamiebuilds/the-super-tiny-compiler

LangSandbox. One more nice project to check out. It illustrates how to build a programming language. There is a list of articles, or a book if you prefer, on how to do that. It goes a bit further: instead of compiling Lisp to C (as in the previous example), here you can write your own language, compile it to C/bytecode and execute it afterwards.

https://github.com/ftomassetti/LangSandbox

Can I just use a library? Sure, there are plenty of libraries. You can visit astexplorer and pick the one you prefer. There is a live editor where you can play with AST parsers. It also covers many other languages besides JavaScript.

I want to highlight one of them in particular, which in my opinion is a really good one, called Babylon.

It’s used in Babel, and maybe that is the reason for its popularity. Because it’s backed by the Babel project, you can expect it to always be up to date with new JS features, which we have been getting pretty often over the last few years. So, when we get the next thing like ‘asynchronous iteration’ (whatever), this parser will not tell you ‘Unexpected token’. Also, it has quite a good API and is easy to use in general.

Ok, now you know how to generate an AST from code. Let’s move on to real-life use cases.

The first use case I want to talk about is code transpiling and, obviously, Babel.

Babel is not a ‘tool for having ES6 support’. Well, it is, but that is far from all it is about.

Many associate Babel with support for ES6/7/8 features. And, in fact, that’s why we often use it. But it’s just one group of plugins. We can also use it for code minification, for React-related syntax transpiling (JSX, for example), for Flow plugins, etc.

Babel is a JS compiler. At a high level, it runs code through 3 stages: parsing, transforming and generation. You give Babel some JavaScript code, it modifies the code and generates new code back out. How does it modify the code? Exactly! It builds an AST, traverses it, modifies it based on the plugins applied, and then generates new code from the modified AST.

Let’s see this in a simple code example.

As I’ve mentioned before, Babel uses Babylon, so the first thing we do is parse the code, then we traverse the AST and reverse all variable names. The final step is to generate code. Done. As you can see, the 1st (parsing) and 3rd (code generation) stages look pretty generic; that’s what you will do each time. So Babel takes care of them, because the only thing you are really interested in is the AST transformation.

When you are developing a Babel plugin, you just describe node “visitors” which will transform your AST.

Add it to your Babel plugins list, set up babel-loader in your webpack config and here we go, easy peasy.

You can check out the Babel handbook if you would like to learn more about how to build your first Babel plugin.
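The three stages and the visitor idea can be sketched without any dependencies. This is an illustration of the concept only, not Babel’s API: `traverse`, `generate` and `reverseNamesVisitor` are written for this example, the parsing stage is replaced by a hand-written AST, and a real Babel visitor receives a `path` object rather than the bare node.

```javascript
// Stage 1 (parsing) is stubbed: this is an AST a parser could produce for
//   total = price * count;
const ast = {
  type: "ExpressionStatement",
  expression: {
    type: "AssignmentExpression",
    operator: "=",
    left: { type: "Identifier", name: "total" },
    right: {
      type: "BinaryExpression",
      operator: "*",
      left: { type: "Identifier", name: "price" },
      right: { type: "Identifier", name: "count" },
    },
  },
};

// Stage 2 (transforming): walk the tree and call the visitor that is
// registered for each node type, if there is one.
function traverse(node, visitors) {
  if (!node || typeof node !== "object") return;
  if (Array.isArray(node)) {
    node.forEach((child) => traverse(child, visitors));
    return;
  }
  const visit = visitors[node.type];
  if (visit) visit(node);
  Object.values(node).forEach((child) => traverse(child, visitors));
}

// The "plugin": it only describes what to do with the nodes it cares about.
const reverseNamesVisitor = {
  Identifier(node) {
    node.name = node.name.split("").reverse().join("");
  },
};

// Stage 3 (generation): turn the modified tree back into a code string.
// Only the node types used above are supported.
function generate(node) {
  switch (node.type) {
    case "ExpressionStatement":
      return generate(node.expression) + ";";
    case "AssignmentExpression":
    case "BinaryExpression":
      return `${generate(node.left)} ${node.operator} ${generate(node.right)}`;
    case "Identifier":
      return node.name;
    default:
      throw new Error(`Cannot generate code for ${node.type}`);
  }
}

traverse(ast, reverseNamesVisitor);
console.log(generate(ast));
```

The traversal and the generator are generic; the only part specific to this transformation is the small visitor object, which is why a plugin can be so short.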

https://github.com/jamiebuilds/babel-handbook

Let’s move on. The next use case I want to mention is automated code refactoring and jscodeshift.

Let’s say you want to replace all old-fashioned anonymous functions with short and nice arrow functions.

Your code editor most likely will not be able to do that, because it’s not a simple find-and-replace operation. That’s where jscodeshift comes into play.

If you hear ‘jscodeshift’, you will most likely hear it together with ‘codemods’, which can be confusing at first. jscodeshift is a toolkit for running “codemods”. A “codemod” is code that describes what transformation should be made to the AST. So the idea is really similar to Babel and its plugins.

And you can see it looks almost like a Babel-plugin.
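To show why this is more than find-and-replace, here is a dependency-free sketch of the transformation itself. It does not use the jscodeshift API; `toArrowFunctions` and `usesThis` are names invented for this example, and the input ASTs are written by hand in a simplified ESTree-like shape. The sketch only guards against `this`; a real codemod would also need to think about `arguments`, generators, and functions used as constructors.

```javascript
// Does this subtree use `this`? Nested regular functions get their own
// `this`, so we do not look inside them.
function usesThis(node) {
  if (!node || typeof node !== "object") return false;
  if (Array.isArray(node)) return node.some(usesThis);
  if (node.type === "ThisExpression") return true;
  if (node.type === "FunctionExpression" || node.type === "FunctionDeclaration") return false;
  return Object.values(node).some(usesThis);
}

// Return a copy of the tree where anonymous function expressions that
// do not rely on `this` are replaced with arrow functions.
function toArrowFunctions(node) {
  if (!node || typeof node !== "object") return node;
  if (Array.isArray(node)) return node.map(toArrowFunctions);
  const copy = {};
  for (const [key, value] of Object.entries(node)) copy[key] = toArrowFunctions(value);
  const isAnonymous = copy.type === "FunctionExpression" && !copy.id;
  if (isAnonymous && !usesThis(copy.body)) {
    return { type: "ArrowFunctionExpression", params: copy.params, body: copy.body };
  }
  return copy;
}

// function (x) { return x * 2; }
const doubleFn = {
  type: "FunctionExpression",
  id: null,
  params: [{ type: "Identifier", name: "x" }],
  body: {
    type: "BlockStatement",
    body: [{
      type: "ReturnStatement",
      argument: {
        type: "BinaryExpression",
        operator: "*",
        left: { type: "Identifier", name: "x" },
        right: { type: "Literal", value: 2 },
      },
    }],
  },
};

// function () { return this.value; }
const getValueFn = {
  type: "FunctionExpression",
  id: null,
  params: [],
  body: {
    type: "BlockStatement",
    body: [{
      type: "ReturnStatement",
      argument: {
        type: "MemberExpression",
        object: { type: "ThisExpression" },
        property: { type: "Identifier", name: "value" },
      },
    }],
  },
};

console.log(toArrowFunctions(doubleFn).type);
console.log(toArrowFunctions(getValueFn).type);
```

The first function is safe to convert; the second is left alone because an arrow function would change what `this` refers to. A text-based search cannot tell these two cases apart, while a check on the tree can.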

So, if you want to create an ‘automated way’ to migrate your codebase to a new framework version, this is the way to go. For example, here is the React 16 prop types refactoring.

Hope everyone has already migrated to version 16 ;)

Try it out. There are many different codemods already created, so you can save yourself some time by not doing that manually.

https://github.com/facebook/jscodeshift; https://github.com/reactjs/react-codemod

The last use case I want to mention briefly is Prettier, because probably everyone uses it in their daily work.

Prettier formats our code. It will break long lines, clean up spaces, brackets and so on. So it will take the code as input and return modified code as output. Sounds familiar already, right? Exactly.

The idea is still the same. First, take the code and generate an AST. Then the actual magic of Prettier happens. The AST is converted to an ‘intermediate representation’, or ‘Doc’. At a high level, AST nodes are extended with information about how they relate to one another in terms of formatting; for example, the list of parameters for a function should be treated as a group of items. So if the list is long and doesn’t fit on one line, each parameter is broken onto a separate line, etc. Then the main algorithm, called the ‘printer’, goes through the IR and, based on the whole picture, decides how to format the code.

Again, if you want to learn more of the theory behind pretty printing, which is actually not as simple as it may look, there is a paper you can dive into.
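The ‘group that either fits on one line or breaks entirely’ idea can be sketched like this. It is a heavily simplified take on the approach, not Prettier’s actual Doc format or printer: the builders `group`, `indent`, `line`, `softline` and the helper `paramsDoc` are defined here for illustration, and the fit check only looks at the group itself, not at what follows it on the line.

```javascript
// Doc builders. A line break prints as `flat` text when its group fits
// on one line, and as a newline plus indentation when the group breaks.
const line = { type: "line", flat: " " };
const softline = { type: "line", flat: "" };
const group = (contents) => ({ type: "group", contents });
const indent = (contents) => ({ type: "indent", contents });

// Width of a doc if everything were printed on a single line.
function flatWidth(doc) {
  if (typeof doc === "string") return doc.length;
  if (Array.isArray(doc)) return doc.reduce((sum, part) => sum + flatWidth(part), 0);
  if (doc.type === "line") return doc.flat.length;
  return flatWidth(doc.contents);
}

// The printer: each group is printed flat if it fits in the remaining
// width, otherwise all of its line breaks become real newlines.
function print(doc, maxWidth) {
  let out = "";
  let column = 0;
  function emit(part, indentLevel, broken) {
    if (typeof part === "string") {
      out += part;
      column += part.length;
    } else if (Array.isArray(part)) {
      part.forEach((child) => emit(child, indentLevel, broken));
    } else if (part.type === "line") {
      if (broken) {
        out += "\n" + " ".repeat(indentLevel);
        column = indentLevel;
      } else {
        out += part.flat;
        column += part.flat.length;
      }
    } else if (part.type === "indent") {
      emit(part.contents, indentLevel + 2, broken);
    } else if (part.type === "group") {
      emit(part.contents, indentLevel, column + flatWidth(part) > maxWidth);
    }
  }
  emit(doc, 0, false);
  return out;
}

// Build the doc for a function signature: the parameter list is one group.
function paramsDoc(name, params) {
  const list = params.flatMap((param, i) => (i === 0 ? [param] : [",", line, param]));
  return group([`function ${name}(`, indent([softline, ...list]), softline, ")"]);
}

console.log(print(paramsDoc("add", ["a", "b"]), 40));
console.log(print(paramsDoc("add", ["firstOperand", "secondOperand", "options"]), 30));
```

The short signature stays on one line, while the long one gets each parameter on its own indented line. The formatting decision is made per group, based on available width, rather than being hard-coded into the tree.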

http://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf

Here we go. The last thing I want to mention today is my library called js2flowchart (4.2k stars on GitHub).

It does exactly what its name says: it takes JS code and generates an SVG flowchart.

And it is a good example because it shows that you can do whatever you want once you have an AST representation of code. It’s not necessary to put it back into a code string; you can draw a flowchart of it, or whatever else you want.

What is the use case? You can explain/document your code with flowcharts, learn other people’s code through visual understanding, or create flowcharts for any process that can be described with valid JS syntax.

The simplest way to try it right now is to go to the live editor.

https://bogdan-lyashenko.github.io/js-code-to-svg-flowchart/docs/live-editor/index.html

Try it out. You can also use it from code, or via its CLI, so you can just point to the file from which you want to generate an SVG file. There is also a VS Code extension (link in the readme).

So, what else can it do? Well, first of all, apart from just generating one big scheme of all the code, you can specify an abstraction level for how detailed the scheme should be.

This means, for example, that you can draw only what a module exports, or only class definitions, or function definitions and their calls. Then you can generate a presentation where each slide is a deeper dive into the details.

There is also a bunch of handy tools for modifying the tree. For example, you can see here a ‘.forEach’ method call, which is just a method call, but we can specify that such calls should be treated as loops. We know forEach is a loop, so let’s render it like a loop right away.

Alright, how does it work under the hood?

First, the code is parsed to an AST. Then we traverse the AST and generate another tree, which I called FlowTree. It omits a lot of small, unimportant tokens but puts together key blocks, like functions, loops, conditions, etc. After that, we traverse FlowTree and create ShapesTree from it. Each node of ShapesTree contains information about its visual type, positioning, connections in the tree, etc. In the final step, we go through all shapes and generate an SVG representation for each of them, combining everything into one SVG file.
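The pipeline above can be sketched in miniature. To be clear, this is not js2flowchart’s code or API: the function names, the set of ‘key’ node types, and the shape fields are all hypothetical, the shapes are kept as a flat list rather than a tree, and the input AST is written by hand.

```javascript
// Hypothetical mapping from AST node types to flowchart block kinds.
const KEY_BLOCKS = new Map([
  ["FunctionDeclaration", "function"],
  ["ForStatement", "loop"],
  ["WhileStatement", "loop"],
  ["IfStatement", "condition"],
]);

// AST -> flow tree: keep only key blocks, drop everything else but
// keep looking inside dropped nodes for nested key blocks.
function toFlowTree(node) {
  if (!node || typeof node !== "object") return [];
  if (Array.isArray(node)) return node.flatMap(toFlowTree);
  const children = Object.values(node).flatMap(toFlowTree);
  const kind = KEY_BLOCKS.get(node.type);
  return kind ? [{ kind, children }] : children;
}

// Flow tree -> shapes: give every block a position; nesting depth
// becomes horizontal offset, order becomes vertical offset.
function toShapes(flowNodes, depth = 0, shapes = []) {
  for (const flowNode of flowNodes) {
    shapes.push({
      kind: flowNode.kind,
      x: 20 + depth * 40,
      y: 20 + shapes.length * 50,
      width: 120,
      height: 30,
    });
    toShapes(flowNode.children, depth + 1, shapes);
  }
  return shapes;
}

// Shapes -> SVG: render each shape and combine them into one document.
function toSvg(shapes) {
  const body = shapes
    .map(
      (s) =>
        `<rect x="${s.x}" y="${s.y}" width="${s.width}" height="${s.height}" fill="none" stroke="black"/>` +
        `<text x="${s.x + 8}" y="${s.y + 20}">${s.kind}</text>`
    )
    .join("");
  const height = 40 + shapes.length * 50;
  return `<svg xmlns="http://www.w3.org/2000/svg" width="400" height="${height}">${body}</svg>`;
}

// Hand-written AST: a function containing a loop containing a condition.
const ast = {
  type: "Program",
  body: [{
    type: "FunctionDeclaration",
    id: { type: "Identifier", name: "sumPositive" },
    body: {
      type: "BlockStatement",
      body: [{
        type: "ForStatement",
        body: {
          type: "BlockStatement",
          body: [{ type: "IfStatement", consequent: { type: "BlockStatement", body: [] } }],
        },
      }],
    },
  }],
};

console.log(toSvg(toShapes(toFlowTree(ast))));
```

Each stage only knows about the tree it consumes and the tree it produces, which is what makes it possible to insert extra steps in between, such as treating certain calls as loops.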

Check it out https://github.com/Bogdan-Lyashenko/js-code-to-svg-flowchart.

If you liked this post and want to have updates about my next articles, please follow me on twitter @bliashenko!

Published in ITNEXT

Written by Bohdan Liashenko
