Skip to content

Reduce memory footprint of program ast by packing booleans into a bit field #5127

@thebanjomatic

Description

@thebanjomatic

Feature Use Case

The ast nodes on the rollup side of things consume a lot of memory for large projects. These are generally speaking already fairly small objects, but in many cases the sheer volume of nodes means that small reductions in object size for commonly appearing nodes can have larger impacts.

For example, on the project I'm currently building, there are 1,309,167 Identifier nodes that are each 128 bytes. Resulting in ~160 MiB of memory for Identifier objects alone (excluding the data they retain through references). Each boolean field on an object currently uses 4 bytes of memory for 1 bit of information.

Looking at the shape of the Identifier node, there are a 3 booleans (included, deoptimized, and isTDZAccess) so we can save 8 bytes (12 - 4) per instance by packing the boolean values into a single 4 byte number field. (Side note: isTDZAccess requires 2-bits of data as it currently is boolean|null to allow us to cache the result if it is currently null).

image

Two of those boolean values exist in the NodeBase and ExpressionEntity base classes, so this optimization reduces the size of all node classes by 4 bytes at a minimum + 4 bytes for every boolean of their own that gets packed into the field.

Additionally, there are a few other things we can do like only setting esTreeNode if it needs to be retained (most of the time it doesn't), or providing this.context through an accessor that that returns return this._context ?? this.parent.context that would allow the value to be overridden by node subclasses, but for the default case would save another 4 bytes for each of these two fields.

I have tested this approach by modifying the rollup js output in node_modules directly, and this appears to have saved about 120mb of AST node data in total according to the heap snapshots.

After this optimization, Identifier takes 96 bytes per instance vs 128 (32 byte reduction resulting in 40 MiB less heap data)
image

This initial investigation only touched ExpressionEntity, NodeBase, Identifier, and MemberExpression, but as mentioned previously, ExpressionEntity and NodeBase are base classes for lots of other node types.

Feature Proposal

The way I implemented this test was to add a number field flags to ExpressionEntity along with a getFlag/setFlag helper function that takes a mask and a value.

Each boolean field/member on the class is then replaced with a getter/setter pair that calls into that helper function using an enum that defines the mask values

class ExpressionEntity {
    protected flags: number = 0;
    protected getFlag(mask: number): boolean {
        return (this.flags & mask) !== 0
    }
    protected setFlag(mask: number, value: boolean): void {
        const original = this.flags;
        this.flags = value ? (original & ~mask) : (original | mask);
    }
    
    protected get included(): boolean {
        return this._getFlag(FLAG_INCLUDED);
    }
    protected set included(value: boolean): void {
        this._setFlag(FLAG_INCLUDED, value);
    }
    
    // ... more code
}

classes that then derive from ExpressionEntity then just need to provide those accessors instead of defining fields on the class. Its a little more verbose, but not too bad:

This:

class NodeBase extends ExpressionEntity {
	protected deoptimized = false;
}

becomes this:

class NodeBase extends ExpressionEntity {
	protected get deoptimized(): boolean {
            return this.getFlag(FLAG_DEOPTIMIZED);
        }
	protected get deoptimized(value: boolean) {
            this.setFlag(FLAG_DEOPTIMIZED, value);
        }

	constructor() {
            // if the previous initializer was !== false then we need to initialize in the constructor,
            // otherwise we can omit this:
            // this.deoptimized = true; 
        }
}

In my test project, I only wound up using 8 bits for the mask, but numbers still take up the full 32 bits. I don't anticipate needing more than 32 bits for this process as there aren't that many more booleans, but technically speaking the sub-type bits can overlap between node types if we need to. I'd prefer to avoid that if we can just because it leads to more confusion.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions