
GC AST nodes #214

@MichaReiser

Red Knot currently loads and holds on to the AST and source code of every analyzed file. For large projects, this can easily take up multiple GB of memory.

Persistent caching, which removes the need to analyze all files, should help with overall memory consumption, but it won't help on the initial run or when many files have changed.

Salsa plans to add support for LRU garbage collection (or may already have it). It might do exactly what we need, but I'm not sure whether it is limited to collecting stale values between revisions. I also suspect that it won't help in our case: collecting the cached results for parsed_modules isn't sufficient, because DefinitionKind and ExpressionKind hold on to an Arc-wrapped AST and the tracked structs don't get collected.
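To illustrate the concern, here is a minimal sketch using only the standard library. It is not Salsa's API: the `Ast` and `Definition` types and the `HashMap` cache are hypothetical stand-ins for the parsed module, `DefinitionKind`, and the query cache. It shows that evicting the cache entry frees the AST only if nothing else still holds a strong `Arc` to it.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Weak};

/// Hypothetical stand-in for a parsed module's AST.
struct Ast {
    #[allow(dead_code)]
    node_count: usize,
}

/// Hypothetical stand-in for a tracked struct (like `DefinitionKind`)
/// that keeps the whole AST alive through a strong `Arc`.
struct Definition {
    #[allow(dead_code)]
    ast: Arc<Ast>,
}

/// Evicts the AST from a simulated cache and reports whether the AST
/// is still allocated afterwards.
fn ast_survives_eviction(definition_alive: bool) -> bool {
    let ast = Arc::new(Ast { node_count: 3 });
    // A weak handle lets us observe deallocation without keeping the AST alive.
    let probe: Weak<Ast> = Arc::downgrade(&ast);

    let mut cache = HashMap::new();
    cache.insert("a.py", Arc::clone(&ast));

    // Optionally create a definition that holds its own strong reference.
    let definition = definition_alive.then(|| Definition { ast: Arc::clone(&ast) });

    drop(ast);
    cache.remove("a.py"); // simulated LRU eviction of the cached parse result

    let alive = probe.upgrade().is_some();
    drop(definition);
    alive
}
```

With `definition_alive` set to `true` the function returns `true`, meaning the eviction reclaimed nothing; with `false` it returns `false`.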

One option would be a lazy representation of AstNodeRef (WeakAstNodeRef) that stores an identifier uniquely identifying the node, a weak Arc to the AST, and the node's reference. Resolving the node would use the AST as is if the Arc is still materialized. If the Arc has been collected, it would re-parse the file and find the node in the tree again, which could be expensive.
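A rough sketch of what this could look like, under several assumptions: `Ast`, `parse`, `NodeId`, and `ResolvedNode` are hypothetical placeholders (the "parser" just splits on whitespace), the node identifier is assumed to be a stable index into the tree, and the cached node reference is left out so the sketch stays in safe Rust and looks the node up by id instead.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical stand-in for a module's AST: a flat list of "nodes".
struct Ast {
    nodes: Vec<String>,
}

/// Hypothetical parser: one node per whitespace-separated token.
fn parse(source: &str) -> Arc<Ast> {
    Arc::new(Ast {
        nodes: source.split_whitespace().map(str::to_owned).collect(),
    })
}

/// Assumed to identify the same node across re-parses of unchanged source.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

/// Keeps the AST alive for as long as the resolved node is in use.
struct ResolvedNode {
    ast: Arc<Ast>,
    id: NodeId,
}

impl ResolvedNode {
    /// Looks the node up by id; panics if the id is out of range.
    fn node(&self) -> &str {
        &self.ast.nodes[self.id.0]
    }
}

/// Lazy node reference: holds the AST only weakly, so it can be dropped.
struct WeakAstNodeRef {
    id: NodeId,
    ast: Mutex<Weak<Ast>>,
}

impl WeakAstNodeRef {
    fn new(ast: &Arc<Ast>, id: NodeId) -> Self {
        Self { id, ast: Mutex::new(Arc::downgrade(ast)) }
    }

    /// Returns the resolved node and whether a re-parse was required.
    fn resolve(&self, source: &str) -> (ResolvedNode, bool) {
        let mut weak = self.ast.lock().unwrap();
        if let Some(ast) = weak.upgrade() {
            // Fast path: the AST is still materialized, use it as is.
            return (ResolvedNode { ast, id: self.id }, false);
        }
        // Slow path: the AST was collected, so re-parse and remember it weakly.
        let ast = parse(source);
        *weak = Arc::downgrade(&ast);
        (ResolvedNode { ast, id: self.id }, true)
    }
}
```

In this sketch, resolving while any strong `Arc` to the AST is alive is cheap. Once the last one is dropped, the next `resolve` pays for a full re-parse, and the new AST stays alive only as long as the returned `ResolvedNode` does.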

We should look into ways to drop ASTs that are no longer needed.

Labels

memory (related to memory usage)
