
Consider changing AOTAutograd cache to hit on graphs with different input and node names #157792

@zou3519


We ran into this in vLLM, where vLLM:

  1. creates one big graph with a lot of transformer layers
  2. splits the big graph into multiple graphs that are structurally identical
  3. compiles each graph separately (sending it through AOTAutograd). Every one of these graphs misses the AOTAutograd cache because their input names differ.
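To illustrate why step 3 misses, here is a minimal sketch that does not touch the real AOTAutograd cache or torch.fx. It assumes the cache key is derived from a graph description that includes input names. `Node`, `naive_cache_key`, `make_layer`, and the weight names are hypothetical stand-ins for a split-out transformer layer.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical stand-in for an FX graph node; this is NOT the torch.fx API.
@dataclass(frozen=True)
class Node:
    name: str              # node name, e.g. a placeholder name derived from the source module
    op: str                # "placeholder", "call_function", or "output"
    target: str            # operator name for call_function nodes, "" otherwise
    args: Tuple[str, ...]  # names of the nodes this node consumes


def naive_cache_key(graph: List[Node]) -> str:
    # Assumed keying scheme: hash the full graph description, names included.
    return hashlib.sha256(repr(graph).encode()).hexdigest()


def make_layer(weight_name: str) -> List[Node]:
    # Hypothetical per-layer graph: same ops, only the weight input name varies.
    return [
        Node(weight_name, "placeholder", "", ()),
        Node("x", "placeholder", "", ()),
        Node("mm", "call_function", "mm", ("x", weight_name)),
        Node("output", "output", "", ("mm",)),
    ]


layer0 = make_layer("w_layer0")
layer1 = make_layer("w_layer1")
print(naive_cache_key(layer0) == naive_cache_key(layer1))  # False: the second layer misses
```

The two graphs perform exactly the same computation, yet a key that hashes names treats them as unrelated entries.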

The workaround I am currently applying is to normalize the input names of the graphs before sending them through AOTAutograd. It's not clear to me whether the right long-term solution is:

  1. the AOTAutograd cache becomes agnostic to names
  2. the user is expected to use some form of hierarchical compilation
  3. the user is expected to know about these AOTAutograd cache quirks and program around them
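The input-name normalization workaround can be sketched on the same kind of hypothetical toy graph, again without using real torch.fx or AOTAutograd APIs. Placeholders are renamed to positional names (`arg0`, `arg1`, ...) in graph order and every reference to them is rewritten, so graphs that differ only in input names hash to the same key. `Node`, `normalize_input_names`, `cache_key`, and `make_layer` are illustrative names, and the sketch assumes no non-input node is already named `argN`.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


# Hypothetical stand-in for an FX graph node; this is NOT the torch.fx API.
@dataclass(frozen=True)
class Node:
    name: str
    op: str                # "placeholder", "call_function", or "output"
    target: str
    args: Tuple[str, ...]  # names of the nodes this node consumes


def normalize_input_names(graph: List[Node]) -> List[Node]:
    # Map each placeholder to a positional name: arg0, arg1, ... in graph order.
    # Assumes no non-placeholder node is already named argN.
    mapping: Dict[str, str] = {}
    for node in graph:
        if node.op == "placeholder":
            mapping[node.name] = f"arg{len(mapping)}"
    # Rename the placeholders and rewrite every reference to them; other names are kept.
    return [
        replace(
            node,
            name=mapping.get(node.name, node.name),
            args=tuple(mapping.get(a, a) for a in node.args),
        )
        for node in graph
    ]


def cache_key(graph: List[Node]) -> str:
    # Hash the normalized graph, so input names no longer affect the key.
    return hashlib.sha256(repr(normalize_input_names(graph)).encode()).hexdigest()


def make_layer(weight_name: str) -> List[Node]:
    # Hypothetical per-layer graph: identical ops, only the weight input name differs.
    return [
        Node(weight_name, "placeholder", "", ()),
        Node("x", "placeholder", "", ()),
        Node("mm", "call_function", "mm", ("x", weight_name)),
        Node("output", "output", "", ("mm",)),
    ]


print(cache_key(make_layer("w_layer0")) == cache_key(make_layer("w_layer1")))  # True
```

Performing the same renaming inside the cache key computation, instead of in user code, would roughly correspond to option 1.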

cc @chauhang @penguinwu @oulgen @jamesjwu @aorenste @anijain2305 @laithsakka @masnesral @coconutruben

Metadata


Labels: compile-cache, module: compile-time, module: vllm, oncall: pt2, triaged, vllm-compile
