We ran into this in vLLM, where vLLM:
- creates one big graph containing many transformer layers
- splits the big graph into multiple structurally identical subgraphs
- compiles each subgraph separately (sending it through AOTAutograd). All of these graphs miss the cache because their input names differ.
The workaround I am currently applying is to normalize the input names of the graphs before sending them through AOTAutograd. It's not clear to me if the right long-term solution is:
- the AOTAutograd cache becomes agnostic to input names
- the user is supposed to do some sort of hierarchical compilation
- the user is supposed to know these quirks of the AOTAutograd cache and program with them in mind
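To illustrate the workaround, here is a minimal toy sketch (not the actual vLLM or AOTAutograd code; `cache_key`, `normalize_input_names`, and the string-based "graph body" are hypothetical stand-ins) showing how renaming inputs to canonical positional names makes structurally identical graphs produce identical cache keys:

```python
# Toy illustration of input-name normalization for caching.
# Graphs are modeled as (input_names, body_string) pairs; real graphs
# would be torch.fx GraphModules, normalized before AOTAutograd.

def normalize_input_names(input_names):
    """Map original input names to canonical positional names arg0, arg1, ..."""
    return {name: f"arg{i}" for i, name in enumerate(input_names)}

def cache_key(input_names, body):
    """Compute a cache key that ignores the original input names."""
    mapping = normalize_input_names(input_names)
    canon = body
    for old, new in mapping.items():
        canon = canon.replace(old, new)
    return (tuple(mapping.values()), canon)

# Two "layers" with identical structure but different input names,
# as produced by splitting one big graph:
layer0 = (["l0_x", "l0_w"], "matmul(l0_x, l0_w)")
layer1 = (["l1_x", "l1_w"], "matmul(l1_x, l1_w)")

# Without normalization the raw bodies differ; after normalization
# both layers map to the same key, so the second compile can cache-hit.
assert cache_key(*layer0) == cache_key(*layer1)
```

This is only meant to show why name-agnostic keys (the first option above) would make the per-subgraph compiles hit the cache automatically.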
cc @chauhang @penguinwu @oulgen @jamesjwu @aorenste @anijain2305 @laithsakka @masnesral @coconutruben