CPU time optimization for GraphInputMatcher #7895

Merged
JackCaoG merged 1 commit into master from JackCaoG/dynamo_input_matcher_optimization
Aug 21, 2024

Conversation

Collaborator

@JackCaoG JackCaoG commented Aug 20, 2024

A couple of optimizations I made in this PR for `GraphInputMatcher`:

  1. Cache `seed_info_id` instead of fetching it on every call.
  2. Cache `arg_idxs`; they should be the same for all calls, so there is no need to recalculate them for every execution.
  3. Pre-allocate `real_input` instead of calling `append` over and over.
  4. Move the `if tensor_id == self.seed_info_id` check inside the `if arg_idx is None` branch, since it is an uncommon case.

With all of these, I was able to reduce the runtime of this function from 0.3 ms to 0.2 ms for the Llama 3 8B case with vLLM.
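For readers without the diff open, the four changes can be sketched roughly as below. This is a minimal, torch-free illustration and not the actual `GraphInputMatcher` code: the class name `CachedInputMatcher`, the constructor parameters, and the `get_seed_info_id` / `get_seed_value` callables are all hypothetical stand-ins chosen for the example.

```python
from typing import Callable, Dict, List, Optional, Sequence


class CachedInputMatcher:
    """Hypothetical stand-in for an input matcher; NOT the torch_xla class."""

    def __init__(
        self,
        tensor_ids: Sequence[int],
        tensor_id_to_arg_idx: Dict[int, int],
        captured_values: Sequence[object],
        get_seed_info_id: Callable[[], int],   # assumed helper
        get_seed_value: Callable[[], object],  # assumed helper
    ) -> None:
        self.tensor_ids = list(tensor_ids)
        self.captured_values = list(captured_values)
        self.get_seed_value = get_seed_value
        # (1) Look up the seed id once, instead of on every call.
        self.seed_info_id = get_seed_info_id()
        # (2) The tensor-id -> argument-index mapping does not change
        #     between calls, so resolve it once up front.
        self.arg_idxs: List[Optional[int]] = [
            tensor_id_to_arg_idx.get(tid) for tid in self.tensor_ids
        ]

    def __call__(self, args: Sequence[object]) -> List[object]:
        # (3) Pre-allocate the output list rather than appending in the loop.
        real_input: List[object] = [None] * len(self.tensor_ids)
        for i, arg_idx in enumerate(self.arg_idxs):
            if arg_idx is None:
                # (4) The seed comparison only runs on this uncommon path.
                if self.tensor_ids[i] == self.seed_info_id:
                    real_input[i] = self.get_seed_value()
                else:
                    real_input[i] = self.captured_values[i]
            else:
                real_input[i] = args[arg_idx]
        return real_input


matcher = CachedInputMatcher(
    tensor_ids=[10, 11, 99, 12],
    tensor_id_to_arg_idx={10: 1, 12: 0},
    captured_values=[None, "weight", None, None],
    get_seed_info_id=lambda: 99,
    get_seed_value=lambda: "seed",
)
result = matcher(["a", "b"])
```

The common path (an input that maps to a call argument) then costs one list index and one assignment per tensor; the seed comparison and the captured-value lookup are only paid for inputs that are not call arguments.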

@JackCaoG JackCaoG marked this pull request as ready for review August 21, 2024 01:12
@JackCaoG JackCaoG requested review from alanwaketan and lsy323 August 21, 2024 02:16
@JackCaoG JackCaoG merged commit ae28308 into master Aug 21, 2024
@JackCaoG JackCaoG deleted the JackCaoG/dynamo_input_matcher_optimization branch August 21, 2024 16:53
yitongh pushed a commit to AlibabaPAI/xla that referenced this pull request Oct 11, 2024
yitongh pushed a commit to AlibabaPAI/xla that referenced this pull request Dec 11, 2024
yitongh pushed a commit to AlibabaPAI/xla that referenced this pull request Dec 11, 2024

2 participants