
Graph Cache #261

Merged

jjsjann123 merged 27 commits into 20_7_6_devel from graph_cache_PR on Aug 11, 2020

Conversation

@jjsjann123 (Collaborator) commented Aug 3, 2020

  1. Added GraphCache with a dummy FusionExecutorCache (heuristic cache).
  2. Refactored/fixed permutation to coalesce tensor dimensions with neighboring strides, to facilitate dimension collapsing.
  3. GraphCache works for both the profiling executor and the legacy executor; we construct a new graph for incompatible tensor shapes (changes in i. broadcast rules, ii. contiguity, or iii. order of strides).
  4. Added quite a few functional tests. Unfortunately, the pieces used in dimension coalescing are not written in a functional style, which makes individual tests very tricky to write. This should be backlogged as future refactoring work.

@jjsjann123 jjsjann123 changed the title [WIP][DO NOT REVIEW YET] Graph Cache Graph Cache Aug 5, 2020
return std_vec;
}

void debugPrint(const TensorTypePtr& type) {
jjsjann123 (Collaborator, Author) commented:

Just a note: this is debugPrint; I left it here for convenience. ☺️

@csarofeen (Owner) left a comment:

Generally, I'd ask you to comment headers better; that's where we should discuss high-level concepts/structure.
From what I understood, this looks good to me.

Comment thread test/test_jit_cuda_fuser.py Outdated
Comment thread test/test_jit_cuda_fuser.py
Comment thread test/test_jit_cuda_fuser.py
Comment thread torch/csrc/jit/codegen/cuda/kernel_cache.h Outdated
Comment thread torch/csrc/jit/codegen/cuda/kernel_cache.cpp Outdated
Comment thread torch/csrc/jit/codegen/cuda/kernel_cache.cpp Outdated

// TODO: hash indexing;
for (int i = 0; i < static_cast<int>(fec_cache_.size()); i++) {
if (input_stack.complyWith(input_stacks_[i])) {
@csarofeen (Owner) commented:

Does this at least return relatively quickly? E.g., checking the most common things, or the quickest things to check that likely don't match, first?

@jjsjann123 (Collaborator, Author) commented:

Nope, it doesn't. There's really no good way I can think of to make this check fast.
Hence I'm thinking about adding a hash/encoding of input properties, so we can map previously seen inputs directly to the cached fusion executor.
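
The hash/encode idea described above could look roughly like the sketch below: encode each input's relevant properties (dtype, rank, per-dimension contiguity here; a real key would also cover broadcast rules and stride order) into a string key, then use an `unordered_map` for average O(1) lookup instead of the linear `complyWith()` scan. All names (`InputDesc`, `encodeInputs`, `ExecutorCache`) are hypothetical, not the PR's API.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-input metadata; a real cache key would also record
// broadcast rules and the order of strides.
struct InputDesc {
  int dtype;                     // encoded scalar type
  std::vector<bool> contiguity;  // per-dimension contiguity flags
};

// Encode the properties of all inputs into a single string key, so that
// previously seen input configurations map directly to a cached executor.
std::string encodeInputs(const std::vector<InputDesc>& inputs) {
  std::ostringstream key;
  for (const auto& in : inputs) {
    key << 'd' << in.dtype << 'r' << in.contiguity.size() << 'c';
    for (bool c : in.contiguity) {
      key << (c ? '1' : '0');
    }
    key << ';';
  }
  return key.str();
}

// Average O(1) lookup replacing the linear complyWith() scan.
struct ExecutorCache {
  std::unordered_map<std::string, int> index;  // key -> executor slot
  int lookupOrInsert(const std::vector<InputDesc>& inputs, int next_slot) {
    auto key = encodeInputs(inputs);
    auto it = index.find(key);
    if (it != index.end()) return it->second;  // cache hit
    index.emplace(key, next_slot);             // cache miss: register slot
    return next_slot;
  }
};
```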

Comment thread torch/csrc/jit/codegen/cuda/kernel_cache.h
Comment thread torch/csrc/jit/codegen/cuda/manager.cpp
// b. kernel_id -> FusionExecutor
//
// This allows FusionExecutor reuse across nodes;
class CudaFusionManager {
Collaborator commented:

It may be nice to have a comment reminding us that CudaFusionManager is not thread-safe.

@jjsjann123 (Collaborator, Author) commented:

Good catch!

I think this could potentially become an issue. We could change the singleton into a thread_local instance instead, which would prevent kernel reuse across threads but would be safer.

Thoughts? @tlemo @csarofeen
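
The thread_local alternative being proposed can be sketched like this (a simplified stand-in, not the actual CudaFusionManager): marking the static local `thread_local` gives each thread its own instance, so no locking is needed, at the cost of losing kernel reuse across threads.

```cpp
#include <cassert>

// Sketch of a thread_local singleton: each thread constructs its own
// manager on first use, trading cross-thread kernel reuse for safety.
// The member here is a stand-in, not the real class's state.
class CudaFusionManager {
 public:
  static CudaFusionManager& getManager() {
    // One instance per thread instead of one per process.
    static thread_local CudaFusionManager manager;
    return manager;
  }
  int registerKernel() { return next_kernel_id_++; }

 private:
  CudaFusionManager() = default;
  int next_kernel_id_ = 0;
};
```

The alternative, keeping a single process-wide instance, would require guarding the cache maps with a mutex instead.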

Comment thread torch/csrc/jit/codegen/cuda/manager.cpp Outdated
Comment thread torch/csrc/jit/codegen/cuda/tensor_view.cpp Outdated
Comment thread torch/csrc/jit/codegen/cuda/kernel_cache.cpp Outdated
Comment thread torch/csrc/jit/codegen/cuda/kernel_cache.cpp Outdated
Comment thread torch/csrc/jit/codegen/cuda/kernel_cache.cpp Outdated
Comment thread torch/csrc/jit/codegen/cuda/kernel_cache.cpp Outdated
Comment thread torch/csrc/jit/codegen/cuda/kernel_cache.cpp Outdated
Comment thread torch/csrc/jit/codegen/cuda/kernel_cache.h
@kevinstephano (Collaborator) left a comment:

LGTM

@jjsjann123 jjsjann123 merged commit d851ecd into 20_7_6_devel Aug 11, 2020
@csarofeen csarofeen deleted the graph_cache_PR branch June 9, 2021 13:37

4 participants