Graph Cache #261
Conversation
…h GPU resources, we probably need a launch guard?
  return std_vec;
}

void debugPrint(const TensorTypePtr& type) {
Just a note: this is debugPrint; I left it here for convenience.
csarofeen left a comment
Generally I'd ask you to comment headers better; that's where we should discuss high-level concepts/structure.
From what I understood, it looks good to me.
// TODO: hash indexing;
for (int i = 0; i < static_cast<int>(fec_cache_.size()); i++) {
  if (input_stack.complyWith(input_stacks_[i])) {
Does this at least return relatively quickly? E.g. checking the most common or quickest-to-check properties first, the ones that likely don't match?
Nope, it doesn't. There's really not a good way I can think of that would make this check fast.
Hence I'm thinking about adding a hash/encoding of input properties, so we can map seen inputs directly to the cached fusion executor.
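The idea above could be sketched roughly as follows: encode the relevant input properties into a single hash key and probe an `unordered_map` instead of linearly scanning `fec_cache_` with `complyWith()`. This is a minimal illustration, not the PR's actual API; `InputDescriptor`, `encodeInputs`, and the field choices (dtype, rank, device) are hypothetical placeholders, and the mixing constant is the common boost-style hash combine.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical summary of one input's cache-relevant properties.
struct InputDescriptor {
  int dtype;   // e.g. an enum value for the scalar type
  int ndim;    // tensor rank
  int device;  // device index
};

// Fold all inputs' properties into one key (boost-style hash combining).
inline size_t encodeInputs(const std::vector<InputDescriptor>& inputs) {
  size_t seed = inputs.size();
  for (const auto& d : inputs) {
    for (int field : {d.dtype, d.ndim, d.device}) {
      seed ^= std::hash<int>{}(field) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
  }
  return seed;
}

// Encoded key -> index of the cached FusionExecutor; lookup is average O(1).
std::unordered_map<size_t, int> encoding_to_executor;
```

Identical input descriptors always produce the same key, so a previously seen input shape/dtype combination maps straight to its cached executor; `complyWith()` would only be needed as a fallback (or a collision check) on a miss.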
// b. kernel_id -> FusionExecutor
//
// This allows FusionExecutor reuse across nodes;
class CudaFusionManager {
It may be nice to have a comment reminding us that CudaFusionManager is not thread-safe.
Good catch!
I think this could potentially be an issue. We could change the singleton into a thread_local instance instead, which would prevent kernel reuse across threads but would be safer.
Thoughts? @tlemo @csarofeen
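For reference, the thread_local variant discussed above would be a one-line change to the accessor: each thread gets its own manager, so no locking is needed, at the cost of duplicated compilation across threads. The class body here is a stripped-down placeholder, not the real CudaFusionManager.

```cpp
// Sketch only: each thread sees an independent manager instance.
class CudaFusionManager {
 public:
  static CudaFusionManager& getManager() {
    // thread_local instead of plain static: per-thread instance,
    // constructed lazily on first use in each thread.
    thread_local CudaFusionManager manager;
    return manager;
  }

  // Placeholder for the manager's mutable state (kernel registry, caches...).
  int registerKernel() {
    return next_kernel_id_++;
  }

 private:
  CudaFusionManager() = default;
  int next_kernel_id_ = 0;
};
```

The alternative that preserves cross-thread reuse would be to keep the process-wide singleton and guard its mutable state (the kernel registry and caches) with a `std::mutex`, taking the lock on every registration/lookup.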
…sallowed broadcast in fusion