Graph Cache #261
Conversation
…h GPU resources, we probably need a launch guard?
  return std_vec;
}

void debugPrint(const TensorTypePtr& type) {
Just a note: this is debugPrint; I left it here for convenience.
csarofeen left a comment
Generally I'd ask you to comment headers better; that's where we should discuss high-level concepts/structure.
From what I understood, it looks good to me.
// TODO: hash indexing;
for (int i = 0; i < static_cast<int>(fec_cache_.size()); i++) {
  if (input_stack.complyWith(input_stacks_[i])) {
Does this at least return relatively quickly? E.g. checking the most common or quickest-to-check properties first, the ones that likely don't match?
Nope, it doesn't. There's really not a good way I can think of that would make this check fast.
Hence I'm thinking about adding a hash/encoding of input properties, so we can map seen inputs directly to the cached fusion executor.
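The idea above could be sketched roughly as follows: encode the relevant input properties into a single hash key and probe an `unordered_map` instead of linearly scanning `fec_cache_` with `complyWith()`. This is a minimal illustration, not the PR's actual API; `InputDescriptor`, `encodeInputs`, and the field choices (dtype, rank, device) are hypothetical placeholders, and the mixing constant is the common boost-style hash combine.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical summary of one input's cache-relevant properties.
struct InputDescriptor {
  int dtype;   // e.g. an enum value for the scalar type
  int ndim;    // tensor rank
  int device;  // device index
};

// Fold all inputs' properties into one key (boost-style hash combining).
inline size_t encodeInputs(const std::vector<InputDescriptor>& inputs) {
  size_t seed = inputs.size();
  for (const auto& d : inputs) {
    for (int field : {d.dtype, d.ndim, d.device}) {
      seed ^= std::hash<int>{}(field) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
  }
  return seed;
}

// Encoded key -> index of the cached FusionExecutor; lookup is average O(1).
std::unordered_map<size_t, int> encoding_to_executor;
```

Identical input descriptors always produce the same key, so a previously seen input shape/dtype combination maps straight to its cached executor; `complyWith()` would only be needed as a fallback (or a collision check) on a miss.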
// b. kernel_id -> FusionExecutor
//
// This allows FusionExecutor reuse across nodes;
class CudaFusionManager {
It may be nice to have a comment reminding us that CudaFusionManager is not thread-safe.
Good catch!
I think this could potentially be an issue. We could change the singleton into a thread_local instance instead, which would prevent kernel reuse across threads but would be safer.
Thoughts? @tlemo @csarofeen
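For reference, the thread_local variant discussed above would be a one-line change to the accessor: each thread gets its own manager, so no locking is needed, at the cost of duplicated compilation across threads. The class body here is a stripped-down placeholder, not the real CudaFusionManager.

```cpp
// Sketch only: each thread sees an independent manager instance.
class CudaFusionManager {
 public:
  static CudaFusionManager& getManager() {
    // thread_local instead of plain static: per-thread instance,
    // constructed lazily on first use in each thread.
    thread_local CudaFusionManager manager;
    return manager;
  }

  // Placeholder for the manager's mutable state (kernel registry, caches...).
  int registerKernel() {
    return next_kernel_id_++;
  }

 private:
  CudaFusionManager() = default;
  int next_kernel_id_ = 0;
};
```

The alternative that preserves cross-thread reuse would be to keep the process-wide singleton and guard its mutable state (the kernel registry and caches) with a `std::mutex`, taking the lock on every registration/lookup.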
…sallowed broadcast in fusion