feat: second attempt to support DDS and NonZero op #3388
    if (
        node != output_node
        and len(node.users) == 0
        and len(node.all_input_nodes) > 0
Probably better to add an assert checking that it has only one input (print the number of inputs in the message if it fails).

I previously reused this code from another lowering pass. It looks like we can remove unused ops directly, right? Do you see any potential issues with that?
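To make the exchange above concrete, here is a minimal, stdlib-only sketch of the dead-node removal being discussed, including the assertion suggested in the review. It is not the code from this PR: `Node` is a hypothetical stand-in for `torch.fx.Node` that models only `users` and `all_input_nodes`, and `connect` and `remove_unused_nodes` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Node:
    """Hypothetical stand-in for torch.fx.Node (only the fields used here)."""

    name: str
    all_input_nodes: List["Node"] = field(default_factory=list)
    users: Dict["Node", None] = field(default_factory=dict)


def connect(producer: Node, consumer: Node) -> None:
    """Record that `consumer` reads the output of `producer`."""
    consumer.all_input_nodes.append(producer)
    producer.users[consumer] = None


def remove_unused_nodes(nodes: List[Node], output_node: Node) -> List[Node]:
    """Drop unused nodes; walking in reverse also removes chains of dead nodes."""
    kept: List[Node] = []
    for node in reversed(nodes):
        if (
            node is not output_node
            and len(node.users) == 0
            and len(node.all_input_nodes) > 0
        ):
            # Assertion suggested in the review: report the count on failure.
            assert len(node.all_input_nodes) == 1, (
                f"expected exactly one input for {node.name}, "
                f"got {len(node.all_input_nodes)}"
            )
            for producer in node.all_input_nodes:
                producer.users.pop(node, None)  # detach from its producer
            continue
        kept.append(node)
    kept.reverse()
    return kept


# Tiny graph: x -> relu -> output, plus an unused clone of x.
x, relu, dead, out = Node("x"), Node("relu"), Node("dead_clone"), Node("output")
connect(x, relu)
connect(x, dead)
connect(relu, out)
graph = remove_unused_nodes([x, relu, dead, out], out)
print([n.name for n in graph])  # ['x', 'relu', 'output']
```

Nodes without inputs (graph placeholders, for example) are kept even when unused, because of the `len(node.all_input_nodes) > 0` condition. Whether the single-input assertion is appropriate depends on which ops the pass is expected to see, which is the open question in the thread.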
      need_cudagraphs_reset,
  ) = self.runtime_states.set_runtime_states(
-     cudagraphs_enabled, self.use_pre_allocated_outputs, shape_changed
+     self.cudagraphs_enabled, self.use_pre_allocated_outputs, shape_changed
Is use_pre_allocated_outputs still valid now that you're adding the OA feature?

I think the OA feature will not affect use_pre_allocated_outputs, because I didn't change the behavior of CG (CUDA Graphs), and use_pre_allocated_outputs has its own context manager as well.
    raise RuntimeError(
        "Both CUDA Graphs and OutputAllocator are enabled. Please disable either one."
    )
    if self.use_output_allocator_outputs:
How is use_output_allocator_outputs set? Is it set by the user through the with context manager?

Yes, it is set by the user through the with context manager. If the user doesn't set it, the runtime chooses standard execution or OA according to the converter decorator.
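The selection logic described in this thread can be sketched as follows. This is a simplified model under stated assumptions, not the runtime code from this PR: `RuntimeModule`, `requires_output_allocator`, `select_path`, and the `enable_output_allocator` helper are hypothetical names, and the CUDA Graphs path itself is not modeled beyond the mutual-exclusion check.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class RuntimeModule:
    """Hypothetical runtime wrapper; only the flag logic is modeled."""

    def __init__(self, requires_output_allocator: bool = False) -> None:
        # Assumption: recorded at build time when a converter declares it needs OA.
        self.requires_output_allocator = requires_output_allocator
        self.cudagraphs_enabled = False
        # None means the user made no choice; fall back to the converter's need.
        self.use_output_allocator_outputs: Optional[bool] = None

    def select_path(self) -> str:
        use_oa = (
            self.requires_output_allocator
            if self.use_output_allocator_outputs is None
            else self.use_output_allocator_outputs
        )
        if self.cudagraphs_enabled and use_oa:
            raise RuntimeError(
                "Both CUDA Graphs and OutputAllocator are enabled. "
                "Please disable either one."
            )
        return "output_allocator" if use_oa else "standard"


@contextmanager
def enable_output_allocator(module: RuntimeModule) -> Iterator[RuntimeModule]:
    """Force the OA path inside the with block, then restore the old setting."""
    previous = module.use_output_allocator_outputs
    module.use_output_allocator_outputs = True
    try:
        yield module
    finally:
        module.use_output_allocator_outputs = previous


module = RuntimeModule(requires_output_allocator=False)
print(module.select_path())  # standard
with enable_output_allocator(module):
    print(module.select_path())  # output_allocator
print(module.select_path())  # standard
```

Restoring the previous value in `finally` keeps the user's override scoped to the with block, so a module that did not opt in falls back to whatever its converters require.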
narendasan
left a comment
LGTM after minor change
Description
This PR adds a new execution path to support data-dependent shapes (DDS) and the NonZero op.
Static and dynamic shapes take the original path; DDS takes the new path, which uses IOutputAllocator.
Fixes #2516
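As a rough illustration of why NonZero needs a separate path, the pure-Python sketch below shows that its output size depends on the input values rather than the input shape, so the output buffer can only be sized once the result is known. `ToyOutputAllocator` and its methods are hypothetical and only loosely mirror the idea behind TensorRT's IOutputAllocator; they are not its actual interface.

```python
from typing import List, Optional, Sequence, Tuple


def nonzero_1d(values: Sequence[float]) -> List[int]:
    """Return the indices of the non-zero entries of a 1-D sequence."""
    return [i for i, v in enumerate(values) if v != 0]


class ToyOutputAllocator:
    """Hypothetical allocator: memory is requested late, then the shape is reported."""

    def __init__(self) -> None:
        self.buffer: List[int] = []
        self.shape: Optional[Tuple[int, ...]] = None

    def reallocate(self, num_elements: int) -> List[int]:
        self.buffer = [0] * num_elements
        return self.buffer

    def notify_shape(self, shape: Tuple[int, ...]) -> None:
        self.shape = shape


def run_nonzero(values: Sequence[float], allocator: ToyOutputAllocator) -> List[int]:
    indices = nonzero_1d(values)
    out = allocator.reallocate(len(indices))  # size is known only after computing
    out[:] = indices
    allocator.notify_shape((len(indices),))
    return out


a = [0.0, 3.0, 0.0, 5.0]
b = [1.0, 2.0, 0.0, 4.0]
alloc_a, alloc_b = ToyOutputAllocator(), ToyOutputAllocator()
print(run_nonzero(a, alloc_a), alloc_a.shape)  # [1, 3] (2,)
print(run_nonzero(b, alloc_b), alloc_b.shape)  # [0, 1, 3] (3,)
```

Both inputs have the same shape, yet the outputs differ in length, which is why a buffer sized ahead of time from the input shape alone does not fit this case.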