[PyTorch] Fix record function inputs_valid_ check by louisfeng · Pull Request #78002 · pytorch/pytorch

louisfeng · 2022-05-20T20:15:47Z

Summary:
I think this has to be set in all before() calls. Because by default inputs_valid_ = false;. For RecordFunction without any input parameters (so this interface is not used), calling record_func.inputs() will cause an assert:

fbcode/caffe2/aten/src/ATen/record_function.h
322
    TORCH_INTERNAL_ASSERT_DEBUG_ONLY(inputs_valid_, "Called inputs() outside RecordFunction start callback");

I suppose, an alternative is to require users to call num_inputs() to check before calling inputs(). But I think the intent of inputs_valid_ is for verifying inputs are being requested within the lifetime of the start callback.

Test Plan:
After this diff fix:

=> buck build mode/dev-nosan caffe2/test:profiler --show-output

=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_763488.json
INFO:2022-05-20 13:02:33 763488:763488 Config.cpp:470] Trace start time: 2022-05-20 13:02:48
Trace duration: 500ms
Warmup duration: 5s
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1653076953/devgpu040.ftw6/libkineto_activities_763488.json
Trace compression enabled: 1
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:559] Enabling GPU tracing
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:593] Tracing starting in 14s
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:596] Tracing will end in 15s
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0520 13:02:34.013141 763488 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this
STAGE:2022-05-20 13:02:33 763488:763488 ActivityProfilerController.cpp:269] Completed Stage: Warm Up
STAGE:2022-05-20 13:02:34 763488:763488 ActivityProfilerController.cpp:275] Completed Stage: Collection
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:134] Processing 1 CPU buffers
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:771] Traces Recorded:
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:774] PyTorch Profiler: 1 iterations
ok

----------------------------------------------------------------------
Ran 1 test in 0.060s

OK

New test failure case:

=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_808629.json
INFO:2022-05-20 13:04:46 808629:808629 Config.cpp:470] Trace start time: 2022-05-20 13:05:01
Trace duration: 500ms
Warmup duration: 5s
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1653077086/devgpu040.ftw6/libkineto_activities_808629.json
Trace compression enabled: 1
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:559] Enabling GPU tracing
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:593] Tracing starting in 14s
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:596] Tracing will end in 15s
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0520 13:04:46.853253 808629 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this
STAGE:2022-05-20 13:04:46 808629:808629 ActivityProfilerController.cpp:269] Completed Stage: Warm Up
W0520 13:04:48.126065 808629 record_function.cpp:470] Exception in RecordFunction callback: inputs_valid_ INTERNAL ASSERT FAILED at "caffe2/aten/src/ATen/record_function.h":322, please report a bug to PyTorch. Called inputs() outside RecordFunction start callback
Exception raised from inputs at caffe2/aten/src/ATen/record_function.h:322 (most recent call first):
# 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_1>::_M_invoke(std::_Any_data const&)
# 2  c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 3  c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*)
# 5  torch::profiler::impl::ThreadLocalSubqueue::begin_op(at::RecordFunction const&, unsigned long)
# 6  std::unique_ptr<at::ObserverContext, std::default_delete<at::ObserverContext> > torch::autograd::profiler::(anonymous namespace)::onFunctionEnter<false>(at::RecordFunction const&)
# 7  at::RecordFunction::runStartCallbacks()

Differential Revision: D36556512

Summary: I think this has to be set in all before() calls. Because by default inputs_valid_ = false;. For RecordFunction without any input parameters (so this interface is not used), calling record_func.inputs() will cause an assert: ``` fbcode/caffe2/aten/src/ATen/record_function.h 322 TORCH_INTERNAL_ASSERT_DEBUG_ONLY(inputs_valid_, "Called inputs() outside RecordFunction start callback"); ``` I suppose, an alternative is to require users to call num_inputs() to check before calling inputs(). But I think the intent of inputs_valid_ is for verifying inputs are being requested within the lifetime of the start callback. Test Plan: After this diff fix: ``` => buck build mode/dev-nosan caffe2/test:profiler --show-output => buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_763488.json INFO:2022-05-20 13:02:33 763488:763488 Config.cpp:470] Trace start time: 2022-05-20 13:02:48 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function Manifold bucket: gpu_traces Manifold object: tree/traces/clientAPI/0/1653076953/devgpu040.ftw6/libkineto_activities_763488.json Trace compression enabled: 1 INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:559] Enabling GPU tracing INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:593] Tracing starting in 14s INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:596] Tracing will end in 15s WARNING: Logging before InitGoogleLogging() is written to STDERR I0520 13:02:34.013141 763488 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this STAGE:2022-05-20 13:02:33 763488:763488 ActivityProfilerController.cpp:269] Completed Stage: Warm Up STAGE:2022-05-20 13:02:34 763488:763488 ActivityProfilerController.cpp:275] Completed Stage: Collection INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:134] Processing 1 CPU buffers INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:771] Traces Recorded: INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:774] PyTorch Profiler: 1 iterations ok ---------------------------------------------------------------------- Ran 1 test in 0.060s OK ``` New test failure case: ``` => buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_808629.json INFO:2022-05-20 13:04:46 808629:808629 Config.cpp:470] Trace start time: 2022-05-20 13:05:01 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function Manifold bucket: gpu_traces Manifold object: tree/traces/clientAPI/0/1653077086/devgpu040.ftw6/libkineto_activities_808629.json Trace compression enabled: 1 INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:559] Enabling GPU tracing INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:593] Tracing starting in 14s INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:596] Tracing will end in 15s WARNING: Logging before InitGoogleLogging() is written to STDERR I0520 13:04:46.853253 808629 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this STAGE:2022-05-20 13:04:46 808629:808629 ActivityProfilerController.cpp:269] Completed Stage: Warm Up W0520 13:04:48.126065 808629 record_function.cpp:470] Exception in RecordFunction callback: inputs_valid_ INTERNAL ASSERT FAILED at "caffe2/aten/src/ATen/record_function.h":322, please report a bug to PyTorch. Called inputs() outside RecordFunction start callback Exception raised from inputs at caffe2/aten/src/ATen/record_function.h:322 (most recent call first): # 0 c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool) # 1 std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_1>::_M_invoke(std::_Any_data const&) # 2 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) # 3 c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) # 4 c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*) # 5 torch::profiler::impl::ThreadLocalSubqueue::begin_op(at::RecordFunction const&, unsigned long) # 6 std::unique_ptr<at::ObserverContext, std::default_delete<at::ObserverContext> > torch::autograd::profiler::(anonymous namespace)::onFunctionEnter<false>(at::RecordFunction const&) # 7 at::RecordFunction::runStartCallbacks() ``` Differential Revision: D36556512 fbshipit-source-id: 0192fcc02ccc6fb867a6239fa39cc534301fd5cb

facebook-github-bot · 2022-05-20T20:15:52Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/78002
📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓Need help or want to give feedback on the CI? Visit our office hours

✅ No Failures (0 Pending)

As of commit 5160342 (more details on the Dr. CI page):

Expand to see more

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

facebook-github-bot · 2022-05-20T20:16:19Z

This pull request was exported from Phabricator. Differential Revision: D36556512

facebook-github-bot · 2022-05-25T06:30:30Z

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

github-actions · 2022-05-25T06:34:43Z

Hey @louisfeng.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

Summary: Pull Request resolved: #78002 I think this has to be set in all before() calls. Because by default inputs_valid_ = false;. For RecordFunction without any input parameters (so this interface is not used), calling record_func.inputs() will cause an assert: ``` fbcode/caffe2/aten/src/ATen/record_function.h 322 TORCH_INTERNAL_ASSERT_DEBUG_ONLY(inputs_valid_, "Called inputs() outside RecordFunction start callback"); ``` I suppose, an alternative is to require users to call num_inputs() to check before calling inputs(). But I think the intent of inputs_valid_ is for verifying inputs are being requested within the lifetime of the start callback. Test Plan: After this diff fix: ``` => buck build mode/dev-nosan caffe2/test:profiler --show-output => buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_763488.json INFO:2022-05-20 13:02:33 763488:763488 Config.cpp:470] Trace start time: 2022-05-20 13:02:48 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function Manifold bucket: gpu_traces Manifold object: tree/traces/clientAPI/0/1653076953/devgpu040.ftw6/libkineto_activities_763488.json Trace compression enabled: 1 INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:559] Enabling GPU tracing INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:593] Tracing starting in 14s INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:596] Tracing will end in 15s WARNING: Logging before InitGoogleLogging() is written to STDERR I0520 13:02:34.013141 763488 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this STAGE:2022-05-20 13:02:33 763488:763488 ActivityProfilerController.cpp:269] Completed Stage: Warm Up STAGE:2022-05-20 13:02:34 763488:763488 ActivityProfilerController.cpp:275] Completed Stage: Collection INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:134] Processing 1 CPU buffers INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:771] Traces Recorded: INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:774] PyTorch Profiler: 1 iterations ok ---------------------------------------------------------------------- Ran 1 test in 0.060s OK ``` New test failure case: ``` => buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_808629.json INFO:2022-05-20 13:04:46 808629:808629 Config.cpp:470] Trace start time: 2022-05-20 13:05:01 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function Manifold bucket: gpu_traces Manifold object: tree/traces/clientAPI/0/1653077086/devgpu040.ftw6/libkineto_activities_808629.json Trace compression enabled: 1 INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:559] Enabling GPU tracing INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:593] Tracing starting in 14s INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:596] Tracing will end in 15s WARNING: Logging before InitGoogleLogging() is written to STDERR I0520 13:04:46.853253 808629 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this STAGE:2022-05-20 13:04:46 808629:808629 ActivityProfilerController.cpp:269] Completed Stage: Warm Up W0520 13:04:48.126065 808629 record_function.cpp:470] Exception in RecordFunction callback: inputs_valid_ INTERNAL ASSERT FAILED at "caffe2/aten/src/ATen/record_function.h":322, please report a bug to PyTorch. Called inputs() outside RecordFunction start callback Exception raised from inputs at caffe2/aten/src/ATen/record_function.h:322 (most recent call first): # 0 c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool) # 1 std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_1>::_M_invoke(std::_Any_data const&) # 2 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) # 3 c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) # 4 c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*) # 5 torch::profiler::impl::ThreadLocalSubqueue::begin_op(at::RecordFunction const&, unsigned long) # 6 std::unique_ptr<at::ObserverContext, std::default_delete<at::ObserverContext> > torch::autograd::profiler::(anonymous namespace)::onFunctionEnter<false>(at::RecordFunction const&) # 7 at::RecordFunction::runStartCallbacks() ``` Reviewed By: swolchok Differential Revision: D36556512 fbshipit-source-id: 083e7053eb92ae962c02d41c5353e7592b90bb3a

facebook-github-bot added the cla signed label May 20, 2022

facebook-github-bot added the fb-exported label May 20, 2022

swolchok self-requested a review May 20, 2022 20:37

swolchok approved these changes May 20, 2022

View reviewed changes

pytorchmergebot added the Merged label May 25, 2022

pytorchmergebot closed this in 82cb721 May 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PyTorch] Fix record function inputs_valid_ check#78002

[PyTorch] Fix record function inputs_valid_ check#78002
louisfeng wants to merge 1 commit intopytorch:masterfrom
louisfeng:export-D36556512

louisfeng commented May 20, 2022

Uh oh!

facebook-github-bot commented May 20, 2022 •

edited

Loading

Uh oh!

facebook-github-bot commented May 20, 2022

Uh oh!

facebook-github-bot commented May 25, 2022

Uh oh!

github-actions bot commented May 25, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

louisfeng commented May 20, 2022

Uh oh!

facebook-github-bot commented May 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful links

✅ No Failures (0 Pending)

Uh oh!

facebook-github-bot commented May 20, 2022

Uh oh!

facebook-github-bot commented May 25, 2022

Uh oh!

github-actions bot commented May 25, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

facebook-github-bot commented May 20, 2022 •

edited

Loading