Skip to content

[PyTorch] Fix record function inputs_valid_ check#78002

Closed
louisfeng wants to merge 1 commit intopytorch:masterfrom
louisfeng:export-D36556512
Closed

[PyTorch] Fix record function inputs_valid_ check#78002
louisfeng wants to merge 1 commit intopytorch:masterfrom
louisfeng:export-D36556512

Conversation

@louisfeng
Copy link
Contributor

Summary:
I think this has to be set in all before() calls. Because by default inputs_valid_ = false;. For RecordFunction without any input parameters (so this interface is not used), calling record_func.inputs() will cause an assert:

fbcode/caffe2/aten/src/ATen/record_function.h
322
    TORCH_INTERNAL_ASSERT_DEBUG_ONLY(inputs_valid_, "Called inputs() outside RecordFunction start callback");

I suppose, an alternative is to require users to call num_inputs() to check before calling inputs(). But I think the intent of inputs_valid_ is for verifying inputs are being requested within the lifetime of the start callback.

Test Plan:
After this diff fix:

=> buck build mode/dev-nosan caffe2/test:profiler --show-output

=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_763488.json
INFO:2022-05-20 13:02:33 763488:763488 Config.cpp:470] Trace start time: 2022-05-20 13:02:48
Trace duration: 500ms
Warmup duration: 5s
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1653076953/devgpu040.ftw6/libkineto_activities_763488.json
Trace compression enabled: 1
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:559] Enabling GPU tracing
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:593] Tracing starting in 14s
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:596] Tracing will end in 15s
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0520 13:02:34.013141 763488 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this
STAGE:2022-05-20 13:02:33 763488:763488 ActivityProfilerController.cpp:269] Completed Stage: Warm Up
STAGE:2022-05-20 13:02:34 763488:763488 ActivityProfilerController.cpp:275] Completed Stage: Collection
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:134] Processing 1 CPU buffers
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:771] Traces Recorded:
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:774] PyTorch Profiler: 1 iterations
ok

----------------------------------------------------------------------
Ran 1 test in 0.060s

OK

New test failure case:

=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_808629.json
INFO:2022-05-20 13:04:46 808629:808629 Config.cpp:470] Trace start time: 2022-05-20 13:05:01
Trace duration: 500ms
Warmup duration: 5s
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1653077086/devgpu040.ftw6/libkineto_activities_808629.json
Trace compression enabled: 1
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:559] Enabling GPU tracing
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:593] Tracing starting in 14s
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:596] Tracing will end in 15s
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0520 13:04:46.853253 808629 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this
STAGE:2022-05-20 13:04:46 808629:808629 ActivityProfilerController.cpp:269] Completed Stage: Warm Up
W0520 13:04:48.126065 808629 record_function.cpp:470] Exception in RecordFunction callback: inputs_valid_ INTERNAL ASSERT FAILED at "caffe2/aten/src/ATen/record_function.h":322, please report a bug to PyTorch. Called inputs() outside RecordFunction start callback
Exception raised from inputs at caffe2/aten/src/ATen/record_function.h:322 (most recent call first):
# 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_1>::_M_invoke(std::_Any_data const&)
# 2  c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 3  c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*)
# 5  torch::profiler::impl::ThreadLocalSubqueue::begin_op(at::RecordFunction const&, unsigned long)
# 6  std::unique_ptr<at::ObserverContext, std::default_delete<at::ObserverContext> > torch::autograd::profiler::(anonymous namespace)::onFunctionEnter<false>(at::RecordFunction const&)
# 7  at::RecordFunction::runStartCallbacks()

Differential Revision: D36556512

Summary:
I think this has to be set in all before() calls. Because by default inputs_valid_ = false;. For RecordFunction without any input parameters (so this interface is not used), calling record_func.inputs() will cause an assert:
```
fbcode/caffe2/aten/src/ATen/record_function.h
322
    TORCH_INTERNAL_ASSERT_DEBUG_ONLY(inputs_valid_, "Called inputs() outside RecordFunction start callback");
```
I suppose, an alternative is to require users to call num_inputs() to check before calling inputs(). But I think the intent of inputs_valid_ is for verifying inputs are being requested within the lifetime of the start callback.

Test Plan:
After this diff fix:

```
=> buck build mode/dev-nosan caffe2/test:profiler --show-output

=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_763488.json
INFO:2022-05-20 13:02:33 763488:763488 Config.cpp:470] Trace start time: 2022-05-20 13:02:48
Trace duration: 500ms
Warmup duration: 5s
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1653076953/devgpu040.ftw6/libkineto_activities_763488.json
Trace compression enabled: 1
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:559] Enabling GPU tracing
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:593] Tracing starting in 14s
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:596] Tracing will end in 15s
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0520 13:02:34.013141 763488 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this
STAGE:2022-05-20 13:02:33 763488:763488 ActivityProfilerController.cpp:269] Completed Stage: Warm Up
STAGE:2022-05-20 13:02:34 763488:763488 ActivityProfilerController.cpp:275] Completed Stage: Collection
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:134] Processing 1 CPU buffers
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:771] Traces Recorded:
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:774] PyTorch Profiler: 1 iterations
ok

----------------------------------------------------------------------
Ran 1 test in 0.060s

OK
```

New test failure case:

```
=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_808629.json
INFO:2022-05-20 13:04:46 808629:808629 Config.cpp:470] Trace start time: 2022-05-20 13:05:01
Trace duration: 500ms
Warmup duration: 5s
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1653077086/devgpu040.ftw6/libkineto_activities_808629.json
Trace compression enabled: 1
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:559] Enabling GPU tracing
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:593] Tracing starting in 14s
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:596] Tracing will end in 15s
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0520 13:04:46.853253 808629 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this
STAGE:2022-05-20 13:04:46 808629:808629 ActivityProfilerController.cpp:269] Completed Stage: Warm Up
W0520 13:04:48.126065 808629 record_function.cpp:470] Exception in RecordFunction callback: inputs_valid_ INTERNAL ASSERT FAILED at "caffe2/aten/src/ATen/record_function.h":322, please report a bug to PyTorch. Called inputs() outside RecordFunction start callback
Exception raised from inputs at caffe2/aten/src/ATen/record_function.h:322 (most recent call first):
# 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_1>::_M_invoke(std::_Any_data const&)
# 2  c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 3  c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*)
# 5  torch::profiler::impl::ThreadLocalSubqueue::begin_op(at::RecordFunction const&, unsigned long)
# 6  std::unique_ptr<at::ObserverContext, std::default_delete<at::ObserverContext> > torch::autograd::profiler::(anonymous namespace)::onFunctionEnter<false>(at::RecordFunction const&)
# 7  at::RecordFunction::runStartCallbacks()
```

Differential Revision: D36556512

fbshipit-source-id: 0192fcc02ccc6fb867a6239fa39cc534301fd5cb
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented May 20, 2022

🔗 Helpful links

✅ No Failures (0 Pending)

As of commit 5160342 (more details on the Dr. CI page):

Expand to see more

💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D36556512

@swolchok swolchok self-requested a review May 20, 2022 20:37
@facebook-github-bot
Copy link
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@github-actions
Copy link
Contributor

Hey @louisfeng.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request May 25, 2022
Summary:
Pull Request resolved: #78002

I think this has to be set in all before() calls. Because by default inputs_valid_ = false;. For RecordFunction without any input parameters (so this interface is not used), calling record_func.inputs() will cause an assert:
```
fbcode/caffe2/aten/src/ATen/record_function.h
322
    TORCH_INTERNAL_ASSERT_DEBUG_ONLY(inputs_valid_, "Called inputs() outside RecordFunction start callback");
```
I suppose, an alternative is to require users to call num_inputs() to check before calling inputs(). But I think the intent of inputs_valid_ is for verifying inputs are being requested within the lifetime of the start callback.

Test Plan:
After this diff fix:

```
=> buck build mode/dev-nosan caffe2/test:profiler --show-output

=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_763488.json
INFO:2022-05-20 13:02:33 763488:763488 Config.cpp:470] Trace start time: 2022-05-20 13:02:48
Trace duration: 500ms
Warmup duration: 5s
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1653076953/devgpu040.ftw6/libkineto_activities_763488.json
Trace compression enabled: 1
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:559] Enabling GPU tracing
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:593] Tracing starting in 14s
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:596] Tracing will end in 15s
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0520 13:02:34.013141 763488 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this
STAGE:2022-05-20 13:02:33 763488:763488 ActivityProfilerController.cpp:269] Completed Stage: Warm Up
STAGE:2022-05-20 13:02:34 763488:763488 ActivityProfilerController.cpp:275] Completed Stage: Collection
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:134] Processing 1 CPU buffers
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:771] Traces Recorded:
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:774] PyTorch Profiler: 1 iterations
ok

----------------------------------------------------------------------
Ran 1 test in 0.060s

OK
```

New test failure case:

```
=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_808629.json
INFO:2022-05-20 13:04:46 808629:808629 Config.cpp:470] Trace start time: 2022-05-20 13:05:01
Trace duration: 500ms
Warmup duration: 5s
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1653077086/devgpu040.ftw6/libkineto_activities_808629.json
Trace compression enabled: 1
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:559] Enabling GPU tracing
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:593] Tracing starting in 14s
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:596] Tracing will end in 15s
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0520 13:04:46.853253 808629 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this
STAGE:2022-05-20 13:04:46 808629:808629 ActivityProfilerController.cpp:269] Completed Stage: Warm Up
W0520 13:04:48.126065 808629 record_function.cpp:470] Exception in RecordFunction callback: inputs_valid_ INTERNAL ASSERT FAILED at "caffe2/aten/src/ATen/record_function.h":322, please report a bug to PyTorch. Called inputs() outside RecordFunction start callback
Exception raised from inputs at caffe2/aten/src/ATen/record_function.h:322 (most recent call first):
# 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_1>::_M_invoke(std::_Any_data const&)
# 2  c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 3  c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*)
# 5  torch::profiler::impl::ThreadLocalSubqueue::begin_op(at::RecordFunction const&, unsigned long)
# 6  std::unique_ptr<at::ObserverContext, std::default_delete<at::ObserverContext> > torch::autograd::profiler::(anonymous namespace)::onFunctionEnter<false>(at::RecordFunction const&)
# 7  at::RecordFunction::runStartCallbacks()
```

Reviewed By: swolchok

Differential Revision: D36556512

fbshipit-source-id: 083e7053eb92ae962c02d41c5353e7592b90bb3a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants