Log structured logging overhead to dynamo compile (kinda)#136142
Log structured logging overhead to dynamo compile (kinda)#136142
Conversation
|
This pull request was exported from Phabricator. Differential Revision: D62643611 |
Summary: X-link: pytorch/pytorch#136142 Pull Request resolved: pytorch#2454 This adds structured logging overhead at a per compile basis to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table. Implementation notes: - If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis. - We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number *in* compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small. - I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though. Reviewed By: oulgen Differential Revision: D62643611
|
This pull request was exported from Phabricator. Differential Revision: D62643611 |
Summary: Pull Request resolved: #136142 X-link: pytorch/benchmark#2454 This adds structured logging overhead at a per compile basis to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table. Implementation notes: - If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis. - We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number *in* compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small. - I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though. Test Plan: Run benchmarks/nanogpt and staging logger. See that the new compilation metric is logged to the staged dynamo_compile table: https://fburl.com/scuba/logger_staging_jjwu_30582a48f1ff9cf5f4ac50a4c40af/xazjg5xq Note that the sum(structured_logging_overhead_s) / sum(entire_frame_compile_time) = 8.387 / 124.278 = 6%, which seems reasonable as the overhead for a small compilation like this. You can also look at samples for a more detailed log of this. Reviewed By: oulgen Differential Revision: D62643611
020d916 to
f8cd161
Compare
Summary: X-link: pytorch/pytorch#136142 Pull Request resolved: pytorch#2454 This adds structured logging overhead at a per compile basis to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table. Implementation notes: - If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis. - We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number *in* compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small. - I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though. Reviewed By: oulgen Differential Revision: D62643611
f8cd161 to
b03d9f6
Compare
|
This pull request was exported from Phabricator. Differential Revision: D62643611 |
Summary: Pull Request resolved: #136142 X-link: pytorch/benchmark#2454 This adds structured logging overhead at a per compile basis to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table. Implementation notes: - If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis. - We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number *in* compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small. - I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though. Test Plan: Run benchmarks/nanogpt and staging logger. See that the new compilation metric is logged to the staged dynamo_compile table: https://fburl.com/scuba/logger_staging_jjwu_30582a48f1ff9cf5f4ac50a4c40af/xazjg5xq Note that the sum(structured_logging_overhead_s) / sum(entire_frame_compile_time) = 8.387 / 124.278 = 6%, which seems reasonable as the overhead for a small compilation like this. You can also look at samples for a more detailed log of this. Reviewed By: oulgen Differential Revision: D62643611
Summary: X-link: pytorch/pytorch#136142 Pull Request resolved: pytorch#2454 This adds structured logging overhead at a per compile basis to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table. Implementation notes: - If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis. - We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number *in* compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small. - I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though. Reviewed By: oulgen Differential Revision: D62643611
|
This pull request was exported from Phabricator. Differential Revision: D62643611 |
b03d9f6 to
cda3c95
Compare
Summary: X-link: pytorch/pytorch#136142 Pull Request resolved: #2454 This adds structured logging overhead at a per compile basis to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table. Implementation notes: - If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis. - We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number *in* compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small. - I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though. Reviewed By: oulgen Differential Revision: D62643611 fbshipit-source-id: 9353d1dbb323079e292b9b4786604fc377971e13
|
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally) |
Merge startedYour change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…6142) Summary: X-link: pytorch/benchmark#2454 This adds structured logging overhead at a per compile basis to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table. Implementation notes: - If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis. - We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number *in* compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small. - I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though. Test Plan: Run benchmarks/nanogpt and staging logger. See that the new compilation metric is logged to the staged dynamo_compile table: https://fburl.com/scuba/logger_staging_jjwu_30582a48f1ff9cf5f4ac50a4c40af/xazjg5xq Note that the sum(structured_logging_overhead_s) / sum(entire_frame_compile_time) = 8.387 / 124.278 = 6%, which seems reasonable as the overhead for a small compilation like this. You can also look at samples for a more detailed log of this. Reviewed By: oulgen Differential Revision: D62643611 Pull Request resolved: pytorch#136142 Approved by: https://github.com/bobrenjc93
Summary:
X-link: pytorch/benchmark#2454
This adds structured logging overhead at a per compile basis to compilation metrics.
To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table.
Implementation notes:
Test Plan:
Run benchmarks/nanogpt and staging logger. See that the new compilation metric is logged to the staged dynamo_compile table:
https://fburl.com/scuba/logger_staging_jjwu_30582a48f1ff9cf5f4ac50a4c40af/xazjg5xq
Note that the sum(structured_logging_overhead_s) / sum(entire_frame_compile_time) = 8.387 / 124.278 = 6%, which seems reasonable as the overhead for a small compilation like this.
You can also look at samples for a more detailed log of this.
Reviewed By: oulgen
Differential Revision: D62643611
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec