Log structured logging overhead to dynamo compile (kinda) by jamesjwu · Pull Request #2454 · pytorch/benchmark

jamesjwu · 2024-09-13T15:49:24Z

Summary:
This adds structured logging overhead at a per compile basis to compilation metrics.

To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table.

Implementation notes:

If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis.
We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number in compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small.
I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though.

Differential Revision: D62643611

facebook-github-bot · 2024-09-13T15:49:40Z

This pull request was exported from Phabricator. Differential Revision: D62643611

facebook-github-bot · 2024-09-16T16:54:35Z

This pull request was exported from Phabricator. Differential Revision: D62643611

Summary: X-link: pytorch/pytorch#136142 Pull Request resolved: pytorch#2454 This adds structured logging overhead at a per compile basis to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table. Implementation notes: - If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis. - We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number *in* compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small. - I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though. Reviewed By: oulgen Differential Revision: D62643611

Summary: Pull Request resolved: #136142 X-link: pytorch/benchmark#2454 This adds structured logging overhead at a per compile basis to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table. Implementation notes: - If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis. - We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number *in* compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small. - I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though. Test Plan: Run benchmarks/nanogpt and staging logger. See that the new compilation metric is logged to the staged dynamo_compile table: https://fburl.com/scuba/logger_staging_jjwu_30582a48f1ff9cf5f4ac50a4c40af/xazjg5xq Note that the sum(structured_logging_overhead_s) / sum(entire_frame_compile_time) = 8.387 / 124.278 = 6%, which seems reasonable as the overhead for a small compilation like this. You can also look at samples for a more detailed log of this. Reviewed By: oulgen Differential Revision: D62643611

facebook-github-bot · 2024-09-18T21:31:09Z

This pull request was exported from Phabricator. Differential Revision: D62643611

Summary: X-link: pytorch/pytorch#136142 Pull Request resolved: pytorch#2454 This adds structured logging overhead at a per compile basis to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table. Implementation notes: - If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis. - We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number *in* compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small. - I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though. Reviewed By: oulgen Differential Revision: D62643611

facebook-github-bot · 2024-09-18T21:37:32Z

This pull request was exported from Phabricator. Differential Revision: D62643611

Summary: Pull Request resolved: #136142 X-link: pytorch/benchmark#2454 This adds structured logging overhead at a per compile basis to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table. Implementation notes: - If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis. - We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number *in* compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small. - I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though. Test Plan: Run benchmarks/nanogpt and staging logger. See that the new compilation metric is logged to the staged dynamo_compile table: https://fburl.com/scuba/logger_staging_jjwu_30582a48f1ff9cf5f4ac50a4c40af/xazjg5xq Note that the sum(structured_logging_overhead_s) / sum(entire_frame_compile_time) = 8.387 / 124.278 = 6%, which seems reasonable as the overhead for a small compilation like this. You can also look at samples for a more detailed log of this. Reviewed By: oulgen Differential Revision: D62643611

facebook-github-bot · 2024-09-19T16:07:13Z

This pull request has been merged in 9c7aacc.

Summary: X-link: pytorch/benchmark#2454 This adds structured logging overhead at a per compile basis to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table. Implementation notes: - If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis. - We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number *in* compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small. - I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though. Test Plan: Run benchmarks/nanogpt and staging logger. See that the new compilation metric is logged to the staged dynamo_compile table: https://fburl.com/scuba/logger_staging_jjwu_30582a48f1ff9cf5f4ac50a4c40af/xazjg5xq Note that the sum(structured_logging_overhead_s) / sum(entire_frame_compile_time) = 8.387 / 124.278 = 6%, which seems reasonable as the overhead for a small compilation like this. You can also look at samples for a more detailed log of this. Reviewed By: oulgen Differential Revision: D62643611 Pull Request resolved: #136142 Approved by: https://github.com/bobrenjc93

…6142) Summary: X-link: pytorch/benchmark#2454 This adds structured logging overhead at a per compile basis to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table. Implementation notes: - If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis. - We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number *in* compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small. - I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though. Test Plan: Run benchmarks/nanogpt and staging logger. See that the new compilation metric is logged to the staged dynamo_compile table: https://fburl.com/scuba/logger_staging_jjwu_30582a48f1ff9cf5f4ac50a4c40af/xazjg5xq Note that the sum(structured_logging_overhead_s) / sum(entire_frame_compile_time) = 8.387 / 124.278 = 6%, which seems reasonable as the overhead for a small compilation like this. You can also look at samples for a more detailed log of this. Reviewed By: oulgen Differential Revision: D62643611 Pull Request resolved: pytorch#136142 Approved by: https://github.com/bobrenjc93

facebook-github-bot added the cla signed label Sep 13, 2024

jamesjwu temporarily deployed to docker-s3-upload September 13, 2024 15:49 — with GitHub Actions Inactive

facebook-github-bot added the fb-exported label Sep 13, 2024

jamesjwu had a problem deploying to docker-s3-upload September 13, 2024 15:50 — with GitHub Actions Failure

jamesjwu mentioned this pull request Sep 16, 2024

Log structured logging overhead to dynamo compile (kinda) pytorch/pytorch#136142

Closed

jamesjwu force-pushed the export-D62643611 branch from b6c0c94 to 13a0bb7 Compare September 16, 2024 16:54

jamesjwu temporarily deployed to docker-s3-upload September 16, 2024 16:54 — with GitHub Actions Inactive

jamesjwu had a problem deploying to docker-s3-upload September 16, 2024 16:55 — with GitHub Actions Failure

jamesjwu had a problem deploying to docker-s3-upload September 18, 2024 21:20 — with GitHub Actions Error

jamesjwu force-pushed the export-D62643611 branch from 13a0bb7 to e104be4 Compare September 18, 2024 21:31

jamesjwu had a problem deploying to docker-s3-upload September 18, 2024 21:31 — with GitHub Actions Error

jamesjwu had a problem deploying to docker-s3-upload September 18, 2024 21:32 — with GitHub Actions Error

jamesjwu force-pushed the export-D62643611 branch from e104be4 to fda5196 Compare September 18, 2024 21:37

jamesjwu temporarily deployed to docker-s3-upload September 18, 2024 21:39 — with GitHub Actions Inactive

jamesjwu had a problem deploying to docker-s3-upload September 18, 2024 21:40 — with GitHub Actions Failure

facebook-github-bot closed this in 9c7aacc Sep 19, 2024

facebook-github-bot added the Merged label Sep 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Log structured logging overhead to dynamo compile (kinda)#2454

Log structured logging overhead to dynamo compile (kinda)#2454
jamesjwu wants to merge 1 commit intopytorch:mainfrom
jamesjwu:export-D62643611

jamesjwu commented Sep 13, 2024

Uh oh!

facebook-github-bot commented Sep 13, 2024

Uh oh!

facebook-github-bot commented Sep 16, 2024

Uh oh!

facebook-github-bot commented Sep 18, 2024

Uh oh!

facebook-github-bot commented Sep 18, 2024

Uh oh!

facebook-github-bot commented Sep 19, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jamesjwu commented Sep 13, 2024

Uh oh!

facebook-github-bot commented Sep 13, 2024

Uh oh!

facebook-github-bot commented Sep 16, 2024

Uh oh!

facebook-github-bot commented Sep 18, 2024

Uh oh!

facebook-github-bot commented Sep 18, 2024

Uh oh!

facebook-github-bot commented Sep 19, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants