add PT_XLA_DEBUG_LEVEL #7149

Merged
JackCaoG merged 6 commits into master from JackCaoG/pt_xla_debug_level on May 30, 2024
@JackCaoG
Collaborator

With PT_XLA_DEBUG_LEVEL=1, the execution frame analysis is not output. This PR also changes the post-compilation analysis to report sizes in GB instead of MB. Sample output:

Compilation Analysis: ================================================================================
Compilation Analysis: Compilation Cause
Compilation Analysis:   mark_step in parallel loader at step end
Compilation Analysis: Graph Info: 
Compilation Analysis:   Graph Hash: c74c3b91b855b2b123f833b0d5f86943
Compilation Analysis:   Number of Graph Inputs: 35
Compilation Analysis:   Number of Graph Outputs: 107
Compilation Analysis: Python Frame Triggered Execution: 
Compilation Analysis:   mark_step (/workspaces/dk3/pytorch/xla/torch_xla/core/xla_model.py:1055)
Compilation Analysis:   next (/workspaces/dk3/pytorch/xla/torch_xla/distributed/parallel_loader.py:44)
Compilation Analysis:   __next__ (/workspaces/dk3/pytorch/xla/torch_xla/distributed/parallel_loader.py:32)
Compilation Analysis:   train_loop_fn (/workspaces/dk3/pytorch/xla/examples/train_decoder_only_base.py:48)
Compilation Analysis:   start_training (/workspaces/dk3/pytorch/xla/examples/train_decoder_only_base.py:65)
Compilation Analysis:   <module> (/workspaces/dk3/pytorch/xla/examples/train_decoder_only_base.py:73)
Compilation Analysis: --------------------------------------------------------------------------------
Compilation Analysis: ================================================================================

Post Compilation Analysis: ================================================================================
Post Compilation Analysis: Graph input size: 1.548000 GB
Post Compilation Analysis: Graph output size: 7.922460 GB
Post Compilation Analysis: Aliased Input size: 1.547871 GB
Post Compilation Analysis: Intermediate tensor size: 12.124478 GB
Post Compilation Analysis: Compiled program size: 0.028210 GB
Post Compilation Analysis: --------------------------------------------------------------------------------
Post Compilation Analysis: ================================================================================
epoch: 1, step: 0, loss: 7.349868297576904, rate: 2.489864196404525
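
For anyone who wants to track these numbers across runs, here is a minimal sketch of pulling the GB figures out of the post-compilation block. It assumes the line format shown in the sample above and is not part of this PR; the function name `parse_post_compilation_sizes` and the regex are hypothetical helpers written for illustration only.

```python
import re

# Matches lines such as
# "Post Compilation Analysis: Graph input size: 1.548000 GB".
# The format is assumed from the sample output in this PR description.
_SIZE_LINE = re.compile(
    r"^Post Compilation Analysis:\s+(?P<name>[^:]+?):\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s+GB\s*$"
)


def parse_post_compilation_sizes(log_text):
    """Return {metric name: size in GB} for every size line in log_text."""
    sizes = {}
    for line in log_text.splitlines():
        match = _SIZE_LINE.match(line.strip())
        if match:
            sizes[match.group("name")] = float(match.group("value"))
    return sizes


# Lines copied from the sample output above (separator shortened).
SAMPLE = """\
Post Compilation Analysis: ================
Post Compilation Analysis: Graph input size: 1.548000 GB
Post Compilation Analysis: Graph output size: 7.922460 GB
Post Compilation Analysis: Aliased Input size: 1.547871 GB
Post Compilation Analysis: Intermediate tensor size: 12.124478 GB
Post Compilation Analysis: Compiled program size: 0.028210 GB
Post Compilation Analysis: ----------------
epoch: 1, step: 0, loss: 7.349868297576904, rate: 2.489864196404525
"""

sizes = parse_post_compilation_sizes(SAMPLE)
for name, gb in sizes.items():
    print(f"{name}: {gb:.3f} GB")
```

Separator lines and ordinary training log lines are skipped because they do not match the pattern. The same function could be applied to the captured output of a run started with PT_XLA_DEBUG_LEVEL=1 set in the environment, assuming the output format stays as shown here.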

@JackCaoG JackCaoG marked this pull request as ready for review May 30, 2024 01:49
@JackCaoG
Collaborator Author

This should be ready for review.

@JackCaoG JackCaoG merged commit 8c2234e into master May 30, 2024
@will-cromar
Collaborator

Thanks!

3 participants