Apply precision config env vars in the root process. #6152

Merged: golechwierowicz merged 3 commits into master on Dec 15, 2023.
Conversation

frgossen approved these changes on Dec 14, 2023.
      raise ValueError(f"Unknown precision: {precision}")
    return precision_flag

  def apply_env(self, process_env):
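The full `apply_env` implementation is not shown in this excerpt. A minimal sketch of what such a method could look like, assuming the conventional `XLA_USE_FP16`/`XLA_USE_BF16` flag names and an illustrative `BenchmarkModel` class (both are assumptions, not the actual PR code):

```python
# Sketch only: a model object injects its precision flag into the env dict
# that the root process will pass to the per-experiment subprocess.
# Class name and env var names are illustrative assumptions.
class BenchmarkModel:

    def __init__(self, precision: str):
        if precision not in ("fp32", "fp16", "bf16"):
            raise ValueError(f"Unknown precision: {precision}")
        self.precision = precision

    def apply_env(self, process_env: dict) -> dict:
        # fp32 is the default, so it needs no flag.
        if self.precision == "fp16":
            process_env["XLA_USE_FP16"] = "1"
        elif self.precision == "bf16":
            process_env["XLA_USE_BF16"] = "1"
        return process_env
```

Keeping this logic on the model object (rather than in `experiment_runner.py`) is what lets the runner stay agnostic about which flags each suite needs.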
Collaborator
No need to address in this PR:
Since this is a workaround, wdyt about putting it behind an optional flag?
This helps when comparing XLA against Inductor, but it somewhat obscures comparisons of PyTorch+Inductor against PyTorch/XLA+XLA.
golechwierowicz (Author, Collaborator)
Hmm, I can do it in the next PR, but I think this flag should be enabled by default. We need to compare models with the same precision ops in the end, otherwise comparisons are kind of meaningless.
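One way such an opt-out flag (enabled by default, as the author suggests) could be wired; the flag name is purely illustrative and not part of this PR:

```python
import argparse

# Illustrative sketch: a boolean flag that defaults to on, with an
# auto-generated --no-apply-precision-env opt-out (Python 3.9+).
# The flag name is an assumption, not the actual experiment_runner option.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--apply-precision-env",
    action=argparse.BooleanOptionalAction,
    default=True,
    help="Apply per-model precision env vars in the root process.",
)

args = parser.parse_args(["--no-apply-precision-env"])
```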
golechwierowicz force-pushed from 06cb2d6 to 65da637.
cota added a commit to cota/pytorch-xla that referenced this pull request on Dec 18, 2023:
In "dfcf306e7 Apply precision config env vars in the root process. (pytorch#6152)" we started running load_benchmark() from experiment_runner's main process. Unfortunately, load_benchmark() for pytorch_CycleGAN_and_pix2pix seems to exit the calling process when using XLA. This results in experiment_runner exiting prematurely. Work around this issue by adding pytorch_CycleGAN_and_pix2pix to the deny list, so that experiment_runner does not die early. Note that pytorch_CycleGAN_and_pix2pix was not running successfully for XLA before the aforementioned dfcf306 commit, so skipping it does not reduce coverage.
cota added a commit that referenced this pull request on Dec 20, 2023:

In "dfcf306e7 Apply precision config env vars in the root process. (#6152)" we started running load_benchmark() from experiment_runner's main process. Unfortunately, load_benchmark() for some models exits the calling process, which results in experiment_runner exiting prematurely. Work around this issue by adding these models to the deny list, so that experiment_runner does not die early.
mbzomowski pushed a commit to mbzomowski-test-org/xla that referenced this pull request on Jan 3, 2024 (…ch#6199; same commit message as the Dec 20 deny-list commit above).
golechwierowicz added a commit that referenced this pull request on Jan 12, 2024:

After some changes to the main branch, os.environ was not sufficient to pick up new env vars in the subprocess. In this PR we apply a necessary workaround in the root process, which launches a subprocess for each experiment. New flags are passed via the process_env variable.
golechwierowicz pushed a commit that referenced this pull request on Jan 12, 2024 (same commit message as the Dec 20 deny-list commit above).
bhavya01 pushed a commit that referenced this pull request on Apr 22, 2024 (same commit message as the Jan 12 root-process workaround commit above).
bhavya01 pushed a commit that referenced this pull request on Apr 22, 2024 (same commit message as the Dec 20 deny-list commit above).
After some changes to the main branch, os.environ was not sufficient to pick up new env vars in the subprocess. In this PR we apply a necessary workaround in the root process, which launches a subprocess for each experiment. New flags are passed via the process_env variable.

I tried to keep experiment_runner.py as clean as possible, and abstracted the new env vars via the apply_env method of {Benchmark, TorchBench}Model.

Tested with:

PJRT_DEVICE=CUDA python3 new_xla/benchmarks/experiment_runner.py --dynamo=openxla --xla=PJRT --test=eval --filter='hf_Bert$|BERT_pytorch$' --suite-name=torchbench --accelerator=cuda --progress-bar --output-dirname=/tmp/output --repeat=3 --print-subprocess --no-resume --profile-cuda --profile-cuda-dump=/tmp/dumpz --profile-cuda-cpu-collect

For hf_Bert the fp16 casting works, and it shows in the gemm kernels. For BERT_pytorch, which does not have fp16 as default, we do nothing.
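The overall flow the PR describes (root process prepares the env dict, per-experiment subprocess inherits it) can be sketched roughly as follows; the `run_experiment` helper and the exact env var contents are assumptions for illustration, not the actual experiment_runner code:

```python
import os
import subprocess

# Sketch: the root process copies its own environment, merges in the flags
# produced by the model's apply_env(), and hands the result to the
# per-experiment subprocess via the env= parameter.
def run_experiment(cmd, extra_env):
    process_env = os.environ.copy()
    process_env.update(extra_env)  # e.g. {"XLA_USE_FP16": "1"}
    return subprocess.run(cmd, env=process_env, check=True)
```

Passing `env=` explicitly, instead of mutating `os.environ` in the root process, keeps each experiment's flags isolated and avoids the pickup problem the PR works around.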