[author: jluntamazon] Adding more explicit HLO lowering control by exposing LoweringContext… (#5431)
Conversation
|
We can merge it after CI is green. @wonjoolee95, can you cherry-pick this into 2.2 once merged? |
|
Hmmm... The test results in CI are different from what I observe locally. Need to figure out why. |
|
@JackCaoG @wonjoolee95 is there a way I can directly test in the CI environment? |
|
Can you try running the whole test file (`test_operations.py`) instead of a single test? If I had to guess, another test running before this one (in the same process) is affecting the result. |
|
Yeah, I tried that locally but it still passed. |
|
There are logs in https://github.com/pytorch/xla/actions/runs/6174499874/job/16761837847?pr=5431 under |
|
Cool, thanks. |
```diff
 std::vector<torch::lazy::Value> ir_values;
 for (auto& xtensor : xtensors) {
-  torch::lazy::Value value = xtensor->CurrentIrValue();
+  torch::lazy::Value value = xtensor->GetIrValue();
```
Oh haha, I should have caught this when reviewing.
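For context on the one-line fix above: in the lazy-tensor API, `CurrentIrValue()` returns the tensor's IR value only if one already exists, while `GetIrValue()` creates a device-data IR node on demand. A conceptual Python sketch of that difference — a toy model, not the actual C++ implementation:

```python
from typing import Optional

class ToyLazyTensor:
    """Toy stand-in for a lazy tensor (conceptual model only)."""

    def __init__(self, data):
        self.data = data
        self._ir_value: Optional[str] = None  # None until traced

    def current_ir_value(self) -> Optional[str]:
        # Mirrors CurrentIrValue(): returns the existing IR value,
        # or None if the tensor is not in the graph yet.
        return self._ir_value

    def get_ir_value(self) -> str:
        # Mirrors GetIrValue(): creates a device-data IR node on
        # demand, so the tensor can always participate in lowering.
        if self._ir_value is None:
            self._ir_value = f"device_data({self.data!r})"
        return self._ir_value

t = ToyLazyTensor([1, 2, 3])
print(t.current_ir_value())  # None -- nothing traced yet
print(t.get_ir_value())      # creates an IR value on demand
```

Under this model, `CurrentIrValue()` on a fresh input tensor would hand the lowering a null value, which is presumably why the change reverses back to `GetIrValue()`.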
|
I don't have access to the Docker image. Now the error messages are all because of |
|
The failure log suggests that it is trying to get the parameter, but the parameter buffer has already been deleted. This is a bit weird given the test code. The only thing I can guess is that aliasing somehow messed things up. Can you try running with |
|
Just to double check, you mean setting the env var when running this unit test? |
|
Yeah, were you able to repro locally? |
|
No, no luck. I can still run the test successfully locally, both with and without the env var. |
Seems like the dynamic shape test is failing. Can you run with |
|
@wonjoolee95 Do you have time to take a look at this one? |
|
Let me try to build this and PyTorch master locally and see if I can reproduce it; it should be quick (about 15 minutes). If not, we can just disable the test in CI for now. |
|
Thanks @wonjoolee95 |
|
I can also see that running the entire test file still passes for me. Since we can both verify that it passes in our dev envs and cannot reproduce the CI failure, let's just skip the test to keep the CI happy. @seanlatias, we can skip this with |
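The trailing "we can skip this with" is cut off; a plain `unittest` skip decorator is one way to do it. A minimal sketch — the class and test names below are hypothetical, not the real ones from `test_operations.py`:

```python
import unittest

class LoweringContextTest(unittest.TestCase):
    # Hypothetical test name standing in for the real unit test.
    @unittest.skip("fails only in CI; passes locally -- see discussion above")
    def test_lowering_context_api(self):
        self.fail("never runs while the skip decorator is in place")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(LoweringContextTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.skipped)  # recorded as skipped (with the reason), not failed
```

A skipped test keeps the run green while staying visible in the test report, which makes it easy to find and re-enable once the CI-only failure is understood.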
|
@wonjoolee95, not sure what happened with the GPU test. |
|
Can you rebase with master one more time so it re-triggers the CI? |
… (and utilities) to python for Neuron
|
It's still the same. It seems the CI cancels itself automatically when it reaches the 4-hour limit. I also see similar results in other PRs. |
|
It is a known issue: forked PRs can't use the remote cache. We can just merge. |
|
Thanks @JackCaoG @wonjoolee95 |
…posing LoweringContext… (#5431)

* Adding more explicit HLO lowering control by exposing LoweringContext (and utilities) to python for Neuron
* fixing linter issues
* fixing spacing
* apply comments and fix compilation errors
* add test for new apis
* fix linter
* update test
* update test
* modify test
* reverse back to GetIrValue()
* update test inputs with random numbers
* skip unittest because it only fails in CI

Co-authored-by: Ubuntu <ubuntu@ip-172-31-3-186.us-west-2.compute.internal>
Co-authored-by: seanlatias <seanlatias@gmail.com>
…posing LoweringContext… (#5431) (#5580)

* Adding more explicit HLO lowering control by exposing LoweringContext (and utilities) to python for Neuron
* fixing linter issues
* fixing spacing
* apply comments and fix compilation errors
* add test for new apis
* fix linter
* update test
* update test
* modify test
* reverse back to GetIrValue()
* update test inputs with random numbers
* skip unittest because it only fails in CI

Co-authored-by: aws-kingrj <78175353+aws-kingrj@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-3-186.us-west-2.compute.internal>
Co-authored-by: seanlatias <seanlatias@gmail.com>
* Handle dynamo function without input (#5565) (#5577)
* Make cpu tensor on XLA dynamo backend a warning instead of error (#5549) (#5576)
* [author: jluntamazon] Adding more explicit HLO lowering control by exposing LoweringContext… (#5431) (#5580)
  * Adding more explicit HLO lowering control by exposing LoweringContext (and utilities) to python for Neuron
  * fixing linter issues
  * fixing spacing
  * apply comments and fix compilation errors
  * add test for new apis
  * fix linter
  * update test
  * update test
  * modify test
  * reverse back to GetIrValue()
  * update test inputs with random numbers
  * skip unittest because it only fails in CI
  Co-authored-by: aws-kingrj <78175353+aws-kingrj@users.noreply.github.com>
  Co-authored-by: Ubuntu <ubuntu@ip-172-31-3-186.us-west-2.compute.internal>
  Co-authored-by: seanlatias <seanlatias@gmail.com>
* fixing num_local_processes typo (#5573) (#5579)
  Co-authored-by: aws-kingrj <78175353+aws-kingrj@users.noreply.github.com>
* Move where clear pending IR is called to avoid crash (#5552) (#5582)
  * Move where clear pending IR is called to avoid crash
  * fix CI
  * fix CI and add some debugging messages
* Fix release branch and tag patterns for GitHub Actions (#5587) (#5590)
* Improve bernoulli rng-bit-generation memory footprint (#5581) (#5589)
  * Allow downcasting RngUniform generation for Bernoulli
  Co-authored-by: Yeounoh Chung <yeounoh@google.com>
* Enable xla:gpu autocast for bfloat16 if not restricted (#5570) (#5591)
  * Enable autocast for XLA:GPU
  * linter fix
  * XLA autocast test for GPU and TPU
  * linter fix
  * Ensure that xla autocast is properly enabled for GPU and does not crash when torch cuda is not available.
  * linter fix
  * Add tests
  * Support bf16
  * linter fix
  * exclude unsupported test cases
  * increase GPU test timeout to 300
  Co-authored-by: Yeounoh Chung <yeounoh@google.com>
* Cherry-pick: Don't trigger CI build on release tag push (#5595)
  Copy of #5594 on release branch
* formatting

Co-authored-by: JackCaoG <59073027+JackCaoG@users.noreply.github.com>
Co-authored-by: Wonjoo Lee <wonjoo@google.com>
Co-authored-by: aws-kingrj <78175353+aws-kingrj@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-3-186.us-west-2.compute.internal>
Co-authored-by: seanlatias <seanlatias@gmail.com>
Co-authored-by: Manfei <41607353+ManfeiBai@users.noreply.github.com>
Co-authored-by: Yeounoh Chung <yeounoh@google.com>
