Torchbench dynamo training failure with openxla dynamo backend #5410

@JackCaoG

Description

🐛 Bug

With the command

python benchmarks/dynamo/torchbench.py --randomize-input --performance --training --trace-on-xla --backend=openxla --only resnet50

after setting up torchbench, the following error occurs:

RuntimeError: ./torch_xla/csrc/runtime/pjrt_computation_client.h:166 : Check failed: HasValue() 
*** Begin stack trace ***
	tsl::CurrentStackTrace[abi:cxx11]()
	torch_xla::runtime::PjRtComputationClient::PjRtData::GetOpaqueHandle()
	torch::lazy::LazyGraphExecutor::RunPostOrder(std::vector<torch::lazy::Value, std::allocator<torch::lazy::Value> > const&, torch::lazy::LazyGraphExecutor::SyncTensorCollection*)
	torch_xla::XLAGraphExecutor::RunPostOrder(std::vector<torch::lazy::Value, std::allocator<torch::lazy::Value> > const&, torch::lazy::LazyGraphExecutor::SyncTensorCollection*)
	torch_xla::XLAGraphExecutor::GetGraphHash(std::vector<c10::intrusive_ptr<torch_xla::XLATensor, c10::detail::intrusive_target_default_null_type<torch_xla::XLATensor> >, std::allocator<c10::intrusive_ptr<torch_xla::XLATensor, c10::detail::intrusive_target_default_null_type<torch_xla::XLATensor> > > > const&)

Note that the benchmark script needs to be patched similarly to #4174 (comment).

@shunting314 @jansel @wconstab @wonjoolee95 @alanwaketan
