Description
I have written a library, called wonnx, that transforms deep learning models into wgpu compute pipelines: https://github.com/haixuanTao/wonnx
The library works with an MNIST model on Windows (DX12), but larger models such as SqueezeNet fail to run on DX12.
Both MNIST and SqueezeNet work on Linux (Vulkan), both in GitHub Actions and locally with an NVIDIA card.
The error I get is:
ID3D12CommandAllocator::Reset: A command allocator 0x000001C4C43FE4C0:'Unnamed ID3D12CommandAllocator Object' is being reset before previous executions associated with the allocator have completed. [ EXECUTION ERROR #552: COMMAND_ALLOCATOR_SYNC]
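For context, this debug-layer message refers to a D3D12 rule: a command allocator may only be reset once the GPU has finished executing every command list recorded with it, which applications normally track with a fence. The sketch below is a simplified, hypothetical model of that rule in plain Rust, assuming one monotonically increasing fence value per queue. `Queue`, `Allocator` and `ResetError` are made-up names, not wgpu or D3D12 APIs, and this is not how wgpu's DX12 backend is actually implemented.

```rust
// Hypothetical model of the D3D12 rule behind COMMAND_ALLOCATOR_SYNC.
// Not wgpu or D3D12 code: names and structure are invented for illustration.

#[derive(Debug, PartialEq)]
enum ResetError {
    // The allocator's last submission has not completed on the GPU yet.
    StillInFlight { needed: u64, completed: u64 },
}

struct Allocator {
    // Fence value signaled after the last submission recorded with this allocator.
    last_submission: u64,
}

struct Queue {
    // Highest fence value the simulated GPU has reached.
    completed: u64,
    // Fence value the next submission will signal.
    next_signal: u64,
}

impl Queue {
    fn new() -> Self {
        Queue { completed: 0, next_signal: 1 }
    }

    // Record a submission that used `alloc`; return the fence value it signals.
    fn submit(&mut self, alloc: &mut Allocator) -> u64 {
        let value = self.next_signal;
        self.next_signal += 1;
        alloc.last_submission = value;
        value
    }

    // Simulate the GPU finishing all work up to and including `value`.
    fn gpu_complete_up_to(&mut self, value: u64) {
        self.completed = self.completed.max(value);
    }

    // Simulate a blocking wait until every submitted batch has completed.
    fn wait_idle(&mut self) {
        self.completed = self.next_signal - 1;
    }

    // Resetting is only legal once the fence has caught up with the allocator.
    fn reset(&self, alloc: &mut Allocator) -> Result<(), ResetError> {
        if self.completed < alloc.last_submission {
            return Err(ResetError::StillInFlight {
                needed: alloc.last_submission,
                completed: self.completed,
            });
        }
        alloc.last_submission = 0;
        Ok(())
    }
}

fn main() {
    let mut queue = Queue::new();
    let mut alloc = Allocator { last_submission: 0 };

    // Two submissions reuse the same allocator; the GPU has only finished the first.
    queue.submit(&mut alloc);
    queue.submit(&mut alloc);
    queue.gpu_complete_up_to(1);
    // This is the situation the debug layer reports as COMMAND_ALLOCATOR_SYNC.
    println!("{:?}", queue.reset(&mut alloc)); // prints: Err(StillInFlight { needed: 2, completed: 1 })

    // After waiting for all submitted work, the reset is allowed.
    queue.wait_idle();
    println!("{:?}", queue.reset(&mut alloc)); // prints: Ok(())
}
```

In this model the first reset fails because the second submission is still in flight, while the reset after `wait_idle` succeeds.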
I have narrowed the error down to this line:
device.poll(wgpu::Maintain::Wait);
From my research, I suspect this error is related to the high number of compute pipelines, as SqueezeNet is 10x larger than MNIST.
I have hit this error on a Vagrant VM and on a GitHub Actions VM; it might be caused by the virtualisation.
Repro steps
To reproduce the error, you can clone my repo:
set RUST_LOG=debug
git clone https://github.com/haixuanTao/wonnx
cd wonnx
git checkout 71e25a47f5ed831fa96499b77084424188b2e35d
cargo run --example squeeze
You can also run the tests that should be passing.
You can also check my GitHub Actions run, which tests both Linux x86 and Windows x86, here: https://github.com/haixuanTao/wonnx/actions/runs/1569686479
Expected vs observed behavior
If this were an implementation problem in wonnx, I would expect Windows to fail on both MNIST and SqueezeNet, not only on SqueezeNet.
Extra materials
time: pre_run: 24.2054ms
[2021-12-12T18:56:15Z INFO wgpu_core::device] Created buffer Valid((53, 2, Dx12)) with BufferDescriptor { label: Some("staging_squeezenet0_flatten0_reshape0"), size: 4000, usage: MAP_READ | COPY_DST, mapped_at_creation: false }
time: run: 159.6157ms
time: run: 200.1022ms
[2021-12-12T18:56:20Z ERROR wgpu_hal::dx12::instance] ID3D12CommandAllocator::Reset: A command allocator 0x000002035A627380:'Unnamed ID3D12CommandAllocator Object' is being reset before previous executions associated with the allocator have completed. [ EXECUTION ERROR #552: COMMAND_ALLOCATOR_SYNC]
[2021-12-12T18:56:20Z WARN wgpu_hal::dx12::instance] Process is terminating. Using simple reporting. Please call ReportLiveObjects() at runtime for standard reporting.
[2021-12-12T18:56:20Z WARN wgpu_hal::dx12::instance] Live Producer at 0x00000203498D5A98, Refcount: 330.
[2021-12-12T18:56:20Z WARN wgpu_hal::dx12::instance] Live Object at 0x0000020349916220, Refcount: 0.
[2021-12-12T18:56:20Z WARN wgpu_hal::dx12::instance] Live Object at 0x0000020349F3B600, Refcount: 0.
[2021-12-12T18:56:20Z WARN wgpu_hal::dx12::instance] Live Object at 0x0000020349F787F0, Refcount: 0.
[2021-12-12T18:56:20Z WARN wgpu_hal::dx12::instance] Live Object at 0x0000020349F792F0, Refcount: 0.
[2021-12-12T18:56:20Z
.......
wgpu_hal::dx12::instance] Live Object at 0x000002034A048A00, Refcount: 0.
[2021-12-12T18:56:20Z WARN wgpu_hal::dx12::instance] Live Object : 8
error: process didn't exit successfully: `target\debug\examples\squeeze.exe` (exit code: 1)
Platform
The Vagrant VM I am using is the following: https://github.com/nbigaouette/windows_vagrant_rustv
UPDATE: I have now removed this test from my CI so that I can keep developing.