Synchronous GPUBuffer.map(). #506
I'm not sure I understand the details of the proposal: …

Thanks @jdashg for making this PR and @Kangz for great questions! I'd like to add one more. So the promise (no pun!) here is that the user doing fences would be able to guarantee that this …
@kvark -- [1]: I hadn't considered what map(MAP_READ) does when it's premature. I think the least-bad thing is to issue loud warnings and stall, like in GLES/WebGL. Racy choices are making mapping fallible (which, with warnings, is pretty actionable), reading zeros, or reading stale values.

There's potentially an option where we spec that what you get on map(READ) is tied to what the most recently checked Fence allows you to see. That is, map(READ) always gives you the content of the buffer as of the last Fence you checked. If you never check fences, it only gives you its initial values: zeros. This is also something I would issue console warnings about, and it's not racier than receiving Promises.
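For illustration, the read path under that "last checked fence" model might look like the following sketch. `createFence()` and `onCompletion()` appear later in this thread; `queue.signal()`, the `GPUMap.READ` flag spelling, and the exact `map()` signature are assumptions, and `resultBuffer`/`commandsThatWriteResultBuffer` are placeholders.

```js
// Submit work that writes into resultBuffer, then "check" a fence so the
// proposed synchronous map(READ) is allowed to show those results.
const fence = queue.createFence();              // createFence() is referenced in the proposal text
queue.submit([commandsThatWriteResultBuffer]);  // placeholder command buffer
queue.signal(fence, 1);                         // assumed signaling call

// Before any fence is checked, this model says map(READ) would only show the
// buffer's initial zeros (plus a console warning).
await fence.onCompletion(1);                    // "checking" the fence

// After the checked fence, the GPU's writes may be observed.
const results = new Float32Array(resultBuffer.map(GPUMap.READ));
console.log(results[0]);
resultBuffer.unmap();
```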
I think there is a big difference here: this proposal, unlike the others, requires the implementation to track buffer usage and fences (on the client side) for correctness, in order to know whether mapping should succeed or error out.
Remove createBufferMapped as now redundant.
Async mapping can be used to polyfill this, so the tracking overhead is quite minimal.
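A rough sketch of what such a polyfill could look like in user space. The wrapper class and `markReady()` are hypothetical names; only `mapReadAsync()`/`unmap()` come from the existing design, and the caveat in `map()` is exactly the "premature mapping" case debated in this thread.

```js
// Emulate the proposal's synchronous MAP_READ on top of the existing async
// mapping: once the app's fence wait completes, eagerly start mapReadAsync()
// so a later synchronous map() can hand back the already-resolved ArrayBuffer.
class MappableBuffer {
  constructor(buffer) {
    this.buffer = buffer;
    this.mapped = null;           // ArrayBuffer, once the async mapping resolves
  }
  markReady() {                   // call after awaiting the guarding fence
    return this.buffer.mapReadAsync().then((ab) => { this.mapped = ab; });
  }
  map() {                         // "synchronous" map
    if (this.mapped === null) {
      // Premature: the warn/stall/fail choice discussed above applies here.
      throw new Error("mapping not ready yet; wait on the fence first");
    }
    return this.mapped;
  }
  unmap() {
    this.mapped = null;
    this.buffer.unmap();
  }
}
```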
> are unchanged between unmap and map. (Or, if the buffer has not been previously mapped, the initial buffer contents.)
> If the contents of a mapping without {{GPUMap/WRITE}} are changed, on unmap(),
This is super weird but if I understand correctly the intent is that applications are free to race with the GPU, but it causes a device loss so the result of the race is non-observable?
This spec language doesn't work because an application could start with a buffer full of zeroes, map it, race happily with the GPU for extended periods of time, fill the buffer back with zeros, and unmap.
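Concretely, the hole in the rule looks something like this. The synchronous `map()` call and the `GPUMap.READ` flag are as in the proposal; `buffer`, `queue`, and the submitted commands are placeholders.

```js
// The buffer starts out zero-filled.
const bytes = new Uint8Array(buffer.map(GPUMap.READ)); // a mapping without WRITE
queue.submit([commandsThatUseTheBuffer]);              // race with the GPU while mapped
bytes.fill(0xff);                                      // mutate the read mapping (the race)
bytes.fill(0x00);                                      // ...then put the zeros back
buffer.unmap();                                        // contents are byte-identical to the original,
                                                       // so "changed on unmap() => device loss" never fires
```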
I assume this preserves the behavior we have in our current design, where, if you submit a command buffer which references a mapped resource, the command buffer fails instead. I don't see anything in this proposal about persistent mappings; I only see words about synchronous mappings.
> to ensure that {{GPUBuffer/map()}} provides an efficient mapping to memory with a minimal number of copies.
> If applications do not ensure proper fencing, {{GPUBuffer/map()}} may provide a less
This in combination with [1] means we have to keep a persistent copy of the buffer's data in the content process, which would be very unfortunate.
> {{GPUBuffer/map()}} returns null if the device is lost.
> Because MAP_WRITE is writable only by the CPU, the contents of the ArrayBuffer
[1]
> efficient mapping. (such as requiring an extra allocation and copy) Implementations should provide warnings to inform when this overhead is incurred.
> {{GPUMap/READ}} will stall if its contents are stale. Because this condition is
Stalling isn't an option in Web APIs anymore. WebGL got away with it but this is no longer a design option.
Please include a reference to this rule.
Right, this blocking behavior is why the existing design returns promises. It's important not to block; we don't know how long the GPU will be using the resource. A long compute job could take seconds, and it's not okay to hang the main browser thread for seconds.

A better design would probably instead be for the map operation to fail outright.
I didn't realize that you already made a pull request for fallible mapping: #511
litherum left a comment:
I think this proposal is workable, with a few changes marked inline. I like that this proposal distinguishes between the "my buffer is ready for me to map it" state from the actual act of mapping the buffer itself. What's in the IDL now has this problem where, in order to know when it's safe to map a buffer, you have to actually map it, which seems wasteful. I think, ultimately, we will need a fallible synchronous buffer mapping function in WebGPU because it solves this problem.
However, I'm not sure this proposal passes the test of "is easier to use correctly than incorrectly." Rather than having a pool of in-flight resources, I would expect an author who just wants to upload something for the current frame to 1) create a buffer 2) synchronously map it (successfully) 3) populate it 4) unmap it 5) use it in some GPU commands, and, crucially, forget to destroy it. This is the same problem I was worried about in #418.
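For concreteness, that flow might look like the sketch below, using the proposal's synchronous map. The descriptor shape and the `GPUBufferUsage`/`GPUMap` flag names are assumptions, and `vertexData` is a placeholder.

```js
// The "easy but leaky" path: each step is individually fine, but nothing
// here ever calls destroy(), so one buffer leaks per upload.
function uploadForThisFrame(device, vertexData) {
  const buffer = device.createBuffer({                          // 1) create a buffer
    size: vertexData.byteLength,
    usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.VERTEX,    // flag names assumed
  });
  new Float32Array(buffer.map(GPUMap.WRITE)).set(vertexData);   // 2) map synchronously, 3) populate
  buffer.unmap();                                               // 4) unmap
  return buffer;                                                // 5) caller binds it in GPU commands...
  // ...and, crucially, nobody remembers to call buffer.destroy().
}
```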
I also think this proposal is more complicated than our current design, because (almost) all data uploads would require programmers to use fences, which increases the complexity and lines of code for even the most trivial programs.
> Only one mapping may be active per buffer at a time.
> With {{GPUQueue/createFence()}} and awaiting {{GPUFence/onCompletion()}}, it's possible
It sounds like the idea is that, before reading from anything, the JavaScript is supposed to wait on a fence to make sure the thing it's reading from has already completed. And if the JS doesn't wait on a fence itself, then the runtime will wait on a fence on its behalf, inside the map() call. Is that accurate?

And for writing, the JS is supposed to wait on a fence to know that the thing it's writing to isn't being used by the GPU, and if it is, the implementation will make a copy under the hood inside the unmap() call. Is that accurate?

If that's accurate, doesn't unmap() need to be associated with a queue? So the implementation can issue the copy on that queue, and so content authors know exactly when their uploaded writes will be visible to the device.
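Spelling out that reading of the write path as a sketch. `createFence()`/`onCompletion()` appear in the proposal text; `queue.signal()`, the flag names, and the other identifiers are assumptions or placeholders.

```js
// Write path: wait on a fence so the GPU is done with `buffer` before
// writing, so unmap() shouldn't need a hidden copy.
const fence = queue.createFence();
queue.submit([lastCommandsUsingBuffer]);   // placeholder command buffer
queue.signal(fence, 1);                    // assumed signaling call
await fence.onCompletion(1);               // the JS-side wait described above

const ab = buffer.map(GPUMap.WRITE);
new Float32Array(ab).set(newContents);
buffer.unmap();   // if we had NOT waited, the implementation would copy here
                  // instead, which is where "unmap() on which queue?" matters
```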
After thinking some more about this, I think you can already do synchronous mapping in the existing design. If the application uses fences to know when a buffer isn't in use, then calling the existing async map function returns a promise that resolves essentially immediately.

The difference is how the two approaches devolve if the application messes up and calls the map function when the buffer isn't ready. The synchronous mapping approach devolves by (if my comment is heeded) simply failing. The current design in the spec devolves by becoming asynchronous. In the latter case, you still see results on the screen, which seems better than the former case, where the page would appear broken.
Because we have an implicit present after requestAnimationFrame returns, this means that you cannot wait on any Promise inside requestAnimationFrame. Even if the Promise finishes within the same frame's microtask queue, that is still after the implicit present. Perhaps this is an argument for explicit present, but I've heard this is hard for browser vendors to adopt successfully.
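A small example of the concern as stated here (the thread revisits the exact checkpoint/present ordering below), using the existing design's `mapWriteAsync()`; `buffer` and `vertexData` are placeholders.

```js
function frame() {
  buffer.mapWriteAsync().then((ab) => {
    // Even if this resolves during this frame's microtask checkpoint, the
    // rAF callback has already returned; with an implicit present at
    // callback return, these writes land after the frame was presented.
    new Float32Array(ab).set(vertexData);
    buffer.unmap();
  });
  // ...this frame's draws are recorded and submitted here, without the new data...
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);
```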
If I make draw calls with the assumption that my buffers will have been mapped and filled in, then in the best case the buffer contents are vaguely like what I wanted, leaving it up to chance. If you submit draws while the buffer contains the wrong contents, nothing will appear correctly. Buffer mapping needs to happen synchronously if we also have synchronous submission & presentation. If I know that I need to wait, then I need to not submit the draw calls at the same time.
How can the implicit present be in the middle of a microtask queue? The browser doesn't have the program counter.
The browser is the one that is in charge of (synchronously) executing the requestAnimationFrame callback, and can know when execution from that callback returns to it.
Is that how WebGL's implicit present or 2D canvas's implicit present works?
Note that this isn't how the spec is written today. See for example step 5 of …
Wouldn't fences also operate on the queue timeline? So it's already internally consistent and you would get the right answer? I believe the whole point of fences is to track resource usage. If you can't use fences to track resource usage, then it seems we've designed fences poorly.
Not true! The HTML5 spec defines the order these things occur in. Step 10.11 is "run the animation frame callbacks." Within that, step 3.3 is "Invoke callback." Within that, step 11 is the actual JavaScript function call, and step 14.2 is "Clean up after running script." Within cleaning up after running script, step 3 is "perform a microtask checkpoint," which drains the microtask queue. So there is a microtask queue drain between the …
I stand corrected! My recollection was indeed wrong. That said, I still think that synchronous buffer mapping is much more helpful for the immediate case of "should I submit this draw call or not". If it fails to map, I can create a different buffer with the contents I want, build bindings for it, and then submit my draw call. And if that fails, I can choose what to do then and there, including not submitting the draw. Asynchronous draw call submission seems difficult to get right, considering how stateful GPUCommandBuffer is.
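A sketch of that fallback. Whether a premature synchronous map throws or returns null is not settled in this thread (the quoted text returns null on device loss; #511 proposes fallible mapping), so the snippet handles both; `ringBuffer`, `data`, and the descriptor fields are placeholders.

```js
// Try to reuse an existing upload buffer; if the synchronous map fails
// (the GPU still owns it), fall back to a fresh buffer and still draw this frame.
let uploadBuffer = ringBuffer;            // placeholder: the buffer we'd like to reuse
let ab;
try {
  ab = uploadBuffer.map(GPUMap.WRITE);    // fallible synchronous map, per the amended proposal
} catch (e) {
  ab = null;
}
if (ab === null) {
  uploadBuffer = device.createBuffer({    // descriptor shape assumed
    size: data.byteLength,
    usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.VERTEX,
  });
  ab = uploadBuffer.map(GPUMap.WRITE);    // fresh buffer, so this succeeds
}
new Float32Array(ab).set(data);
uploadBuffer.unmap();
// ...rebuild bindings for uploadBuffer and decide, synchronously, whether to submit the draw.
```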
Right, this makes sense. I don't think it's specific to the synchronous mapping proposal, though. We could totally add a …
Thinking about this more, it is not tractable to have remote implementations know whether it is possible for …
This is only true for a non-GPU-process architecture. Any GPU-process architecture must necessarily round-trip to the GPU process for each and every map call before the JS can use the mapping (assuming no extra copies are allowed).
Closing since we agreed on a proposal in the spirit of #605 |
Replaces createBufferMapped, mapReadAsync, mapWriteAsync.