
Conversation

@kdashg (Contributor) commented Nov 27, 2019

Replaces createBufferMapped, mapReadAsync, mapWriteAsync.


Preview | Diff
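For orientation, a rough sketch of the shape under discussion, reconstructed from the spec text quoted later in this thread (the synchronous map() signature, the GPUMap flags, and the `device` object are assumptions here, not final IDL):

```js
// Hypothetical usage of the proposed synchronous mapping API (sketch only).
const buffer = device.createBuffer({
  size: 256,
  usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
});

const mapping = buffer.map(GPUMap.WRITE); // synchronous; returns an ArrayBuffer
new Float32Array(mapping).fill(1.0);      // populate the mapping from JS
buffer.unmap();                           // contents become visible to the GPU
```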

@Kangz (Contributor) commented Nov 27, 2019

I'm not sure I understand the details of the proposal:

  • Under which circumstances does map return null?
  • What's the initial content of MAP_WRITE mappings?
  • What should happen if you write into a MAP_READ mapping?
  • What should happen if you have two mappings that alias (with any combination of read / write)?
  • Does this still have some notion of mapped state such that a mapped buffer cannot be used in a submit?
  • For MAP_READ how does a multi-process implementation know it should send the bytes from the GPU process to the content process? Does it need to do it eagerly after every submit?

@kvark (Contributor) commented Nov 27, 2019

Thanks @jdashg for making this PR and @Kangz for great questions! I'd like to add one more.

So the promise (no pun!) here is that a user who uses fences would be able to guarantee that this map_range succeeds, right? How does the client know that it's safe to hand out the direct buffer mapping, given that the API is synchronous? Does it track the buffer use and associate it with all the fences and their waits?

@kdashg (Contributor, Author) commented Nov 29, 2019

@Kangz

  1. map never returns null in this proposal, but instead allocates and copy-bloats. [1]
  2. MAP_WRITE without MAP_READ is write+discard, and so is initialized to zero.
  3. Efficient read-/write-only mappings require read-/write-only ArrayBuffers, which preliminarily seem doable. It's impossible to offer efficient read-only mappings without a read-only ArrayBuffer regardless of design.
  4. One mapping at a time for each buffer.
  5. Yes, that spec text is unchanged.
  6. Lazily with respect to modifications, but eagerly with respect to when fences are propagated gpu->js (like in WebGL: [2]); see the sketch below. However, MAP_WRITE and COPY_DEST are the only writable usages valid with MAP_READ.
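A minimal sketch of the fenced-readback flow implied by (1) and (6), assuming the synchronous map() shape in this PR (the queue.signal() spelling is an assumption):

```js
// Sketch only: fence the GPU work that writes into `buffer`, observe the fence
// on the JS side, then map synchronously so the mapping is not premature.
const fence = queue.createFence();
queue.submit([commandsWritingIntoBuffer]); // placeholder command buffer(s)
queue.signal(fence, 1);                    // assumed fence-signal API
await fence.onCompletion(1);               // JS now knows the GPU work finished
const data = buffer.map(GPUMap.READ);      // should neither stall nor copy-bloat
// ...read from `data`...
buffer.unmap();
```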

@kvark
Buffer tracking is required, as it is for any design that can offer direct copy/maps.
The tracking is similar to what we do in WebGL for async reads. [2]

--

[1]: I hadn't considered what map(MAP_READ) does when it's premature. I think the least-bad thing is to issue loud warnings and stall, like in GLES/WebGL. Racy choices are making mapping fallible (which, with warnings, is pretty actionable), reading zeros, or reading stale values.

There's potentially an option where we spec that what you get on map(READ) is tied to what the most recently checked Fence allows you to see. That is, map(READ) always gives you the content of the buffer as of the last Fence you checked. If you never check fences, it only gives you its initial values: zeros.

This is also something I would issue console warnings about, and it's not racier than receiving Promises.
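To illustrate that idea (an illustration of the hypothetical semantics, not spec text; queue.signal() is an assumed spelling):

```js
const fence = queue.createFence();
queue.submit([firstWrite]);   // writes into `buffer`
queue.signal(fence, 1);
queue.submit([secondWrite]);  // also writes into `buffer`
queue.signal(fence, 2);

await fence.onCompletion(1);          // JS has only "checked" fence value 1
const view = buffer.map(GPUMap.READ); // would reflect firstWrite only, even if
                                      // secondWrite already finished on the GPU
buffer.unmap();
```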

[2]: https://jdashg.github.io/misc/async-gpu-downloads.html

@kvark (Contributor) commented Dec 2, 2019

> Buffer tracking is required, as it is for any design that can offer direct copy/maps.

I think there is a big difference here:

  • The async mapping doesn't require tracking on the client, since it's async.
  • The queued transfers have buffer tracking as an optional optimization path for the implementation.

This proposal, unlike the others, requires the implementation to track buffer usage and fences (on the client side) for correctness, in order to know whether mapping should succeed or error out.
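For concreteness, a sketch of the client-side bookkeeping this implies for a multi-process implementation (names and structure are illustrative only):

```js
// Tracks, per buffer, the last fence value that must complete before the buffer
// can safely be handed out as a direct synchronous mapping.
class ClientBufferTracker {
  constructor() {
    this.lastGpuUse = new Map(); // GPUBuffer -> fence value of its last GPU use
  }
  noteSubmit(buffersUsed, fenceValue) {
    for (const b of buffersUsed) this.lastGpuUse.set(b, fenceValue);
  }
  // Called inside a hypothetical synchronous map(): decide whether a direct
  // mapping is safe, or whether the implementation must copy, stall, or fail.
  isSafeToMap(buffer, highestCompletedFenceValue) {
    const pending = this.lastGpuUse.get(buffer);
    return pending === undefined || pending <= highestCompletedFenceValue;
  }
}
```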

Remove createBufferMapped as now redundant.
@kdashg (Contributor, Author) commented Dec 5, 2019

Async mapping can be used to polyfill this, so the tracking overhead is quite minimal.

are unchanged between unmap and map. (Or, if the buffer has not been previously
mapped, the initial buffer contents.

If the contents of a mapping without {{GPUMap/WRITE}} are changed, on unmap(),
Contributor

This is super weird, but if I understand correctly, the intent is that applications are free to race with the GPU, but it causes a device loss, so the result of the race is non-observable?

This spec language doesn't work, because an application could start with a buffer full of zeros, map it, race happily with the GPU for extended periods of time, fill the buffer back with zeros, and unmap.

Contributor

I assume this preserves the behavior we have in our current design, where, if you submit a command buffer which references a mapped resource, the command buffer fails instead. I don't see anything in this proposal about persistent mappings; I only see words about synchronous mappings.

to ensure that {{GPUBuffer/map()}} provides an efficient mapping to memory with a minimal
number of copies.

If applications do not ensure proper fencing, {{GPUBuffer/map()}} may provide a less
Contributor

This, in combination with [1], means we have to keep a persistent copy of the buffer's data in the content process, which would be very unfortunate.


{{GPUBuffer/map()}} returns null if the device is lost.

Because MAP_WRITE is writable only by the CPU, the contents of the ArrayBuffer
Contributor

[1]

efficient mapping. (such as requiring an extra allocation and copy) Implementations should
provide warnings to inform when this overhead is incurred.

{{GPUMap/READ}} will stall if its contents are stale. Because this condition is
Contributor

Stalling isn't an option in Web APIs anymore. WebGL got away with it but this is no longer a design option.

Contributor Author

Please include a reference to this rule.

Contributor

Right, this blocking behavior is why the existing design returns promises. It's important not to block: we don't know how long the GPU will be using the resource. A long compute job could take seconds, and it's not okay to hang the main browser thread for seconds.

A better design would probably instead be for the map operation to fail outright.

Contributor

I didn't realize that you had already made a pull request for fallible mapping: #511

@litherum (Contributor) left a comment

I think this proposal is workable, with a few changes marked inline. I like that this proposal distinguishes between the "my buffer is ready for me to map it" state from the actual act of mapping the buffer itself. What's in the IDL now has this problem where, in order to know when it's safe to map a buffer, you have to actually map it, which seems wasteful. I think, ultimately, we will need a fallible synchronous buffer mapping function in WebGPU because it solves this problem.

However, I'm not sure this proposal passes the test of "is easier to use correctly than incorrectly." Rather than having a pool of in-flight resources, I would expect an author who just wants to upload something for the current frame to 1) create a buffer, 2) synchronously map it (successfully), 3) populate it, 4) unmap it, 5) use it in some GPU commands, and, crucially, forget to destroy it. This is the same problem I was worried about in #418.
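In code, the pattern described above looks roughly like this (sketch; the synchronous map() and GPUMap.WRITE follow the proposal's assumed shape):

```js
function uploadThisFrame(device, queue, bytes) {
  const buf = device.createBuffer({
    size: bytes.byteLength,
    usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
  });
  new Uint8Array(buf.map(GPUMap.WRITE)).set(bytes); // always succeeds: brand-new buffer
  buf.unmap();
  // ...encode commands that read from `buf`, then queue.submit(...)...
  // Crucially easy to forget: buf.destroy() once the GPU is done with it,
  // so this leaks one buffer per frame.
}
```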

I also think this proposal is more complicated than our current design, because (almost) all data uploads would require programmers to use fences, which increases the complexity and lines of code for even the most trivial programs.


Only one mapping may be active per buffer at a time.

With {{GPUQueue/createFence()}} and awaiting {{GPUFence/onCompletion()}}, it's possible
Contributor

It sounds like the idea is that, before reading from anything, the JavaScript is supposed to wait on a fence to make sure the thing it's reading from has already completed. And if the JS doesn't wait on the fence itself, then the runtime will wait on a fence on its behalf, inside the map() call. Is that accurate?

And for writing, the JS is supposed to wait on a fence to know that the thing it's writing to isn't being used by the GPU, and if it is, the implementation will make a copy under the hood inside the unmap() call. Is that accurate?

If that's accurate, doesn't unmap() need to be associated with a queue? So the implementation can issue the copy on that queue, and so content authors know exactly when their uploaded writes will be visible to the device.


@litherum litherum mentioned this pull request Feb 24, 2020
@litherum (Contributor) commented

After thinking some more about this, I think you can already do synchronous mapping in the existing design. If the application uses fences to know when a buffer isn't in use, then calling mapWrite*() should cause the promise to be resolved immediately. Then, because of the HTML event loop, the handler is guaranteed to run before control is returned to the browser. Therefore, if the content is using fences properly, and they say `let arrayBuffer = await buffer.mapReadAsync();`, then that's effectively the same as having synchronous mapping.
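In code, that pattern would look roughly like this (a sketch against the async API as it existed in this thread; the queue.signal() spelling is an assumption):

```js
// Fence the last GPU use of `buffer`, wait for it, then map. If the
// implementation resolves the promise immediately for an idle buffer, the
// awaited continuation runs in the same microtask checkpoint.
const fence = queue.createFence();
queue.submit([lastCommandsUsingBuffer]); // placeholder command buffer(s)
queue.signal(fence, 1);
await fence.onCompletion(1);             // `buffer` is no longer in use by the GPU
const arrayBuffer = await buffer.mapWriteAsync(); // effectively synchronous, per the argument above
```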

The difference is how the two approaches devolve if the application messes up and calls the map function when the buffer isn't ready. The synchronous mapping approach devolves by (if my comment is heeded) simply failing. The current design in the spec devolves by becoming asynchronous. In the latter case, you still see results on the screen, which seems better than the former case, where the page would appear broken.

@magcius commented Feb 24, 2020

> After thinking some more about this, I think you can already do synchronous mapping in the existing design. If the application uses fences to know when a buffer isn't in use, then calling mapWrite*() should cause the promise to be resolved immediately. Then, because of the HTML event loop, the handler is guaranteed to run before control is returned to the browser.

Because we have an implicit present after requestAnimationFrame returns, this means that you cannot wait on any Promise inside requestAnimationFrame. Even if the Promise finishes within the same frame's microtask queue, that is still after the implicit present.

Perhaps this is an argument for explicit present, but I've heard this is hard for browser vendors to adopt successfully.

> The current design in the spec devolves by becoming asynchronous. In the latter case, you still see results on the screen, which seems better than the former case, where the page would appear broken.

If I make draw calls with the assumption that my buffers will have been mapped and filled in, then, in the best case, the buffer contents are vaguely like what I wanted, leaving it up to chance. If you submit draws while the buffer contains the wrong contents, then nothing will appear correctly.

Buffer mapping needs to happen synchronously if we also have synchronous submission & presentation. If I know that I need to wait, then I need to not submit the draw calls at the same time.

@litherum (Contributor) commented

How can the implicit present be in the middle of a microtask queue? The browser doesn’t have the program counter.

@magcius commented Feb 24, 2020

The browser is the one that is in charge of (synchronously) executing the requestAnimationFrame callback, and it can know when execution from that callback returns to it.

@litherum (Contributor) commented

Is that how WebGL’s implicit present or 2D canvas’s implicit present works?

@Kangz (Contributor) commented Feb 24, 2020

> After thinking some more about this, I think you can already do synchronous mapping in the existing design. If the application uses fences to know when a buffer isn't in use, then calling mapWrite*() should cause the promise to be resolved immediately. Then, because of the HTML event loop, the handler is guaranteed to run before control is returned to the browser. Therefore, if the content is using fences properly, and they say `let arrayBuffer = await buffer.mapReadAsync();`, then that's effectively the same as having synchronous mapping.

Note that this isn't how the spec is written today. See, for example, step 5 of mapReadAsync: the promise resolution is put on the queue timeline, so the buffer becomes mappable as soon as all previously submitted operations, whether they interact with the buffer or not, are completed. It could be changed, but that would need to be carefully considered.

@litherum (Contributor) commented Feb 24, 2020

> the buffer becomes mappable as soon as all previously submitted operations, whether they interact with the buffer or not, are completed

Wouldn't fences also operate on the queue timeline? So it's already internally consistent, and you would get the right answer?

I believe the whole point of fences is to track resource usage. If you can't use fences to track resource usage, then it seems we've designed fences poorly.

@litherum (Contributor) commented

@magcius

> Even if the Promise finishes within the same frame's microtask queue, that is still after the implicit present.

Not true!

The HTML5 spec defines the order these things occur in. Step 10.11 is "run the animation frame callbacks." Within that, step 3.3 is "Invoke callback." Within that, step 11 is the actual JavaScript function call, and step 14.2 is "Clean up after running script." Within cleaning up after running script, step 3 is "perform a microtask checkpoint," which drains the microtask queue.

So there is a microtask queue drain between the rAF() callback and the present.
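Concretely, the claim is that something like this completes before the present (sketch; `stagingBuffer` and `perFrameUniforms` are placeholder names, and it assumes the mapping promise is already resolved thanks to correct fencing):

```js
requestAnimationFrame(async () => {
  // If the promise is already resolved, this continuation runs during the
  // microtask checkpoint performed while "cleaning up after running script"
  // for the rAF callback, i.e. before the implicit present.
  const ab = await stagingBuffer.mapWriteAsync();
  new Float32Array(ab).set(perFrameUniforms);
  stagingBuffer.unmap();
  // ...encode and submit this frame's commands...
});
```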

@magcius commented Feb 25, 2020

I stand corrected! My recollection was indeed wrong. That said, I still think that synchronous buffer mapping is much more helpful for the immediate case of "should I submit this draw call or not".

If it fails to map, I can create a different buffer with the contents I want, build bindings for it, and then submit my draw call. And if that fails, I can choose what to do then and there, including not submitting the draw.
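For example (a sketch assuming a fallible synchronous map() that returns null while the GPU still owns the buffer; `framePool`, `vertexData`, and the MAP_WRITE | VERTEX usage combination are assumptions from this discussion, not current IDL):

```js
let buf = framePool.next();          // try to reuse a previously created buffer
let mapping = buf.map(GPUMap.WRITE); // assumed to return null if still in use
if (mapping === null) {
  buf = device.createBuffer({
    size: vertexData.byteLength,
    usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.VERTEX,
  });
  mapping = buf.map(GPUMap.WRITE);   // fresh buffer: mapping cannot fail
}
new Float32Array(mapping).set(vertexData);
buf.unmap();
// ...build bindings for `buf` and submit the draw, or decide here not to submit...
```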

Asynchronous draw call submission seems difficult to get right, considering how stateful GPUCommandBuffer is.

@litherum (Contributor) commented

> If it fails to map, I can ...

Right, this makes sense. I don't think it's specific to the synchronous mapping proposal, though. We could totally add an ifITriedToMapWouldItHappenImmediately() which returns a bool, or a tryMap() which rejects the promise if it can't happen immediately.

@Kangz (Contributor) commented Feb 26, 2020

> After thinking some more about this, I think you can already do synchronous mapping in the existing design. If the application uses fences to know when a buffer isn't in use, then calling mapWrite*() should cause the promise to be resolved immediately. Then, because of the HTML event loop, the handler is guaranteed to run before control is returned to the browser. Therefore, if the content is using fences properly, and they say `let arrayBuffer = await buffer.mapReadAsync();`, then that's effectively the same as having synchronous mapping.

Thinking about this more, it is not tractable for remote implementations to know whether it is possible for mapWrite to resolve the promise immediately once the fence has passed. How does the client side of a remote implementation know which fence value it needs to wait for before it can resolve the promise immediately? Basically the same comment as #511 (comment)

@litherum (Contributor) commented Mar 2, 2020

> After thinking some more about this, I think you can already do synchronous mapping in the existing design.

This is only true for a non-GPU-process architecture. Any GPU-process architecture must necessarily round-trip to the GPU process for each and every map call before the JS can use the mapping (assuming no extra copies are allowed).

@Kangz (Contributor) commented Apr 20, 2020

Closing since we agreed on a proposal in the spirit of #605

@Kangz Kangz closed this Apr 20, 2020