
Conversation

@kdashg (Contributor) commented Nov 27, 2019

Replaces createBufferMapped, mapReadAsync, mapWriteAsync.


Preview | Diff
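For orientation, a rough sketch of the shape under discussion, reconstructed from the spec text quoted later in this thread (the synchronous map() signature, the GPUMap flags, and the `device` object are assumptions here, not final IDL):

```js
// Hypothetical usage of the proposed synchronous mapping API (sketch only).
const buffer = device.createBuffer({
  size: 256,
  usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
});

const mapping = buffer.map(GPUMap.WRITE); // synchronous; returns an ArrayBuffer
new Float32Array(mapping).fill(1.0);      // populate the mapping from JS
buffer.unmap();                           // contents become visible to the GPU
```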

@Kangz (Contributor) commented Nov 27, 2019

I'm not sure I understand the details of the proposal:

  • Under which circumstances does map return null?
  • What's the initial content of MAP_WRITE mappings?
  • What should happen if you write into a MAP_READ mapping?
  • What should happen if you have two mappings that alias (with any combination of read / write)?
  • Does this still have some notion of mapped state such that a mapped buffer cannot be used in a submit?
  • For MAP_READ how does a multi-process implementation know it should send the bytes from the GPU process to the content process? Does it need to do it eagerly after every submit?

@kvark (Contributor) commented Nov 27, 2019

Thanks @jdashg for making this PR and @Kangz for great questions! I'd like to add one more.

So the promise (no pun!) here is that a user who uses fences would be able to guarantee that this map_range succeeds, right? How does the client know that it's safe to hand out the direct buffer mapping, given that the API is synchronous? Does it track the buffer use and associate it with all the fences and their waits?

@kdashg (Contributor, Author) commented Nov 29, 2019

@Kangz

  1. map never returns null in this proposal, but instead allocates and copy-bloats. [1]
  2. MAP_WRITE without MAP_READ is write+discard, and so is initialized to zero.
  3. Efficient read-/write-only mappings require read-/write-only ArrayBuffers, which preliminarily seem doable. It's impossible to offer efficient read-only mappings without a read-only ArrayBuffer regardless of design.
  4. One mapping at a time for each buffer.
  5. Yes, that spec text is unchanged.
  6. Lazily with respect to modifications, but eagerly with respect to when fences are propagated gpu->js (like in WebGL: [2]); see the sketch below. However, MAP_WRITE and COPY_DEST are the only writable usages valid with MAP_READ.
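A minimal sketch of the fenced-readback flow implied by (1) and (6), assuming the synchronous map() shape in this PR (the queue.signal() spelling is an assumption):

```js
// Sketch only: fence the GPU work that writes into `buffer`, observe the fence
// on the JS side, then map synchronously so the mapping is not premature.
const fence = queue.createFence();
queue.submit([commandsWritingIntoBuffer]); // placeholder command buffer(s)
queue.signal(fence, 1);                    // assumed fence-signal API
await fence.onCompletion(1);               // JS now knows the GPU work finished
const data = buffer.map(GPUMap.READ);      // should neither stall nor copy-bloat
// ...read from `data`...
buffer.unmap();
```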

@kvark
Buffer tracking is required, as it is for any design that can offer direct copy/maps.
The tracking is similar to what we do in WebGL for async reads. [2]

--

[1]: I hadn't considered what map(MAP_READ) does when it's premature. I think the least-bad thing is to issue loud warnings and stall, like in GLES/WebGL. Racy choices are making mapping fallible (which, with warnings, is pretty actionable), reading zeros, or reading stale values.

There's potentially an option where we spec that what you get on map(READ) is tied to what the most recently checked Fence allows you to see. That is, map(READ) always gives you the content of the buffer as of the last Fence you checked. If you never check fences, it only gives you its initial values: zeros.

This is also something I would issue console warnings about, and it's not racier than receiving Promises.
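To illustrate that idea (an illustration of the hypothetical semantics, not spec text; queue.signal() is an assumed spelling):

```js
const fence = queue.createFence();
queue.submit([firstWrite]);   // writes into `buffer`
queue.signal(fence, 1);
queue.submit([secondWrite]);  // also writes into `buffer`
queue.signal(fence, 2);

await fence.onCompletion(1);          // JS has only "checked" fence value 1
const view = buffer.map(GPUMap.READ); // would reflect firstWrite only, even if
                                      // secondWrite already finished on the GPU
buffer.unmap();
```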

[2]: https://jdashg.github.io/misc/async-gpu-downloads.html

@kvark (Contributor) commented Dec 2, 2019

> Buffer tracking is required, as it is for any design that can offer direct copy/maps.

I think there is a big difference here:

  • The async mapping doesn't require tracking on the client, since it's async.
  • The queued transfers have buffer tracking as an optional optimization path for the implementation.

This proposal, unlike the others, requires the implementation to track buffer usage and fences (on the client side) for correctness, in order to know whether mapping should succeed or error out.
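For concreteness, a sketch of the client-side bookkeeping this implies for a multi-process implementation (names and structure are illustrative only):

```js
// Tracks, per buffer, the last fence value that must complete before the buffer
// can safely be handed out as a direct synchronous mapping.
class ClientBufferTracker {
  constructor() {
    this.lastGpuUse = new Map(); // GPUBuffer -> fence value of its last GPU use
  }
  noteSubmit(buffersUsed, fenceValue) {
    for (const b of buffersUsed) this.lastGpuUse.set(b, fenceValue);
  }
  // Called inside a hypothetical synchronous map(): decide whether a direct
  // mapping is safe, or whether the implementation must copy, stall, or fail.
  isSafeToMap(buffer, highestCompletedFenceValue) {
    const pending = this.lastGpuUse.get(buffer);
    return pending === undefined || pending <= highestCompletedFenceValue;
  }
}
```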

Remove createBufferMapped as now redundant.
@kdashg (Contributor, Author) commented Dec 5, 2019

Async mapping can be used to polyfill this, so the tracking overhead is quite minimal.

are unchanged between unmap and map. (Or, if the buffer has not been previously
mapped, the initial buffer contents.

If the contents of a mapping without {{GPUMap/WRITE}} are changed, on unmap(),
Contributor

This is super weird, but if I understand correctly, the intent is that applications are free to race with the GPU, but it causes a device loss, so the result of the race is non-observable?

This spec language doesn't work, because an application could start with a buffer full of zeros, map it, race happily with the GPU for extended periods of time, fill the buffer back with zeros, and unmap.

Contributor

I assume this preserves the behavior we have in our current design, where, if you submit a command buffer which references a mapped resource, the command buffer fails instead. I don't see anything in this proposal about persistent mappings; I only see words about synchronous mappings.

to ensure that {{GPUBuffer/map()}} provides an efficient mapping to memory with a minimal
number of copies.

If applications do not ensure proper fencing, {{GPUBuffer/map()}} may provide a less
Contributor

This, in combination with [1], means we have to keep a persistent copy of the buffer's data in the content process, which would be very unfortunate.


{{GPUBuffer/map()}} returns null if the device is lost.

Because MAP_WRITE is writable only by the CPU, the contents of the ArrayBuffer
Contributor

[1]

efficient mapping. (such as requiring an extra allocation and copy) Implementations should
provide warnings to inform when this overhead is incurred.

{{GPUMap/READ}} will stall if its contents are stale. Because this condition is
Contributor

Stalling isn't an option in Web APIs anymore. WebGL got away with it but this is no longer a design option.

Contributor Author

Please include a reference to this rule.

Contributor

Right, this blocking behavior is why the existing design returns promises. It's important not to block: we don't know how long the GPU will be using the resource. A long compute job could take seconds, and it's not okay to hang the main browser thread for seconds.

A better design would probably instead be for the map operation to fail outright.

Contributor

I didn't realize that you had already made a pull request for fallible mapping: #511

@litherum (Contributor) left a comment

I think this proposal is workable, with a few changes marked inline. I like that this proposal distinguishes between the "my buffer is ready for me to map it" state from the actual act of mapping the buffer itself. What's in the IDL now has this problem where, in order to know when it's safe to map a buffer, you have to actually map it, which seems wasteful. I think, ultimately, we will need a fallible synchronous buffer mapping function in WebGPU because it solves this problem.

However, I'm not sure this proposal passes the test of "is easier to use correctly than incorrectly." Rather than having a pool of in-flight resources, I would expect an author who just wants to upload something for the current frame to 1) create a buffer, 2) synchronously map it (successfully), 3) populate it, 4) unmap it, 5) use it in some GPU commands, and, crucially, forget to destroy it. This is the same problem I was worried about in #418.
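In code, the pattern described above looks roughly like this (sketch; the synchronous map() and GPUMap.WRITE follow the proposal's assumed shape):

```js
function uploadThisFrame(device, queue, bytes) {
  const buf = device.createBuffer({
    size: bytes.byteLength,
    usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
  });
  new Uint8Array(buf.map(GPUMap.WRITE)).set(bytes); // always succeeds: brand-new buffer
  buf.unmap();
  // ...encode commands that read from `buf`, then queue.submit(...)...
  // Crucially easy to forget: buf.destroy() once the GPU is done with it,
  // so this leaks one buffer per frame.
}
```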

I also think this proposal is more complicated than our current design, because (almost) all data uploads would require programmers to use fences, which increases the complexity and lines of code for even the most trivial programs.


Only one mapping may be active per buffer at a time.

With {{GPUQueue/createFence()}} and awaiting {{GPUFence/onCompletion()}}, it's possible
Contributor

It sounds like the idea is that, before reading from anything, the JavaScript is supposed to wait on a fence to make sure the thing it's reading from has already completed. And if the JS doesn't wait on the fence itself, then the runtime will wait on a fence on its behalf, inside the map() call. Is that accurate?

And for writing, the JS is supposed to wait on a fence to know that the thing it's writing to isn't being used by the GPU, and if it is, the implementation will make a copy under the hood inside the unmap() call. Is that accurate?

If that's accurate, doesn't unmap() need to be associated with a queue? So the implementation can issue the copy on that queue, and so content authors know exactly when their uploaded writes will be visible to the device.


@litherum litherum mentioned this pull request Feb 24, 2020
@litherum (Contributor) commented

After thinking some more about this, I think you can already do synchronous mapping in the existing design. If the application uses fences to know when a buffer isn't in use, then calling mapWrite*() should cause the promise to be resolved immediately. Then, because of the HTML event loop, the handler is guaranteed to run before control is returned to the browser. Therefore, if the content is using fences properly, and they say `let arrayBuffer = await buffer.mapReadAsync();`, then that's effectively the same as having synchronous mapping.
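In code, that pattern would look roughly like this (a sketch against the async API as it existed in this thread; the queue.signal() spelling is an assumption):

```js
// Fence the last GPU use of `buffer`, wait for it, then map. If the
// implementation resolves the promise immediately for an idle buffer, the
// awaited continuation runs in the same microtask checkpoint.
const fence = queue.createFence();
queue.submit([lastCommandsUsingBuffer]); // placeholder command buffer(s)
queue.signal(fence, 1);
await fence.onCompletion(1);             // `buffer` is no longer in use by the GPU
const arrayBuffer = await buffer.mapWriteAsync(); // effectively synchronous, per the argument above
```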

The difference is how the two approaches devolve if the application messes up and calls the map function when the buffer isn't ready. The synchronous mapping approach devolves by (if my comment is heeded) simply failing. The current design in the spec devolves by becoming asynchronous. In the latter case, you still see results on the screen, which seems better than the former case, where the page would appear broken.

@magcius commented Feb 24, 2020

> After thinking some more about this, I think you can already do synchronous mapping in the existing design. If the application uses fences to know when a buffer isn't in use, then calling mapWrite*() should cause the promise to be resolved immediately. Then, because of the HTML event loop, the handler is guaranteed to run before control is returned to the browser.

Because we have an implicit present after requestAnimationFrame returns, this means that you cannot wait on any Promise inside requestAnimationFrame. Even if the Promise finishes within the same frame's microtask queue, that is still after the implicit present.

Perhaps this is an argument for explicit present, but I've heard this is hard for browser vendors to adopt successfully.

> The current design in the spec devolves by becoming asynchronous. In the latter case, you still see results on the screen, which seems better than the former case, where the page would appear broken.

If I make draw calls with the assumption that my buffers will have been mapped and filled in, then, in the best case, the buffer contents are vaguely like what I wanted, leaving it up to chance. If you submit draws while the buffer contains the wrong contents, then nothing will appear correctly.

Buffer mapping needs to happen synchronously if we also have synchronous submission & presentation. If I know that I need to wait, then I need to not submit the draw calls at the same time.

@litherum (Contributor) commented

How can the implicit present be in the middle of a microtask queue? The browser doesn’t have the program counter.

@magcius commented Feb 24, 2020

The browser is the one that is in charge of (synchronously) executing the requestAnimationFrame callback, and it can know when execution from that callback returns to it.

@litherum (Contributor) commented

Is that how WebGL’s implicit present or 2D canvas’s implicit present works?

@Kangz (Contributor) commented Feb 24, 2020

> After thinking some more about this, I think you can already do synchronous mapping in the existing design. If the application uses fences to know when a buffer isn't in use, then calling mapWrite*() should cause the promise to be resolved immediately. Then, because of the HTML event loop, the handler is guaranteed to run before control is returned to the browser. Therefore, if the content is using fences properly, and they say `let arrayBuffer = await buffer.mapReadAsync();`, then that's effectively the same as having synchronous mapping.

Note that this isn't how the spec is written today. See, for example, step 5 of mapReadAsync: the promise resolution is put on the queue timeline, so the buffer becomes mappable as soon as all previously submitted operations, whether they interact with the buffer or not, are completed. It could be changed, but that would need to be carefully considered.

@litherum (Contributor) commented Feb 24, 2020

> the buffer becomes mappable as soon as all previously submitted operations, whether they interact with the buffer or not, are completed

Wouldn't fences also operate on the queue timeline? So it's already internally consistent, and you would get the right answer?

I believe the whole point of fences is to track resource usage. If you can't use fences to track resource usage, then it seems we've designed fences poorly.

@litherum (Contributor) commented

@magcius

> Even if the Promise finishes within the same frame's microtask queue, that is still after the implicit present.

Not true!

The HTML5 spec defines the order these things occur in. Step 10.11 is "run the animation frame callbacks." Within that, step 3.3 is "Invoke callback." Within that, step 11 is the actual JavaScript function call, and step 14.2 is "Clean up after running script." Within cleaning up after running script, step 3 is "perform a microtask checkpoint," which drains the microtask queue.

So there is a microtask queue drain between the rAF() callback and the present.
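Concretely, the claim is that something like this completes before the present (sketch; `stagingBuffer` and `perFrameUniforms` are placeholder names, and it assumes the mapping promise is already resolved thanks to correct fencing):

```js
requestAnimationFrame(async () => {
  // If the promise is already resolved, this continuation runs during the
  // microtask checkpoint performed while "cleaning up after running script"
  // for the rAF callback, i.e. before the implicit present.
  const ab = await stagingBuffer.mapWriteAsync();
  new Float32Array(ab).set(perFrameUniforms);
  stagingBuffer.unmap();
  // ...encode and submit this frame's commands...
});
```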

@magcius commented Feb 25, 2020

I stand corrected! My recollection was indeed wrong. That said, I still think that synchronous buffer mapping is much more helpful for the immediate case of "should I submit this draw call or not".

If it fails to map, I can create a different buffer with the contents I want, build bindings for it, and then submit my draw call. And if that fails, I can choose what to do then and there, including not submitting the draw.
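For example (a sketch assuming a fallible synchronous map() that returns null while the GPU still owns the buffer; `framePool`, `vertexData`, and the MAP_WRITE | VERTEX usage combination are assumptions from this discussion, not current IDL):

```js
let buf = framePool.next();          // try to reuse a previously created buffer
let mapping = buf.map(GPUMap.WRITE); // assumed to return null if still in use
if (mapping === null) {
  buf = device.createBuffer({
    size: vertexData.byteLength,
    usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.VERTEX,
  });
  mapping = buf.map(GPUMap.WRITE);   // fresh buffer: mapping cannot fail
}
new Float32Array(mapping).set(vertexData);
buf.unmap();
// ...build bindings for `buf` and submit the draw, or decide here not to submit...
```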

Asynchronous draw call submission seems difficult to get right, considering how stateful GPUCommandBuffer is.

@litherum (Contributor) commented

> If it fails to map, I can ...

Right, this makes sense. I don't think it's specific to the synchronous mapping proposal, though. We could totally add an ifITriedToMapWouldItHappenImmediately() which returns a bool, or a tryMap() which rejects the promise if it can't happen immediately.

@Kangz (Contributor) commented Feb 26, 2020

> After thinking some more about this, I think you can already do synchronous mapping in the existing design. If the application uses fences to know when a buffer isn't in use, then calling mapWrite*() should cause the promise to be resolved immediately. Then, because of the HTML event loop, the handler is guaranteed to run before control is returned to the browser. Therefore, if the content is using fences properly, and they say `let arrayBuffer = await buffer.mapReadAsync();`, then that's effectively the same as having synchronous mapping.

Thinking about this more, it is not tractable for remote implementations to know whether it is possible for mapWrite to resolve the promise immediately once the fence has passed. How does the client side of a remote implementation know which fence value it needs to wait for before it can resolve the promise immediately? Basically the same comment as #511 (comment)

@litherum (Contributor) commented Mar 2, 2020

> After thinking some more about this, I think you can already do synchronous mapping in the existing design.

This is only true for a non-GPU-process architecture. Any GPU-process architecture must necessarily round-trip to the GPU process for each and every map call before the JS can use the mapping (assuming no extra copies are allowed).

@Kangz (Contributor) commented Apr 20, 2020

Closing since we agreed on a proposal in the spirit of #605

@Kangz Kangz closed this Apr 20, 2020