Skip to content

react-dnd manager start handler#3119

Merged
raineorshine merged 18 commits intocybersemics:mainfrom
ethan-james:3072-bug-dnd-manager-start-handler
Jul 23, 2025
Merged

react-dnd manager start handler#3119
raineorshine merged 18 commits intocybersemics:mainfrom
ethan-james:3072-bug-dnd-manager-start-handler

Conversation

@ethan-james
Copy link
Collaborator

@ethan-james ethan-james commented Jul 8, 2025

Fixes #3072 and #3073

There was a brief gap between the start of a touch and the firing of react-dnd's start event, so there was a window where long press had begun (bullet highlighted) but drag-and-drop had not been initialized.

Now TouchBackendImpl.ts has been patched to emit a dragStart event when drag-and-drop initializes. useLongPress listens for that event on mobile devices in order to synchronize long press timing with drag-and-drop.

@ethan-james ethan-james self-assigned this Jul 8, 2025
@ethan-james ethan-james marked this pull request as draft July 8, 2025 20:39
@ethan-james ethan-james marked this pull request as ready for review July 9, 2025 18:03
@ethan-james
Copy link
Collaborator Author

I'm going to start testing this thoroughly, and you can feel free to do the same.

@ethan-james
Copy link
Collaborator Author

It does still seem to be possible to get into a state where the button highlights, but drag-and-drop is not activated. I think it's harder, but I can still do it semi-reliably.

@raineorshine
Copy link
Contributor

Interesting. That suggests that the timeout is finishing before react-dnd has a chance to stop propagation and prevent MultiGesture from kicking in.

Let's keep working on this. Thanks.

@ethan-james
Copy link
Collaborator Author

This uses my other fixes (#3113) so that MultiGesture is disabled at the same time that the bullet highlights. I'm under the impression that we're looking at one of two issues:

  1. same race condition between our timeout and the react-dnd timeout, or
  2. handleTopMoveStart fires at the correct time, but that's not quite enough to initialize drag-and-drop

I'll take a look at MultiGesture first to make sure that it's not blocking things independent of shouldCancelGesture, which I'm pretty sure is working correctly because you can't actually draw any gestures after the bullet highlights. After that, I'll try to dig into the timing of react-dnd to see why we're still getting this discrepancy.

@raineorshine
Copy link
Contributor

This uses my other fixes (#3113) so that MultiGesture is disabled at the same time that the bullet highlights.

Got it, thanks.

  1. same race condition between our timeout and the react-dnd timeout, or

We could create minute timing offset, though that seems a bit hacky.

2. handleTopMoveStart fires at the correct time, but that's not quite enough to initialize drag-and-drop

Yeah, there is some subtle timing thing here that's difficult to identify.

Thanks for taking another look!

@ethan-james
Copy link
Collaborator Author

We could create minute timing offset, though that seems a bit hacky.

Right, maybe I'll try that to make sure that it solves the issue, and then we can discuss whether we like it.

@ethan-james
Copy link
Collaborator Author

ethan-james commented Jul 10, 2025

Disabling MultiGesture entirely makes it very hard to reproduce, but mainly because the viewport scrolls whenever you drag. I did get the bug to occur once.

Reducing the react-dnd timeout causes problems in the other direction, where you can drag-and-drop and draw a gesture at the same time, but the bullet isn't highlighted. Allowing longPress transitions from Inactive -> DragInProgress (skipping DragHold) fixes that and makes this a viable solutiion. I was still able to reproduce the bug after reducing the timeout by 10ms, but not after reducing it by 20ms.

Next I'll see if I can find out any more information about why drag-and-drop takes so long to begin.

@ethan-james
Copy link
Collaborator Author

TLDR: TouchBackend cancels handleTopMoveStart as soon as it detects any movement, and only consults its touchSlop option later. This will throttle dragging after it has begun, but it's very easy to prevent the drag from initializing.

Here is a healthy drag:

Screenshot 2025-07-10 at 2 29 14 PM
  1. handleTopMoveStart in TouchBackend runs, initializing react-dnd
  2. onStart in useLongPress runs, beginning the long press
  3. longPress transition, left over from debugging but kind of redundant with step 2
  4. move in useLongPress runs and exits early because SCROLL_THRESHOLD is not met
  5. handleTopMove runs and immediately cleans up the timeout in TouchBackend, which has already fired
  6. move
  7. move
  8. move
  9. SCROLL_THRESHOLD is met in move, clears timeout in useLongPress, which has already fired

Next, here is a long press that is cancelled by movement:

Screenshot 2025-07-10 at 2 28 08 PM
  1. move in useLongPress runs and exits early because SCROLL_THRESHOLD is not met
  2. handleTopMove runs and immediately cleans up the timeout in TouchBackend, cancelling handleTopMoveStart
  3. move
  4. move
  5. SCROLL_THRESHOLD is met in move, clears timeout in useLongPress, cancelling onStart

And finally, here is the bug:

Screenshot 2025-07-10 at 2 15 45 PM
  1. our move handler in useLongPress runs but exits early because SCROLL_THRESHOLD is not met
  2. handleTopMove in TouchBackendImpl runs, clears handleTopMoveStart timeout
  3. our move handler runs again, still hasn't met SCROLL_THRESHOLD
  4. onStart runs inside the timeout in useLongPress, long press has officially begun on our end
  5. longPress transition, left over from debugging other stuff
  6. our move handler runs a third time
  7. move has passed SCROLL_THRESHOLD so clearTimeout runs (onStart has already triggered, so this has no effect)

So because TouchBackend cancels handleTopMoveStart before touchSlop is exceeded, it's going to be difficult to ensure that drag-and-drop always starts under the same circumstances as our long press.

@raineorshine
Copy link
Contributor

Reducing the react-dnd timeout causes problems in the other direction, where you can drag-and-drop and draw a gesture at the same time, but the bullet isn't highlighted.

Oops, that's not good either.

Allowing longPress transitions from Inactive -> DragInProgress (skipping DragHold) fixes that and makes this a viable solutiion. I was still able to reproduce the bug after reducing the timeout by 10ms, but not after reducing it by 20ms.

Sounds promising. I think the exact timing will be affected by performance unfortunately. If the app slows down, it might create more false positives.

We could also just delay the appearance of the highlight, without touching the long press timing if that's the only thing affected. Though maybe there is an alert or something else tied to the timeout.

So because TouchBackend cancels handleTopMoveStart before touchSlop is exceeded, it's going to be difficult to ensure that drag-and-drop always starts under the same circumstances as our long press.

Is this a bug in TouchBackendImpl?

Would it be possible to lean on touchSlop instead of using our own SCROLL_THRESHOLD?

Thanks and let me know what you recommend.

@ethan-james
Copy link
Collaborator Author

Is this a bug in TouchBackendImpl?

I'd say so. I'm not sure what the rationale is for cancelling handleTopMoveStart so early, but it doesn't match with the behavior that I would expect.

Would it be possible to lean on touchSlop instead of using our own SCROLL_THRESHOLD?

I don't think so. The only options that I can think of are:

  1. cancel our own long press immediately to match the behavior of TouchBackend (remove SCROLL_THRESHOLD)
  2. patch TouchBackend to stop it from cancelling handleTopMoveStart until touchSlop is exceeded

The third option would be to tweak the timing as we've discussed above, however, now that I've realized that this is not quite a race condition, I've also realized that it's still possible to break it with any timing offset. Reducing the timeout so that drag-and-drop initializes sooner will fix the initial repro steps, but it's still possible to wiggle your finger juuuuuust right and break it.

Video.18.mov

@raineorshine
Copy link
Contributor

2. patch TouchBackend to stop it from cancelling handleTopMoveStart until touchSlop is exceeded

I'm open to this. I've already had to patch TouchBackend once, and it seems like a genuine bug.

@ethan-james
Copy link
Collaborator Author

  1. patch TouchBackend to stop it from cancelling handleTopMoveStart until touchSlop is exceeded

I'm open to this. I've already had to patch TouchBackend once, and it seems like a genuine bug.

OK, should I follow these instructions and put the patch in .yarn/patches?

@raineorshine
Copy link
Contributor

Yes, thanks! Once you edit the dependency file, it should generate a patch for you. It might even put it directly in .yarn/patches, but I don't remember exactly.

See: https://yarnpkg.com/cli/patch

@ethan-james
Copy link
Collaborator Author

ethan-james commented Jul 11, 2025

I think it worked! I also needed to add this bit of code because handleTopMoveStart hadn't run and initialized _mouseClientOffset:

if (clientOffset) {
	if (!this._mouseClientOffset.hasOwnProperty('x') && !this._mouseClientOffset.hasOwnProperty('y')) {
		this._mouseClientOffset = clientOffset
	}
}

Copy link
Contributor

@raineorshine raineorshine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This seems promising, but I'm getting some weird behavior with this change. Half the time I try this, the long press gets canceled and gestures stop working.

Steps to Reproduce

  1. Refresh the page
  2. Draw and cancel a gesture
  3. Long press and drag a thought with the timing of the original issue.
  4. Attempt to input a gesture in the gesture zone.

Current Behavior

The long press gets canceled for some reason, then gestures stop working.

In this video, after the long press I'm swiping on the gesture zone.

trim.D3DD6052-2788-4C53-9722-F75C0649CE05.MOV

@ethan-james
Copy link
Collaborator Author

Hmm, I was able to find a bug with drag-and-drop after drawing a gesture, but it doesn't look quite the same as yours. I discovered that handleTopMoveEndCapture is only cleaning up its _mouseClientOffset object if the drag was successfully initialized, which means that a new touchmove would use a stale starting position, far enough away from the real touchstart position to exceed the touchSlop option. I had to move the line that resets that variable up to a place where it would be guaranteed to run.

I'll keep looking to try to reproduce your bug, but please do let me know if this fixed it for you.

@raineorshine
Copy link
Contributor

Thanks. It's still occurring for me, although it doesn't happen very often I'm afraid. Maybe 1 in 10 times. Either gestures are completely disabled and the Gesture Zone starts allowing scrolling, or sometimes both drag-and-drop and gestures are incorrectly active at the same time.

@ethan-james
Copy link
Collaborator Author

I can reproduce the issue where drag-and-drop and gestures are active at the same time. As you might expect, it occurs when SCROLL_THRESHOLD is exceeded exactly 400ms after the press begins.

Screenshot 2025-07-15 at 2 36 25 PM
  1. The onStart function in useLongPress is invoked
  2. handleTopMoveStart fires, initiating drag-and-drop
  3. move fires in useLongPress, cancelling the timeout in onStart
  4. handleTopMove fires in TouchBackend, cancelling the handleTopMoveStart timeout, which has already fired

There should be another line, onStart timeout, but it is cancelled by move.

The easiest way to fix this is to allow state.longPress to transition from Inactive straight to DragInProgress without passing through DragHold.

@ethan-james
Copy link
Collaborator Author

ethan-james commented Jul 15, 2025

As for the flickering bullet shown in the video, I still haven't been able to reproduce it, but I can see how it could be the inverse of the bug above (i.e. drag-and-drop and gesture are active at the same time).

Instead of the timeout in useLongPress getting canceled during the same millisecond when it should have fired, if the timeout in handleTopMoveStart got canceled during the same millisecond when it should have fired, then beginDrag would never get called and drag-and-drop would never activate, but the dragHold action would get called by useDragHold.

This would cause the bullet to highlight because state.draggedSimplePath would be set briefly, and then cleared in move, but since onLongPressEnd is called with canceled: true, then it would not call dispatch(longPress({ value: LongPressState.Inactive })). Since drag-and-drop is not running, then it would never call endDrag.

stop in useLongPress would get called because react-dnd wouldn't be intercepting its events, but it would exit early due to globals.longpressing being false. So state.longPress would get stuck in an active state which would keep causing shouldCancelGesture to be true and prevent gestures.

It seems like there are 3 issues to consider:

  1. Is it possible to absolutely prevent handleTopMoveStart from being canceled? My instinct says no, at least with how the timeout is currently structured.
  2. We can make sure state.longPress gets set back to Inactive whenever stop runs, although not by using the existing onLongPressEnd logic.
  3. The bullet highlighting will be imperfect unless we can actually guarantee list item 1.

@ethan-james
Copy link
Collaborator Author

After mulling a bit, I see three possible solutions:

  1. Assuming the macrotask queue behaves like a queue, it should be possible to restructure the code to ensure that both move functions run either before or after both timeouts, deterministically. This would prevent situations where one timeout gets canceled but not the other.
  2. If that's too fiddly, it should be possible to store the timestamp when the timeouts begin, then check that value in the move functions. If exactly TIMEOUT_LONG_PRESS_THOUGHT milliseconds have passed, then the move functions can choose not to cancel the timeouts, ensuring that they both run.
  3. Redo the TouchBackend patch so that we eliminate the TouchBackend timeout and instead have a way to invoke beginDrag on demand. This is basically a retread of my earlier fork idea.

@raineorshine
Copy link
Contributor

Thanks so much for the in-depth analysis. I'll try to offer my thoughts, although I have to maintain a bit of distance to avoid spending too much time myself. Thanks for your understanding.

The easiest way to fix this is to allow state.longPress to transition from Inactive straight to DragInProgress without passing through DragHold.

That seems fine. Since we use the same TIMEOUT_LONG_PRESS_THOUGHT for both dragHold and drag-and-drop, it would make sense to go straight from Inactive to DragInProgress if the move event occurs immediately after the timeout completes.

  1. Is it possible to absolutely prevent handleTopMoveStart from being canceled? My instinct says no, at least with how the timeout is currently structured.

I'm not sure. I will have to rely on your experience so that I don't get too deep on the implementation details. I do know that we want to avoid modifying TouchBackendImpl more than absolutely necessary.

As for the flickering bullet shown in the video, I still haven't been able to reproduce it, but I can see how it could be the inverse of the bug above (i.e. drag-and-drop and gesture are active at the same time).

Instead of the timeout in useLongPress getting canceled during the same millisecond when it should have fired, if the timeout in handleTopMoveStart got canceled during the same millisecond when it should have fired, then beginDrag would never get called and drag-and-drop would never activate, but the dragHold action would get called by useDragHold.

This would cause the bullet to highlight because state.draggedSimplePath would be set briefly, and then cleared in move, but since onLongPressEnd is called with canceled: true, then it would not call dispatch(longPress({ value: LongPressState.Inactive })). Since drag-and-drop is not running, then it would never call endDrag.

stop in useLongPress would get called because react-dnd wouldn't be intercepting its events, but it would exit early due to globals.longpressing being false. So state.longPress would get stuck in an active state which would keep causing shouldCancelGesture to be true and prevent gestures.

Yes, that seems like a plausible cause. Something must be leading to shouldCancelGesture returning true.

It seems like the long press should always be set to inactive on touchend or touchcancel. The stuck state is higher impact than the flickering bullet problem, so we should take extra steps to ensure it can never occur.

  1. Assuming the macrotask queue behaves like a queue, it should be possible to restructure the code to ensure that both move functions run either before or after both timeouts, deterministically. This would prevent situations where one timeout gets canceled but not the other.

It's a bit hard for me to wrap my had around, but I'm open to it if it's not a ton of code.

  1. If that's too fiddly, it should be possible to store the timestamp when the timeouts begin, then check that value in the move functions. If exactly TIMEOUT_LONG_PRESS_THOUGHT milliseconds have passed, then the move functions can choose not to cancel the timeouts, ensuring that they both run.

Seems low impact enough.

  1. Redo the TouchBackend patch so that we eliminate the TouchBackend timeout and instead have a way to invoke beginDrag on demand. This is basically a retread of my earlier fork idea.

This feels too high risk to me.


One thing you could try is setting touchSlop to SCROLL_THRESHOLD so that it corresponds to the movement we allow before the long press is cancelled. That could help avoid drag-and-drop and gestures being active at the same time. Though maybe they could only be truly coordinated if we hooked into TouchBackendImpl's touchSlop logic, which could be messy.

@ethan-james
Copy link
Collaborator Author

One thing you could try is setting touchSlop to SCROLL_THRESHOLD so that it corresponds to the movement we allow before the long press is cancelled. That could help avoid drag-and-drop and gestures being active at the same time. Though maybe they could only be truly coordinated if we hooked into TouchBackendImpl's touchSlop logic, which could be messy.

That's what I did here without actually using the SCROLL_THRESHOLD variable because it's not exported. Probably worthwhile to clean that up before merging by adding SCROLL_THRESHOLD to constants.ts.

In any case, that fixed most of the buggy behavior. What's left is due to situations where one timeout (e.g. useLongPress) is canceled at the exact millisecond when it should be firing, while the other timeout (e.g. TouchBackend) is not canceled and takes effect. So all of the handlers activate at the same time, like:

  1. move in useLongPress cancels onStart because SCROLL_THRESHOLD = 10 has been exceeded
  2. onStart would run in the same millisecond, but has been canceled
  3. handleTopMoveStart runs in TouchBackend
  4. handleTopMove runs, tries to cancel handleTopMoveStart because touchSlop = 10 has been exceeded, but it's already active

So I'm going to try to do option number 2 above:

  1. If that's too fiddly, it should be possible to store the timestamp when the timeouts begin, then check that value in the move functions. If exactly TIMEOUT_LONG_PRESS_THOUGHT milliseconds have passed, then the move functions can choose not to cancel the timeouts, ensuring that they both run.

and see if it's possible to guard against out-of-order event handlers to ensure that either

  1. Both start functions run, or
  2. Both move functions run and cancel their start functions

@raineorshine
Copy link
Contributor

One thing you could try is setting touchSlop to SCROLL_THRESHOLD so that it corresponds to the movement we allow before the long press is cancelled. That could help avoid drag-and-drop and gestures being active at the same time. Though maybe they could only be truly coordinated if we hooked into TouchBackendImpl's touchSlop logic, which could be messy.

That's what I did here without actually using the SCROLL_THRESHOLD variable because it's not exported. Probably worthwhile to clean that up before merging by adding SCROLL_THRESHOLD to constants.ts.

Got it, thanks. Yes, we should use a single constant for that, and probably rename it to TOUCH_SLOP now that it's not related to scrolling and to bring it into alignment with react-dnd's terminology.

In any case, that fixed most of the buggy behavior. What's left is due to situations where one timeout (e.g. useLongPress) is canceled at the exact millisecond when it should be firing, while the other timeout (e.g. TouchBackend) is not canceled and takes effect.

You think it's really the exact millisecond? That seems so difficult to hit in practice, and I reproduced the issue a few times.

So all of the handlers activate at the same time, like:

  1. move in useLongPress cancels onStart because SCROLL_THRESHOLD = 10 has been exceeded
  2. onStart would run in the same millisecond, but has been canceled
  3. handleTopMoveStart runs in TouchBackend
  4. handleTopMove runs, tries to cancel handleTopMoveStart because touchSlop = 10 has been exceeded, but it's already active

So I'm going to try to do option number 2 above:

  1. If that's too fiddly, it should be possible to store the timestamp when the timeouts begin, then check that value in the move functions. If exactly TIMEOUT_LONG_PRESS_THOUGHT milliseconds have passed, then the move functions can choose not to cancel the timeouts, ensuring that they both run.

and see if it's possible to guard against out-of-order event handlers to ensure that either

  1. Both start functions run, or
  2. Both move functions run and cancel their start functions

I'm worried about how complex this is, but see what you can do.

It seems like there should be a more robust way to defer to TouchBackend rather than racing timeouts. Maybe patch it to fire a custom event? I'm not really sure.

Anything you can do to manage complexity here is going to help this be maintainable over time.

@ethan-james
Copy link
Collaborator Author

You think it's really the exact millisecond? That seems so difficult to hit in practice, and I reproduced the issue a few times.

That seemed to be what was happening for the bug that I was able to reproduce, but maybe you're seeing something different.

I'm worried about how complex this is, but see what you can do.

It seems like there should be a more robust way to defer to TouchBackend rather than racing timeouts. Maybe patch it to fire a custom event? I'm not really sure.

Anything you can do to manage complexity here is going to help this be maintainable over time.

A custom event sounds like a good idea, and may allow me to get rid of the move function in useLongPress.

(value === LongPressState.DragInProgress && state.longPress === LongPressState.DragHold) ||
// Drag should only begin after long press has begun, but the occasional race condition
// may allow react-dnd to initiate a drag before useLongPress sets this to DragHold.
// Drag should not re-start after DragCanceled.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hopefully not true anymore, but the async nature of useLongPress and beginDrag makes me nervous that removing it might re-introduce some rare bug.

@ethan-james ethan-james requested a review from raineorshine July 22, 2025 23:11
Copy link
Contributor

@raineorshine raineorshine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Thanks for your persistence. This looks like a solid approach now. Coordinating with TouchBackend resolves any weird timing issues between long press and drag-and-drop, or racing timeouts. I tried repeatedly to reproduce the original issue on an iPhone and was not able to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Mobile] Bullet incorrectly highlighted on gesture

2 participants