-
Notifications
You must be signed in to change notification settings - Fork 4.1k
roachpb: does DeleteRange request handle WriteTooOld errors correctly? #45562
Description
Does the DeleteRange request properly handle WriteTooOld errors? There are two reasons why I don't think it does based on a reading of the code, but it's possible that they cancel each other out:
-
DeleteRangeRequest does not set the
isReadflag, even though it has an option toReturnKeys. Could this confused the "can defer WriteTooOld" check in `evaluateBatch? -
DeleteRangeRequest strangely does not set its
Keysresponse field on error but does set itsNumKeysfield on error. This seems nonsensical, and it's not inconsequential because batch evaluation does expect to be able to return a WriteTooOld error that gets swallowed by its caller so it needs to handle errors properly.
Based on these two issues, it seems possible to me that a DeleteRange request with ReturnKeys = true could hit a WriteTooOld error and fail to return any rows to SQL (which uses this option in deleteRangeNode) while deferring it the WriteTooOld error. I think things would eventually be ok because DeleteRange would require a refresh before committing, but this still seems accidental at best. We should figure out how this is supposed to work and fix it if it doesn't.
Additionally, this whole business of returning a WriteTooOldError and a result from batch evaluation seems sketchy and I'm not surprised that it creates these kinds of issues. It also just caused me all kinds of pain in #45482. Now that we have a retry loop right above batch evaluation, we should consider getting rid of dual response/error propagation.
Jira issue: CRDB-5146