
Pull request race triggers inefficient pull retry #18062

@rkooo567

Description


What is the problem?

#17628 (comment)

This happens 100% of the time when pipeline spilling isn't enabled, and 50% of the time when it is enabled (we should eventually solve it with a better retry protocol). TL;DR: this block is triggered quite often, which makes the pull wait 10 additional seconds.

// Create failed. The object may already exist locally. If something else went

Object store mem: 64MB/70MB
Object: 8MB with 2 chunks
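The numbers above show why every chunk thread goes down the spill-and-create path. This assumes nothing else frees memory in the meantime:

$$
\text{free} = 70\,\text{MB} - 64\,\text{MB} = 6\,\text{MB} < 8\,\text{MB}
\;\Rightarrow\;
\text{at least } 8\,\text{MB} - 6\,\text{MB} = 2\,\text{MB} \text{ must be spilled before the object can be created.}
$$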

  • Chunk index 1 is received on thread A.
  • The buffer pool attempts to spill and create the object.
  • Chunk index 0 is received on thread B.
  • The buffer pool attempts to spill and create the object.
  • The object is created by chunk index 0's request (since it is FIFO).
  • However, chunk index 1 gets its reply from the plasma store first.
  • Chunk index 0 fails at
    // Create failed. The object may already exist locally. If something else went
  • Chunk index 1 creates the object successfully.
  • Since chunk index 0 failed, it has to wait 10 more seconds for the retry.
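The trace above can be replayed with a minimal single-threaded sketch. It is a toy model under stated assumptions, not Ray's code: `ToyStore`, `ChunkOutcome`, `ReplayCreateReplies`, and `SecondsUntilComplete` are hypothetical names. The model assumes both chunk threads issue their own create, the store accepts only the first one, and a chunk whose create fails is dropped until a fixed retry timer (10 s here) fires.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model of the race; not Ray's ObjectBufferPool or plasma API.

// Toy store: Create() succeeds only for the first request per object id.
// Later requests fail, like the "Create failed. The object may already
// exist locally" branch.
class ToyStore {
 public:
  bool Create(const std::string& object_id) {
    return created_.insert(object_id).second;  // false if already present
  }

 private:
  std::set<std::string> created_;
};

struct ChunkOutcome {
  int chunk_index;
  bool written;  // false: the chunk was dropped and must be pulled again
};

// Both chunk threads saw no buffer and issued their own Create(). Replay the
// replies in the order they reach the buffer pool; the chunk whose Create()
// fails is dropped instead of writing into the buffer the other one created.
std::vector<ChunkOutcome> ReplayCreateReplies(ToyStore& store,
                                              const std::string& object_id,
                                              const std::vector<int>& reply_order) {
  std::vector<ChunkOutcome> outcomes;
  for (int chunk_index : reply_order) {
    outcomes.push_back({chunk_index, store.Create(object_id)});
  }
  return outcomes;
}

// With a fixed pull retry timer, a single dropped chunk delays the whole
// object by one full retry interval.
double SecondsUntilComplete(const std::vector<ChunkOutcome>& outcomes,
                            double retry_interval_s) {
  assert(retry_interval_s >= 0.0);
  for (const ChunkOutcome& outcome : outcomes) {
    if (!outcome.written) return retry_interval_s;
  }
  return 0.0;
}
```

Replaying the reply order `{1, 0}` from the trace leaves chunk 1 written and chunk 0 dropped. As a result, `SecondsUntilComplete` reports the full 10 s retry interval even though the object already exists locally.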

This can be solved by either:

  1. the better retry protocol we are planning to build, or
  2. caching the request until the object is created and sealing it as soon as it is created (this is not ideal because it can cause large memory usage).

It happens less often because creating a new object in the buffer pool is usually faster (since we are not blocked by spilling).
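Option 2 could look roughly like the sketch below, assuming a per-object buffer that holds early chunks in memory instead of dropping them. `PendingChunkBuffer`, `OnChunk`, `OnCreated`, and `Contents` are hypothetical names, not part of Ray's API. The "seal" is modeled as a flag set once every chunk has been written.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of option 2: cache chunks that arrive before the object
// exists, copy them in once it is created, and seal as soon as all are written.
class PendingChunkBuffer {
 public:
  explicit PendingChunkBuffer(std::size_t num_chunks)
      : num_chunks_(num_chunks), chunks_(num_chunks), written_(num_chunks, false) {}

  // Called when a chunk arrives. Returns true if it was written directly,
  // false if it had to be cached because the object does not exist yet.
  bool OnChunk(std::size_t index, std::string data) {
    assert(index < num_chunks_);
    if (!created_) {
      pending_[index] = std::move(data);  // memory cost of this approach
      return false;
    }
    Write(index, std::move(data));
    return true;
  }

  // Called once the object has been created in the store.
  void OnCreated() {
    created_ = true;
    for (auto& entry : pending_) Write(entry.first, std::move(entry.second));
    pending_.clear();
  }

  bool sealed() const { return sealed_; }

  // Bytes currently held in memory while waiting for the object to exist.
  std::size_t pending_bytes() const {
    std::size_t total = 0;
    for (const auto& entry : pending_) total += entry.second.size();
    return total;
  }

  // Concatenated object contents (chunks in index order).
  std::string Contents() const {
    std::string out;
    for (const std::string& chunk : chunks_) out += chunk;
    return out;
  }

 private:
  void Write(std::size_t index, std::string data) {
    chunks_[index] = std::move(data);
    if (!written_[index]) {
      written_[index] = true;
      ++num_written_;
    }
    if (num_written_ == num_chunks_) sealed_ = true;  // seal immediately
  }

  std::size_t num_chunks_;
  std::vector<std::string> chunks_;
  std::vector<bool> written_;
  std::map<std::size_t, std::string> pending_;
  std::size_t num_written_ = 0;
  bool created_ = false;
  bool sealed_ = false;
};
```

The trade-off from the issue is visible in `pending_bytes()`: every chunk that arrives before creation occupies extra memory until `OnCreated()` runs. In return, no chunk waits for the retry timer.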

Reproduction (REQUIRED)

Look at the original issue


  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

Metadata


Labels

P1 (Issue that should be fixed within a few weeks), bug (Something that is supposed to be working; but isn't)

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests
