Concurrent pipelines participate in backpressure#25334
amnn left a comment:
Seems reasonable -- originally the intention was that concurrent pipelines implement backpressure through fixed channel buffer sizes, so OOMs seem to indicate that the buffer sizes were too generous there, but it also seems like a nice thing to have a uniform way to handle back-pressure.
Note that I've left a suggestion to simplify the indexer set-up even further based on this change, PTAL.
This change will also need to be accompanied by a docs change, e.g. here: https://docs.sui.io/guides/developer/accessing-data/pipeline-architecture#concurrent-backpressure
    added_pipelines: BTreeSet::new(),
    first_ingestion_checkpoint: u64::MAX,
    next_sequential_checkpoint: None,
    initial_commit_hi: None,
Now that all pipelines participate in this, is its value ever different from first_ingestion_checkpoint (with the exception of using u64::MAX instead of None in the empty case)?
Previously, each spawned fetch task waited on ingest_hi internally, so tasks were already spawned and consuming resources while blocked on backpressure. Now checkpoints are gated at the stream level via backpressured_checkpoint_stream(), so tasks are only spawned once the backpressure window allows.
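A minimal, stdlib-only sketch of the stream-level gating described above. The function name echoes the PR's backpressured_checkpoint_stream(), but the logic here (a simple "don't release checkpoint N until it is within `window` of the slowest commit watermark") is an illustrative assumption, not the real tokio stream implementation:

```rust
// Hypothetical model of stream-level backpressure gating. A checkpoint is only
// released (and a fetch task spawned) while it is inside the window above the
// slowest pipeline's committed watermark; otherwise nothing is spawned at all.
fn next_gated_checkpoint(next: u64, commit_hi: u64, window: u64) -> Option<u64> {
    if next < commit_hi.saturating_add(window) {
        Some(next) // inside the window: safe to spawn a fetch task for it
    } else {
        None // blocked on backpressure: no task exists, no resources consumed
    }
}

fn main() {
    let window = 4;
    let mut commit_hi = 0; // slowest pipeline's committed watermark
    let mut next = 0;
    let mut spawned = Vec::new();

    for _ in 0..10 {
        match next_gated_checkpoint(next, commit_hi, window) {
            Some(cp) => {
                spawned.push(cp); // in the real code, a fetch task starts here
                next += 1;
            }
            // Blocked: simulate the slow pipeline committing and catching up.
            None => commit_hi += 1,
        }
    }
    println!("{:?}", spawned); // never runs more than `window` ahead of commit_hi
}
```

The key difference from the previous behaviour is that the check happens before the task exists, rather than inside an already-spawned task.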
Previously, only sequential pipelines reported commit_hi back to the ingestion regulator; concurrent pipelines were effectively invisible to the broadcaster. Now concurrent pipelines also send their watermark progress via commit_hi_tx, allowing ingestion to slow down when any pipeline falls behind. Having backpressure lets us avoid setting a hard upper bound on ingestion concurrency and instead discover it dynamically through feedback.
This reverts commit 4200be7.
Description
Previously only sequential pipelines reported commit_hi back to the
ingestion regulator. Concurrent pipelines were effectively invisible to
the broadcaster.
When I was trying to implement adaptive concurrency control without this in
place, the broadcaster would keep spawning new ingestion tasks even after the
pipelines had hit their write-throughput limits, and eventually the system
would OOM.
Test plan
I have tested it during a testnet backfill as part of a larger effort to enable adaptive rate limiting.
Release notes