Bug
Our query scheduler is currently designed to run a scheduling loop on a fixed polling interval, where on each interval we gather finished batches of tasks and dispatch new batches. This design has some fundamental problems, but those will be addressed in a separate design doc that goes over a proper solution.
The immediate issue is that this scheduling loop currently has so much jitter that the effective rate at which we can dispatch new batches of tasks for each job is determined by the jitter rather than by the configured polling interval.
The following chart shows how time is spent in the main part of the scheduling loop over a small, illustrative time slice of a longer search job as batches of tasks are completed, and how task completion affects jitter. (Apologies for the ad-hoc chart).
As you can see from the chart, each time a batch of tasks is completed we spend a suspicious amount of time waiting for results from celery.
As it turns out, this is because even after task.ready() is true, celery still seems to go into a polling loop to retrieve results from redis. Since the default polling interval is 0.5s, we seem to always experience this 0.5s delay when retrieving results. Reducing this polling interval directly reduces the time we spend in task.get(); experimentally, even when the interval is reduced significantly, we still seem to spend roughly one polling interval in get().
Besides this issue with retrieving the results of celery tasks, we can also reduce jitter by changing how we sleep in our main scheduling loop.
Currently the loop looks something like:
    while True:
        # do stuff
        await asyncio.sleep(polling_interval)
but to reduce jitter we really want something like:
    while True:
        # do stuff
        await asyncio.sleep(polling_interval - time_spent_doing_stuff)
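For illustration, here is a minimal, self-contained sketch of the second loop. It is not the scheduler's actual code: `run_fixed_rate`, `do_work`, and `iterations` are hypothetical names made up for this example. It tracks an absolute deadline with `time.monotonic()` rather than subtracting a measured duration, which should have the same effect while also avoiding accumulated drift, and it clamps the delay at zero in case the work overruns the interval.

```python
import asyncio
import time


async def run_fixed_rate(do_work, polling_interval, iterations):
    """Run do_work() roughly once per polling_interval.

    Hypothetical helper for illustration only. Returns the monotonic
    start time of each iteration so the spacing can be inspected.
    """
    start_times = []
    next_deadline = time.monotonic()
    for _ in range(iterations):
        start_times.append(time.monotonic())
        await do_work()  # the "do stuff" part of the loop
        # Advance an absolute deadline so the time spent in do_work()
        # is subtracted from the sleep instead of added to the period.
        next_deadline += polling_interval
        remaining = next_deadline - time.monotonic()
        # If the work overran the interval, do not sleep at all.
        await asyncio.sleep(max(0.0, remaining))
    return start_times


async def _example_work():
    # Stand-in for gathering finished batches and dispatching new ones.
    await asyncio.sleep(0.02)


if __name__ == "__main__":
    starts = asyncio.run(run_fixed_rate(_example_work, 0.05, 5))
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    print(gaps)
```

One consequence of the absolute deadline is that after an overrun the loop runs back-to-back iterations until it catches up. If that is undesirable, the deadline could instead be reset to the current time whenever `remaining` is negative.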
CLP version
0.8.0
Environment
Package build started with docker-compose.
Reproduction steps
- Compress enough data to form at least a few archives
- Make sure the configured batch size is less than the total number of archives
- Dispatch any search across all archives (note that because of another issue, command line searches that don't invoke the reducer end up with all tasks in a single batch).