What is the Issue?
When tasks are removed from SwarmKit, they are immediately removed from the object store, and all the resources allocated to them (including IP addresses) are released. The actual task containers, however, can take much longer to shut down while they finish their shutdown procedures, so an IP address may still be in use on a node after it has been released. If new tasks are created in the meantime, they may be allocated the same IP address, leading to a conflict, or to an inconsistent network state until the old tasks finish shutting down.
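The race can be illustrated with a toy model. This is only a sketch and not SwarmKit code: the `Allocator` class, the `remove_task` helper, the `10.0.0.0/29` subnet, and the lowest-free-address policy are all assumptions made for illustration.

```python
import ipaddress


class Allocator:
    """Toy IP allocator that hands out the lowest free address (assumed policy)."""

    def __init__(self, cidr):
        self.free = sorted(ipaddress.ip_network(cidr).hosts())
        self.by_task = {}

    def allocate(self, task_id):
        ip = self.free.pop(0)
        self.by_task[task_id] = ip
        return ip

    def release(self, task_id):
        self.free.append(self.by_task.pop(task_id))
        self.free.sort()


def remove_task(allocator, task_id):
    """Models the current behavior: the IP is freed as soon as the task is removed."""
    allocator.release(task_id)


allocator = Allocator("10.0.0.0/29")
in_use_on_nodes = {}  # IPs still held by containers on nodes, keyed by task ID

in_use_on_nodes["task-1"] = allocator.allocate("task-1")
# The store forgets task-1, but its container is still shutting down,
# so its entry in in_use_on_nodes is deliberately left in place.
remove_task(allocator, "task-1")

in_use_on_nodes["task-2"] = allocator.allocate("task-2")
conflict = in_use_on_nodes["task-1"] == in_use_on_nodes["task-2"]
print(conflict)
```

In this model the second task receives the address that the first container has not yet given up, which is the conflict described above.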
Existing Mitigations
Two fixes were recently added to alleviate this problem:
- Putting released IP addresses at the end of the allocation queue, so that a recently used IP address is less likely to be allocated to a new task. For deployments with rapid creation of new tasks, or small subnets, this doesn't do much.
- On the overlay network, if the same IP is used for some reason, then there is better handling so that the configuration in the kernel is consistent across nodes. This helps when the subnet is almost exhausted, but hides the fact that such a thing is happening. This might make it harder to debug the network if issues do arise.
While these mitigations are reasonable, they don't fix the root cause, and don't guarantee that the issue will not come up.
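To see why the first mitigation only delays reuse, here is a minimal sketch of a queue-based free list. `QueueAllocator` and `allocations_until_reuse` are hypothetical names, integers stand in for IP addresses, and the model assumes no other address is released in the meantime.

```python
from collections import deque


class QueueAllocator:
    """Toy free list: released addresses go to the back of the queue."""

    def __init__(self, addresses):
        self.free = deque(addresses)

    def allocate(self):
        return self.free.popleft()

    def release(self, address):
        self.free.append(address)


def allocations_until_reuse(pool_size):
    """Return which subsequent allocation receives a just-released address again."""
    pool = QueueAllocator(range(pool_size))
    released = pool.allocate()
    pool.release(released)
    count = 1
    while pool.allocate() != released:
        count += 1
    return count
```

Under these assumptions, `allocations_until_reuse(4)` returns 4 and `allocations_until_reuse(254)` returns 254: the delay before reuse is bounded by the number of free addresses, so a small or nearly exhausted subnet, or rapid task creation, can hand an address back out while the old container may still be stopping.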
Long Term Fix
The long term fix involves not freeing up resources immediately when the task gets removed. This is a rough draft of the fix, which might evolve as it becomes clearer:
When a task is removed:
- Update desired state for the task to REMOVED (this is a new task status), instead of removing it from the object store
- Dispatcher tells the agent that desired state has been updated to REMOVED, and the agent is responsible for going through task shutdown
- The task reaper only removes the task when desired state is REMOVED and actual state > RUNNING (the task reaper is the only place where a task should be removed)
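The steps above could be sketched roughly as follows. This is an illustrative model only, not the proposed SwarmKit implementation: the `Task` class, the helper names, the enum values, and the position of REMOVED in the ordering are all assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class TaskState(IntEnum):
    """Ordered task states; the values are illustrative, not SwarmKit's."""
    NEW = 0
    RUNNING = 1
    COMPLETE = 2
    SHUTDOWN = 3
    FAILED = 4
    REMOVED = 5  # proposed new state from this issue; its placement is an assumption


@dataclass
class Task:
    task_id: str
    desired_state: TaskState
    actual_state: TaskState


def mark_for_removal(task):
    """Record the intent to remove instead of deleting the task from the store."""
    task.desired_state = TaskState.REMOVED


def should_reap(task):
    """A task is reapable only once its reported state is past RUNNING."""
    return (task.desired_state == TaskState.REMOVED
            and task.actual_state > TaskState.RUNNING)


def reap(store):
    """Delete reapable tasks; only at this point would their IPs be freed."""
    for task_id in [t.task_id for t in store.values() if should_reap(t)]:
        del store[task_id]
```

In this sketch a task marked for removal stays in the store, and keeps its resources, for as long as the agent still reports it as running; `reap` deletes it only after the reported state moves past RUNNING.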
We may or may not revert the mitigations once the long term fix is in, but we should test without the mitigations. Additionally, we will need to decide how the API deals with REMOVED tasks (whether they are returned by default or not). But we can get to these questions later.