[WIP] Cleanup routine for old tasks.db tasks#2917
olljanat wants to merge 1 commit into moby:master
Conversation
Force-pushed from 226139b to 90977a2
Avoid it growing on environments where daemon is restarted rarely Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
Force-pushed from 90977a2 to 64b5650
I have honestly no idea what the purpose of the task database is, so I'm a bit nervous to merge code related to it. I'll spend some time looking into it and give you an answer, though. I know that this is a rather disruptive problem for most users.
As far as I understand, it is a cache database which contains information about started tasks that might be needed if there is a network issue between the worker and the manager. Maybe it also contains some info about why tasks failed (shown with docker service ps). That's why I think some kind of history of those is needed for troubleshooting purposes. What I can see is that if I remove all services from the swarm, tasks will stay in tasks.db, but only until swarmkit is restarted.
As per moby/moby#34827, I had a single swarm node go splat and stay down with a 3.4 GB tasks.db. Deleting it and starting docker again seems to get us back to a functional swarm, with all the old services, volumes, secrets etc. appearing to work. Now the tasks.db is 25M. @dperny if you do figure out what it's used for, would you be willing to write it down somewhere?
ping @stevvooe PTAL
@dperny Do I understand correctly that with the new Jobs, tasks.db will grow even faster, so some kind of cleanup routine would be useful?
What cleans these up on start? Does this have any particular effect on leader nodes, e.g. access to task history? It seems like this cleanup should be happening at the specified task history threshold rather than at a randomly picked time interval.
@cpuguy83 I agree that, ideally, someone who knows what this database is supposed to do should work out how to make it do whatever it was intended to do. But given how long this has been a big issue, having a fallback "oh crap, it's all gone to hell, delete on a timer" is 100% better than punting this to some future when perfect happens.
I'm not suggesting that this be punted down the line.
@cpuguy83 @thaJeztah @dperny so what do we need to do to get this merged before 20.03 is branched? Considering the date, the clock is ticking :(
We have quite a lot of Docker swarm nodes and we faced the huge tasks.db problem this morning. We think it is also a performance issue, as the full database seems to be loaded into memory (I have a dockerd process eating 13 GB…), and updates may cause high CPU usage too. We consider this problem critical, and this patch would be very welcome in the next release.
PTAL #2938, which is a bit cleaner of a fix.
relates to
- What I did
Added cleanup routine for old tasks.db tasks to avoid it growing on environments where daemon is restarted rarely.
Without this change those are removed only during worker init:
https://github.com/docker/swarmkit/blob/42085d2f8e43a3ed90ed289d3f3ed3de57837100/agent/worker.go#L95-L103
closes #2367
- How I did it
The cleanup routine runs in its own goroutine; every 5 minutes it removes tasks in the completed/failed/shutdown states that are more than 5 minutes old.
- How to test it
Check tasks.db and see that all tasks still exist in the database.