Open
Description
I'm experiencing a problem on 18.06.1 similar to #2015: some services recently began to fail occasionally on redeploy with:
starting container failed: failed to get network during CreateEndpoint: network sa8m3w8pqi0tqzro06naftwbu not found
This is the Docker log from one of the affected worker nodes. The 4th line looks odd, as it indicates a network removal operation that should not be happening:
time="2019-02-22T18:50:12.298220240+08:00" level=info msg="NetworkDB stats worker-04(03ad71380621) - netID:sa8m3w8pqi0tqzro06naftwbu leaving:false netPeers:6 entries:77 Queue qLen:0 netMsg/s:0"
time="2019-02-22T18:55:12.498417669+08:00" level=info msg="NetworkDB stats worker-04(03ad71380621) - netID:sa8m3w8pqi0tqzro06naftwbu leaving:false netPeers:6 entries:61 Queue qLen:0 netMsg/s:0"
time="2019-02-22T18:57:15.646497195+08:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint sa8m3w8pqi0tqzro06naftwbu 400a306d13ff57f7c0b773f692fb4c1399500c821a9fd2f7b7172fcbbebb903d], retrying...."
time="2019-02-22T18:57:15.728256035+08:00" level=error msg="network xxx_default remove failed: unknown network xxx_default id sa8m3w8pqi0tqzro06naftwbu" module=node/agent node.id=xpoaymvry45sb5q888o1jpoi8
time="2019-02-22T18:57:15.728298569+08:00" level=error msg="remove task failed" error="unknown network xxx_default id sa8m3w8pqi0tqzro06naftwbu" module=node/agent node.id=xpoaymvry45sb5q888o1jpoi8 task.id=5mav3lpsdc7gk8hpwgnyvtbr4
time="2019-02-22T18:57:15.768893162+08:00" level=error msg="fatal task error" error="starting container failed: failed to get network during CreateEndpoint: network sa8m3w8pqi0tqzro06naftwbu not found" module=node/agent/taskmanager node.id=xpoaymvry45sb5q888o1jpoi8 service.id=2x43w0s1gvxkzu4b8qw4wk3y2 task.id=kdyljmb8isyp02cqaq5dqsdnu
time="2019-02-22T18:57:16.045314219+08:00" level=error msg="fatal task error" error="starting container failed: failed to get network during CreateEndpoint: network sa8m3w8pqi0tqzro06naftwbu not found" module=node/agent/taskmanager node.id=xpoaymvry45sb5q888o1jpoi8 service.id=vdputsk4p8vf6a3klda75od44 task.id=j3l9cc5job0ih3g7eokgp73mk
time="2019-02-22T18:57:18.861072689+08:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint sa8m3w8pqi0tqzro06naftwbu 1c327c91e28c1024b23384b52d16861b8593c002c0312975d4ebac6e5953a962], retrying...."
time="2019-02-22T18:57:40.752379324+08:00" level=warning msg="deleteServiceInfoFromCluster NetworkDB DeleteEntry failed for 655f0010bee9de626f131bfbc629da4d0b79c84227709ac924921c8e7b900c58 sa8m3w8pqi0tqzro06naftwbu err:cannot delete entry endpoint_table with network id sa8m3w8pqi0tqzro06naftwbu and key 655f0010bee9de626f131bfbc629da4d0b79c84227709ac924921c8e7b900c58 does not exist or is already being deleted"
time="2019-02-22T19:00:12.699101399+08:00" level=info msg="NetworkDB stats worker-04(03ad71380621) - netID:sa8m3w8pqi0tqzro06naftwbu leaving:false netPeers:6 entries:64 Queue qLen:0 netMsg/s:0"
My workaround:
Poll the service status during the update; if this specific error is detected, deploy the affected services again.
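A minimal sketch of that workaround, in case it helps others. It is not the exact script I run: the function names, the polling interval, and the use of `docker service update --force` as the "deploy again" step are all assumptions for illustration (re-running your usual stack deploy would work too). Only the error text is taken from the log above.

```python
import subprocess
import time

# Error text copied from the report; everything else here is illustrative.
ERROR_MARKER = "failed to get network during CreateEndpoint"


def has_network_error(errors):
    """Return True if any task error message contains the marker."""
    return any(ERROR_MARKER in err for err in errors)


def task_errors(service):
    """Return the non-empty Error column of `docker service ps` for a service."""
    out = subprocess.run(
        ["docker", "service", "ps", "--no-trunc",
         "--format", "{{.Error}}", service],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


def redeploy(service):
    """Assumed redeploy step: force the service's tasks to be recreated."""
    subprocess.run(
        ["docker", "service", "update", "--force", service], check=True
    )


def watch(services, interval=10, rounds=30,
          get_errors=task_errors, fix=redeploy):
    """Poll each service and redeploy any that shows the network error.

    Each service is redeployed at most once, because the failed task
    stays in the `docker service ps` history after the redeploy.
    """
    redeployed = []
    for _ in range(rounds):
        for service in services:
            if service in redeployed:
                continue
            if has_network_error(get_errors(service)):
                fix(service)
                redeployed.append(service)
        time.sleep(interval)
    return redeployed
```

Calling `watch(["xxx_web"])` (a hypothetical service name) during an update returns the list of services that had to be redeployed. `get_errors` and `fix` are parameters so the loop can be tried without a running swarm.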
Others:
#2341 (comment)
#2341 (comment)