Description
This might be two separate bugs, or two symptoms of the same underlying problem.
Checklist
- I have included the output of `celery -A proj report` in the issue.
- I have verified that the issue exists against the `master` branch of Celery.
Steps to reproduce
Run multiple Celery workers with `celery multi`, each consuming its own queue, then send tasks to them.
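Roughly like this (a sketch: the node names, queue names, and the `proj` app name are placeholders; our actual init script is linked below):

```
celery multi start node1 node2 iris -A proj -l info \
    -Q:node1 default -Q:node2 geostore -Q:iris iris \
    --pidfile=/var/run/celery/%n.pid \
    --logfile=/var/log/celery/%n%I.log
```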
Expected behavior
Tasks should run on all workers.
Actual behavior
Most workers explode
Most nodes log the following for every task:
[2018-07-04 02:52:21,280: WARNING/ForkPoolWorker-1] Request: <Context: {'origin': 'gen19542@vm', u'args': [], u'chain': None, 'root_id': '0c008f5a-507d-4b08-8082-0dcc0ab3495d', 'expires': None, u'is_eager': False, u'correlation_id': '0c008f5a-507d-4b08-8082-0dcc0ab3495d', u'chord': None, u'reply_to': '5e48e3d9-0b67-3122-b0c5-9b83efd9e5ca', 'id': '0c008f5a-507d-4b08-8082-0dcc0ab3495d', 'kwargsrepr': '{}', 'lang': 'py', 'retries': 0, 'task': 'dandelion.geostore.celery.debug_task', 'group': None, 'timelimit': [None, None], u'delivery_info': {u'priority': 0, u'redelivered': False, u'routing_key': u'celery', u'exchange': u''}, u'hostname': u'celery@vm', 'called_directly': False, 'parent_id': None, 'argsrepr': '()', u'errbacks': None, u'callbacks': None, u'kwargs': {}, 'eta': None, '_protected': 1}>
[2018-07-04 02:52:21,288: WARNING/ForkPoolWorker-1] /kx/dandelion/venv/local/lib/python2.7/site-packages/celery/app/trace.py:561: RuntimeWarning: Exception raised outside body: OperationalError('could not receive data from server: Bad file descriptor\n',):
Traceback (most recent call last):
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/celery/app/trace.py", line 442, in trace_task
uuid, retval, task_request, publish_result,
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/celery/backends/base.py", line 144, in mark_as_done
self.store_result(task_id, result, state, request=request)
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/celery/backends/base.py", line 316, in store_result
request=request, **kwargs)
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/django_celery_results/backends/database.py", line 29, in _store_result
meta=meta,
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/django_celery_results/managers.py", line 50, in _inner
return fun(*args, **kwargs)
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/django_celery_results/managers.py", line 119, in store_result
obj, created = self.get_or_create(task_id=task_id, defaults=fields)
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/django/db/models/query.py", line 473, in get_or_create
return self.get(**lookup), False
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/django/db/models/query.py", line 379, in get
num = len(clone)
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/django/db/models/query.py", line 238, in __len__
self._fetch_all()
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/django/db/models/query.py", line 1087, in _fetch_all
self._result_cache = list(self.iterator())
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/django/db/models/query.py", line 54, in __iter__
results = compiler.execute_sql()
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 835, in execute_sql
cursor.execute(sql, params)
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/django/db/backends/utils.py", line 79, in execute
return super(CursorDebugWrapper, self).execute(sql, params)
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/raven/contrib/django/client.py", line 127, in execute
return real_execute(self, sql, params)
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/django/db/utils.py", line 94, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/kx/dandelion/venv/local/lib/python2.7/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
OperationalError: could not receive data from server: Bad file descriptor
I guess something is closing the Django DB connection?
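For what it's worth, the usual pattern for avoiding DB sockets shared across prefork children looks like the sketch below. This is only a guess at a mitigation, not a confirmed fix, and the handler name is made up:

```python
# Sketch of a possible mitigation: make sure each forked pool worker
# starts with its own Django DB connections rather than reusing a
# socket inherited from the parent process; a closed/shared socket
# would produce exactly "Bad file descriptor".
from celery.signals import worker_process_init
from django.db import connections

@worker_process_init.connect
def reset_db_connections(**kwargs):
    for conn in connections.all():
        conn.close()  # Django lazily reopens the connection on the next query
```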
One worker succeeds but then hangs
In another worker (iris), one task actually succeeds without hitting the message above, but the worker then hangs forever. Telling it to stop with the `stopwait` command just hangs the `stopwait` command too. There is nothing in the log for this worker from the time the task completes. (The task completion itself is also not logged, but if I insert print statements at the end of the task, they do end up in the log.)
Our init script to run all the workers is here: https://gist.github.com/craigds/42f3ad627d68e94aefa909c2b1a4a0c0
As far as I can tell, the tasks aren't doing anything particularly interesting; replacing the task body with a bare `pass` doesn't change anything.
Tasks are defined like this:

from celery import shared_task

@shared_task(bind=True, queue='iris')
def queue_slow_rendered_tiles(self):
    ...

This works well with celery 4.1.1. What other information can I provide that might illuminate the issue?
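In case it matters: with `queue='iris'` in the decorator, I'd expect a plain `.delay()` to be enough to route the task, e.g. (a sketch, assuming no `task_routes` setting overrides the task option):

```python
# queue='iris' in the task options should act as the default routing,
# so no explicit queue= argument is needed here (assumption: nothing
# in task_routes overrides it).
queue_slow_rendered_tiles.delay()

# the equivalent explicit form:
queue_slow_rendered_tiles.apply_async(queue='iris')
```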
Theory
The only difference I can think of with the iris worker is that it doesn't use the Django database in its tasks. Presumably the other workers, which do use the database, explode before reaching this hang, whereas the iris worker skips the DB connection close and hits a different bug. Maybe that's a clue?
celery report output
I grepped out the relevant bits; the entirety of our Django settings is not relevant here:
software -> celery:4.2.0 (windowlicker) kombu:4.2.1 py:2.7.6
billiard:3.5.0.3 py-amqp:2.3.2
platform -> system:Linux arch:64bit, ELF imp:CPython
loader -> celery.loaders.app.AppLoader
settings -> transport:amqp results:django-db
CELERY_BROKER_URL: u'amqp://appname:********@localhost:5672/appname'
CELERY_TASK_QUEUES:
(<unbound Queue bcast.e39987e0-37c3-4eb7-b63a-dace642ba7da -> <unbound Exchange notification_broadcast(fanout)> -> >,)
CELERY_BEAT_SCHEDULER: 'django_celery_beat.schedulers:DatabaseScheduler'
CELERY_IMPORTS:
('appname', 'otherappname')
CELERY_RESULT_BACKEND: 'django-db'
CELERY_RESULT_EXPIRES: 3600
These seem relevant too:
celery==4.2.0
django-celery-beat==1.1.1
django-celery-results==1.0.1
kombu==4.2.1
amqp==2.3.2