This bug is essentially the same as the previously-closed bug 12455. That bug was closed because it was fixed in 1.7.0... except we now have at least two ways to repro it in 1.8.2. :(
Basic summary of the bug: if a gRPC channel is (1) used at least once and (2) still in scope when you call fork(), and the subprocess tries to open a gRPC channel as well, then all RPCs on the subprocess's channel hang forever. Apparently, some global state is leaking across the fork() boundary in a bad way.
**What version of gRPC and what language are you using?**
gRPC 1.8.2, Python 3.6
**What operating system (Linux, Windows, …) and version?**
Repro'ed on OS X 10.13.2
**What runtime / compiler are you using (e.g. Python version or version of gcc)?**
Python 3.6.2
**What did you do?**
Here is a minimal repro using the GCP datastore client:
```python
import multiprocessing

from google.cloud import datastore


def causeTrouble(where: str):
    client = datastore.Client(project='dev-storage-humu', namespace='aquarium')
    client.get(client.key('c', 'aquarium'))
    # The call to get() hangs forever; this line is never reached.
    print('OK')


if __name__ == '__main__':
    # Create a datastore client and do an RPC on it.
    client = datastore.Client(project='dev-storage-humu', namespace='aquarium')
    client.get(client.key('c', 'aquarium'))

    # Kick off a child process while the first client is still in scope.
    process = multiprocessing.Process(target=causeTrouble,
                                      args=['child process'])
    process.start()
```
- If you change this so that the main process only creates the client and never calls `client.get()` before forking, it works fine.
- If you change this so that, instead of creating the client and calling `client.get()` directly, main calls `causeTrouble()` before forking, it also works fine. (NB: CPython's reference counting collects the client as soon as `causeTrouble()` returns, so the difference between the two variants is that the client is no longer alive at fork time.)
- In another server, I worked around this by calling `subprocess.run()` instead of `multiprocessing.Process()`; since `subprocess.run()` does a fork() + exec(), no state survives across the process boundary. However, exec() strictly limits communication with the parent process, basically making all of the multiprocessing library unusable.
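The fork() + exec() workaround in the last bullet can be sketched as follows (the `-c` command line here is illustrative; the real server would exec its own worker script):

```python
# subprocess.run() forks and then exec()s a fresh interpreter, so no
# gRPC state from the parent survives into the child. The trade-off is
# that communication with the child is limited to argv, environment
# variables, pipes, and exit codes.
import subprocess
import sys

result = subprocess.run(
    [sys.executable, '-c', 'print("child ran in a clean interpreter")'],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # the child's output, read back over a pipe
```

This sidesteps the bug entirely, at the cost of losing multiprocessing's shared queues, pipes, and pickled arguments.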
If you kill it while it's hanging, you get this stack trace:
```
Traceback (most recent call last):
  File "/Users/zunger/.pyenv/versions/3.6.2/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/Users/zunger/.pyenv/versions/3.6.2/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "minimal_repro.py", line 8, in causeTrouble
    client.get(client.key('c', 'aquarium'))
  File "/Users/zunger/src/humu/servers/squeegee/server/.build/venv/lib/python3.6/site-packages/google/cloud/datastore/client.py", line 309, in get
    deferred=deferred, transaction=transaction)
  File "/Users/zunger/src/humu/servers/squeegee/server/.build/venv/lib/python3.6/site-packages/google/cloud/datastore/client.py", line 356, in get_multi
    transaction_id=transaction and transaction.id,
  File "/Users/zunger/src/humu/servers/squeegee/server/.build/venv/lib/python3.6/site-packages/google/cloud/datastore/client.py", line 138, in _extended_lookup
    project, read_options, key_pbs)
  File "/Users/zunger/src/humu/servers/squeegee/server/.build/venv/lib/python3.6/site-packages/google/cloud/datastore/_gax.py", line 115, in lookup
    return super(GAPICDatastoreAPI, self).lookup(*args, **kwargs)
  File "/Users/zunger/src/humu/servers/squeegee/server/.build/venv/lib/python3.6/site-packages/google/cloud/gapic/datastore/v1/datastore_client.py", line 204, in lookup
    return self._lookup(request, options)
  File "/Users/zunger/src/humu/servers/squeegee/server/.build/venv/lib/python3.6/site-packages/google/gax/api_callable.py", line 452, in inner
    return api_caller(api_call, this_settings, request)
  File "/Users/zunger/src/humu/servers/squeegee/server/.build/venv/lib/python3.6/site-packages/google/gax/api_callable.py", line 438, in base_caller
    return api_call(*args)
  File "/Users/zunger/src/humu/servers/squeegee/server/.build/venv/lib/python3.6/site-packages/google/gax/api_callable.py", line 376, in inner
    return a_func(*args, **kwargs)
  File "/Users/zunger/src/humu/servers/squeegee/server/.build/venv/lib/python3.6/site-packages/google/gax/retry.py", line 121, in inner
    return to_call(*args)
  File "/Users/zunger/src/humu/servers/squeegee/server/.build/venv/lib/python3.6/site-packages/google/gax/retry.py", line 68, in inner
    return a_func(*updated_args, **kwargs)
  File "/Users/zunger/src/humu/servers/squeegee/server/.build/venv/lib/python3.6/site-packages/grpc/_channel.py", line 484, in __call__
    credentials)
  File "/Users/zunger/src/humu/servers/squeegee/server/.build/venv/lib/python3.6/site-packages/grpc/_channel.py", line 478, in _blocking
    _handle_event(completion_queue.poll(), state,
  File "src/python/grpcio/grpc/_cython/_cygrpc/completion_queue.pyx.pxi", line 100, in grpc._cython.cygrpc.CompletionQueue.poll
```
On b/12455, @katbusch reported that simply importing certain libraries prior to the fork (e.g. `from google.cloud import bigquery`) was sufficient to trigger this bug; presumably they do enough gRPC initialization at import time.
That makes workarounds that pre-fork a pile of worker processes and pass them jobs as needed harder, though still possible as a short-term measure.
**Anything else we should know about your project / environment?**
The use case that necessitates this: we have a server (which uses GCP features extensively, e.g. for storage) that needs to fork off subprocesses in which to run long-running operations; it's basically an analysis pipeline manager. Since Python relies heavily on fork() for parallelization (thanks to the GIL, its threads provide no real parallelism), this is effectively the only approach available.