etcdv3-ruby

Modify lease keep alive call to be compatible with gRPC proxy

Open · jcalvert opened this issue 7 years ago · 11 comments

While attempting to utilize the lease_keep_alive_once method, we discovered an incompatibility when connecting to an etcd gRPC proxy. All requests would fail with a Cancelled status like:

        from /home/jcalvert/.rvm/gems/ruby-2.4.3/gems/grpc-1.11.0-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:26:in `check_status'
        from /home/jcalvert/.rvm/gems/ruby-2.4.3/gems/grpc-1.11.0-x86_64-linux/src/ruby/lib/grpc/generic/bidi_call.rb:209:in `block in read_loop'
        from /home/jcalvert/.rvm/gems/ruby-2.4.3/gems/grpc-1.11.0-x86_64-linux/src/ruby/lib/grpc/generic/bidi_call.rb:195:in `loop'
        from /home/jcalvert/.rvm/gems/ruby-2.4.3/gems/grpc-1.11.0-x86_64-linux/src/ruby/lib/grpc/generic/bidi_call.rb:195:in `read_loop'
        from /home/jcalvert/etcdv3-ruby/lib/etcdv3/lease.rb:38:in `each'     
        from /home/jcalvert/etcdv3-ruby/lib/etcdv3/lease.rb:38:in `lease_keep_alive_once'
        from /home/jcalvert/etcdv3-ruby/lib/etcdv3/connection.rb:23:in `call'       
        from /home/jcalvert/etcdv3-ruby/lib/etcdv3/connection_wrapper.rb:21:in `handle'
        from /home/jcalvert/etcdv3-ruby/lib/etcdv3.rb:127:in `lease_keep_alive_once'

After some extensive debugging, the culprit seems to be the EOF check the proxy makes here: when the "request" object for the bidirectional stream has completed writing, this code point is reached and the receiving loop on the server side returns, which then fires off a message to the stop channel, so the context is marked as cancelled before the request even runs. For reference, we tried using the Go client against the etcd proxy and did not encounter this issue.
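
For illustration, the failing shape looks roughly like this (a hedged sketch, not the library's exact code; `stub`, `lease_id`, and the assumption that the generated `Etcdserverpb` proto classes are already loaded are all illustrative):

    # The request enumerable holds a single element, so as soon as gRPC's
    # background write thread consumes it, writes-done (EOF) is signaled on
    # the stream. The proxy reacts to that EOF by stopping the stream, and
    # the call comes back Cancelled before the keep-alive response arrives.
    def lease_keep_alive_once(stub, lease_id)
      request = Etcdserverpb::LeaseKeepAliveRequest.new(ID: lease_id)
      stub.lease_keep_alive([request].each).each do |reply|
        return reply # through the proxy this raises GRPC::Cancelled instead
      end
    end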

Looking at the comments here and here, the issue in this case appears to be one of needing to defer signaling that writing is complete until the code has actually received a response. Doing this allows requests that are proxied via gRPC to succeed as expected.
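
A minimal sketch of that deferral, under the same assumptions as above: the request enumerator blocks on a queue and only finishes (which is what signals writes-done) after the read side has actually seen a response.

    def lease_keep_alive_once(stub, lease_id)
      response_seen = Queue.new

      # Yield the single keep-alive request, then block. StopIteration,
      # and therefore the writes-done signal, is only reached after the
      # read loop below pushes to the queue.
      requests = Enumerator.new do |yielder|
        yielder << Etcdserverpb::LeaseKeepAliveRequest.new(ID: lease_id)
        response_seen.pop
      end

      stub.lease_keep_alive(requests).each do |reply|
        response_seen << true # let the write side finish; we have a reply
        return reply
      end
    end

This works because grpc-ruby consumes the request enumerable on a background write thread while the response enumerator runs on the calling thread, so blocking inside the enumerator does not stall the read loop.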

jcalvert · May 21 '18 21:05

Codecov Report

Merging #117 into master will increase coverage by 0.02%. The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #117      +/-   ##
==========================================
+ Coverage   98.35%   98.38%   +0.02%     
==========================================
  Files          24       24              
  Lines        1521     1545      +24     
==========================================
+ Hits         1496     1520      +24     
  Misses         25       25
Impacted Files              Coverage Δ
lib/etcdv3/lease.rb         100% <100%> (ø) :arrow_up:
spec/etcdv3/lease_spec.rb   100% <100%> (ø) :arrow_up:

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b4373d...8335bc7.

codecov[bot] · May 21 '18 21:05

What version of etcd are you using, out of curiosity?

davissp14 · May 21 '18 22:05

https://github.com/coreos/etcd/issues/9751

davissp14 · May 22 '18 02:05

We've tested against 3.2 and 3.3.3 as well as master, also with the latest gRPC gem.

jcalvert · May 22 '18 14:05

Good deal, I was able to reproduce the issue and am looking to work with the CoreOS guys to get this resolved upstream. If this issue is blocking you guys, I'm happy to get this merged and work to rip it out once it's fixed.

That said, I don't feel great about adding gRPC Proxy logic/specs into this project. I'd rather remain agnostic with regard to users' topology/proxy choices. That being the case, I'd ask that you remove any specs/code that relate to the gRPC Proxy and just add a comment to your workaround.

Does that make sense?

davissp14 · May 22 '18 15:05

We can just run our fork, so no big rush. We can dump the proxy-specific bits you mentioned for this. It's a particularly odd and unexpected interaction.

jcalvert · May 23 '18 15:05

Right on, I appreciate the help!

davissp14 · May 23 '18 15:05

We'll need to update this shortly because of a bug in the way it is structured.

When code makes a bidi call, the write operation is run inside a thread. That means that if the call gets a DeadlineExceeded, the enumerator doesn't complete and the write thread never exits. So when you have a poorly responding etcd server, you accumulate an unbounded number of blocked threads and bad things happen. We need to ensure the enumerator completes so the gRPC write thread exits.
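
Concretely, a hedged sketch (same assumed names as the earlier sketches) of making the enumerator's completion unconditional, so the write thread exits even when the read loop raises:

    def lease_keep_alive_once(stub, lease_id)
      response_seen = Queue.new

      requests = Enumerator.new do |yielder|
        yielder << Etcdserverpb::LeaseKeepAliveRequest.new(ID: lease_id)
        response_seen.pop
      end

      begin
        stub.lease_keep_alive(requests).each { |reply| return reply }
      ensure
        # Runs on success and on errors such as GRPC::DeadlineExceeded:
        # unblock the enumerator so gRPC's write thread hits StopIteration
        # and exits instead of leaking.
        response_seen << true
      end
    end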

jcalvert · Jun 05 '18 16:06

@jcalvert Hey man, mind updating the parent etcd issue with your findings?

davissp14 · Jun 05 '18 19:06

I think the etcd issue is still the same; this is an additional issue with error handling of bidi calls.

mgates · Jun 05 '18 19:06

Updated again...looks like some of our forked stuff leaked over, so I cleaned that up.

jcalvert · Jun 06 '18 16:06