Skip to content

services: fix HealthCheckingLoadBalancer.shutdown().#5848

Merged
zhangkun83 merged 2 commits intogrpc:masterfrom
zhangkun83:hc_shutdown_bug
Jun 7, 2019
Merged

services: fix HealthCheckingLoadBalancer.shutdown().#5848
zhangkun83 merged 2 commits intogrpc:masterfrom
zhangkun83:hc_shutdown_bug

Conversation

@zhangkun83
Copy link
Copy Markdown
Contributor

HealthCheckingLoadBalancer.shutdown() calls
hcState.onSubchannelState(SHUTDOWN) which removes that hcState from
helper.hcStates. Therefore, if more than one Subchannels are present,
ConcurrentModificationException will be thrown.

Since HealthCheckingLoadBalancer.shutdown() will clear the hcStates
set after the loop, it's unnecessary to do the deletion within the
loop. However, when a Subchannel is shutdown by LoadBalancer, its
HcState still needs to be removed. To do that, change moves the
deletion to Subchannel.shutdown().

HealthCheckingLoadBalancer.shutdown() calls
hcState.onSubchannelState(SHUTDOWN) which removes that hcState from
helper.hcStates.  Therefore, if more than one Subchannels are present,
ConcurrentModificationException will be thrown.

Since HealthCheckingLoadBalancer.shutdown() will clear the hcStates
set after the loop, it's unnecessary to do the deletion within the
loop.  However, when a Subchannel is shutdown by LoadBalancer, its
HcState still needs to be removed.  To do that, change moves the
deletion to Subchannel.shutdown().
@zhangkun83 zhangkun83 requested a review from dapengzhang0 June 7, 2019 00:41
@Override
public void shutdown() {
helperImpl.getSynchronizationContext().throwIfNotInThisSynchronizationContext();
helperImpl.hcStates.remove(hcState);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to call delegate.shutdown()?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed.

@voidzcy
Copy link
Copy Markdown
Contributor

voidzcy commented Jun 7, 2019

I would also make the similar change for OrcaOobUtil.

Update: OrcaOobUtil won't have the same issue as it would not directly be used as OrcaReportingHelper, so orcaStates will not be exposed to the load balancer. OrcaReportingHelper itself does not have any case to loop over orcaStates and call each of OrcaReportingState#onSubchannelState(SHUTDOWN).

@zhangkun83 zhangkun83 merged commit c6f1516 into grpc:master Jun 7, 2019
@zhangkun83 zhangkun83 deleted the hc_shutdown_bug branch June 7, 2019 16:33
ejona86 added a commit to ejona86/grpc-java that referenced this pull request Jun 13, 2019
…)"

This reverts commit c6f1516. It broke
an internal health checking test because the server wouldn't shut down.
We assume the health checking RPC isn't getting closed.
ejona86 added a commit that referenced this pull request Jun 13, 2019
This reverts commit c6f1516. It broke
an internal health checking test because the server wouldn't shut down.
We assume the health checking RPC isn't getting closed.
zhangkun83 added a commit to zhangkun83/grpc-java that referenced this pull request Jun 13, 2019
HealthCheckingLoadBalancer.shutdown() calls
hcState.onSubchannelState(SHUTDOWN) which removes that hcState from
helper.hcStates.  Therefore, if more than one Subchannels are present,
ConcurrentModificationException will be thrown.

Since HealthCheckingLoadBalancer.shutdown() will clear the hcStates
set after the loop, it's unnecessary to do the deletion within the
loop.  However, when a Subchannel is shutdown by LoadBalancer, its
HcState still needs to be removed.  To do that, change moves the
deletion to Subchannel.shutdown().
zhangkun83 added a commit that referenced this pull request Jun 14, 2019
The issue: HealthCheckingLoadBalancer.shutdown() calls
hcState.onSubchannelState(SHUTDOWN) which removes that hcState from
helper.hcStates. Therefore, if more than one Subchannels are present,
ConcurrentModificationException will be thrown.

This is an alternative approach from #5848 that was reverted in #5875. Thanks to #5883, HealthCheckingLoadBalancer.shutdown() no longer has to fake SHUTDOWN notifications, and can completely rely on Subchannels' real SHUTDOWN notifications for triggering the clean-up.
@lock lock bot locked as resolved and limited conversation to collaborators Sep 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants