worker: provide removeFilterChain interface#10528

Merged
mattklein123 merged 72 commits into envoyproxy:master from lambdai:fcd-on-ctx
Apr 16, 2020

Conversation

@lambdai
Contributor

@lambdai lambdai commented Mar 26, 2020

Description:
Follow up on #10491

By adding removeFilterChain and a new parameter to addListenerToWorker,
Worker and ConnectionHandler now provide the functionality required
by the future ListenerManager and ListenerImpl for an intelligent listener update path.

The background is that #10491 enabled sharing filter chains among ListenerConfigs, and this PR
is how ConnectionHandler can manipulate a TCP listener at filter chain granularity.

A typical flow is

  1. The ListenerConfig with the new filter chain set replaces the previous one through addListenerToWorker; new connections now see the new filter chain set.
  2. The filter chains that need to be destroyed are calculated and start draining.
  3. At the drain timeout, the filter chains are force-destroyed through removeFilterChain.
  4. Finally, the old ListenerConfig is destroyed.

In this PR the added flow can only be triggered by

ListenerManagerImpl::addListenerToWorkerForTest
ListenerManagerImpl::drainFilterChainsForTest

Those two helper functions will be deleted in the next PR, when a new shape of ListenerImpl is introduced.

Risk Level: LOW

  1. The newly added removeFilterChain is not used except in test cases.
  2. addListenerToWorker gains a parameter (an optional overridden listener), but production code always passes nullopt, which falls through to the original code path.

Testing:
Docs Changes:
Release Notes:
[Optional Fixes #Issue]
[Optional Deprecated:]

lambdai added 30 commits January 8, 2020 19:59
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@mattklein123
Member

Please let me know when this is ready for review. Also, this is still a huge and complicated PR, so if it can be split further please do. If it can't be split, the description should be updated to be about 10x as long as it is so I understand what this PR does. Thanks!

/wait

@lambdai
Contributor Author

lambdai commented Apr 7, 2020

@mattklein123 It's ready for review. I added a description of what exactly this PR addresses.

It's still long, but it's not that hard to review.
First, most of the one-line fragment changes just reflect the type change of config_ from a reference to a pointer, plus the new overridden_listener parameter.
Then there are the two independent flows: addListener and removeFilterChain.
Finally, DrainingFilterChainsImpl provides the scheduled listener destroy on the master thread and the filter chain destruction on the worker threads.

Member

@mattklein123 mattklein123 left a comment


Looks good at a high level, thanks. Flushing out a first round of comments. Will take another pass once these are actioned on. Thank you!

/wait

return;
}
}
ASSERT(false, absl::StrCat("fail to replace tcp listener ", config.name(), " tagged ",
Member


I would replace with NOT_REACHED if the code should actually not reach here. Should this ever hit?

Contributor Author


I am not sure... I was thinking about a listener update on top of a listener socket bind error, where the worker cannot listen.
I feel the removal and the listener update are not ordered.
WDYT?

}
// since is_deleting_ is on, we need to manually remove the map value and drive the iterator

// Defer delete connection container to avoid race condition in destroying connection
Member


What race condition? Do you mean a stack unwind issue? Clarify?

Contributor Author


Why do I feel déjà vu...
Same reason as in ActiveTcpListener::removeConnection(ActiveTcpConnection& connection)?

  1. The connection is defer-deleted.
  2. The connection holds a reference to the connection container,
    so the connection container needs to be defer-deleted as well.

@lambdai
Contributor Author

lambdai commented Apr 10, 2020

Thanks! Will update later today

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Contributor Author

@lambdai lambdai left a comment


Thanks! Addressed most of the comments.
I will update the deferred task class in the next round.

Comment on lines +82 to +83
// Fallback to iterate over all listeners. The reason is that the target listener might have begun
// another update and the previous tag is lost.
Contributor Author


The drain takes time, so there might be 100 removeFilterChains() calls scheduled, each with its own listener tag (1 to 100).
However, all 100 removals could refer to the same ActiveTcpListener, which by now has listener tag 101.

When the filter chains scheduled under listener tag 1 are being removed in this function, even though the listener tag is now 101, we still need to remove those filter chains from listener 101.

Does it make sense?

void ConnectionHandlerImpl::ActiveTcpListener::removeFilterChains(
const std::list<const Network::FilterChain*>& draining_filter_chains) {
// Need to recover the original deleting state
bool was_deleting = is_deleting_;
Contributor Author


Got it.


while (!connections.empty()) {
connections.front()->connection_->close(Network::ConnectionCloseType::NoFlush);
}
// since is_deleting_ is on, we need to manually remove the map value and drive the iterator
Contributor Author


Ahaha...

void ConnectionHandlerImpl::removeFilterChains(
const Network::DrainingFilterChains& draining_filter_chains, std::function<void()> completion) {
for (auto& listener : listeners_) {
// TODO(lambdai): merge the optimistic path and the pessimistic locking.
Contributor Author


The optimistic lookup hopes to locate the listener by tag. The pessimistic path goes through each listener and executes the removal. The idea is that the listener tag in the argument could be a stale value: by the time the filter chain removal runs, that listener may have a new listener tag.

void ConnectionHandlerImpl::removeFilterChains(
const Network::DrainingFilterChains& draining_filter_chains, std::function<void()> completion) {
for (auto& listener : listeners_) {
// TODO(lambdai): merge the optimistic path and the pessimistic locking.
Contributor Author


The idea is that the filter chain (and its connections) are defer-deleted.
We don't want to invoke the completion while a connection is merely in the defer-delete list but not yet destroyed.
As a hint, the completion will delete the filter chain and the factory context; the connection would then hold a reference to a destroyed factory context.

lambdai added 3 commits April 14, 2020 00:49
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Member

@mattklein123 mattklein123 left a comment


Sorry for the delay. In general this LGTM but I have a few more comments/questions. Thank you!

/wait

Comment on lines +82 to +83
// Fallback to iterate over all listeners. The reason is that the target listener might have begun
// another update and the previous tag is lost.
Member


Kind of, but this is pretty confusing as I need to page back in how all of the tagging stuff works, when a tag changes, etc. Can you add quite a bit more comments here to see if that helps me? Should the tag perhaps not update if we are doing a filter chain only update? I think that would make things a lot simpler?

Contributor Author

@lambdai lambdai left a comment


@mattklein123 Thank you! I will update the PR.
I replied to the comments. Let me know if it helps!

Comment on lines +82 to +83
// Fallback to iterate over all listeners. The reason is that the target listener might have begun
// another update and the previous tag is lost.
Contributor Author


Could we leave the two loops here for now with a TODO, and resolve it in the next PR?
If the listener manager reuses the tag value, the pessimistic loop below won't be hit.
Personally, I feel the listener tag is helpful when I triage listener updates.

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Member

@mattklein123 mattklein123 left a comment


Thanks, nice work!

COUNTER(listener_modified) \
COUNTER(listener_removed) \
COUNTER(listener_stopped) \
GAUGE(total_filter_chains_draining, NeverImport) \
Member


Please make a note to document this when you do the docs, release notes, etc. in the final change. Thank you!

Contributor Author


ack.
Thank you!

@mattklein123 mattklein123 merged commit ce5953b into envoyproxy:master Apr 16, 2020
lambdai added a commit to lambdai/envoy-dai that referenced this pull request May 1, 2020
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
istio-testing pushed a commit to istio/envoy that referenced this pull request May 12, 2020
* worker: provide removeFilterChain interface (envoyproxy#10528)

Signed-off-by: Yuchen Dai <silentdai@gmail.com>

* cherry-pick in place filter chain update

Signed-off-by: Yuchen Dai <silentdai@gmail.com>

* fix conflicts

Signed-off-by: Yuchen Dai <silentdai@gmail.com>