refactor: return 500 when BackendTLSPolicy translation fails#4363
refactor: return 500 when BackendTLSPolicy translation fails#4363arkodg merged 3 commits intoenvoyproxy:mainfrom
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4363 +/- ##
==========================================
- Coverage 66.29% 66.28% -0.01%
==========================================
Files 209 209
Lines 31912 31956 +44
==========================================
+ Hits 21157 21183 +26
- Misses 9507 9523 +16
- Partials 1248 1250 +2 ☔ View full report in Codecov by Sentry. |
|
/retest |
zhaohuabing
left a comment
There was a problem hiding this comment.
@alexwo Thanks for picking up this! This PR looks great overall! I just have a few nits.
There was a problem hiding this comment.
updated, a route will not get translated, if it is a TCP/UDP route & contains invalid configurations
There was a problem hiding this comment.
@alexwo - let's maybe update the PR description and title so that the release notes will reflect the two behaviors:
- 500 for HTTP
- connection reset for TCP/UDP
In terms of UX, users experience a reset in this case right? Just want to make sure that the connections on the client side don't hang and wait for some timeout, so that it's clear to users that the request is explicitly rejected.
There was a problem hiding this comment.
If the route associated with the TLS connection is removed, I would expect ongoing connections that are relying on this route will be dropped. This will most likely result in a TCP reset (RST) or a termination of the TLS session, effectively severing the connection. Maybe we can try to confirm it.
internal/gatewayapi/testdata/envoyproxy-tls-settings-invalid.out.yaml
Outdated
Show resolved
Hide resolved
guydc
left a comment
There was a problem hiding this comment.
LGTM. Few questions on behavior and scope.
internal/gatewayapi/validate.go
Outdated
There was a problem hiding this comment.
Are the changes made here extending the scope of the PR to reject additional cases like invalid backend config, or is this already handled with a 500 response?
There was a problem hiding this comment.
Yes, these changes capture cases where the backend configuration is invalid.
There was a problem hiding this comment.
should the destination exist in this case?
|
/retes |
|
/retest |
1 similar comment
|
/retest |
|
/retest |
|
/retest |
There was a problem hiding this comment.
curious what happened here, and why this got deleted ?
There was a problem hiding this comment.
In this scenario, the route generally shouldn’t be operational. I’ll look into why the settings are being removed to better understand the problem. For now, the route is left as-is because fully invalidating it causes the conformance test to fail. Perhaps we can ignore the TLS route translation errors in the affected flow and create a separate task to address it.
There was a problem hiding this comment.
this 2 similar conditions here
Backend service validation failed: Service default/service-unknown
&&
Service default/service-unknown not found
There was a problem hiding this comment.
Would it be better to provide a static message explaining why the status is not accepted, such as “invalid configuration detected”? This approach would avoid repeating the message in the accepted status reason. The types, such as “Accepted” and “ResolvedRefs,” are already distinct.
There was a problem hiding this comment.
I think its fine for this PR
There was a problem hiding this comment.
similar comment as https://github.com/envoyproxy/gateway/pull/4363/files#r1792663080
internal/gatewayapi/route.go
Outdated
There was a problem hiding this comment.
curious why we need to change the status generation logic
There was a problem hiding this comment.
To clarify why the traffic cannot be routed, previously, the route was accepted, and the traffic was considered routable before this change.
There was a problem hiding this comment.
trying to better understand why do we need an additional check here, isnt if err != nil enough ?
There was a problem hiding this comment.
At this stage, it is already possible that a route is not applicable due to other reasons. This check attempts to retain the original reason, as it may be more specific. However, if that is the case, no traffic would be routed regardless to the settings validations errors.
There was a problem hiding this comment.
can you share an example of what the status looks like with and w/o this change ?
There was a problem hiding this comment.
Actually, this does not seem to happen, I did not notice any different translations when this extra condition is not in place. I have removed it.
internal/gatewayapi/route.go
Outdated
There was a problem hiding this comment.
internal/gatewayapi/route.go
Outdated
There was a problem hiding this comment.
this isnt added for HTTPRoute and GRPCRoute but is added for UDP and TCP ?
There was a problem hiding this comment.
For HTTP and TCP routes, we currently accept the route but return a 500 response. However, for UDP and TCP routes, the suggestion is to reject and not translate it at all incase of errors, and indicate this via the route status as traffic won't be routable at all.
internal/gatewayapi/route.go
Outdated
|
@alexwo I dont see the conformance tests passing. I'm worried about the size of this change that not only applies to BTP but every error while translating route rules and backendRefs. For example if one out of the many backendRefs is invalid, only a portion of the requests should receive a 503 |
@arkodg indeed, the impact of the change is large and it will render out many of the partial working, not fully configured routes, I can fully relate to it. We should be covered as long as many of the necessary tests should be in place,what can detect potential issues. The conformance tests are passing, but there was a linting issue previously that prevented them from running. Now, if a part of the backend reference configuration is invalid — for example, if one of the associated configurations cannot be processed — the key question is: should routing to this backend reference still occur in such cases? |
|
/retest |
According to the spec, the answer is a portion of the requests (based on weights) should continue to be sent to the valid backendRef |
We’re processing all backend routes and addressing only falty configured ones. This should ensures alignment with the spec, as valid backend routes will continue to function, while only those with invalid configurations will be skipped / not translated. |
|
/retest |
|
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. Please feel free to give a status update now, ping for review, when it's ready. Thank you for your contributions! |
|
/retest |
|
/retest |
2 similar comments
|
/retest |
|
/retest |
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #4087