Conversation
da17eff to
d8203e4
Compare
hiltontj
left a comment
There was a problem hiding this comment.
I have some comments in line. I think it looks good, but would be good to let Jamie take a look as well.
influxdb3_server/src/lib.rs
Outdated
| common_state.trace_header_parser.clone(), | ||
| Arc::new(http_metrics), | ||
| common_state.trace_collector().clone(), | ||
| TRACE_HTTP_SERVER_NAME, |
There was a problem hiding this comment.
I think this TRACE_HTTP_SERVER_NAME should be parameterized to differentiate traces on the recovery server from traces on the main server.
There was a problem hiding this comment.
I've parameterized it, see the specific changes here
| let http_metrics = | ||
| RequestMetrics::new(Arc::clone(&common_state.metrics), MetricFamily::HttpServer); |
There was a problem hiding this comment.
Could the recovery server skip metrics, or even the tracing all together?
I think it still captures logs from the recovery server, so these may not be needed for something used sparingly.
There was a problem hiding this comment.
I did think about it, but given it's another HTTP server, any metrics we capture about this server could potentially be useful too is my thinking. I've just parameterized it so that you can differentiate (as per your suggestion above). Let me know if I'm missing something, I can take it out if that'd be a better choice.
There was a problem hiding this comment.
We would have to see how the request metrics from the main HTTP server mesh with those from the recovery server, since they are both using the same instance of metric::Registry. As long as they don't conflict with each other than I don't think this is an issue.
There was a problem hiding this comment.
I'll run a quick check and see how it looks.
There was a problem hiding this comment.
@hiltontj - is there something specific to start the process to allow any http request to be sent to trace server? I'm using --traces-exporter-jaeger-agent-port 6831 --traces-exporter-jaeger-service-name influxdb3-core --traces-exporter jaeger. I only see traffic going to jaeger when I run queries but all other requests don't seem to generate any traces. I checked the tcpdump too, there's definitely nothing sent by influxdb3.
There was a problem hiding this comment.
It may be that we only use the jaeger hooks from the query path. I was thinking more of the prometheus metrics.
So, you could test by starting up the server, doing a regenerate with the recovery endpoint, and then hitting the main server's /metric API to see if the metrics for the request to the recovery endpoint show up.
There was a problem hiding this comment.
Hmm looks like /metrics collects the details at the method, path level. So, it assumes a single http server. One thing is the same path is available in both servers, the recovery server allows it to be called without a password and the main server requires password to access the path. When looking at the /metrics below it isn't immediately obvious whether it was the recovery server or the main server that responded. Having said that, I'm not sure if it entirely muddies the line either. @hiltontj - I can take it out if you think that'd be better.
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="0.0025"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="0.005"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="0.01"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="0.025"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="0.05"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="0.1"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="0.25"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="0.5"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="1"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="2.5"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="5"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="10"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="inf"} 0
http_request_duration_seconds_sum{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error"} 0
http_request_duration_seconds_count{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="0.001"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="0.0025"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="0.005"} 1
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="0.01"} 1
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="0.025"} 1
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="0.05"} 1
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="0.1"} 1
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="0.25"} 1
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="0.5"} 1
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="1"} 1
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="2.5"} 1
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="5"} 1
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="10"} 1
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="inf"} 1
http_request_duration_seconds_sum{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok"} 0.003047878
http_request_duration_seconds_count{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok"} 1
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="0.001"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="0.0025"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="0.005"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="0.01"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="0.025"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="0.05"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="0.1"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="0.25"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="0.5"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="1"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="2.5"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="5"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="10"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="inf"} 0
http_request_duration_seconds_sum{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error"} 0
http_request_duration_seconds_count{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="0.001"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="0.0025"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="0.005"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="0.01"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="0.025"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="0.05"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="0.1"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="0.25"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="0.5"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="1"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="2.5"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="5"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="10"} 0
http_request_duration_seconds_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="inf"} 0
http_request_duration_seconds_sum{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response"} 0
http_request_duration_seconds_count{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response"} 0
http_requests_total{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="aborted"} 0
http_requests_total{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error"} 0
http_requests_total{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok"} 1
http_requests_total{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error"} 0
http_requests_total{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="100"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="1000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="10000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="100000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="1000000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="10000000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error",le="inf"} 0
http_response_body_size_bytes_sum{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error"} 0
http_response_body_size_bytes_count{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="client_error"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="100"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="1000"} 1
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="10000"} 1
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="100000"} 1
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="1000000"} 1
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="10000000"} 1
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok",le="inf"} 1
http_response_body_size_bytes_sum{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok"} 319
http_response_body_size_bytes_count{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="ok"} 1
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="100"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="1000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="10000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="100000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="1000000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="10000000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error",le="inf"} 0
http_response_body_size_bytes_sum{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error"} 0
http_response_body_size_bytes_count{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="server_error"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="100"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="1000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="10000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="100000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="1000000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="10000000"} 0
http_response_body_size_bytes_bucket{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response",le="inf"} 0
http_response_body_size_bytes_sum{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response"} 0
http_response_body_size_bytes_count{method="POST",method_path="POST /api/v3/configure/token/admin/regenerate",path="/api/v3/configure/token/admin/regenerate",status="unexpected_response"} 0
influxdb3_catalog_operations_total{type="regenerate_admin_token"} 1
There was a problem hiding this comment.
I think this is okay. As long as it doesn't break the /metrics API, I think its fine.
6fb81a9 to
393c981
Compare
There was a problem hiding this comment.
First off, thank you for the PR and making the token regeneration experience better! :)
Security considerations
- User starts the server without passing in any ports to bind, in that case
0.0.0.0:8181is used as default main server and127.0.0.1:8182for the token recovery server.- In case of regeneration of admin token with admin token, the user can still use
0.0.0.0:8181.- If the user has lost the admin token, then the user can jump into the container and then run recovery or regeneration commands as explained in manual tests section below.
- When ran inside docker, we default to
127.0.0.1which means user cannot get to port8182(it's notexposed in Dockerfile).- For users running in non-containerized environments, it's possible to run on any socket address (host + port) but care must be taken to not expose it accidentally which in turn can lead to unauthorized actors to recycle the admin token.
@jdstrand - please poke any holes in this security considerations I've taken into account. Also, I've used
--admin-token-recovery-http-bindas it is similar to--http-bindalthough this is wordy. I'm happy to switch it to--regenerate-endpoint-listenif that's better option as mentioned in the issue comment or anything else that's more succinct.
I have blackbox tested this and for the most part it looks ok, but the default behavior is a problem (which I'll discuss why in a moment). I'm a bit confused on the intended behavior for listening on 127.0.0.1:8182. The bulleted list of considerations strongly suggests that the intention is to allow listening on 127.0.0.1:8182, but other statements suggest the default might only be for containers and that for non-containers that it is maybe optional, but to be careful.
That aside:
-
influxdb3 serveunconditionally listens on127.0.0.1:8182:$ influxdb3 serve --object-store=file --data-dir /tmp/data --node-id node0 ... $ ss -atunp|grep -E 'LISTEN .*influxdb3' tcp LISTEN 0 1024 127.0.0.1:8182 0.0.0.0:* users:(("influxdb3",pid=133872,fd=19)) tcp LISTEN 0 1024 0.0.0.0:8181 0.0.0.0:* users:(("influxdb3",pid=133872,fd=18)) -
While I was able to use
INFLUXDB3_ADMIN_TOKEN_RECOVERY_HTTP_BIND_ADDRand--admin-token-recovery-http-bindto change away from the default, I wasn't able to turn it off altogether. Eg:$ influxdb3 serve --object-store=file --data-dir /tmp/data --node-id node0 --admin-token-recovery-http-bind= error: invalid value '' for '--admin-token-recovery-http-bind <ADMIN_TOKEN_RECOVERY_BIND_ADDRESS>': Cannot parse socket address '': invalid socket address -
serve --helpandserve --help-alldon't seem to mention--admin-token-recovery-http-bind -
calling
create token --admin --regenerate --token "bad"works, which is a weird look (presumably since it defaults to 127.0.0.1:8182 which doesn't need a token) -
noticed that
influxdb3 create token --admin --host ...works butinfluxdb3 create token --admin -H ...doesn't -
bikeshedding:
--admin-recovery-listenis a bit less wordy yet still descriptive
In terms of the default behavior, I suspect this was optimized for the container case where it is presumed that network controls are in place to prevent access into the container. However, we must consider non-container environments, composed container environments (eg, docker compose with influxdb3_ui, grafana, etc alongside influxdb3), processing engine plugins and managed environments.
Listening on 127.0.0.1:8182 by default is a security hole since a local user or process running on the system (including a processing engine plugin) can hit this port to change the token which breaks the operator token for others and grants the caller with full database privileges. This would qualify as a high to critical CVE if we released with this. This is not theoretical or hyperbolic: in the last couple of months alone I've triaged several issues in software we and our users use (python urllib3, kafka client and grafana) that could be leveraged to recycle the token by attackers (not to mention if someone poorly coded a request trigger). SSRFs and other security issues that allow connections to localhost are common and we need to cautiously design for this.
This behavior must be opt in (eg, server restart with non-default option) and the documentation of that behavior should have huge warnings. I don't know Amazon's architecture, but depending on how they handle the operator token, they might even ask that the functionality be entirely compiled out of the binary (we might want to do the same for InfluxData hosted Enterprise 3).
Considering all of the above, I suggest something along the lines of:
- disabling listening on
127.0.0.1:8182unless--admin-recovery-listen(or similar) is specified (ie, no listening on the recovery endpoint by default) - if
--admin-recovery-listenis specified but without an argument, then listen on127.0.0.1:8182 - if
--admin-recovery-listen=...is specified, use current behavior (ie, do what the user told you) - when
--admin-recovery-listenis specified, after the user regenerates the token, stop listening on this endpoint
I believe this has the correct balance of usability, caution and security. If someone has forgotten their operator token, they need to take a deliberate step to restart the server with an option to enable the special recovery endpoint. Then, once the recover endpoint is used to regenerate a token, the server tears down the recovery endpoint (but keeps listening on the normal endpoint).
|
@jdstrand - thanks for the feedback 🙏 if I understood right, start this secondary server only when user opts in and then shutdown as soon as regeneration is done.
EDIT: I've got the Regarding containerised environments I was wondering what should be the default behaviour in terms of exposing ports in Dockerfile. I haven't exposed 8182, should it expose it by default given that influxdb3 will start and stop the server based on config and the first use. I need to also think through how to make it work only once in enterprise with multi node setup if all nodes are started with the flag as I'm guessing it'd be a global only once and not per node. |
Nice!
I'm inclined to think 'no'. Since the user needs to restart the server to add |
29dada9 to
e3a71f3
Compare
hiltontj
left a comment
There was a problem hiding this comment.
Looks good to me.
I have a few more comments in line, none of which are blocking, but the one about the test and the one about RAII Guard are pretty straight forward, so up to you if you want to address them on this or a follow-on PR.
| pub const DEFAULT_HTTP_BIND_ADDR: &str = "0.0.0.0:8181"; | ||
|
|
||
| /// The default bind address for admin token recovery HTTP API. | ||
| pub const DEFAULT_ADMIN_TOKEN_RECOVERY_BIND_ADDR: &str = "127.0.0.1:8182"; |
There was a problem hiding this comment.
Not sure that this matters, but the default here uses 127.0.0.1 vs. 0.0.0.0 for the main bind address
There was a problem hiding this comment.
It is intentional, we want it to default to only loop back address. It makes choosing to listen on any other address as an opt-in as this allows the clients with access to this interface regenerate an admin token without password.
There was a problem hiding this comment.
Nice, that makes sense, thanks for clarifying.
| } | ||
|
|
||
| #[test_log::test(tokio::test)] | ||
| async fn test_recovery_endpoint_auto_shutdown_after_regeneration() { |
There was a problem hiding this comment.
This test does not verify that the recovery endpoint is shutdown after the token is regenerated, but the name of the test implies that it does verify that.
There was a problem hiding this comment.
Hmm, I thought that test had the assertion. I'll add the assertion again.
influxdb3_server/src/http.rs
Outdated
| // Small delay to ensure HTTP response is fully sent | ||
| tokio::time::sleep(Duration::from_millis(100)).await; |
There was a problem hiding this comment.
I think you could do this reliably without the use of a sleep. Something like a RAII guard.
My suggestion would be something like this:
struct Guard {
shutdown_token: ShutdownToken,
}
impl Guard {
fn new(shutdown_token: ShutdownToken) -> Self {
Self { shutdown_token }
}
}
impl Drop for Guard {
fn drop(&mut self) {
self.shutdown_token.cancel();
}
}Then, in the above method, you could do something like
let guard = Guard::new(recovery_api.shutdown_token.clone());
// Extend the response with the guard so cancel is called when the response is dropped:
let mut builder = ResponseBuilder::new();
let mut extensions = builder.extensions_mut().unwrap();
extensions.insert(guard);
// finish building response and returnThere was a problem hiding this comment.
Using builder extension is a really neat idea ❤️ - I'll change it.
| let http_metrics = | ||
| RequestMetrics::new(Arc::clone(&common_state.metrics), MetricFamily::HttpServer); |
There was a problem hiding this comment.
It may be that we only use the jaeger hooks from the query path. I was thinking more of the prometheus metrics.
So, you could test by starting up the server, doing a regenerate with the recovery endpoint, and then hitting the main server's /metric API to see if the metrics for the request to the recovery endpoint show up.
f89d91b to
a4f3da2
Compare
There was a problem hiding this comment.
Thanks for the changes! I love the tests and comments in the code.
I did some manual blackbox testing (not exhaustive as test coverage is very good):
- GOOD:
influxdb3 serve --object-store=file --data-dir /tmp/data --node-id node0confirmed to only listen on the main server endpoint - GOOD: still requires auth when not using
--without-auth - GOOD: does not require auth when using
--without-auth - GOOD:
create token --admingenerates the initial operator token, and errors when try again - GOOD:
show databasesworks with operator token and fails with no token/bad token - OK:
create token --admin --regeneratecorrectly fails, but with a connection refused error when--admin-token-recovery-http-bindis not specified. This is fine for security but is a usability issue: we should send back 401 - BUG:
create token --admin --token "$TOKEN" --regeneratereturns with connection refused when NOT using--admin-token-recovery-http-bind. It seems that if--tokenis specified, the client should use the main server API, not this recovery API - BUG:
create token --admin --token "bad" --regeneratesucceeds when using--admin-token-recovery-http-bind. It seems that if--tokenis specified, the client should use the main server API, not this recovery API - BUG:
create token --admin --token "$TOKEN" -H http://127.0.0.1:8181 --regeneratefails because-His not accepted (but--hostis) - GOOD:
show databases --token "$TOKEN"works with new token after regeneration via main server API and fails with old - GOOD: starting with
--admin-token-recovery-http-bindstarts the main server and the recovery endpoint on127.0.0.1:8182 - GOOD: starting with
--admin-token-recovery-http-bind=127.0.0.1:8183starts the main server and the recovery endpoint on127.0.0.1:8183 - GOOD:
create token --admin --regeneratecorrectly regenerates the token when server started with--admin-token-recovery-http-bind - GOOD: the recovery endpoint shuts down after using
create token --admin --regenerate - GOOD:
show databases --token "$TOKEN"works with new token after regeneration via recovery API and fails with old
I'm approving as is, but it would be nice to fix the 3 non-security bugs (I think 2 can be fixed in the same way, unconditionally use the main server API when --token is specified) and one usability issue. Those can be in this PR or a followup.
| // Only create recovery listener if explicitly enabled | ||
| let admin_token_recovery_listener = if let Some(addr) = config.admin_token_recovery_bind_address | ||
| { | ||
| info!(%addr, "Admin token recovery endpoint enabled - WARNING: This allows unauthenticated admin token regeneration!"); |
@jdstrand - it looks up If this behavior is confusing, I can swap it such that when you pass in |
I do think the behavior is confusing. You could have the client decide based on the presence of |
|
@jdstrand - I'll swap it and ping you for another review. Thanks again for taking time to review it 🙏 |
|
@jdstrand - I've made the changes to address your 3 points,
As discussed above,
Help text now includes Manual Test ResultsTest Environment
1. Help Text Verification ✅1.1 Token Creation Help$ ./target/debug/influxdb3 create token --admin --helpResult: ✅ VERIFIED: Help text correctly shows:
1.2 Server Help$ ./target/debug/influxdb3 serve --help-all | grep -A5 -B5 "admin-token-recovery"Result: ✅ VERIFIED: Recovery endpoint parameter is shown with proper warning 2. Server Without Recovery Endpoint ✅2.1 Start Server$ cargo run -- serve --node-id node-1 --object-store file --data-dir /home/praveen/projects/influx/test-data/core --disable-telemetry-upload --http-bind 127.0.0.1:81812.2 Test Regenerate with Default Settings$ echo "yes" | ./target/debug/influxdb3 create token --admin --regenerateResult: ✅ VERIFIED: Correctly tries main server (8181) and gets 401 Unauthorized, NOT connection refused to 8182 3. Server With Default Recovery Endpoint ✅3.1 Start Server with Recovery Endpoint$ cargo run -- serve --node-id node-2 --object-store file --data-dir /home/praveen/projects/influx/test-data/core2 --disable-telemetry-upload --admin-token-recovery-http-bind --http-bind 127.0.0.1:8281Server logs show: 3.2 Create Initial Admin Token$ ./target/debug/influxdb3 create token --admin --host http://127.0.0.1:8281Result: Token created successfully 3.3 Test Regenerate Without Explicit Host$ echo "yes" | ./target/debug/influxdb3 create token --admin --regenerate --host http://127.0.0.1:8281Result: ✅ VERIFIED: Tries main server, fails with 401 (not recovery endpoint) 3.4 Test Regenerate with Explicit Recovery Endpoint$ echo "yes" | ./target/debug/influxdb3 create token --admin --regenerate --host http://127.0.0.1:8182Result: Successfully regenerated token Server logs show: 3.5 Verify Recovery Endpoint Shutdown$ echo "yes" | ./target/debug/influxdb3 create token --admin --regenerate --host http://127.0.0.1:8182Result: ✅ VERIFIED: Recovery endpoint automatically shut down after successful regeneration 3.6 Test Regenerate on Main Server with Token$ echo "yes" | ./target/debug/influxdb3 create token --admin --regenerate --host http://127.0.0.1:8281 --token <NEW_TOKEN>Result: Successfully regenerated token on main server 4. Server With Custom Recovery Endpoint ✅4.1 Start Server with Custom Address$ cargo run -- serve --node-id node-3 --object-store file --data-dir /home/praveen/projects/influx/test-data/core3 --disable-telemetry-upload --admin-token-recovery-http-bind 0.0.0.0:9999 --http-bind 127.0.0.1:8381Server logs show: 4.2 Create Admin Token$ ./target/debug/influxdb3 create token --admin --host http://127.0.0.1:8381Result: Token created successfully 4.3 Test Default Regenerate (Should Fail)$ echo "yes" | ./target/debug/influxdb3 create token --admin --regenerate --host http://127.0.0.1:8381Result: ✅ VERIFIED: Does NOT try default 8182, correctly tries main server 4.4 Test with Custom Recovery Endpoint$ echo "yes" | ./target/debug/influxdb3 create token --admin --regenerate --host http://127.0.0.1:9999Result: Successfully regenerated token 4.5 Verify Shutdown$ echo "yes" | ./target/debug/influxdb3 create token --admin --regenerate --host http://127.0.0.1:9999Result: Connection refused 5. Recovery Endpoint Isolation ✅5.1 Test Database Creation on Recovery Endpoint$ ./target/debug/influxdb3 create database --host http://127.0.0.1:8182 testdbResult: 5.2 Test Table Creation on Recovery Endpoint$ ./target/debug/influxdb3 create table --host http://127.0.0.1:8182 --database foo barResult: ✅ VERIFIED: Recovery endpoint correctly returns 404 for all non-recovery operations 6. Edge Cases ✅6.1 Server Without Admin TokenStart server with auth disabled: $ cargo run -- serve --node-id node-5 --object-store file --data-dir /home/praveen/projects/influx/test-data/core5 --disable-telemetry-upload --admin-token-recovery-http-bind 127.0.0.1:8183 --http-bind 127.0.0.1:8581 --without-authTry to regenerate: $ echo "yes" | ./target/debug/influxdb3 create token --admin --regenerate --host http://127.0.0.1:8183Result: ✅ VERIFIED: Correctly reports that there's no admin token to regenerate SummaryAll tests pass successfully! The key behavioral changes are verified:
The changes make the behavior more explicit and predictable, requiring users to consciously choose to use the less-secure recovery endpoint rather than having it as an automatic default. |
jdstrand
left a comment
There was a problem hiding this comment.
Thanks for the updates! I didn't do a full round of manual testing (I see that you did), but can confirm that it is behaving as expected:
- not listening on recovery endpoint when
--admin-token-recovery-http-bindis not specified, listening on127.0.0.1:8182when--admin-token-recovery-http-bindis specified and listening elsewhere when--admin-token-recovery-http-bindhas an argument create token --admin --regenerate(without -H) requires a token regardless of if--admin-token-recovery-http-bindis specified (ie, it is hitting the main API)create token --admin --regenerate -H <recovery endpoint>does not require a token and shuts down the endpoint after hitting it- token is properly regenerated in all cases
The usability issue, -H bug and connection refused bug with --regenerate when --admin-token-recovery-http-bind is not specified are all addressed.
We still have the issue with this command: create token --admin --token "bad" --regenerate --host <recovery endpoint> succeeding (because it is silently ignoring the --token argument). Not a blocker.
| long = "regenerate", | ||
| help = "Regenerate the operator token (uses port 8182 by default instead of 8181)" | ||
| help = "Regenerate the operator token. By default connects to the main server (http://127.0.0.1:8181). | ||
| To use the admin token recovery endpoint, specify --host with the recovery endpoint address" |
- new server to only serve admin token regeneration without an admin token has been added - minor refactors to allow reuse of some of the utilities like trace layer for metrics moved to their own functions to allow them to be instantiated for both servers - tests added to check if both the new server works right for regenerating token and also ensure none of the other functionalities are available on the admin token recovery server closes: #26330
- recovery server now only starts when `--admin-token-recovery-http-bind` is passed in - as soon as regeneration is done, the recovery server shuts itself down - the select! macro logic has been changed such that shutting down recovery server does not shutdown the main server
- when `--regenerate` is passed in, `--host` still defaults to the main server. To get to the recovery server, `--host` with the recovery server address should be passed in
2f78473 to
40102ae
Compare
Good point, I can make |
Perhaps it's sufficient to document that |
Summary
closes: #26330
Security considerations
The admin token recovery (using
--regenerate) currently requires a valid admin token to be passed in. If that's lost for some reason there is no way to get back into the server other than switching back to running--without-auth. This PR addresses it by assuming the following workflow,--admin-token-recovery-http-bindor--admin-token-recovery-http-bind $IP:$PORT. In either of those cases the recovery server will be started either on the explicit$IP:$PORTor on default127.0.0.1:8182when explicit ip/port is not specified.0.0.0.0:8181(the main server).--admin-token-recovery-http-bindargument. Note: this operation is only allowed once after the server restart. After successful regeneration the recovery server should shutdown immediately, but the main server will still be online.Manual Tests
Help sections show that
--regeneratedefaults to port127.0.0.1:8182, so that it is not necessary to mention--regenerate --host https://127.0.0.1:8182serve shows new parameter only in
--help-all, it's listed underCommon OptionssectionStarting server without
--admin-token-recovery-http-bindshould have no recovery server running--regenerate, it should failCreate admin token and see if it can be regenerated using default
127.0.0.1:8182.start the server
try regenerating admin token
Create admin token and see if it can be regenerated using non-default
$host:$port.create admin token
try regenerating admin token (note, when I started the server I passed in
--admin-token-recovery-http-bind 192.168.1.94:8181so it expects the host to be passed in), default127.0.0.1:8182it uses fails (todo: make these errors display easy to read messages than the full error)run
--regenerate, this time passing in--hostrun
--regenerateagain with--host http://192.168.1.94:8181(it should fail as the recovery server is shutdown after successful regeneration)Other endpoints are not accessible through this server, create db, table etc. returns a
404