rgw/kafka: set message timeout to 5 seconds #55952
Conversation
src/common/options/rgw.yaml.in
Outdated
level: advanced
desc: This is the maximum time in milliseconds to deliver a message (including retries)
long_desc: Delivery error occurs when either the retry count or the message timeout are exceeded.
  If set to zero, message will time out only based on retries.
but we are not setting retries here, so if someone sets this to zero, what will happen?
since we are allowing the timeout to be set, can we also set max_retry? (message.send.max.retries & retry.backoff.ms)
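To make the interplay between the two limits concrete, here is a toy model (an illustration only, not librdkafka or rgw code; the function name and the fixed-backoff assumption are hypothetical) of the rule quoted above: delivery fails when either the retry count or message.timeout.ms is exceeded, whichever trips first.

```cpp
#include <cstdint>
#include <string>

// Toy model of the delivery rules under discussion: a message errors out
// when either the retry count or message.timeout.ms is exceeded, whichever
// comes first. Assumes a fixed retry backoff for simplicity.
std::string limiting_factor(std::uint64_t timeout_ms,
                            std::uint64_t retries,
                            std::uint64_t backoff_ms) {
  if (timeout_ms == 0) {
    return "retries";  // timeout disabled: only the retry count bounds delivery
  }
  // number of send attempts that fit inside the timeout window
  const std::uint64_t attempts_in_window = 1 + timeout_ms / backoff_ms;
  return attempts_in_window > retries + 1 ? "retries" : "timeout";
}
```

With the defaults discussed in this thread (retries=2, backoff=100ms), a 5s timeout means the retry count is exhausted long before the timeout window closes.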
you are right, the retry default is too high.
will add it as a conf parameter with a lower value.
the retry backoff is 100ms, up to a max of 1s. these values make sense. do you think we should expose these?
100ms seems reasonable; we could expose it, but keep the default value the same as kafka's?
looking at the conf of the librdkafka version that we use: https://github.com/confluentinc/librdkafka/blob/v0.11.6/CONFIGURATION.md
the default retries are 2.
the backoff is 100ms.
there is no max backoff (not sure there is exponential backoff).
anyway, until we upgrade our library version, we don't need to add these
sorry, but why not add it? i get that its max_retry value is 2, but since we are doing the PR for the TTL, we can add the max_retry config as well; it's only a single line change to add this config? (purely for being a good citizen: since we add the ttl, we also add the max_retry and let the default be the same 2 as kafka)
yes, will add. they changed it to MAX_INT in v1.6.1 (the version in rhel/centos9)
the only thing i'm not going to add is the max backoff, since it does not exist in the librdkafka versions we use
could not get the max retry conf value to work.
at least for librdkafka v1.6.1, when the message timeout was set to zero, the message never expired, even if the max retry and backoff numbers were set to low values.
so, to prevent this case, the timeout cannot be set to zero, and will be set to 1ms if the user sets it to zero.
Note that the connection will not be considered idle, even if it is down,
as long as there are attempts to send messages to it.
-default: 30
+default: 300
why are we changing this value?
this should be used for garbage collection of the connections, and not for error handling (as was done before).
30sec is too low for that purpose.
also increase the idle timeout to 30 seconds.
test instructions: https://gist.github.com/yuvalif/33487bff19883e3409caa8a843a0b353
Fixes: https://tracker.ceph.com/issues/64710
Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
// however, testing with librdkafka v1.6.1 did not expire the message in that case. hence, a value of zero is changed to 1ms
constexpr std::uint64_t min_message_timeout = 1;
const auto message_timeout = std::max(min_message_timeout, conn->cct->_conf->rgw_kafka_message_timeout);
if (rd_kafka_conf_set(conn->temp_conf, "message.timeout.ms",
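The clamp in the snippet above can be isolated into a standalone sketch (the helper name is hypothetical; the behavior it guards against, librdkafka v1.6.1 never expiring messages when message.timeout.ms is 0, is what this thread established):

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Standalone version of the clamping logic: librdkafka (at least v1.6.1)
// never expires a message when message.timeout.ms is 0, so a configured 0
// is raised to 1ms before the value is handed to the kafka conf.
std::string effective_message_timeout(std::uint64_t configured_ms) {
  constexpr std::uint64_t min_message_timeout = 1;
  return std::to_string(std::max(min_message_timeout, configured_ms));
}
```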
just thinking more on this: i was wondering whether we want to do this only for sync notifications, because for non-sync notifications, a larger timeout means that if there is a transient kafka downtime which is more than 5 seconds but less than 5 minutes (the kafka default), the persistent notification will just wait and eventually succeed, vs. retrying to send it?
for sync notifications the latency is paid by the client request, so having 5 seconds there makes sense
for persistent notifications a long timeout creates a different issue, as the coroutines of the notification manager will just pile up and consume a significant amount of memory.
i think we did not see that as a huge problem so far, because of the 30 seconds idle timeout.
but if set to 5 minutes, it might create an issue.
Ohh I see, so then you want to keep the behaviour for persistent the same (30 seconds), since we have not seen any issues and it seems to be working as expected?
Keeping 5 seconds will now exhaust the retries faster, and we would be writing entries back to the queue more often than before.
will have to test that to see the impact. but i agree that we don't want to see a RADOS spike when the broker is down.
an option would be to keep 2 connections per broker: one for persistent notifications (with a 30sec message timeout), and one for sync notifications (with a 5sec message timeout)?
Why complicate with 2 connections? Rather have a default of 30 seconds for persistent, and then override that value for sync notifications based on a conf value.
persistency is per topic; a connection is per broker.
we don't create a connection per topic, so we would have to keep 2 connections per broker, each with a different conf
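A minimal sketch of the "2 connections per broker" idea (all names here are hypothetical illustrations, not the actual rgw connection-manager code): key the connection map on (broker, persistent) so each flavor of notification gets its own message timeout.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical per-connection conf; only the field relevant to the thread.
struct ConnConf {
  int message_timeout_ms;
};

// Key the map on (broker endpoint, persistent?) so the same broker can
// carry two connections with different timeouts.
using ConnKey = std::pair<std::string, bool>;

ConnConf& get_connection(std::map<ConnKey, ConnConf>& conns,
                         const std::string& broker, bool persistent) {
  // 30s for persistent notifications, 5s for sync ones, per the discussion
  auto it = conns.try_emplace(ConnKey{broker, persistent},
                              ConnConf{persistent ? 30000 : 5000}).first;
  return it->second;
}
```

Looking up the same broker with both flags yields two distinct entries, each created once with its own timeout.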
ohh i see, so how do we want to handle this?
test first, or just separate out the 2 connections?
i tested it, and could not see any perf hit in cpu or disk.
looked at the rgw_notify.cc code, and it actually makes sense that the kafka timeout will have no impact:
- when a persistent notification gets an error, it does not do anything to the queue (no RADOS operation). the notification is deleted from the queue only when it gets an ack
- the "max retry" mechanism is memory based only. meaning, if we define max retries and get an error, we update the count only in memory and not in the queue, so there is no RADOS operation here either
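The memory-only retry mechanism described above can be sketched as follows (a hypothetical illustration of the described behavior, not the rgw_notify.cc implementation; all names are made up): errors only bump an in-memory counter, no queue write happens, and the entry is cleared on ack.

```cpp
#include <map>
#include <string>

// Sketch of a memory-only retry counter: on error we bump an in-memory
// count (no RADOS/queue operation); the entry is only forgotten on ack.
class RetryTracker {
  std::map<std::string, unsigned> counts_;  // notification id -> error count
  unsigned max_retries_;

 public:
  explicit RetryTracker(unsigned max_retries) : max_retries_(max_retries) {}

  // returns true if the notification should be retried, false if given up
  bool on_error(const std::string& id) {
    return ++counts_[id] <= max_retries_;
  }

  // a successful delivery clears the in-memory state for that notification
  void on_ack(const std::string& id) { counts_.erase(id); }
};
```

Since the counter lives only in memory, a faster kafka timeout makes retries happen sooner but does not by itself add RADOS traffic, which matches the test result reported above.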
thanks for the verification, so the only side effect now is that retries will happen faster compared to earlier, when we waited 30 seconds for a response before retrying.
jenkins test make check
jenkins test windows
jenkins test api
jenkins test docs
jenkins test make check
one failure in teuthology: https://pulpito.ceph.com/yuvalif-2024-03-13_06:01:17-rgw:notifications-wip-yuval-64710-distro-default-smithi/