rgw/notifications: add http request timeout and max inflight #63404
Conversation
```cpp
static std::unique_ptr<RGWHTTPManager> s_http_manager;
static std::shared_mutex s_http_manager_mutex;
static std::atomic<unsigned> s_http_manager_inflight(0);
```
it would probably make more sense for this throttle to be per-endpoint rather than global, wouldn't it?
the problem reported here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/MGFY6Q5S7KC2BPHCKQCBM42DJ2O4WOH7/
was a coroutine stack overflow due to a large number of open http requests.
that problem is not per-endpoint, so, IMO, we should keep the counter global.
ok, so we want a global throttle to limit how much work we'll schedule at once. is there a reason this only applies to http endpoints though, and not others?
such a limit has existed in the kafka and amqp clients from the start :-)
https://github.com/ceph/ceph/blob/main/src/rgw/rgw_kafka.cc#L712
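(for reference, a minimal sketch of the global-throttle pattern under discussion, based on the counter from the diff above; `try_acquire_slot`/`release_slot` are hypothetical helper names, not the PR's actual code:)

```cpp
#include <atomic>

static std::atomic<unsigned> s_http_manager_inflight(0);

// hypothetical helper: claim a slot in the global in-flight budget.
// returns false when the limit is reached, so the caller can fail the
// send with -EBUSY instead of scheduling unbounded work on coroutines.
bool try_acquire_slot(unsigned max_inflight) {
  if (max_inflight == 0) {  // zero means "no limit"
    ++s_http_manager_inflight;
    return true;
  }
  auto cur = s_http_manager_inflight.load();
  do {
    if (cur >= max_inflight) {
      return false;
    }
    // CAS loop so the check and the increment happen as one atomic step
  } while (!s_http_manager_inflight.compare_exchange_weak(cur, cur + 1));
  return true;
}

// hypothetical helper: release the slot once the request completes
void release_slot() {
  --s_http_manager_inflight;
}
```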
```cpp
const auto max_inflight = cct->_conf->rgw_http_notif_max_inflight;
if (max_inflight != 0 &&
    s_http_manager_inflight >= cct->_conf->rgw_http_notif_max_inflight) {
```
`s_http_manager_inflight >= max_inflight` since you made a local variable for it
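(a sketch of the suggested fix, against the hunk above:)

```cpp
const auto max_inflight = cct->_conf->rgw_http_notif_max_inflight;
if (max_inflight != 0 &&
    s_http_manager_inflight >= max_inflight) {  // reuse the local variable
  return -EBUSY;
}
```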
```yaml
  services:
  - rgw
  with_legacy: true
- name: rgw_http_notif_message_timeout
```
can we document these options somewhere? either in a 'config' section in https://docs.ceph.com/en/latest/radosgw/notifications/ or a new 'bucket notification' section similar to https://docs.ceph.com/en/latest/radosgw/config-ref/#topic-persistency-settings
and note that when rendered that way, it only shows the long_desc
```rst
- ``rgw_http_notif_message_timeout``: This is the maximum time in seconds to deliver a notification.
  A delivery error occurs when the message timeout is exceeded.
  This value includes the connection time, and hence must be larger than the connection timeout.
  If set to zero, the http client will wait indefinitely. If not set, the default will be 10 seconds.
- ``rgw_http_notif_connection_timeout``: This is the maximum time in seconds to connect to the endpoint.
  A delivery error occurs when the connection timeout is exceeded.
  If set to zero, the default value of 300 seconds will be used. If not set, the default will be 5 seconds.
- ``rgw_http_notif_max_inflight``: This is the maximum number of messages in-flight (across all http endpoints).
  A delivery error (BUSY) occurs when this number is exceeded.
  If set to zero, there is no limit on the number of in-flight messages. If not set, the default will be 8192.
```
we have a confval directive that makes this much easier. for example, doc/radosgw/config-ref.rst just has:
```rst
.. confval:: rgw_frontends
.. confval:: rgw_data
.. confval:: rgw_enable_apis
```
which renders to https://docs.ceph.com/en/latest/radosgw/config-ref/#ceph-object-gateway-config-reference
this avoids having to duplicate the text between docs and config options
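(for completeness, a hedged example of tuning these options at runtime with the standard `ceph config set` command; the values shown are just the documented defaults:)

```
ceph config set client.rgw rgw_http_notif_message_timeout 10
ceph config set client.rgw rgw_http_notif_connection_timeout 5
ceph config set client.rgw rgw_http_notif_max_inflight 8192
```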
```cpp
    s_http_manager_inflight >= max_inflight) {
  ldout(cct, 1) << "ERROR: send failed. http endpoint manager busy. in-flight requests: " <<
      s_http_manager_inflight << " >= " << max_inflight << dendl;
  return -EBUSY;
```
how do the upper levels handle EBUSY? process_entry() returns Failure back to process_queue(). will that just retry immediately?
if we keep returning EBUSY back to process_queue(), it will resend the cls_2pc_queue_list_entries() op every time and hammer the osd with reads. once we hit this limit, several process_queue() coroutines may end up doing this at once
i have a feeling that we'd be better off suspending the coroutine here until we're able to schedule this request. that's more complicated though
in the case of persistent notifications we retry according to the retry configuration we have there.
we use rgw_topic_persistency_sleep_duration to space out retries of a specific entry, but we are not looking at the result code;
we treat all errors the same way.
it would be a nice enhancement to treat EBUSY differently. maybe use a larger value for rgw_topic_persistency_sleep_duration?
in the case of non-persistent notifications we reply EBUSY to the frontend, which probably converts that to a 503.
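(that enhancement might look roughly like this; `busy_backoff_factor` is a hypothetical knob, not something in the PR:)

```cpp
// hedged sketch: stretch the per-entry retry sleep when the failure was
// the global throttle (-EBUSY) rather than a real endpoint error
auto sleep_duration = conf->rgw_topic_persistency_sleep_duration;
if (ret == -EBUSY) {
  sleep_duration *= busy_backoff_factor;  // hypothetical multiplier
}
```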
looking at the use of rgw_topic_persistency_sleep_duration in https://github.com/ceph/ceph/blob/458d8862/src/rgw/driver/rados/rgw_notify.cc#L231-L246, that causes process_entry() to return Sleeping but doesn't cause process_queue() itself to sleep
the only sleep i see in process_queue() is for is_idle (https://github.com/ceph/ceph/blob/458d8862/src/rgw/driver/rados/rgw_notify.cc#L383-L389) which is false when the queue has entries to send/retry
so when process_entry() returns Sleeping, process_queue() will immediately loop back and resend cls_2pc_queue_list_entries(), then call process_entry() again with the same result. we don't want to keep sending those listing ops while we're throttled or waiting for retry
the sleep indication is per entry, and the queue can have many entries in different states.
i guess one optimization would be: if all entries in the queue are in "sleep" state, we can put the queue processing itself into a sleep state.
note, however, that this is a different issue. the RGW will be busy going through the queue, but will not send the notifications to the HTTP server (which was what caused the original issue).
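(a rough sketch of that optimization inside process_queue(); process_entry(), the Sleeping result, and the yield context exist in the current code, while `io_context` and `retry_sleep_duration` are assumed names:)

```cpp
// after listing a batch of entries in process_queue():
bool all_sleeping = !entries.empty();
for (const auto& entry : entries) {
  const auto result = process_entry(entry, yield);  // existing per-entry logic
  if (result != EntryProcessingResult::Sleeping) {
    all_sleeping = false;
  }
}
if (all_sleeping) {
  // every entry is waiting out its retry sleep: pause the whole queue
  // coroutine instead of immediately re-issuing
  // cls_2pc_queue_list_entries() against the OSD
  boost::asio::steady_timer timer(io_context, retry_sleep_duration);
  boost::system::error_code ec;
  timer.async_wait(yield[ec]);
}
```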
force-pushed from dd9b95a to 4d8f27a
@cbodley can you please re-review?
notification tests are passing: https://pulpito.ceph.com/yuvalif-2025-05-27_03:24:05-rgw:notifications-wip-yuval-71390-distro-default-smithi/
this should allow for proper shutdown of the queue handling code of persistent notifications.

Fixes: https://tracker.ceph.com/issues/71390
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
overwrote the branch by mistake.
replaced by: #64110
also make connection timeout configurable
Fixes: https://tracker.ceph.com/issues/71402