rgw/notifications: handle migration state between v1 and v2#55527
rgw/notifications: handle migration state between v1 and v2#55527yuvalif wants to merge 22 commits intoceph:wip-rgw-meta-topicfrom
Conversation
1cf5de1 to
85b49c2
Compare
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
44c9bf9 to
0bea9ba
Compare
|
@yuvalif regarding the get_url() commit, can you help me understand how that related to your forwarding issues? i thought the master zone was responding with 503s, which rgw_http_error_to_errno() should be mapping to we really don't want to run this retry logic for normal http errors like 503, because cc @jzhu116-bloomberg who added this in #53320 |
| int RadosStore::stat_topics_v1(const std::string& tenant, optional_yield y, const DoutPrefixProvider *dpp) { | ||
| return rgw_stat_system_obj(dpp, svc()->sysobj, svc()->zone->get_zone_params().log_pool, topics_oid(tenant), nullptr, nullptr, y, nullptr); |
There was a problem hiding this comment.
it may actually be better to send a read request here instead of a stat so we can initialize the metadata cache. if we stat here and go on to read it later, that requires two separate round trips to the osd. what do you think?
There was a problem hiding this comment.
it is only during migration that we expect this to return sucess.
if we are in v2 after migration we expect -ENOENT, and hopefuly just get a negative chache entry.
if we are in v2, we should not call this at all.
There was a problem hiding this comment.
it is only during migration that we expect this to return sucess.
yeah, i thought that was reason enough
if we are in v2, we should not call this at all
unrelated to read vs. stat, but v2 will always have to call this to determine whether we're in the migrating state - at least until a later release like U removes the v1 format and migration logic entirely
There was a problem hiding this comment.
sorrry, was a typo. i meant in v1 (where we expect to get actual values) we shhoulod not call this at all
There was a problem hiding this comment.
np. i just wanted to point out the interaction with stat and metadata caching
9b000d1 to
714fbb6
Compare
there could be some other issue, but without that commit i have the following issue. while on the 2ndary: with this commit the 2ndary also return 500. |
714fbb6 to
da48af8
Compare
removed the "get_url()" commit, and added |
src/rgw/rgw_http_errors.h
Outdated
| #pragma once | ||
|
|
||
| #include "rgw_common.h" | ||
| #include <asm-generic/errno-base.h> |
There was a problem hiding this comment.
was this really required? why not <errno.h>?
src/rgw/rgw_http_errors.h
Outdated
| case 409: | ||
| return -ENOTEMPTY; | ||
| case 500: | ||
| return -EAGAIN; |
There was a problem hiding this comment.
why EAGAIN instead of ERR_INTERNAL_ERROR? set_req_state_err() doesn't have a mapping for EAGAIN, so it relies on this:
dout(0) << "WARNING: set_req_state_err err_no=" << err_no
<< " resorting to 500" << dendl;
err.http_ret = 500;
err.err_code = "UnknownError";in contrast, rgw_http_s3_errors has a more specific mapping for { ERR_INTERNAL_ERROR, {500, "InternalError" }},
There was a problem hiding this comment.
changed to ERR_SERVICE_UNAVAILABLE and 503
da48af8 to
310edbd
Compare
…n multisite config Signed-off-by: kchheda3 <kchheda3@bloomberg.net>
…ions in multisite config. Signed-off-by: kchheda3 <kchheda3@bloomberg.net>
…een bucket and topics. Signed-off-by: kchheda3 <kchheda3@bloomberg.net>
…ions in multisite config. Signed-off-by: kchheda3 <kchheda3@bloomberg.net>
both locally and in teuthology Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
v1 could be enabled only in local tests. teuthology tests would run with v2 Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
make the SiteConfig available to all of RGWRados via svc.site instead of storing it in sal::RadosStore Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
RGWPubSub constructor takes SiteConfig instead of zonegroup map replace do_all_zonegroups_support_notification_v2() with a generic function rgw::all_zonegroups_support() that handles non-realm configurations too remove unused sal::ZoneGroup::supports_feature() Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
metadata sync ignores entries other than MDLOG_STATUS_COMPLETE, so we don't need to write separate prepare/complete entries. metadata mutations can just call complete_entry() on success Signed-off-by: Casey Bodley <cbodley@redhat.com>
rename read_topics()/write_topics() to 'v1' and only call them from internal v1 call paths public get_topics() now calls read_topics_v1() for the v1 case, and does the paginated listing with driver->meta_list_keys_next() for v2 RGWPSListTopicsOp now uses the NextToken request/response params with the paginated get_topics(), limiting responses to 100 entries like AWS 'radosgw-admin topic list' also paginates the listing according to --max-entries to avoid reading everything into memory at once Signed-off-by: Casey Bodley <cbodley@redhat.com>
the format of topic metadata keys is agnostic to the backend, so the parsing/formatting functions should be in rgw_pubsub.h Signed-off-by: Casey Bodley <cbodley@redhat.com>
add a new interface for topic metadata that doesn't depend on metadata backends. this low-level interface is used by both RadosStore and the topic metadata handler remove Driver::delete_bucket_topic_mapping() from sal because the omap object is deleted internally by rgwrados::topic::remove() remove the RGWRados::topics_pool_ctx member Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
8e6de2e to
9dfdcee
Compare
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
test instructions: https://gist.github.com/yuvalif/21449e301732b719cd1ed97c3eeeabb2 * during migration all topic and notification operations must fail with HTTP error code 503 * read operations should return the values of the v1 topics and notifications * sending notifications should continue based on v1 values Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
310edbd to
7f92f74
Compare
4620245 to
666e79f
Compare
|
merged into #55657 |
test instructions:
https://gist.github.com/yuvalif/21449e301732b719cd1ed97c3eeeabb2
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e