objecter: use configurable to set local and balanced reads#56180
objecter: use configurable to set local and balanced reads#56180
Conversation
gregsfortytwo
left a comment
There was a problem hiding this comment.
Well, this is short and sweet. I've been leaning towards preferring per-component switches to allow more granular control of things, but the simplicity and universality here makes me think again.
| bool keep_balanced_budget = false; | ||
| bool honor_pool_full = true; | ||
|
|
||
| std::atomic<int> extra_read_flags{0}; |
There was a problem hiding this comment.
Eww do we really need to make this atomic and stick a memory barrier in the middle of all those constructor calls?
All we really need here is for the memory update to not be torn and I don't think that can happen for these sizes?
There was a problem hiding this comment.
Use the atomic, read/write races are undefined behavior otherwise.
There was a problem hiding this comment.
if we want to avoid the atomic here, we might consider removing the support for runtime changes. i think we care more about performance here than the convenience around config changes
There was a problem hiding this comment.
We don't need sequentially consistent memory ordering (default for a C++ atomic), release-acquire ordering should be sufficient - https://en.cppreference.com/w/cpp/atomic/memory_order#Release-Acquire_ordering .
On X86 with its strong memory model this results in no memory barriers in the generated code, just constraints on the compiler not to re-order code. On other architectures (e.g. ARM and POWER) release-acquire ordering adds synchronization instructions, but these are still have less overhead than the instructions added for sequential consistent memory order.
So using code like this when reading the flags would be more efficient: extra_read_flags.load(std::memory_order_acquire)
Given that the balanced and localized flags are just performance optimizations, and as such it doesn't really matter if a race hazard between threads mean a few I/Os use an out of date copy of flags, then a weaker memory ordering of memory_order_relaxed could be used
There was a problem hiding this comment.
Using a weaker memory ordering seems reasonable to me -- my main concern was simply that we use defined C++ memory ordering semantics.
| bool keep_balanced_budget = false; | ||
| bool honor_pool_full = true; | ||
|
|
||
| std::atomic<int> extra_read_flags{0}; |
There was a problem hiding this comment.
There is already a global_ops_flags which is also atomic and appears only to be use by the filer Client::set_filer_flags and Client::get_filer_flags to set the same LOCALIZE_READS flag.
Why do we need two separate sets of additional flags, why can't this code just set the global_ops_flags?
There was a problem hiding this comment.
global_ops_flags is used for all ops, extra_read_flags is used only for read ops
| bool keep_balanced_budget = false; | ||
| bool honor_pool_full = true; | ||
|
|
||
| std::atomic<int> extra_read_flags{0}; |
e209eae to
8dcf5ce
Compare
cbodley
left a comment
There was a problem hiding this comment.
i like this approach very much. is there a natural place in the rados suite to the add test coverage necessary to validate it?
5e5ed24 to
369544b
Compare
|
i see that the cc @rzarzynski |
| osd: | ||
| crush location: 'root=default host=host-$id' | ||
| client: | ||
| rados replica read policy: balance |
There was a problem hiding this comment.
@idryomov does the crush location required for balance read similar to local?
Do this feature supported in ec pool ? The name suggests it is replica feature
There was a problem hiding this comment.
@idryomov does the crush location required for balance read similar to local?
No, specifying just rados replica read policy: balance should be enough.
Do this feature supported in ec pool ? The name suggests it is replica feature
It's not supported on EC pools.
| bool keep_balanced_budget = false; | ||
| bool honor_pool_full = true; | ||
|
|
||
| std::atomic<int> extra_read_flags{0}; |
| bool keep_balanced_budget = false; | ||
| bool honor_pool_full = true; | ||
|
|
||
| std::atomic<int> extra_read_flags{0}; |
There was a problem hiding this comment.
nit: is there a particular reason for using signed type for the extra_read_flags?
mutate() takes flags as int, I see.
| bufferlist *pbl, | ||
| int flags) | ||
| int flags, | ||
| int flags_mask) |
There was a problem hiding this comment.
nit: mixed indentation.
mkogan1
left a comment
There was a problem hiding this comment.
Graphs are showing IO operations on each of the replica 3 OSDs while performing S3 GET operations of 1M 4KB objects.
On the left ./bin/radosgw ... --rados_replica_read_policy=default and on the right ./bin/radosgw ... --rados_replica_read_policy=balance where please note the IO operations are distributed almost evenly between the drives.

@rzarzynski do you have any feedback on this part? i see that you've approved - do you think this is ready for qa and merge? |
369544b to
2bda7d4
Compare
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
Hi folks, what needs to happen for this to merge? I thought it was in already and I think we've pulled it in to our product so I'd like to see it get in. :o |
ping. combinatorics aside, my main concern is that the addition of |
I agree. Even if this was a saving measure on the size of suite, it would be better to test for Let's add the missing yaml, rebase and merge. Thanks for finding this, @cbodley! |
Signed-off-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
Signed-off-by: Yehuda Sadeh <ysadehwe@ibm.com>
for localized and balanced reads. Signed-off-by: Yehuda Sadeh <ysadehwe@ibm.com>
Signed-off-by: Yehuda Sadeh <ysadehwe@ibm.com>
Signed-off-by: Yehuda Sadeh <ysadehwe@ibm.com>
Many of the exec calls were using a read version of the API while they were calling a write objclass method. This modifies the calls to use the write API. Fixes: https://tracker.ceph.com/issues/65889 Signed-off-by: Yehuda Sadeh <ysadehwe@ibm.com>
These calls are read calls but can and have been used as write calls, which doesn't work with local (and balanced) affinity. Signed-off-by: Yehuda Sadeh <ysadehwe@ibm.com>
Signed-off-by: Yehuda Sadeh <ysadehwe@ibm.com>
Signed-off-by: Yehuda Sadeh <ysadehwe@ibm.com>
fe19a13 to
24087ec
Compare
|
i pushed a rebase: src/cls/rgw/cls_rgw_client.cc conflicts came from #58448 which had already converted cls_rgw_lc_put_head(), cls_rgw_lc_rm_entry(), cls_rgw_lc_set_entry() to Object*Operation for use with async_operate() src/common/options/rgw.yaml.in had a temporary conflicts between the addition/removal of rgw_replica_read_policy and the new rgw_bucket_logging_obj_roll_time |
with read-affinity/ at the top level of the rados suite, teuthology would try to run it as a subsuite instead of mixing its yamls into an existing subsuite. move into rados/verify instead Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
thanks @rzarzynski. i added the empty default.yaml and an additional commit: let me know if you'd prefer it somewhere other than rados/verify, or if you want that symlinked into multiple rados subsuites |
idryomov
left a comment
There was a problem hiding this comment.
Ack from the RBD perspective (at least one more instance of exec() misuse has sneaked in into RBD in the interim, but we can take care of it separately)
|
This sets the relevant flags when a read op is created. The configurable allows for
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e