
rgw: data sync fairness #51493

Merged
smanjara merged 4 commits into ceph:main from smanjara:wip-data-sync-fairness on Jul 14, 2023

Conversation

@smanjara
Contributor

@smanjara smanjara commented May 15, 2023

https://tracker.ceph.com/issues/61171

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@smanjara smanjara requested a review from a team as a code owner May 15, 2023 17:11
@github-actions github-actions bot added the rgw label May 15, 2023
@smanjara
Contributor Author

Not ready yet; it still has a few missing pieces.

@smanjara smanjara force-pushed the wip-data-sync-fairness branch 3 times, most recently from 8f7781a to 4d0a400 Compare May 15, 2023 17:56
@smanjara
Contributor Author

Crashes on data sync startup:

#2  RGWCoroutinesStack::operate (this=0x557da9867b80, dpp=0x557da4140038, _env=_env@entry=0x7f07ffb07590)
    at /home/smanjara/ceph/src/rgw/rgw_coroutine.cc:250

#3  0x0000557da1e4412a in RGWCoroutinesManager::run (this=this@entry=0x557da90b3130, dpp=<optimized out>, 
    stacks=std::__cxx11::list = {...}) at /home/smanjara/ceph/src/rgw/rgw_coroutine.cc:653

#4  0x0000557da1e44f80 in RGWCoroutinesManager::run (this=this@entry=0x557da90b3130, dpp=<optimized out>, 
    op=0x557da939f800) at /home/smanjara/ceph/src/rgw/rgw_coroutine.cc:792

#5  0x0000557da21fbb0f in RGWRemoteDataLog::run_sync (this=this@entry=0x557da90b3130, dpp=<optimized out>, 
    dpp@entry=0x557da4140038, num_shards=128) at /home/smanjara/ceph/src/rgw/driver/rados/rgw_data_sync.cc:3193

@smanjara smanjara force-pushed the wip-data-sync-fairness branch from 4d0a400 to 3b293c6 Compare May 25, 2023 05:13
@smanjara
Contributor Author

Resolved this. It turns out the notify_stack cr was also being drained just before calling RGWDataSyncShardControlCR() in https://github.com/ceph/ceph/blob/cec15a5/src/rgw/driver/rados/rgw_data_sync.cc#L2417

@smanjara smanjara force-pushed the wip-data-sync-fairness branch from 3b293c6 to d0015a7 Compare May 25, 2023 16:57
@smanjara
Contributor Author

Fixed another crash due to incorrect ret handling of the cr. Did some very basic put-bucket operations with three RGWs per zone.

Needs a lot more testing. Showing the sync lock distribution:

./.././bin/rados ls -p zg1-2.rgw.log 2> /dev/null | grep datalog 2> /dev/null | foreach -Parallel {../.././bin/rados lock info $_ -p zg1-2.rgw.log sync_lock 2> /dev/null} | convertFrom-json | foreach {$_.lockers.name} | group | select Count, Name

Count Name
----- ----
   48 client.4223
   40 client.4248
   40 client.4278
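
The PowerShell pipeline above just tallies sync_lock holders per client across the datalog shard objects. The same aggregation can be sketched in Python; `count_lockers` and the sample inputs are illustrative helpers, not part of the PR:

```python
import json
from collections import Counter

def count_lockers(lock_info_docs):
    """Tally sync_lock holders across datalog shard objects.

    Each element of lock_info_docs is the JSON text printed by
    `rados lock info <obj> -p <pool> sync_lock` for one shard object.
    """
    counts = Counter()
    for doc in lock_info_docs:
        # Each lock-info document carries a "lockers" list; count the
        # client name of every current holder.
        for locker in json.loads(doc).get("lockers", []):
            counts[locker["name"]] += 1
    return dict(counts)
```

An even distribution of counts across the gateways (as in the table above) is what the fairness change is aiming for.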

@smanjara
Contributor Author

Ran some more tests with a few thousand buckets in both full sync and incremental modes and I don't see any issues so far.
@cbodley please review.

@smanjara smanjara requested a review from cbodley May 30, 2023 19:41
Contributor

@cbodley cbodley left a comment


looks great, minor comments

@mattbenjamin
Contributor

really nice work, @smanjara !

@mkogan1
Contributor

mkogan1 commented Jun 4, 2023

@smanjara very cool. When you think it's ready, stability-wise, for a 400M-object sync test run, please let me know.

@smanjara smanjara force-pushed the wip-data-sync-fairness branch from d0015a7 to 59dbfb5 Compare June 5, 2023 18:06
@smanjara
Contributor Author

smanjara commented Jun 5, 2023

@smanjara very cool. When you think it's ready, stability-wise, for a 400M-object sync test run, please let me know.

@mkogan1 thanks, Mark. There is still an issue while running our multisite test suite that I am trying to debug, but it should not stop us from running workload tests.

Shilpa Jagannath added 3 commits June 14, 2023 19:59
…fication CR

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
…lock and lost_bid

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
@smanjara smanjara force-pushed the wip-data-sync-fairness branch 2 times, most recently from d6887cf to cd4b9fb Compare June 15, 2023 00:04
@mkogan1
Contributor

mkogan1 commented Jun 29, 2023

@smanjara
Testing functionally (top: main before this PR's commits; bottom: after). Under a sunny-day scenario it's fine:

primary:
[image]

secondary:
[image]

But during a 3x3 multisite PUT workload (60M objects bi-directional, in 2 stages), a segfault occurred on the secondary (unrelated to the data sync, AFAICT):

   -10> 2023-06-29T09:52:26.392+0000 7fffc8061700 20 RGW-SYNC:data:sync:shard[62]: shard_id=62 log_entry: 00000000000000000000:00000000000000442611:2023-06-29T09:08:26.96>
    -9> 2023-06-29T09:52:26.392+0000 7fffc8061700  1 RGW-SYNC:data:sync:shard[62]: incremental sync on test-100m-1000000000000:f5eea735-2164-49b3-824f-cb4bb8f88270.7050.1>
    -8> 2023-06-29T09:52:26.392+0000 7fffc8061700 10 RGW-SYNC:data:sync:shard[111]:entry[test-100m-1000000000000:f5eea735-2164-49b3-824f-cb4bb8f88270.7050.1:398[2]]:bucke>
    -7> 2023-06-29T09:52:26.393+0000 7fffc8061700  5 RGW-SYNC:data:sync:shard[47]:entry[test-100m-1000000000000:f5eea735-2164-49b3-824f-cb4bb8f88270.7050.1:590[2]]:bucket>
    -6> 2023-06-29T09:52:26.393+0000 7fffc8061700 20 RGW-SYNC:data:sync:shard[47]:entry[test-100m-1000000000000:f5eea735-2164-49b3-824f-cb4bb8f88270.7050.1:590[2]]:bucket>
    -5> 2023-06-29T09:52:26.393+0000 7fffc8061700 20 RGW-SYNC:data:sync:shard[47]:entry[test-100m-1000000000000:f5eea735-2164-49b3-824f-cb4bb8f88270.7050.1:590[2]]:bucket>
    -4> 2023-06-29T09:52:26.393+0000 7fffc8061700 20 RGW-SYNC:data:sync:shard[47]:entry[test-100m-1000000000000:f5eea735-2164-49b3-824f-cb4bb8f88270.7050.1:590[2]]:bucket>
    -3> 2023-06-29T09:52:26.393+0000 7fffc8061700 20 RGW-SYNC:data:sync:shard[47]:entry[test-100m-1000000000000:f5eea735-2164-49b3-824f-cb4bb8f88270.7050.1:590[2]]:bucket>
    -2> 2023-06-29T09:52:26.397+0000 7fffec0a9700  1 do_command 'perf dump' '{prefix=perf dump}'                                                                          
    -1> 2023-06-29T09:52:26.397+0000 7fffec0a9700  1 do_command 'perf dump' '{prefix=perf dump}' result is 0 bytes                                                        
     0> 2023-06-29T09:52:26.525+0000 7fffc8061700 -1 *** Caught signal (Segmentation fault) **                                                                            
 in thread 7fffc8061700 thread_name:data-sync

 ceph version 17.0.0-19701-ge96c29d1851 (e96c29d18512aefac05ea4b7af834302b5acbb5e) reef (dev)                                                                             
 1: /lib64/libpthread.so.0(+0x12cf0) [0x7ffff4479cf0]
 2: (RGWSyncShardMarkerTrack<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std>
 3: (RGWDataSyncSingleEntryCR::operate(DoutPrefixProvider const*)+0x232d) [0x1357ead]                                                                                     
 4: (RGWCoroutinesStack::operate(DoutPrefixProvider const*, RGWCoroutinesEnv*)+0x1f4) [0xbe04f4]                                                                          
 5: (RGWCoroutinesManager::run(DoutPrefixProvider const*, std::__cxx11::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0xcc3) [0xbe4983]               
 6: (RGWCoroutinesManager::run(DoutPrefixProvider const*, RGWCoroutine*)+0x118) [0xbe6e18]                                                                                
 7: (RGWRemoteDataLog::run_sync(DoutPrefixProvider const*, int)+0x302) [0x130f682]
 8: (RGWDataSyncProcessorThread::process(DoutPrefixProvider const*)+0x4e) [0xfe0a8e]
 9: (RGWRadosThread::Worker::entry()+0x12b) [0xf4772b]
 10: (Thread::entry_wrapper()+0xed) [0x7ffff69f90ed]
 11: /lib64/libpthread.so.0(+0x81ca) [0x7ffff446f1ca]
 12: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.  

The 47 locks of the RGW that went down (client.4635) have still not been redistributed roughly 6 hours after the segfault:

 ❯ pwsh -c 'sudo ./bin/rados ls -p us-west.rgw.log | grep datalog | foreach -Parallel { sudo ./bin/rados lock info $_ -p us-west.rgw.log sync_lock } | convertFrom-json | foreach {$_.lockers.name} | group | select Count, Name'

Count Name
----- ----
   38 client.4628
   43 client.4644
❯ ls -lh ./out/radosgw*log
-rw-r--r--. 1 root root   0 Jun 28 08:08 ./out/radosgw.8000.log
-rw-r--r--. 1 root root 15G Jun 29 15:08 ./out/radosgw.8004.log
-rw-r--r--. 1 root root 79G Jun 29 16:05 ./out/radosgw.8005.log
-rw-r--r--. 1 root root 25G Jun 29 09:52 ./out/radosgw.8006.log
                                   ^^
-rw-r--r--. 1 root root 25G Jun 29 16:05 ./out/radosgw.8007.log
                                   ^^
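
The observation above, that a dead gateway's locks linger, can be checked mechanically by comparing lock holders against the set of clients known to be alive. The helper below is an illustrative sketch (`find_stale_holders` and the client names are not from the PR):

```python
import json
from collections import Counter

def find_stale_holders(lock_info_docs, live_clients):
    """Count sync_lock shards held by clients not in live_clients.

    lock_info_docs: JSON texts from `rados lock info ... sync_lock`.
    live_clients:   names of radosgw instances known to be running.
    """
    stale = Counter()
    for doc in lock_info_docs:
        for locker in json.loads(doc).get("lockers", []):
            # Any holder that is not a live gateway is a lock that
            # should eventually expire and be re-acquired.
            if locker["name"] not in live_clients:
                stale[locker["name"]] += 1
    return dict(stale)
```

In the scenario above this would report the 47 shards still attributed to the crashed client.4635 while only client.4628 and client.4644 are running.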

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
@smanjara smanjara force-pushed the wip-data-sync-fairness branch from cd4b9fb to 96b6650 Compare June 30, 2023 00:41
@smanjara
Contributor Author

@mkogan1 thanks, Mark. Could you please pull the changes and give it another try?

@mkogan1 mkogan1 self-requested a review July 5, 2023 11:01
@mkogan1
Contributor

mkogan1 commented Jul 5, 2023

@mkogan1 thanks, Mark. Could you please pull the changes and give it another try?

@smanjara After the pull, the segfault did not occur on a 60M-object sync. LGTM.

@smanjara
Contributor Author

smanjara commented Jul 5, 2023

@smanjara After the pull, the segfault did not occur on a 60M-object sync. LGTM.

@mkogan1 thank you!

@smanjara
Contributor Author

jenkins test make check

@mattbenjamin
Contributor

yay

@smanjara
Contributor Author

Thanks for reviewing, @cbodley @mkogan1!

@smanjara smanjara merged commit b51bafd into ceph:main Jul 14, 2023
smanjara added a commit to smanjara/ceph that referenced this pull request Sep 7, 2023
rgw: data sync fairness

Resolves rhbz#1740782
Reviewed-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit b51bafd)
smanjara pushed a commit to smanjara/ceph that referenced this pull request Oct 11, 2023
mkogan1 pushed a commit to mkogan1/ceph that referenced this pull request Jan 18, 2024
rgw: data sync fairness

Resolves rhbz#1740782
Reviewed-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit b51bafd)