osd/PGLog.cc: Trim duplicates by number of entries (#45529)
src/osd/PGLog.cc (outdated diff):

```diff
 }

-while (!dups.empty()) {
+while (!dups.empty() && dups.size() > cct->_conf->osd_pg_log_dups_tracked) {
```
nit: there seems to be no need for the !dups.empty() check now; dups.size() > cct->_conf->osd_pg_log_dups_tracked would be enough.
This looks good. Will this PR also add dup trimming to ceph-objectstore-tool? (to fix OSDs which cannot be started)
(force-pushed 6416d56 to 9693c98)
@NitzanMordhai I think we should take care of the ceph-objectstore-tool change in this PR.
Working on it
@jdurgin / @neha-ojha please review the tools
(force-pushed e2d2d22 to 9efb207)
trociny left a comment:
Wouldn't the osd_pg_log_dups_tracked config setting make the ceph-objectstore-tool trim log result confusing for users in cases when dups are not trimmed because osd_pg_log_dups_tracked is large?
Probably we need to output some warning? Or maybe reset conf->osd_pg_log_dups_tracked to 0 in ceph-objectstore-tool before calling pg_log.trim?
```diff
 pg_log.head = info.last_update;
 pg_log.skip_can_rollback_to_to_head();

+ceph_assert(info.last_update.version > max_entries);
```
may we have this assert after we removed the if (info.last_update.version - info.log_tail.version <= max_entries) check? max_entries comes from the config param and may be as big as a user wants. I suppose we need to cap it if it is more than info.last_update.version.
@trociny We may have duplicate entries that we need to trim. The check I removed skipped the whole process if we knew in advance that we had nothing to trim, but in the case of duplicate entries we don't know that until we reconstruct the PGLog.
I understand why you removed the check. My point is that after its removal, the ceph_assert(info.last_update.version > max_entries) may fail. And we still need the condition info.last_update.version > max_entries to be valid (i.e. we can't just remove the assert), because we use info.last_update.version - max_entries below. So my idea is that we should set max_entries not just to conf->osd_max_pg_log_entries but to something like min(conf->osd_max_pg_log_entries, info.last_update.version - 1) (I have not checked whether last_update.version may be 0, though).
@trociny I did it differently, please review.
src/tools/ceph_objectstore_tool.cc (outdated diff):

```cpp
pg_log_dup_t dup;
try {
  dup.decode(bp);
  pg_log.dups.push_back(dup);
```
Don't we want to check if (pg_log.dups.size() >= trim_at_once)? Someone might want to do this on a cluster with millions of dups created when running an older version.
@trociny with the new change (the other commit in this PR for the PGLog trim) we will trim the duplicate entries by osd_pg_log_dups_tracked. I don't think trim_at_once (osd_pg_log_trim_max) is for duplicate entries; it is only for pg log entries.
But if we don't limit the number of duplicate entries trimmed in one transaction in any way (as we do for log entries), I expect we may have problems if their number is large?
We do, by osd_pg_log_dups_tracked
osd_pg_log_dups_tracked is the analog of osd_max_pg_log_entries (for log entries), i.e. it says how many entries we want to keep, while osd_pg_log_trim_max says how many entries we want to trim in one operation, so that there is a limit and we avoid a situation where the trim operation is too heavy.
@NitzanMordhai consider that your implementation here will only ever be needed for PGs with millions of dups, when the OSD cannot boot normally due to OOM.
One user showed an OSD with tens of PGs having 1.6 million dups each, and at least one PG with 3.2M dups.
Indeed, the implementation here needs to focus on that scenario: being low on memory and splitting the bluestore operations.
(force-pushed 9efb207 to d751600)
src/tools/ceph_objectstore_tool.cc (outdated diff):

```diff
-uint64_t max_entries = g_ceph_context->_conf->osd_max_pg_log_entries;
+uint64_t max_entries = std::min<uint64_t>(g_ceph_context->_conf->osd_max_pg_log_entrie,
+                                          g_ceph_context->_conf->osd_pg_log_trim_max);
```
I don't understand this. osd_max_pg_log_entries is how many entries we want to keep and osd_pg_log_trim_max is how many entries we want to trim in one iteration of the while (!done) loop below. How can you mix these in one variable?
src/tools/ceph_objectstore_tool.cc (outdated diff):

```cpp
ObjectStore::Transaction t;
if (!trimmed.empty()) {
  cout << "Removing keys " << *trimmed.begin() << " - " << *trimmed.rbegin() << std::endl;
```
nit: if we are changing this, I would prefer messages like this to go to cerr, not cout, because they are more like log messages. The same goes for the other messages below.
src/tools/ceph_objectstore_tool.cc (outdated diff):

```cpp
  break;
// We have enough pg logs entries to trim
if ((pg_log.log.size() + pg_log.dups.size()) >= trim_at_once) {
  done = true;
```
I don't think we want to set done here. I think we want pg_log.trim to be called multiple times, being reset after processing a chunk, until all the keys are processed.
Alternatively, we could avoid breaking here and instead collect all potential entries to trim into pg_log, then call pg_log.trim only once, obtaining the full lists of the trimmed keys, and then call t.omap_rmkeys multiple times in trim_at_once chunks. The problem with that scenario, though, is that pg_log.trim may build very large trimmed and trimmed_dups lists, which we probably cannot allow.
Just FYI, on one cluster we have a pg with 31 million dups. I think we will want to try ceph-objectstore-tool patched with this PR (when it is ready) to trim it.
Another question to raise: what will happen when a user that has a problem like ours, i.e. 30 million dups in a pg, but is not aware of it, upgrades to the version with the fixed pg_log trim, and it starts trimming? Am I right in understanding that with the current implementation the trim will build the full set of 30 million dups and will try to remove it in one transaction? And it looks like, if one is unlucky enough to have the debug osd level set to 6 or higher (even if it is just for the memory buffer), it will try to print the …
(force-pushed d751600 to bf283f0)
OK, I misunderstood osd_pg_log_trim_max; now I get it.
trociny left a comment:
@NitzanMordhai I have some questions about your ceph-objectstore-tool trim implementation (see the comments), but my main concern is still how the osd pg_log trim will behave after the upgrade when it needs to trim millions of dups. And if it is a real problem (as I currently think it is), then it should be resolved first, and then the ceph-objectstore-tool trim implementation might be updated accordingly.
src/tools/ceph_objectstore_tool.cc (outdated diff):

```diff
-uint64_t max_entries = g_ceph_context->_conf->osd_max_pg_log_entries;
+uint64_t max_entries = std::min<uint64_t>(g_ceph_context->_conf->osd_max_pg_log_entries,
+                                          info.last_update.version - 1);
```
I hope it is safe to assume that info.last_update.version will always be > 0 here...
src/tools/ceph_objectstore_tool.cc (outdated diff):

```cpp
pg_log.trim(g_ceph_context, trim_to, &trimmed, &trimmed_dups, &write_from_dups);
// Is there any trimming to do?
if (!trimmed.empty() || !trimmed_dups.empty()) {
  new_tail = pg_log.log.back().version;
```
Isn't it possible that pg_log.log may be empty here?
Yes, we can use pg_log.tail for that, since trim will set it.
src/tools/ceph_objectstore_tool.cc (outdated diff):

```cpp
e.decode_with_checksum(bp);
if (e.version > trim_to) {
  continue;
}
```
That looks weird. It seems the idea was to use pg_log.trim in order not to do the trimming manually, but we still need this.
I thought we could stop collecting entries, but yes, we can let the trim function do it for us.
src/tools/ceph_objectstore_tool.cc (outdated diff):

```cpp
if (!dry_run && new_tail != eversion_t()) {
  info.log_tail = new_tail;
  if (latest_dups.size() > 0) {
    info.log_tail = latest_dups.back().version;
```
This looks suspicious to me. Isn't it possible to have a situation when latest_dups.size() > 0 but info.log_tail should be set to new_tail?
We need to always set it to new_tail; it has nothing to do with dups. Fixed.
(force-pushed 7628ca8 to 2050279)
jenkins render docs
```diff
 ceph_assert(info.last_update.version > max_entries);
-version_t trim_to = info.last_update.version - max_entries;
+eversion_t trim_to(info.last_update.epoch, info.last_update.version - max_entries);
```
Using eversion_t like this will trim many more entries, since it's sorted by epoch first, then version. I don't think it matters in practice in this case, since this is an emergency recovery tool.
@jdurgin So do you think my concern [1, 2] about how pg log trim would behave after upgrading an osd with many millions of dups is not valid? Just for information, we have been running ceph-objectstore-tool built with this PR to trim a pg with 32 million dups. After it had been running for almost a day, we interrupted it to check the progress. 15 million still remained. We restarted it yesterday and it is still running. Unfortunately, we can't check how an upgraded osd with so many dups would behave on this cluster, as it is a production cluster. [1] #45529 (comment)
Are you sure? Some users have OSDs consuming a few GBs of RAM that are still running. If they upgrade, all OSDs hosting a PG will trim at the same time. If ::trim takes a long time, the OSDs and the PG will be marked down.
```cpp
);
if (info.last_update.version - info.log_tail.version <= max_entries) {
  cerr << "Log not larger than osd_max_pg_log_entries " << max_entries << std::endl;
  return 0;
```
On a PG with 18M entries, we get:

```
Log bounds are: (1314488'70894144,1314494'70896562]
Log not larger than osd_max_pg_log_entries 10000
Finished trimming pg log
```

Can we also check len(dups) before aborting the trim?
@NitzanMordhai We are testing using an octopus build from @trociny. It took ~10 minutes to trim 18M entries :-) This osd would OOM with > 350GB ram before trimming. Now it starts.
I see; in that case a trim max in online trimming is needed. Thanks for explaining! Trimming everything at load_pgs time would potentially delay startup too long and cause availability problems, so I think the trim-max approach is better. It does rely on I/O to trigger it, though.
@jdurgin I had a discussion with @neha-ojha today regarding that. Should we do it in another PR, or do we need it in this PR? There are a few users waiting for it, and if we want to include it in the next release we should get it out.
@NitzanMordhai as one of those users with large clusters (and already being greatly impacted by the issue, #45529 (comment)), I would like, if possible, for it to be released as soon as possible with the next patch.
We are waiting for this to be in master like water in the desert... Thank you for your great job!
(force-pushed 2050279 to 8fc4a63)
OK, I made a small change in this PR to support OSDs that would OOM during start: we won't load all the dup entries, only the latest ones up to osd_pg_log_dups_tracked.
(force-pushed 8fc4a63 to b40738f)
jdurgin left a comment:
Only loading the tracked dups off disk sounds good for the online solution. It means a small amount of space waste, but that should be negligible and can be worked around with the ceph-objectstore-tool if desired.
```cpp
}
dups.push_back(dup);
if (dups.size() >= g_ceph_context->_conf->osd_pg_log_dups_tracked) {
  dups.pop_front();
```
It is a bit unfortunate that on every start we will need to waste time (it was up to 10 minutes in our case) reading all dups, unless the user runs ceph-objectstore-tool to trim the log manually. And the user will not have any clue what is going on.
Maybe we should add a counter here and log at least a warning message, with a recommendation to trim the log manually, when the counter is large enough?
@trociny if the osd stops again, we will save the new duplicate log, so we will load the trimmed list next time.
So all the old dup keys will be removed from rocksdb on the save? I was not aware of that. Then I think there is no problem.
(force-pushed b40738f to d8a4c5e)
Adding duplicate entries trimming to the trim-pg-log operation. We will use the existing PGLog trim function to find the set of entries/dup entries that we are supposed to trim; to use it we need to build the PGLog from disk. Fixed: https://tracker.ceph.com/issues/53729 Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>
src/osd/PGLog.h (outdated diff):

```cpp
  ceph_assert(dups.back().version < dup.version);
}
dups.push_back(dup);
if (dups.size() >= g_conf()->osd_pg_log_dups_tracked) {
```
PGLog needs to trim duplicates by the number of entries rather than the versions. That way, we prevent unbounded duplicate growth. Fixed: https://tracker.ceph.com/issues/53729 Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>
(force-pushed d8a4c5e to 0d253bc)
jenkins test api
jenkins test windows
quincy: revert backport of #45529 Reviewed-by: Laura Flores <lflores@redhat.com>
PGLog needs to trim duplicates by the number of entries rather than the versions. That way, we prevent unbounded duplicate growth.
Fixed: https://tracker.ceph.com/issues/53729
Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>