tools: ceph-objectstore-tool is able to trim solely pg log dups' entries#46630

Merged
yuriw merged 3 commits into ceph:main from rzarzynski:wip-pglog-trim-dups
Jun 17, 2022
rzarzynski commented Jun 11, 2022

Reasons for reverting #45529

  • Discussing it raised questions about the OSD's behaviour during
    an upgrade when the log holds a very large number of dups.
    Before bringing it back, we must double-check that the
    deletions are chunked properly so that they do not cause
    OOMs or stalls in, for example, RocksDB.

  • Although the chunking in off-line dups trimming (via COT) seems
    fine, ceph-objectstore-tool is a client of trim() of
    PGLog::IndexedLog, which means that a partial revert is not
    possible without extensive changes. Moreover, trimming the pg log
    is not enough without modifying pg_info_t accordingly, which
    the reverted patch lacks.
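
To illustrate what chunking the deletions means here, a minimal shell sketch follows. `batch_keys` is a hypothetical helper, not part of Ceph or of this PR; it only groups keys read from stdin into fixed-size batches, so that no single deletion step has to cover an unbounded number of keys.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): read one key per line from stdin
# and report batches of at most `chunk_size` keys.
batch_keys() {
  local chunk_size=$1
  # Reject missing, non-numeric, or non-positive chunk sizes.
  if ! [ "$chunk_size" -ge 1 ] 2>/dev/null; then
    echo "chunk_size must be a positive integer" >&2
    return 1
  fi
  awk -v n="$chunk_size" '
    # Remember the first key of every new batch.
    (NR - 1) % n == 0 { first = $0 }
    # Track the most recent key and the size of the current batch.
    { last = $0; count++ }
    # Emit the batch once it is full.
    count == n { printf "batch of %d keys: %s - %s\n", count, first, last; count = 0 }
    # Flush the final, possibly shorter, batch.
    END { if (count > 0) printf "batch of %d keys: %s - %s\n", count, first, last }
  '
}
```

For example, `seq 1 250 | batch_keys 100` reports two full batches and a final batch of 50 keys. A real implementation would issue one store transaction per batch instead of printing it.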

This reverts commit 0d253bc,
which is the in-OSD part of the fix for accumulation of `dup`
entries in a PG Log. Discussing it raised questions about the
OSD's behaviour during an upgrade when the log holds a very large
number of dups. Before bringing it back, we must double-check that
the deletions are chunked properly so that they do not cause
OOMs or stalls in, for example, RocksDB.

Fixes: https://tracker.ceph.com/issues/53729
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
This reverts commit 9fb7ec6.

Although the chunking in off-line `dups` trimming (via COT) seems
fine, `ceph-objectstore-tool` is a client of `trim()` of
`PGLog::IndexedLog`, which means that a partial revert is not
possible without extensive changes. Moreover, trimming the pg log
is not enough without modifying pg_info_t accordingly, which
the reverted patch lacks.

Fixes: https://tracker.ceph.com/issues/53729
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
The main assumption is that trimming just dups doesn't need any
update to the corresponding pg_info_t.

Testing:

1. cluster without the autoscaler
```
rzarz@ubulap:~/dev/ceph/build$ MON=1 MGR=1 OSD=3 MGR=1 MDS=0 ../src/vstart.sh -l -b -n -o "osd_pg_log_dups_tracked=3000000" -o "osd_pool_default_pg_autoscale_mode=off"
```

2. 8 PGs in the testing pool.
```
rzarz@ubulap:~/dev/ceph/build$ bin/ceph osd pool create test-pool 8 8
```

3. Provisioning dups with rados bench
```
bin/rados bench -p test-pool 300 write -b 4096  --no-cleanup
...
Total time run:         300.034
Total writes made:      103413
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     1.34637
Stddev Bandwidth:       0.589071
Max bandwidth (MB/sec): 2.4375
Min bandwidth (MB/sec): 0.902344
Average IOPS:           344
Stddev IOPS:            150.802
Max IOPS:               624
Min IOPS:               231
Average Latency(s):     0.0464151
Stddev Latency(s):      0.0183627
Max latency(s):         0.0928424
Min latency(s):         0.0131932
```

4. Killing osd.0
```
rzarz@ubulap:~/dev/ceph/build$ kill 2572129 # pid of osd.0
```

5. Listing PGs on osd.0 and calculating number of pg log's entries and
dups:

```
rzarz@ubulap:~/dev/ceph/build$ bin/ceph-objectstore-tool --data-path dev/osd0 --op list-pgs --pgid 2.c > osd0_pgs.txt
rzarz@ubulap:~/dev/ceph/build$ for pgid in `cat osd0_pgs.txt`; do echo $pgid; bin/ceph-objectstore-tool --data-path dev/osd0 --op log --pgid $pgid | jq '(.pg_log_t.log|length),(.pg_log_t.dups|length)'; done
2.7
10020
3100
2.6
10100
3000
2.3
10012
2800
2.1
10049
2900
2.2
10057
2700
2.0
10027
2900
2.5
10077
2700
2.4
10072
2900
1.0
97
0
```
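
The listing above is a flat stream of triplets (pgid, number of log entries, number of dups). As a sketch of how it could be filtered, the hypothetical helper below (not part of Ceph) prints only the PGs whose dups count exceeds a given limit. It assumes exactly three non-empty lines per PG, as produced by the loop in step 5.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read (pgid, log length, dups length) triplets from
# stdin, one value per line, and print the PGs whose dups exceed `limit`.
pgs_over_dup_limit() {
  local limit=$1
  awk -v limit="$limit" '
    NR % 3 == 1 { pgid = $1 }      # first line of a triplet: the PG id
    NR % 3 == 2 { log_len = $1 }   # second line: number of log entries
    NR % 3 == 0 {                  # third line: number of dups
      if ($1 + 0 > limit + 0)
        printf "%s log=%d dups=%d\n", pgid, log_len, $1
    }
  '
}
```

Piping the output of the `for pgid in ...` loop into `pgs_over_dup_limit 2500` would, for the numbers shown above, list all eight PGs of pool 2 and skip PG 1.0.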

6. Trimming dups
```
rzarz@ubulap:~/dev/ceph/build$ CEPH_ARGS="--osd_pg_log_dups_tracked 2500 --osd_pg_log_trim_max=100" bin/ceph-objectstore-tool --data-path dev/osd0 --op trim-pg-log-dups --pgid 2.7
max_dup_entries=2500 max_chunk_size=100
Removing keys dup_0000000020.00000000000000000001 - dup_0000000020.00000000000000000100
Removing keys dup_0000000020.00000000000000000101 - dup_0000000020.00000000000000000200
Removing keys dup_0000000020.00000000000000000201 - dup_0000000020.00000000000000000300
Removing keys dup_0000000020.00000000000000000301 - dup_0000000020.00000000000000000400
Removing keys dup_0000000020.00000000000000000401 - dup_0000000020.00000000000000000500
Removing keys dup_0000000020.00000000000000000501 - dup_0000000020.00000000000000000600
Finished trimming, now compacting...
Finished trimming pg log dups
```
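
The arithmetic behind the output above can be checked with a small sketch. The helpers are hypothetical and not part of the tool. The key layout (`dup_`, a 10-digit epoch, a dot, a 20-digit version) is inferred only from the keys printed in this run, and the contiguous version numbers are a property of this test, not a guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking a dups trim; not part of Ceph.

# Number of dups above the tracked limit (0 if already within the limit).
dups_to_trim() {
  local current=$1 max_tracked=$2
  if [ "$current" -gt "$max_tracked" ]; then
    echo $((current - max_tracked))
  else
    echo 0
  fi
}

# Number of deletion chunks needed, rounding up.
chunks_needed() {
  local to_trim=$1 chunk_size=$2
  [ "$chunk_size" -ge 1 ] || return 1
  echo $(( (to_trim + chunk_size - 1) / chunk_size ))
}

# Key name in the layout seen in the output above (an inference, see lead-in).
dup_key() {
  local epoch=$1 version=$2
  printf 'dup_%010d.%020d\n' "$epoch" "$version"
}
```

For PG 2.7, `dups_to_trim 3100 2500` gives 600 and `chunks_needed 600 100` gives 6, which matches the six `Removing keys` lines; `dup_key 20 1` and `dup_key 20 100` reproduce the bounds of the first chunk.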

7. Checking number of pg log's entries and dups
```
rzarz@ubulap:~/dev/ceph/build$ for pgid in `cat osd0_pgs.txt`; do echo $pgid; bin/ceph-objectstore-tool --data-path dev/osd0 --op log --pgid $pgid | jq '(.pg_log_t.log|length),(.pg_log_t.dups|length)'; done
2.7
10020
2500
2.6
10100
3000
2.3
10012
2800
2.1
10049
2900
2.2
10057
2700
2.0
10027
2900
2.5
10077
2700
2.4
10072
2900
1.0
97
0
```
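
Comparing the two listings by eye is error-prone, so a sketch of an automated comparison follows. `dups_changed` is a hypothetical helper (not part of Ceph) that takes two files in the triplet format of steps 5 and 7, assuming the output of each loop was saved to a file, and prints the PGs whose dups count differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two listings of (pgid, log length, dups
# length) triplets, one value per line, and report changed dups counts.
dups_changed() {
  local before_file=$1 after_file=$2
  awk '
    # First file: remember the dups count of every PG.
    NR == FNR {
      if (FNR % 3 == 1) pgid = $1
      else if (FNR % 3 == 0) before[pgid] = $1
      next
    }
    # Second file: report PGs whose dups count differs from the first file.
    FNR % 3 == 1 { pgid = $1 }
    FNR % 3 == 0 && (pgid in before) && before[pgid] != $1 {
      printf "%s dups: %d -> %d\n", pgid, before[pgid], $1
    }
  ' "$before_file" "$after_file"
}
```

With the numbers from steps 5 and 7, only PG 2.7 would be reported, going from 3100 to 2500 dups, while the log lengths and the other PGs stay unchanged.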

Fixes: https://tracker.ceph.com/issues/53729
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
@ljflores
Member

Rados suite results:

https://pulpito.ceph.com/yuriw-2022-06-16_18:33:18-rados-wip-yuri5-testing-2022-06-16-0649-distro-default-smithi/
https://pulpito.ceph.com/yuriw-2022-06-17_13:52:49-rados-wip-yuri5-testing-2022-06-16-0649-distro-default-smithi/

Failures, unrelated:
1. https://tracker.ceph.com/issues/55853
2. https://tracker.ceph.com/issues/52321
3. https://tracker.ceph.com/issues/45721
4. https://tracker.ceph.com/issues/55986
5. https://tracker.ceph.com/issues/44595
6. https://tracker.ceph.com/issues/55854
7. https://tracker.ceph.com/issues/56097 -- opened a new Tracker for this; historically, this has occurred previously on a Pacific test branch, so it does not seem related to this PR.
8. https://tracker.ceph.com/issues/56098 -- opened a new Tracker for this; this is the first sighting that I am aware of, but it does not seem related to the tested PR.

Details:

  1. test_cls_rgw.sh: failures in 'cls_rgw.index_list' and 'cls_rgw.index_list_delimited' - Ceph - RGW
  2. qa/tasks/rook times out: 'check osd count' reached maximum tries (90) after waiting for 900 seconds - Ceph - Orchestrator
  3. CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_rados.TestWatchNotify.test - Ceph - RADOS
  4. cephadm: Test failure: test_cluster_set_reset_user_config (tasks.cephfs.test_nfs.TestNFS) - Ceph - Cephadm
  5. cache tiering: Error: oid 48 copy_from 493 returned error code -2 - Ceph - RADOS
  6. Datetime AssertionError in test_health_history (tasks.mgr.test_insights.TestInsights) - Ceph - Mgr
  7. Timeout on sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell osd.1 flush_pg_stats - Ceph - RADOS
  8. api_tier_pp: failure on LibRadosTwoPoolsPP.ManifestRefRead - Ceph - RADOS

@yuriw yuriw merged commit 3087698 into ceph:main Jun 17, 2022
rzarzynski added a commit to rzarzynski/ceph that referenced this pull request Aug 23, 2022
… dups

This commit aggregates changes for multiple PR:

* Offline: ceph#46630
* Online: ceph#47046

* Offline fix: ceph#46706
* Online fix: ceph#47688

* Offline fix: ceph#46631
* Online fix: ceph#47701

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
rzarzynski added a commit to rzarzynski/ceph that referenced this pull request Aug 23, 2022
… dups

This commit aggregates changes for multiple PR:

main
----
* Offline: ceph#46630
* Online: ceph#47046

quincy
------
* Offline fix: ceph#46706
* Online fix: ceph#47688

pacific
-------
* Offline fix: ceph#46631
* Online fix: ceph#47701

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>