tools: ceph-objectstore-tool is able to trim solely pg log dups' entries #46630
This reverts commit 0d253bc, which is the in-OSD part of the fix for the accumulation of `dup` entries in a PG Log.

Brainstorming it has raised questions about the OSD's behaviour during an upgrade if there are tons of dups in the log. What must be double-checked before bringing it back is that the deletions are chunked properly, so they do not cause OOMs or stalls in, for example, RocksDB.

Fixes: https://tracker.ceph.com/issues/53729
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
This reverts commit 9fb7ec6.

Although the chunking in off-line `dups` trimming (via COT) seems fine, the `ceph-objectstore-tool` is a client of `trim()` of `PGLog::IndexedLog`, which means that a partial revert is not possible without extensive changes. Moreover, trimming the pg log is not enough without modifying pg_info_t accordingly, which the reverted patch lacks.

Fixes: https://tracker.ceph.com/issues/53729
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
The main assumption is that trimming just dups doesn't require any update to the corresponding pg_info_t.

Testing:

1. cluster without the autoscaler
```
rzarz@ubulap:~/dev/ceph/build$ MON=1 MGR=1 OSD=3 MGR=1 MDS=0 ../src/vstart.sh -l -b -n -o "osd_pg_log_dups_tracked=3000000" -o "osd_pool_default_pg_autoscale_mode=off"
```
2. 8 PGs in the testing pool.
```
rzarz@ubulap:~/dev/ceph/build$ bin/ceph osd pool create test-pool 8 8
```
3. Provisioning dups with rados bench
```
bin/rados bench -p test-pool 300 write -b 4096 --no-cleanup
...
Total time run: 300.034
Total writes made: 103413
Write size: 4096
Object size: 4096
Bandwidth (MB/sec): 1.34637
Stddev Bandwidth: 0.589071
Max bandwidth (MB/sec): 2.4375
Min bandwidth (MB/sec): 0.902344
Average IOPS: 344
Stddev IOPS: 150.802
Max IOPS: 624
Min IOPS: 231
Average Latency(s): 0.0464151
Stddev Latency(s): 0.0183627
Max latency(s): 0.0928424
Min latency(s): 0.0131932
```
4. Killing osd.0
```
rzarz@ubulap:~/dev/ceph/build$ kill 2572129 # pid of osd.0
```
5. Listing PGs on osd.0 and calculating the number of pg log's entries and dups:
```
rzarz@ubulap:~/dev/ceph/build$ bin/ceph-objectstore-tool --data-path dev/osd0 --op list-pgs --pgid 2.c > osd0_pgs.txt
rzarz@ubulap:~/dev/ceph/build$ for pgid in `cat osd0_pgs.txt`; do echo $pgid; bin/ceph-objectstore-tool --data-path dev/osd0 --op log --pgid $pgid | jq '(.pg_log_t.log|length),(.pg_log_t.dups|length)'; done
2.7
10020
3100
2.6
10100
3000
2.3
10012
2800
2.1
10049
2900
2.2
10057
2700
2.0
10027
2900
2.5
10077
2700
2.4
10072
2900
1.0
97
0
```
6. Trimming dups
```
rzarz@ubulap:~/dev/ceph/build$ CEPH_ARGS="--osd_pg_log_dups_tracked 2500 --osd_pg_log_trim_max=100" bin/ceph-objectstore-tool --data-path dev/osd0 --op trim-pg-log-dups --pgid 2.7
max_dup_entries=2500 max_chunk_size=100
Removing keys dup_0000000020.00000000000000000001 - dup_0000000020.00000000000000000100
Removing keys dup_0000000020.00000000000000000101 - dup_0000000020.00000000000000000200
Removing keys dup_0000000020.00000000000000000201 - dup_0000000020.00000000000000000300
Removing keys dup_0000000020.00000000000000000301 - dup_0000000020.00000000000000000400
Removing keys dup_0000000020.00000000000000000401 - dup_0000000020.00000000000000000500
Removing keys dup_0000000020.00000000000000000501 - dup_0000000020.00000000000000000600
Finished trimming, now compacting...
Finished trimming pg log dups
```
7. Checking the number of pg log's entries and dups
```
rzarz@ubulap:~/dev/ceph/build$ for pgid in `cat osd0_pgs.txt`; do echo $pgid; bin/ceph-objectstore-tool --data-path dev/osd0 --op log --pgid $pgid | jq '(.pg_log_t.log|length),(.pg_log_t.dups|length)'; done
2.7
10020
2500
2.6
10100
3000
2.3
10012
2800
2.1
10049
2900
2.2
10057
2700
2.0
10027
2900
2.5
10077
2700
2.4
10072
2900
1.0
97
0
```

Fixes: https://tracker.ceph.com/issues/53729
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
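Step 6 trims a single PG (2.7). To apply the same trim to every PG collected in `osd0_pgs.txt`, the command can be wrapped in a loop. This is a minimal sketch, assuming the OSD is still stopped (step 4) and the same `CEPH_ARGS` values are wanted for every PG; `trim_all_pg_dups`, `COT`, `DUPS_TRACKED` and `TRIM_MAX` are hypothetical names introduced here, not part of Ceph.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Ceph): run the step-6 dups trim for every
# PG id listed, one per line, in a file such as osd0_pgs.txt from step 5.
# Assumes the OSD owning <data-path> is already stopped.
# COT, DUPS_TRACKED and TRIM_MAX are overridable knobs invented for this sketch.
trim_all_pg_dups() {
  local data_path=$1 pgs_file=$2
  local cot=${COT:-bin/ceph-objectstore-tool}
  local pgid
  # "|| [ -n ... ]" also handles a last line without a trailing newline
  while IFS= read -r pgid || [ -n "$pgid" ]; do
    [ -n "$pgid" ] || continue   # skip blank lines
    echo "== $pgid"
    # stdin is redirected so the tool cannot consume the rest of the PG list;
    # stop at the first failing PG instead of carrying on silently
    CEPH_ARGS="--osd_pg_log_dups_tracked ${DUPS_TRACKED:-2500} --osd_pg_log_trim_max=${TRIM_MAX:-100}" \
      "$cot" --data-path "$data_path" --op trim-pg-log-dups --pgid "$pgid" < /dev/null || return 1
  done < "$pgs_file"
}
```

Usage, from the build directory: `trim_all_pg_dups dev/osd0 osd0_pgs.txt`, then rerun the counting loop from step 7 to confirm the new `dups` lengths. Stopping on the first error is a deliberate choice so a failing PG is not masked by later output.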
|
Rados suite results: https://pulpito.ceph.com/yuriw-2022-06-16_18:33:18-rados-wip-yuri5-testing-2022-06-16-0649-distro-default-smithi/ Failures are unrelated.
… dups

This commit aggregates changes for multiple PRs:

* Offline: ceph#46630
* Online: ceph#47046
* Offline fix: ceph#46706
* Online fix: ceph#47688
* Offline fix: ceph#46631
* Online fix: ceph#47701

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
… dups

This commit aggregates changes for multiple PRs:

main
----
* Offline: ceph#46630
* Online: ceph#47046

quincy
------
* Offline fix: ceph#46706
* Online fix: ceph#47688

pacific
-------
* Offline fix: ceph#46631
* Online fix: ceph#47701

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reasons for reverting #45529
Brainstorming it has raised questions
about the OSD's behaviour during an upgrade if there are tons of
dups in the log. What must be double-checked before bringing
it back is that the deletions are chunked properly, so they do
not cause OOMs or stalls in, for example, RocksDB.
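The chunking concern can be made concrete with the numbers from the off-line test run in the commit message above: PG 2.7 held 3100 dups, `osd_pg_log_dups_tracked` was 2500 and `osd_pg_log_trim_max` was 100, so 3100 - 2500 = 600 excess entries were removed in 600 / 100 = 6 batches rather than one large deletion. The sketch below only reproduces that arithmetic and the key ranges printed by the tool; `dup_trim_chunks` is a hypothetical helper, not Ceph code, and it assumes the excess dups have contiguous versions starting at 1 within a single epoch (20 in the test), which real logs need not satisfy. The key format is copied from the tool's output.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): print the key ranges that a chunked
# dups trim would remove, mirroring the "Removing keys" lines of the test run.
# Assumption: excess dup versions are contiguous, start at 1, share one epoch.
# Args: <total dups> <dups to keep> <max chunk size> <epoch>
dup_trim_chunks() {
  local total=$1 keep=$2 chunk=$3 epoch=$4
  (( chunk > 0 )) || return 1            # a zero chunk would never advance
  local excess=$(( total - keep ))
  (( excess > 0 )) || return 0           # already within the limit: nothing to trim
  local first=1 last
  while (( first <= excess )); do
    last=$(( first + chunk - 1 ))
    (( last > excess )) && last=$excess  # final, possibly partial, chunk
    printf 'Removing keys dup_%010d.%020d - dup_%010d.%020d\n' \
      "$epoch" "$first" "$epoch" "$last"
    first=$(( last + 1 ))
  done
}
```

With `dup_trim_chunks 3100 2500 100 20` it prints the same six ranges as step 6. Each batch is bounded by the chunk size, which is the property the revert asks to double-check for the in-OSD path.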
Although the chunking in off-line
`dups` trimming (via COT) seems fine, the
`ceph-objectstore-tool` is a client of `trim()` of `PGLog::IndexedLog`,
which means that a partial revert is not possible without extensive
changes. Moreover, trimming the pg log is not enough without
modifying pg_info_t accordingly, which the reverted patch lacks.