Bug #68960

open

ec-pool-snaps-few-objects-overwrites: failed to complete snap trimming before timeout

Added by Matan Breizman over 1 year ago. Updated 7 days ago.

Status:
New
Priority:
Normal
Category:
-
Target version:
-
% Done:
0%

Source:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

/a/yuriw-2024-10-29_22:45:20-rados-quincy-release-distro-default-smithi/7972687

2024-10-30T09:06:20.118 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1d3 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.118 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1d4 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.118 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1d5 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.118 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1d6 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1d8 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1da in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1dd in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1de in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1e5 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1e6 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1e8 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1e9 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1ea in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1eb in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1ed in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1ee in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1f0 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1f1 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1f2 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1f4 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1f5 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1f6 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1f8 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1fb in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1ff in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.200 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.202 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.205 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.206 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.209 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.20b in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.20d in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.20e in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.210 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.213 in trimming, state: active+clean+snaptrim_wait
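
For context, a minimal sketch (not the actual teuthology code) of the kind of polling loop behind these ceph_manager lines: query PG states and fail once a timeout expires with PGs still in a snaptrim state. The CLI invocation and timeout value here are assumptions; the JSON layout of `ceph pg dump pgs` varies between releases.

#!/usr/bin/env python3
# A minimal sketch, not the teuthology implementation: poll PG states and
# fail if snap trimming does not finish before a deadline, mirroring the
# log lines above. Assumes a reachable `ceph` CLI with admin credentials.
import json
import subprocess
import time

TIMEOUT = 1200  # seconds -- illustrative, not the suite's actual value
POLL = 30

def trimming_pgs():
    out = subprocess.check_output(
        ["ceph", "pg", "dump", "pgs", "--format=json"])
    data = json.loads(out)
    # The JSON shape differs by release: sometimes a dict with "pg_stats",
    # sometimes a bare list of PG entries.
    stats = data.get("pg_stats", []) if isinstance(data, dict) else data
    return [(pg["pgid"], pg["state"]) for pg in stats
            if "snaptrim" in pg["state"]]

deadline = time.time() + TIMEOUT
while True:
    pending = trimming_pgs()
    if not pending:
        print("snap trimming complete")
        break
    for pgid, state in pending:
        print(f"pg {pgid} in trimming, state: {state}")
    if time.time() > deadline:
        raise SystemExit("failed to complete snap trimming before timeout")
    time.sleep(POLL)
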
Actions #1

Updated by Radoslaw Zarzynski over 1 year ago

I can't find an existing tracker that this one would duplicate.
Either it's new (a regression on quincy?) or, more probably I hope, it's an environment issue (starvation?).

Let's observe.
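
If starvation is the suspicion, one place to look is the OSD throttles that pace snap trimming. A hedged Python sketch (these option names are real Ceph OSD settings, but defaults and relevance vary by release):

# Dump the OSD throttles that pace snap trimming, to see whether
# trimming could plausibly be starved on these nodes.
import subprocess

for opt in ("osd_snap_trim_sleep",
            "osd_pg_max_concurrent_snap_trims",
            "osd_max_trimming_pgs",
            "osd_snap_trim_priority"):
    val = subprocess.check_output(["ceph", "config", "get", "osd", opt])
    print(opt, "=", val.decode().strip())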

Actions #2

Updated by Laura Flores over 1 year ago

Bump up

Actions #3

Updated by Radoslaw Zarzynski over 1 year ago

Scrub note: no recurrences so far; observing.

Actions #4

Updated by Radoslaw Zarzynski over 1 year ago · Edited

Let's wait.

Actions #5

Updated by Radoslaw Zarzynski about 1 year ago

Scrub note: still no recurrence.

Actions #6

Updated by Konstantin Shalygin about 1 year ago

  • Backport deleted (quincy)
Actions #7

Updated by Laura Flores about 2 months ago

description: rados/thrash/{0-size-min-size-overrides/3-size-2-min-size 1-pg-log-overrides/short_pg_log 2-recovery-overrides/{more-async-partial-recovery} 3-scrub-overrides/{max-simultaneous-scrubs-3} backoff/normal ceph clusters/{fixed-4} crc-failures/default d-balancer/crush-compat mon_election/connectivity msgr-failures/osd-dispatch-delay msgr/async-v1only objectstore/{bluestore/{alloc$/{stupid} base mem$/{low} onode-segment$/{512K} write$/{v1/{compr$/{yes$/{zlib}} v1}}}} rados supported-random-distro$/{ubuntu_latest} thrashers/morepggrow thrashosds-health workloads/pool-snaps-few-objects}

/a/lflores-2026-01-26_23:21:06-rados-wip-yuri12-testing-2026-01-22-2045-distro-default-trial/19086

Actions #8

Updated by Laura Flores about 2 months ago

Severity seems low; let's monitor for now.

Actions #9

Updated by Connor Fawcett about 2 months ago

/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19858

Actions #10

Updated by Radoslaw Zarzynski about 1 month ago

New hardware? Bump up.

Actions #11

Updated by Radoslaw Zarzynski about 1 month ago

Bump up.

Actions #12

Updated by Laura Flores 28 days ago

Bug scrub note: let's observe for a bit longer.

Actions #13

Updated by Sridhar Seshasayee 10 days ago

/a/skanta-2026-03-04_23:53:38-rados-wip-bharath1-testing-2026-03-04-1011-distro-default-trial/85628

Test Description:
rados/thrash/0-size-min-size-overrides-min-size 1-pg-log-overrides/short_pg_log 2-recovery-overrides/{more-async-partial-recovery} 3-scrub-overrides/{max-simultaneous-scrubs-1} backoff/normal ceph clusters/{fixed-4} crc-failures/default d-balancer/crush-compat mon_election/connectivity msgr-failures/osd-dispatch-delay msgr/async-v1only objectstore/{bluestore/{alloc$/{btree} base mem$/{normal-1} onode-segment$/{256K} write$/{v1/{compr$/{yes$/{zstd}} v1}}}} rados supported-random-distro$/{centos_latest} thrashers/morepggrow thrashosds-health workloads/pool-snaps-few-objects}

Actions #14

Updated by Laura Flores 7 days ago

  • Assignee set to Matan Breizman

@Matan Breizman can you take a look at this? We suspect a timing issue tied to the new trial machines, since the recent occurrences began roughly when the trial nodes were introduced.

The common workload among the failures is `workloads/pool-snaps-few-objects`.
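
For anyone triaging further, a hypothetical helper to scan a teuthology archive tree for jobs that combine this workload with the timeout failure. The archive root and failure string are assumptions taken from the runs quoted above, not a standard tool.

# Hypothetical scan: find archived jobs that ran pool-snaps-few-objects
# and hit the snap-trim timeout. Assumes the /a/<run>/<job>/teuthology.log
# layout seen in the paths quoted in this tracker.
import pathlib

ARCHIVE = pathlib.Path("/a")
NEEDLE = "failed to complete snap trimming before timeout"

for log in ARCHIVE.glob("*/*/teuthology.log"):
    text = log.read_text(errors="replace")
    if "pool-snaps-few-objects" in text and NEEDLE in text:
        print(log.parent)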
