Bug #68960

open

ec-pool-snaps-few-objects-overwrites: failed to complete snap trimming before timeout

Added by Matan Breizman over 1 year ago. Updated 7 days ago.

Status:
New
Priority:
Normal
Category:
-
Target version:
-
% Done:
0%

Source:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

/a/yuriw-2024-10-29_22:45:20-rados-quincy-release-distro-default-smithi/7972687

2024-10-30T09:06:20.118 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1d3 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.118 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1d4 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.118 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1d5 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.118 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1d6 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1d8 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1da in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1dd in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1de in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1e5 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1e6 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1e8 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1e9 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1ea in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1eb in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1ed in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1ee in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1f0 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1f1 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1f2 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.119 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1f4 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1f5 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1f6 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1f8 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1fb in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.1ff in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.200 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.202 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.205 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.206 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.209 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.20b in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.20d in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.20e in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.210 in trimming, state: active+clean+snaptrim_wait
2024-10-30T09:06:20.120 INFO:tasks.ceph.ceph_manager.ceph:pg 3.213 in trimming, state: active+clean+snaptrim_wait
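
For context, a minimal sketch (not the actual teuthology code) of the kind of polling loop behind these ceph_manager lines: query PG states and fail once a timeout expires with PGs still in a snaptrim state. The CLI invocation and timeout value here are assumptions; the JSON layout of `ceph pg dump pgs` varies between releases.

#!/usr/bin/env python3
# A minimal sketch, not the teuthology implementation: poll PG states and
# fail if snap trimming does not finish before a deadline, mirroring the
# log lines above. Assumes a reachable `ceph` CLI with admin credentials.
import json
import subprocess
import time

TIMEOUT = 1200  # seconds -- illustrative, not the suite's actual value
POLL = 30

def trimming_pgs():
    out = subprocess.check_output(
        ["ceph", "pg", "dump", "pgs", "--format=json"])
    data = json.loads(out)
    # The JSON shape differs by release: sometimes a dict with "pg_stats",
    # sometimes a bare list of PG entries.
    stats = data.get("pg_stats", []) if isinstance(data, dict) else data
    return [(pg["pgid"], pg["state"]) for pg in stats
            if "snaptrim" in pg["state"]]

deadline = time.time() + TIMEOUT
while True:
    pending = trimming_pgs()
    if not pending:
        print("snap trimming complete")
        break
    for pgid, state in pending:
        print(f"pg {pgid} in trimming, state: {state}")
    if time.time() > deadline:
        raise SystemExit("failed to complete snap trimming before timeout")
    time.sleep(POLL)
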
Actions #1

Updated by Radoslaw Zarzynski over 1 year ago

I can't find an existing tracker that this one would duplicate.
Either it's new (a regression on quincy?) or, more probably I hope, it's an environment issue (starvation?).

Let's observe.
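
If starvation is the suspicion, one place to look is the OSD throttles that pace snap trimming. A hedged Python sketch (these option names are real Ceph OSD settings, but defaults and relevance vary by release):

# Dump the OSD throttles that pace snap trimming, to see whether
# trimming could plausibly be starved on these nodes.
import subprocess

for opt in ("osd_snap_trim_sleep",
            "osd_pg_max_concurrent_snap_trims",
            "osd_max_trimming_pgs",
            "osd_snap_trim_priority"):
    val = subprocess.check_output(["ceph", "config", "get", "osd", opt])
    print(opt, "=", val.decode().strip())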

Actions #2

Updated by Laura Flores over 1 year ago

Bump up

Actions #3

Updated by Radoslaw Zarzynski over 1 year ago

Scrub note: no recurrences so far; observing.

Actions #4

Updated by Radoslaw Zarzynski over 1 year ago · Edited

Let's wait.

Actions #5

Updated by Radoslaw Zarzynski about 1 year ago

Scrub note: still no recurrence.

Actions #6

Updated by Konstantin Shalygin about 1 year ago

  • Backport deleted (quincy)
Actions #7

Updated by Laura Flores about 2 months ago

description: rados/thrash/{0-size-min-size-overrides/3-size-2-min-size 1-pg-log-overrides/short_pg_log 2-recovery-overrides/{more-async-partial-recovery} 3-scrub-overrides/{max-simultaneous-scrubs-3} backoff/normal ceph clusters/{fixed-4} crc-failures/default d-balancer/crush-compat mon_election/connectivity msgr-failures/osd-dispatch-delay msgr/async-v1only objectstore/{bluestore/{alloc$/{stupid} base mem$/{low} onode-segment$/{512K} write$/{v1/{compr$/{yes$/{zlib}} v1}}}} rados supported-random-distro$/{ubuntu_latest} thrashers/morepggrow thrashosds-health workloads/pool-snaps-few-objects}

/a/lflores-2026-01-26_23:21:06-rados-wip-yuri12-testing-2026-01-22-2045-distro-default-trial/19086

Actions #8

Updated by Laura Flores about 2 months ago

Severity seems low; let's monitor for now.

Actions #9

Updated by Connor Fawcett about 2 months ago

/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19858

Actions #10

Updated by Radoslaw Zarzynski about 1 month ago

New hardware? Bump up.

Actions #11

Updated by Radoslaw Zarzynski about 1 month ago

Bump up.

Actions #12

Updated by Laura Flores 28 days ago

Bug scrub note: let's observe for a bit longer.

Actions #13

Updated by Sridhar Seshasayee 10 days ago

/a/skanta-2026-03-04_23:53:38-rados-wip-bharath1-testing-2026-03-04-1011-distro-default-trial/85628

Test Description:
rados/thrash/0-size-min-size-overrides-min-size 1-pg-log-overrides/short_pg_log 2-recovery-overrides/{more-async-partial-recovery} 3-scrub-overrides/{max-simultaneous-scrubs-1} backoff/normal ceph clusters/{fixed-4} crc-failures/default d-balancer/crush-compat mon_election/connectivity msgr-failures/osd-dispatch-delay msgr/async-v1only objectstore/{bluestore/{alloc$/{btree} base mem$/{normal-1} onode-segment$/{256K} write$/{v1/{compr$/{yes$/{zstd}} v1}}}} rados supported-random-distro$/{centos_latest} thrashers/morepggrow thrashosds-health workloads/pool-snaps-few-objects}

Actions #14

Updated by Laura Flores 7 days ago

  • Assignee set to Matan Breizman

@Matan Breizman can you take a look at this? We suspect a timing issue tied to the new trial machines, since the recent occurrences began roughly when the trial nodes were introduced.

The common workload among the failures is `workloads/pool-snaps-few-objects`.
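
For anyone triaging further, a hypothetical helper to scan a teuthology archive tree for jobs that combine this workload with the timeout failure. The archive root and failure string are assumptions taken from the runs quoted above, not a standard tool.

# Hypothetical scan: find archived jobs that ran pool-snaps-few-objects
# and hit the snap-trim timeout. Assumes the /a/<run>/<job>/teuthology.log
# layout seen in the paths quoted in this tracker.
import pathlib

ARCHIVE = pathlib.Path("/a")
NEEDLE = "failed to complete snap trimming before timeout"

for log in ARCHIVE.glob("*/*/teuthology.log"):
    text = log.read_text(errors="replace")
    if "pool-snaps-few-objects" in text and NEEDLE in text:
        print(log.parent)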
