Project

General

Profile

Actions

Bug #69438

closed

thrash/short_pg_log: bluestore op got ENOENT - setattr

Added by Matan Breizman about 1 year ago. Updated 9 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

This failure was noticed when testing: https://github.com/ceph/ceph-ci/commits/wip-matanb-crimson-testing-january-5/
short_pg_log was added to the suite not recently so it might be related to backfill

https://pulpito.ceph.com/matan-2025-01-06_10:21:37-crimson-rados-wip-matanb-crimson-testing-january-5-distro-crimson-smithi/8064628

ceph-osd-1

ERROR 2025-01-06 14:01:06,344 none - /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.3.0-6702-g438c8a71/rpm/el9/BUILD/ceph-19.3.0-6702-g438c8a71/src/os/bluestore/BlueStore.cc:15878 : In function 'void BlueStore::_txc_add_transaction(TransContext*, ObjectStore::Transaction*)', abort(%s)
unexpected error
Aborting.
Backtrace:
DEBUG 2025-01-06 14:01:06,344 [shard 0:main] osd - maybe_share_osdmap updating peer 0 session's projected_epochfrom 270 to ping map epoch of 275
DEBUG 2025-01-06 14:01:06,344 [shard 0:main] osd - maybe_share_osdmap peer 0 projected_epoch 275 is already later than our osdmap epoch of 275
DEBUG 2025-01-06 14:01:06,345 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-1) _txc_create osr 0x514000049a40 = 0x5180000dac80
DEBUG 2025-01-06 14:01:06,345 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-1).collection(3.a_head 0x5160002bd480) get_onode oid #3:5b0ca7b2:::smithi02632879-62:68# key 0x7F80000000000000035B0CA7B2'!smithi02632879-62!='0x0000000000000068FFFFFFFFFFFFFFFF6F
DEBUG 2025-01-06 14:01:06,345 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-1).collection(3.a_head 0x5160002bd480)  r -2 v.len 0
DEBUG 2025-01-06 14:01:06,345 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-1) _txc_add_transaction op 14 got ENOENT on #3:5b0ca7b2:::smithi02632879-62:68#
ERROR 2025-01-06 14:01:06,345 [shard 0:main] none - bluestore(/var/lib/ceph/osd/ceph-1) _txc_add_transaction error (2) No such file or directory not handled on operation 14 (op 0, counting from 0)
ERROR 2025-01-06 14:01:06,345 [shard 0:main] none - bluestore(/var/lib/ceph/osd/ceph-1) unexpected error code
WARN  2025-01-06 14:01:06,345 [shard 0:main] bluestore - _dump_transaction transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "setattr",
            "collection": "3.a_head",
            "oid": "#3:5b0ca7b2:::smithi02632879-62:68#",
            "name": "_",
            "length": 232
        },
        {
            "op_num": 1,
            "op_name": "setattrs",
            "collection": "3.a_head",
            "oid": "#3:5b0ca7b2:::smithi02632879-62:head#",
            "attr_lens": {
                "_": 232,
                "snapset": 195
            }
        },
        {
            "op_num": 2,
            "op_name": "omap_setkeys",
            "collection": "3.a_head",
            "oid": "#3:50000000:.internal_pg_local::snapmapper:head#",
            "attr_lens": {
                "OBJ_0000000000000003.AD035ED4.68.smithi02632879-62..": 178
            }
        },
        {
            "op_num": 3,
            "op_name": "omap_rmkeys",
            "collection": "3.a_head",
            "oid": "#3:50000000:.internal_pg_local::snapmapper:head#",
            "attrs": [
                "SNA_3_0000000000000055_0000000000000003.AD035ED4.68.smithi02632879-62..",
                "SNA_3_0000000000000057_0000000000000003.AD035ED4.68.smithi02632879-62..",
                "SNA_3_0000000000000060_0000000000000003.AD035ED4.68.smithi02632879-62..",
                "SNA_3_0000000000000068_0000000000000003.AD035ED4.68.smithi02632879-62.." 
            ]
        },
        {
            "op_num": 4,
            "op_name": "omap_setkeys",
            "collection": "3.a_head",
            "oid": "#3:50000000::::head#",
            "attr_lens": {
                "0000000275.00000000000000000546": 313,
                "0000000275.00000000000000000547": 197,
                "_fastinfo": 186
            }
        }
    ]
}

 3# ceph::__ceph_abortf(char const*, int, char const*, char const*, ...) in ceph-osd
 4# BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*) in ceph-osd
 5# BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*) in ceph-osd
 6# ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*) in ceph-osd


Related issues 2 (0 open2 closed)

Related to crimson - Bug #69412: crimson: sparse read omitted non-zero data ...ResolvedSamuel Just

Actions
Related to crimson - Bug #69702: crimson: take obc lock on replica during push commitResolvedSamuel Just

Actions
Actions #1

Updated by Matan Breizman about 1 year ago

  • Description updated (diff)
Actions #2

Updated by Matan Breizman about 1 year ago

  • Related to Bug #69412: crimson: sparse read omitted non-zero data ... added
Actions #3

Updated by Matan Breizman about 1 year ago ยท Edited

short_pg_log: https://pulpito.ceph.com/matan-2025-01-07_14:57:06-crimson-rados-main-distro-crimson-smithi/8066354/
main 4b0d458ab0c155c0d167417730d2a3d59b161d98
osd.1:

DEBUG 2025-01-07 16:35:58,504 [shard 0:main] bluestore - maybe_unpin 0x525000025900 #3:a0000000::::head# touched
DEBUG 2025-01-07 16:35:58,504 [shard 0:main] bluestore - maybe_unpin 0x525000025900 #3:a0000000:.internal_pg_local::snapmapper:head# touched
DEBUG 2025-01-07 16:35:58,504 [shard 0:main] bluestore - maybe_unpin 0x525000025900 #3:ac747253:::smithi04533857-324:head# touched
DEBUG 2025-01-07 16:35:58,504 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-1) _kv_finalize_thread sleep
DEBUG 2025-01-07 16:35:58,504 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-1) queue_transactions ch 0x516000267c80 3.5_head
DEBUG 2025-01-07 16:35:58,504 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-1) _txc_create osr 0x514000045440 = 0x518000176880
DEBUG 2025-01-07 16:35:58,504 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-1).collection(3.5_head 0x516000267c80) get_onode oid #3:ac135736:::smithi04533857-931:62# key 0x7F8000000000000003AC13'W6!smithi04533857-931!='0x0000000000000062FFFFFFFFFFFFFFFF6F
DEBUG 2025-01-07 16:35:58,504 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-1).collection(3.5_head 0x516000267c80)  r -2 v.len 0
DEBUG 2025-01-07 16:35:58,504 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-1) _txc_add_transaction op 14 got ENOENT on #3:ac135736:::smithi04533857-931:62#
ERROR 2025-01-07 16:35:58,504 [shard 0:main] none - bluestore(/var/lib/ceph/osd/ceph-1) _txc_add_transaction error (2) No such file or directory not handled on operation 14 (op 0, counting from 0)
ERROR 2025-01-07 16:35:58,504 [shard 0:main] none - bluestore(/var/lib/ceph/osd/ceph-1) unexpected error code
WARN  2025-01-07 16:35:58,505 [shard 0:main] bluestore - _dump_transaction transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "setattr",
            "collection": "3.5_head",
            "oid": "#3:ac135736:::smithi04533857-931:62#",
            "name": "_",
            "length": 232
        },
        {
            "op_num": 1,
            "op_name": "setattrs",
            "collection": "3.5_head",
            "oid": "#3:ac135736:::smithi04533857-931:head#",
            "attr_lens": {
                "_": 232,
                "snapset": 163
            }
        },
        {
            "op_num": 2,
            "op_name": "omap_setkeys",
            "collection": "3.5_head",
            "oid": "#3:a0000000:.internal_pg_local::snapmapper:head#",
            "attr_lens": {
                "OBJ_0000000000000003.538CAEC6.62.smithi04533857-931..": 147
            }
        },
        {
            "op_num": 3,
            "op_name": "omap_rmkeys",
            "collection": "3.5_head",
            "oid": "#3:a0000000:.internal_pg_local::snapmapper:head#",
            "attrs": [
                "SNA_3_0000000000000038_0000000000000003.538CAEC6.62.smithi04533857-931..",
                "SNA_3_0000000000000048_0000000000000003.538CAEC6.62.smithi04533857-931..",
                "SNA_3_0000000000000058_0000000000000003.538CAEC6.62.smithi04533857-931..",
                "SNA_3_000000000000005D_0000000000000003.538CAEC6.62.smithi04533857-931..",
                "SNA_3_000000000000005F_0000000000000003.538CAEC6.62.smithi04533857-931.." 
            ]
        },
        {
            "op_num": 4,
            "op_name": "omap_setkeys",
            "collection": "3.5_head",
            "oid": "#3:a0000000::::head#",
            "attr_lens": {
                "0000000318.00000000000000000454": 282,
                "0000000318.00000000000000000455": 198,
                "_fastinfo": 186
            }
        }
    ]
}

Actions #10

Updated by Matan Breizman about 1 year ago

  • Related to Bug #69702: crimson: take obc lock on replica during push commit added
Actions #11

Updated by Matan Breizman about 1 year ago

Matan Breizman wrote in #note-9:

https://pulpito.ceph.com/matan-2025-03-13_09:29:53-crimson-rados-wip-matanb-cmake-crimson-only-v3-distro-crimson-smithi/8186783
osd.1

See:

DEBUG 2025-03-13 10:37:40,443 [shard 1:main] osd -  pg_epoch 104 pg[3.0( v 95'294 lc 93'290 (83'282,95'294] local-lis/les=103/104 n=79 ec=18/18 lis/c=103/91 les/c/f=104/92/0 sis=103) [3,2,1] r=2 lpr=103 pi=[91,103)/1 crt=95'294 lcod 0'0 active  ReplicatedRecoveryBackend::handle_push: MOSDPGPush(3.0 104/103 [PushOp(3:0cca4c40:::smithi16035759-301:f, version: 95'291, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:0cca4c40:::smithi16035759-301:f@95'291, size: 3042100, copy_subset: [], clone_subset: {}, snapset: f=[]:{f: [f, e, d, c, a, 9, 7, 3]}, object_exist: true), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 0, data_complete: true, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: true, error: false))]) v4
DEBUG 2025-03-13 10:37:40,476 [shard 1:main] osd -  pg_epoch 104 pg[3.0( v 95'294 lc 95'291 (83'282,95'294] local-lis/les=103/104 n=79 ec=18/18 lis/c=103/91 les/c/f=104/92/0 sis=103) [3,2,1] r=2 lpr=103 pi=[91,103)/1 crt=95'294 active  ReplicatedRecoveryBackend::handle_push: MOSDPGPush(3.0 104/103 [PushOp(3:0cca4c40:::smithi16035759-301:head, version: 95'292, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 3, recovery_info: ObjectRecoveryInfo(3:0cca4c40:::smithi16035759-301:head@95'292, size: 3832227, copy_subset: [], clone_subset: {}, snapset: 0=[]:{}, object_exist: true), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 0, data_complete: true, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: true, error: false))]) v4
DEBUG 2025-03-13 10:37:40,492 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-1) _txc_add_transaction op 15 got ENOENT on #3:0cca4c40:::smithi16035759-301:head#

Could be explained by: https://tracker.ceph.com/issues/69702

Actions #12

Updated by Matan Breizman 12 months ago

  • Status changed from New to Need More Info

https://tracker.ceph.com/issues/69702 is resolved. Let's keep this one open to see if still occurs.

Actions #13

Updated by Matan Breizman 9 months ago

  • Status changed from Need More Info to Closed
Actions

Also available in: Atom PDF