osd/ECExtentCache: prevent use-after-free by checking Object existence#64062
osd/ECExtentCache: prevent use-after-free by checking Object existence#64062
Conversation
|
please see the attached report from ASan: |
|
@aainscow hi Alex, could you please help review this change? |
aainscow
left a comment
There was a problem hiding this comment.
Requesting changes for now, happy to have a discussion on this.
| delete_maybe(); | ||
| if (pg.objects.contains(oid)) { | ||
| delete_maybe(); | ||
| } |
There was a problem hiding this comment.
I have a fix for this in #63873. You don't need the if clause, it is good enough to simply delete the delete_maybe() as the delete always happens at a higher level.
There was a problem hiding this comment.
@aainscow as i explained in details in the commit message, we have to check for its existence. otherwise i still have the use-after-free failure.
regarding removing the delete_maybe(), i tried. but it failed a test. some object was not removed at the end of test.
i will paste the error messages from the failing test once i have access to my linux box.
There was a problem hiding this comment.
@aainscow this is the failing test if we drop delete_maybe() call in unpin().
226: [ RUN ] ECExtentCache.test_invalidate
226: /home/kefu/dev/ceph/src/osd/ECExtentCache.cc: In function 'void ECExtentCache::on_change2() const' thread 7f3e21c00200 time 2025-06-21T06:43:37.436513+0800
226: /home/kefu/dev/ceph/src/osd/ECExtentCache.cc: 335: FAILED ceph_assert(objects.empty())
226: ceph version Development (no_version) tentacle (dev - Debug)
226: 1: (ceph::ClibBackTrace::ClibBackTrace(int)+0x121) [0x7f3e1f4626b9]
226: 2: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x443) [0x7f3e1f45dc85]
226: 3: (ECExtentCache::on_change2() const+0x76) [0x55a174bc9550]
226: 4: (ECExtentCache_test_invalidate_Test::TestBody()+0x1f0a) [0x55a174a8950e]
226: 5: (void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0xa4) [0x55a174b8e8be]
226: 6: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0xfe) [0x55a174b79bd5]
226: 7: (testing::Test::Run()+0x173) [0x55a174b25ca3]
226: 8: (testing::TestInfo::Run()+0x396) [0x55a174b27202]
226: 9: (testing::TestSuite::Run()+0x367) [0x55a174b283ab]
226: 10: (testing::internal::UnitTestImpl::RunAllTests()+0x8c8) [0x55a174b4980e]
226: 11: (bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0xa4) [0x55a174b92839]
226: 12: (bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0xfe) [0x55a174b7c7b1]
226: 13: (testing::UnitTest::Run()+0x178) [0x55a174b46058]
226: 14: (RUN_ALL_TESTS()+0x11) [0x55a174acfae3]
226: 15: main()
226: 16: /usr/lib/libc.so.6(+0x276b5) [0x7f3e1c8376b5]
226: 17: __libc_start_main()
226: 18: _start()
2/2 Test #226: unittest_extent_cache ............Subprocess aborted***Exception: 0.09 sec
or, probably what you suggest was to remove it from Object::erase_line() ?
There was a problem hiding this comment.
Thanks. I am investigating the failure on my branch.
There was a problem hiding this comment.
I don't get this same failure in my branch, which does contain other fixes to the extent cache. You may well be hitting another failure.
I ran valgrind against the same test you were running and I don't see any use-after-free errors in my code anymore.
Can we wait until my PR has gone in and address this then?
There was a problem hiding this comment.
I don't get this same failure in my branch, which does contain other fixes to the extent cache. You may well be hitting another failure.
I ran valgrind against the same test you were running and I don't see any use-after-free errors in my code anymore.
i am using ASan. probably, as you suggest, some other fixes already address it.
Can we wait until my PR has gone in and address this then?
sure. this PR is not a blocker at this moment. #56537 is pending on some other PRs as well.
Prevent use-after-free errors when clearing cache lines by preserving member variables and verifying Object existence before cleanup. When clearing the last cache line referencing an Object, the Object may be automatically removed from the cache and deleted. Calling delete_maybe() on a deleted Object would result in a use-after-free error. This change: - Preserves necessary member variables on the stack before lines.clear() - Queries the cache with the Object's oid to check it still exists - Only calls delete_maybe() if the Object is still present in cache This ensures safe cleanup by avoiding operations on potentially deleted Objects while maintaining proper cache management. Signed-off-by: Kefu Chai <tchaikov@gmail.com>
caf9014 to
9a6426e
Compare
|
closing in favor of #63873. |
Prevent use-after-free errors when clearing cache lines by preserving member variables and verifying Object existence before cleanup.
When clearing the last cache line referencing an Object, the Object may be automatically removed from the cache and deleted. Calling delete_maybe() on a deleted Object would result in a use-after-free error.
This change:
This ensures safe cleanup by avoiding operations on potentially deleted Objects while maintaining proper cache management.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job Definition