erasure-code/isa: fix cache collision causing buffer overflow#66894
erasure-code/isa: fix cache collision causing buffer overflow#66894NitzanMordhai merged 1 commit intoceph:mainfrom
Conversation
12069bf to
8d27247
Compare
351c1f5 to
044abd8
Compare
This commit fixes a critical cache key collision bug in the ISA erasure
code plugin that could lead to heap-buffer-overflow and silent data
corruption.
Problem:
--------
The decoding table cache was indexed only by matrix type and erasure
signature (available/missing chunk pattern), but did NOT include the
(k,m) erasure code configuration parameters. This caused different EC
configurations with similar erasure patterns to collide in the cache,
leading to incorrectly-sized cached buffers being reused.
AddressSanitizer Report:
------------------------
==4904==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x5160001397b8 at pc 0x5de8e415296b bp 0x7ffc82260310 sp 0x7ffc8225fad0
READ of size 576 at 0x5160001397b8 thread T0
#0 __asan_memcpy
#1 ErasureCodeIsaTableCache::getDecodingTableFromCache()
.../ErasureCodeIsaTableCache.cc:260:5
#2 ErasureCodeIsaDefault::isa_decode()
.../ErasureCodeIsa.cc:490:15
0x5160001397b8 is located 0 bytes after 568-byte region
[0x516000139580,0x5160001397b8) allocated by:
#0 posix_memalign
#1 ceph::buffer::raw_combined::alloc_data_n_controlblock()
#2 ErasureCodeIsaTableCache::putDecodingTableToCache()
.../ErasureCodeIsaTableCache.cc:319:18
Root Cause:
-----------
Scenario illustrating the bug:
1. First decode operation: k=2, m=1, erasure pattern "+0+2-1"
- Creates cache entry with key "+0+2-1"
- Allocates buffer: 2*(1+2)*32 = 192 bytes
2. Second decode operation: k=3, m=3, same erasure pattern "+0+2-1"
- Looks up cache with key "+0+2-1" → COLLISION
- Retrieves 192-byte buffer but needs 3*(3+3)*32 = 576 bytes
- Result: Heap-buffer-overflow (reads 384 bytes beyond allocation)
Worse scenario (silent corruption):
1. First decode: k=3, m=3 → caches 576-byte table
2. Second decode: k=2, m=1 → retrieves wrong table
- Uses incorrect decoding matrix
- Result: Silent data corruption during recovery
Solution:
---------
Include k and m parameters in cache signature
- Old format: "+0+2+3-1-4"
- New format: "k3m2a+0+2+3e-1-4"
Test Fix:
---------
Also fixes a buffer overflow in TestErasureCodePlugins.cc where
hashes_bl offset was calculated using chunk_size instead of sizeof(uint32_t),
causing reads beyond the CRC buffer.
Production Impact:
------------------
Backward Compatibility: FULLY COMPATIBLE
- Cache is ephemeral (in-memory only, not persisted)
- Cache cleared on process restart
- Rolling upgrades safe - each OSD restart gets fixed code
- Old cache entries automatically invalidated on upgrade
- No wire protocol or on-disk format changes
- No configuration changes required
- No breaking changes
Data Integrity:
- Eliminates silent data corruption risk
- Eliminates heap-buffer-overflow crashes
- Cache now correctly isolated by (k,m) configuration
- Correct decoding tables always used for recovery
- No risk of corrupting user data from the fix itself
Why Users Haven't Complained:
------------------------------
Several factors likely prevented widespread reports:
1. Low probability conditions required:
- Need multiple EC pools with DIFFERENT (k,m) configurations
- Need similar erasure patterns across pools
- Need cache collision to occur during actual recovery operations
- Recovery operations are relatively rare in healthy clusters
2. Crash vs silent corruption detection:
- Buffer overflows (easier to detect) occur when k2,m2 > k1,m1
- Silent corruption (harder to detect) occurs when k2,m2 < k1,m1
- Crashes might be attributed to other causes
- Data corruption only detected during scrub or data verification
3. Common deployment patterns:
- Many deployments use single EC configuration cluster-wide
- Default EC configurations (k=2,m=1 or k=4,m=2) reduce collision space
- Erasure pattern variety may be insufficient for collisions
4. ISA plugin usage:
- Not universally deployed (requires Intel ISA-L library)
- Some sites use jerasure plugin instead
- Plugin selection depends on hardware and configuration
5. Detection difficulty:
- ASan not enabled in production builds
- Silent corruption only appears during:
* Degraded reads with recovery
* Scrub operations
* Deep-scrub verification
- Corrupted data might not be immediately accessed
Fixes: https://tracker.ceph.com/issues/74382
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
044abd8 to
7e66871
Compare
|
jenkins test api |
|
jenkins test make check |
|
jenkins test make check arm64 |
|
jenkins test make check arm64 |
1 similar comment
|
jenkins test make check arm64 |
|
@ceph/core Hello maintainers, could you please review this PR when you have a chance? I believe this addresses a critical issue that may warrant prioritized attention. Thank you! |
| } | ||
| } | ||
|
|
||
| std::string erasure_signature(sig_buf); |
There was a problem hiding this comment.
It's all about the uniqueness across instances of ErasureCodeIsaTableCache. The factory class ErasureCodePluginIsa has exactly one shared among all created ErasureCodeIsaDefault instances.
class ErasureCodePluginIsa : public ceph::ErasureCodePlugin {
public:
ErasureCodeIsaTableCache tcache;
int factory(const std::string &directory,
ceph::ErasureCodeProfile &profile,
ceph::ErasureCodeInterfaceRef *erasure_code,
std::ostream *ss) override;
};int ErasureCodePluginIsa::factory(const std::string &directory,
ceph::ErasureCodeProfile &profile,
ceph::ErasureCodeInterfaceRef *erasure_code,
std::ostream *ss)
{
ErasureCodeIsa *interface;
std::string technique;
technique = profile.find("technique")->second;
std::string _m = profile.find("m")->second;
if ((technique == "reed_sol_van")) {
interface = new ErasureCodeIsaDefault(tcache,
technique,
ErasureCodeIsaDefault::kVandermonde,
_m);
} else if ((technique == "cauchy")) {
interface = new ErasureCodeIsaDefault(tcache,
technique,
ErasureCodeIsaDefault::kCauchy,
_m);
} else {
*ss << "technique=" << technique << " is not a valid coding technique. "
<< " Choose one of the following: "
<< "reed_sol_van,"
<< "cauchy" << std::endl;
return -ENOENT;
}
int r = interface->init(profile, ss);
if (r) {
delete interface;
return r;
}
*erasure_code = ceph::ErasureCodeInterfaceRef(interface);
return 0;
}The alternative solution could be to grow the number of cache instances available.
The bug seems to very old, dating back to at least 2014.
Other, non-Jerasure plugins might be affected as well.
There was a problem hiding this comment.
The bug seems to very old, dating back to at least 2014.
please see the tracker ticket, this issue was introduced in the day 0, so all branches are impacted.
Other, non-Jerasure plugins might be affected as well.
as you explained above and also see the tracker ticket, this caching machinery is specific to the ISA erasure code plugin. so only ISA erasure code plugin is impacted.
There was a problem hiding this comment.
this caching machinery is specific to the ISA erasure code plugin. so only ISA erasure code plugin is impacted.
Yeah, Shec also has a global tcache but the naming schema is different than for ISA.
There was a problem hiding this comment.
indeed. what shec's ErasureCodeShecTableCache is using is a combination of techinique, k, m, c and w. so it's immune to this problem.
see
|
I have been trying to trigger the data corruption bug on a vstart cluster, but have not succeeded yet. A couple of points I noticed that I think reduce the chance of this bug being hit:
|
This commit fixes a critical cache key collision bug in the ISA erasure code plugin that could lead to heap-buffer-overflow and silent data corruption.
Problem:
The decoding table cache was indexed only by matrix type and erasure signature (available/missing chunk pattern), but did NOT include the (k,m) erasure code configuration parameters. This caused different EC configurations with similar erasure patterns to collide in the cache, leading to incorrectly-sized cached buffers being reused.
AddressSanitizer Report:
Root Cause:
Scenario illustrating the bug:
Worse scenario (silent corruption):
Solution:
Include k and m parameters in cache signature
Test Fix:
Also fixes a buffer overflow in TestErasureCodePlugins.cc where hashes_bl offset was calculated using chunk_size instead of sizeof(uint32_t), causing reads beyond the CRC buffer.
Production Impact:
Backward Compatibility: FULLY COMPATIBLE
Data Integrity: SIGNIFICANTLY IMPROVED
Performance: IMPROVED
Why Users Haven't Complained:
Several factors likely prevented widespread reports:
Low probability conditions required:
Crash vs silent corruption detection:
Common deployment patterns:
ISA plugin usage:
Detection difficulty:
Fixes: https://tracker.ceph.com/issues/74382
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job DefinitionYou must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.