Bug #74382: ISA Erasure Code Cache Collision Causing Buffer Overflow and Data Corruption - RADOS - Ceph

open

Added by Kefu Chai 2 months ago. Updated about 1 month ago.

Status:

Pending Backport

Priority:

Normal

Assignee:

Kefu Chai

Category:

EC Pools

Target version:

% Done:

Source:

Backport:

reef,squid,tentacle

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

Ceph - v10.0.4, Ceph - v19.0.0, Ceph - v0.94.10, Ceph - v0.94.11, Ceph - v0.94.2, Ceph - v0.94.3, Ceph - v0.94.4, Ceph - v0.94.5, Ceph - v0.94.6, Ceph - v0.94.7, Ceph - v0.94.8, Ceph - v0.94.9, Ceph - v10.0.0, Ceph - v10.1.1, Ceph - v10.2.0, Ceph - v10.2.1, Ceph - v10.2.10, Ceph - v10.2.11, Ceph - v10.2.12, Ceph - v10.2.2, Ceph - v10.2.3, Ceph - v10.2.4, Ceph - v10.2.5, Ceph - v10.2.6, Ceph - v10.2.7, Ceph - v10.2.8, Ceph - v10.2.9, Ceph - v11.1.0, Ceph - v11.2.0, Ceph - v11.2.1, Ceph - v11.2.2, Ceph - v12.0.0, Ceph - v12.1.0, Ceph - v12.2.0, Ceph - v12.2.1, Ceph - v12.2.10, Ceph - v12.2.11, Ceph - v12.2.12, Ceph - v12.2.13, Ceph - v12.2.14, Ceph - v12.2.2, Ceph - v12.2.3, Ceph - v12.2.4, Ceph - v12.2.5, Ceph - v12.2.6, Ceph - v12.2.7, Ceph - v12.2.8, Ceph - v12.2.9, Ceph - v13.0.0, Ceph - v13.2.0, Ceph - v13.2.1, Ceph - v13.2.10, Ceph - v13.2.11, Ceph - v13.2.2, Ceph - v13.2.3, Ceph - v13.2.4, Ceph - v13.2.5, Ceph - v13.2.6, Ceph - v13.2.7, Ceph - v13.2.8, Ceph - v13.2.9, Ceph - v14.0.0, Ceph - v14.2.0, Ceph - v14.2.1, Ceph - v14.2.10, Ceph - v14.2.11, Ceph - v14.2.12, Ceph - v14.2.13, Ceph - v14.2.14, Ceph - v14.2.15, Ceph - v14.2.16, Ceph - v14.2.17, Ceph - v14.2.18, Ceph - v14.2.19, Ceph - v14.2.2, Ceph - v14.2.20, Ceph - v14.2.21, Ceph - v14.2.22, Ceph - v14.2.23, Ceph - v14.2.3, Ceph - v14.2.4, Ceph - v14.2.5, Ceph - v14.2.6, Ceph - v14.2.7, Ceph - v14.2.8, Ceph - v14.2.9, Ceph - v15.0.0, Ceph - v15.2.1, Ceph - v15.2.10, Ceph - v15.2.11, Ceph - v15.2.12, Ceph - v15.2.13, Ceph - v15.2.14, Ceph - v15.2.15, Ceph - v15.2.16, Ceph - v15.2.17, Ceph - v15.2.2, Ceph - v15.2.3, Ceph - v15.2.4, Ceph - v15.2.5, Ceph - v15.2.6, Ceph - v15.2.7, Ceph - v15.2.8, Ceph - v15.2.9, Ceph - v16.0.0, Ceph - v16.0.1, Ceph - v16.1.0, Ceph - v16.1.1, Ceph - v16.2.0, Ceph - v16.2.1, Ceph - v16.2.10, Ceph - v16.2.11, Ceph - v16.2.12, Ceph - v16.2.13, Ceph - v16.2.14, Ceph - v16.2.15, Ceph - v16.2.2, Ceph - v16.2.3, Ceph - v16.2.4, Ceph - v16.2.5, Ceph - v16.2.6, Ceph - v16.2.7, Ceph - 
v16.2.8, Ceph - v16.2.9, Ceph - v17.0.0, Ceph - v17.2.1, Ceph - v17.2.2, Ceph - v17.2.3, Ceph - v17.2.4, v17.2.4, Ceph - v17.2.5, Ceph - v17.2.6, v17.2.6, Ceph - v17.2.7, Ceph - v17.2.8, Ceph - v18.0.0, Ceph - v18.1.0, Ceph - v18.1.1, Ceph - v18.1.2, Ceph - v18.1.3, Ceph - v18.2.0, Ceph - v18.2.1, Ceph - v18.2.2, Ceph - v18.2.3, Ceph - v18.2.4, Ceph - v18.2.5, Ceph - v18.2.6, Ceph - v18.2.7, Ceph - v18.2.8, Ceph - v19.1.0, Ceph - v19.1.1, Ceph - v19.2.0, Ceph - v19.2.1, Ceph - v19.2.2, Ceph - v19.2.3, Ceph - v19.2.4, Ceph - v20.0.0, Ceph - v20.2.0, Ceph - v20.2.1, Ceph - v21.0.0, Ceph - v9.1.1, Ceph - v9.2.1, Ceph - v9.2.2

ceph-qa-suite:

Component(RADOS):

Pull request ID:

66894

Tags (freeform):

backport_processed

Merge Commit:

b549669973b8186b6bb10f59a6108a571a3e44e1

Fixed In:

v20.3.0-5335-gb549669973

Released In:

Upkeep Timestamp:

2026-02-15T15:22:01+00:00

Description

The ISA erasure code plugin has a critical cache key collision bug that can cause heap-buffer-overflow crashes and silent data corruption during recovery operations. The decoding table cache does not include the (k,m) parameters in its cache key, so different erasure code configurations can collide and reuse incorrectly sized cached buffers.

Problem

The ISA erasure code plugin caches decoding tables to improve performance during data recovery operations. The cache key is constructed from:
- Matrix type (Cauchy or Vandermonde)
- Erasure signature (pattern of available/missing chunks)

However, the cache key does not include the (k,m) erasure code configuration parameters. This allows different EC configurations with similar erasure patterns to collide in the cache, causing:

1. Buffer overflow crashes when a smaller cached buffer is accessed with a larger size
2. Silent data corruption when wrong decoding matrices are used for recovery

Root Cause

In ErasureCodeIsa.cc, the isa_decode() function constructs the cache signature as:

std::string erasure_signature;
for (i = 0, r = 0; i < k; i++, r++) {
    // ... adds "+X" for available chunks
}
for (int p = 0; p < nerrs; p++) {
    // ... adds "-Y" for missing chunks
}

The signature includes only the chunk availability pattern (e.g., "+0+2+3-1-4") but not the k,m values. Since the decoding table size is `k * (m + k) * 32` bytes, different (k,m) configurations produce different-sized tables.
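
As a rough illustration of the two points above, the following standalone sketch rebuilds a signature string and computes the table size. The names `build_signature` and `decoding_table_size` are hypothetical and do not exist in Ceph. The step that skips erased indices is an assumption inferred from the comments in the excerpt, not a copy of the elided code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper modelled on the two loops quoted above: the first k
// surviving chunk indices are appended as "+X", then each erased chunk
// index as "-Y". Neither k nor m is encoded explicitly in the result.
std::string build_signature(int k, const std::vector<int>& erasures) {
    assert(k > 0);
    std::string sig;
    for (int i = 0, r = 0; i < k; i++, r++) {
        // Assumption: erased indices are skipped to find the next survivor.
        while (std::find(erasures.begin(), erasures.end(), r) != erasures.end())
            r++;
        sig += "+" + std::to_string(r);
    }
    for (int e : erasures)
        sig += "-" + std::to_string(e);
    return sig;
}

// Decoding table size in bytes, using the k * (m + k) * 32 formula above.
std::size_t decoding_table_size(int k, int m) {
    assert(k > 0 && m > 0);
    return static_cast<std::size_t>(k) * (m + k) * 32;
}
```

With these helpers, `build_signature(2, {1})` yields "+0+2-1", while `decoding_table_size(2, 1)` and `decoding_table_size(3, 3)` give 192 and 576 bytes respectively, which are the sizes used in the scenarios below.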

Exploit Scenario 1: Buffer Overflow (Crash)

First decode operation with k=2, m=1:
- Erasure pattern: "+0+2-1" (chunks 0,2 available, chunk 1 missing)
- Cache key: "+0+2-1"
- Buffer allocated: 2 * (1+2) * 32 = 192 bytes

Second decode operation with k=3, m=3:
- Same erasure pattern: "+0+2-1" (chunks 0,2 available, chunk 1 missing)
- Cache key lookup: "+0+2-1" → COLLISION!
- Retrieves 192-byte buffer
- Attempts to copy: 3 * (3+3) * 32 = 576 bytes
- Result: Heap-buffer-overflow, reads 384 bytes beyond allocation
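
The arithmetic in this scenario can be replayed with a small model of a cache keyed by signature alone. This is only a sketch: `SignatureCache`, `overread_bytes`, and `replay_scenario_1` are made-up names, and the real cache stores buffer objects rather than plain vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the decoding table cache: keyed by the erasure
// signature only, as described in this report.
using SignatureCache = std::map<std::string, std::vector<unsigned char>>;

// Returns how many bytes a copy of `wanted` bytes would read past the end
// of the cached entry for `sig` (0 if the entry is large enough or absent).
std::size_t overread_bytes(const SignatureCache& cache,
                           const std::string& sig, std::size_t wanted) {
    auto it = cache.find(sig);
    if (it == cache.end())
        return 0;  // cache miss: nothing would be copied
    const std::size_t have = it->second.size();
    return wanted > have ? wanted - have : 0;
}

// Replays Scenario 1 with the sizes quoted above.
std::size_t replay_scenario_1() {
    SignatureCache cache;
    cache["+0+2-1"] = std::vector<unsigned char>(192);  // stored for k=2, m=1
    return overread_bytes(cache, "+0+2-1", 576);        // requested for k=3, m=3
}
```

Here `replay_scenario_1()` returns 384, the number of bytes read beyond the allocation in the description above.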

Exploit Scenario 2: Silent Data Corruption (Worse)

First decode operation with k=3, m=3:
- Cache key: "+0+2+3-1-4"
- Stores 576-byte decoding table for k=3, m=3

Second decode operation with k=2, m=1:
- Same cache key: "+0+2+3-1-4" → COLLISION!
- Retrieves decoding table for k=3, m=3
- Uses incorrect matrix to decode k=2, m=1 data
- Result: Silent data corruption, wrong data recovered
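
Both scenarios depend on the key omitting (k,m). One plausible way to rule them out is to make the parameters part of the key, sketched below. This is an illustration only and not necessarily what the linked pull request does; `CacheKey`, `TableCache`, `put_table`, and `find_table` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical composite key: (k, m, signature). Not taken from the Ceph
// sources; it only illustrates adding the missing parameters to the key.
using CacheKey = std::tuple<int, int, std::string>;
using TableCache = std::map<CacheKey, std::vector<unsigned char>>;

// Stores a decoding table under the full (k, m, signature) key.
void put_table(TableCache& cache, int k, int m, const std::string& sig,
               std::vector<unsigned char> table) {
    assert(k > 0 && m > 0);
    cache[CacheKey{k, m, sig}] = std::move(table);
}

// Looks up a table; returns nullptr when no entry exists for this exact
// combination of k, m, and signature.
const std::vector<unsigned char>* find_table(const TableCache& cache, int k,
                                             int m, const std::string& sig) {
    auto it = cache.find(CacheKey{k, m, sig});
    return it == cache.end() ? nullptr : &it->second;
}
```

With such a key, a table stored for k=3, m=3 under "+0+2+3-1-4" is simply a miss when looked up for k=2, m=1, so the decoder would rebuild the table instead of reusing the wrong one.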

Test Case

A test case demonstrating the issue is available in `src/test/erasure-code/TestErasureCodePlugins.cc`. Running it in a build with AddressSanitizer enabled:

ctest -R unittest_erasure_code_plugins --verbose

produces:

==4904==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x5160001397b8 at pc 0x5de8e415296b bp 0x7ffc82260310 sp 0x7ffc8225fad0
READ of size 576 at 0x5160001397b8 thread T0
    #0 __asan_memcpy
    #1 ErasureCodeIsaTableCache::getDecodingTableFromCache()
       src/erasure-code/isa/ErasureCodeIsaTableCache.cc:260:5
    #2 ErasureCodeIsaDefault::isa_decode()
       src/erasure-code/isa/ErasureCodeIsa.cc:490:15

0x5160001397b8 is located 0 bytes after 568-byte region
[0x516000139580,0x5160001397b8) allocated by:
    #0 posix_memalign
    #1 ceph::buffer::raw_combined::alloc_data_n_controlblock()
    #2 ErasureCodeIsaTableCache::putDecodingTableToCache()
       src/erasure-code/isa/ErasureCodeIsaTableCache.cc:319:18
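
The trace shows a `memcpy` of 576 bytes out of a smaller cached region. As a defensive measure alongside fixing the key, a copy could first compare sizes and fall back to a cache miss on mismatch. The sketch below assumes a plain byte vector for the cached table and a hypothetical function name `copy_table_checked`; it is not code from `ErasureCodeIsaTableCache.cc`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical guard: copies a cached table into `dst` only when the cached
// size matches the size the caller expects. Returns false otherwise, so the
// caller can treat the lookup as a cache miss and rebuild the table.
bool copy_table_checked(const std::vector<unsigned char>& cached,
                        unsigned char* dst, std::size_t expected_size) {
    assert(dst != nullptr);
    if (cached.size() != expected_size)
        return false;  // size mismatch: refuse to copy
    std::memcpy(dst, cached.data(), expected_size);
    return true;
}
```

A check like this would turn the overflow into a recoverable miss, but it does not replace making the cache key unambiguous.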