Skip to content

Conversation

@nhorman
Copy link
Contributor

@nhorman nhorman commented Aug 7, 2025

The newly added handshake-memfail test has flagged several asan and assert errors that arise from various memory failure conditions. This PR addresses those so that the test fails gracefully instead of crashing.

The new malloc failure test caught an asan error in this code:
Direct leak of 40 byte(s) in 1 object(s) allocated from:
2025-08-07T03:22:20.3655117Z     #0 0x7fb88d8fd9c7 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
2025-08-07T03:22:20.3655796Z     #1 0x5584f0e4670a in CRYPTO_malloc crypto/mem.c:211
2025-08-07T03:22:20.3656291Z     #2 0x5584f0e4679d in CRYPTO_zalloc crypto/mem.c:231
2025-08-07T03:22:20.3657040Z     #3 0x5584f11c4c10 in EVP_RAND_CTX_new crypto/evp/evp_rand.c:353
2025-08-07T03:22:20.3657656Z     openssl#4 0x5584f0e93b27 in rand_new_drbg crypto/rand/rand_lib.c:666
2025-08-07T03:22:20.3658289Z     openssl#5 0x5584f0e949d0 in rand_get0_public crypto/rand/rand_lib.c:843
2025-08-07T03:22:20.3658914Z     openssl#6 0x5584f0e9305b in RAND_bytes_ex crypto/rand/rand_lib.c:490
2025-08-07T03:22:20.3659486Z     openssl#7 0x5584f0b2405f in SSL_CTX_new_ex ssl/ssl_lib.c:4191
2025-08-07T03:22:20.3660183Z     openssl#8 0x5584f0ae313c in create_ssl_ctx_pair test/helpers/ssltestlib.c:958
2025-08-07T03:22:20.3660871Z     openssl#9 0x5584f0adeaf6 in do_handshake test/handshake-memfail.c:56
2025-08-07T03:22:20.3661539Z     openssl#10 0x5584f0adee50 in test_alloc_failures test/handshake-memfail.c:125
2025-08-07T03:22:20.3662161Z     openssl#11 0x5584f0cd9da8 in run_tests test/testutil/driver.c:342
2025-08-07T03:22:20.3662664Z     openssl#12 0x5584f0cda9e5 in main test/testutil/main.c:31
2025-08-07T03:22:20.3663450Z     openssl#13 0x7fb88d42a1c9  (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 282c2c16e7b6600b0b22ea0c99010d2795752b5f)
2025-08-07T03:22:20.3664630Z     openssl#14 0x7fb88d42a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 282c2c16e7b6600b0b22ea0c99010d2795752b5f)
2025-08-07T03:22:20.3666608Z     openssl#15 0x5584f0ade864 in _start (/home/runner/work/openssl/openssl/test/handshake-memfail+0x22a864) (BuildId: 19659a44d8bed2c082918d25425f77e3a98df534)

It occurs because when rand_get0_public/rand_get0_private sets an
EVP_RAND_CTX object in its thread local storage, it neglects to check
the return code of the operation, which may fail when the associated
sparse array is expanded.

fix it by checking the return code and failing the get0_[public|private]
operation so the failure is graceful.

Fixes openssl/project#1315
@nhorman nhorman self-assigned this Aug 7, 2025
@nhorman nhorman marked this pull request as draft August 7, 2025 14:24
@nhorman nhorman linked an issue Aug 7, 2025 that may be closed by this pull request
during memfail testing:
https://github.com/openssl/openssl/actions/runs/16794088536/job/47561223902

We get lots of test failures in ossl_rcu_read_lock.  This occurs
because we have a few cases in the read lock path that attempt mallocs,
which, if they fail, trigger an assert or a silent failure, which isn't
really appropriate.  We should instead fail gracefully, by informing the
caller that the lock failed, like we do for CRYPTO_THREAD_read_lock.

Fortunately, these are all internal apis, so we can convert
ossl_rcu_read_lock to return an int indicating success/failure, and fail
gracefully during the test, rather than hitting an assert abort.

Fixes openssl/project#1315
@nhorman nhorman force-pushed the fix-memfail-asan-issues branch from 393062b to aa70a06 Compare August 7, 2025 14:28
@nhorman nhorman added branch: master Applies to master branch approval: review pending This pull request needs review by a committer labels Aug 7, 2025
@github-actions github-actions bot added the severity: fips change The pull request changes FIPS provider sources label Aug 7, 2025
@nhorman nhorman marked this pull request as ready for review August 7, 2025 14:39
@vavroch2010 vavroch2010 requested review from a team and vdukhovni August 7, 2025 15:11
Copy link
Member

@beldmit beldmit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@InfoHunter InfoHunter added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels Aug 8, 2025
case DO_LOOKUP:
ossl_ht_read_lock(m_ht);
if (!ossl_ht_read_lock(m_ht))
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to fail here? i.e. set worker_exits[num] and return?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could, but we don't have to. The lookups in this test just find an element in the hash table and confirm that its supposed to be there. Failure to get a lock in this case just represents a transient failure that doesn't impact the outcome of what the test is actually testing. It would be a problem if we could never get the lock for the duration of the test, that might be a problem, as it means we never tested a lookup operation. But a single failure isn't really a problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this ever fail? Are there legitimate reasons why it might do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other than a transient malloc failure outside of the memfail test, no, I can't think of any. But a transient memory allocation failure on a heavily loaded system seems like it might be possible.

@openssl-machine openssl-machine added approval: ready to merge The 24 hour grace period has passed, ready to merge and removed approval: done This pull request has the required number of approvals labels Aug 9, 2025
@openssl-machine
Copy link
Collaborator

This pull request is ready to merge

@nhorman
Copy link
Contributor Author

nhorman commented Aug 9, 2025

merged to master, thank you!

@nhorman nhorman closed this Aug 9, 2025
openssl-machine pushed a commit that referenced this pull request Aug 9, 2025
The new malloc failure test caught an asan error in this code:
Direct leak of 40 byte(s) in 1 object(s) allocated from:
2025-08-07T03:22:20.3655117Z     #0 0x7fb88d8fd9c7 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
2025-08-07T03:22:20.3655796Z     #1 0x5584f0e4670a in CRYPTO_malloc crypto/mem.c:211
2025-08-07T03:22:20.3656291Z     #2 0x5584f0e4679d in CRYPTO_zalloc crypto/mem.c:231
2025-08-07T03:22:20.3657040Z     #3 0x5584f11c4c10 in EVP_RAND_CTX_new crypto/evp/evp_rand.c:353
2025-08-07T03:22:20.3657656Z     #4 0x5584f0e93b27 in rand_new_drbg crypto/rand/rand_lib.c:666
2025-08-07T03:22:20.3658289Z     #5 0x5584f0e949d0 in rand_get0_public crypto/rand/rand_lib.c:843
2025-08-07T03:22:20.3658914Z     #6 0x5584f0e9305b in RAND_bytes_ex crypto/rand/rand_lib.c:490
2025-08-07T03:22:20.3659486Z     #7 0x5584f0b2405f in SSL_CTX_new_ex ssl/ssl_lib.c:4191
2025-08-07T03:22:20.3660183Z     #8 0x5584f0ae313c in create_ssl_ctx_pair test/helpers/ssltestlib.c:958
2025-08-07T03:22:20.3660871Z     #9 0x5584f0adeaf6 in do_handshake test/handshake-memfail.c:56
2025-08-07T03:22:20.3661539Z     #10 0x5584f0adee50 in test_alloc_failures test/handshake-memfail.c:125
2025-08-07T03:22:20.3662161Z     #11 0x5584f0cd9da8 in run_tests test/testutil/driver.c:342
2025-08-07T03:22:20.3662664Z     #12 0x5584f0cda9e5 in main test/testutil/main.c:31
2025-08-07T03:22:20.3663450Z     #13 0x7fb88d42a1c9  (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 282c2c16e7b6600b0b22ea0c99010d2795752b5f)
2025-08-07T03:22:20.3664630Z     #14 0x7fb88d42a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 282c2c16e7b6600b0b22ea0c99010d2795752b5f)
2025-08-07T03:22:20.3666608Z     #15 0x5584f0ade864 in _start (/home/runner/work/openssl/openssl/test/handshake-memfail+0x22a864) (BuildId: 19659a44d8bed2c082918d25425f77e3a98df534)

It occurs because when rand_get0_public/rand_get0_private sets an
EVP_RAND_CTX object in its thread local storage, it neglects to check
the return code of the operation, which may fail when the associated
sparse array is expanded.

fix it by checking the return code and failing the get0_[public|private]
operation so the failure is graceful.

Fixes openssl/project#1315

Reviewed-by: Paul Yang <paulyang.inf@gmail.com>
Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com>
Reviewed-by: Paul Dale <ppzgs1@gmail.com>
(Merged from #28195)
openssl-machine pushed a commit that referenced this pull request Aug 9, 2025
during memfail testing:
https://github.com/openssl/openssl/actions/runs/16794088536/job/47561223902

We get lots of test failures in ossl_rcu_read_lock.  This occurs
because we have a few cases in the read lock path that attempt mallocs,
which, if they fail, trigger an assert or a silent failure, which isn't
really appropriate.  We should instead fail gracefully, by informing the
caller that the lock failed, like we do for CRYPTO_THREAD_read_lock.

Fortunately, these are all internal apis, so we can convert
ossl_rcu_read_lock to return an int indicating success/failure, and fail
gracefully during the test, rather than hitting an assert abort.

Fixes openssl/project#1315

Reviewed-by: Paul Yang <paulyang.inf@gmail.com>
Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com>
Reviewed-by: Paul Dale <ppzgs1@gmail.com>
(Merged from #28195)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval: ready to merge The 24 hour grace period has passed, ready to merge branch: master Applies to master branch severity: fips change The pull request changes FIPS provider sources

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix memfail test asan issue

9 participants