ERR: Make CRYPTO_malloc() and friends report ERR_R_MALLOC_FAILURE by DDvO · Pull Request #14833 · openssl/openssl

DDvO · 2021-04-12T08:32:39Z

This proposes a simple fix of #6251. Maybe this is already sufficient?
It also takes into account thoughts given in #5781 and #6217.

t8m · 2021-04-12T09:42:00Z

You'd have to clean up all the calls where ERR_R_MALLOC_FAILURE is raised otherwise there would be double error reporting wouldn't it?

crypto/mem.c

DDvO · 2021-04-12T10:02:28Z

You'd have to clean up all the calls where ERR_R_MALLOC_FAILURE is raised otherwise there would be double error reporting wouldn't it?

Yes, but I'd spend this major effort only if/when the core solution is firmly accepted.

Marking this PR as WIP for now, and post-3.0.0.

DDvO · 2021-04-12T11:32:27Z

My goodness.
For testing purposes, I injected pseudo-malloc failures after a given number of {m,re}alloc calls
in the beginning of report_malloc_failure() as follows:

    static int count = 31000;
    if (--count < 0)
        result = NULL;

Then, for instance make -j4 -s test TESTS="test_cms" fails.
When using 32000, this works fine.
So running an innocent OpenSSL app can cause more than 31000 memory allocation calls!

Raising the limit to 5300000 still causes make test to fail with this outcome:

Test Summary Report
-------------------
99-test_fuzz_server.t            (Wstat: 256 Tests: 2 Failed: 1)
  Failed test:  2
  Non-zero exit status: 1
99-test_fuzz_asn1.t              (Wstat: 256 Tests: 2 Failed: 1)
  Failed test:  2
  Non-zero exit status: 1
04-test_encoder_decoder.t        (Wstat: 256 Tests: 1 Failed: 1)
  Failed test:  1
  Non-zero exit status: 1

while for 5400000 only the above two fuzz tests fail.

richsalz · 2021-04-12T14:23:28Z

Suppose OPENSSL_malloc tries to raise an error condition, and the alloc at line 615 of err.c gets invoked. Infinite recursion I think.

DDvO · 2021-04-12T14:33:18Z

Suppose OPENSSL_malloc tries to raise an error condition, and the alloc at line 615 of err.c gets invoked. Infinite recursion I think.

That's what the following code is for:

        /* prevent error-in-error-handler malloc loop */
        if (len < 6
            ? strcmp(file, "err.c") != 0
            : strcmp(file + len - 6, "/err.c") != 0) {

richsalz · 2021-04-12T14:44:49Z

And you're sure all of the other possible "init" functions can't cause a loop? I guess not since they'll end up falling through to that err code. okay. thanks.

davidben · 2021-04-12T15:07:11Z

Conditioning based on the name of the file seems rather fragile. If __FILE__ behaves funny on some system, it will break. If someone renames err.c without realizing the malloc implementation has this check, it will break. If some other file matches the name check, it will break. Moreover, when it breaks, the problem is only visible on malloc failure, so it will go unnoticed for a long time. I would suggest just sticking to the straightforward thing and just making two functions.

This also isn't quite how you check for malloc failure. Some malloc implementations return NULL on zero, so you need to check both values.

richsalz · 2021-04-12T15:16:29Z

somebody has to be at the bottom; either malloc or error-stuff. making them interdependent, which this PR (as of this writing) does is like David said, fragile.

It's a pain to report malloc failures all over the place. This fix seems wrong.

DDvO · 2021-04-12T15:44:34Z

I had a closer look at what is potentially called through

            ERR_new();
            ERR_set_debug(file, line, NULL);
            ERR_set_error(ERR_LIB_CRYPTO, ERR_R_MALLOC_FAILURE, NULL);

and it's a whole mess of stuff, where as @richsalz wrote one cannot really be sure that this does not include any indirect memory allocation calls.
So indeed a different solution is needed.

DDvO · 2021-04-12T16:03:19Z

I'm now considering options to use a pre-allocated place that is reserved per thread
solely for the purpose of reporting the first occurrence of a malloc failure,
basically with the following idea in mind:

When a malloc failure is encountered and this place is available and not (yet) occupied, it is used,
else simply no new error entry is generated.
The probability that this pre-allocated place is not available should be negligible because it means that already during initialization the thread application runs out of memory.
The situation that the place is already occupied means that the thread ran into a malloc failure before, which should be the actual failure to be reported, while any subsequent ones are just part of the aftermath;
anyway all one can then hope is that the application is still able to output its error queue before dying.

Does anyone see a problem with this approach?

richsalz · 2021-04-12T17:28:42Z

The probability that this pre-allocated place is not available should be negligible because it means that already during initialization the thread runs out of memory.

In my experience, most of the time when malloc returns NULL it is because of a pointer problem (clobbering state) rather than (virtual) memory being exhausted. And also, we know some people are writing applications that spawn and reap multiple threads, rather than a fixed number near start-up time. So I think your probability estimate is wrong.

DDvO · 2021-04-12T17:43:58Z

The probability that this pre-allocated place is not available should be negligible because it means that already during initialization the thread runs out of memory.

In my experience, most of the time when malloc returns NULL it is because of a pointer problem (clobbering state) rather than (virtual) memory being exhausted.

Well, in such a case, memory is corrupted anyway,
so being able to (correctly) output error messages is then a matter of good luck.

And also, we know some people are writing applications that spawn and reap multiple threads, rather than a fixed number near start-up time. So I think your probability estimate is wrong.

Good point. This brings me to an improvement of the sketched new approach:
have the pre-allocated space not per thread, but per application,
which is actually simpler and should be sufficient.
Also the OpenSSL error queue is allocated per application, right?

richsalz · 2021-04-12T18:04:23Z

Also the OpenSSL error queue is allocated per application, right?

No it's per-thread. If that's not a typo, then I have to wonder if you really read the err.c code.

Quietly making "malloc failed" a global error, unlike all other errors, is not something to be just tossed into a PR but should be discussed by the OTC. I am going to not comment any more until after 3.0 is out.

DDvO · 2021-04-12T18:29:18Z

Also the OpenSSL error queue is allocated per application, right?

No it's per-thread. If that's not a typo, then I have to wonder if you really read the err.c code.

Argh, I should stop working after 12 hours. Apparently I had too many context switches today.
Sure, it's per thread: state = CRYPTO_THREAD_get_local(&err_thread_local);

Quietly making "malloc failed" a global error, unlike all other errors, is not something to be just tossed into a PR but should be discussed by the OTC. I am going to not comment any more until after 3.0 is out.

In the light of the above clarification, reverting my proposal to an extra space per thread.
Even if the probability of not being able to allocate it along with the normal error queue entries
is a little higher than what I thought initially, this is not really a problem:
If a new thread is able to allocate sizeof(ERR_STATE), it should be able to allocate also 17 instead of 16 (= ERR_NUM_ERRORS) slots. And even if not, the application will be in severe trouble already at that point anyway (and the worst thing that can happen with the new approach is that the ERR_R_MALLOC_FAILURE could be dropped - so what).

Fixes openssl#6251

DDvO · 2022-08-15T15:16:00Z

Finally, I took the time to continue working in on this, taking into account the review comments above.
I now take a more robust approach for preventing potential error-in-error-handler malloc loops:
counting per thread how many malloc errors are being reported.

@t8m, as discussed above, I'm going to clean up all the calls where ERR_R_MALLOC_FAILURE is raised
when I got positive feedback on the solution taken here.

t8m · 2022-08-23T12:08:42Z

OTC: The current state of the PR is not acceptable because it possibly reports error on 0 bytes allocated. If this is resolved this PR can be merged if there will be followup PR to fix the duplicated ERR_raise(ERR_R_MALLOC_FAILURE) calls in one way or another depending on the context.

Since the former is done now, the question remains how best to handle the existing ERR_raise(ERR_LIB_FOO, ERR_R_MALLOC_FAILURE) calls.

My clear preference is to replace them by ERR_raise(ERR_LIB_FOO, ERR_R_CRYPTO_LIB) for the following reasons:
1. This can be done automatically and can be approved easily because each such change is trivial,
   which is a pragmatically important argument given that there are more than 1200 such calls.

2. The new pseudo reason `ERR_R_CRYPTO_LIB` is not anymore misleading in case the actual failure reason is something else than `ERR_R_MALLOC_FAILURE`.

3. This keeps the existing traceability while not much polluting the error queue because actual malloc failures are rare.

OTC wanted to resolve them based on the context. If the ERR_raise() call is in the same function where the OPENSSL_*alloc() is done and is done solely as the result of the malloc failure, it should NOT be kept. For the rest, the replacement as you suggest would be OK.

t8m

LGTM apart from placing the docs into a different .pod file.

doc/man3/ERR_put_error.pod

…loc() and friends report ERR_R_MALLOC_FAILURE

Changed since

paulidale · 2022-08-24T01:07:13Z

crypto/err/err.c

+void *CRYPTO_malloc_with_err(void *(*fn)(size_t, const char *, int), size_t num,
+                             const char *file, int line, const char *func)
+{
+    void *ptr = (*fn)(num, file, line);


I'm not a fan of an indirect call here. We already do one to get to malloc et al and this makes it two. Moreover, this one won't be predicted at all well because so many functions are funneling through here.

The alternative would be having each of the allocation functions separately defined which is ugly but won't hit performance.

Sigh, a smart compiler might flatten this out, but likely we cannot hope for that.
Would be so nice to use just a macro for CRYPTO_malloc_with_err et al., but this is made impossible by the non-linear use of the num argument.

Anyway, I do not really see the very slight decrease in performance using the function pointer as an issue here.
Memory allocation is in itself a pretty expensive operation, so an extra function pointer dereference should not make any noticeable difference.
Moreover, memory allocation is very unusual in tight loops.

Could we make the functions static inline? Not sure if that doesn't open another can of worms though.

I'm uncertain if it is possible to declare CRYPTO_malloc_with_err et al. static because they need to be declared in a public header (crypto.h) for use with public macros such as OPENSSL_malloc().
I was also considering making CRYPTO_malloc_with_err et al. inline, but as there is s pretty large number of calls to OPENSSL_malloc() et al., this would likely add quite a bit to the code size, while the performance gain would be tiny.
So also in this direction I see no point trying to optimize on performance.

It has a spurious state to s change, that's not a big deal.

It adds new xxx_with_err functions that might not be needed if the decision is to use backtrace or its equivalent. (And if those functions are only for internal use, why are they in public header files and documentation?) It does no work to change the source base to use those with_err functions.

This seems like a partial change, no promise of future work to complete it and a known pending open issue about the direction in which to go.

@richsalz occasionally it feels like you wanna re-enter the OpenSSL project?

malloc shouldn't be a huge expense most of the time. Modern libc's implement it pretty well. A branch misprediction takes hundreds of cycles which represents quite an expense.

The other concern is about what we make public. Once public, forever public (almost). Do we want CRYPTO_malloc_with_err() as a public API?

Thinking a little more about all this, I found a solution that does not use function pointers and the like and
avoids using and adding (actually internal) aux functions to public headers. It even does not change header files at all.

It reports malloc errors once and for all in CRYPTO_malloc()
while making sure to prevent mem alloc error loops simply as follows:
ossl_err_get_state_int() in err.c uses the special call CRYPTO_zalloc(num, NULL, 0) for ERR_STATE allocation,
and CRYPTO_*alloc() do not report an malloc failure in this special case (i.e., file == NULL && line == 0).
All other OPENSSL_*malloc() and CRYPTO_*alloc() calls lead to file != NULL || line != 0 and thus do report any malloc error.

…PTO_malloc() and friends report ERR_R_MALLOC_FAILURE

levitte · 2022-08-25T18:56:31Z

@t8m:

If the ERR_raise() call is in the same function where the OPENSSL_*alloc() is done and is done solely as the result of the malloc failure, it should NOT be kept.

I disagree with this, especially with the latest changes

t8m · 2022-08-26T05:29:05Z

I disagree with this, especially with the latest changes

Well, with the latest changes keeping the original ERR_raises makes some sense. I'd still change them to ERR_R_CRYPTO_LIB.

@DDvO given this should be now a simple sed command, could you please create a PR to do this based on top of this PR?

t8m · 2022-08-26T05:38:01Z

Could you please test that if CRYPTO_*alloc() returns always NULL we do not get an infinite recursion on OPENSSL_init_crypto() call?

DDvO · 2022-08-26T07:07:46Z

Well, with the latest changes keeping the original ERR_raises makes some sense. I'd still change them to ERR_R_CRYPTO_LIB.

@DDvO given this should be now a simple sed command, could you please create a PR to do this based on top of this PR?

Done in #19072.

DDvO · 2022-08-26T07:11:02Z

Could you please test that if CRYPTO_*alloc() returns always NULL we do not get an infinite recursion on OPENSSL_init_crypto() call?

I just made all CRYPTO_*alloc() return NULL (after the changes done in #19072) and ran the tests.
Of course, a lot of them failed, but none of them hang.

DDvO · 2022-08-27T07:40:59Z

Merged - thanks a lot @t8m, @paulidale, @levitte, and @richsalz for the discussion and the approvals.
Very glad that we finally got this tricky thing through, even with a pretty elegant solution.

Fixes #6251 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: David von Oheimb <david.von.oheimb@siemens.com> (Merged from #14833)

openssl-machine · 2022-08-27T08:00:15Z

24 hours has passed since 'approval: done' was set, but as this PR has been updated in that time the label 'approval: ready to merge' is not being automatically set. Please review the updates and set the label manually.

DDvO · 2022-08-27T08:18:47Z

Very unfortunate that the malloc error reporting was not handled in a good way right from the beginning.
This lead to much extra work and redundancy on the callers' side over all the years since,
and even to partly misleading reasons being reported, which is now handled by #19072.

Also finding, discussing, and trying out solutions was a considerable amount of work.
Thanks also to @bernd-edlinger, @mspncp , and @mattcaswell who contributed to the discussion in #6251.

Fixes openssl#6251 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: David von Oheimb <david.von.oheimb@siemens.com> (Merged from openssl#14833)

DDvO added the approval: otc review pending label Apr 12, 2021

DDvO mentioned this pull request Apr 12, 2021

X509_NAME: Fix encoding of_NULL-DN in x509_name_canon() #14819

Closed

DDvO added this to the Post 3.0.0 milestone Apr 12, 2021

DDvO changed the title ~~ERR: Make CRYPTO_malloc() and friends report ERR_R_MALLOC_FAILURE~~ WIP: ERR: Make CRYPTO_malloc() and friends report ERR_R_MALLOC_FAILURE Apr 12, 2021

paulidale reviewed Apr 12, 2021

View reviewed changes

crypto/mem.c Outdated Show resolved Hide resolved

DDvO force-pushed the malloc_err_reporting_fix_6251 branch from 40e0bcd to 1a8143c Compare April 12, 2021 10:16

DDvO mentioned this pull request Apr 12, 2021

Extreme number of memory allocation calls #14837

Closed

t8m added branch: master Applies to master branch triaged: feature The issue/pr requests/adds a feature labels Aug 6, 2021

DDvO mentioned this pull request Aug 7, 2022

X509_STORE_CTX_set_default(): improve error handling and reporting, also in its use #18918

Closed

ERR: Make CRYPTO_malloc() and friends report ERR_R_MALLOC_FAILURE

4ffb7d2

Fixes openssl#6251

DDvO changed the title ~~WIP: ERR: Make CRYPTO_malloc() and friends report ERR_R_MALLOC_FAILURE~~ ERR: Make CRYPTO_malloc() and friends report ERR_R_MALLOC_FAILURE Aug 15, 2022

DDvO force-pushed the malloc_err_reporting_fix_6251 branch from 1a8143c to 5802ad1 Compare August 15, 2022 15:16

fixup! ERR: Make CRYPTO_malloc() and friends report ERR_R_MALLOC_FAILURE

25ab843

DDvO force-pushed the malloc_err_reporting_fix_6251 branch from 5802ad1 to 25ab843 Compare August 15, 2022 15:48

t8m added approval: review pending This pull request needs review by a committer approval: otc review pending labels Aug 23, 2022

t8m reviewed Aug 23, 2022

View reviewed changes

doc/man3/ERR_put_error.pod Outdated Show resolved Hide resolved

fixup! fixup! fixup! fixup! fixup! fixup! fixup! ERR: Make CRYPTO_mal…

6415854

…loc() and friends report ERR_R_MALLOC_FAILURE

t8m approved these changes Aug 23, 2022

View reviewed changes

t8m removed the approval: otc review pending label Aug 23, 2022

t8m requested a review from paulidale August 23, 2022 15:02

paulidale reviewed Aug 24, 2022

View reviewed changes

fixup! fixup! fixup! fixup! fixup! fixup! fixup! fixup! ERR: Make CRY…

fefa6aa

…PTO_malloc() and friends report ERR_R_MALLOC_FAILURE

github-actions bot removed the severity: fips change The pull request changes FIPS provider sources label Aug 25, 2022

paulidale approved these changes Aug 26, 2022

View reviewed changes

paulidale requested a review from t8m August 26, 2022 03:40

DDvO mentioned this pull request Aug 26, 2022

ERR: Adapt ERR_raise() calls after making CRYPTO_malloc() and friends report ERR_R_MALLOC_FAILURE #19072

Closed

t8m approved these changes Aug 26, 2022

View reviewed changes

t8m added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels Aug 26, 2022

DDvO closed this Aug 27, 2022

Uh oh!

Conversation

DDvO commented Apr 12, 2021

Uh oh!

t8m commented Apr 12, 2021

Uh oh!

Uh oh!

DDvO commented Apr 12, 2021

Uh oh!

DDvO commented Apr 12, 2021

Uh oh!

richsalz commented Apr 12, 2021

Uh oh!

DDvO commented Apr 12, 2021

Uh oh!

richsalz commented Apr 12, 2021

Uh oh!

davidben commented Apr 12, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

richsalz commented Apr 12, 2021

Uh oh!

DDvO commented Apr 12, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DDvO commented Apr 12, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

richsalz commented Apr 12, 2021

Uh oh!

DDvO commented Apr 12, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

richsalz commented Apr 12, 2021

Uh oh!

DDvO commented Apr 12, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DDvO commented Aug 15, 2022

Uh oh!

t8m commented Aug 23, 2022

Uh oh!

t8m left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

paulidale Aug 24, 2022

Choose a reason for hiding this comment

Uh oh!

DDvO Aug 24, 2022

Choose a reason for hiding this comment

Uh oh!

DDvO Aug 24, 2022

Choose a reason for hiding this comment

Uh oh!

t8m Aug 24, 2022

Choose a reason for hiding this comment

Uh oh!

DDvO Aug 24, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

richsalz Aug 24, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DDvO Aug 24, 2022

Choose a reason for hiding this comment

Uh oh!

paulidale Aug 25, 2022

Choose a reason for hiding this comment

Uh oh!

DDvO Aug 25, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

paulidale Aug 26, 2022

Choose a reason for hiding this comment

Uh oh!

levitte commented Aug 25, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

davidben commented Apr 12, 2021 •

edited

Loading

DDvO commented Apr 12, 2021 •

edited

Loading

DDvO commented Apr 12, 2021 •

edited

Loading

DDvO commented Apr 12, 2021 •

edited

Loading

DDvO commented Apr 12, 2021 •

edited

Loading

DDvO Aug 24, 2022 •

edited

Loading

richsalz Aug 24, 2022 •

edited

Loading

DDvO Aug 25, 2022 •

edited

Loading

levitte commented Aug 25, 2022 •

edited

Loading

t8m commented Aug 26, 2022 •

edited

Loading