Fix getrandom(2) by haesbaert · Pull Request #344 · ocaml-multicore/eio

haesbaert · 2022-10-11T22:31:53Z

Fix some mistakes and bad practices:

The Uring backend getrandom(2) was broken in commit 6f6b2ee, there is no
guarantee getrandom(2) will fill the whole buffer in one go.
Make sure we either get all the requested bytes or fail hard. I've checked
libuv and it already does the right thing (tm).
Fail as hard and as soon possible, don't let users think of catching an
exception, don't propagate anything. If we can't get entropy, nothing should
run. This is one of the only cases where a library should kill a program.
Don't truncate a buffer in Luv if it would overflow, fail hard (this could be
improved to get in chunks).
While here, sanitize the includes.

I didn't know much about getrandom(2), and I'm surprised it was designed allowing short counts even though they based it off getentropy(2), which always does the right thing.

"""
The behavior when a call to getrandom() that is blocked while reading from the urandom source is interrupted by a signal handler depends on the initialization state of the entropy buffer and on the request size, buflen. If the entropy is not yet initialized, then the call fails with the EINTR error. If the entropy pool has been initialized and the request size is large (buflen > 256), the call either succeeds, returning a partially filled buffer, or fails with the error EINTR. If the entropy pool has been initialized and the quest size is small (buflen <= 256), then getrandom() will not fail with EINTR. Instead, it will return all of the bytes that have been requested.
"""

talex5

I don't see why the old behaviour was wrong. read_into is always allowed to do short reads; retrying is handled by e.g. read_exact.

I also disagree about killing the application. The application should be able to catch the exception and display it in a dialog box (or whatever), not be killed and lose the user's data.

To get this wrong, an application would have to use read instead of read_exact, and then explicitly ignore the result of the read. Though possibly we should rename read to read_upto or read_single.

haesbaert · 2022-10-12T10:15:33Z

I don't see why the old behaviour was wrong. read_into is always allowed to do short reads; retrying is handled by e.g. read_exact.

Then the use should only be able to call read_exact and nothing else.
Exporting an API under the name of secure which can return a short-count on randomized a buffer is a security hazard. Users will shoot themselves in the foot because they won't always check the return and make sure the whole buffer is filled.
No one is ever interested in a partial random "secure" buffer.

I also disagree about killing the application. The application should be able to catch the exception and display it in a dialog box (or whatever), not be killed and lose the user's data.

Code will shift with time and errors will be lost, so in this case we should be super aggressive.

To get this wrong, an application would have to use read instead of read_exact, and then explicitly ignore the result of the read. Though possibly we should rename read to read_upto or read_single.

Yes, you can also call Eio_linux.Low_level.getrandom directly.

We can bet some money a not insignificant amount of people will do Flow.read secure_random buf |> ignore 😺 .

haesbaert · 2022-10-12T10:20:21Z

Another note: It's also not a coincidence that libuv makes sure it can never return a short-count there

      int uv_random(uv_loop_t *loop, uv_random_t *req, void *buf, size_t  buflen,  unsigned
       int flags, uv_random_cb cb)
              Fill  buf  with  exactly buflen cryptographically strong random bytes acquired
              from the system CSPRNG. flags is reserved for future extension and  must  cur‐
              rently be 0.

              Short  reads  are  not possible. When less than buflen random bytes are avail‐
              able, a non-zero error value is returned or passed to the callback.

              The synchronous version may block indefinitely  when  not  enough  entropy  is
              available. The asynchronous version may not ever finish when the system is low

talex5 · 2022-10-14T15:04:00Z

Summarising this:

We provide a "secure-random" flow, which is a source of random bytes.
Users should take as many bytes from it as they need.
But, we don't trust users to be able to do this successfully.

We're worried that users will do something stupid, like trying to read from a flow by using Flow.read. Despite the name, this function is really exposing a low-level implementation detail of how flows work. The odoc-comment on Flow.read explains all this and points you at some alternative functions, but it's easy to believe someone would use it without reading that, and possible that they'd even explicitly ignore the result without reading it.

So, Flow.read is a very badly named and we should fix that. Any suggestions on a better name?

In addition to that, forcing the OS to fill the buffer probably won't have many bad side effects. The only ones I can think of are:

You can't show a progress indicator showing how much random data has been read.
If you are using Buf_read, it will wait until the entire buffer is filled, even if there was enough data already.
It may confuse people about how flows are supposed to work.

But I think a modern OS will just fill the buffer immediately anyway, rather than giving a few bytes and waiting, so (1) and (2) likely don't matter in this case.

haesbaert · 2022-10-14T15:21:00Z

Summarising this:

We provide a "secure-random" flow, which is a source of random bytes.

Users should take as many bytes from it as they need.

But, we don't trust users to be able to do this successfully.

Agreed.

We're worried that users will do something stupid, like trying to read from a flow by using Flow.read. Despite the name, this function is really exposing a low-level implementation detail of how flows work. The odoc-comment on Flow.read explains all this and points you at some alternative functions, but it's easy to believe someone would use it without reading that, and possible that they'd even explicitly ignore the result without reading it.

Agreed.

So, Flow.read is a very badly named and we should fix that. Any suggestions on a better name?

I was thinking that maybe Flow.single_read, not Flow.read_single, it already obscures it enough ?
Reality is most users will type Flow.r<wait for IDE completions>.

Also on non STREAM things, more often than not what you want is read/single_read, say a datagram socketpair (which I think you can't use recv(2) !?)

In addition to that, forcing the OS to fill the buffer probably won't have many bad side effects. The only ones I can think of are:

You can't show a progress indicator showing how much random data has been read.

If you are using Buf_read, it will wait until the entire buffer is filled, even if there was enough data already.

It may confuse people about how flows are supposed to work.

But I think a modern OS will just fill the buffer immediately anyway, rather than giving a few bytes and waiting, so (1) and (2) likely don't matter in this case.

I agree, I would just add that (1) is just a matter of asking smaller chunks.

This is a low-level function and it is easy to use it incorrectly by ignoring the possibility of short reads. The name `Flow.read` made it sound like a suitable default way to read from a flow. See discussion in ocaml-multicore#344.

talex5 · 2022-10-26T10:16:55Z

The rename of Flow.read is now merged, in #353.

The datagram problem is tracked in #342.

- Prevent users from ever seeing a short-count in secure_random, luv already did this. - Don't truncate the luv buffer, make subsequent calls to fill the rest. (I still think it should be guaranteed as lowest as possible (in the stub)), but this was the compromise made in the Peace Accords of Whereby, October of 2022 AD. -- I didn't know much about getrandom(2), and I'm surprised it was designed allowing short counts even though they based it off getentropy(2), which always does the right thing. """ The behavior when a call to getrandom() that is blocked while reading from the urandom source is interrupted by a signal handler depends on the initialization state of the entropy buffer and on the request size, buflen. If the entropy is not yet initialized, then the call fails with the EINTR error. If the entropy pool has been initialized and the request size is large (buflen > 256), the call either succeeds, returning a partially filled buffer, or fails with the error EINTR. If the entropy pool has been initialized and the quest size is small (buflen <= 256), then getrandom() will not fail with EINTR. Instead, it will return all of the bytes that have been requested.re """

haesbaert · 2022-10-26T10:24:18Z

Rebased and force-pushed, no changes.

@talex5

CHANGES: API changes: - Unify IO errors as `Eio.Io` (@talex5 ocaml-multicore/eio#378). This makes it easy to catch and log all IO errors if desired. The exception payload gives the type and can be used for matching specific errors. It also allows attaching extra information to exceptions, and various functions were updated to do this. - Add `Time.Mono` for monotonic clocks (@bikallem @talex5 ocaml-multicore/eio#338). Using the system clock for timeouts, etc can fail if the system time is changed during the wait. - Allow datagram sockets to be created without a source address (@bikallem @haesbaert ocaml-multicore/eio#360). The kernel will allocate an address in this case. You can also now control the `reuse_addr` and `reuse_port` options. - Add `File.stat` and improve `Path.load` (@haesbaert @talex5 ocaml-multicore/eio#339). `Path.load` now uses the file size as the initial buffer size. - Add `Eio_unix.pipe` (@patricoferris ocaml-multicore/eio#350). This replaces `Eio_linux.pipe`. - Avoid short reads from `getrandom(2)` (@haesbaert ocaml-multicore/eio#344). Guards against buggy user code that might not handle this correctly. - Rename `Flow.read` to `Flow.single_read` (@talex5 ocaml-multicore/eio#353). This is a low-level function and it is easy to use it incorrectly by ignoring the possibility of short reads. Bug fixes: - Eio_luv: Fix non-tail-recursive continue (@talex5 ocaml-multicore/eio#378). Affects the `Socket_of_fd` and `Socketpair` effects. - Eio_linux: UDP sockets were not created close-on-exec (@talex5 ocaml-multicore/eio#360). - Eio_linux: work around io_uring non-blocking bug (@haesbaert ocaml-multicore/eio#327 ocaml-multicore/eio#355). The proper fix should be in Linux 6.1. - `Eio_mock.Backend`: preserve backtraces from `main` (@talex5 ocaml-multicore/eio#349). - Don't lose backtrace in `Switch.run_internal` (@talex5 ocaml-multicore/eio#369). Documentation: - Use a proper HTTP response in the README example (@talex5 ocaml-multicore/eio#377). - Document that read_dir excludes "." and ".." (@talex5 ocaml-multicore/eio#379). - Warn about both operations succeeding in `Fiber.first` (@talex5 ocaml-multicore/eio#358, reported by @iitalics). - Update README for OCaml 5.0.0~beta2 (@talex5 ocaml-multicore/eio#375). Backend-specific changes: - Eio_luv: add low-level process support (@patricoferris ocaml-multicore/eio#359). A future release will add Eio_linux support and a cross-platform API for this. - Expose `Eio_luv.Low_level.Stream.write` (@patricoferris ocaml-multicore/eio#359). - Expose `Eio_luv.Low_level.get_loop` (@talex5 ocaml-multicore/eio#371). This is needed if you want to create resources directly and then use them with Eio_luv. - `Eio_linux.Low_level.openfile` is gone (@talex5 ocaml-multicore/eio#378). It was just left-over test code.

haesbaert force-pushed the getrandom branch from 31356b4 to 5b7d265 Compare October 11, 2022 22:48

talex5 reviewed Oct 12, 2022

View reviewed changes

haesbaert force-pushed the getrandom branch from 5b7d265 to 80b23ab Compare October 13, 2022 00:37

talex5 mentioned this pull request Oct 20, 2022

Rename Flow.read to Flow.single_read #353

Merged

hannesm mentioned this pull request Oct 25, 2022

Add eio backend for ocaml-dns client mirage/ocaml-dns#312

Closed

5 tasks

haesbaert force-pushed the getrandom branch from 80b23ab to 2601e44 Compare October 26, 2022 10:23

talex5 approved these changes Oct 26, 2022

View reviewed changes

talex5 merged commit 849391e into ocaml-multicore:main Oct 26, 2022

talex5 mentioned this pull request Dec 7, 2022

[new release] eio_main, eio_luv, eio_linux and eio (0.7) ocaml/opam-repository#22619

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix getrandom(2)#344

Fix getrandom(2)#344
talex5 merged 1 commit intoocaml-multicore:mainfrom
haesbaert:getrandom

haesbaert commented Oct 11, 2022 •

edited

Loading

Uh oh!

talex5 left a comment

Uh oh!

haesbaert commented Oct 12, 2022

Uh oh!

haesbaert commented Oct 12, 2022

Uh oh!

talex5 commented Oct 14, 2022

Uh oh!

haesbaert commented Oct 14, 2022 •

edited

Loading

Uh oh!

talex5 commented Oct 26, 2022 •

edited

Loading

Uh oh!

haesbaert commented Oct 26, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

haesbaert commented Oct 11, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

talex5 left a comment

Choose a reason for hiding this comment

Uh oh!

haesbaert commented Oct 12, 2022

Uh oh!

haesbaert commented Oct 12, 2022

Uh oh!

talex5 commented Oct 14, 2022

Uh oh!

haesbaert commented Oct 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

talex5 commented Oct 26, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

haesbaert commented Oct 26, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

haesbaert commented Oct 11, 2022 •

edited

Loading

haesbaert commented Oct 14, 2022 •

edited

Loading

talex5 commented Oct 26, 2022 •

edited

Loading