Skip to content

Fix getrandom(2)#344

Merged
talex5 merged 1 commit intoocaml-multicore:mainfrom
haesbaert:getrandom
Oct 26, 2022
Merged

Fix getrandom(2)#344
talex5 merged 1 commit intoocaml-multicore:mainfrom
haesbaert:getrandom

Conversation

@haesbaert
Copy link
Copy Markdown
Contributor

@haesbaert haesbaert commented Oct 11, 2022

Fix some mistakes and bad practices:

  • The Uring backend getrandom(2) was broken in commit 6f6b2ee, there is no
    guarantee getrandom(2) will fill the whole buffer in one go.
  • Make sure we either get all the requested bytes or fail hard. I've checked
    libuv and it already does the right thing (tm).
  • Fail as hard and as soon possible, don't let users think of catching an
    exception, don't propagate anything. If we can't get entropy, nothing should
    run. This is one of the only cases where a library should kill a program.
  • Don't truncate a buffer in Luv if it would overflow, fail hard (this could be
    improved to get in chunks).
  • While here, sanitize the includes.

I didn't know much about getrandom(2), and I'm surprised it was designed allowing short counts even though they based it off getentropy(2), which always does the right thing.

"""
The behavior when a call to getrandom() that is blocked while reading from the urandom source is interrupted by a signal handler depends on the initialization state of the entropy buffer and on the request size, buflen. If the entropy is not yet initialized, then the call fails with the EINTR error. If the entropy pool has been initialized and the request size is large (buflen > 256), the call either succeeds, returning a partially filled buffer, or fails with the error EINTR. If the entropy pool has been initialized and the quest size is small (buflen <= 256), then getrandom() will not fail with EINTR. Instead, it will return all of the bytes that have been requested.
"""

Copy link
Copy Markdown
Collaborator

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why the old behaviour was wrong. read_into is always allowed to do short reads; retrying is handled by e.g. read_exact.

I also disagree about killing the application. The application should be able to catch the exception and display it in a dialog box (or whatever), not be killed and lose the user's data.

To get this wrong, an application would have to use read instead of read_exact, and then explicitly ignore the result of the read. Though possibly we should rename read to read_upto or read_single.

@haesbaert
Copy link
Copy Markdown
Contributor Author

I don't see why the old behaviour was wrong. read_into is always allowed to do short reads; retrying is handled by e.g. read_exact.

Then the use should only be able to call read_exact and nothing else.
Exporting an API under the name of secure which can return a short-count on randomized a buffer is a security hazard. Users will shoot themselves in the foot because they won't always check the return and make sure the whole buffer is filled.
No one is ever interested in a partial random "secure" buffer.

I also disagree about killing the application. The application should be able to catch the exception and display it in a dialog box (or whatever), not be killed and lose the user's data.

Code will shift with time and errors will be lost, so in this case we should be super aggressive.

To get this wrong, an application would have to use read instead of read_exact, and then explicitly ignore the result of the read. Though possibly we should rename read to read_upto or read_single.

Yes, you can also call Eio_linux.Low_level.getrandom directly.

We can bet some money a not insignificant amount of people will do Flow.read secure_random buf |> ignore 😺 .

@haesbaert
Copy link
Copy Markdown
Contributor Author

Another note: It's also not a coincidence that libuv makes sure it can never return a short-count there

      int uv_random(uv_loop_t *loop, uv_random_t *req, void *buf, size_t  buflen,  unsigned
       int flags, uv_random_cb cb)
              Fill  buf  with  exactly buflen cryptographically strong random bytes acquired
              from the system CSPRNG. flags is reserved for future extension and  must  cur‐
              rently be 0.

              Short  reads  are  not possible. When less than buflen random bytes are avail‐
              able, a non-zero error value is returned or passed to the callback.

              The synchronous version may block indefinitely  when  not  enough  entropy  is
              available. The asynchronous version may not ever finish when the system is low

@talex5
Copy link
Copy Markdown
Collaborator

talex5 commented Oct 14, 2022

Summarising this:

  1. We provide a "secure-random" flow, which is a source of random bytes.
  2. Users should take as many bytes from it as they need.
  3. But, we don't trust users to be able to do this successfully.

We're worried that users will do something stupid, like trying to read from a flow by using Flow.read. Despite the name, this function is really exposing a low-level implementation detail of how flows work. The odoc-comment on Flow.read explains all this and points you at some alternative functions, but it's easy to believe someone would use it without reading that, and possible that they'd even explicitly ignore the result without reading it.

So, Flow.read is a very badly named and we should fix that. Any suggestions on a better name?

In addition to that, forcing the OS to fill the buffer probably won't have many bad side effects. The only ones I can think of are:

  1. You can't show a progress indicator showing how much random data has been read.
  2. If you are using Buf_read, it will wait until the entire buffer is filled, even if there was enough data already.
  3. It may confuse people about how flows are supposed to work.

But I think a modern OS will just fill the buffer immediately anyway, rather than giving a few bytes and waiting, so (1) and (2) likely don't matter in this case.

@haesbaert
Copy link
Copy Markdown
Contributor Author

haesbaert commented Oct 14, 2022

Summarising this:

  1. We provide a "secure-random" flow, which is a source of random bytes.
  2. Users should take as many bytes from it as they need.
  3. But, we don't trust users to be able to do this successfully.

Agreed.

We're worried that users will do something stupid, like trying to read from a flow by using Flow.read. Despite the name, this function is really exposing a low-level implementation detail of how flows work. The odoc-comment on Flow.read explains all this and points you at some alternative functions, but it's easy to believe someone would use it without reading that, and possible that they'd even explicitly ignore the result without reading it.

Agreed.

So, Flow.read is a very badly named and we should fix that. Any suggestions on a better name?

I was thinking that maybe Flow.single_read, not Flow.read_single, it already obscures it enough ?
Reality is most users will type Flow.r<wait for IDE completions>.

Also on non STREAM things, more often than not what you want is read/single_read, say a datagram socketpair (which I think you can't use recv(2) !?)

In addition to that, forcing the OS to fill the buffer probably won't have many bad side effects. The only ones I can think of are:

  1. You can't show a progress indicator showing how much random data has been read.
  2. If you are using Buf_read, it will wait until the entire buffer is filled, even if there was enough data already.
  3. It may confuse people about how flows are supposed to work.

But I think a modern OS will just fill the buffer immediately anyway, rather than giving a few bytes and waiting, so (1) and (2) likely don't matter in this case.

I agree, I would just add that (1) is just a matter of asking smaller chunks.

talex5 added a commit to talex5/eio that referenced this pull request Oct 20, 2022
This is a low-level function and it is easy to use it incorrectly by
ignoring the possibility of short reads. The name `Flow.read` made it
sound like a suitable default way to read from a flow.

See discussion in ocaml-multicore#344.
@talex5
Copy link
Copy Markdown
Collaborator

talex5 commented Oct 26, 2022

The rename of Flow.read is now merged, in #353.

The datagram problem is tracked in #342.

 - Prevent users from ever seeing a short-count in secure_random, luv already
   did this.
 - Don't truncate the luv buffer, make subsequent calls to fill the rest.

(I still think it should be guaranteed as lowest as possible (in the stub)), but
this was the compromise made in the Peace Accords of Whereby, October of 2022 AD.

--

I didn't know much about getrandom(2), and I'm surprised it was designed
allowing short counts even though they based it off getentropy(2), which always
does the right thing.

"""
The behavior when a call to getrandom() that is blocked while reading from the
urandom source is interrupted by a signal handler depends on the initialization
state of the entropy buffer and on the request size, buflen.  If the entropy is
not yet initialized, then the call fails with the EINTR error.  If the entropy
pool has been initialized and the request size is large (buflen > 256), the call
either succeeds, returning a partially filled buffer, or fails with the error
EINTR.  If the entropy pool has been initialized and the quest size is small
(buflen <= 256), then getrandom() will not fail with EINTR.  Instead, it will
return all of the bytes that have been requested.re
"""
@haesbaert
Copy link
Copy Markdown
Contributor Author

Rebased and force-pushed, no changes.

@talex5 talex5 merged commit 849391e into ocaml-multicore:main Oct 26, 2022
talex5 added a commit to talex5/opam-repository that referenced this pull request Dec 7, 2022
CHANGES:

API changes:

- Unify IO errors as `Eio.Io` (@talex5 ocaml-multicore/eio#378).
  This makes it easy to catch and log all IO errors if desired.
  The exception payload gives the type and can be used for matching specific errors.
  It also allows attaching extra information to exceptions, and various functions were updated to do this.

- Add `Time.Mono` for monotonic clocks (@bikallem @talex5 ocaml-multicore/eio#338).
  Using the system clock for timeouts, etc can fail if the system time is changed during the wait.

- Allow datagram sockets to be created without a source address (@bikallem @haesbaert ocaml-multicore/eio#360).
  The kernel will allocate an address in this case.
  You can also now control the `reuse_addr` and `reuse_port` options.

- Add `File.stat` and improve `Path.load` (@haesbaert @talex5 ocaml-multicore/eio#339).
  `Path.load` now uses the file size as the initial buffer size.

- Add `Eio_unix.pipe` (@patricoferris ocaml-multicore/eio#350).
  This replaces `Eio_linux.pipe`.

- Avoid short reads from `getrandom(2)` (@haesbaert ocaml-multicore/eio#344).
  Guards against buggy user code that might not handle this correctly.

- Rename `Flow.read` to `Flow.single_read` (@talex5 ocaml-multicore/eio#353).
  This is a low-level function and it is easy to use it incorrectly by ignoring the possibility of short reads.

Bug fixes:

- Eio_luv: Fix non-tail-recursive continue (@talex5 ocaml-multicore/eio#378).
  Affects the `Socket_of_fd` and `Socketpair` effects.

- Eio_linux: UDP sockets were not created close-on-exec (@talex5 ocaml-multicore/eio#360).

- Eio_linux: work around io_uring non-blocking bug (@haesbaert ocaml-multicore/eio#327 ocaml-multicore/eio#355).
  The proper fix should be in Linux 6.1.

- `Eio_mock.Backend`: preserve backtraces from `main` (@talex5 ocaml-multicore/eio#349).

- Don't lose backtrace in `Switch.run_internal` (@talex5 ocaml-multicore/eio#369).

Documentation:

- Use a proper HTTP response in the README example (@talex5 ocaml-multicore/eio#377).

- Document that read_dir excludes "." and ".." (@talex5 ocaml-multicore/eio#379).

- Warn about both operations succeeding in `Fiber.first` (@talex5 ocaml-multicore/eio#358, reported by @iitalics).

- Update README for OCaml 5.0.0~beta2 (@talex5 ocaml-multicore/eio#375).

Backend-specific changes:

- Eio_luv: add low-level process support (@patricoferris ocaml-multicore/eio#359).
  A future release will add Eio_linux support and a cross-platform API for this.

- Expose `Eio_luv.Low_level.Stream.write` (@patricoferris ocaml-multicore/eio#359).

- Expose `Eio_luv.Low_level.get_loop` (@talex5 ocaml-multicore/eio#371).
  This is needed if you want to create resources directly and then use them with Eio_luv.

- `Eio_linux.Low_level.openfile` is gone (@talex5 ocaml-multicore/eio#378).
  It was just left-over test code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants