Conversation
- The mock_socket has changed slightly. Previously, it would request a size to read, do the read, and then wait for another size for the next read, which would typically just be the EOF for that request. This made it harder for the fuzzer to reach useful states. Now you can do a single transmit with a single test action. - Simplified the write in Tls_eio slightly by using the new `Flow.write` operation. - Only log a send when we actually send some data. - Allow the fuzzer to perform all the actions needed for the Tls handshake in one go instead of requiring it to find them by searching. This provides it a way to get to the interesting cases quickly.
Previously, Tls would treat any exception from a read as a permanent error, causing all future operations to fail. However, it is reasonable to cancel a read and then try again later. Note that the Lwt version appears to have the same problem. Possibly we should only be catching `Eio.Io` exceptions in the first place, or maybe none at all. I also extended the fuzz tests to check for this case.
|
afl-fuzz finds that this can fail, because a TLS read operation may need to perform a write. So when the application cancels a TLS read, it may actually end up cancelled a write, which we can't recover from. |
| | exn -> | ||
| (match t.state with | ||
| | `Error _ | `Eof -> () | ||
| | `Active _ -> t.state <- `Error exn) ; | ||
| raise exn |
There was a problem hiding this comment.
Wouldn't it make sense to completely remove this bit instead? I believe this is trapping other exceptions in addition to Cancelled, such as 'Poisioned' and intermittently this https://github.com/ocaml/ocaml/blob/trunk/stdlib/effect.ml#L59
There was a problem hiding this comment.
I assume the idea is that the flow should stop working on any unexpected error. Otherwise, you might fail to read a packet, retry, and then hang (because the sender thinks you already have the data). Possibly we should call Printexc.get_raw_backtrace here and store that in t.state too. Then you'd know where the original error came from.
But note that the original exception is still raised, with the original backtrace, the first time. So the first error you get should include the correct location in all cases. It's just if you keep trying to use it after an error then future errors don't give the exact location.
There was a problem hiding this comment.
But note that the original exception is still raised, with the original backtrace, the first time. So the first error you get should include the correct location in all cases. It's just if you keep trying to use it after an error then future errors don't give the exact location.
Indeed, that's what I expected. However, I was getting other exceptions being reported - as mentioned in the comment above - as being the cause of error. As a user of tls-eio, that led me to unproductive debugging sessions since the actual error was quite something else. I have a feeling that the interaction of Eio.Fiber.any/first, tls_eio exception handling here and the like is giving unpredictable exceptions. I was only able to move forward with identifying the correct issue after I removed this catch all exception handling. Removing the catch all exception handling also don't seem to impact any observable tls-eio behaviours, i.e. the tests pass and End_of_file exception is correctly reported to users of tls-eio.
|
Closing, as we decided not to do this (#464 (comment)):
|
Previously, Tls would treat any exception from a read as a permanent error, causing all future operations to fail. However, it is reasonable to cancel a read and then try again later. I extended the fuzz tests to check for this case.
Note that the Lwt version appears to have the same problem. Possibly we should only be catching
Eio.Ioexceptions in the first place, or maybe none at all.The first commit cleans up the fuzz testing a bit (see the commit comment for details) and the second one adds support for cancellation.
One place I'm not sure about is when
drain_handshakeis called fromreneg. Is it OK to abort there? I don't know what the function does and it's already marked withXXX. In the normal case (whendrain_handshakeis called during flow creation) it's safe because the caller never gets theTls_eio.t.I'm not sure whether we really want to encourage users to cancel reads and resume later, but this should at least fix the confusing
Cancelled: Eio__core__Fiber.Not_firsterrors reported by @bikallem in #458.