Conversation
tcharding left a comment
For what it's worth, LGTM. Out of interest, did you verify that it fixes the linked electrs issue?
I added a docs suggestion, feel free to use it or ignore it as you see fit.
Thanks.
src/network/stream_reader.rs
Outdated
In case you want to use it, I massaged the docs a bit while understanding them.
/// Constructs a stream reader for a given input `stream`. Use the optional parameter
/// `buffer_size` to control the read buffer size (default: 64K). Specify `read_until` bytes
/// when expecting big objects from this reader. If not specified it defaults to 0, which
/// attempts to parse a `Decodable` on every received chunk of bytes. Using the default (0)
/// when parsing a big object like a full block can result in poor performance due to the
/// repeated parsing attempts on incomplete data.
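If helpful, the repeated-parsing cost described in this doc comment can be illustrated with a small std-only simulation (the `decode_attempts` helper is hypothetical, not rust-bitcoin code):

```rust
// Hypothetical simulation (not rust-bitcoin code): count the decode attempts
// a retry-on-incomplete-data reader makes while buffering a large object.
fn decode_attempts(object_size: usize, read_size: usize) -> (usize, usize) {
    let mut buffered = 0;
    let mut attempts = 0;
    let mut bytes_scanned = 0;
    while buffered < object_size {
        buffered += read_size;                      // one read() call
        attempts += 1;                              // one (likely failing) decode attempt
        bytes_scanned += buffered.min(object_size); // each attempt re-parses the whole buffer
    }
    (attempts, bytes_scanned)
}

fn main() {
    // A 1MB block read through a 64KiB buffer: 16 decode attempts,
    // re-parsing roughly 8.9MB in total instead of 1MB.
    let (attempts, scanned) = decode_attempts(1_048_576, 65_536);
    println!("{} attempts, {} bytes scanned", attempts, scanned);
}
```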
I was going to say "looks like the proper solution would be to rewrite deserialization to use streaming" (…). So why is … (rust-bitcoin/src/consensus/encode.rs, line 172 in a961ab4) …? Seems to me that making …
No, I didn't test electrs, sorry. But I noticed the issue while working on blocks_iterator, and I did a draft implementation there in 4dfca188f0ef7bd5d81f73664622244d22d2e2b5; the time to parse the testnet blockchain was reduced by 50%.
The reason for this implementation is that if there is more than one Decodable in the buffer, we know where the following one starts.
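A std-only sketch of this point (the length-prefixed `decode_msg` format is made up for illustration, standing in for a consensus `Decodable`): because each decode consumes exactly one message's bytes, the reader is left positioned at the start of the next one.

```rust
use std::io::{Cursor, Read};

// Toy stand-in for a consensus decodable: a length-prefixed byte string.
// Decoding consumes exactly the bytes of one message, so consecutive
// decodes on the same reader pick up where the previous one stopped.
fn decode_msg<R: Read>(r: &mut R) -> std::io::Result<Vec<u8>> {
    let mut len = [0u8; 1];
    r.read_exact(&mut len)?;
    let mut payload = vec![0u8; len[0] as usize];
    r.read_exact(&mut payload)?;
    Ok(payload)
}

fn main() -> std::io::Result<()> {
    // Two messages back to back in the same stream.
    let stream = [3, b'a', b'b', b'c', 2, b'x', b'y'];
    let mut reader = Cursor::new(&stream[..]);
    let first = decode_msg(&mut reader)?;
    let second = decode_msg(&mut reader)?; // starts right after the first
    assert_eq!(first, b"abc");
    assert_eq!(second, b"xy");
    Ok(())
}
```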
Wrapping in …
I did a branch https://github.com/rust-bitcoin/rust-bitcoin/compare/master...RCasatta:stream_reader_2?expand=1 using a …. This other branch uses less memory in comparison to this PR if …. I am not sure either if there are some errors that were handled by the previous implementation and are not handled on stream_reader_2. @dr-orlovsky is the original author, maybe he can comment too.
I guess the expectation is that the …
AFAIR …
Did you have any chance to measure how it performs?
Why not just replace …? Also it looks like most of the time in …
Makes sense, I didn't know there was a trait for that, very nice.
I brought up electrs as one example, but other projects, like bdk, are wasting resources for nothing using the StreamReader. On a side note, on a separate project I did a bench on hashing a few bytes, and the hardware-accelerated sha2 is not doing so well in comparison to bitcoin_hashes.
Thanks for that benchmark! So it seems it'd be better to avoid hashing instead of trying to optimize it. (It would still be nice to be able to use the HW-accelerated version, but it's not a priority.)
4e74a90 to 1437bb1 (Compare)
I added a bench with a 1MB block in 76f9fd2239919ba1c602511dd34d2eee61df6876; actual results with 9dae6fa8e7caf5bbd4eb21682fd078092c2faffb: …
1437bb1 to 9dae6fa (Compare)
Sorry for the late response; these days I am focusing on completing Taproot support in rust-bitcoin and haven't had time for a careful review here.
Kixunil left a comment
ACK 9dae6fa8e7caf5bbd4eb21682fd078092c2faffb
@RCasatta sorry, I tried to review the PR and just don't get how it will deal with the situation where the TCP stream has more than one message. I see that the test case for that is removed, and it was in fact the main reason to introduce … by calling …
I removed that test because now you can't access the unparsed buffer. Do you want me to re-add something like this: …
Oh, I see. Yeah, that test would be great to have!
9dae6fa to 017d239 (Compare)
test rebased
There is no such thing as …. I had no idea about this …
Why would we reduce flexibility in this way? We don't need …. If people are having performance issues because they are using raw …
Before this commit, StreamReader tries to repeatedly parse a big object like a block at every read, causing useless overhead. consensus_encode deals with partial data by simply blocking. After this change it doesn't look like what remains of the StreamReader is really giving value, so it's deprecated.
017d239 to 5dfb93d (Compare)
I changed …
src/network/stream_reader.rs
Outdated
/// Struct used to configure stream reader function
pub struct StreamReader<R: Read> {
    /// Stream to read from
    pub stream: R,
Shouldn't we use BufReader to avoid performance issues in old code?
Andrew thinks (#680 (comment)) it's the caller's responsibility to wrap if needed.
I am not strongly for or against using it, but for sure the main issue this PR is solving, and which is our responsibility, is not attempting to decode the same 1MB block 16 times.
I mean, StreamReader was originally intended to be buffered. So by not using BufReader internally we create a (I think significant) performance regression for those who update the library without changing the deprecated code. It might be less bad to just remove it if we don't use BufReader internally.
I mean StreamReader was originally intended to be buffered
ah that's true, let's see what @apoelstra thinks
I mean, if we deprecate/remove it, we might as well also change it to use BufReader internally :P
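A minimal sketch of the caller-side wrapping discussed here, assuming a toy `decode_u32le` stand-in for a consensus decode call; in real code the `Cursor` would be e.g. a `TcpStream`:

```rust
use std::io::{BufReader, Cursor, Read};

// Toy stand-in for a consensus decode call: reads a little-endian u32.
// Decoding works on any `Read`, so callers worried about many small reads
// hitting the kernel can wrap the raw stream in a BufReader once.
fn decode_u32le<R: Read>(r: &mut R) -> std::io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn main() -> std::io::Result<()> {
    let raw = Cursor::new(vec![1, 0, 0, 0, 2, 0, 0, 0]);
    // BufReader batches the underlying reads without changing decode logic.
    let mut reader = BufReader::new(raw);
    assert_eq!(decode_u32le(&mut reader)?, 1);
    assert_eq!(decode_u32le(&mut reader)?, 2);
    Ok(())
}
```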
…ion on existing callers
I added …
src/network/stream_reader.rs
Outdated
/// optional parameter `buffer_size` determining reading buffer size
pub fn new(stream: R, buffer_size: Option<usize>) -> StreamReader<R> {
/// Constructs new stream reader for a given input stream `stream`
#[deprecated(since="0.28.0", note="wrap you stream into a buffered reader if necessary and use consensus_encode directly")]
"your"; please leave this for a followup PR, but …
src/network/stream_reader.rs
Outdated
/// Reads stream and parses next message from its current input,
/// also taking into account previously unparsed partial message (if there was such).
/// Reads stream and parses next message from its current input
#[deprecated(since="0.28.0", note="wrap you stream into a buffered reader if necessary and use consensus_encode directly")]
When using …
@elichai you lose the data that was partially parsed. Is there an obvious way to improve on this? Does serde handle this somehow? I suppose the user could wrap their …
@elichai system interrupts (I believe you meant signals) can never cause unexpected EOF. I'd love to eventually support streaming parsing but that's a different topic.
You're right, I was thinking about EINTR. I feel that I don't have enough knowledge about how to read from sockets robustly, so ignore my comment :)
EINTR is usually handled by …
StreamReader performance is extremely poor in case the decoded object is "big enough", for example a full Block. In the common case the buffer is 64k, so to successfully parse a 1MB block, 16 decode attempts are made.
Even if a user increases the buffer size, read is not necessarily going to fill the buffer, as stated in the doc https://doc.rust-lang.org/stable/std/io/trait.Read.html#tymethod.read. In my tests, the reads are 64kB even with a 1MB buffer. I think this is the root cause of the performance issue found in electrs in romanz/electrs#547, and they have now decided to decode the TCP stream with their own code in romanz/electrs@cd0531b and romanz/electrs@05e0221.
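The short-read behavior is easy to reproduce with a std-only wrapper (the `ShortReader` type is made up for illustration): a single `read()` into a large buffer returns only one chunk, while `read_exact` loops internally until the buffer is full.

```rust
use std::io::{Cursor, Read};

// A reader that returns at most `cap` bytes per read() call, mimicking a
// socket that hands back 64KiB chunks regardless of the buffer size.
struct ShortReader<R> {
    inner: R,
    cap: usize,
}

impl<R: Read> Read for ShortReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        let n = buf.len().min(self.cap);
        self.inner.read(&mut buf[..n])
    }
}

fn main() -> std::io::Result<()> {
    let data = vec![0u8; 1000];
    let mut r = ShortReader { inner: Cursor::new(data), cap: 64 };

    // A single read() into a big buffer only gets one 64-byte chunk...
    let mut big = vec![0u8; 1000];
    let n = r.read(&mut big)?;
    assert_eq!(n, 64);

    // ...but read_exact loops internally until the requested bytes arrive,
    // so short reads are not the decoder's problem.
    r.read_exact(&mut big[..936])?;
    Ok(())
}
```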
Using consensus_encode directly seems to make more sense (taking care of using BufRead if necessary), so the StreamReader is deprecated.