Deprecate StreamReader#680

Merged
RCasatta merged 4 commits into rust-bitcoin:master from RCasatta:stream_reader
Jan 6, 2022

Conversation

@RCasatta
Collaborator

@RCasatta RCasatta commented Oct 20, 2021

StreamReader performance is extremely poor when the decoded object is "big enough", for example a full Block.

In the common case the buffer is 64KB, so to successfully parse a 1MB block 16 decode attempts are made.
Even if a user increases the buffer size, read is not necessarily going to fill the buffer, as stated in the doc https://doc.rust-lang.org/stable/std/io/trait.Read.html#tymethod.read. In my tests the reads are 64KB even with a 1MB buffer.

I think this is the root cause of the performance issue found in electrs in romanz/electrs#547; they have now decided to decode the TCP stream with their own code in romanz/electrs@cd0531b and romanz/electrs@05e0221.

Using consensus_decode directly seems to make more sense (taking care to use BufRead if necessary), so the StreamReader is deprecated.
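The decode-directly-from-the-stream pattern this PR recommends can be sketched with a toy length-prefixed format standing in for the crate's real consensus decoding (`decode_msg` and the wire format here are illustrative only):

```rust
use std::io::{BufReader, Read};

// Toy stand-in for consensus decoding: a u32 little-endian length prefix
// followed by that many payload bytes. Conceptually `consensus_decode`
// behaves the same way: it reads exactly what it needs and leaves the
// reader positioned at the start of the next object.
fn decode_msg<R: Read>(r: &mut R) -> std::io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_le_bytes(len_buf) as usize;
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}

fn main() -> std::io::Result<()> {
    // Two back-to-back messages in one stream, as on a TCP socket.
    let mut stream = Vec::new();
    stream.extend_from_slice(&3u32.to_le_bytes());
    stream.extend_from_slice(b"abc");
    stream.extend_from_slice(&2u32.to_le_bytes());
    stream.extend_from_slice(b"hi");

    // Wrap the raw reader in a BufReader so small reads don't hit the OS,
    // then decode in a loop; each object is parsed exactly once.
    let mut reader = BufReader::new(&stream[..]);
    assert_eq!(decode_msg(&mut reader)?, b"abc".to_vec());
    assert_eq!(decode_msg(&mut reader)?, b"hi".to_vec());
    Ok(())
}
```

Because the decoder consumes only the bytes it needs, no re-parsing of partial data ever happens, regardless of object size.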

Member

@tcharding tcharding left a comment


For what it's worth, LGTM. Out of interest, did you verify that it fixes the linked electrs issue?

I added a docs suggestion, feel free to use it or ignore it as you see fit.

Thanks.

Member


In case you want to use it, I massaged the docs a bit while understanding them.

    /// Constructs a stream reader for a given input `stream`. Use the optional parameter
    /// `buffer_size` to control the read buffer size (default: 64K). Specify `read_until`
    /// bytes when expecting big objects from this reader. If not specified it defaults to 0,
    /// attempting to parse a `Decodable` on every read. Using the default (0) when parsing a
    /// big object like a full block can result in poor performance due to repeated parsing
    /// attempts on incomplete data.

@dpc
Contributor

dpc commented Oct 21, 2021

I was going to say "looks like the proper solution would be to rewrite deserialization to use streaming (std::io::Read)". But then I looked at the code and it looks like it is already written to use Read: https://docs.rs/bitcoin/0.27.1/bitcoin/consensus/encode/trait.Decodable.html#tymethod.consensus_decode

So why is StreamReader even using deserialize_partial instead of the Decodable::consensus_decode(&mut reader) straight inside it?

let rv = Decodable::consensus_decode(&mut decoder)?;

Seems to me that making StreamReader just a wrapper around R: Read, reading directly from the socket/whatever as needed (possibly wrapped in a BufReader), would lead to exactly one parsing pass with no overhead; the whole buffering into one array before parsing is just a waste.

@RCasatta
Collaborator Author

For what it's worth, LGTM. Out of interest, did you verify that it fixes the linked electrs issue?

No, I didn't test electrs, sorry. But I noticed the issue while working on blocks_iterator and did a draft implementation there in 4dfca188f0ef7bd5d81f73664622244d22d2e2b5; the time to parse the testnet blockchain was reduced by 50%.

So why is StreamReader even using deserialize_partial instead of the Decodable::consensus_decode(&mut reader) straight inside it?

The reason for this implementation is that if there is more than one Decodable in the buffer, we know where to start for the following one.

Seems to me that making StreamReader just a wrapper around R: Read, reading directly from the socket/whatever as needed (possibly wrapped in a BufReader), would lead to exactly one parsing pass with no overhead; the whole buffering into one array before parsing is just a waste.

Wrapping in a BufReader is mandatory to achieve similar performance, and I think it may be risky to just mention it in the docs and let the caller wrap it (from what I can see electrs is not using it after removing the StreamReader; I just grepped, didn't examine in depth).
Note that this PR is not more wasteful than what we have, but I'll try a version that wraps the BufReader internally and doesn't use deserialize_partial.

@RCasatta
Collaborator Author

I did a branch https://github.com/rust-bitcoin/rust-bitcoin/compare/master...RCasatta:stream_reader_2?expand=1 using a BufReader and consensus_decode directly, which basically makes the StreamReader useless, so if we go this way I think we should deprecate it and remove it in a following version...

This other branch uses less memory than this PR if read_until is set to 1MB, for example, but at the moment it doesn't handle the WouldBlock error. I don't know if that happens in real life (and, if it does and we want the StreamReader to handle it, giving it a reason to exist, whether to spinlock on that error).

I am also not sure whether some errors that were handled by the previous implementation are unhandled in stream_reader_2. @dr-orlovsky is the original author, maybe he can comment too.

@dpc
Contributor

dpc commented Oct 21, 2021

The reason for this implementation is that if there is more than one Decodable in the buffer, we know where to start for the following one.

I guess the expectation is that consensus_decode(&mut reader) would leave the reader pointing exactly at the next decodable data.

but at the moment it doesn't handle the WouldBlock error

AFAIR WouldBlock only happens with non-blocking IO (fds). Dealing with non-blocking IO makes more sense with async code, so until then I think it's fine not to deal with it; users that want to work with non-blocking IO can handle it themselves, either by reading into a larger buffer directly, or with a struct WaitAndRetryOnWouldBlock<R: Read> wrapper. But then again... if one is going to block, why bother with non-blocking IO?

I did a branch https://github.com/rust-bitcoin/rust-bitcoin/compare/master...RCasatta:stream_reader_2?expand=1 using a BufReader

Did you have any chance to measure how it performs?
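The WaitAndRetryOnWouldBlock wrapper mentioned above could be sketched like this (the type, its retry policy, and the Flaky mock are purely illustrative, not anything in rust-bitcoin or std):

```rust
use std::io::{Error, ErrorKind, Read};

// Hypothetical wrapper in the spirit of the suggestion: retry reads that
// fail with WouldBlock so blocking-style decoding code can run over a
// non-blocking source.
struct WaitAndRetryOnWouldBlock<R: Read>(R);

impl<R: Read> Read for WaitAndRetryOnWouldBlock<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        loop {
            match self.0.read(buf) {
                Err(e) if e.kind() == ErrorKind::WouldBlock => {
                    // Real code would park or poll here instead of spinning.
                    std::thread::yield_now();
                }
                other => return other,
            }
        }
    }
}

// Mock source that reports WouldBlock once before yielding data.
struct Flaky {
    fired: bool,
}

impl Read for Flaky {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        if !self.fired {
            self.fired = true;
            return Err(Error::new(ErrorKind::WouldBlock, "not ready"));
        }
        buf[..2].copy_from_slice(b"ok");
        Ok(2)
    }
}

fn main() {
    let mut r = WaitAndRetryOnWouldBlock(Flaky { fired: false });
    let mut buf = [0u8; 2];
    r.read(&mut buf).unwrap();
    assert_eq!(&buf, b"ok");
}
```

As noted, spinning on a would-block error defeats the purpose of non-blocking IO; this only shows where such a policy would plug in.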

@Kixunil
Collaborator

Kixunil commented Oct 22, 2021

Why not just replace R: Read with R: BufRead? One can't forget it, and if a byte slice is already available there's no need to force buffering.

Also it looks like most of the time in electrs is spent hashing, so while these optimizations may be useful, I think it'd be better to focus on that first.
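The BufRead suggestion in code form: a sketch with a made-up helper `first_byte` (the point is only the trait bound — a byte slice already satisfies BufRead, while a raw reader must be wrapped explicitly and so can't be forgotten):

```rust
use std::io::{BufRead, BufReader, Read};

// Illustrative function bounded on BufRead instead of Read. Callers with
// in-memory data pay no buffering cost; callers with raw streams are
// forced by the type system to add a BufReader.
fn first_byte<R: BufRead>(mut r: R) -> std::io::Result<Option<u8>> {
    let buf = r.fill_buf()?;
    if buf.is_empty() {
        return Ok(None);
    }
    let b = buf[0];
    r.consume(1);
    Ok(Some(b))
}

fn main() -> std::io::Result<()> {
    // A byte slice is already BufRead: no BufReader needed.
    assert_eq!(first_byte(&b"abc"[..])?, Some(b'a'));
    // A plain Read (here simulated by a trait object) must be wrapped.
    let raw: &mut dyn Read = &mut &b"xyz"[..];
    assert_eq!(first_byte(BufReader::new(raw))?, Some(b'x'));
    Ok(())
}
```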

@RCasatta
Collaborator Author

Why not just replace R: Read with R: BufRead? One can't forget it, and if a byte slice is already available there's no need to force buffering.

Makes sense, I didn't know there was a trait for that, very nice.

Also it looks like most of the time in electrs is spent hashing, so while these optimizations may be useful, I think it'd be better to focus on that first.

I brought up electrs as one example, but other projects, like bdk, are wasting resources for nothing by using the StreamReader.

On a side note, in a separate project I did a bench on hashing a few bytes, and the hardware-accelerated sha2 is not doing much better in comparison to bitcoin_hashes:

test bench::bench_bitcoin_hashes_sha          ... bench:         380 ns/iter (+/- 162)
test bench::bench_sha2_crate                  ... bench:         324 ns/iter (+/- 60)

@Kixunil
Collaborator

Kixunil commented Oct 22, 2021

Thanks for that benchmark! So it seems it'd be better to avoid hashing instead of trying to optimize it. (It would still be nice to be able to use the HW-accelerated version, but it's not a priority.)

@RCasatta
Collaborator Author

RCasatta commented Oct 22, 2021

Did you have any chance to measure how it performs?

I added a bench with a 1MB block in 76f9fd2239919ba1c602511dd34d2eee61df6876

current

test blockdata::block::benches::bench_stream_reader                     ... bench:  33,482,849 ns/iter (+/- 1,829,374)

with 9dae6fa8e7caf5bbd4eb21682fd078092c2faffb

test blockdata::block::benches::bench_stream_reader                     ... bench:   3,654,142 ns/iter (+/- 186,499)

@dr-orlovsky
Collaborator

Sorry for the late response; these days I am focusing on completing Taproot support in rust-bitcoin and haven't had time for a careful review here. StreamReader was my first contribution to rust-bitcoin nearly three years ago, so it may not be an optimal one. Nevertheless, I recall there were some reasons to create this struct, like the fact that consensus_deserialize was failing when not consuming the whole buffer, which did not work with a stream of TCP messages.

romanz
romanz previously approved these changes Nov 13, 2021
Kixunil
Kixunil previously approved these changes Dec 2, 2021
Collaborator

@Kixunil Kixunil left a comment


ACK 9dae6fa8e7caf5bbd4eb21682fd078092c2faffb

@dr-orlovsky
Collaborator

@RCasatta sorry, I tried to review the PR and just don't get how it will deal with the situation where the TCP stream has more than one message. I see that the test case for that is removed, and it was in fact the main reason to introduce StreamReader (since otherwise we can't talk to remote Bitcoin Core nodes in practice).

@RCasatta
Collaborator Author

@RCasatta sorry, I tried to review the PR and just don't get how it will deal with the situation where the TCP stream has more than one message

By calling consensus_decode instead of deserialize_partial you don't fail if there is something more in the buffer.

I see that the test case for that is removed,

I removed that test because now you can't access the unparsed buffer. Do you want me to re-add something like this:

    #[test]
    fn parse_multipartmsg_test() {
        let mut multi = MSG_ALERT.to_vec();
        multi.extend(&MSG_ALERT[..]);
        let mut reader = StreamReader::new(&multi[..], None);
        let message: Result<RawNetworkMessage, _> = reader.read_next();
        assert!(message.is_ok());
        check_alert_msg(&message.unwrap());
        let message: Result<RawNetworkMessage, _> = reader.read_next();
        assert!(message.is_ok());
        check_alert_msg(&message.unwrap());
    }

@dr-orlovsky
Collaborator

Oh, I see. Yeah, that test would be great to have!

@RCasatta RCasatta dismissed stale reviews from Kixunil and romanz via 017d239 December 12, 2021 16:35
@RCasatta
Collaborator Author

test parse_multipartmsg_test re-added

rebased

@apoelstra
Member

Sorry for the late response; these days I am focusing on completing Taproot support in rust-bitcoin and haven't had time for a careful review here. StreamReader was my first contribution to rust-bitcoin nearly three years ago, so it may not be an optimal one. Nevertheless, I recall there were some reasons to create this struct, like the fact that consensus_deserialize was failing when not consuming the whole buffer, which did not work with a stream of TCP messages.

There is no such thing as consensus_deserialize. If you mean consensus_decode, it does not error when it doesn't consume the whole buffer. If you mean consensus::deserialize, it errors because it is a convenience method specifically for use with vectors; if you have a general reader you should just use consensus_decode.

I had no idea about this StreamReader stuff. I have never seen #229 or #231, which git blame attributes it to, and this is the first I'm hearing about it. Unless somebody has a clear reason why it exists, and why the existing deserialization API (which is generic over any Read) is insufficient, we should deprecate it.

Why not just replace R: Read with R: BufRead? One can't forget it and if a byte slice is already available no need to force buffering.

Why would we reduce flexibility in this way? We don't need BufRead AFAIK, so we should not require it.

If people are having performance issues because they are using raw TcpStreams or whatever without wrapping them in a buffered reader, they should perhaps read the docs for TcpStream, which explicitly recommend its use.
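A toy contrast of the distinction drawn above, with `decode_next` and `deserialize_exact` as stand-ins for consensus_decode and consensus::deserialize respectively (names and the u32 format are illustrative, not the crate's API):

```rust
use std::io::{Cursor, Read};

// `decode_next` reads one u32 and leaves the reader at the next object,
// so it composes over a stream. `deserialize_exact` additionally rejects
// trailing bytes, which only makes sense for a self-contained buffer.
fn decode_next<R: Read>(r: &mut R) -> std::io::Result<u32> {
    let mut b = [0u8; 4];
    r.read_exact(&mut b)?;
    Ok(u32::from_le_bytes(b))
}

fn deserialize_exact(data: &[u8]) -> Result<u32, String> {
    let mut cur = Cursor::new(data);
    let v = decode_next(&mut cur).map_err(|e| e.to_string())?;
    if (cur.position() as usize) != data.len() {
        return Err("trailing bytes".to_string());
    }
    Ok(v)
}

fn main() {
    let data = [1u8, 0, 0, 0, 2, 0, 0, 0]; // two u32s back to back
    // Streaming: fine, the cursor simply advances past each object.
    let mut cur = Cursor::new(&data[..]);
    assert_eq!(decode_next(&mut cur).unwrap(), 1);
    assert_eq!(decode_next(&mut cur).unwrap(), 2);
    // Whole-buffer: errors because more than one object is present.
    assert!(deserialize_exact(&data).is_err());
}
```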

StreamReader before this commit repeatedly tries to parse big objects like
blocks at every read, causing useless overhead.
consensus_decode deals with partial data by simply blocking.

After this change it doesn't look like what remains of the StreamReader really
provides value, so it is deprecated.
@RCasatta
Collaborator Author

I changed BufRead to Read

@RCasatta RCasatta changed the title Improve performance in the StreamReader Deprecate StreamReader Jan 3, 2022
/// Struct used to configure stream reader function
pub struct StreamReader<R: Read> {
    /// Stream to read from
    pub stream: R,
Collaborator


Shouldn't we use BufReader to avoid performance issues in old code?

Collaborator Author


Andrew thinks (#680 (comment)) it's the caller's responsibility to wrap if needed.

I am not strongly for or against using it, but for sure the main issue this PR is solving, and which is our responsibility, is not attempting to decode the same 1MB block 16 times.

Collaborator


I mean StreamReader was originally intended to be buffered, so by not using BufReader internally we create a (I think significant) performance regression for those who update the library without changing the deprecated code. It might be less bad to just remove it if we don't use BufReader internally.

Collaborator Author


I mean StreamReader was originally intended to be buffered

ah that's true, let's see what @apoelstra thinks

Member


I mean, if we deprecate/remove it we might as well also change it to use BufReader internally :P

apoelstra
apoelstra previously approved these changes Jan 3, 2022
Member

@apoelstra apoelstra left a comment


ACK 5dfb93d

But if we want to change this to use a BufReader internally that's fine by me too. It would probably simulate the old behaviour more accurately.

@RCasatta
Collaborator Author

RCasatta commented Jan 4, 2022

But if we want to change this to use a BufReader internally that's fine by me too. It would probably simulate the old behaviour more accurately.

I added BufReader internally in 9189539

Kixunil
Kixunil previously approved these changes Jan 4, 2022
Collaborator

@Kixunil Kixunil left a comment


ACK 9189539

/// optional parameter `buffer_size` determining reading buffer size
pub fn new(stream: R, buffer_size: Option<usize>) -> StreamReader<R> {
/// Constructs new stream reader for a given input stream `stream`
#[deprecated(since="0.28.0", note="wrap you stream into a buffered reader if necessary and use consensus_encode directly")]
Collaborator


your, please leave this for followup PR but

/// Reads stream and parses next message from its current input,
/// also taking into account previously unparsed partial message (if there was such).
/// Reads stream and parses next message from its current input
#[deprecated(since="0.28.0", note="wrap you stream into a buffered reader if necessary and use consensus_encode directly")]
Collaborator


your

apoelstra
apoelstra previously approved these changes Jan 4, 2022
Member

@apoelstra apoelstra left a comment


ACK 9189539

I don't mind re-acking if you want to fix the comment typo :)

@RCasatta RCasatta dismissed stale reviews from apoelstra and Kixunil via e860333 January 5, 2022 08:40
Collaborator

@Kixunil Kixunil left a comment


ACK e860333

@elichai
Member

elichai commented Jan 5, 2022

When using consensus_decode, is it possible to continue parsing if you receive an unexpected EOF (this can happen with system interrupts) in the middle of parsing? Or would you lose the data that was partially parsed?

@apoelstra
Member

@elichai you lose the data that was partially parsed.

Is there an obvious way to improve on this? Does serde handle this somehow? I suppose the user could wrap their Read object in something that intercepts certain kinds of errors and retries rather than bubbling them up?

Member

@apoelstra apoelstra left a comment


ACK e860333

@Kixunil
Collaborator

Kixunil commented Jan 5, 2022

@elichai system interrupts (I believe you meant signals) can never cause unexpected EOF.

I'd love to eventually support streaming parsing but that's a different topic.

@elichai
Member

elichai commented Jan 6, 2022

@elichai system interrupts (I believe you meant signals) can never cause unexpected EOF.

I'd love to eventually support streaming parsing but that's a different topic.

You're right, I was thinking about EINTR (ErrorKind::Interrupted), which this doesn't currently solve.

I feel that I don't have enough knowledge about how to read from sockets robustly so ignore my comment :)

@Kixunil
Collaborator

Kixunil commented Jan 6, 2022

EINTR is usually handled by std (I believe BufReader does it too). EWOULDBLOCK is more serious, but out of scope for this library, at least today.
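For the record, "handled by std" here means Read::read_exact already retries ErrorKind::Interrupted internally; a hand-rolled equivalent (`read_full` is an illustrative name, not std or rust-bitcoin API) looks like:

```rust
use std::io::{ErrorKind, Read};

// Illustrative re-implementation of the retry loop std's `read_exact`
// performs: an Interrupted (EINTR) error is transparently retried rather
// than surfaced to the caller.
fn read_full<R: Read>(r: &mut R, buf: &mut [u8]) -> std::io::Result<()> {
    let mut filled = 0;
    while filled < buf.len() {
        match r.read(&mut buf[filled..]) {
            Ok(0) => return Err(ErrorKind::UnexpectedEof.into()),
            Ok(n) => filled += n,
            // EINTR: the syscall was interrupted by a signal, just retry.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

fn main() {
    let mut buf = [0u8; 3];
    read_full(&mut &b"abc"[..], &mut buf).unwrap();
    assert_eq!(&buf, b"abc");
}
```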
