Speed improvements by fegies · Pull Request #96 · sevagh/pq

fegies · 2020-12-11T12:30:25Z

This Pull requests aims to improve the processing speed by avoiding the construction of serde-value objects (they are implemented as B-Trees), leading to many allocations for every proto message.

For my test case (converting 200k fairly heavy protobuf messages) I was able to observe the following speed improvements:

Before:
$ time ./pq --msgtype simmo.TranshipmentCandidate --fdsetdir ./fdset --stream i32be < candidates.proto.ld > /dev/null
real 2m20.320s
user 3m7.052s
sys 0m20.936s

After:

$ time ./pq --msgtype simmo.TranshipmentCandidate --fdsetdir ./fdset --stream i32be < candidates.proto.ld > /dev/null
real 1m19.370s
user 1m14.457s
sys 0m4.568s

Here you can see flamegraphs clearly showing the allocations taking a lot of cpu time:

And after:

This commit avoids a lot of allocations for every protobuf message for the construction of the BTreeMap in the serde-value structs. Some tests had to be modified since the names of the resulting objects keys are no longer ordered by name but by the fdset field order.

fegies · 2020-12-11T12:34:19Z

Coincidentally, this also fixes #95 since the unbounded queue has been removed.

sevagh · 2020-12-11T14:32:23Z

This is excellent. I'll need some time to refamiliarize myself with this code, fetch your branch, compile it, run the tests, etc. But - thanks in advance for the contribution.

fegies · 2020-12-11T16:03:46Z

It might also be possible to save on the allocation per message by avoiding the Iterator interface and instead providing a function like
next_preallocated<'a>(&muf self, buf: &'a mut Vec) -> Option<&'a [u8]>,
on the ByteConsumer Trait.
and then changing the main loop from for... to while let ...

I am not sure about the performance improvements there though

EDIT:
real 1m14.688s
user 1m13.367s
sys 0m1.040s

reusing the message buffer seems to account for ~5s in my use case. (a further improvement of about 6%).

Since this would requires some changes in the interface to the stream-delimit subcrate, I'd open a new pullrequest for discussion if you are open to the idea.

sevagh · 2020-12-11T19:36:29Z

Since this would requires some changes in the interface to the stream-delimit subcrate, I'd open a new pullrequest for discussion if you are open to the idea.

I'm open.

sevagh · 2020-12-12T19:07:02Z

I have time to clone and test this today.

sevagh · 2020-12-12T19:10:22Z

Can you compare your code against this previous (merged) PR: #81

That PR added the call to "spawn" in the first place, and now I see you're removing that part.

fegies · 2020-12-13T19:54:33Z

Can you compare your code against this previous (merged) PR: #81

I am not sure what you mean by compare, as the first measurement (and flamegraph) is already done on master, which includes the threaded code

Before #81 was merged, each message was also deserialized to a temporary serde_value::Value object, suffering from the same allocations and double traversal it now does.

Regarding the removal of the threads, its still faster because:
The main reason for the speedup in this case is the fact that we can avoid constructing an itermediate value, and so only have to traverse each message once.
This means that we cant run decoding and encoding in separarate threads though.

In theory it might be possible to parallelize on a message level (transcoding multiple messages simultaneously)
I cant think of an easy way to do it without giving up ordering guarantees in the output though.

sevagh · 2020-12-13T21:21:53Z

Just wanted to make sure we're not walking back any desired behavior. Anyway it looks OK on my end, performance improved.

sevagh · 2020-12-13T21:22:33Z

Thanks. I could either cut a release or wait for the next idea you had.

fegies added 2 commits December 11, 2020 13:19

run cargo update

04712ca

sevagh merged commit ab25389 into sevagh:master Dec 13, 2020

fegies mentioned this pull request Dec 13, 2020

Writer Backpressure #95

Closed

fegies deleted the speed_improvements branch January 24, 2021 15:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed improvements#96

Speed improvements#96
sevagh merged 2 commits into
sevagh:masterfrom
fegies:speed_improvements

fegies commented Dec 11, 2020 •

edited

Loading

Uh oh!

fegies commented Dec 11, 2020

Uh oh!

sevagh commented Dec 11, 2020

Uh oh!

fegies commented Dec 11, 2020 •

edited

Loading

Uh oh!

sevagh commented Dec 11, 2020

Uh oh!

sevagh commented Dec 12, 2020

Uh oh!

sevagh commented Dec 12, 2020

Uh oh!

fegies commented Dec 13, 2020 •

edited

Loading

Uh oh!

sevagh commented Dec 13, 2020

Uh oh!

sevagh commented Dec 13, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

fegies commented Dec 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fegies commented Dec 11, 2020

Uh oh!

sevagh commented Dec 11, 2020

Uh oh!

fegies commented Dec 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sevagh commented Dec 11, 2020

Uh oh!

sevagh commented Dec 12, 2020

Uh oh!

sevagh commented Dec 12, 2020

Uh oh!

fegies commented Dec 13, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sevagh commented Dec 13, 2020

Uh oh!

sevagh commented Dec 13, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fegies commented Dec 11, 2020 •

edited

Loading

fegies commented Dec 11, 2020 •

edited

Loading

fegies commented Dec 13, 2020 •

edited

Loading