Skip to content

Speed improvements#96

Merged
sevagh merged 2 commits into
sevagh:masterfrom
fegies:speed_improvements
Dec 13, 2020
Merged

Speed improvements#96
sevagh merged 2 commits into
sevagh:masterfrom
fegies:speed_improvements

Conversation

@fegies

@fegies fegies commented Dec 11, 2020

Copy link
Copy Markdown
Contributor

This Pull requests aims to improve the processing speed by avoiding the construction of serde-value objects (they are implemented as B-Trees), leading to many allocations for every proto message.

For my test case (converting 200k fairly heavy protobuf messages) I was able to observe the following speed improvements:

Before:
$ time ./pq --msgtype simmo.TranshipmentCandidate --fdsetdir ./fdset --stream i32be < candidates.proto.ld > /dev/null
real 2m20.320s
user 3m7.052s
sys 0m20.936s

After:

$ time ./pq --msgtype simmo.TranshipmentCandidate --fdsetdir ./fdset --stream i32be < candidates.proto.ld > /dev/null
real 1m19.370s
user 1m14.457s
sys 0m4.568s

Here you can see flamegraphs clearly showing the allocations taking a lot of cpu time:
flame_before
And after:
flame_after

This commit avoids a lot of allocations for every protobuf message for
the construction of the BTreeMap in the serde-value structs.

Some tests had to be modified since the names of the
resulting objects keys are no longer ordered by name but by the fdset field
order.
@fegies

fegies commented Dec 11, 2020

Copy link
Copy Markdown
Contributor Author

Coincidentally, this also fixes #95 since the unbounded queue has been removed.

@sevagh

sevagh commented Dec 11, 2020

Copy link
Copy Markdown
Owner

This is excellent. I'll need some time to refamiliarize myself with this code, fetch your branch, compile it, run the tests, etc. But - thanks in advance for the contribution.

@fegies

fegies commented Dec 11, 2020

Copy link
Copy Markdown
Contributor Author

It might also be possible to save on the allocation per message by avoiding the Iterator interface and instead providing a function like
next_preallocated<'a>(&muf self, buf: &'a mut Vec) -> Option<&'a [u8]>,
on the ByteConsumer Trait.
and then changing the main loop from for... to while let ...

I am not sure about the performance improvements there though

EDIT:
real 1m14.688s
user 1m13.367s
sys 0m1.040s

reusing the message buffer seems to account for ~5s in my use case. (a further improvement of about 6%).

Since this would requires some changes in the interface to the stream-delimit subcrate, I'd open a new pullrequest for discussion if you are open to the idea.

@sevagh

sevagh commented Dec 11, 2020

Copy link
Copy Markdown
Owner

Since this would requires some changes in the interface to the stream-delimit subcrate, I'd open a new pullrequest for discussion if you are open to the idea.

I'm open.

@sevagh

sevagh commented Dec 12, 2020

Copy link
Copy Markdown
Owner

I have time to clone and test this today.

@sevagh

sevagh commented Dec 12, 2020

Copy link
Copy Markdown
Owner

Can you compare your code against this previous (merged) PR: #81

That PR added the call to "spawn" in the first place, and now I see you're removing that part.

@fegies

fegies commented Dec 13, 2020

Copy link
Copy Markdown
Contributor Author

Can you compare your code against this previous (merged) PR: #81

I am not sure what you mean by compare, as the first measurement (and flamegraph) is already done on master, which includes the threaded code

Before #81 was merged, each message was also deserialized to a temporary serde_value::Value object, suffering from the same allocations and double traversal it now does.

Regarding the removal of the threads, its still faster because:
The main reason for the speedup in this case is the fact that we can avoid constructing an itermediate value, and so only have to traverse each message once.
This means that we cant run decoding and encoding in separarate threads though.

In theory it might be possible to parallelize on a message level (transcoding multiple messages simultaneously)
I cant think of an easy way to do it without giving up ordering guarantees in the output though.

@sevagh

sevagh commented Dec 13, 2020

Copy link
Copy Markdown
Owner

Just wanted to make sure we're not walking back any desired behavior. Anyway it looks OK on my end, performance improved.

@sevagh sevagh merged commit ab25389 into sevagh:master Dec 13, 2020
@sevagh

sevagh commented Dec 13, 2020

Copy link
Copy Markdown
Owner

Thanks. I could either cut a release or wait for the next idea you had.

@fegies fegies mentioned this pull request Dec 13, 2020
@fegies fegies deleted the speed_improvements branch January 24, 2021 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants