This repository was archived by the owner on May 31, 2025. It is now read-only.

Performance enhancements for rosbag filter #1711

@mikepurvis

Description

Currently, rosbag filter deserializes every message in a bag, in a single-threaded process, re-running a fresh eval of the filter expression for each message. This is all enormously expensive and slow:

for topic, raw_msg, t, conn_header in inbag.read_messages(raw=True, return_connection_header=True):
    msg_type, serialized_bytes, md5sum, pos, pytype = raw_msg
    msg = pytype()
    msg.deserialize(serialized_bytes)
    if filter_fn(topic, msg, t):
        outbag.write(topic, msg, t, connection_header=conn_header)
    total_bytes += len(serialized_bytes)
    meter.step(total_bytes)

Some possibilities for how to improve this situation:

  • Lazy deserialization: Instead of deserializing every time, pass a proxy object which uses getattr to deserialize on demand and pass through accesses. This would prevent unnecessary deserialization expense for the use-cases which only filter on topic name or timestamp.
  • Wrap the eval in a lambda, so that it needs to be compiled just once rather than on each use: https://stackoverflow.com/a/12467755/109517
  • Use multiprocessing to parallelize the eval invocations—would have to check how much of a gain there is to be had here, particularly with the added cost of managing the work queue to maintain sequence.
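A rough sketch of the lazy-deserialization proxy (class and helper names here are illustrative, not part of rosbag's API):

```python
class LazyMessage:
    """Proxy that holds raw bytes and deserializes only on field access."""

    def __init__(self, pytype, serialized_bytes):
        self._pytype = pytype            # message class from the raw tuple
        self._bytes = serialized_bytes   # still-serialized payload
        self._msg = None                 # populated on first field access

    def __getattr__(self, name):
        # Only invoked for attributes not found on the proxy itself,
        # i.e. actual message fields like `header` or `data`.
        if self._msg is None:
            self._msg = self._pytype()
            self._msg.deserialize(self._bytes)
        return getattr(self._msg, name)


class FakeMsg:
    """Stand-in for a generated message class (illustrative only)."""
    def deserialize(self, raw):
        self.data = raw.decode()


proxy = LazyMessage(FakeMsg, b"hello")
# Filters that only look at topic or timestamp never touch the proxy's
# fields, so the deserialize cost is skipped entirely for them.
value = proxy.data  # first access deserializes, then delegates
```

For filters like `topic == '/tf'` the proxy is never touched and the deserialize cost disappears entirely; filters that do inspect fields pay the same cost as today, just deferred.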
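The compile-once trick from the linked answer, sketched with a made-up expression (the real string would come from the user's --filter argument):

```python
# Expression string as the user would pass it on the command line.
expr = "topic == '/odom' and t.to_sec() > 10.0"

# eval the lambda wrapper once: the expression body is compiled a single
# time, and each message afterwards pays only a plain function call
# instead of a fresh parse/compile per message.
filter_fn = eval("lambda topic, m, t: " + expr)


class FakeTime:
    """Minimal stand-in for a rospy.Time-like stamp (illustrative)."""
    def __init__(self, sec):
        self._sec = sec
    def to_sec(self):
        return self._sec


keep = filter_fn('/odom', None, FakeTime(12.5))
drop = filter_fn('/imu', None, FakeTime(12.5))
```

This is a drop-in change: the resulting filter_fn has the same (topic, m, t) signature the loop already calls.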
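A sketch of the multiprocessing angle, under the assumption of a top-level worker function (so it pickles) and a toy filter keyed off the topic name; Pool.imap yields results in submission order, which is one way to handle the sequencing concern:

```python
from multiprocessing import Pool

def evaluate(record):
    """Worker: run the (expensive) filter check for one record.

    A real version would deserialize from raw bytes and apply the
    user's expression here; this stand-in just checks the topic.
    """
    topic, payload, t = record
    return record, topic == '/keep'

def filter_parallel(records, workers=2):
    # imap returns results in input order even though workers run
    # concurrently, so no extra bookkeeping is needed to keep sequence.
    with Pool(workers) as pool:
        for record, keep in pool.imap(evaluate, records, chunksize=64):
            if keep:
                yield record

records = [('/keep', b'a', 1.0), ('/drop', b'b', 2.0), ('/keep', b'c', 3.0)]
kept = list(filter_parallel(records))
```

Whether this actually wins depends on how much pickling/IPC overhead eats into the per-message savings, which is the measurement flagged above.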
