This repository was archived by the owner on May 31, 2025. It is now read-only.

Performance enhancements for rosbag filter #1711

@mikepurvis

Description

Currently, rosbag filter deserializes every message in a bag, in a single-threaded process, re-running a fresh eval of the filter expression for each message. This is all enormously expensive and slow:

for topic, raw_msg, t, conn_header in inbag.read_messages(raw=True, return_connection_header=True):
    msg_type, serialized_bytes, md5sum, pos, pytype = raw_msg
    msg = pytype()
    msg.deserialize(serialized_bytes)
    if filter_fn(topic, msg, t):
        outbag.write(topic, msg, t, connection_header=conn_header)
    total_bytes += len(serialized_bytes)
    meter.step(total_bytes)

Some possibilities for how to improve this situation:

  • Lazy deserialization: Instead of deserializing every time, pass a proxy object which uses getattr to deserialize on demand and pass through accesses. This would prevent unnecessary deserialization expense for the use-cases which only filter on topic name or timestamp.
  • Wrap the eval in a lambda, so that it needs to be compiled just once rather than on each use: https://stackoverflow.com/a/12467755/109517
  • Use multiprocessing to parallelize the eval invocations—would have to check how much of a gain there is to be had here, particularly with the added cost of managing the work queue to maintain sequence.
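A rough sketch of the lazy-deserialization proxy (class and helper names here are illustrative, not part of rosbag's API):

```python
class LazyMessage:
    """Proxy that holds raw bytes and deserializes only on field access."""

    def __init__(self, pytype, serialized_bytes):
        self._pytype = pytype            # message class from the raw tuple
        self._bytes = serialized_bytes   # still-serialized payload
        self._msg = None                 # populated on first field access

    def __getattr__(self, name):
        # Only invoked for attributes not found on the proxy itself,
        # i.e. actual message fields like `header` or `data`.
        if self._msg is None:
            self._msg = self._pytype()
            self._msg.deserialize(self._bytes)
        return getattr(self._msg, name)


class FakeMsg:
    """Stand-in for a generated message class (illustrative only)."""
    def deserialize(self, raw):
        self.data = raw.decode()


proxy = LazyMessage(FakeMsg, b"hello")
# Filters that only look at topic or timestamp never touch the proxy's
# fields, so the deserialize cost is skipped entirely for them.
value = proxy.data  # first access deserializes, then delegates
```

For filters like `topic == '/tf'` the proxy is never touched and the deserialize cost disappears entirely; filters that do inspect fields pay the same cost as today, just deferred.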
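The compile-once trick from the linked answer, sketched with a made-up expression (the real string would come from the user's --filter argument):

```python
# Expression string as the user would pass it on the command line.
expr = "topic == '/odom' and t.to_sec() > 10.0"

# eval the lambda wrapper once: the expression body is compiled a single
# time, and each message afterwards pays only a plain function call
# instead of a fresh parse/compile per message.
filter_fn = eval("lambda topic, m, t: " + expr)


class FakeTime:
    """Minimal stand-in for a rospy.Time-like stamp (illustrative)."""
    def __init__(self, sec):
        self._sec = sec
    def to_sec(self):
        return self._sec


keep = filter_fn('/odom', None, FakeTime(12.5))
drop = filter_fn('/imu', None, FakeTime(12.5))
```

This is a drop-in change: the resulting filter_fn has the same (topic, m, t) signature the loop already calls.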
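A sketch of the multiprocessing angle, under the assumption of a top-level worker function (so it pickles) and a toy filter keyed off the topic name; Pool.imap yields results in submission order, which is one way to handle the sequencing concern:

```python
from multiprocessing import Pool

def evaluate(record):
    """Worker: run the (expensive) filter check for one record.

    A real version would deserialize from raw bytes and apply the
    user's expression here; this stand-in just checks the topic.
    """
    topic, payload, t = record
    return record, topic == '/keep'

def filter_parallel(records, workers=2):
    # imap returns results in input order even though workers run
    # concurrently, so no extra bookkeeping is needed to keep sequence.
    with Pool(workers) as pool:
        for record, keep in pool.imap(evaluate, records, chunksize=64):
            if keep:
                yield record

records = [('/keep', b'a', 1.0), ('/drop', b'b', 2.0), ('/keep', b'c', 3.0)]
kept = list(filter_parallel(records))
```

Whether this actually wins depends on how much pickling/IPC overhead eats into the per-message savings, which is the measurement flagged above.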
