[ALICE] Expose some shm API to allow shm message metadata to be passed through a side channel by aalkin · Pull Request #551 · FairRootGroup/FairMQ

aalkin · 2026-06-09T09:34:07Z

For a specific use case, we want a cache of preallocated shared memory messages that other devices can access through a time-based table. To avoid complicated synchronization, we send a table containing only the messages' metadata. To access the objects in shared memory that this metadata refers to, the target device needs to calculate the device-local pointer to the corresponding memory. This is achieved by exposing the shared memory manager API from the transport of the specific channel that is used to send the metadata table. This approach lets us centralize the cache management and re-use FairMQ-based shared memory management, so the client code remains largely unchanged.

To summarize: the shmem message now exposes its metadata through a public method, and that metadata is transferred to the target device. The shmem transport exposes its shmem manager, which in turn exposes an API to get the local pointer from the message handle and the managed segment id on the target device. This is, of course, a draft; any suggestions on how to handle this better (specifically, better aligned with the FairMQ architecture) are welcome.
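To illustrate the pointer reconstruction described above, here is a minimal sketch that uses only the standard library. It assumes that a handle is simply a byte offset from the base of a segment, and that each process maps the same segment at its own base address. `ToyMeta`, `ToySegmentView`, `MakeMeta` and `ResolveLocal` are hypothetical names invented for this illustration; they are not the FairMQ types or methods touched by this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the metadata sent through the side channel.
// The field names are illustrative and do not mirror FairMQ's MetaHeader.
struct ToyMeta {
    std::uint16_t segmentId; // which segment the payload lives in
    std::ptrdiff_t handle;   // offset of the payload from the segment base
    std::size_t size;        // payload size in bytes
};

// Hypothetical per-process view of one segment: the same bytes may be
// mapped at a different base address in every process.
struct ToySegmentView {
    std::uint16_t id;
    unsigned char* base;
    std::size_t length;
};

// Producer side: turn a local pointer into a position-independent handle.
inline ToyMeta MakeMeta(const ToySegmentView& seg, const unsigned char* payload, std::size_t size)
{
    assert(payload >= seg.base && payload + size <= seg.base + seg.length);
    return ToyMeta{seg.id, payload - seg.base, size};
}

// Consumer side: rebuild a process-local pointer from received metadata.
// Returns nullptr if the metadata does not fit the local view of the segment.
inline const unsigned char* ResolveLocal(const ToySegmentView& seg, const ToyMeta& meta)
{
    if (meta.segmentId != seg.id || meta.handle < 0) {
        return nullptr;
    }
    const std::size_t offset = static_cast<std::size_t>(meta.handle);
    if (offset > seg.length || meta.size > seg.length - offset) {
        return nullptr;
    }
    return seg.base + offset;
}
```

In this sketch the producer would call `MakeMeta` for each unsent message and ship the resulting `ToyMeta` rows in the table, while each consumer would call `ResolveLocal` against its own mapping of the segment.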

Since "shmem/TransportFactory.h" needs to be included in the client code, the class is out-of-lined so that we do not need to expose other internal headers or link to ZeroMQ directly.

@ktf

ktf · 2026-06-09T09:35:58Z

@rbx does this make sense to you? Do you have any better suggestions on how to deal with this?

dennisklein · 2026-06-09T12:10:41Z

Can you elaborate (e.g. on a simplified pseudo topology) which device owns the msg and which will need which (read/write) access to it (also with regard to timing/parallel access)? I am trying to understand which features of the fmq memory mgmt you still need while you want to opt-out of the msg api. This is not clear to me yet.

aalkin · 2026-06-09T14:05:05Z

The objects that we want to cache have validity intervals expressed as timestamps in the data. Those intervals can span many timeframes or less than a timeframe; in both cases the border between two intervals, where we need to change the object, is not, in general, a timeframe border. And, of course, there are several consuming devices with different processing rates, which complicates things further. Our solution is to provide a centralized cache that retrieves and stores the corresponding objects based on the timestamp tables for the currently used timeframes. The objects are then transparently delivered to the consuming devices through a table isomorphic to the timestamps table.

Trying to deliver the objects through messages would be overly complicated, since the same message might need to be sent for several consecutive timeframes or, conversely, several messages for a single timeframe, and this is for a single such object. Instead, the objects are allocated in shared memory as messages in the source device, using the transport of the channels pointing to the consuming devices, but are not sent. What is sent are Arrow tables, isomorphic to the timestamps tables, with each row containing metadata for the corresponding unsent messages holding the objects for that particular timestamp. This way control is not passed to the consuming devices and stays with the central cache, but the devices can still access the content of the unsent messages, provided the pointer can be inferred from the metadata.

Specifically, we use the preconfigured channels, with their transport, to allocate the messages and then send their metadata in an unrelated message. The consumers, having access to the same channel, are able to use the contents of those unsent messages, while the cache is still managed by a single device. The consumers need read-only access and are not concerned with the validity intervals or lifetime of the objects. The cache will drop everything that belongs to a timeframe that is already reported as consumed and will not send a new metadata table until all of the objects are ready.
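The cache behaviour described above could be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual implementation: validity intervals are half-open `[validFrom, validUntil)` and do not overlap, each table row holds a single object reference, and entries are dropped by the end of their validity rather than by consumed timeframe. `ObjectRef`, `ValidityCache` and all method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical reference to an unsent shared memory message.
struct ObjectRef {
    std::uint64_t handle; // illustrative handle of the cached object
    std::size_t size;     // payload size in bytes
};

class ValidityCache {
  public:
    // Register an object valid for timestamps in [validFrom, validUntil).
    void Insert(std::uint64_t validFrom, std::uint64_t validUntil, ObjectRef ref)
    {
        assert(validFrom < validUntil);
        fEntries[validFrom] = Entry{validUntil, ref};
    }

    // Find the object covering a timestamp, or nullptr if none is cached.
    const ObjectRef* Find(std::uint64_t ts) const
    {
        auto it = fEntries.upper_bound(ts); // first interval starting after ts
        if (it == fEntries.begin()) {
            return nullptr;
        }
        --it; // last interval starting at or before ts
        return ts < it->second.validUntil ? &it->second.ref : nullptr;
    }

    // Build a table with one row per timestamp. Returns false and leaves
    // rows untouched unless every timestamp is covered, mirroring the rule
    // that no table is sent until all objects are ready.
    bool BuildTable(const std::vector<std::uint64_t>& timestamps, std::vector<ObjectRef>& rows) const
    {
        std::vector<ObjectRef> result;
        result.reserve(timestamps.size());
        for (std::uint64_t ts : timestamps) {
            const ObjectRef* ref = Find(ts);
            if (ref == nullptr) {
                return false;
            }
            result.push_back(*ref);
        }
        rows.swap(result);
        return true;
    }

    // Drop every entry whose validity ended at or before ts.
    void DropEndedBefore(std::uint64_t ts)
    {
        for (auto it = fEntries.begin(); it != fEntries.end();) {
            if (it->second.validUntil <= ts) {
                it = fEntries.erase(it);
            } else {
                ++it;
            }
        }
    }

  private:
    struct Entry {
        std::uint64_t validUntil;
        ObjectRef ref;
    };
    std::map<std::uint64_t, Entry> fEntries; // keyed by validFrom
};
```

Because the table has exactly one row per input timestamp, a consumer can index it in step with its own timestamps table without knowing anything about validity intervals.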

I hope this clarifies the intent.

dennisklein · 2026-06-09T15:01:40Z

Thx for the explanation, I think I got the constraints now. I don't object to your proposal in this PR.

One alternative that still comes to mind is to create a boost::interprocess::managed_shared_memory directly (skipping fmq entirely for this cache), which would mean your side-channel metadata table additionally needs to carry the segment handle. I may be overlooking something, but I don't yet see the big advantage of using the fmq memory abstraction here.

ktf · 2026-06-09T15:32:44Z

I'd rather keep the fairmq abstraction, actually. I do not want to have a parallel transport which needs to be configured and so on.

aalkin · 2026-06-09T16:46:38Z

Indeed, this could also be achieved with manually managed shared memory. However, going through the FairMQ API is easier simply because everything is already preconfigured in the workflow deployment, so we can re-use the existing transport with minimal changes.

rbx · 2026-06-10T12:37:39Z

I think the use case is well-motivated and keeping the cache centralised while letting consumers resolve pointers from metadata is fine.

But there are some issues with this implementation:

"inline" on out-of-line methods. The header declares all factory methods inline, but their definitions now live exclusively in TransportFactory.cxx. The inline keywords should be dropped.
Managed segments only. The implementation unconditionally uses UserPtr(GetAddressFromHandle(...)), which is only valid for managed-segment messages. MetaHeader carries fManaged and fRegionId, so it can also describe an unmanaged-region message, and resolving such a message would fail.
GetManager() exposes too much surface. Returning Manager* drags the entire Manager API, including all Boost.Interprocess internals, into public headers, which partially defeats the purpose of out-of-lining TransportFactory.h in the first place.

I propose an alternative that covers the same use case without GetManager() or the refactor:

shmem::Message::GetMeta() - same as yours; returns a copy of the MetaHeader.

Manager::GetDataAddressFromHandle(const MetaHeader&) - handles both managed segments and unmanaged regions.

shmem::GetDataAddressFromHandle(fair::mq::TransportFactory&, const MetaHeader&) - a free function declared in shmem/Common.h. Callers only need <fairmq/shmem/Common.h>, which is already transitively available via <fairmq/shmem/Message.h>, so there is no exposure of zmq.h or Manager.h internals in client code. TransportFactory also gains a same-named thin forwarding member for callers who already have the concrete type.
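A rough sketch of the shape of this alternative, using toy types only: the manager picks the lookup table based on a managed flag, the concrete transport forwards to it, and a free function accepts the abstract transport so callers never see the manager. Every name prefixed with `Toy`, the field layout, and the use of `dynamic_cast` inside the free function are assumptions made for illustration; the thread does not show how the real function is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical metadata; only a managed flag and a region id are named in
// the thread, the remaining fields are assumptions.
struct ToyMetaHeader {
    std::size_t size;
    std::size_t handle; // offset from the base of the segment or region
    std::uint16_t regionId;
    std::uint16_t segmentId;
    bool managed;
};

struct ToyMapping {
    unsigned char* base;
    std::size_t length;
};

class ToyManager {
  public:
    void AddSegment(std::uint16_t id, unsigned char* base, std::size_t length)
    {
        assert(base != nullptr);
        fSegments[id] = ToyMapping{base, length};
    }
    void AddRegion(std::uint16_t id, unsigned char* base, std::size_t length)
    {
        assert(base != nullptr);
        fRegions[id] = ToyMapping{base, length};
    }

    // Handles both cases: managed segments and unmanaged regions.
    void* GetDataAddressFromHandle(const ToyMetaHeader& meta) const
    {
        const std::map<std::uint16_t, ToyMapping>& table = meta.managed ? fSegments : fRegions;
        const std::uint16_t id = meta.managed ? meta.segmentId : meta.regionId;
        auto it = table.find(id);
        if (it == table.end()) {
            return nullptr;
        }
        const ToyMapping& m = it->second;
        if (meta.handle > m.length || meta.size > m.length - meta.handle) {
            return nullptr; // metadata does not fit the mapping
        }
        return m.base + meta.handle;
    }

  private:
    std::map<std::uint16_t, ToyMapping> fSegments;
    std::map<std::uint16_t, ToyMapping> fRegions;
};

// Abstract transport, standing in for the generic factory interface.
struct ToyTransport {
    virtual ~ToyTransport() = default;
};

// Concrete shared memory transport with a thin forwarding member.
class ToyShmTransport : public ToyTransport {
  public:
    explicit ToyShmTransport(const ToyManager& manager) : fManager(manager) {}
    void* GetDataAddressFromHandle(const ToyMetaHeader& meta) const
    {
        return fManager.GetDataAddressFromHandle(meta);
    }

  private:
    ToyManager fManager;
};

// Free function: callers pass the abstract transport and never touch the
// manager. Returns nullptr for transports that are not shared memory.
inline void* GetDataAddressFromHandle(ToyTransport& transport, const ToyMetaHeader& meta)
{
    auto* shm = dynamic_cast<ToyShmTransport*>(&transport);
    return shm != nullptr ? shm->GetDataAddressFromHandle(meta) : nullptr;
}
```

The point of the free function in this sketch is that client code depends only on the metadata type and the abstract transport, which is the reduced surface the proposal aims for.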

aalkin added 2 commits June 10, 2026 10:14

expose shm manager pointer reconstruction

c9bccc1

out-of-line shmem/TransportFactory.h

1ed0de2

aalkin force-pushed the v1.10.1-alice branch from 0ef953e to 1ed0de2 Compare June 10, 2026 08:17

aalkin marked this pull request as ready for review June 10, 2026 08:18

Redundant const

209af4b

aalkin mentioned this pull request Jun 10, 2026

feat(shmem): expose side-channel metadata API for unsent messages #555

Merged

aalkin closed this Jun 10, 2026

aalkin wants to merge 3 commits into FairRootGroup:dev from aalkin:v1.10.1-alice
