[C++] Overhead of std::shared_ptr<DataType> copies is causing thread contention #31567

@asfimport

We created a benchmark to measure ExecuteScalarExpression performance in ARROW-16014. We noticed significant thread contention, even though there shouldn't be much, if any, for this task. As part of ARROW-16138 we have been investigating possible causes.

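One cause seems to be contention from copying std::shared_ptr<DataType> objects. Each copy, and each destruction of a copy, atomically updates the shared reference count, so threads that copy the same pointer end up contending on the same memory location.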
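To make the mechanism concrete, here is a minimal sketch. The `DataType` struct and both function names are hypothetical stand-ins for illustration, not Arrow's actual classes. It only shows that every `std::shared_ptr` copy touches the shared reference count while a raw pointer does not; it does not measure the contention itself.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for arrow::DataType (assumption, not the real class).
struct DataType {
  std::string name;
};

// Copies the shared_ptr num_copies times and reports the reference count
// while the copies are alive. Every copy is a write to the one control
// block shared by all owners.
long UseCountAfterSharedCopies(const std::shared_ptr<DataType>& type,
                               std::size_t num_copies) {
  std::vector<std::shared_ptr<DataType>> copies(num_copies, type);
  return type.use_count();
}

// Takes num_copies non-owning pointers instead. The control block is never
// written, so there is nothing for threads to contend on.
long UseCountAfterRawPointers(const std::shared_ptr<DataType>& type,
                              std::size_t num_copies) {
  std::vector<const DataType*> views(num_copies, type.get());
  assert(views.empty() || views.front() == type.get());
  return type.use_count();
}
```

With a single owner, eight shared copies raise the count to nine, while eight raw pointers leave it at one. In a multithreaded run each of those count updates is an atomic read-modify-write on the same location, which is where the contention would come from.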

Two possible solutions jump to mind and I'm sure there are many more.

ExecBatch is an internal type, used inside ExecuteScalarExpression as well as inside the execution engine. In the former we can safely assume the data types will exist for the duration of the call. In the latter we can safely assume the data types will exist for the duration of the execution plan. Thus we can probably take a more targeted fix and migrate only ExecBatch to use DataType* (or const DataType&).
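A rough sketch of what this targeted option could look like. `BorrowedBatch`, `Schema`, `MakeBatch`, and `DescribeBatch` are hypothetical names, and `DataType` is a stand-in struct; none of this is the real ExecBatch layout. The assumption is that some longer-lived owner (the caller or the plan) keeps the `shared_ptr`s alive while batches only borrow.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for arrow::DataType (assumption, not the real class).
struct DataType {
  std::string name;
};

// Hypothetical owner of the types, e.g. the caller of an expression
// evaluation or an execution plan. It must outlive every batch below.
struct Schema {
  std::vector<std::shared_ptr<DataType>> types;
};

// Hypothetical batch that borrows its types instead of sharing ownership.
struct BorrowedBatch {
  std::vector<const DataType*> types;
  long length = 0;
};

// Builds a batch without touching any reference count.
BorrowedBatch MakeBatch(const Schema& schema, long length) {
  BorrowedBatch batch;
  batch.length = length;
  batch.types.reserve(schema.types.size());
  for (const auto& type : schema.types) {
    batch.types.push_back(type.get());  // non-owning pointer
  }
  return batch;
}

// Example consumer: reads type names through the borrowed pointers.
std::string DescribeBatch(const BorrowedBatch& batch) {
  std::string out;
  for (const DataType* type : batch.types) {
    if (!out.empty()) out += ", ";
    out += type->name;
  }
  return out;
}
```

Copying or slicing a `BorrowedBatch` then only copies plain pointers. The cost is that the lifetime guarantee becomes a convention the code has to uphold rather than something the type system enforces.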

On the other hand, we might consider a more global approach. All of our "stock" data types are assumed to have static storage duration. However, we must use std::shared_ptr because users could create their own extension types. We could invent an "extension type registration" system where extension types must first be registered with the C++ lib before being used. Then we could have long-lived DataType instances and we could replace std::shared_ptr with DataType* (or const DataType&) throughout most of the code base.
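As a sketch of the registration idea only: the `TypeRegistry` class below and its `Register` and `Lookup` methods are hypothetical names invented for illustration, and `DataType` is again a stand-in struct. It assumes that registered types are never unregistered, which is what makes the returned raw pointers safe to hold.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::DataType (assumption, not the real class).
struct DataType {
  std::string name;
};

// Hypothetical registry that owns extension types for as long as it lives
// and hands out stable, non-owning pointers to them.
class TypeRegistry {
 public:
  // Registers a type and returns its stable pointer. If a type with the
  // same name is already registered, the existing pointer is returned and
  // the new instance is discarded.
  const DataType* Register(std::unique_ptr<DataType> type) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = types_.find(type->name);
    if (it != types_.end()) return it->second.get();
    const DataType* stable = type.get();
    types_.emplace(stable->name, std::move(type));
    return stable;
  }

  // Returns the registered type with this name, or nullptr if none exists.
  const DataType* Lookup(const std::string& name) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = types_.find(name);
    return it == types_.end() ? nullptr : it->second.get();
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, std::unique_ptr<DataType>> types_;
};
```

The lock is only taken at registration and lookup time, which should be rare compared with passing types around. Once a caller holds the `const DataType*`, copying it involves no shared state at all.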

But, as I mentioned, I'm sure there are many other approaches to take. CC @lidavidm, @pitrou, and @cyb70289 for thoughts, though this might be interesting for just about any C++ dev.

Reporter: Weston Pace / @westonpace

Note: This issue was originally created as ARROW-16161. Please see the migration documentation for further details.
