
Poor reported performance of DataFusion against DuckDB and Hyper #5942

@alamb

Describe the bug

A blog post reports relatively poor performance of DataFusion compared with DuckDB and Hyper on a TPC-H benchmark over Parquet files:

https://www.architecture-performance.fr/ap_blog/tpc-h-benchmark-of-hyper-duckdb-and-datafusion-on-parquet-files/

To Reproduce

I would like someone to try to reproduce the DataFusion performance reported in the blog post and propose ways to improve it (perhaps by enabling some of the options that are off by default).

Expected behavior

No response

Additional context

@Dandandan suggested on Slack that this benchmark may run faster with a parallelized scan of each Parquet file.

Metadata

Labels: bug, help wanted, performance