Skip to content

Ballista: Fix hacks around concurrency=2 to force hash-partitioned joins #20

@andygrove

Description

@andygrove

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

By default, DataFusion uses hash-partitioned joins if concurrency > 1 which led to me adding this hacky code in a couple of places in Ballista.

let config = ExecutionConfig::new().with_concurrency(2); // TODO: this is hack to enable partitioned joins
let mut ctx = ExecutionContext::with_config(config);

Describe the solution you'd like
I'm actually not sure what the solution should be, but I would like to be able to tell the context to use hash-partitioned joins, separately from specifying concurrency.

Describe alternatives you've considered
None

Additional context
This code is running in the scheduler, not in the executor where the query actually executes. The scheduler concurrency should not impact how the query is planned.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions