
ElemWise DataFrame operations by Series #1759

@simonkamronn

Description


It is not possible to perform simple operations such as standardizing the data, for example:

ddf = (ddf - ddf.mean(axis=0)) / ddf.std(axis=0)

It raises the following error:

ValueError: Not all divisions are known, can't align partitions. 
Please use `set_index` or `set_partition` to set the index.
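For comparison, here is a minimal sketch of the same expression in plain pandas (the small DataFrame and its non-default row index are made-up illustration data). Subtracting a Series from a DataFrame aligns the Series index with the DataFrame's column labels and broadcasts down the rows, so the row index plays no part in the alignment:

```python
import pandas as pd

# Made-up illustration data with a deliberately non-default row index.
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]},
    index=[100, 200, 300, 400],
)

mean = df.mean(axis=0)  # Series indexed by the column labels "a", "b"
std = df.std(axis=0)    # sample standard deviation (ddof=1) per column

# DataFrame - Series matches the Series index against the DataFrame columns
# and broadcasts along the rows; the row labels 100..400 are untouched.
standardized = (df - mean) / std
print(standardized)
```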

Each reduction returns a Series with one value per column of the DataFrame, so the element-wise operation should only need to align on the column labels and should not need to know the divisions. Alternatively, would it make sense to set the divisions of the reduced Series to the first and last divisions of the DataFrame it was computed from?
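To illustrate why divisions should not be required, here is a minimal pure-Python sketch of partition-wise standardization. It is not dask's implementation: the data layout (a partition as a list of row dicts) and the helper names `column_means`, `column_stds` and `standardize` are hypothetical. The per-column statistics are reduced across all partitions, and because they are keyed by column, each partition can then be transformed independently:

```python
import math
from typing import Dict, List

# Hypothetical layout: a partition is a list of rows, each row a dict
# mapping column name -> value. No index or divisions are stored at all.
Partition = List[Dict[str, float]]


def column_means(parts: List[Partition]) -> Dict[str, float]:
    # First pass: row count and per-column sums over every partition.
    count = 0
    sums: Dict[str, float] = {}
    for part in parts:
        for row in part:
            count += 1
            for col, val in row.items():
                sums[col] = sums.get(col, 0.0) + val
    return {col: s / count for col, s in sums.items()}


def column_stds(parts: List[Partition], means: Dict[str, float]) -> Dict[str, float]:
    # Second pass: per-column sum of squared deviations, sample std (ddof=1).
    count = 0
    ssd: Dict[str, float] = {}
    for part in parts:
        for row in part:
            count += 1
            for col, val in row.items():
                ssd[col] = ssd.get(col, 0.0) + (val - means[col]) ** 2
    return {col: math.sqrt(s / (count - 1)) for col, s in ssd.items()}


def standardize(parts: List[Partition]) -> List[Partition]:
    means = column_means(parts)
    stds = column_stds(parts, means)
    # The reduced statistics are keyed by column name, so each partition is
    # transformed on its own; no knowledge of row ranges (divisions) is used.
    return [
        [{col: (val - means[col]) / stds[col] for col, val in row.items()} for row in part]
        for part in parts
    ]


parts = [
    [{"a": 1.0, "b": 10.0}, {"a": 2.0, "b": 20.0}],
    [{"a": 3.0, "b": 30.0}, {"a": 4.0, "b": 40.0}],
]
result = standardize(parts)
```

Both passes touch each partition independently, which is the property the reduction-then-broadcast expression relies on.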

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests
