
Use shallow copies when assigning columns #5739

@mrocklin

Description


Currently, when a user runs code like df["z"] = df.x + df.y, we translate it into df = df.assign(z=df.x + df.y). However, that calls the underlying pandas assign method, which calls df.copy() (a deep copy by default), and that can be slow. Instead, it might be nice to do something like the following:

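def set_column(df, name, value):
    # Shallow copy: a new DataFrame object that shares the existing column data
    df = df.copy(deep=False)
    # Set the column on the copy, not on the caller's DataFrame
    df[name] = value
    return df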

And then replace the calls to M.assign with this.
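A minimal, self-contained sketch of the proposal using plain pandas (cudf is not covered here). It checks that the set_column helper gives the same frame as assign, leaves the input untouched, and, unlike a deep copy, reuses the memory behind the existing columns. The example data is made up, and the buffer-sharing check assumes numeric columns and uses NumPy's np.shares_memory.

```python
import numpy as np
import pandas as pd


def set_column(df, name, value):
    """Return a copy of ``df`` with column ``name`` set to ``value``.

    Only the DataFrame container is copied (deep=False); the data of the
    existing columns is shared with ``df`` rather than duplicated.
    """
    df = df.copy(deep=False)
    df[name] = value
    return df


df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

via_shallow = set_column(df, "z", df.x + df.y)

# Does the result reuse the memory behind column "x"? Compare with a deep copy.
shallow_shares_x = np.shares_memory(df["x"].to_numpy(), via_shallow["x"].to_numpy())
deep_shares_x = np.shares_memory(df["x"].to_numpy(), df.copy(deep=True)["x"].to_numpy())

via_assign = df.assign(z=df.x + df.y)

# Both routes produce the same frame
pd.testing.assert_frame_equal(via_assign, via_shallow)

print(list(df.columns))  # ['x', 'y']: the input frame is unchanged
print(shallow_shares_x)  # True: the shallow copy reuses column "x"
print(deep_shares_x)     # False: a deep copy duplicates it
```

The savings come entirely from skipping the data copy of the columns that already exist; the new column "z" is computed the same way in both versions.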

This is particularly important for cudf, but probably helps pandas performance a bit as well.

Should we do this for all assign calls, rather than just setitem? Apparently there are some semantic differences in pandas with regard to copying, but because we don't mutate, my guess is that we should do this everywhere. @TomAugspurger ?
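Covering all assign calls could look roughly like the helper below. assign_shallow is a hypothetical name, not an existing pandas or dask function. It is a sketch that follows the documented DataFrame.assign contract (keyword arguments are applied in order, and a callable value is called with the intermediate frame, so later columns can refer to earlier ones) but starts from a shallow copy. The example only adds new columns; whether overwriting an existing column through a shallow copy can leak into the original frame depends on pandas' copy semantics, which is the open question here, so treat this as an illustration rather than a verified replacement.

```python
import pandas as pd


def assign_shallow(df, **kwargs):
    """Hypothetical shallow-copy counterpart of ``DataFrame.assign``.

    Keyword arguments are applied in order; a callable value is called with
    the intermediate frame, mirroring the documented ``assign`` behaviour.
    """
    out = df.copy(deep=False)  # copy the container only, not the column data
    for name, value in kwargs.items():
        out[name] = value(out) if callable(value) else value
    return out


df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# "w" refers to "z", which is created earlier in the same call
expected = df.assign(z=df.x + df.y, w=lambda d: d.z * 2)
result = assign_shallow(df, z=df.x + df.y, w=lambda d: d.z * 2)

pd.testing.assert_frame_equal(result, expected)
print(list(df.columns))  # ['x', 'y']: the caller's frame has no new columns
```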
