Closed
Description
Currently, when we run code like df["z"] = df.x + df.y, we call df = df.assign(z=df.x + df.y). However, the latter snippet calls the underlying assign method, which calls df.copy(), and that can be slow. Instead, it might be nice to do something like the following:
def set_column(df, name, value):
    df = df.copy(deep=False)
    df[name] = value
    return df

And then replace the calls to M.assign with this.
This is particularly important for cudf, but probably helps pandas performance a bit as well.
We might want to do this for all assign calls, rather than just setitem? Apparently there are some semantic differences in pandas with regard to copying, but because we don't mutate, my guess is that we should do this everywhere. @TomAugspurger ?
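
For reference, here is a minimal sketch of how the proposed helper would compare with assign on a small pandas frame. The example data and the names out and expected are made up for illustration; this is not the actual change to the library, just a check that the shallow-copy version gives the same result and leaves the input untouched.

```python
import pandas as pd


def set_column(df, name, value):
    # Shallow copy: a new DataFrame object that references the existing
    # column data rather than duplicating it.
    df = df.copy(deep=False)
    # Adding the column to the shallow copy does not add it to the
    # caller's frame.
    df[name] = value
    return df


# Illustrative data only.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

out = set_column(df, "z", df.x + df.y)
expected = df.assign(z=df.x + df.y)

# Same result as assign, and the original frame has no "z" column.
pd.testing.assert_frame_equal(out, expected)
assert "z" not in df.columns
```

Whether this is actually faster depends on the backend and on how expensive the copy inside assign is, so it would need to be measured on pandas and cudf separately.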