Skip to content

Unnecessary cloning #428

@zzril

Description

@zzril

Is your feature request related to a problem?

There seem to be some unnecessary cases of copying a table in _table.py.

For example, keep_only_columns creates a copy of the table, then calls remove_columns(...) on that copy and returns the result:

Implementation keep_only_columns(...):

    def keep_only_columns(self, column_names: list[str]) -> Table:
        ...
        clone = self._copy()
        clone = clone.remove_columns(list(set(self.column_names) - set(column_names)))
        return clone

However, remove_columns itself already has to - in accordance to our guidelines - create a copy.
Indeed, it does (although very implicitely, by calling Pandas' drop method with the default inplace=false keyword argument):

Implementation of remove_columns(...):

    def remove_columns(self, column_names: list[str]) -> Table:
       ...
        transformed_data = self._data.drop(labels=column_names, axis="columns")
        transformed_data.columns = [name for name in self._schema.column_names if name not in column_names]
        ...
        return Table._from_pandas_dataframe(transformed_data)

Pandas documentation for drop method:

DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Since remove_columns already correctly returns a copy of the table it was called on, it is not clear to me why we need to make an additional copy within out keep_only_columns implementation.
Am I misunderstanding something?

Desired solution

If these additional copies indeed turn out to be unnecessary: Find all occurences of them and remove them,

Possible alternatives (optional)

If the copies actually turn out to be necessary: Close this issue.

Metadata

Metadata

Assignees

No one assigned

    Labels

    cleanup 🧹Refactorings and other tasks that improve the codeperformance 🏃Speed things upquestionFurther information is requestedreleasedIncluded in a release

    Type

    No type

    Projects

    Status

    ✔️ Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions