Is your feature request related to a problem?
There seem to be some unnecessary cases of copying a table in _table.py.
For example, keep_only_columns creates a copy of the table, then calls remove_columns(...) on that copy and returns the result:
Implementation keep_only_columns(...):
def keep_only_columns(self, column_names: list[str]) -> Table:
...
clone = self._copy()
clone = clone.remove_columns(list(set(self.column_names) - set(column_names)))
return clone
However, remove_columns itself already has to - in accordance to our guidelines - create a copy.
Indeed, it does (although very implicitely, by calling Pandas' drop method with the default inplace=false keyword argument):
Implementation of remove_columns(...):
def remove_columns(self, column_names: list[str]) -> Table:
...
transformed_data = self._data.drop(labels=column_names, axis="columns")
transformed_data.columns = [name for name in self._schema.column_names if name not in column_names]
...
return Table._from_pandas_dataframe(transformed_data)
Pandas documentation for drop method:
DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Since remove_columns already correctly returns a copy of the table it was called on, it is not clear to me why we need to make an additional copy within out keep_only_columns implementation.
Am I misunderstanding something?
Desired solution
If these additional copies indeed turn out to be unnecessary: Find all occurences of them and remove them,
Possible alternatives (optional)
If the copies actually turn out to be necessary: Close this issue.
Is your feature request related to a problem?
There seem to be some unnecessary cases of
copying a table in_table.py.For example,
keep_only_columnscreates a copy of the table, then callsremove_columns(...)on that copy and returns the result:Implementation keep_only_columns(...):
However,
remove_columnsitself already has to - in accordance to our guidelines - create a copy.Indeed, it does (although very implicitely, by calling
Pandas'dropmethod with the defaultinplace=falsekeyword argument):Implementation of remove_columns(...):
Pandas documentation for drop method:
Since
remove_columnsalready correctly returns a copy of the table it was called on, it is not clear to me why we need to make an additional copy within outkeep_only_columnsimplementation.Am I misunderstanding something?
Desired solution
If these additional copies indeed turn out to be unnecessary: Find all occurences of them and remove them,
Possible alternatives (optional)
If the copies actually turn out to be necessary: Close this issue.