ValueError in groupby.get_group from read_csv #7005

@hellocoldworld

What happened:
In dask==2020.12.0, saving a dataframe with to_csv raised:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Downgrading to dask==2.30 solved the issue. I think this is related to PR 6878.

What you expected to happen:
The dataframe is saved.

Anything else we need to know?:
Full traceback:

  File "/home/nico/Escritorio/pcnt/blacklists_service/crawlers_service/worldsys.py", line 79, in save_subsetted_df
    data.to_csv(filepath, sep=",", index=False, encoding="utf-8", header=False, single_file=True)
  File "/home/nico/.local/share/virtualenvs/blacklists_service--dBzAapY/lib/python3.8/site-packages/dask/dataframe/core.py", line 1433, in to_csv
    return to_csv(self, filename, **kwargs)
  File "/home/nico/.local/share/virtualenvs/blacklists_service--dBzAapY/lib/python3.8/site-packages/dask/dataframe/io/csv.py", line 858, in to_csv
    dfs = df.to_delayed()
  File "/home/nico/.local/share/virtualenvs/blacklists_service--dBzAapY/lib/python3.8/site-packages/dask/dataframe/core.py", line 1493, in to_delayed
    graph = self.__dask_optimize__(graph, self.__dask_keys__())
  File "/home/nico/.local/share/virtualenvs/blacklists_service--dBzAapY/lib/python3.8/site-packages/dask/dataframe/optimize.py", line 28, in optimize
    dependencies = dsk.get_all_dependencies()
  File "/home/nico/.local/share/virtualenvs/blacklists_service--dBzAapY/lib/python3.8/site-packages/dask/highlevelgraph.py", line 532, in get_all_dependencies
    all_keys = self.keyset()
  File "/home/nico/.local/share/virtualenvs/blacklists_service--dBzAapY/lib/python3.8/site-packages/dask/highlevelgraph.py", line 501, in keyset
    self._keys.update(layer.keys())
  File "/home/nico/.local/share/virtualenvs/blacklists_service--dBzAapY/lib/python3.8/_collections_abc.py", line 720, in __iter__
    yield from self._mapping
  File "/home/nico/.local/share/virtualenvs/blacklists_service--dBzAapY/lib/python3.8/site-packages/dask/blockwise.py", line 293, in __iter__
    return iter(self._dict)
  File "/home/nico/.local/share/virtualenvs/blacklists_service--dBzAapY/lib/python3.8/site-packages/dask/blockwise.py", line 595, in _dict
    if io_key in dsk[k]:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
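
The last frame fails on a membership test (`if io_key in dsk[k]:`). As a minimal sketch of how such a test can produce this message, and not a confirmed diagnosis of the dask internals: Python's `in` compares the needle against each element with `==`, and comparing against a multi-element NumPy array returns a boolean array whose truth value is ambiguous. The names `membership_raises`, `task_with_array`, and `task_without_array` are hypothetical and only illustrate the mechanism, assuming the task being inspected contains an array.

```python
import numpy as np


def membership_raises(needle, task):
    """Return True if evaluating `needle in task` raises ValueError."""
    try:
        needle in task
    except ValueError:
        return True
    return False


# Hypothetical task-like tuples: (callable, argument).
task_with_array = (len, np.array([1, 2, 3]))
task_without_array = (len, [1, 2, 3])

# `1 == np.array([1, 2, 3])` yields a boolean array, and `in` then asks for
# its truth value, which NumPy refuses for arrays with more than one element.
print(membership_raises(1, task_with_array))

# With a plain list, `1 == [1, 2, 3]` is simply False, so nothing is raised.
print(membership_raises(1, task_without_array))
```

If this is what happens here, any graph whose tasks carry array-valued arguments could hit the same error, independent of the `to_csv` call that triggered graph optimization.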

Environment:

  • Dask version: 2020.12.0
  • Python version: 3.8.6
  • Operating System: Ubuntu MATE 18.04; also happens on Alpine Linux 3.12
  • Install method (conda, pip, source): pip (via pipenv in the case of Ubuntu MATE)
