Skip to content

map_partitions: use tokenized info as name of the SubgraphCallable#7524

Merged
jrbourbeau merged 9 commits intodask:mainfrom
madsbk:fix_partition_info_hash
Apr 23, 2021
Merged

map_partitions: use tokenized info as name of the SubgraphCallable#7524
jrbourbeau merged 9 commits intodask:mainfrom
madsbk:fix_partition_info_hash

Conversation

@madsbk
Copy link
Contributor

@madsbk madsbk commented Apr 6, 2021

This PR makes map_partitions() use the tokenized partition_info as name of the SubgraphCallable, which fixes #7488

  • Tests added / passed
  • Passes black dask / flake8 dask / isort dask

cc. @bnaul, @jsignell

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @madsbk! Could you add a small test which uses the distributed scheduler, like #7488 mentioned

@madsbk
Copy link
Contributor Author

madsbk commented Apr 6, 2021

Thanks @madsbk! Could you add a small test which uses the distributed scheduler, like #7488 mentioned

Done

@bnaul
Copy link
Contributor

bnaul commented Apr 6, 2021

Looks great to me, thanks @madsbk!

@jsignell
Copy link
Member

jsignell commented Apr 7, 2021

This looks good to me! I suspect the test failure is transient, so I'll rerun.

@gen_cluster(client=True)
async def test_map_partitions_partition_info(c, s, a, b):
dd = pytest.importorskip("dask.dataframe")
pd = pytest.importorskip("pandas")
Copy link
Member

@jsignell jsignell Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR, but maybe causing the failures in CI @jrbourbeau, I think there is something kind of wrong with using importorskip with @gen_cluster(client=True). It might be cleaner to have a separate test files for dataframe/tests/test_distributed.py and array/tests/test_distributed.py where the importorskip can just happen at the top of the file.

@jrbourbeau jrbourbeau mentioned this pull request Apr 20, 2021
3 tasks
Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @madsbk! This is in

@jrbourbeau jrbourbeau merged commit 4bc3979 into dask:main Apr 23, 2021
@madsbk madsbk deleted the fix_partition_info_hash branch May 25, 2021 07:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

partition_info contains incorrect information with distributed scheduler

4 participants