Interface for DPPY interoperability by coquelin77 · Pull Request #788 · helmholtz-analytics/heat

coquelin77 · 2021-06-08T09:21:54Z

Description

Implement the __partitioned__ attribute on DNDarray for compatibility with daal4py (IntelPython/DPPY-Spec#3). At the moment, this is not used by heat internally. However, there are some ideas about how this could be done in the future.

Issue/s resolved: #772

Changes proposed:

added __partitioned__ attribute to DNDarray

Type of change

New feature (non-breaking change which adds functionality)

Due Diligence

All split configurations tested
Multiple dtypes tested in relevant functions
Documentation updated (if needed)
Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no


codecov · 2021-06-08T09:25:20Z

Codecov Report

Merging #788 (fd7ec83) into main (b9658d4) will increase coverage by 0.00%.
The diff coverage is 91.89%.

@@           Coverage Diff           @@
##             main     #788   +/-   ##
=======================================
  Coverage   91.80%   91.80%           
=======================================
  Files          72       72           
  Lines       10445    10519   +74     
=======================================
+ Hits         9589     9657   +68     
- Misses        856      862    +6

Flag	Coverage Δ
unit	`91.80% <91.89%> (+<0.01%)`	⬆️


Impacted Files	Coverage Δ
heat/core/factories.py	`98.03% <85.71%> (-1.97%)`	⬇️
heat/core/dndarray.py	`96.92% <100.00%> (+0.12%)`	⬆️


coquelin77 · 2021-07-13T08:48:33Z

rerun tests

fschlimb · 2021-07-13T09:15:13Z

Cool, this looks good.

The latest spec asks for a 'get' field in the outer __partitioned__ dict. Here this should probably be either lambda x: x or a function which first checks if it's being called on the right rank. My current thinking is that it is ok to return real data only when called on the owning rank - at least for SPMD. So returning None for input None would be fine, I think.

I added your code (slightly modified) to my controller/worker wrapper prototype (#823). It has similar restrictions but allows calling get on the owner rank as well as on the controller rank (as required by this programming model).
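A minimal sketch of the SPMD behaviour described above, in plain Python without MPI. The helper name build_partitioned and the even 1-D split are hypothetical; the field names 'start', 'shape', 'location', and 'partition_tiling' are assumed from the spec, while 'partitions', 'data', 'locals', and 'get' appear in this thread. This is not Heat's implementation.

```python
def build_partitioned(rank, nprocs, global_rows):
    """Hypothetical helper: describe a 1-D array split evenly over ranks.

    Only the partition owned by `rank` carries real data; every other
    partition stores None, as suggested for the SPMD case.
    """
    chunk = global_rows // nprocs  # assumes global_rows divides evenly
    partitions = {}
    for r in range(nprocs):
        start = r * chunk
        data = list(range(start, start + chunk)) if r == rank else None
        partitions[(r,)] = {
            "start": (start,),
            "shape": (chunk,),
            "data": data,
            "location": [r],
        }
    return {
        "shape": (global_rows,),
        "partition_tiling": (nprocs,),
        "partitions": partitions,
        "locals": [(rank,)],
        # In SPMD the handle already is the data, so 'get' is the identity.
        "get": lambda x: x,
    }


# View from rank 0 of a 2-process run.
p0 = build_partitioned(rank=0, nprocs=2, global_rows=8)
own = p0["get"](p0["partitions"][(0,)]["data"])
remote = p0["get"](p0["partitions"][(1,)]["data"])
```

Here own is the local list and remote is None, which matches "returning None for input None".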

coquelin77 · 2021-07-13T12:08:12Z

I'm not seeing the changes that you made when you moved the create partition interface function over.

As for the get field, you mean something which gets the data of a specific partition, correct? For example, it would look something like x.__partitioned__.get(tuple or tile identifier), and it would return that data if it is already on the node; otherwise it would return None. Is that correct?

fschlimb · 2021-07-13T12:27:21Z

Yes, except of course that it's x.__partitioned__['get'](id).

The idea here is that - in particular for frameworks like ray and dask - 'data' might (should) not be raw data but a handle/future. Having a unified way of converting the handle into raw data without explicitly understanding/using ray/dask/... makes this more useful.
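To illustrate the handle/future idea, here is a small sketch that uses concurrent.futures.Future from the standard library as a stand-in for a ray or dask handle. The names make_handle and resolve are hypothetical, and the 'location' field is assumed from the spec; real frameworks would supply their own resolution call.

```python
from concurrent.futures import Future


def make_handle(value):
    """Hypothetical producer-side helper: wrap a value in a completed future."""
    handle = Future()
    handle.set_result(value)
    return handle


def resolve(handle):
    """Turn whatever is stored under 'data' into raw data.

    The consumer only calls this function and never needs to know
    which framework produced the handle.
    """
    if handle is None:
        return None
    if isinstance(handle, Future):
        return handle.result()
    return handle  # already raw data


partitioned = {
    "partitions": {
        (0,): {"data": make_handle([0, 1, 2]), "location": [0]},
        (1,): {"data": None, "location": [1]},
    },
    "get": resolve,
}

get = partitioned["get"]
first = get(partitioned["partitions"][(0,)]["data"])
second = get(partitioned["partitions"][(1,)]["data"])
```

The consumer code is the same whether 'data' holds a future or a plain object, which is the point of the unified 'get'.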

coquelin77 · 2021-07-13T12:33:32Z

I've implemented that now. I tested it with a small example:

import heat as ht

x = ht.arange(8 * 3 * 2).reshape((8, 3, 2)).resplit(0)
print(x.__partitioned__['get']((0, 0, 0)))

[1] None
[2] None
[0] tensor([[[ 0,  1],
[0]          [ 2,  3],
[0]          [ 4,  5]],
[0] 
[0]         [[ 6,  7],
[0]          [ 8,  9],
[0]          [10, 11]],
[0] 
[0]         [[12, 13],
[0]          [14, 15],
[0]          [16, 17]]], dtype=torch.int32)
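The output above shows rank 0 holding three of the eight slices along axis 0. A sketch of a block distribution that would produce this, assuming the remainder is handed to the lowest ranks (the function name chunk_bounds is hypothetical and this is not taken from Heat's source):

```python
def chunk_bounds(n, nprocs):
    """Return (start, stop) index pairs for splitting n items over nprocs ranks.

    Assumption: each rank gets n // nprocs items and the first
    n % nprocs ranks get one extra item each.
    """
    base, rem = divmod(n, nprocs)
    bounds = []
    start = 0
    for rank in range(nprocs):
        size = base + (1 if rank < rem else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


# 8 slices along the split axis, 3 processes as in the example run.
bounds = chunk_bounds(8, 3)
```

Under this assumption rank 0 owns slices 0 to 2, which corresponds to the values 0 through 17 printed above.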

fschlimb · 2021-07-13T12:35:05Z

heat/core/dndarray.py

        }
+
+        def _partition_getter(key):
+            return partition_dict["partitions"][key]["data"]


The idea was to accept whatever is located in partition_dict["partitions"][key]["data"].
A user would use it like this: pdict['get'](pdict['partitions'][key]['data']).

here, it's really lambda x: x

Ah! I was thinking one level of abstraction more than that.

Now this is what the snippet would look like:

x = ht.arange(8 * 3 * 2).reshape((8, 3, 2)).resplit(0)
print(x.__partitioned__['get'](x.__partitioned__['partitions'][(0, 0, 0)]['data']))
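The two calling conventions discussed in this review thread can be put side by side with a plain dictionary. The names get_by_key and get_by_handle are hypothetical labels for the first implementation and for the form the spec asks for.

```python
partition_dict = {
    "partitions": {
        (0, 0, 0): {"data": [[0, 1], [2, 3]]},
        (1, 0, 0): {"data": None},  # owned by another process
    },
}


def get_by_key(key):
    """First version: look the partition up by its key."""
    return partition_dict["partitions"][key]["data"]


def get_by_handle(handle):
    """Spec version: receive the stored 'data' entry itself."""
    return handle


by_key = get_by_key((0, 0, 0))
by_handle = get_by_handle(partition_dict["partitions"][(0, 0, 0)]["data"])
```

Both calls return the same object here; the second form keeps the lookup on the caller's side, so 'get' only has to know how to materialize a handle.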

fschlimb · 2021-07-14T16:21:49Z

So far this is addressing the producer side. We'd also need the consumer side.
HeAT would need a from_partitioned creating a DNDarray from a __partitioned__. Whether this also goes into other features like constructors or operators can then be considered as well.
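A rough sketch of what the consumer side could look like, using plain lists in place of tensors. The function from_partitioned_sketch and the Producer class are hypothetical, the 'start' field is assumed from the spec, and the actual from_partitioned in this PR may work differently.

```python
def from_partitioned_sketch(obj):
    """Collect this process's data from any object exposing __partitioned__.

    Local partitions are visited in order of their global start offset
    and concatenated; remote partitions are never touched.
    """
    parted = obj.__partitioned__
    get = parted["get"]
    parts = parted["partitions"]
    local = []
    for key in sorted(parted["locals"], key=lambda k: parts[k]["start"]):
        local.extend(get(parts[key]["data"]))
    return local


class Producer:
    """Hypothetical producer that only carries the protocol attribute."""

    def __init__(self, parted):
        self.__partitioned__ = parted


producer = Producer(
    {
        "shape": (8,),
        "partition_tiling": (3,),
        "partitions": {
            (0,): {"start": (0,), "shape": (3,), "data": [0, 1, 2]},
            (1,): {"start": (3,), "shape": (3,), "data": None},
            (2,): {"start": (6,), "shape": (2,), "data": [6, 7]},
        },
        "locals": [(2,), (0,)],  # deliberately unordered
        "get": lambda x: x,
    }
)

local_data = from_partitioned_sketch(producer)
```

The consumer relies only on the dictionary contents, so it would work for any producer that follows the protocol.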

coquelin77 · 2021-09-23T07:59:46Z

@fschlimb I have added a bit more functionality to from_partitioned. It now supports non-zero split axes, and I have also added a from_partition_dict function which does the same thing, but it does not require a DNDarray as the object being passed in. Instead, this one takes the dictionary object and creates the matching DNDarray. It behaves the same way and does a zero-copy when possible. I have added unit tests for this as well. Please have a look.
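One way a split axis could be recovered from a partition dictionary is to look at which axis of the tiling is cut into more than one piece. The sketch below assumes a 'partition_tiling' entry as in the spec and uses a hypothetical function name, infer_split; it is not the code added in this PR.

```python
def infer_split(partition_tiling):
    """Guess the split axis from the number of partitions per axis.

    Returns None when no axis is cut, and raises if more than one axis
    is cut, since a DNDarray is distributed along a single axis.
    """
    cut_axes = [axis for axis, n in enumerate(partition_tiling) if n > 1]
    if not cut_axes:
        return None
    if len(cut_axes) > 1:
        raise ValueError(f"more than one split axis: {cut_axes}")
    return cut_axes[0]


split_axis = infer_split((1, 4, 1))
no_split = infer_split((1, 1))
```

A tiling of (1, 4, 1) yields split axis 1, which is the "non-zero split" case mentioned above.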

fschlimb · 2021-10-07T09:50:10Z

FYI: A new discussion was initiated with the data-API consortium: data-apis/consortium-feedback#7

fschlimb · 2021-10-07T09:55:19Z

heat/core/factories.py

        )
        for x in parts.values()
    }
-    if split is not None and \


Is this assertion not true?


ClaudiaComito · 2022-06-01T08:10:20Z

@fschlimb @coquelin77 thanks again for all this work. What are the next steps here?

fschlimb · 2022-06-02T10:31:06Z

I guess there are 2 options:

Wait for the discussion "Protocol for distributed data" (data-apis/consortium-feedback#7) to conclude. Any input there could help bring this to a conclusion.
Go ahead and just merge it. It doesn't seem to hurt, even if it's not as useful as it could be, since it's an isolated implementation.

ClaudiaComito

Made a small change here to ensure self.__partitions_dict__ is None after the latest dndarray changes. I'm going to approve and merge. @coquelin77 @fschlimb apologies for the delay.

@fschlimb just out of curiosity, with the changes introduced here would it be possible to run these benchmarks with Heat as a backend, or is there more work required there?
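The commit history of this PR suggests that __partitions_dict__ acts as a cache behind the __partitioned__ property and is reset to None when the distribution changes. A minimal sketch of that pattern, using a hypothetical TinyArray class and a placeholder dictionary rather than Heat's DNDarray:

```python
class TinyArray:
    """Hypothetical stand-in for a distributed array; not Heat's DNDarray."""

    def __init__(self, data, split):
        self.data = data
        self.split = split
        self.__partitions_dict__ = None  # cache, built on first access
        self.builds = 0  # counts how often the dict was rebuilt

    @property
    def __partitioned__(self):
        if self.__partitions_dict__ is None:
            self.builds += 1
            # Placeholder content; a real dict would list the partitions.
            self.__partitions_dict__ = {
                "shape": (len(self.data),),
                "split_used": self.split,  # illustrative field only
                "get": lambda x: x,
            }
        return self.__partitions_dict__

    def resplit_(self, split):
        """Change the distribution and invalidate the cached dict."""
        self.split = split
        self.__partitions_dict__ = None


a = TinyArray([1, 2, 3], split=0)
first = a.__partitioned__
again = a.__partitioned__   # served from the cache
a.resplit_(None)            # cache is dropped here
rebuilt = a.__partitioned__
```

Without the reset in resplit_, the property would keep returning a description of the old layout.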

coquelin77 added 8 commits May 17, 2021 16:18

added partition interface to DNDarray

f297ea2

added 'locals' key to partition interface

9191599

renamed locals to lcls to avoid global name

89fda67

corrected format of locals

8806b7c

renamed dunder class attr of DNDarray to __partitioned__

51368b7

corrected split=0 case, corrected DNDarray property to be 'partitioned'

81e47c8

DNDarray.__partitioned__ -> __partitions_dict__, DNDarray.partitioned -> __partitioned__

857f585

Merge branch 'master' into features/772-partition-interface

dfb3231

coquelin77 added 4 commits June 22, 2021 11:46

Merge branch 'master' into features/772-partition-interface

93d1e5e

added tests for partitioned attribute

d69fdd6

minor changes to test cases to check that things after the resplit are taken care of

0afc772

split=None tests

750cc2b

coquelin77 marked this pull request as ready for review June 22, 2021 10:28

coquelin77 requested a review from ClaudiaComito June 22, 2021 10:28

Merge branch 'master' into features/772-partition-interface

4307306

changelog update

992c385

Merge branch 'master' into features/772-partition-interface

ac3928e

added 'get' attributed to __partitioned__ to get a tile from a DNDarray

012318d

fschlimb reviewed Jul 13, 2021


reduced level of abstraction for __partitioned__['get']

26203c0

coquelin77 added 3 commits August 23, 2021 10:57

Merge branch 'master' into features/772-partition-interface

d44109c

Merge branch 'master' into features/772-partition-interface

0d22fbd

Merge branch 'master' into features/772-partition-interface

e434b7c

fschlimb and others added 5 commits August 26, 2021 08:09

adding from_partitioned; aligning __partitioned__ with current spec

29d0385

Merge branch 'master' into features/772-partition-interface

6f0a0ac

Merge pull request #860 from fschlimb/features/772-from_partition

4983751

adding from_partitioned; aligning __partitioned__ with current spec

updating from_partitioned function

a0c8ab4

added nonzero split support to from partition dictionary, added tests, added factory function for building a dndarry from a partition dictionary

7c70eae

fschlimb mentioned this pull request Sep 27, 2021

FEAT-#3451: Support __partitioned__ protocol modin-project/modin#3452

Closed

7 tasks

fschlimb mentioned this pull request Oct 6, 2021

Protocol for distributed data data-apis/consortium-feedback#7

Open

fschlimb reviewed Oct 7, 2021


ClaudiaComito added this to the 1.2.x milestone Oct 27, 2021

coquelin77 and others added 3 commits January 4, 2022 13:49

Merge branch 'master' into features/772-partition-interface

ab71539

Merge branch 'master' into features/772-partition-interface

c1e6bcf

Merge branch 'main' into features/772-partition-interface

a505e99

ClaudiaComito removed this from the 1.2.x milestone Jun 3, 2022

Merge branch 'main' into features/772-partition-interface

9d10b7b

ClaudiaComito added this to the 1.3.0 milestone Feb 8, 2023

ClaudiaComito self-assigned this Feb 8, 2023

Ensure is None when virtually resplitting to None on 1 process

fd7ec83

ClaudiaComito changed the title from "Features/772 partition interface" to "Interface for DPPY interoperability" Feb 9, 2023

ClaudiaComito added the interoperability label Feb 9, 2023

ClaudiaComito approved these changes Feb 9, 2023


ClaudiaComito merged commit bcea48a into main Feb 9, 2023

ClaudiaComito deleted the features/772-partition-interface branch February 9, 2023 10:51

JuanPedroGHM added the dndarray label Apr 14, 2023
