Skip to content

Interface for DPPY interoperability#788

Merged
ClaudiaComito merged 30 commits intomainfrom
features/772-partition-interface
Feb 9, 2023
Merged

Interface for DPPY interoperability#788
ClaudiaComito merged 30 commits intomainfrom
features/772-partition-interface

Conversation

@coquelin77
Copy link
Copy Markdown
Member

@coquelin77 coquelin77 commented Jun 8, 2021

Description

implement the __partitioned__ attribute to the DNDarray for compatibility with daal4py (IntelPython/DPPY-Spec#3). At the moment, this is not used by heat internally. However, There are some ideas about how this could be done in the future.

Issue/s resolved: #772

Changes proposed:

  • added __partitioned__ attribute to DNDarray

Type of change

  • New feature (non-breaking change which adds functionality)

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

@codecov
Copy link
Copy Markdown

codecov bot commented Jun 8, 2021

Codecov Report

Merging #788 (fd7ec83) into main (b9658d4) will increase coverage by 0.00%.
The diff coverage is 91.89%.

@@           Coverage Diff           @@
##             main     #788   +/-   ##
=======================================
  Coverage   91.80%   91.80%           
=======================================
  Files          72       72           
  Lines       10445    10519   +74     
=======================================
+ Hits         9589     9657   +68     
- Misses        856      862    +6     
Flag Coverage Δ
unit 91.80% <91.89%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
heat/core/factories.py 98.03% <85.71%> (-1.97%) ⬇️
heat/core/dndarray.py 96.92% <100.00%> (+0.12%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@coquelin77 coquelin77 marked this pull request as ready for review June 22, 2021 10:28
@coquelin77 coquelin77 requested a review from ClaudiaComito June 22, 2021 10:28
@coquelin77
Copy link
Copy Markdown
Member Author

rerun tests

@fschlimb
Copy link
Copy Markdown
Contributor

Cool, this looks good.

The latest spec asks for a 'get' field in the outer __partitioned__ dict. Here this should probably be either lambda x: x or a function which first checks if it's being called on the right rank. My current thinking is that it is ok to return real data only when called on the owning rank - at least for SPMD. So returning None for input None would be fine, I think.

I added your code (slightly modified) to my controller/worker wrapper prototype (#823). It has similar restrictions but allows calling get on the owner rank as well as on the controller rank (as required by this programming model).

@coquelin77
Copy link
Copy Markdown
Member Author

Im not seeing the changes that you made when you moved the create partition interface function over.

as for the get field, you mean something which gets the data of a specific partition correct? for example, it would look something like this x.__partitioned__.get(tuple or tile identifier) and it would return that data if it is on the node already otherwise it would be None. is that correct?

@fschlimb
Copy link
Copy Markdown
Contributor

Yes, except of course that its x.__partitioned__['get'](id).

The idea here is that - in particular for frameworks like ray and dask - 'data' might (should) not be raw data but a handle/future. Having a unified way of converting the handle into raw data without explicitly understanding/using the ray/dask/... make this more useful.

@coquelin77
Copy link
Copy Markdown
Member Author

coquelin77 commented Jul 13, 2021

ive implemented that now. i tested it with a small example:

x = ht.arange(8* 3* 2).reshape((8, 3, 2)).resplit(0)
print(x.__partitioned__['get']((0, 0, 0)))
[1] None
[2] None
[0] tensor([[[ 0,  1],
[0]          [ 2,  3],
[0]          [ 4,  5]],
[0] 
[0]         [[ 6,  7],
[0]          [ 8,  9],
[0]          [10, 11]],
[0] 
[0]         [[12, 13],
[0]          [14, 15],
[0]          [16, 17]]], dtype=torch.int32)

}

def _partition_getter(key):
return partition_dict["partitions"][key]["data"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to accept whatever is located in partition_dict["partitions"][key]["data"].
A user would use it like this pdict['get'](pdict['partitions"][key]["data"].

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, it's really lambda x: x

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah! i was thinking one level of abstraction more than that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now this is what the snippet would look like:

    x = ht.arange(8* 3* 2).reshape((8, 3, 2)).resplit(0)
    print(x.__partitioned__['get'](x.__partitioned__['partitions'][(0,0,0)]['data']))

@fschlimb
Copy link
Copy Markdown
Contributor

So far this is addressing the producer side. We'd also need the consumer side.
HeAT would need a from_partitioned creating a DNDarray from a __partitioned__. If this also goes into other features like constructors or operators can then be considered as well.

@coquelin77
Copy link
Copy Markdown
Member Author

@fschlimb I have added a bit more functionality to from_partitioned. It now supports non-zero split axes and i have also added a from_partition_dict function which does the same thing, but it dont not require a DNDarray as the object being passed in. Instead this one takes the dictionary object and creates the matching DNDarray. It behaves the same way and does a zero-copy when possible. I have added unit tests for this as well. Please have a look

@fschlimb
Copy link
Copy Markdown
Contributor

fschlimb commented Oct 7, 2021

FYI: A new discussion was initiated with the data-API consortium: data-apis/consortium-feedback#7

)
for x in parts.values()
}
if split is not None and \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this assertion not true?

@ClaudiaComito ClaudiaComito added this to the 1.2.x milestone Oct 27, 2021
@ghost
Copy link
Copy Markdown

ghost commented Jun 1, 2022

👇 Click on the image for a new way to code review
  • Make big changes easier — review code in small groups of related files

  • Know where to start — see the whole change at a glance

  • Take a code tour — explore the change with an interactive tour

  • Make comments and review — all fully sync’ed with github

    Try it now!

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map Legend

@ClaudiaComito
Copy link
Copy Markdown
Member

@fschlimb @coquelin77 thanks again for all this work. What are the next steps here?

@fschlimb
Copy link
Copy Markdown
Contributor

fschlimb commented Jun 2, 2022

I guess there are 2 options:

  1. wait for the discussion Protocol for distributed data data-apis/consortium-feedback#7 to conclude. Any input there could help bringing this to a conclusion.
  2. Go ahead and just merge it. It doesn't seem to hurt even it's not as useful as it could be since it's an isolated implementation.

@ClaudiaComito ClaudiaComito removed this from the 1.2.x milestone Jun 3, 2022
@ClaudiaComito ClaudiaComito added this to the 1.3.0 milestone Feb 8, 2023
@ClaudiaComito ClaudiaComito self-assigned this Feb 8, 2023
@ClaudiaComito ClaudiaComito changed the title Features/772 partition interface Interface for DPPY interoperability Feb 9, 2023
Copy link
Copy Markdown
Member

@ClaudiaComito ClaudiaComito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a small change here to ensure self.__partitions_dict__ is None after the latest dndarray changes. I'm going to approve and merge, @coquelin77 @fschlimb apologies for the delay.

@fschlimb just out of curiosity, with the changes introduced here would it be possible to run these benchmarks with Heat as a backend, or is there more work required there?

@ClaudiaComito ClaudiaComito merged commit bcea48a into main Feb 9, 2023
@ClaudiaComito ClaudiaComito deleted the features/772-partition-interface branch February 9, 2023 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support __partition__ interface export

4 participants