Skip to content

Changing type and name of local_used_maps to reflect that it is only one map#65380

Closed
jaceyca wants to merge 4 commits intogh/jaceyca/7/basefrom
gh/jaceyca/7/head
Closed

Changing type and name of local_used_maps to reflect that it is only one map#65380
jaceyca wants to merge 4 commits intogh/jaceyca/7/basefrom
gh/jaceyca/7/head

Conversation

@jaceyca
Copy link
Contributor

@jaceyca jaceyca commented Sep 20, 2021

…one map

Fixing bugs that arise when running setup.py develop

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Sep 20, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit eb037b0 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

jaceyca added a commit that referenced this pull request Sep 21, 2021
…one map

Fixing bugs that arise when running setup.py develop

ghstack-source-id: 0ff288a
Pull Request resolved: #65380
jaceyca added a commit that referenced this pull request Sep 21, 2021
…one map

Fixing bugs that arise when running setup.py develop

ghstack-source-id: 1343ce6
Pull Request resolved: #65380
Copy link
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for further cleaning up the code.

Fixing bugs that arise when running setup.py develop

Which bug are you referring to?

}
local_used_work_ = process_group_->allreduce(local_used_maps_dev_);
std::vector<at::Tensor> temp_local_used_map_dev_vec_ = {local_used_map_dev_};
local_used_work_ = process_group_->allreduce(temp_local_used_map_dev_vec_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious, does the following work?

local_used_work_ = process_group_->allreduce({local_used_map_dev_});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, that causes the python setup.py develop command to error out:

../torch/csrc/distributed/c10d/reducer.cpp: In member function ‘void c10d::Reducer::all_reduce_local_used_map()’:
../torch/csrc/distributed/c10d/reducer.cpp:688:67: error: no matching function for call to ‘c10d::ProcessGroup::allreduce(at::Tensor&)’
   local_used_work_ = process_group_->allreduce(local_used_map_dev_);
                                                                   ^
In file included from ../torch/csrc/distributed/c10d/reducer.hpp:13,
                 from ../torch/csrc/distributed/c10d/reducer.cpp:1:
../torch/csrc/distributed/c10d/ProcessGroup.hpp:212:50: note: candidate: ‘virtual c10::intrusive_ptr<c10d::ProcessGroup::Work> c10d::ProcessGroup::allreduce(std::vector<at::Tensor>&, const c10d::AllreduceOptions&)’
   virtual c10::intrusive_ptr<ProcessGroup::Work> allreduce(
                                                  ^~~~~~~~~
../torch/csrc/distributed/c10d/ProcessGroup.hpp:212:50: note:   no known conversion for argument 1 from ‘at::Tensor’ to ‘std::vector<at::Tensor>&’

This is the bug I was referring to in my commit message, I should've been more clear.

Yanli suggested this fix of pushing local_used_maps_dev_ into a temporary vector and passing it to allreduce()

jaceyca added a commit that referenced this pull request Sep 22, 2021
…one map

Fixing bugs that arise when running setup.py develop

ghstack-source-id: 3fd42d4
Pull Request resolved: #65380
@jaceyca
Copy link
Contributor Author

jaceyca commented Sep 22, 2021

@jaceyca has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jaceyca merged this pull request in f24bd43.

@facebook-github-bot facebook-github-bot deleted the gh/jaceyca/7/head branch September 26, 2021 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed Merged oncall: distributed Add this issue/PR to distributed oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants