
THD updates #1396

Closed

apaszke wants to merge 49 commits into master from thd


Conversation

@apaszke
Contributor

@apaszke apaszke commented Apr 28, 2017

No description provided.

@apaszke apaszke force-pushed the thd branch 2 times, most recently from a988b35 to 3f2c72c on April 28, 2017 at 22:01
@apaszke
Contributor Author

apaszke commented May 1, 2017

Rebased and merged directly into master.

@apaszke apaszke closed this May 1, 2017
houseroad added a commit to houseroad/pytorch that referenced this pull request Sep 13, 2018
…7894eb

Summary:
Previous import was bff0b8835870c7df7762ef43498d000d2d8ffb52

Included changes:
- **[39dd0d4](onnx/onnx@39dd0d4)**: [build] Add ONNX_API for protos in all cases (pytorch#1407) <Orion Reblitz-Richardson>
- **[944db4f](onnx/onnx@944db4f)**: cmake (pytorch#1401) <zrphercule>
- **[8ccc8dd](onnx/onnx@8ccc8dd)**: Remove ONNXIFI_CHECK_RESULT from onnxRelease* functions (pytorch#1397) <Marat Dukhan>
- **[df14e74](onnx/onnx@df14e74)**: Change onnxifi test driver classname (pytorch#1396) <zrphercule>
- **[0c885cc](onnx/onnx@0c885cc)**: ONNXIFI cpp test driver (pytorch#1290) <zrphercule>
- **[a557848](onnx/onnx@a557848)**: Coverage Report Tools for Backend Scoreboard (pytorch#1301) <Akshay Chalana>
- **[31fd87f](onnx/onnx@31fd87f)**: fix AvgPool doc. add default value for count_include_pad (pytorch#1391) <Wenhao Hu>
- **[8ff08c2](onnx/onnx@8ff08c2)**: Do not export onnx symbols in the python extension (pytorch#1388) <bddppq>

Differential Revision: D9806635

fbshipit-source-id: 962e5dcb79f98a7e3a769b1ca9633e60c1735b48
facebook-github-bot pushed a commit that referenced this pull request Sep 13, 2018
…7894eb (#11622)

Summary:
Pull Request resolved: #11622

Previous import was bff0b8835870c7df7762ef43498d000d2d8ffb52

Included changes:
- **[39dd0d4](onnx/onnx@39dd0d4)**: [build] Add ONNX_API for protos in all cases (#1407) <Orion Reblitz-Richardson>
- **[944db4f](onnx/onnx@944db4f)**: cmake (#1401) <zrphercule>
- **[8ccc8dd](onnx/onnx@8ccc8dd)**: Remove ONNXIFI_CHECK_RESULT from onnxRelease* functions (#1397) <Marat Dukhan>
- **[df14e74](onnx/onnx@df14e74)**: Change onnxifi test driver classname (#1396) <zrphercule>
- **[0c885cc](onnx/onnx@0c885cc)**: ONNXIFI cpp test driver (#1290) <zrphercule>
- **[a557848](onnx/onnx@a557848)**: Coverage Report Tools for Backend Scoreboard (#1301) <Akshay Chalana>
- **[31fd87f](onnx/onnx@31fd87f)**: fix AvgPool doc. add default value for count_include_pad (#1391) <Wenhao Hu>
- **[8ff08c2](onnx/onnx@8ff08c2)**: Do not export onnx symbols in the python extension (#1388) <bddppq>

Reviewed By: orionr

Differential Revision: D9806635

fbshipit-source-id: f61c052b6bd14e0c80ace19c1a5f0ba659030c6f
hubertlu-tw pushed a commit to hubertlu-tw/pytorch that referenced this pull request Nov 1, 2022
…ytorch#1400)

* it looks possible to remove this file

* add communication collectives

* update Column|RowParallelLinear

* update checkpoint function

* update function name

* parity between public and private collectives

* row parallel linear

* column parallel linear

* sequence parallel: p2p comm

fix typo

* sequence parallel: pipeline parallel

* fix typo

* add layernorm with sequence_parallel_enabled attr

* class variable -> member variable

* fix col parallel test with sequence parallel

* Initial test of `forward_backward_pipelining_without_interleaving` with `model_type=ModelType.encoder_and_decoder`

* add cases pretending to test sequence_parallel

* Apply 2 suggestion(s) to 1 file(s)

* update sequence_parallel_enabled docstring

* update docstring: order of tensor dimensions, sequence_parallel_enabled behavior

* Divide sequence_length if sequence parallel

The tensor shape should be updated if sequence parallelism is enabled (see the sketch after this list).

* cherry-pick NVIDIA/Megatron-LM@8474e6e

* type annotation

* Fix matmul call in RowParallelLinear

Pin `sequence_parallel_enabled` to `False`, as done in
https://github.com/NVIDIA/Megatron-LM/blob/d898a8991d1a08d29074f87819d1bf41517e35f5/megatron/mpu/layers.py#L511-L514

* update rowparallellinear test

* fix undefined `loss_weight` in test_layers

* @eqy's comment

* mixed fused layer norm

* fix typo

* misc

* test_layers cleanup

* Skip Bert/GPT script

These two models haven't been updated for sequence parallelism yet, e.g. the change of the dimension order from (batch, sequence, feature) to (sequence, batch, feature) and the global argument variables.

* debug part 1/N: comment out `x.retain_grad`

* debug part 2/N: [ColumnParallelLinear] comment out overriding of sequence_parallel_enabled

* debug 3/N: add pipeline test with parallel mlp

* Fix handling `self.input_tensor` and argument

* tp2pp4 ModelType.encoder_or_decoder is failing, which may be my fault because the backward pass complains that the output and grad_output shapes don't match

* revert debug 1/N

* defer tensor model parallel size > 1

* split tensor in sequence dim

* cosmetic

* cosmetic: remove archaic comment

* enable TP>1 for encoder_and_decoder as well

* set requires_grad=True always...

* Set `scatter_gather_tensors_in_pipeline` to :obj:`False`

so that NeMo Megatron's GPT works with sequence parallelism enabled.

* brush up comment of `requires_grad()`

According to @ptrblck, PyTorch DistributedDataParallel can hang when some
tensor (or parameter) doesn't require grad. In my understanding, this forced
`requires_grad` is a different case.

* misc changes of scatter_gather_tensors_in_pipeline comment

* guard for torch_ucc

* cosmetic changes related to tests

* update command line arguments

* update TransformerLanguageModel

* rename

* move gpt to gpt.py

* update bert

* add all_gather for params in sequence parallel region

* misc. some diffs were lost during rebasing...

* updates for non sequence parallel execution

* gpt with sequence parallel

* Apply 2 suggestion(s) to 2 file(s)

* update tensor&pipeline parallel size

* why is `sequence_parallel_enabled` not supplied!? Did I mess up when rebasing?

* cosmetic fix

* correct key is sequence_parallel_enabled
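
A rough single-process sketch of the sequence-dimension split referenced in the "Divide sequence_length if sequence parallel" and "split tensor in sequence dim" items above; the helper name and the explicit `tp_world_size`/`tp_rank` arguments are illustrative stand-ins, not the actual tensor-model-parallel group utilities:

```python
import torch

def split_along_sequence_dim(x: torch.Tensor, tp_world_size: int, tp_rank: int) -> torch.Tensor:
    # Assumes the (sequence, batch, hidden) layout mentioned above; each
    # tensor-model-parallel rank keeps only its contiguous slice of the
    # sequence dimension when sequence parallelism is enabled.
    seq_len = x.size(0)
    assert seq_len % tp_world_size == 0, "sequence length must divide evenly across ranks"
    chunk = seq_len // tp_world_size
    return x[tp_rank * chunk:(tp_rank + 1) * chunk].contiguous()

# Toy usage: sequence length 8 split across tp_world_size=4 -> 2 per rank.
x = torch.randn(8, 2, 16)  # (sequence, batch, hidden)
local = split_along_sequence_dim(x, tp_world_size=4, tp_rank=1)
print(local.shape)  # torch.Size([2, 2, 16])
```
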
rraminen pushed a commit to rraminen/pytorch that referenced this pull request May 14, 2024
There was a known issue with triton where we saw errors with bfloat16.
This is now fixed upstream with
pytorch#111129. However, it seems that
we branched off release/2.1 before the change was merged upstream. In
the meantime, we can just skip these UTs.
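
A minimal sketch of what skipping those UTs could look like, assuming plain `unittest`-style tests; the test class, test method, and the `SKIP_BF16_TRITON` flag below are hypothetical, not the actual test names:

```python
import unittest
import torch

# Hypothetical flag: flip back to False once the upstream triton fix
# (pytorch#111129) is picked up on the release/2.1 branch.
SKIP_BF16_TRITON = True

class TestBF16Kernels(unittest.TestCase):
    @unittest.skipIf(SKIP_BF16_TRITON, "known triton bfloat16 issue; fixed upstream in pytorch#111129")
    def test_matmul_bf16(self):
        a = torch.randn(8, 8, dtype=torch.bfloat16)
        b = torch.randn(8, 8, dtype=torch.bfloat16)
        self.assertEqual((a @ b).dtype, torch.bfloat16)

if __name__ == "__main__":
    unittest.main()
```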