Merge disc backend to acc 2.3 #3

Merged
anw90 merged 12 commits into acc from merge_disc on Oct 11, 2024

yitongh commented Aug 8, 2024

Open this PR to track the modifications on disc backend.

CLAassistant commented Aug 8, 2024

CLA assistant check
All committers have signed the CLA.

Support Disc as backend
Co-authored-by: yancey.yx <yancey.yx@antfin.com>
Co-authored-by: wangang.wa <wangang.wa@alibaba-inc.com>
Yancey0623 and others added 5 commits August 8, 2024 14:32
* add flag to disable disc backend in bazel workspace
support disc backend debug mode to dump DISC compilation logs
* fix bazel flag when compiling python

* fix lint.
add float-norm pass to support bf16 amp training

yitongh commented Aug 8, 2024

The PRs that have not been merged yet:

  • e5470e3d50d528f49f287cfbb015e0d10897b897
  • cceb7f76fa2c1dcdc3260cef75d3cf34e0744126
  • 3fa4fe284dacef4ae447271356d4719cd223c2e2
  • c820ca030a51429a97654178afdefaf5f88d4485
  • 4c7fbd81383bbd05de2aca7b3c84731f197e9628

cceb7f76fa2c1dcdc3260cef75d3cf34e0744126 should be merged before "support disc debug mode to dump mhlo and logs" so that "//torch_xla/csrc/runtime:env_vars" and "//torch_xla/csrc/runtime:sys_util" can be linked.

FA 256 support: #4

yitongh commented Aug 8, 2024

TODO:

  • third_party/nccl/nccl.h not found
  • ENABLE_DISC=0 does not work

yitongh and others added 4 commits August 12, 2024 14:45
* fix build failure on nccl

* using nccl hdrs
* change the device type of disc to cuda to make amp work properly

* Use the value of DISC_DEVICE as the device type of disc backend
@anw90 anw90 merged commit fab18e0 into acc Oct 11, 2024
@anw90 anw90 deleted the merge_disc branch October 11, 2024 09:39

7 participants