Modify build.sh and test.sh scripts for ppc64le jenkins build and test by avmgithub · Pull Request #10257 · pytorch/pytorch

avmgithub · 2018-08-06T14:55:31Z

Initial jenkins builds / test scripts for ppc64le.

ezyang · 2018-08-06T21:43:55Z

What's the plan for getting PPC machines connected to Jenkins, with enough capacity so that they can keep up with the volume of builds we request?
Can we cross-compile for PPC? That will help reduce the number of PPC machines we need for testing.
Can we avoid copy pasting the scripts?

avmgithub · 2018-08-07T02:27:23Z

@ezyang answer to your questions:

There is currently a CI build environment at https://powerci.osuosl.org/. It is a shared environment that also runs other jobs, and I'm already building and testing pytorch master there. When I was working with @yf225, he said pytorch PR builds should accommodate at least 4 simultaneous builds, and the environment should be able to handle that. I believe @yf225 also already has the hooks from your CI to trigger builds on our CI. The plan is to start with a single build/test image using the Ubuntu CUDA image from NVIDIA, with CUDA 9.1, cuDNN 7.x, and python3 from Anaconda. When the environment is ready, we can switch to CUDA 9.2. I am still trying to eliminate random environment failures before we integrate with your CI. We can talk about it more when I am ready. For now I hope to get the scripts ready for ppc64le.
Cross-compiling may work (I've never tried it or set it up with nvcc). We do have a cross-compiler toolchain that we can set up for x86 (I've used it on other non-CUDA projects). If you're willing to try it, I can help test the produced binaries. Just FYI, building pytorch on ppc64le takes between 7 and 14 minutes when using ccache, so it does not take a lot of resources.
I'll take your suggestion to use your build.sh and test.sh with an ifdef for ppc64le, to avoid duplicating code across different files.

ezyang · 2018-08-07T22:15:37Z

So, I looked at your modified patch: you're still doing what's effectively a copy paste, since you have a giant if-block conditioned on ppc, versus not. If you're going to do that, you might as well just have another file.

What I would like to see is a localized block that sets the extra environment variables that PPC needs, and no reindenting of the build script. It would be even better if we had a PPC docker image (similar to the docker images we have for our Linux builds), which can have these environment variables pre-set. The definitions for these currently live in https://github.com/pietern/pytorch-dockerfiles
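A minimal sketch of what such a localized block could look like, assuming bash. The helper name `set_ppc64le_env` and the example environment name are hypothetical; the `TORCH_CUDA_ARCH_LIST=6.0` value is the one from the diff discussed later in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from the PR: keeps every ppc64le-specific
# setting in one place so the rest of build.sh needs no reindenting.
set_ppc64le_env() {
  if [[ "${BUILD_ENVIRONMENT:-}" == *ppc64le* ]]; then
    # Value taken from the diff under review in this thread.
    export TORCH_CUDA_ARCH_LIST=6.0
  fi
}

# Example environment name; the real Jenkins job names are an assumption.
BUILD_ENVIRONMENT="pytorch-linux-ppc64le-cuda9.1-py3"
set_ppc64le_env
echo "TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST:-unset}"
```

On any environment whose name does not contain `ppc64le`, the function is a no-op, so the shared script body stays identical for all architectures.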

avmgithub · 2018-08-08T10:33:36Z

@ezyang OK, I got it. For build.sh I can use it as is, except for the line WERROR=1 python setup.py install; I am not able to use that line with WERROR=1. For test.sh, how are you able to run the tests while in the pytorch directory? I have to do a "cd .." out of the directory, otherwise I get the error below.

test_python_nn
python test/run_test.py --include nn --verbose
Traceback (most recent call last):
  File "test/run_test.py", line 14, in <module>
    import torch
  File "/home/freddie/builder/jenkins/pytorch/pytorch/torch/__init__.py", line 84, in <module>
    from torch._C import *
ModuleNotFoundError: No module named 'torch._C'

This is because there is a torch directory in the pytorch repo. Does this not happen on x86?
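The "cd .." workaround described above could be scripted roughly as follows, assuming bash. `run_from_dir` is a hypothetical helper, and the path `pytorch/test/run_test.py` in the usage comment assumes the checkout directory is named `pytorch`, as the traceback path suggests. This only sketches the workaround; it does not confirm the diagnosis.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with another working directory.
# The subshell (parentheses) keeps the caller's own directory unchanged.
run_from_dir() {
  local dir="$1"
  shift
  (cd "$dir" && "$@")
}

# Harmless demonstration: pwd runs with / as its working directory.
run_from_dir / pwd

# Assumed usage for the case in this thread (not executed here):
#   run_from_dir .. python pytorch/test/run_test.py --include nn --verbose
```

Using a subshell rather than a bare `cd ..` means later steps in test.sh that rely on being in the checkout are not affected.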

avmgithub · 2018-08-10T12:11:36Z

@ezyang, when you get the chance please review the latest revision.

.jenkins/pytorch/build.sh


-WERROR=1 python setup.py install
+if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
+  export TORCH_CUDA_ARCH_LIST=6.0


.jenkins/pytorch/test.sh

-    ln -s "$TORCH_LIB_PATH"/libnccl* build/bin
+    if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
+      SUDO=sudo 
+    fi


avmgithub · 2018-08-13T02:44:01Z

@pytorchbot retest this please

avmgithub · 2018-08-13T18:03:27Z

@ezyang, when you get the chance, please review. I tried to do the export WERROR=1 in a separate conditional, but it created a problem when building the ATen install. So it looks like it needs to be inline with the python setup.py install command.
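The difference between the two forms mentioned above comes down to variable scope in the shell. A small sketch, assuming bash with `sh` available; the `sh -c` child process is only a stand-in for a build step and is not part of the PR.

```shell
#!/usr/bin/env bash
unset WERROR

# Prefix form: WERROR exists only in the environment of this one child process.
inline_result=$(WERROR=1 sh -c 'echo "${WERROR:-unset}"')

# A later command does not see it.
after_inline=$(sh -c 'echo "${WERROR:-unset}"')

# export: every later child process inherits the variable.
export WERROR=1
after_export=$(sh -c 'echo "${WERROR:-unset}"')

echo "inline=$inline_result later=$after_inline exported=$after_export"
```

This is consistent with the report above: an exported `WERROR=1` would also reach every step that runs after the export, whereas the prefix form limits it to the single `python setup.py install` invocation.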

.jenkins/pytorch/build.sh


 # Target only our CI GPU machine's CUDA arch to speed up the build
-export TORCH_CUDA_ARCH_LIST=5.2
+export TORCH_CUDA_ARCH_LIST="5.2 6.0"


.jenkins/pytorch/build.sh


-WERROR=1 python setup.py install
+# ppc64le build fails when WERROR=1
+# set only when building other architectures
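The hunk above is truncated, so the merged code is not visible here. One way the described behaviour could be written is sketched below as a dry run, assuming bash. `install_command` is a hypothetical function that only prints the command it would choose, and the environment names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints the install command instead of running it.
install_command() {
  if [[ "${BUILD_ENVIRONMENT:-}" == *ppc64le* ]]; then
    # Per the comment in the diff, the ppc64le build fails with WERROR=1.
    echo "python setup.py install"
  else
    # All other architectures keep WERROR=1 as a per-command prefix.
    echo "WERROR=1 python setup.py install"
  fi
}

# Example environment names are assumptions, not real Jenkins job names.
ppc_cmd=$(BUILD_ENVIRONMENT="pytorch-linux-ppc64le-py3" install_command)
x86_cmd=$(BUILD_ENVIRONMENT="pytorch-linux-xenial-py3" install_command)
echo "ppc64le: $ppc_cmd"
echo "other:   $x86_cmd"
```

Only the single install line branches on the architecture, which keeps the change localized in the way the reviewer asked for.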


avmgithub · 2018-08-14T15:50:25Z

@ezyang please review when you get the chance. I hope the changes are OK.

facebook-github-bot

ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

pytorch#10257)
Summary: Initial jenkins builds / test scripts for ppc64le.
Pull Request resolved: pytorch#10257
Differential Revision: D9331278
Pulled By: ezyang
fbshipit-source-id: 6d9a4f300a0233faf3051f8151beb31786dcd838

add ppc64le build and test jenkins scripts

8fc17d9

avmgithub added 3 commits August 7, 2018 09:57

add ppc64le build and test jenkins scripts

14b7341

add ppc64le build and test jenkins scripts

a21ba0c

add ppc64le build and test jenkins scripts

f985990

avmgithub changed the title ~~Add ppc64le build and test jenkins scripts~~ Modify build.sh and test.sh scripts for ppc64le jenkins build and test Aug 7, 2018

avmgithub added 9 commits August 8, 2018 06:47

Merge remote-tracking branch 'upstream/master'

60db971

add ppc64le build and test jenkins scripts

94e74e0

Fix typo

3a5b60f

add ppc64le build and test jenkins scripts

6d49acc

Merge branch 'master' of https://github.com/avmgithub/pytorch

55ab3c2

fix syntax error

0f356ea

cleanup change

8a1e8ee

Update build.sh

c4c8f1b

Update build.sh

a34624c

ezyang reviewed Aug 10, 2018


ezyang reviewed Aug 10, 2018


cleanup changes

8ed5d42

avmgithub added 3 commits August 13, 2018 12:22

cleanup changes

8889ef3

cleanup changes

c4bfabd

cleanup changes

dca5c97

ezyang reviewed Aug 13, 2018


ezyang reviewed Aug 13, 2018


cleanup changes

dc82f5e

ezyang approved these changes Aug 15, 2018


facebook-github-bot reviewed Aug 15, 2018


facebook-github-bot closed this in f1631c3 Aug 15, 2018

ezyang added open source merged labels Jun 24, 2019
