Modify build.sh and test.sh scripts for ppc64le jenkins build and test #10257

Closed
avmgithub wants to merge 18 commits into pytorch:master from avmgithub:master

@avmgithub
Contributor

Initial jenkins builds / test scripts for ppc64le.

@ezyang
Contributor

ezyang commented Aug 6, 2018

  • What's the plan for getting PPC machines connected to Jenkins, with enough capacity so that they can keep up with the volume of builds we request?
  • Can we cross-compile for PPC? That will help reduce the number of PPC machines we need for testing.
  • Can we avoid copy pasting the scripts?

@avmgithub
Contributor Author

@ezyang, answers to your questions:

  1. There is currently a CI build environment at https://powerci.osuosl.org/ . It is a shared environment that also runs other jobs, and I'm already building and testing pytorch master there. When I was working with @yf225, he said pytorch PR builds should accommodate at least 4 simultaneous builds, and the environment should be able to handle that. I believe @yf225 also already has the hooks in your CI to trigger builds on ours. The plan is to start with a single build/test image based on the Ubuntu CUDA image from NVIDIA, with CUDA 9.1, cuDNN 7.x, and Python 3 from Anaconda. When the environment is ready, we can switch to CUDA 9.2. I am still trying to eliminate random environment failures before we integrate with your CI. We can talk about it more when I am ready. For now I hope to get the scripts ready for ppc64le.
  2. Cross-compiling may work, but I've never tried it or set it up with nvcc. We do have a cross-compiler toolchain that we can set up for x86 (I've used it on other, non-CUDA projects). If you're willing to try it, I can help test the produced binaries. FYI, building pytorch on ppc64le takes between 7 and 14 minutes with ccache, so it does not take a lot of resources.
  3. I'll take your suggestion: use your build.sh and test.sh and add ifdef-style conditionals for ppc64le, to avoid duplicating code across files.

@avmgithub changed the title from "Add ppc64le build and test jenkins scripts" to "Modify build.sh and test.sh scripts for ppc64le jenkins build and test" on Aug 7, 2018
@ezyang
Contributor

ezyang commented Aug 7, 2018

So, I looked at your modified patch: you're still doing what's effectively a copy paste, since you have a giant if-block conditioned on ppc, versus not. If you're going to do that, you might as well just have another file.

What I would like to see is a localized block that sets the extra environment variables that PPC needs, and no reindenting of the build script. It would be even better if we had a PPC docker image (similar to the docker images we have for our Linux builds), which can have these environment variables pre-set. The definitions for these currently live in https://github.com/pietern/pytorch-dockerfiles
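A minimal sketch of the kind of localized block requested here, under stated assumptions: the helper name set_arch_specific_env is hypothetical, and the single variable and its value are only placeholders for whatever ppc64le actually needs. A portable case statement is used for the substring match on the environment name.

```shell
# Hypothetical helper; the function name is illustrative and not taken from the PR.
# It sets only the variables that differ on ppc64le and leaves everything else alone,
# so the rest of the build script needs no reindenting.
set_arch_specific_env() {
  case "$1" in
    *ppc64le*)
      # Assumed example value; the real list of variables would come from the PPC setup.
      export TORCH_CUDA_ARCH_LIST=6.0
      ;;
  esac
}

set_arch_specific_env "${BUILD_ENVIRONMENT:-}"
echo "TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST:-unset}"
```

Because the block only exports variables, it can sit near the top of the script, or move into a Docker image later, without touching the shared build steps.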

@avmgithub
Contributor Author

avmgithub commented Aug 8, 2018

@ezyang OK, I got it. I can use build.sh as is, except for the line WERROR=1 python setup.py install: I am not able to build with WERROR=1. For test.sh, how are you able to run the tests from inside the pytorch directory? I have to do a "cd .." out of the directory first; otherwise I get an error like the one below.

test_python_nn
python test/run_test.py --include nn --verbose
Traceback (most recent call last):
  File "test/run_test.py", line 14, in <module>
    import torch
  File "/home/freddie/builder/jenkins/pytorch/pytorch/torch/__init__.py", line 84, in <module>
    from torch._C import *
ModuleNotFoundError: No module named 'torch._C'

I think this is because there is a torch directory in the pytorch repo, so Python picks up the source tree instead of the installed package. Does this not happen on x86?
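The "cd .." workaround described above can be confined to a single command by running it in a subshell, so the rest of the script keeps its working directory. A minimal sketch; the helper name run_from_parent is hypothetical and not part of the PR.

```shell
# Hypothetical helper: run a command from the parent of the current directory.
# The parentheses start a subshell, so the caller's working directory is unchanged.
run_from_parent() {
  (cd .. && "$@")
}

# Example: show where a command would run, without changing directory here.
run_from_parent pwd
```

This only illustrates the directory handling; it does not address why the source tree is found ahead of the installed package.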

@avmgithub
Contributor Author

@ezyang, when you get the chance please review the latest revision.


WERROR=1 python setup.py install
if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
export TORCH_CUDA_ARCH_LIST=6.0

This comment was marked as off-topic.

ln -s "$TORCH_LIB_PATH"/libnccl* build/bin
if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
SUDO=sudo
fi
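The snippet above sets SUDO only on ppc64le. One common way to use such a variable is as an optional command prefix, sketched below. This is an assumption about intent, since the thread does not show where SUDO is consumed; the helper name privileged_prefix and the placeholder arguments are hypothetical.

```shell
# Hypothetical helper: print the prefix to use for privileged steps.
privileged_prefix() {
  case "$1" in
    *ppc64le*) echo "sudo" ;;
    *) echo "" ;;
  esac
}

SUDO=$(privileged_prefix "${BUILD_ENVIRONMENT:-}")
# $SUDO is left unquoted where it is used, so an empty value expands to no word at all.
# The command is printed rather than executed, to keep the sketch free of side effects.
echo "would run:" $SUDO "ln -s <source> <target>"
```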

This comment was marked as off-topic.

@avmgithub
Contributor Author

@pytorchbot retest this please

@avmgithub
Contributor Author

@ezyang when you get the chance, please review. I tried doing the export WERROR=1 in a separate conditional, but that created a problem when building the ATen install. So it looks like WERROR=1 needs to stay on the same line as python setup.py install.
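The difference between the two forms is standard shell behavior: a prefix assignment is visible only to the one command it precedes, while an exported variable is inherited by every later command. The sketch below demonstrates just that scoping. Whether it fully explains the ATen failure is an assumption, not something the thread confirms.

```shell
unset WERROR

# Prefix assignment: WERROR is set only in the environment of this one child command.
inline=$(WERROR=1 sh -c 'echo "${WERROR:-unset}"')
# A later command does not see it.
after_inline=$(sh -c 'echo "${WERROR:-unset}"')

# export: WERROR stays in the environment of every command that follows.
export WERROR=1
after_export=$(sh -c 'echo "${WERROR:-unset}"')

echo "inline=$inline after_inline=$after_inline after_export=$after_export"
```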


# Target only our CI GPU machine's CUDA arch to speed up the build
export TORCH_CUDA_ARCH_LIST=5.2
export TORCH_CUDA_ARCH_LIST="5.2 6.0"

This comment was marked as off-topic.


WERROR=1 python setup.py install
# ppc64le build fails when WERROR=1
# set only when building other architectures
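A sketch of how the choice described in these comments might be expressed, assuming the environment name is the only input. The helper name install_command is hypothetical, and it prints the command instead of running it, so this shows the branching only and is not the merged code.

```shell
# Hypothetical helper: choose the install command for a given build environment.
install_command() {
  case "$1" in
    # ppc64le: build without WERROR=1, per the comment above.
    *ppc64le*) echo "python setup.py install" ;;
    # Other architectures keep warnings-as-errors, scoped to this one command.
    *) echo "WERROR=1 python setup.py install" ;;
  esac
}

install_command "${BUILD_ENVIRONMENT:-}"
```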

This comment was marked as off-topic.

@avmgithub
Contributor Author

@ezyang please review when you get the chance, hope the changes are OK

Contributor

@facebook-github-bot facebook-github-bot left a comment

ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
pytorch#10257)

Summary:
Initial jenkins builds / test scripts for ppc64le.
Pull Request resolved: pytorch#10257

Differential Revision: D9331278

Pulled By: ezyang

fbshipit-source-id: 6d9a4f300a0233faf3051f8151beb31786dcd838