|
31 | 31 | + [Why LD_PRELOAD in the build function?](#why-ld-preload-in-the-build-function-) |
32 | 32 | + [Why no leak detection?](#why-no-leak-detection-) |
33 | 33 | - [Caffe2 notes](#caffe2-notes) |
| 34 | +- [CI failure tips](#ci-failure-tips) |
34 | 35 |
|
35 | 36 | ## Contributing to PyTorch |
36 | 37 |
|
@@ -938,3 +939,38 @@ are Caffe2/PyTorch specific. Here they are: |
938 | 939 | - `mypy*`, `requirements.txt`, `setup.py`, `test`, `tools` are |
939 | 940 | PyTorch-specific. Don't put Caffe2 code in them without extra |
940 | 941 | coordination. |
| 942 | + |
| 943 | +## CI failure tips |
| 944 | + |
| 945 | +Once you submit a PR or push a new commit to a branch that has an |
| 946 | +active PR, CI jobs run automatically. Some of these may fail, and |
| 947 | +you will need to look at the logs to find out why. |
| 948 | + |
| 949 | +Fairly often, a CI failure is unrelated to your changes. In this |
| 950 | +case, you can usually ignore the failure. |
| 951 | + |
| 952 | +Some failures might be related to specific hardware or environment |
| 953 | +configurations. If such a job is run by CircleCI, you can ssh into |
| 954 | +the job's session and debug it manually using the following |
| 955 | +steps: |
| 956 | + |
| 957 | +1. In the CircleCI page for the failed job, make sure you are logged in |
| 958 | + and then click the `Rerun` actions dropdown button on the top right. |
| 959 | + Click `Rerun Job with SSH`. |
| 960 | + |
| 961 | +2. When the job reruns, a new step labelled `Set up SSH` will be |
| 962 | + added in the `STEPS` tab. Inside that step will be an ssh command |
| 963 | + that you can execute in a shell. |
| 964 | + |
| 965 | +3. Once you are connected through ssh, you may need to enter a Docker |
| 966 | + container. Run `docker ps` to check if there are any Docker |
| 967 | + containers running. Note that your CI job might be in the process |
| 968 | + of initiating a Docker container, which means it will not show up |
| 969 | + yet. It is best to wait until the CI job reaches a step where it is |
| 970 | + building PyTorch or running PyTorch tests. If the job does have a |
| 971 | + Docker container, run `docker exec -it CONTAINER_ID /bin/bash` to |
| 972 | + connect to it, using the container ID that `docker ps` reports. |
| 973 | + |
| 974 | +4. Now you can find the PyTorch working directory, which could be |
| 975 | + `~/workspace` or `~/project`, and run commands there to debug |
| 976 | + the failure. |
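
Steps 3 and 4 above could be sketched as a few shell helpers. This is only an illustration, not part of the PR: the function names are hypothetical, the two candidate directories are the ones named in step 4, and the container ID has to be copied from the `docker ps` output.

```shell
# Hypothetical helpers for steps 3 and 4; the names are illustrative only.

# Step 3: list running containers. An empty table may just mean the CI job
# has not started its container yet.
list_ci_containers() {
    docker ps --format 'table {{.ID}}\t{{.Image}}\t{{.Names}}'
}

# Step 3: open an interactive shell in a container.
# $1 is a container ID taken from the listing above.
enter_ci_container() {
    docker exec -it "$1" /bin/bash
}

# Step 4: print the first candidate working directory that exists.
# Returns non-zero if neither ~/workspace nor ~/project is present.
find_pytorch_workdir() {
    for dir in "$HOME/workspace" "$HOME/project"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

After sourcing these in the ssh session, one might run `list_ci_containers`, then `enter_ci_container` with the reported ID, and finally `cd "$(find_pytorch_workdir)"` inside the container (the last helper would need to be defined there as well).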