This repository was archived by the owner on Feb 6, 2023. It is now read-only.

Do not test against Python 2 / 3.4 for master branch #523

Merged
emcastillo merged 5 commits into chainer:master from kmaehashi:py3-only-for-master
Oct 10, 2019

Conversation

@kmaehashi
Member

@kmaehashi kmaehashi commented Aug 15, 2019

Closes #490

Merge after Py2 drop in master branch (i.e. after releasing v7.0.0b3)

@kmaehashi kmaehashi changed the title [WIP] Do not test against Python 2 for master branch Do not test against Python 2 for master branch Aug 15, 2019
@kmaehashi kmaehashi changed the title Do not test against Python 2 for master branch [WIP] Do not test against Python 2 for master branch Aug 15, 2019
@kmaehashi kmaehashi force-pushed the py3-only-for-master branch from fb6433d to 055ceb9 Compare August 15, 2019 10:25
@kmaehashi kmaehashi changed the title [WIP] Do not test against Python 2 for master branch Do not test against Python 2 for master branch Aug 15, 2019
@emcastillo
Member

pfnCI, test this please.

@chainer-ci
Member

Jenkins CI test (for commit 055ceb9, target branch master) failed with status FAILURE.

@emcastillo
Member

Some numpy-scipy version issues

@kmaehashi kmaehashi force-pushed the py3-only-for-master branch from 055ceb9 to 6e60f6c Compare August 26, 2019 07:13
@kmaehashi
Member Author

pfnCI, test this please.

@kmaehashi kmaehashi changed the title Do not test against Python 2 for master branch Do not test against Python 2 / 3.4 for master branch Aug 26, 2019
@chainer-ci
Member

Jenkins CI test (for commit 6e60f6c, target branch master) failed with status FAILURE.

@kmaehashi
Member Author

pfnCI, test this please.

@kmaehashi
Member Author

pfnCI, test this please.

@chainer-ci
Member

Jenkins CI test (for commit 2f4970c, target branch master) failed with status FAILURE.

@emcastillo
Member

NCCL_ERROR 🤔

@emcastillo
Member

Jenkins, test this please

@chainer-ci
Member

Jenkins CI test (for commit 2f4970c, target branch master) failed with status FAILURE.

@kmaehashi
Member Author

pfnCI, test this please.

@chainer-ci
Member

Jenkins CI test (for commit d712d44, target branch master) failed with status FAILURE.

@emcastillo
Member

Please Fix CIs

@kmaehashi
Member Author

Sorry for taking time, I'm still unable to reproduce NCCL_ERROR_UNHANDLED_CUDA_ERROR: unhandled cuda error locally. Will try reproducing it on Jenkins interactively.
Jenkins test this please.

@chainer-ci
Member

Jenkins CI test (for commit 92cfd88, target branch master) failed with status FAILURE.

@leofang

leofang commented Sep 10, 2019

Hi, I was directed here from cupy/cupy#1941 to see if a fresh pair of eyes helps. Just throwing out some random thoughts (I have no experience with this error):

  1. Any chance Chainer is doing multi-threading on top of multiprocessing? According to the NCCL release notes (see here):

Using multiple processes in conjunction with multiple threads to manage the different GPUs may in some cases cause ncclCommInitRank to fail while establishing IPCs (cudaIpcOpenMemHandle). This problem does not appear when using only processes or only threads. This issue is fixed in recent driver versions, therefore, consider updating to the latest driver.

  2. Based on 1, could it also be that the driver/NCCL version in the Jenkins test is too old?
  3. How about adding --shm-size=1g --ulimit memlock=-1 to nvidia-docker run? It's recommended in NCCL's doc.
  4. A broken GPU could cause ncclCommInitRank to raise ncclUnhandledCudaError; see pytorch/pytorch#11756 ("RuntimeError: NCCL Error 1: unhandled cuda error").
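
The container flags suggested in the list above could be wired into a test launcher roughly as follows. This is only a sketch: the image name and test command are hypothetical placeholders, and nothing here reflects how the Jenkins job is actually configured. Only the shared-memory and memlock flags come from the suggestion itself.

```python
import shlex


def build_docker_command(image, test_command, shm_size="1g", memlock=-1):
    """Return an argv list that runs test_command in image via nvidia-docker.

    The --shm-size and --ulimit memlock values are the ones suggested in the
    thread; the image and test command are supplied by the caller.
    """
    return [
        "nvidia-docker", "run", "--rm",
        f"--shm-size={shm_size}",           # larger shared memory segment
        "--ulimit", f"memlock={memlock}",   # -1 removes the locked-memory limit
        image,
        *shlex.split(test_command),
    ]


# Hypothetical image name and test command, for illustration only.
cmd = build_docker_command("example/chainer-test:latest", "python -m pytest tests")
print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list rather than a single string keeps each flag a separate argument, so it can be passed to subprocess.run without shell quoting issues.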

@leofang

leofang commented Sep 16, 2019

can this be related to chainer/chainer#7511?

@emcastillo
Member

I don't think it is. @kmaehashi was able to reproduce the bug locally, so we will know soon :).
It sounds mostly like some kind of environment misconfiguration.

@leofang

leofang commented Sep 17, 2019

Cool!

@kmaehashi
Member Author

pfnCI, test this please.

@chainer-ci
Member

Jenkins CI test (for commit 92cfd88, target branch master) failed with status FAILURE.

@beam2d
Member

beam2d commented Sep 25, 2019

If fixing the failure is difficult for now, how about skipping the test case for the newly added configuration as a tentative workaround? This change affects a wide range of development and, IMHO, making them proceed quickly should have higher priority than fixing the failure cleanly.

@toslunar
Member

How about giving a lucky id to Jenkins?

./run_combination_test.py --id 1

The current id (1) is lucky but becomes unlucky after the PR. FYI, the last daily_master_chainer-shuffled passes with 10 combinations out of 21.
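
Why an id can turn from "lucky" to "unlucky" when the parameter axes change can be illustrated with a small sketch. It assumes the id deterministically picks one value per axis after a seeded shuffle. That is an assumption made for illustration, not a description of how run_combination_test.py actually works, and the axis names and version strings below are hypothetical.

```python
import random

# Hypothetical axes before and after dropping Python 2 / 3.4.
OLD_AXES = {
    "python": ["2.7", "3.4", "3.5", "3.6", "3.7"],
    "numpy": ["1.15", "1.16", "1.17"],
}
NEW_AXES = {
    "python": ["3.5", "3.6", "3.7"],
    "numpy": ["1.15", "1.16", "1.17"],
}


def select_combination(axes, comb_id, seed=0):
    """Pick one value per axis for comb_id, deterministically for a given seed."""
    rng = random.Random(seed)
    chosen = {}
    for name in sorted(axes):        # fixed axis order keeps the result stable
        values = list(axes[name])
        rng.shuffle(values)          # seeded shuffle: same order on every run
        chosen[name] = values[comb_id % len(values)]
    return chosen


# The same id can map to different environments once an axis changes.
print(select_combination(OLD_AXES, 1))
print(select_combination(NEW_AXES, 1))
```

Under this assumption, removing values from one axis reshuffles what every id points to, so an id that used to land on a passing environment may land on a failing one afterwards.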

@niboshi
Member

niboshi commented Oct 7, 2019

What's the status?
It should not be difficult to skip affected tests as suggested by @beam2d.

@kmaehashi
Member Author

Changed the test variation as suggested by @toslunar to avoid this situation (until we migrate Jenkins to new test infrastructure).
pfnCI, test this please.

@chainer-ci
Member

Jenkins CI test (for commit 92cfd88, target branch master) failed with status FAILURE.

@kmaehashi
Member Author

I'll test separately to find the proper combination that can pass the test.

@kmaehashi
Member Author

pfnCI, test this please.
(--id 4 passed the test)

@chainer-ci
Member

Jenkins CI test (for commit 92cfd88, target branch master) failed with status FAILURE.

@emcastillo
Member

Unlucky id 😂

@kmaehashi
Member Author

Ah, sorry, the configuration was not updated correctly...
pfnCI, test this please.

@chainer-ci
Member

Jenkins CI test (for commit 92cfd88, target branch master) failed with status FAILURE.

@kmaehashi
Member Author

The failure is not related to this PR (will be fixed in #532).

@kmaehashi
Member Author

Jenkins, test this please.

@chainer-ci
Member

Jenkins CI test (for commit 92cfd88, target branch master) succeeded!

@emcastillo emcastillo merged commit a6c996e into chainer:master Oct 10, 2019
@kmaehashi kmaehashi deleted the py3-only-for-master branch October 10, 2019 07:25
@kmaehashi kmaehashi mentioned this pull request Oct 15, 2019
10 tasks
Development

Successfully merging this pull request may close these issues.

Separate parameter combinations between master and stable

7 participants