Caffe2 flaky tests on CircleCI #12395

@yf225

On CircleCI, different Caffe2 tests turn out to be flaky depending on the test environment. Details for each environment:


Test environment 1

Host OS on CircleCI: Ubuntu 14.04 (kernel: 3.13.0-151-generic) → Tests are flaky
(For comparison, on Jenkins we use Ubuntu 16.04 (kernel: 4.4.0-1062-aws) → Tests are stable)

OS in Docker container: Ubuntu 14.04 (kernel: 4.4.0-1062-aws)

How are build products shared from the build stage to the test stage?
All build products live in the intermediate Docker image, which is shared from the build stage to the test stage.

Flaky tests

  1. TestFcOperator.test_fc
    1. caffe2_py2_gcc4_8_ubuntu14_04_test, https://circleci.com/gh/pytorch/pytorch/40627
  2. TestCRFOp.test_crf_gradient
    1. caffe2_py2_gcc4_8_ubuntu14_04_test, https://circleci.com/gh/pytorch/pytorch/40508
  3. TestReduceOps.test_reduce_min
    1. caffe2_py2_gcc4_8_ubuntu14_04_test, https://circleci.com/gh/pytorch/pytorch/40506
  4. TestReduceFrontSum.test_col2im_gradients
    1. caffe2_py2_gcc4_8_ubuntu14_04_test, https://circleci.com/gh/pytorch/pytorch/40771
  5. TestLayerNormOp.test_layer_norm_op
    1. caffe2_py2_gcc4_8_ubuntu14_04_test, https://circleci.com/gh/pytorch/pytorch/40507
    2. caffe2_py2_gcc4_8_ubuntu14_04_test, https://circleci.com/gh/pytorch/pytorch/40972
  6. TestLayerNormOp.test_layer_norm_grad_op
    1. caffe2_py2_gcc4_8_ubuntu14_04_test, https://circleci.com/gh/pytorch/pytorch/41306
  7. RecurrentNetworkTest.test_sum_mul
    1. caffe2_py2_gcc4_8_ubuntu14_04_test, https://circleci.com/gh/pytorch/pytorch/41027
  8. TestAdagrad.test_row_wise_sparse_adagrad
    1. caffe2_py2_gcc4_8_ubuntu14_04_test, https://circleci.com/gh/pytorch/pytorch/41227
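
One way to confirm that a test from a list like the one above is intermittent rather than consistently broken is to re-run it several times and count the failures. The sketch below is only an illustration, not part of the CI setup described in this issue: the helper names (`count_failures`, `classify`, `pytest_runner`) are hypothetical, and it assumes a Python 3 interpreter with pytest and Caffe2 installed.

```python
import subprocess
import sys
from typing import Callable


def count_failures(run_once: Callable[[], bool], runs: int) -> int:
    """Call run_once `runs` times; return how many calls reported failure."""
    return sum(0 if run_once() else 1 for _ in range(runs))


def classify(failures: int, runs: int) -> str:
    """Label a test by how often it failed across repeated runs."""
    if failures == 0:
        return "stable"
    if failures == runs:
        return "broken"
    return "flaky"


def pytest_runner(test_id: str) -> Callable[[], bool]:
    """Build a runner that executes one pytest node id in a fresh process.

    A fresh process per run avoids state leaking between repetitions.
    """
    def run_once() -> bool:
        cmd = [sys.executable, "-m", "pytest", "-q", test_id]
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return result.returncode == 0  # pytest exits with 0 when all tests pass
    return run_once
```

For example, `classify(count_failures(pytest_runner(test_id), 20), 20)` would label one test after 20 runs, where `test_id` is the pytest node id of the test in your checkout (the exact file path is not given in this issue).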

Test environment 2

Host OS on CircleCI: Ubuntu 14.04 (kernel: 3.13.0-151-generic) → Tests are flaky
(For comparison, on Jenkins we use Ubuntu 16.04 (kernel: 4.4.0-1062-aws) → Tests are stable)

OS in Docker container: Ubuntu 16.04 (kernel: 4.4.0-1062-aws)

How are build products shared from the build stage to the test stage?
Build products are copied from the build stage to the test stage; the intermediate Docker image is not shared.

Flaky tests

  1. GRUCellTest.test_gru_unit_op
    1. caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_test, https://circleci.com/gh/pytorch/pytorch/39963
  2. TestLayerNormOp.test_layer_norm_grad_op
    1. caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_test, https://circleci.com/gh/pytorch/pytorch/39939
  3. TestGlu.test_glu_old
    1. caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_test, https://circleci.com/gh/pytorch/pytorch/40017
    2. caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_test, https://circleci.com/gh/pytorch/pytorch/39980
  4. TestConvolution.test_conv_separate_stride_pad_gradients
    1. caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_test, https://circleci.com/gh/pytorch/pytorch/39989
  5. TestReduceFrontSum.test_col2im_gradients
    1. caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_test, https://circleci.com/gh/pytorch/pytorch/39979
  6. TestGroupConvolution.test_group_convolution
    1. caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_test, https://circleci.com/gh/pytorch/pytorch/39978
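
To see which tests are flaky in both environments (and so are less likely to be tied to one container OS or one way of sharing build products), the two lists can be intersected. The short sketch below copies the test names from this issue; the variable names are illustrative only.

```python
# Flaky tests seen with the Ubuntu 14.04 container (shared Docker image).
flaky_ubuntu14 = {
    "TestFcOperator.test_fc",
    "TestCRFOp.test_crf_gradient",
    "TestReduceOps.test_reduce_min",
    "TestReduceFrontSum.test_col2im_gradients",
    "TestLayerNormOp.test_layer_norm_op",
    "TestLayerNormOp.test_layer_norm_grad_op",
    "RecurrentNetworkTest.test_sum_mul",
    "TestAdagrad.test_row_wise_sparse_adagrad",
}

# Flaky tests seen with the Ubuntu 16.04 container (copied build products).
flaky_ubuntu16 = {
    "GRUCellTest.test_gru_unit_op",
    "TestLayerNormOp.test_layer_norm_grad_op",
    "TestGlu.test_glu_old",
    "TestConvolution.test_conv_separate_stride_pad_gradients",
    "TestReduceFrontSum.test_col2im_gradients",
    "TestGroupConvolution.test_group_convolution",
}

# Tests that show up in both lists.
flaky_in_both = sorted(flaky_ubuntu14 & flaky_ubuntu16)
print(flaky_in_both)
```

With the lists as given, two tests appear in both environments: `TestLayerNormOp.test_layer_norm_grad_op` and `TestReduceFrontSum.test_col2im_gradients`.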
