C++ compilation of rule '//tensorflow/python:bfloat16_lib' failed #40688

@mmartial

System information

  • OS Platform and Distribution: Linux Ubuntu 18.04, built inside a Docker image based on nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
  • TensorFlow installed from: source
  • TensorFlow version: 2.2.0
  • Python version: 3.6.9
  • Installed using virtualenv? pip? conda?: No
  • Bazel version: 2.0.0 (extracted from _TF_MAX_BAZEL)
  • GCC/Compiler version: 7.4.0
  • CUDA/cuDNN version: 10.1 / 7
  • GPU model and memory: tested on Titan XP and RTX 2070 8GB

Describe the problem

The build fails with:

tensorflow/python/lib/core/bfloat16.cc: In function 'bool tensorflow::{anonymous}::Initialize()':
tensorflow/python/lib/core/bfloat16.cc:636:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [6], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:640:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [10], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:643:77: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [5], <unresolved overloaded function type>, const std::array<int, 3>&)'
   if (!register_ufunc("less", CompareUFunc<Bfloat16LtFunctor>, compare_types)) {
                                                                            ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:647:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [8], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:651:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [11], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:655:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [14], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
ERROR: /usr/local/src/tensorflow/tensorflow/tools/pip_package/BUILD:62:1 C++ compilation of rule '//tensorflow/python:bfloat16_lib' failed (Exit 1)
INFO: Elapsed time: 1828.057s, Critical Path: 881.14s
INFO: 13824 processes: 13824 local.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
Command exited with non-zero status 1
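
The diagnostics name PyUFuncGenericFunction, a typedef that comes from the NumPy C headers, and the Dockerfile in this report installs numpy without a version pin. Recording which NumPy version the build actually picked up may therefore help with triage. This is a minimal sketch, not a confirmed diagnosis: it assumes Python 3.8+ for importlib.metadata (the Python 3.6.9 listed above would need the importlib_metadata backport), and installed_version is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # numpy supplies the headers that declare PyUFuncGenericFunction;
    # the other names are installed unpinned by the same pip3 step.
    for name in ("numpy", "setuptools", "wheel", "six"):
        print(f"{name}: {installed_version(name)}")
```

Running this inside the image just before the bazel build step would show the versions that the compile step sees.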

Provide the exact sequence of commands / steps that you executed before running into the problem

Reproducible with the following Dockerfile:

FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04

# Install system packages
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update -y \
  && apt-get install -y --no-install-recommends apt-utils \
  && apt-get install -y \
    build-essential \
    checkinstall \
    cmake \
    curl \
    g++ \
    gcc \
    git \
    locales \
    perl \
    pkg-config \
    protobuf-compiler \
    python3-dev \
    rsync \
    software-properties-common \
    unzip \
    wget \
    zip \
    zlib1g-dev \
  && apt-get clean

# UTF-8
RUN localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8
ENV LANG en_US.utf8

# Setup pip
RUN wget -q -O /tmp/get-pip.py --no-check-certificate https://bootstrap.pypa.io/get-pip.py \
  && python3 /tmp/get-pip.py \
  && pip3 install -U pip \
  && rm /tmp/get-pip.py
# Some TF tools expect a "python" binary
RUN ln -s $(which python3) /usr/local/bin/python

# /etc/ld.so.conf.d/nvidia.conf points to /usr/local/nvidia, which seems to be missing; link it to the CUDA install directory so the libraries are found
RUN cd /usr/local && ln -s cuda nvidia
ARG CTO_CUDA_VERSION="10.1"
ARG CTO_CUDA_PRIMEVERSION="10.0"
ARG CTO_CUDA_APT="cuda-npp-${CTO_CUDA_VERSION} cuda-cublas-${CTO_CUDA_PRIMEVERSION} cuda-cufft-${CTO_CUDA_VERSION} cuda-libraries-${CTO_CUDA_VERSION} cuda-npp-dev-${CTO_CUDA_VERSION} cuda-cublas-dev-${CTO_CUDA_PRIMEVERSION} cuda-cufft-dev-${CTO_CUDA_VERSION} cuda-libraries-dev-${CTO_CUDA_VERSION}"
RUN apt-get install -y --no-install-recommends \
  time ${CTO_CUDA_APT} \
  && apt-get clean

ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64"

# Install Python tools 
RUN pip3 install -U \
  mock \
  numpy \
  setuptools \
  six \
  wheel \
  && pip3 install 'future>=0.17.1' \
  && pip3 install -U keras_applications --no-deps \
  && pip3 install -U keras_preprocessing --no-deps \
  && rm -rf /root/.cache/pip

## Download and build TensorFlow from source
ARG LATEST_BAZELISK=1.5.0
ARG CTO_TENSORFLOW_VERSION="2.2.0"
RUN curl -s -Lo /usr/local/bin/bazel https://github.com/bazelbuild/bazelisk/releases/download/v${LATEST_BAZELISK}/bazelisk-linux-amd64 \
  && chmod +x /usr/local/bin/bazel \
  && mkdir -p /usr/local/src \
  && cd /usr/local/src \
  && wget -q --no-check-certificate https://github.com/tensorflow/tensorflow/archive/v${CTO_TENSORFLOW_VERSION}.tar.gz \
  && tar xfz v${CTO_TENSORFLOW_VERSION}.tar.gz \
  && mv tensorflow-${CTO_TENSORFLOW_VERSION} tensorflow \
  && rm v${CTO_TENSORFLOW_VERSION}.tar.gz \
  && cd /usr/local/src/tensorflow \
  && fgrep _TF_MAX_BAZEL configure.py | grep '=' | perl -ne 'print $1 if (m%\=\s+.([\d\.]+).$+%)' > .bazelversion
RUN cd /usr/local/src/tensorflow \
  && TF_CUDA_CLANG=0 TF_CUDA_VERSION=${CTO_CUDA_VERSION} TF_CUDNN_VERSION=7 TF_DOWNLOAD_CLANG=0 TF_DOWNLOAD_MKL=0 TF_ENABLE_XLA=0 TF_NEED_AWS=0 TF_NEED_COMPUTECPP=0 TF_NEED_CUDA=1 TF_NEED_GCP=0 TF_NEED_GDR=0 TF_NEED_HDFS=0 TF_NEED_JEMALLOC=1 TF_NEED_KAFKA=0 TF_NEED_MKL=0 TF_NEED_MPI=0 TF_NEED_OPENCL=0 TF_NEED_OPENCL_SYCL=0 TF_NEED_ROCM=0 TF_NEED_S3=0 TF_NEED_TENSORRT=0 TF_NEED_VERBS=0 TF_SET_ANDROID_WORKSPACE=0 TF_CUDA_COMPUTE_CAPABILITIES="5.3,6.0,6.1,6.2,7.0,7.2,7.5" GCC_HOST_COMPILER_PATH=$(which gcc) CC_OPT_FLAGS="-march=native" PYTHON_BIN_PATH=$(which python) PYTHON_LIB_PATH="$(python -c 'import site; print(site.getsitepackages()[0])')" ./configure
RUN cd /usr/local/src/tensorflow \
  && time bazel build --verbose_failures --config=opt --config=v2 --config=cuda //tensorflow/tools/pip_package:build_pip_package \
  && time ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg \
  && time pip3 install /tmp/tensorflow_pkg/tensorflow-*.whl

CMD bash

Built using docker build --tag cto:test .
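
The Dockerfile derives the pinned Bazel version by grepping configure.py for _TF_MAX_BAZEL and writing the result to .bazelversion. The same extraction can be sketched in Python for readers who find the perl one-liner hard to follow. This is an illustration under assumptions: only the _TF_MAX_BAZEL prefix appears in this report, so the pattern accepts any suffix, and the sample input below is made up rather than copied from configure.py.

```python
import re
from typing import Optional

# Matches an assignment such as: _TF_MAX_BAZEL<suffix> = '2.0.0'
_PATTERN = re.compile(
    r"^_TF_MAX_BAZEL\w*\s*=\s*['\"]([\d.]+)['\"]", re.MULTILINE
)


def max_bazel_version(configure_source: str) -> Optional[str]:
    """Return the version assigned to the _TF_MAX_BAZEL* constant, if any."""
    match = _PATTERN.search(configure_source)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up sample; the real configure.py content is not reproduced here.
    sample = "_TF_MIN_BAZEL_EXAMPLE = '1.0.0'\n_TF_MAX_BAZEL_EXAMPLE = '2.0.0'\n"
    print(max_bazel_version(sample))
```

Returning None when nothing matches makes an empty .bazelversion easier to detect than with the shell pipeline, which would silently write an empty file.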

Note: tested with CUDA 10.1, 10.0, and 10.2.
The failure also occurs with TF 1.15.3.

Any other info / logs
I can provide the full build log (91 MB) on request.
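
Since the full log is large, a small filter that keeps only the compiler error lines may be easier to share. The sketch below assumes the raw log carries ANSI color sequences and that errors follow GCC's "file:line:col: error:" form, as in the excerpt in this report; compiler_errors and the log file name are hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator

# CSI color sequences such as "\x1b[0m" or "\x1b[91m".
_ANSI_COLOR = re.compile(r"\x1b\[[0-9;]*m")


def compiler_errors(lines: Iterable[str]) -> Iterator[str]:
    """Yield color-stripped lines that contain a GCC-style ': error:' marker."""
    for line in lines:
        clean = _ANSI_COLOR.sub("", line).rstrip("\n")
        if ": error:" in clean:
            yield clean


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python filter_errors.py build.log  (both names are hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for hit in compiler_errors(log):
            print(hit)
```

The function reads lazily, so a log of this size does not need to fit in memory.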

 ---> Running in 9690386205a5
2020/06/22 14:11:17 Downloading https://releases.bazel.build/2.0.0/release/bazel-2.0.0-linux-x86_64...
Extracting Bazel installation...
You have bazel 2.0.0 installed.
Found CUDA 10.1 in:
    /usr/local/cuda-10.1/lib64
    /usr/local/cuda-10.1/include
Found cuDNN 7 in:
    /usr/lib/x86_64-linux-gnu
    /usr/include


Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl            # Build with MKL support.
        --config=monolithic     # Config for mostly static monolithic build.
        --config=ngraph         # Build with Intel nGraph support.
        --config=numa           # Build with NUMA support.
        --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.
        --config=v2             # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
        --config=noaws          # Disable AWS S3 filesystem support.
        --config=nogcp          # Disable GCP support.
        --config=nohdfs         # Disable HDFS support.
        --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished
Removing intermediate container 9690386205a5
 ---> 8910acc4d9c5
Step 19/20 : RUN cd /usr/local/src/tensorflow   && time bazel build --verbose_failures --config=opt --config=v2 --config=cuda //tensorflow/tools/pip_package:build_pip_package   && time ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg   && time pip3 install /tmp/tensorflow_pkg/tensorflow-*.whl
 ---> Running in 3b0267b1209d
Starting local Bazel server and connecting to it...
WARNING: The following configs were expanded more than once: [v2, cuda, using_cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /usr/local/src/tensorflow/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /usr/local/src/tensorflow/.bazelrc:
  'build' options: --apple_platform_type=macos --define framework_shared_object=true --define open_source_build=true --java_toolchain=//third_party/toolchains/java:tf_java_toolchain --host_java_toolchain=//third_party/toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --noincompatible_prohibit_aapt1 --enable_platform_specific_config --config=v2
INFO: Reading rc options for 'build' from /usr/local/src/tensorflow/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/local/bin/python --action_env PYTHON_LIB_PATH=/usr/local/lib/python3.6/dist-packages --python_path=/usr/local/bin/python --action_env TF_CUDA_VERSION=10.1 --action_env TF_CUDNN_VERSION=7 --action_env CUDA_TOOLKIT_PATH=/usr/local/cuda-10.1 --action_env TF_CUDA_COMPUTE_CAPABILITIES=5.3,6.0,6.1,6.2,7.0,7.2,7.5 --action_env LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/extras/CUPTI/lib64 --action_env GCC_HOST_COMPILER_PATH=/usr/bin/x86_64-linux-gnu-gcc-7 --config=cuda --action_env TF_CONFIGURE_IOS=0
INFO: Found applicable config definition build:v2 in file /usr/local/src/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:cuda in file /usr/local/src/tensorflow/.bazelrc: --config=using_cuda --define=using_cuda_nvcc=true
INFO: Found applicable config definition build:using_cuda in file /usr/local/src/tensorflow/.bazelrc: --define=using_cuda=true --action_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain
INFO: Found applicable config definition build:opt in file /usr/local/src/tensorflow/.tf_configure.bazelrc: --copt=-march=native --host_copt=-march=native --define with_default_optimizations=true
INFO: Found applicable config definition build:v2 in file /usr/local/src/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:cuda in file /usr/local/src/tensorflow/.bazelrc: --config=using_cuda --define=using_cuda_nvcc=true
INFO: Found applicable config definition build:using_cuda in file /usr/local/src/tensorflow/.bazelrc: --define=using_cuda=true --action_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain
INFO: Found applicable config definition build:linux in file /usr/local/src/tensorflow/.bazelrc: --copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels
INFO: Found applicable config definition build:dynamic_kernels in file /usr/local/src/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
Loading: 
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
DEBUG: Rule 'io_bazel_rules_docker' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1556410077 -0400"
DEBUG: Call stack for the definition of repository 'io_bazel_rules_docker' which is a git_repository (rule definition at /root/.cache/bazel/_bazel_root/bbcc73fcc5c2b01ab08b6bcf7c29e42e/external/bazel_tools/tools/build_defs/repo/git.bzl:195:18):
 - /root/.cache/bazel/_bazel_root/bbcc73fcc5c2b01ab08b6bcf7c29e42e/external/bazel_toolchains/repositories/repositories.bzl:37:9
 - /usr/local/src/tensorflow/WORKSPACE:37:1
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
    currently loading: tensorflow/tools/pip_package
DEBUG: /root/.cache/bazel/_bazel_root/bbcc73fcc5c2b01ab08b6bcf7c29e42e/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:118:5: 
[...]

