Tensorflow's hwloc build force-enables use of sys/sysctl.h, which breaks on recent Linux/glibc #45861

@HadrienG2

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux, openSUSE Tumbleweed, tested on various snapshots up to 20201216.
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: Not tested.
  • TensorFlow installed from (source or binary): Source.
  • TensorFlow version: 1.15.2
  • Python version: 3.6.9
  • Installed using virtualenv? pip? conda?: pip
  • Bazel version (if compiling from source): 0.26.1
  • GCC/Compiler version (if compiling from source): 9.3
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A

Describe the problem

TensorFlow vendors the hwloc library and heavily customizes how it is built. I am not familiar enough with Bazel to fully understand what is being done here or why, but on my machine the net result is a broken hwloc build.

The immediate symptom is that some hwloc source files fail to compile because they are configured to include the <sys/sysctl.h> header. That header was removed in glibc 2.32, because the underlying system call was removed from the Linux kernel in release 5.5.

Contrary to what one might intuitively assume, this is not a hwloc bug or incompatibility: the hwloc build system is perfectly able to detect that this header does not exist, and the hwloc source code knows how to avoid using it in that case.
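To illustrate the mechanism, here is a minimal C sketch of the usual feature-macro guard. It is not hwloc's actual code: the `__has_include` check is only a stand-in for hwloc's configure-time detection, and `sysctl_header_available` is a hypothetical helper name.

```c
/* Stand-in for the configure-time check (assumption: hwloc's real build
 * system performs this detection at configure time, not via __has_include). */
#if defined(__has_include)
#  if __has_include(<sys/sysctl.h>)
#    define HAVE_SYS_SYSCTL_H 1
#  endif
#endif

/* The header is only included when the macro is defined. If the macro is
 * force-defined on a system without the header, this include fails. */
#ifdef HAVE_SYS_SYSCTL_H
#  include <sys/types.h>
#  include <sys/sysctl.h>
#endif

/* Hypothetical helper: returns 1 if the sysctl code path was compiled in,
 * 0 if the fallback path was chosen. */
static int sysctl_header_available(void) {
#ifdef HAVE_SYS_SYSCTL_H
    return 1;
#else
    return 0;
#endif
}
```

With this pattern, a system without the header simply takes the fallback branch. Forcing `HAVE_SYS_SYSCTL_H` to 1 bypasses the detection and turns the guarded `#include` into a hard compile error.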

The actual problem is this line of the TensorFlow build system:

"#undef HAVE_SYS_SYSCTL_H": "#define HAVE_SYS_SYSCTL_H 1",

I do not know why, but this line force-sets the HAVE_SYS_SYSCTL_H define, which the hwloc build system would normally leave unset after correctly detecting that there is no sysctl.h header.

Removing this line from the BUILD.bazel file fixes the build on my machine. However, I assume it was added for a reason, most likely to make the build work on an operating system that does use the sysctl.h header but on which the hwloc build system fails to detect it. If so, the actual TensorFlow patch will need to be more nuanced and only apply this substitution on the OS configurations where it is necessary.
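As a rough sketch of what "only on the OS configurations where it is necessary" could mean, here is the same idea expressed with C platform macros. The list of platforms is my assumption (macOS and the BSDs do ship <sys/sysctl.h>, but I do not know which of them motivated the original line), and `FORCE_HAVE_SYS_SYSCTL_H` and `force_sysctl_header` are hypothetical names. The real fix would need to live in the Bazel configuration rather than in C.

```c
/* Assumption: macOS and the BSDs are the platforms that need the override.
 * Everywhere else, the define is left to normal header detection. */
#if defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || \
    defined(__OpenBSD__) || defined(__DragonFly__)
#  define FORCE_HAVE_SYS_SYSCTL_H 1
#else
#  define FORCE_HAVE_SYS_SYSCTL_H 0
#endif

/* Hypothetical helper: reports whether the override applies on this platform. */
static int force_sysctl_header(void) {
    return FORCE_HAVE_SYS_SYSCTL_H;
}
```

On Linux this evaluates to 0, so the define would no longer be forced there.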

I cannot easily share the build instructions that I followed, because they live inside a mildly complicated build system within a closed-source project. From my understanding of the problem, though, detailed reproducer instructions should not be necessary: you should be able to replicate this issue by building TensorFlow from source, through any method of your choosing, on any Linux distribution that uses glibc >=2.32. Although I personally observed this problem on openSUSE Tumbleweed, I would expect it to reproduce identically on Gentoo, Arch, or Fedora 34.
