Skip to content

Introduce libtorch to setup.py build#8792

Closed
anderspapitto wants to merge 2 commits intopytorch:masterfrom
anderspapitto:build-libtorch-root-cmake
Closed

Introduce libtorch to setup.py build#8792
anderspapitto wants to merge 2 commits intopytorch:masterfrom
anderspapitto:build-libtorch-root-cmake

Conversation

@anderspapitto
Copy link
Contributor

@anderspapitto anderspapitto commented Jun 22, 2018

Prior to this diff, there have been two ways of compiling the bulk of the torch codebase. There was no interaction between them - you had to pick one or the other.

  1. with setup.py. This method
  • used the setuptools C extension functionality
  • worked on all platforms
  • did not build test_jit/test_api binaries
  • did not include the C++ api
  • always included python functionality
  • produced _C.so
  1. with cpp_build. This method
  • used CMake
  • did not support Windows or ROCM
  • was capable of building the test binaries
  • included the C++ api
  • did not build the python functionality
  • produced libtorch.so

This diff combines the two.

  1. cpp_build/CMakeLists.txt has become torch/CMakeLists.txt. This build
  • is CMake-based
  • works on all platforms
  • builds the test binaries
  • includes the C++ api
  • does not include the python functionality
  • produces libtorch.so
  1. the setup.py build
  • compiles the python functionality
  • calls into the CMake build to build libtorch.so
  • produces _C.so, which has a dependency on libtorch.so

In terms of code changes, this mostly means extending the cmake build to support the full variety of environments and platforms. There are also a small number of changes related to the fact that there are now two shared objects - in particular, windows requires annotating some symbols with dllimport/dllexport, and doesn't allow exposing thread_local globals directly.

@anderspapitto anderspapitto force-pushed the build-libtorch-root-cmake branch 11 times, most recently from 1f7bcc0 to 88b0f78 Compare June 26, 2018 18:20

This comment was marked as off-topic.

This comment was marked as off-topic.

setup.py Outdated

This comment was marked as off-topic.

@anderspapitto anderspapitto force-pushed the build-libtorch-root-cmake branch 4 times, most recently from 28de5e5 to 98c6a0a Compare June 26, 2018 19:14
@anderspapitto anderspapitto changed the title [WIP] Build libtorch root cmake Introduce libtorch to setup.py build Jun 26, 2018
@anderspapitto anderspapitto force-pushed the build-libtorch-root-cmake branch from 98c6a0a to d4c7bb3 Compare June 26, 2018 19:31

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

@anderspapitto anderspapitto force-pushed the build-libtorch-root-cmake branch from 78b6a9f to 3933272 Compare July 3, 2018 15:21
@anderspapitto anderspapitto requested a review from ebetica as a code owner July 3, 2018 15:21
@anderspapitto anderspapitto force-pushed the build-libtorch-root-cmake branch from 6965cde to 8b1d5cb Compare July 5, 2018 19:51
@anderspapitto anderspapitto force-pushed the build-libtorch-root-cmake branch 2 times, most recently from 274f619 to 07c01b9 Compare July 6, 2018 19:59
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Copy link
Contributor

ezyang commented Jul 8, 2018

build failures look legit

@anderspapitto
Copy link
Contributor Author

anderspapitto commented Jul 12, 2018

This diff is ready for another round of review. I highlight two points of potential concern.

  1. The ROCM builds consistently fail due to

This is particularly annoying because I did fix legit ROCM issues along the way, so I would like to get a clear signal that everything is now correct. However, my initial feeling is that fixing these infra-level resource issues is outside the scope of this diff.

  1. On windows, a shared library cannot expose global variables with dllexport. Straightforward application of the new TORCH_API macro results in errors messages like
C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-build\torch/csrc/autograd/profiler.h(172): error C2492: 'thread_id': data with thread storage duration may not have dll interface

seen here https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-build/12062/console.

In this diff, I've addressed this issue by moving all the relevant thread_local global variables out of the header files, and instead exposing accessor methods across the shared-object boundary. I'm concerned that this may be a perf problem, if function call overhead is too high a cost to pay to access these profiling-related variables. So, is this a problem, and if so is there a suggested alternative approach?

@ezyang @goldsborough as existing reviewers (also please bring in anyone else with relevant knowledge. E.g. I'm not sure who is most familiar with the profiling code.)

edit: also @Jorghi12 regarding the ROCM tests

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anderspapitto has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@goldsborough goldsborough left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about some things, especially the newly disabled warnings


#include <ATen/ATen.h>

#include "torch/csrc/WindowsTorchApiMacro.h"

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

thread_local uint64_t Function::next_sequence_nr_ = 0;
/// Monotonically incrementing (thread local!) counter to supply sequence
/// numbers.
thread_local uint64_t Function_next_sequence_nr_ = 0;

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

getEventList().record(EventKind::PopRange, std::string(), thread_id, state == ProfilerState::CUDA);
}
}
RangeEventList& getEventList();

This comment was marked as off-topic.

This comment was marked as off-topic.


py::class_<torch::autograd::profiler::Event>(m,"ProfilerEvent")
.def("kind",&torch::autograd::profiler::Event::kind)
.def("name",&torch::autograd::profiler::Event::name)

This comment was marked as off-topic.

namespace torch { namespace autograd {

struct GradMode {
static bool is_enabled() {

This comment was marked as off-topic.

# This is required for Python 2 declarations that are deprecated
# in 3.
-Wno-deprecated-declarations
# Python 2.6 requires -fno-strict-aliasing see

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

-Wno-unused-parameter
-Wno-missing-field-initializers
-Wno-write-strings
# -Wno-zero-length-array

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.


if (APPLE)
target_compile_options(test_api PRIVATE
-Wno-error=unknown-warning-option

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.


#ifdef _WIN32
# if defined(ATen_cpu_EXPORTS) || defined(caffe2_EXPORTS)
# if defined(torch_EXPORTS)

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

setup.py Outdated

# we specify exact lib names to avoid conflict with lua-torch installs
CAFFE2_LIBS = [os.path.join(lib_path, 'libcaffe2.so')]
CAFFE2_LIBS = [os.path.join(lib_path, 'libcaffe2.so'), os.path.join(lib_path, 'libtorch.so')]

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

virtual at::Tensor unsafeTensorFromTH(void * th_pointer, bool retain) const override;

static at::Type* getType(const at::Type& baseType);
TORCH_API static at::Type* getType(const at::Type& baseType);

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

-DBUILD_CAFFE2=OFF ^
-DBUILD_TORCH="%BUILD_TORCH%" ^
-DNVTOOLEXT_HOME="%NVTOOLEXT_HOME%" ^
-DNO_API=ON ^

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

target_compile_options(torch PRIVATE -Werror)
endif()

if (MSVC)

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

@ezyang
Copy link
Contributor

ezyang commented Jul 13, 2018

Would it be possible to get a detailed break down of the semantic changes that were made in plain English? E.g., something similar to the comment at #9358 ; it will help us reviewers out a lot.

endif()

if(MSVC OR APPLE)
target_link_libraries(torch caffe2_gpu_library ${TORCH_CUDA_LIBRARIES})

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

if (TORCH_BUILD_TEST)
# JIT Tests. TODO: Put into test/cpp/jit folder

# JIT Tests. TODO: Put into test/cpp/jit folder

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

endif()

target_link_libraries(test_api torch)
if (NOT NO_API AND NOT USE_ROCM)

This comment was marked as off-topic.

This comment was marked as off-topic.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anderspapitto has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anderspapitto has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anderspapitto has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

setup.py now compiles libtorch.so via CMake, and _C.so takes a
dependency on libtorch.so.  Most cpp files that were previously built
directly into _C.so have been removed from _C.so and added to
libtorch.so.
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anderspapitto has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ebetica
Copy link
Contributor

ebetica commented Jul 26, 2018

@anderspapitto I'm using the PyTorch C++ API in csrc/api, and I used to rely on ./build_all.sh to build the libs before linking (in open source). However, this PR got rid of that, and the command seems quite cumbersome:

export BUILD_TYPE=relwithdebinfo; export LIBTORCH_BUILDPATH=build/libtorch; ./build_caffe2.sh; ./build_libtorch.sh

Any chance it could be brought back and supported in the tests again so it does not break?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants