Skip to content

windows: use mingw-llvm#51197

Merged
idryomov merged 17 commits intoceph:mainfrom
petrutlucian94:mingw-llvm
Sep 1, 2023
Merged

windows: use mingw-llvm#51197
idryomov merged 17 commits intoceph:mainfrom
petrutlucian94:mingw-llvm

Conversation

@petrutlucian94
Copy link
Contributor

@petrutlucian94 petrutlucian94 commented Apr 24, 2023

We used to have deadlocks because of mingw/gcc and winpthreads shared lock bugs: #46989

We're now experiencing deadlocks when TLS keys are deleted as part of the exit routine:
https://pastebin.com/raw/p3KKquwS

Switching to mingw-llvm solves the problem. For now, we'll have a TOOLCHAIN build setting that accepts either mingw-llvm or mingw-gcc, defaulting to mingw-llvm. Another advantage of mingw-llvm is that it can output pdb symbols, which can be used with WinDbg.

Note that mingw-llvm uses libc++ instead of libstdc++ and does not rely on winpthreads for std::thread support.

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@petrutlucian94 petrutlucian94 added the win32 Specifix changes for the windows platform label Apr 24, 2023
@petrutlucian94
Copy link
Contributor Author

There are a few test failures, I'll mark the PR as a draft until those issues are addressed.

[2023-04-24T13:07:08.000Z] [googletest] ceph_test_dokan failed. Error: Command returned non-zero code(1): "cmd /c 'C:\ceph\ceph_test_dokan.exe --gtest_output=xml:C:\workspace\test_results\out\ceph_test_dokan\ceph_test_dokan_results.xml  >> C:\workspace\test_results\out\ceph_test_dokan\ceph_test_dokan_results.log 2>&1'".
[2023-04-24T13:07:08.000Z] [googletest] unittest_json_formattable failed. Error: Command returned non-zero code(1): "cmd /c 'C:\ceph\unittest_json_formattable.exe --gtest_output=xml:C:\workspace\test_results\out\unittest_json_formattable\unittest_json_formattable_results.xml  >> 
[2023-04-24T13:07:08.000Z] [googletest] unittest_subprocess failed. Error: Command returned non-zero code(1): "cmd /c 'C:\ceph\unittest_subprocess.exe --gtest_output=xml:C:\workspace\test_results\out\unittest_subprocess\unittest_subprocess_results.xml --gtest_filter="-SubProcessTimed.SubshellTimedout" >> 
[2023-04-24T13:14:32.000Z] [isolated][googletest] unittest_admin_socket failed. Error: Command returned non-zero code(1): "cmd /c 'C:\ceph\unittest_admin_socket.exe --gtest_output=xml:C:\workspace\test_results\out\unittest_admin_socket\unittest_admin_socket_results.xml --gtest_filter="*" >> C:\workspace\test_results\out\unittest_admin_socket\unittest_admin_socket_results.log 2>&1'".

cluster:
    id:     6f1432dd-1afc-4426-8298-7bbe4c822f8e
    health: HEALTH_WARN
            Module 'rbd_support' has failed dependency: No module named 'dateutil'
            1 pool(s) do not have an application enabled
            3 pool(s) have no replicas configured

@petrutlucian94 petrutlucian94 marked this pull request as draft April 24, 2023 13:26
@petrutlucian94
Copy link
Contributor Author

jenkins test windows

@petrutlucian94
Copy link
Contributor Author

jenkins test make check

@petrutlucian94 petrutlucian94 marked this pull request as ready for review April 28, 2023 06:57
@petrutlucian94
Copy link
Contributor Author

jenkins test make check

@petrutlucian94
Copy link
Contributor Author

jenkins test windows

@petrutlucian94
Copy link
Contributor Author

Retried the windows job, cephfs tests were hanging as MDS was down:

services:
    mon: 3 daemons, quorum a,b,c (age 62m)
    mgr: x(active, since 62m)
    mds: 0/0 daemons up

@petrutlucian94
Copy link
Contributor Author

jenkins test api

@petrutlucian94
Copy link
Contributor Author

jenkins test make check

@petrutlucian94
Copy link
Contributor Author

jenkins test windows

@petrutlucian94 petrutlucian94 requested a review from a team as a code owner May 10, 2023 09:18
@github-actions github-actions bot added cephfs Ceph File System libcephsqlite ceph SQLite3 VFS mon labels May 10, 2023
@petrutlucian94 petrutlucian94 requested a review from a team as a code owner August 28, 2023 11:10
@petrutlucian94
Copy link
Contributor Author

jenkins test make check

@petrutlucian94
Copy link
Contributor Author

@tchaikov thanks for reviewing this PR, I've addressed the comments. Please let us know if we can ahead and merge it.

Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest squashing "cmake: simplify arch lib linking" into "cmake: avoid duplicate symbols" because former just undoes most of the latter.

Some symbols from the crc32, arch and fmt libs
are re-exported by libceph-common:

  FAILED: bin/unittest_time.exe
  ld.lld: error: fmt::v9::format_error::~format_error() was replaced

llvm throws errors because of the duplicate symbols.
One workaround is to use objects instead of static libs
for the libs. For libfmt we'll use the header-only version.

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
clang errors out because of a few type mismatches that gcc
ignored through "-fpermissive".

We'll need to cast a few void pointers to the appropriate type.
There's also a function that doesn't have an explicit return type,
which was omitted by mistake.

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
The "-Bsymbolic" and "-Bsymbolic-functions" flags only apply to ELF
binaries.

llvm errors out when targeting Windows, which is why we'll need
to skip those flags for Windows builds.

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
Especially when targeting Windows, llvm may not necessarily
use pthreads for std::thread. In this case, we must not use the
"native" thread handle with the pthreads API.

We'll update the ceph_pthread_getname and ceph_pthread_setname
wrappers, adding a new one: ceph_pthread_kill.

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
This test currently failes to build for Windows using llvm
due to unresolved symbols.

We'll address the issue by explicitly specifying the ceph-common
dependency.

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
We're checking a permission denied exception message that's
runtime specific:

  /mnt/data/workspace/ceph_mingw_clang/src/test/dokan/dokan.cc:252: Failure
  Expected equality of these values:
    e.what()
      Which is: "filesystem error: in remove: Permission denied
      [\"Z:\\ro_success_b76223c4-c590-45e0-ab78-4d281ac512b5\"]"
    exception_msg.c_str()
      Which is: "filesystem error: cannot remove: No such device
      [Z:\\ro_success_b76223c4-c590-45e0-ab78-4d281ac512b5]"

In order to support libc++, we'll drop the exception message
assertion and rely on the exception type.

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
winpthreads is a library that emulates the pthreads API using
Windows primitives. It's also used by the mingw/gcc libstdc++
for std::thread.

The issue is that winpthreads isn't well maintained. There
have been numerous bugs that haven't been addressed in years.
Specifically, we've been hitting deadlocks because of the
winpthreads rw lock implementation.

This change will allow building Ceph for Windows using mingw/llvm,
which uses libc++ and doesn't rely on winpthreads.

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
We've been experiencing timer hangs with mingw-llvm.
std::condition_variable::wait_for was returning a few microseconds
before the requested time and then hanging when called with a
small interval (e.g. microseconds).

This was affecting the OSD periodic tick, which would hang after
a while (20m up to 2h).

The issue can be reproduced with a timer loop and a small interval
(e.g. 40us), in which case the timer is likely to hang after about
10s.

We're adding some tests, while the actual mingw-llvm issue will
be mitigated in a separate commit.

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
The monotonic clocks are commonly used for measuring time deltas,
which can be negative.

ceph::mono_clock and ceph::coarse_mono_clock currently use
unsigned duration types [1]. The difference operators are overloaded
in order to ensure that the result is signed [2][3].

However, we still have issues when unsigned timespans are compared.
For example, std::condition::wait_for can hang indefinitely due
to underflows [4][5]. It ends up using our unsigned type for a
negative timespan, which is then compared to
std::chrono::duration<Rep,Period>::zero.

In order to avoid such problems, we'll simply use a signed type
for monotonic clock durations.

With signed timespans, we can no longer assume that time_point::zero()
is equal to time_point::min(), so we're updating it accodingly.

[1] https://github.com/ceph/ceph/blob/4040f12347a5f48520f8ff2f83065b9ee3a36f68/src/common/ceph_time.h#L285
[2] https://github.com/ceph/ceph/blob/4040f12347a5f48520f8ff2f83065b9ee3a36f68/src/common/ceph_time.h#L345-L380
[3] https://github.com/ceph/ceph/blob/4040f12347a5f48520f8ff2f83065b9ee3a36f68/src/common/ceph_time.h#L466-L487
[4] https://github.com/llvm/llvm-project/blob/91cff8a71872cf49f0c5c9e5510f8065bfefa3c3/libcxx/include/__condition_variable/condition_variable.h#L178
[5] https://github.com/llvm/llvm-project/blob/91cff8a71872cf49f0c5c9e5510f8065bfefa3c3/libcxx/include/__condition_variable/condition_variable.h#L193

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
std::condition_variable::wait_for uses SleepConditionVariableSRW
on Windows, which has millisecond precision.

In order to avoid busy loops, we won't wait for less than one
millisecond on Windows.

Note that this situation is quite common since on Windows,
"wait_for" often returns ~1ms before the specified timeout.

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
Because of winpthreads issues, we had to use Boost's shared_mutex
implementation.

When using mingw-llvm, we can safely use libc++'s shared mutex
implementation.

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
mingw-llvm can't handle the '--exclude-libs' linker flag, so
we'll have to skip it.

At the same time, cmake can't locate the boost shared libs as the
import libs are not generated when using mingw-llvm due to boost's
clang-linux.jam file. For now, we'll patch the cmake files, using
the dlls as import libs (which is allowed by mingw).

While at it, we'll avoid linking the static AND dynamic boost libs,
speeding the build.

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
Boost stacktrace defines a few UUIDs that were recently added
to mingw as well [1], causing compilation errors [2]:

  In file included from libs/stacktrace/build/../src/windbg.cpp:9:
  ./boost/stacktrace/detail/frame_msvc.ipp:31:5: error: redefinition of
    '__mingw_uuidof_s<IDebugClient>'
    __CRT_UUID_DECL(IDebugClient,0x27fe5639,...

We'll apply a fix that hasn't merged upsteam yet [3].

[1] mirror/mingw-w64@ce5a9f6
[2] boostorg/stacktrace#133
[3] boostorg/stacktrace#140

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cmake changes look good. thanks

@idryomov idryomov merged commit ff570c6 into ceph:main Sep 1, 2023
aaSharma14 pushed a commit to rhcs-dashboard/ceph that referenced this pull request Sep 25, 2023
We're getting an ICE when trying to compile this test using
mingw-gcc and recent Boost versions. Note that mingw-llvm works fine.

    during IPA pass: inline
    /mnt/data/workspace/ceph.pr/src/test/common/test_back_trace.cc:44:1:
    internal compiler error: Segmentation fault
       44 | }
          | ^
    0x7f9c4a86c51f ???
            ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
    0x7f9c4a853d8f __libc_start_call_main
            ../sysdeps/x86/libc-start.c:58
    0x7f9c4a853e3f __libc_start_main_impl
            ../sysdeps/nptl/libc_start_call_main.h:392

For now, we'll just skip the test when using mingw-gcc. Note
that we're planing to switch to mingw-llvm anyway:
ceph#51197

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
aaSharma14 pushed a commit to rhcs-dashboard/ceph that referenced this pull request Oct 11, 2023
We're getting an ICE when trying to compile this test using
mingw-gcc and recent Boost versions. Note that mingw-llvm works fine.

    during IPA pass: inline
    /mnt/data/workspace/ceph.pr/src/test/common/test_back_trace.cc:44:1:
    internal compiler error: Segmentation fault
       44 | }
          | ^
    0x7f9c4a86c51f ???
            ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
    0x7f9c4a853d8f __libc_start_call_main
            ../sysdeps/x86/libc-start.c:58
    0x7f9c4a853e3f __libc_start_main_impl
            ../sysdeps/nptl/libc_start_call_main.h:392

For now, we'll just skip the test when using mingw-gcc. Note
that we're planing to switch to mingw-llvm anyway:
ceph#51197

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
aaSharma14 pushed a commit to rhcs-dashboard/ceph that referenced this pull request Nov 7, 2023
We're getting an ICE when trying to compile this test using
mingw-gcc and recent Boost versions. Note that mingw-llvm works fine.

    during IPA pass: inline
    /mnt/data/workspace/ceph.pr/src/test/common/test_back_trace.cc:44:1:
    internal compiler error: Segmentation fault
       44 | }
          | ^
    0x7f9c4a86c51f ???
            ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
    0x7f9c4a853d8f __libc_start_call_main
            ../sysdeps/x86/libc-start.c:58
    0x7f9c4a853e3f __libc_start_main_impl
            ../sysdeps/nptl/libc_start_call_main.h:392

For now, we'll just skip the test when using mingw-gcc. Note
that we're planing to switch to mingw-llvm anyway:
ceph#51197

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
aaSharma14 pushed a commit to rhcs-dashboard/ceph that referenced this pull request Dec 6, 2023
We're getting an ICE when trying to compile this test using
mingw-gcc and recent Boost versions. Note that mingw-llvm works fine.

    during IPA pass: inline
    /mnt/data/workspace/ceph.pr/src/test/common/test_back_trace.cc:44:1:
    internal compiler error: Segmentation fault
       44 | }
          | ^
    0x7f9c4a86c51f ???
            ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
    0x7f9c4a853d8f __libc_start_call_main
            ../sysdeps/x86/libc-start.c:58
    0x7f9c4a853e3f __libc_start_main_impl
            ../sysdeps/nptl/libc_start_call_main.h:392

For now, we'll just skip the test when using mingw-gcc. Note
that we're planing to switch to mingw-llvm anyway:
ceph#51197

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
aaSharma14 pushed a commit to rhcs-dashboard/ceph that referenced this pull request Jan 2, 2024
We're getting an ICE when trying to compile this test using
mingw-gcc and recent Boost versions. Note that mingw-llvm works fine.

    during IPA pass: inline
    /mnt/data/workspace/ceph.pr/src/test/common/test_back_trace.cc:44:1:
    internal compiler error: Segmentation fault
       44 | }
          | ^
    0x7f9c4a86c51f ???
            ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
    0x7f9c4a853d8f __libc_start_call_main
            ../sysdeps/x86/libc-start.c:58
    0x7f9c4a853e3f __libc_start_main_impl
            ../sysdeps/nptl/libc_start_call_main.h:392

For now, we'll just skip the test when using mingw-gcc. Note
that we're planing to switch to mingw-llvm anyway:
ceph#51197

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
leonid-s-usov pushed a commit to ceph/ceph-ci that referenced this pull request Feb 13, 2024
We're getting an ICE when trying to compile this test using
mingw-gcc and recent Boost versions. Note that mingw-llvm works fine.

    during IPA pass: inline
    /mnt/data/workspace/ceph.pr/src/test/common/test_back_trace.cc:44:1:
    internal compiler error: Segmentation fault
       44 | }
          | ^
    0x7f9c4a86c51f ???
            ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
    0x7f9c4a853d8f __libc_start_call_main
            ../sysdeps/x86/libc-start.c:58
    0x7f9c4a853e3f __libc_start_main_impl
            ../sysdeps/nptl/libc_start_call_main.h:392

For now, we'll just skip the test when using mingw-gcc. Note
that we're planing to switch to mingw-llvm anyway:
ceph/ceph#51197

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
(cherry picked from commit 80d2d01)
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
leonid-s-usov pushed a commit to leonid-s-usov/ceph-ceph that referenced this pull request Feb 13, 2024
We're getting an ICE when trying to compile this test using
mingw-gcc and recent Boost versions. Note that mingw-llvm works fine.

    during IPA pass: inline
    /mnt/data/workspace/ceph.pr/src/test/common/test_back_trace.cc:44:1:
    internal compiler error: Segmentation fault
       44 | }
          | ^
    0x7f9c4a86c51f ???
            ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
    0x7f9c4a853d8f __libc_start_call_main
            ../sysdeps/x86/libc-start.c:58
    0x7f9c4a853e3f __libc_start_main_impl
            ../sysdeps/nptl/libc_start_call_main.h:392

For now, we'll just skip the test when using mingw-gcc. Note
that we're planing to switch to mingw-llvm anyway:
ceph#51197

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
(cherry picked from commit 80d2d01)
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build/ops cephfs Ceph File System common core crimson documentation libcephsqlite ceph SQLite3 VFS mon rbd tests win32 Specifix changes for the windows platform

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants