rseq() C/R support #1706
You can run tests in CI with a kernel >= 5.13 if you use our KVM-based setup, which we use for vdso=0 testing. If you start a Rawhide container on the Fedora Vagrant VM, you should be able to get the latest glibc with a new enough kernel.
Codecov Report
@@              Coverage Diff               @@
##           criu-dev    #1706      +/-   ##
============================================
- Coverage     69.35%   69.30%   -0.06%
============================================
  Files           128      128
  Lines         32087    32213     +126
============================================
+ Hits          22255    22326      +71
- Misses         9832     9887      +55
Hehe, great. We have CentOS 8 working. That's good news because it's the corner case when we have
Yes, thanks for pointing it out. I've taken a look at
Just run a Rawhide container via Podman on the Vagrant VM. You can basically reuse our Rawhide GitHub Actions setup in the VM, and you should have everything you need.
Thanks, Adrian ;) Will try!
Great! Let's reopen that PR?
Add rseq syscall numbers for: arm/aarch64, mips64, ppc64le, s390, x86_64/x86 Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
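For reference, here is a minimal sketch of how such per-architecture numbers might be wired up in C. The values are the ones I believe the kernel's syscall tables list for x86_64, x86 (32-bit), arm, aarch64, ppc64, s390 and the mips64 n64 ABI, so please cross-check them against your kernel headers. `CRIU_NR_rseq` and `sys_rseq_sketch()` are hypothetical names, not CRIU's actual definitions, and x32 is not covered.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* rseq syscall numbers per architecture (verify against your headers). */
#if defined(__x86_64__)
# define CRIU_NR_rseq 334
#elif defined(__i386__)
# define CRIU_NR_rseq 386
#elif defined(__aarch64__)
# define CRIU_NR_rseq 293          /* asm-generic table */
#elif defined(__arm__)
# define CRIU_NR_rseq 398          /* EABI */
#elif defined(__powerpc64__)
# define CRIU_NR_rseq 387
#elif defined(__s390__)
# define CRIU_NR_rseq 383          /* s390 and s390x */
#elif defined(__mips__) && defined(_MIPS_SIM) && defined(_ABI64) && _MIPS_SIM == _ABI64
# define CRIU_NR_rseq 5327         /* n64: 5000 + 327 */
#else
# error "no rseq syscall number known for this architecture"
#endif

/* Thin wrapper: rseq(struct rseq *rseq_abi, u32 rseq_len, int flags, u32 sig). */
static long sys_rseq_sketch(void *rseq_abi, unsigned int len, int flags,
			    unsigned int sig)
{
	return syscall(CRIU_NR_rseq, rseq_abi, len, flags, sig);
}
```

A NULL area with length 0 is always rejected by the kernel, so `sys_rseq_sketch(NULL, 0, 0, 0)` is a harmless way to check that the number at least dispatches without side effects.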
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Add "get_rseq_conf" feature corresponding to the ptrace(PTRACE_GET_RSEQ_CONFIGURATION) support. Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
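One possible shape of the get_rseq_conf probe, offered only as a sketch: fork a child that stops under PTRACE_TRACEME, then ask the kernel for its rseq configuration. The request value 0x420f and the reply layout follow the kernel UAPI `<linux/ptrace.h>` (Linux 5.13+). Kernels that don't know the request fail it with EIO. `probe_get_rseq_conf()`, `struct rseq_conf_sketch` and `CRIU_PTRACE_GET_RSEQ_CONF` are hypothetical names, and CRIU's real kerndat check may be structured differently.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* PTRACE_GET_RSEQ_CONFIGURATION from <linux/ptrace.h> (Linux 5.13+). */
#define CRIU_PTRACE_GET_RSEQ_CONF 0x420f

/* Reply layout, mirroring struct ptrace_rseq_configuration. */
struct rseq_conf_sketch {
	uint64_t rseq_abi_pointer;
	uint32_t rseq_abi_size;
	uint32_t signature;
	uint32_t flags;
	uint32_t pad;
};

/* 1: supported, 0: kernel rejects the request (EIO), -1: probe failed. */
static int probe_get_rseq_conf(void)
{
	struct rseq_conf_sketch conf;
	int status, ret;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: become a tracee and stop so the parent can query it. */
		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == 0)
			raise(SIGSTOP);
		_exit(0);
	}

	if (waitpid(pid, &status, 0) != pid) {
		kill(pid, SIGKILL);
		waitpid(pid, NULL, 0);
		return -1;
	}
	if (!WIFSTOPPED(status))
		return -1; /* TRACEME refused (seccomp, Yama); child already reaped */

	/* addr = size of our buffer, data = the buffer itself. */
	if (ptrace(CRIU_PTRACE_GET_RSEQ_CONF, pid, (void *)sizeof(conf), &conf) >= 0)
		ret = 1;
	else
		ret = (errno == EIO) ? 0 : -1;

	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	return ret;
}
```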
Support the basic rseq C/R scenario. Assume that:
- there are no processes with IP inside the rseq critical section (CS)
- the kernel has ptrace(PTRACE_GET_RSEQ_CONFIGURATION) support

On dump:
1. use ptrace(PTRACE_GET_RSEQ_CONFIGURATION) to get the struct rseq pointer, rseq size and signature from the kernel
2. save them to the image

On restore:
1. get the rseq pointer, size and signature from the image
2. register it back using rseq() from the restorer parasite

Fixes: checkpoint-restore#1696
Reported-by: Radostin Stoyanov <radostin@redhat.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
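The dump and restore steps above could look roughly like the following sketch. It makes two simplifications. `struct rseq_img_sketch` is a hypothetical stand-in for the real image entry. The restore helper also calls glibc's syscall() for brevity, whereas code running in the restorer would issue the raw syscall itself. The rseq() arguments are (pointer, length, flags, signature), and flags = 0 means "register".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical image entry: the three values saved on dump. */
struct rseq_img_sketch {
	uint64_t rseq_abi_pointer; /* 0 means "rseq was not registered" */
	uint32_t rseq_abi_size;
	uint32_t signature;
};

/* Kernel reply to PTRACE_GET_RSEQ_CONFIGURATION (UAPI layout). */
struct rseq_conf_sketch {
	uint64_t rseq_abi_pointer;
	uint32_t rseq_abi_size;
	uint32_t signature;
	uint32_t flags;
	uint32_t pad;
};

/* Dump side: copy the ptrace() reply into the image entry. */
static void rseq_conf_to_img(const struct rseq_conf_sketch *conf,
			     struct rseq_img_sketch *img)
{
	img->rseq_abi_pointer = conf->rseq_abi_pointer;
	img->rseq_abi_size = conf->rseq_abi_size;
	img->signature = conf->signature;
}

/*
 * Restore side, run in the restored thread's own context: register the
 * same area again. Returns 0 on success or when there is nothing to do.
 */
static int rseq_restore_from_img(const struct rseq_img_sketch *img)
{
	if (!img->rseq_abi_pointer)
		return 0;
#ifdef SYS_rseq
	if (syscall(SYS_rseq, (void *)(uintptr_t)img->rseq_abi_pointer,
		    img->rseq_abi_size, 0, img->signature) == 0)
		return 0;
	return -errno;
#else
	return -ENOSYS;
#endif
}
```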
@mihalicyn OK, I'll try to review this tomorrow.
…f feature

A lot of kernel versions lack support for ptrace(PTRACE_GET_RSEQ_CONFIGURATION), but the userspace may be fresh (for instance, containers with a fresh Fedora userspace running on a CentOS 7 host).

Consider two scenarios when the kernel has no ptrace(PTRACE_GET_RSEQ_CONFIGURATION) support:
1. there is a process which uses rseq => fail the dump
2. there is no process which uses rseq => we can dump without any problems

But how do we determine whether a process uses rseq without the get_rseq_conf feature? Let's just try to register rseq from the parasite. If rseq is already registered, we'll get an EBUSY error; if not, the registration will succeed.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
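Here is a sketch of that registration probe. It runs in the calling thread rather than in code injected through the parasite. One kernel detail matters here. The kernel returns EBUSY only when the very same area, length and signature are already registered. If a different area is registered (the usual case when probing with a fresh buffer), it returns EINVAL. The sketch therefore treats both errors as "rseq in use". The signature value and all names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#define PROBE_RSEQ_SIG 0x53053053u      /* arbitrary signature (assumption) */
#define RSEQ_FLAG_UNREGISTER_SKETCH 1   /* RSEQ_FLAG_UNREGISTER in <linux/rseq.h> */

/* 32-byte, 32-byte-aligned area matching the original struct rseq. */
struct rseq_area_sketch {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;
	uint32_t flags;
} __attribute__((aligned(32)));

/*
 * 1: the calling thread already has rseq registered,
 * 0: it had none (our probe registration is undone again),
 * -1: inconclusive (e.g. ENOSYS or a seccomp filter).
 */
static int thread_has_rseq(void)
{
#ifndef SYS_rseq
	return -1;
#else
	static __thread struct rseq_area_sketch probe;
	long ret = syscall(SYS_rseq, &probe, sizeof(probe), 0, PROBE_RSEQ_SIG);

	if (ret == 0) {
		/* Nothing was registered before: drop our own registration. */
		syscall(SYS_rseq, &probe, sizeof(probe),
			RSEQ_FLAG_UNREGISTER_SKETCH, PROBE_RSEQ_SIG);
		return 0;
	}
	/* EBUSY: this very area is registered; EINVAL: another area is. */
	if (errno == EBUSY || errno == EINVAL)
		return 1;
	return -1;
#endif
}
```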
Here we just want to check that if rseq was registered before C/R, it remains registered after it. Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Let's see how the rseq() C/R feature works. This reverts commit d99def7. Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
We can use nested virtualization on Cirrus and already have the "Vagrant Fedora based test (no VDSO)" test, so let's add an analogous one for Fedora Rawhide to get a fresh kernel. Suggested-by: Adrian Reber <areber@redhat.com> Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Let's take the thread_pointer() implementation from Glibc. It will be useful later because Glibc stores struct rseq in the TLS. The absolute address can be calculated as __criu_thread_pointer() + __rseq_offset, where __rseq_offset is a symbol exported by Glibc itself.

We need to be able to determine where struct rseq is stored in order to unregister it in CRIU during the restore stage. For other libcs such as musl-libc, we will have to handle rseq separately, depending on how struct rseq is stored. Right now that's not a problem because musl-libc has no rseq support, so there is nothing to unregister.

https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=8dbeb0561eeb876f557ac9eef5721912ec074ea5
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=cb976fba4c51ede7bf8cee5035888527c308dfbc

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
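The following sketch illustrates the address computation for x86_64 and aarch64 only. It declares Glibc's exported `__rseq_offset` and `__rseq_size` as weak symbols, so it still links against a libc older than 2.35. It treats `__rseq_size == 0` as "Glibc did not register rseq". `thread_pointer_sketch()` and `libc_rseq_area()` are hypothetical names. The x86_64 inline assembly follows the same idea as Glibc's `__thread_pointer()` but is not copied from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exported by glibc >= 2.35 (<sys/rseq.h>); weak so older libcs still link. */
extern const ptrdiff_t __rseq_offset __attribute__((weak));
extern const unsigned int __rseq_size __attribute__((weak));

/* Thread pointer for the two architectures this sketch covers. */
static inline void *thread_pointer_sketch(void)
{
#if defined(__x86_64__)
	void *tp;
	/* %fs:0 holds the TCB self-pointer, i.e. the thread pointer. */
	__asm__("mov %%fs:0, %0" : "=r"(tp));
	return tp;
#elif defined(__aarch64__)
	return __builtin_thread_pointer();
#else
	return NULL; /* not covered by this sketch */
#endif
}

/* Address of the calling thread's Glibc-owned struct rseq, or NULL. */
static void *libc_rseq_area(void)
{
	void *tp = thread_pointer_sketch();

	if (!tp || &__rseq_offset == NULL || &__rseq_size == NULL ||
	    __rseq_size == 0)
		return NULL;
	return (char *)tp + __rseq_offset;
}
```

Because the kernel only accepts a 32-byte-aligned struct rseq, any non-NULL result should be 32-byte aligned. That makes a cheap sanity check.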
Dear friends, Andrei suggests using a simpler and better approach to make the rseq cs execution time longer by using the
Fresh glibc does rseq registration by default during start_thread() [see https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=95e114a0919d844d8fe07839cb6538b7f5ee920e]. This causes process crashes during the memory restore procedure, because the memory which corresponds to struct rseq will be unmapped and overridden in __export_restore_task.

Let's perform rseq unregistration just before unmap_old_vmas(). To achieve that, we first need to determine the (struct rseq) address while we are still in Glibc (we do that in prep_libc_rseq_info using Glibc exported symbols).

See also:
("nptl: Add public rseq symbols and <sys/rseq.h>") https://sourceware.org/git?p=glibc.git;a=commit;h=c901c3e764d7c7079f006b4e21e877d5036eb4f5
("nptl: Add <thread_pointer.h> for defining __thread_pointer") https://sourceware.org/git?p=glibc.git;a=commit;h=8dbeb0561eeb876f557ac9eef5721912ec074ea5

TODO: do the same for musl-libc if it starts to register rseq by default

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
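Here is a sketch of the unregistration step. It assumes the pointer, length and signature were recorded earlier, while Glibc was still mapped (the role the commit gives to prep_libc_rseq_info). The kernel only accepts RSEQ_FLAG_UNREGISTER when all three match the current registration. Otherwise it fails with EINVAL (pointer or length mismatch) or EPERM (signature mismatch). The struct and function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#define RSEQ_FLAG_UNREGISTER_SKETCH 1 /* RSEQ_FLAG_UNREGISTER in <linux/rseq.h> */

/* Hypothetical record filled in while still running on Glibc. */
struct libc_rseq_info_sketch {
	uint64_t rseq_abi_pointer; /* 0: Glibc did not register rseq */
	uint32_t rseq_abi_size;    /* must equal the length used at registration */
	uint32_t signature;        /* must equal the signature used at registration */
};

/*
 * Call just before the old VMAs are unmapped. Once this succeeds, the
 * kernel no longer touches the soon-to-be-replaced struct rseq when the
 * thread is preempted or gets a signal.
 */
static int unregister_libc_rseq(const struct libc_rseq_info_sketch *info)
{
	if (!info->rseq_abi_pointer)
		return 0;
#ifdef SYS_rseq
	if (syscall(SYS_rseq, (void *)(uintptr_t)info->rseq_abi_pointer,
		    info->rseq_abi_size, RSEQ_FLAG_UNREGISTER_SKETCH,
		    info->signature) == 0)
		return 0;
	return -errno;
#else
	return -ENOSYS;
#endif
}
```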
Fresh Glibc registers rseq() by default. We need to unregister it before registering our own. Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
If we catch the process while it is inside an rseq critical section, we have to handle it properly. From the kernel's point of view, if the process is executing inside the rseq cs and gets a signal, the rseq critical section execution is interrupted and, after the signal handler runs, execution proceeds to the rseq cs abort handler instead of continuing the normal rseq cs execution (unless RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL is set). From the rseq point of view, CRIU seizing a process is the same thing as the process getting a signal. So we need to fix up the instruction pointer to the rseq cs abort handler address. Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
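The fixup can be expressed as a small pure function. This sketch uses the kernel's own range check: an IP is inside the critical section when `ip - start_ip < post_commit_offset`, computed unsigned. Kernels of that time combined the flags from struct rseq with those of struct rseq_cs, and the sketch does the same. The field layout follows `<linux/rseq.h>`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits from <linux/rseq.h> (enum rseq_cs_flags). */
#define RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_SK (1U << 0)
#define RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_SK  (1U << 1)
#define RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_SK (1U << 2)

/* Layout of struct rseq_cs from <linux/rseq.h>. */
struct rseq_cs_sketch {
	uint32_t version;
	uint32_t flags;
	uint64_t start_ip;
	uint64_t post_commit_offset;
	uint64_t abort_ip;
};

/* start_ip <= ip < start_ip + post_commit_offset, via unsigned wrap-around. */
static int ip_in_rseq_cs(const struct rseq_cs_sketch *cs, uint64_t ip)
{
	return ip - cs->start_ip < cs->post_commit_offset;
}

/*
 * Returns the IP to store in the image. 'rseq_flags' are the flags from
 * struct rseq itself; they are ORed with cs->flags before checking.
 */
static uint64_t fixup_ip_for_dump(const struct rseq_cs_sketch *cs,
				  uint32_t rseq_flags, uint64_t ip)
{
	if (!cs || !ip_in_rseq_cs(cs, ip))
		return ip;              /* not inside a critical section */
	if ((cs->flags | rseq_flags) & RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_SK)
		return ip;              /* userspace asked not to restart */
	return cs->abort_ip;            /* behave as if a signal arrived */
}
```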
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
This reverts commit f008f74. Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Userspace may configure rseq cs abort policy by
setting RSEQ_CS_FLAG_NO_RESTART_ON_* flags.
In ("cr-dump: fixup thread IP when inside rseq cs") we supported
the case when a process is caught by CRIU during rseq cs execution by
fixing up its IP to abort_ip. That's the common case, but there is a special flag
called RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL; in this case we have to leave the
process IP as it was before CRIU seized it. Unfortunately, that's not
all we need here. We also must preserve the (struct rseq)->rseq_cs field.
You may ask: "Why do we need to preserve it by hand? CRIU dumps
all process memory and restores it." That's true, but it's not so easy. The problem
here is that the kernel clears this field when it realizes that
the process has left the rseq cs. But during the dump/restore procedures we are
executing the parasite/restorer in the process context. It means that the process
will leave the rseq cs in any case, and (struct rseq)->rseq_cs will be cleared
by the kernel. So we need to restore this field by hand at the *last* stage
of restore, just before releasing the processes.
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
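Here is a sketch of the bookkeeping described above. It uses a simplified in-memory model of struct rseq instead of the real task memory. On dump, it remembers rseq_cs when the thread sits inside a critical section with NO_RESTART_ON_SIGNAL set. At the last restore stage, it writes the value back. The `inside_cs` argument is assumed to come from a range check like the kernel's. All types and names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_SK (1U << 1)

/* Simplified model of the struct rseq fields this logic touches. */
struct rseq_model {
	uint64_t rseq_cs; /* pointer to the active struct rseq_cs, or 0 */
	uint32_t flags;
};

/* Hypothetical per-thread record kept in the image. */
struct rseq_cs_dump_sketch {
	bool restore_rseq_cs;
	uint64_t rseq_cs;
};

/* Dump: keep rseq_cs only if the IP is left untouched inside the CS. */
static void dump_rseq_cs(const struct rseq_model *r, bool inside_cs,
			 uint32_t cs_flags, struct rseq_cs_dump_sketch *out)
{
	uint32_t flags = r->flags | cs_flags;

	out->restore_rseq_cs = inside_cs && r->rseq_cs != 0 &&
			       (flags & RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_SK);
	out->rseq_cs = out->restore_rseq_cs ? r->rseq_cs : 0;
}

/* Last restore stage, right before the task is released. */
static void finalize_rseq_cs(struct rseq_model *r,
			     const struct rseq_cs_dump_sketch *d)
{
	if (d->restore_rseq_cs)
		r->rseq_cs = d->rseq_cs;
}
```

The point of doing the write-back last is that nothing CRIU-owned runs in the task's context afterwards. As a result, the kernel has no further chance to see the IP outside the critical section and clear the field again.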
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
That's strange, but the rseq02 test fails with:
09:06:16.222: 51: exit 555f52082120 555f52082120
09:06:16.282: 51: exit 555f52082120 555f52082120
09:06:16.340: 51: exit 555f52082120 555f52082120
09:06:16.397: 51: exit 555f52082120 555f52082120
09:06:16.503: 51: exit 0 555f52082120
09:06:16.503: 51: FAIL: rseq02.c:235: Failed to increment per-cpu counter (errno = 2 (No such file or directory))
09:06:16.503: 51: FAIL: rseq02.c:246: (errno = 16 (Device or resource busy))
It means that the rseq_cs pointer was cleared by the kernel despite the NO_RESTART* flags. It's hard to reproduce, and I will investigate it.
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
We have a separate target for alpine in script/ci/Makefile which defines some extra opts for zdtm using the ZDTM_OPTIONS variable, but it doesn't actually work. First of all, the variable should be named ZDTM_OPTS, and we also have to specify it directly in the CONTAINER_RUNTIME command line to make it work. I've also changed the variable value just to make it consistent with the docker.env value, which was the one actually used. Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
It looks like we get broken fhandles from fdinfo for inotify/fanotify on btrfs. I will look into that. Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
mountinfo contains more info than just "mount" output Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
JFYI: I'm working on the article https://criu.org/Rseq
This patchset provides rseq() C/R support. There are four patches which provide the desired support:
- cr-dump: handle rseq/rseq_cs flags properly
- cr-dump: fixup thread IP when inside rseq cs
- rseq: fail dump if rseq is used but host doesn't support get_rseq_conf feature
- rseq: initial support

Have done:
- … task_struct rseq-related fields)
- … rseq() syscall is present but ptrace(PTRACE_GET_RSEQ_CONFIGURATION) is not.
- … RSEQ_CS_* flags

Fixes #1696
https://criu.org/Rseq