VMware Plugin Backups corrupted on Backups >2TB or >2hr #2009

@UraniumDonut

Description

Bareos component version

bareos-dir: 23.0.3~pre124.bec7dec60
bareos-fd: 23.0.3~pre124.bec7dec60
bareos-sd: 23.0.3~pre124.bec7dec60
bareos-vmware-plugin: 23.0.3~pre124.bec7dec60-90

Steps to reproduce

  1. Create a backup with the VMware plugin of a virtual machine that has a virtual disk >2TB. (In my case the backup took >2hr, which is longer than the default keepalive limit, but I am not sure whether this is the cause of the problem.)
  2. a) Restore onto the same virtual machine.
     b) Restore by exporting into a vmdk file using localvmdk=yes.

Expected results

a) The virtual disk is restored to the same state it was in at backup time, without errors.
b) A vmdk file is created that has the state of the virtual disk at backup time.

Actual results

a) The virtual disk is restored, but many parts of it are corrupted. Running fsck sometimes fixes the issue and sometimes does not.
b) The creation of the vmdk file fails with error 16000 (see log).

Environment

- OS: Debian GNU/Linux 12 (bookworm)
- component:
vSphere Client version 8.0.2.00000
VMware ESXi, 8.0.1, 22088125
Guest OS: Red Hat Enterprise Linux 8

Relevant log output

Restore log VMware:

14	2024-10-28 18:32:22	bkpserv JobId 3052: Bareos bkpserv 23.0.3~pre124.bec7dec60 (28May24):
Build OS: Debian GNU/Linux 12 (bookworm)
JobId: 3052
Job: RestoreFiles.2024-10-28_10.26.04_31
Restore Client: "bkpserv " 23.0.3~pre124.bec7dec60 (28May24) Debian GNU/Linux 12 (bookworm),debian
Start time: 28-Oct-2024 15:08:36
End time: 28-Oct-2024 18:32:22
Elapsed time: 3 hours 23 mins 46 secs
Files Expected: 2
Files Restored: 2
Bytes Restored: 1,297,309,048,460
Rate: 106110.7 KB/s
FD Errors: 0
FD termination status: OK
SD termination status: OK
Bareos binary info: Bareos community build (UNSUPPORTED): Get professional support from https://www.bareos.com
Job triggered by: User
Termination: Restore OK

13	2024-10-28 18:32:20	bkpserv-fd JobId 3052: python3-fd-mod: keepalive failed with The session is not authenticated., last keepalive was 12221.71064543724 s ago, trying to reconnect.
12	2024-10-28 18:32:03	buserv-sd JobId 3052: Releasing device "buserv-Tape" (/dev/nst0).
11	2024-10-28 15:08:39	buserv-sd JobId 3052: Forward spacing Volume "000010L8" to file:block 1512:0.
10	2024-10-28 15:08:39	buserv-sd JobId 3052: Ready to read from volume "000010L8" on device "buserv-Tape" (/dev/nst0).
9	2024-10-28 15:08:39	bkpserv-fd JobId 3052: Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
8	2024-10-28 15:08:39	bkpserv-fd JobId 3052: Connected Storage daemon at buserv:9103, encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
7	2024-10-28 15:08:36	bkpserv JobId 3052: Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
6	2024-10-28 15:08:36	bkpserv JobId 3052: Handshake: Immediate TLS
5	2024-10-28 15:08:36	bkpserv JobId 3052: Connected Client: bkpserv at 127.0.0.1:9102, encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
4	2024-10-28 15:08:36	bkpserv JobId 3052: Using Device "buserv-Tape" to read.
3	2024-10-28 15:08:36	bkpserv JobId 3052: Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
2	2024-10-28 15:08:36	bkpserv JobId 3052: Connected Storage daemon at buserv:9103, encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
1	2024-10-28 15:08:36	bkpserv JobId 3052: Start Restore Job RestoreFiles.2024-10-28_10.26.04_31
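The keepalive message in entry 13 can be related to the job start time with a little arithmetic. The sketch below uses only values copied from the log above; it assumes the log timestamps (one-second resolution) and the plugin's "last keepalive was ... s ago" figure are measured against the same clock, which the log does not confirm.

```python
from datetime import datetime, timedelta

# Values copied from the restore log of JobId 3052.
job_start = datetime(2024, 10, 28, 15, 8, 36)       # "Start time"
keepalive_msg = datetime(2024, 10, 28, 18, 32, 20)  # timestamp of log entry 13
seconds_since_keepalive = 12221.71064543724         # reported in entry 13

# Point in time of the last successful keepalive, according to the message.
last_keepalive = keepalive_msg - timedelta(seconds=seconds_since_keepalive)

# How long after the job start that keepalive happened.
offset_from_start = (last_keepalive - job_start).total_seconds()

print(f"last successful keepalive: {last_keepalive:%H:%M:%S}")
print(f"seconds after job start:   {offset_from_start:.1f}")
```

Under these assumptions the last successful keepalive falls about two seconds after the job started, so it looks as if no keepalive succeeded during the remaining 3 hours 23 minutes of the restore. Whether this is related to the corruption is not clear from the log alone.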

Restore log vmdk:

19	2024-11-04 19:00:49	bkpserv JobId 3109: Error: Bareos bkpserv 23.0.3~pre124.bec7dec60 (28May24):
Build OS: Debian GNU/Linux 12 (bookworm)
JobId: 3109
Job: RestoreFiles.2024-11-04_18.59.09_50
Restore Client: "bkpserv" 23.0.3~pre124.bec7dec60 (28May24) Debian GNU/Linux 12 (bookworm),debian
Start time: 04-Nov-2024 18:59:11
End time: 04-Nov-2024 19:00:49
Elapsed time: 1 min 38 secs
Files Expected: 2
Files Restored: 2
Bytes Restored: 270,840
Rate: 2.8 KB/s
FD Errors: 1
FD termination status: Fatal Error
SD termination status: Fatal Error
Bareos binary info: Bareos community build (UNSUPPORTED): Get professional support from https://www.bareos.com
Job triggered by: User
Termination: *** Restore Error ***

18	2024-11-04 19:00:49	buserv-sd JobId 3109: Releasing device "buserv-Tape" (/dev/nst0).
17	2024-11-04 19:00:49	buserv-sd JobId 3109: Error: lib/bsock_tcp.cc:455 Socket has errors=1 on call to client:192.168.123.123:9103
16	2024-11-04 19:00:49	buserv-sd JobId 3109: Fatal error: stored/read.cc:145 Error sending to File daemon. ERR=Connection reset by peer
15	2024-11-04 19:00:49	buserv-sd JobId 3109: Error: lib/bsock_tcp.cc:418 Wrote 178755 bytes to client:192.168.123.123:9103, but only 147456 accepted.
14	2024-11-04 19:00:49	bkpserv-fd JobId 3109: Fatal error: python3-fd-mod: plugin_io[IO_CLOSE]: bareos_vadp_dumper returncode: 1
13	2024-11-04 19:00:49	bkpserv-fd JobId 3109: Fatal error: python3-fd-mod: check_dumper(): bareos_vadp_dumper returncode: 1 error output:

11	2024-11-04 18:59:31	buserv-sd JobId 3109: Forward spacing Volume "000010L8" to file:block 1512:0.
10	2024-11-04 18:59:31	buserv-sd JobId 3109: Ready to read from volume "000010L8" on device "buserv-Tape" (/dev/nst0).
9	2024-11-04 18:59:13	bkpserv-fd JobId 3109: Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
8	2024-11-04 18:59:13	bkpserv-fd JobId 3109: Connected Storage daemon at buserv:9103, encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
7	2024-11-04 18:59:11	bkpserv JobId 3109: Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
6	2024-11-04 18:59:11	bkpserv JobId 3109: Handshake: Immediate TLS
5	2024-11-04 18:59:11	bkpserv JobId 3109: Connected Client: bkpserv at 127.0.0.1:9102, encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
4	2024-11-04 18:59:11	bkpserv JobId 3109: Using Device "buserv-Tape" to read.
3	2024-11-04 18:59:11	bkpserv JobId 3109: Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
2	2024-11-04 18:59:11	bkpserv JobId 3109: Connected Storage daemon at buserv:9103, encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
1	2024-11-04 18:59:11	bkpserv JobId 3109: Start Restore Job RestoreFiles.2024-11-04_18.59.09_50
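The log above is listed newest first, and entries 13 to 17 all carry the same timestamp, so the order of the failures is easy to misread. The sketch below sorts those five entries by their sequence number. It assumes the sequence numbers reflect the order in which the messages were recorded, which is not necessarily the exact order in which the events happened on the two daemons.

```python
# (sequence number, daemon, message) copied from the log of JobId 3109.
entries = [
    (17, "buserv-sd", "Error: lib/bsock_tcp.cc:455 Socket has errors=1 on call to client:192.168.123.123:9103"),
    (16, "buserv-sd", "Fatal error: stored/read.cc:145 Error sending to File daemon. ERR=Connection reset by peer"),
    (15, "buserv-sd", "Error: lib/bsock_tcp.cc:418 Wrote 178755 bytes to client:192.168.123.123:9103, but only 147456 accepted."),
    (14, "bkpserv-fd", "Fatal error: python3-fd-mod: plugin_io[IO_CLOSE]: bareos_vadp_dumper returncode: 1"),
    (13, "bkpserv-fd", "Fatal error: python3-fd-mod: check_dumper(): bareos_vadp_dumper returncode: 1 error output:"),
]


def errors_in_order(log_entries):
    """Return the entries that mention an error, oldest sequence number first."""
    errors = [e for e in log_entries if "error" in e[2].lower()]
    return sorted(errors, key=lambda e: e[0])


ordered = errors_in_order(entries)
first_seq, first_daemon, first_msg = ordered[0]

for seq, daemon, msg in ordered:
    print(seq, daemon, msg)
```

Read this way, the first recorded failure is bareos_vadp_dumper exiting with return code 1 on the file daemon, and the storage daemon's socket errors follow it. That would be consistent with the SD errors being a consequence of the dumper exit rather than its cause.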

Relevant traces output

No response

Anything else?

The vmdk error is very similar to https://bugs.bareos.org/view.php?id=670. However, that bug was fixed, and the PR was merged into Bareos a couple of years ago.
