Skip to content

Race when accessing replicaset.vclock from applier thread #7089

@sergepetrenko

Description

@sergepetrenko

Found while debugging hangs in tarantool/tarantool-qa#166

One of the nodes logs contains the following output:

2022-04-26 17:32:07.767 [36183] relay/unix/:(socket)/102/main:reader V> Got a corrupted row:
2022-04-26 17:32:07.767 [36183] relay/unix/:(socket)/102/main:reader V> 00000000: 81 00 00 81 26 81 01 09 02 01
2022-04-26 17:32:07.767 [36183] main/117/applier/unix/:(socket) I> RAFT: message {term: 4, vote: 2, leader: 2, leader is seen: true, state: follower} from 3
2022-04-26 17:32:07.767 [36183] main/117/applier/unix/:(socket) I> RAFT: vote request is skipped - the leader is already known - 2
2022-04-26 17:32:07.767 [36183] relay/unix/:(socket)/101/main xrow.c:229 E> ER_INVALID_MSGPACK: Invalid MsgPack - packet body

The issue is with bad vclock encoding:

| header: map of 1 element
|    | type: IPROTO_OK
|   / \  | body: map of 1 element
V  /   \ V
81 00 00 81 26 81 01 09 02 01
               \            /
                   vclock

The vclock is incorrect: 81 states that the vclock contains a single replica_id : lsn pair, but two follow: 01 09 and 02 01.

With the following patch one of the nodes occasionally fails with an assertion in xrow_encode_vclock inside applier_thread_writer_f:

diff --git a/src/box/xrow.c b/src/box/xrow.c
index bbd697f11..2447fb8f2 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -62,6 +62,7 @@ mp_sizeof_vclock_ignore0(const struct vclock *vclock)
 static inline char *
 mp_encode_vclock_ignore0(char *data, const struct vclock *vclock)
 {
+       int map_size = vclock_size_ignore0(vclock);
        data = mp_encode_map(data, vclock_size_ignore0(vclock));
        struct vclock_iterator it;
        vclock_iterator_init(&it, vclock);
@@ -72,6 +73,7 @@ mp_encode_vclock_ignore0(char *data, const struct vclock *vclock)
        for ( ; replica.id < VCLOCK_MAX; replica = vclock_iterator_next(&it)) {
                data = mp_encode_uint(data, replica.id);
                data = mp_encode_uint(data, replica.lsn);
+               assert(--map_size >= 0);
        }
        return data;
 }

Seems the reason is replicaset.vclock access from 2 threads: TX updates vclock on a yet another WAL write, and applier tries to encode vclock at the same moment.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions