-
Notifications
You must be signed in to change notification settings - Fork 403
Closed
Description
Found while debugging hangs in tarantool/tarantool-qa#166
One of the nodes logs contains the following output:
2022-04-26 17:32:07.767 [36183] relay/unix/:(socket)/102/main:reader V> Got a corrupted row:
2022-04-26 17:32:07.767 [36183] relay/unix/:(socket)/102/main:reader V> 00000000: 81 00 00 81 26 81 01 09 02 01
2022-04-26 17:32:07.767 [36183] main/117/applier/unix/:(socket) I> RAFT: message {term: 4, vote: 2, leader: 2, leader is seen: true, state: follower} from 3
2022-04-26 17:32:07.767 [36183] main/117/applier/unix/:(socket) I> RAFT: vote request is skipped - the leader is already known - 2
2022-04-26 17:32:07.767 [36183] relay/unix/:(socket)/101/main xrow.c:229 E> ER_INVALID_MSGPACK: Invalid MsgPack - packet bodyThe issue is with bad vclock encoding:
| header: map of 1 element
| | type: IPROTO_OK
| / \ | body: map of 1 element
V / \ V
81 00 00 81 26 81 01 09 02 01
\ /
vclockThe vclock is incorrect: 81 states that the vclock contains a single replica_id : lsn pair, but two follow: 01 09 and 02 01.
With the following patch one of the nodes occasionally fails with an assertion in xrow_encode_vclock inside applier_thread_writer_f:
diff --git a/src/box/xrow.c b/src/box/xrow.c
index bbd697f11..2447fb8f2 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -62,6 +62,7 @@ mp_sizeof_vclock_ignore0(const struct vclock *vclock)
static inline char *
mp_encode_vclock_ignore0(char *data, const struct vclock *vclock)
{
+ int map_size = vclock_size_ignore0(vclock);
data = mp_encode_map(data, vclock_size_ignore0(vclock));
struct vclock_iterator it;
vclock_iterator_init(&it, vclock);
@@ -72,6 +73,7 @@ mp_encode_vclock_ignore0(char *data, const struct vclock *vclock)
for ( ; replica.id < VCLOCK_MAX; replica = vclock_iterator_next(&it)) {
data = mp_encode_uint(data, replica.id);
data = mp_encode_uint(data, replica.lsn);
+ assert(--map_size >= 0);
}
return data;
}Seems the reason is replicaset.vclock access from 2 threads: TX updates vclock on a yet another WAL write, and applier tries to encode vclock at the same moment.
Reactions are currently unavailable