Skip to content

Conversation

@gruuya
Copy link
Contributor

@gruuya gruuya commented Jul 30, 2024

Make sequence numbers optional in the protocol, and don't rely on their presence for the flushing mechanism to work. The reason for this is that PGD might not know the sequence number until the very end (last message) of the transaction.

This is achieved with a single queue of pending transactions, whereby with bookkeeping we:

  1. track which txs are complete (as soon as they have Some sequence number)
  2. track which txs have all tables/locations flushed out
  3. make sure to mark a tx as durable only if all preceding txs are also durable

@gruuya gruuya force-pushed the optional-sequence-number branch from 3021c0f to 31c6832 Compare July 30, 2024 15:28
@eatonphil
Copy link
Collaborator

Ok so we must not fill out the sequence number until the final batch for the transaction.

@eatonphil
Copy link
Collaborator

(Sorry I meant for that to be a question. :D )

@gruuya
Copy link
Contributor Author

gruuya commented Jul 30, 2024

Ok so we must not fill out the sequence number until the final batch for the transaction.

Well in principle I could bring back the last field, and have that denote the actual last message in the transaction. This would somewhat loosen the requirements on the sequence number, i.e. it could also become non-None prior to the last message. So something like the following sequence of messages:

(table_1, origin-A, None, false) <- still don't know the sequence number for this transaction
(table_2, origin-A, Some(1000), false) <- here you find out the sequence number and can start passing it along
(table_3, origin-A, Some(1000), true) <- the transaction is now complete (last = true)

Then we'd require that at least one of the messages in the transaction contains the sequence number.

Let me know if this would simplify things for you somehow.

@eatonphil
Copy link
Collaborator

No I think it seems simpler to do it as you have it. I just wanted to confirm

bool last = 6;
// Monotonically-increasing transaction number.
// Only specified in the last message of a transaction
optional uint64 sequence_number = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just noting this won't be bw compatible with existing clients)

Comment on lines +175 to +181
if origin != tx.origin {
return Err(SyncError::InvalidMessage {
reason: format!(
"Transaction from a new origin ({origin}) started without finishing the transaction from the previous one ({})",
tx.origin
)
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can happen during failover though? @eatonphil

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think I commented about this in tests Marko mentioned me on. You should not panic. While it's an invalid scenario it is certainly a possible scenario. You should not crash or become unrecoverable from this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another example is not even failover but just if the writer process itself crashes midway through. We may not send the full transaction and then may send it again (from the same origin).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the above merely returns an error to the caller, and doesn't panic/crash the process. Then presumably the other side can re-adjust it's position in the message queue and try again?

// sync are actually redundant (since they were already persisted) and skip them.
// Also create a new `SyncCommitInfo` that corresponds to the last full transaction in the
// pending changes,
fn skip_syncs<'a>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't @eatonphil mention something about how we can't actually skip over messages with an LSN < last committed because they might have changed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I still believe you must not do that. cc @vadim4o

@gruuya gruuya force-pushed the optional-sequence-number branch from 67fb7bb to 2156dc0 Compare July 31, 2024 13:55
@gruuya gruuya force-pushed the optional-sequence-number branch from bc5304d to 7e96f81 Compare August 2, 2024 09:57
@gruuya gruuya merged commit 903c436 into main Aug 5, 2024
@gruuya gruuya deleted the optional-sequence-number branch August 5, 2024 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants