Skip to content

Session event log sketch#675

Closed
arminsabouri wants to merge 58 commits intopayjoin:masterfrom
arminsabouri:session-event-log-sketch
Closed

Session event log sketch#675
arminsabouri wants to merge 58 commits intopayjoin:masterfrom
arminsabouri:session-event-log-sketch

Conversation

@arminsabouri
Copy link
Copy Markdown
Collaborator

This PR presents a rough sketch of the session event log architecture that @nothingmuch and I have been working on. Its primary purpose is to provide visibility into the direction we're taking with this design. I’d recommend against nitpicks or formatting comments at this stage. Architectural feedback is the most valuable form of review.

Note: This PR is not intended to be merged. It will be closed, and a follow-up PR will be opened with a clean commit history. The current branch history has diverged beyond repair, so the plan is to migrate functional chunks of code into a new branch. At present, this branch contains a working implementation of the session event log with replay support for the v2 receiver. We're currently focused on adding UniFFI bindings.

@DanGould
Copy link
Copy Markdown
Contributor

DanGould commented May 7, 2025

This change is epic.

From my read there are a few major architectural changes to how both sender and receiver persist events, which affects how both Sender and Receiver proceed through their operation. Here's my understanding of what changed:

Serialize all the things

Many, many structs become serializable so that they may be persisted.

Idiomatic, generic Receiver<State, Persister> typestate pattern

Receiver becomes generic over its state and session persister implementation. This lets each state share a common process_event function so that events can be replayed in a loop from persistence logs to recover the latest state.

ReceiverState enum introduced

Each ReceiverState enum variant encapsulates an instance of impl Receiver<State, P> for some concrete State, because rust can't pattern match on generic type parameters.

ReceiverSessionEvent enum introduced

Each ReceiverSessionEvent enum variant encapsulates serialized data necessary to make a state transition. The encapsulation allows that common ReceiverState::process_event(event: ReceiverSessionEvent, persister) trait function to apply the next state in a loop that reads from the logs.

Token concept removed

This change allows persistence to happen inside the state transitions without an explicit save/load that we were using before.

Instead, each session itself has a persister (PersistedSession as of writing) implementation itself with arbitrary load functionaly. This may address some of the confusion around how the reference implementation was written vs the needs of persisting session data as in @spacebear21's #689.

PersistableError introduced

Somehow, some errors get recorded to show what something failed by calling PersistedSession::record_error. This part seems the least fleshed out and I don't have much insight into design decisions yet. I'm a bit confused as to why an error wouldn't also be an Event. Is there a good reason for that or is this part still underdesigned?


Please correct any misunderstanding of my read so far

@nothingmuch
Copy link
Copy Markdown
Contributor

nothingmuch commented May 8, 2025

ReceiverState enum introduced

Each ReceiverState enum variant encapsulates an instance of impl Receiver<State, P> for some concrete State, because rust can't pattern match on generic type parameters.

the reason is more fundamental, an operation like replay log can't have a static return type that is just one of the states, so we need an enum to represent the combination of all of these possibilities: given a persisted state in the database it might be in any one state so we need an enum to express the return type of replay ranging over all possible typestates (technically this could return a trait object that you then downcast using as_any<Receiver<ExpectedState>> but yuck).

ReceiverSessionEvent enum introduced

Each ReceiverSessionEvent enum variant encapsulates serialized data necessary to make a state transition. The encapsulation allows that common ReceiverState::process_event(event: ReceiverSessionEvent, persister) trait function to apply the next state in a loop that reads from the logs.

similar logic as before re deserialization, unless you know what event type is coming next we need an enum type. in principle since the state transitions are known and the state machine has a linear structure except for failures (i.e. no branching structure), but the entire rationale for immutable events in a log is that it's easier to extend in the future with more event types, which would introduce variants into event logs recorded by different versions.

in the future, the history replay query accumulator typestate thingy need to be aware of all of those different versions and the sequences of events they produce (which btw, we should record some kind of version number in the first event i guess?), or at the very least have a procedure for converting legacy logs into current version logs by introducing dummy events corresponding to a new schema. we could commit to having all of that be statically typed, with the event log needing to contain a statically typed sequence of events (each of a different type which corresponds to a specific transitions between pairs of typestate types), but i don't think that buys us anything in terms of safety/stability, using a non exhaustive enum for the different types of events seems easier even for just the history accumulator, the simplest version of that would just scan the log for an event of the right type and return Some(the value), or None if no event was found in the log.

Token concept removed

This change allows persistence to happen inside the state transitions without an explicit save/load that we were using before.
...

PersistableError introduced

Somehow, some errors get recorded to show what something failed by calling PersistedSession::record_error. This part seems the least fleshed out and I don't have much insight into design decisions yet. I'm a bit confused as to why an error wouldn't also be an Event. Is there a good reason for that or is this part still underdesigned?

Please correct any misunderstanding of my read so far

We've discussed retaining this concept, but with changes in order to preserve the lack of side effects in the payjoin crate, which is a useful property.

@0xBEEFCAF3 here's a pseudocode sketch based on our call:

/// a token is now a specialized Result type that gives back
/// the receiver that the method consumed, but only if unwrapped
/// by a persister.save()
type Token<E : Event, N : State, C : State, R : StateError> = Result<Accepted<E : Event , N>,Rejected<E, R, C>>;

/// The Ok branch always contains the Accepted tuple.
/// This contains 1 or more (maybe exactly one?) events
/// and the Receiver with the state transition applied to it
/// as private fields 
struct Accepted<E, S : State>(Vec<E>, Receiver<S>)

/// The Err branch always contains the Rejected triple.
/// This is the same as Accepted except it contains an error.
/// The receiver is kept in the current state.
struct Rejected<E, R: Error, S : State>(Vec<E>, R, Receiver<S>)


trait StateError : Error {
    is_terminal() -> bool,
}

trait State {
   is_terminal() -> bool
}

impl Receiver<WithContext> {
  ...
  pub process_res(
    self, // consume self, not &mut self as it currently is
    body: &[u8],
    context: ohttp::ClientResponse,
  ) -> Token<ReceiverEvents, UncheckedProposal, WithContext, ProcessResError> {
    // decapusulate etc as per inner_process_res
    //
    // any error condition that is fatal can record information
    // with events
    //
    // Err(Rejected(...)) with is_terminal() implicitly closes
    // Same for Ok(Accepted()) and the terminal Receiver typestate
    // close() should still be available, but renamed abort()?

    // in the happy path, we accumulate some events
    let events : Vec<ReceiverEvent> = ...
    ...

    // and finally, consume self and the event buffer into an Accept token
    Ok(Accept(events, self))
  }

When calling process_res, you would then need to give this specialized result type to persister.save<E, N, C, R>() and it could unwrap it for you, returning Result<Receiver<N>, (R, Receiver<C>)>. An enum would be necessary for the FFI friendly, non generic version of save().

edit to add: thinking about it more, i think the nested Error in Reject makes more sense as part of the last event.

Instead, each session itself has a persister (PersistedSession as of writing) implementation itself with arbitrary load functionaly. This may address some of the confusion around how the reference implementation was written vs the needs of persisting session data as in @spacebear21's #689.

Consensus in the call today was that it mostly does, in particular the app choosing the ID for each session allows it to be used as a foreign key in whatever schema it has, so any additional data can be associated with a particular session.

@arminsabouri
Copy link
Copy Markdown
Collaborator Author

last push includes a sender type state pattern similar to the receiver type state design.
The possible type states include:

pub enum SenderState<P> {
    Uninitialized(),
    WithReplyKey(Sender<SenderWithReplyKey, P>),
    GetContext(Sender<V2GetContext, P>),
    ProposalReceived(Sender<ProposalReceived, P>),
}

Application devs will still interact with the senderbuilder to construct a Sender with reply key however when replaying the logs internally we use a uninit sender and not the sender builder.
One notable change which I would like concept ack/nack feedback on is I coalesced the process_res and extract_req type states on the v2 sender. Meaning when the sender is in state WithReplyKey they can extract req but that does not move the sender to the next state. And it does not persist anything new b/c we didnt learn anything new. Sender does get a POST context but I believe that is deterministic. Only after process_res'ing does the sender persist the orignal proposal to dir, persist the session event and move onto. This mimicks the same pattern we have in receiver.

Additionally I added a type state ProposalReceived which is only attainable when we process_res the payjoin proposal. It marks the success state for the sender and wraps around the payjoin PSBT.

Copy link
Copy Markdown
Contributor

@DanGould DanGould left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One notable change which I would like concept ack/nack feedback on is I coalesced the process_res and extract_req type states on the v2 sender. Meaning when the sender is in state WithReplyKey they can extract req but that does not move the sender to the next state.

I.e. sender state was created with URI, original PSBT, and reply key.

And [Sender<SenderWithReplyKey, P>::extract_*] does not persist anything new b/c [it] didnt learn anything [worth persisting].

Although ohttp_relay is passed to Sender<...>::extract_v2, it doesn't need to be persisted. Got it. Assuming new persistence is the only reason to advance state, this function does not advance state. Makes sense.

[`Sender::extract_v2] does [return] a [V2PostContext] but I believe that is deterministic.

I don't believe it it is deterministic because encrypt_message_a calls hpke::setup_sender, where it calls Kem::encap, which depends on csprng. Similarly, OHTTP encapsulation is not deterministic otherwise 2 requests encapsulating the same body would be linkable.

Only after process_res'ing does the sender persist the orignal proposal to dir, persist the session event and move onto.

So the original PSBT is not persisted after the builder builds? Or the proposal PSBT is only persisted after it's received?

This mimicks the same pattern we have in receiver.

Is mimicking the pattern the rationale for the change? The rationale for state transition upon v0.23.0 Sender::extract_req was that only 1 request would ever need to be extracted, and there was indeed a new state that needed to wait for a response. The sender's request would not poll. However, if the request failed, or if there were no network access before process_res could be called, I could see wanting to be able to extract another request instead of failing the session or persisting that request for replay (which would break unlinkability between OHTTP requests)

Is there some other rationale relating to persistence being a requirement for state transition proposed here? I'm not sure I have enough information to cack/nack, since I'm not sure exactly what you mean by "I coalesced the process_res and extract_req type states" or what the rationale for doing so is.

Comment on lines +191 to +201
match self.post_orginal_proposal(&context).await {
Ok(sender) => session = SenderState::V2GetContext(sender),
Err(_) => {
let (req, v1_ctx) = context.extract_v1();
let response = post_request(req).await?;
let psbt = Arc::new(v1_ctx.process_response(
&mut response.bytes().await?.to_vec().as_slice(),
)?);
self.process_pj_response((*psbt).clone())?;
return Ok(());
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd to me that post_original_proposal calls post_request, as does the Err(_) match arm, which also loop. It seems like if this initial post fails the payjoin should fail by the error arm returning an error, not returning Ok.

@arminsabouri
Copy link
Copy Markdown
Collaborator Author

arminsabouri commented May 13, 2025

So the original PSBT is not persisted after the builder builds? Or the proposal PSBT is only persisted after it's received?

Both are persisted under different session events.

Is there some other rationale relating to persistence being a requirement for state transition proposed here?

yes two reasons above:

  • "And it does not persist anything new b/c we didnt learn anything new." Meanings, In we only perist events when the type state learns a new peice of information (session hpke keys), non-deterministic (same example as last) and there are no adverse side affects from calling the prev type state.
  • creating similar patterns between the sender & receiver.

I was under the impression that extract_req will always get you the same req & ohttp_ctx and there are no side affects to looping and calling extract_req. But this is not the case. So I will revert back to the original type state.

was that only 1 request would ever need to be extracted, and there was indeed a new state that needed to wait for a response. The sender's request would not poll. However, if the request failed, or if there were no network access before process_res could be called, I could see wanting to be able to extract another request instead of failing the session or persisting that request for replay (which would break unlinkability between OHTTP requests)

Ok this reason + randomness in the KEM is convincing enough to keep the existing type states. One problem is that POSTContext includes a foreign type which does not statisfy the trait bounds. We'll have to wrap it and if that doenst work make a upstream change.

@arminsabouri
Copy link
Copy Markdown
Collaborator Author

The latest push reintroduces storage tokens, now referred to as state transition objects. The key change is that instead of persisting session events directly within the type state during a transition function (e.g., process_res), the transition function returns a state transition object. This object is then evaluated by the persister, which handles persisting the associated session event and returning the next state.

This approach introduces some additional boilerplate for application developers but enforces a clean separation between state logic and I/O operations by moving persistence concerns outside the type state.

State transition objects are grouped into six categories. Naming is provisional and we can bike shed later.

// A transition that can be either fatal or transient or have no results.
/// If success it must have a next state and an event to save
pub type MaybeFatalTransitionWithNoResults<Event, NextState, CurrentState, Err> =
    MaybeFatalStateTransitionResultWithNoResults<Event, NextState, CurrentState, Err>;

/// A transition that can be either fatal or transient.
/// If success it must have a next state and an event to save
/// If Fatal error we must save the event and close the session
pub type MaybeFatalTransition<Event, NextState, Err> =
    MaybeFatalStateTransitionResult<Event, NextState, Err>;

/// A transition that can be transient.
/// If success it must have a next state and an event to save
pub type MaybeTransientTransition<Event, NextState, Err> =
    Result<AcceptNextState<Event, NextState>, RejectTransient<Err>>;

/// A transition that can be success or transient error
/// If success there are no events to save or next state
/// If transient error we can retry this state transition
pub type MaybeSuccessTransition<Err> = Result<AcceptCompleted, RejectTransient<Err>>;

/// A transition that is always a next state transition
pub type NextStateTransition<Event, NextState> = AcceptNextState<Event, NextState>;

/// A transition that can be success or bad init inputs
/// This is a special case where the state machine init inputs are bad and we can't proceed
/// The only thing we can do is reject the session. Since the session doesnt really exist at this point
/// there is no need to save events or close the session
pub type MaybeBadInitInputsTransition<Event, NextState, Err> =
    Result<AcceptNextState<Event, NextState>, RejectBadInitInputs<Err>>;

Each transition type captures a specific set of possible outcomes that a state transition function may return. These include:

  1. The session is completed and closed.
  2. The session progresses to the next state, and the event must be saved before the new type state is obtained.
  3. A transient error occurs; the session is not closed and the state does not change. A retry is expected.
  4. A fatal error occurs; the session is closed and the event must be persisted.
  5. The state machine initialization fails due to invalid input. The session is rejected without saving any event or transitioning state.

From the application developer’s perspective, the only implementation requirement remains the save_event function on the persister. Each persister function is now statically typed to its associated transition object, abstracting the underlying details of how each transition is persisted.

A notable detail is that transition types which include fatal outcomes are not wrapped in Result. This is a deliberate choice to prevent misuse such as blindly propagating (?) or unwrapping transitions that require events to be saved.

The remaining work before this PR can be broken into smaller, reviewable units includes:

  • Refactoring all state transition functions to consume self.
  • Applying the same state transition abstraction model to the sender.
  • Resolving potential asymmetries between the sender and receiver state machines.

@arminsabouri
Copy link
Copy Markdown
Collaborator Author

arminsabouri commented May 15, 2025

One hazy idea: in general its a tough to know what persister method you need to call for what state transition type alias. Two solutions

  • Either all state transition methods are structs and the persister save methods are named accordingly. E.g create_session give you a MaybeBadInitInputs struct which wraps a Result type and the persister has a method save_maybe_bad_init_inputs
  • Or each state transition method returns a struct with save() method which consumes a ref persister which then give you your next type state

cc @nothingmuch

@nothingmuch
Copy link
Copy Markdown
Contributor

  • Or each state transition method returns a struct with save() method which consumes a ref persister which then give you your next type state

i think i prefer the simplicity of the 2nd approach

supporting both async and sync persister traits like we discussed would be perhaps a bit messier, it might be possible to do this without duplication based on a type parameter or associated type of the persister, which for async will be Future<Result<...>>, and in the other just Result<...>, i.e. a higher kinded type but i don't yet know how to do that in rust brb

that said even without HKT the 2nd approach seems like it'd have the least amount of boilerplate, while still emphasizing strong typing where we want it (i.e. we can't get away with a trait on all transitions because the whole point of splitting these is to simplify the return types depending on the nature of the transition)

@arminsabouri
Copy link
Copy Markdown
Collaborator Author

The latest push implement approach two here

Much happier with the reduction in boiler plate. Example:

let proposal = proposal
            .check_broadcast_suitability(None, |tx| Ok(wallet.can_broadcast(tx)?))
            .save(persister);

@arminsabouri arminsabouri force-pushed the session-event-log-sketch branch from bef5c6b to 3c964d1 Compare May 19, 2025 20:01
@arminsabouri
Copy link
Copy Markdown
Collaborator Author

  • Rebased on master (as of this morning. I missed a couple commits).
  • Fixed the int. tests.
  • Split the persister session trait into two. A publicly accessible trait with methods to implement (save_event, load, and close) and an internal trait that extends the public one. The internal trait pub(crate)'s the methods that handle specific state transition objects.
  • Reduces api side boilerplace when returning state transition objects by proving util methods for expressing the specific state transition.
    e.g
- return MaybeFatalStateTransitionResult::Err(
-     MaybeFatalRejection::Transient(RejectTransient(
-           Error::ReplyToSender(e),
-       )),
-     );
+ return ReplyableError::Implementation(_) => MaybeFatalTransition::transient(Error::ReplyToSender(e))

@arminsabouri
Copy link
Copy Markdown
Collaborator Author

Last push includes:

  • Unit testing for each state transition objects and its respective persister save method.
  • Further api-side boilerplate reduction.
  • Handling replay session event log related errors
  • Handling expired sessions when replaying the session

@arminsabouri arminsabouri force-pushed the session-event-log-sketch branch from a3da5ac to 14020ab Compare May 31, 2025 13:27
@arminsabouri
Copy link
Copy Markdown
Collaborator Author

This PR has served its purpose. Most changes have been applied to master.
Related tracking item: #785

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants