Exactly-Once Semantics for golem:rdbms #1741

Merged
vigoo merged 153 commits into golemcloud:main from justcoon:rdbms_tx on Sep 16, 2025
@justcoon
Contributor

@justcoon justcoon commented Jun 4, 2025

fixes: #1514
/claim #1514

New oplog entries:

  1. BeginRemoteTransaction - marks the beginning of a transaction; added to the oplog after the transaction has begun
  2. PreCommitRemoteTransaction - indicates that the transaction is about to be committed; added to the oplog before the commit is executed
  3. PreRollbackRemoteTransaction - indicates that the transaction is about to be rolled back; added to the oplog before the rollback is executed
  4. CommittedRemoteTransaction - indicates that the transaction was committed; added to the oplog after a successful commit
  5. RolledBackRemoteTransaction - indicates that the transaction was rolled back; added to the oplog after a successful rollback

BeginRemoteTransaction is the first oplog entry related to a DB transaction.
PreCommitRemoteTransaction and CommittedRemoteTransaction entries surround the DB commit.
PreRollbackRemoteTransaction and RolledBackRemoteTransaction entries surround the DB rollback.
Either CommittedRemoteTransaction or RolledBackRemoteTransaction is the last oplog entry related to the transaction.
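
The placement of the entries around a commit can be sketched as follows. This is a minimal illustration, not the code in this PR: `TxEntry`, `commit_with_oplog`, and the `Vec` standing in for the oplog are hypothetical names chosen for the example.

```rust
/// Simplified stand-ins for the five oplog entries (hypothetical type, not the golem one).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TxEntry {
    Begin,
    PreCommit,
    Committed,
    PreRollback,
    RolledBack,
}

/// Runs `commit` bracketed by the two commit-related entries.
/// `oplog` is a plain Vec here; the real oplog is a persistent service.
#[allow(dead_code)]
fn commit_with_oplog<F>(oplog: &mut Vec<TxEntry>, commit: F) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String>,
{
    // Written before the DB commit is executed.
    oplog.push(TxEntry::PreCommit);
    // If the commit fails, we return early and no Committed entry is written.
    commit()?;
    // Written only after a successful commit.
    oplog.push(TxEntry::Committed);
    Ok(())
}
```

A rollback would follow the same shape with PreRollback and RolledBack. If the commit fails, the oplog ends with PreCommit and no Committed entry, which is what lets a later recovery step tell that the outcome is not yet confirmed.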

TODO:

  • recovery test improvements - the initial implementation uses TestOplog, where appending oplog entries fails based on patterns in the worker name. Oplog functions do not return Result; on error they just panic (PrimaryOplog). This PR adds new oplog functions where adding an entry returns a Result. To discuss: Oplog functions with Result as return type, and a better way to control when a specific action should fail
  • golem-wit updates
  • code cleanup - remove println ...
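
For readers unfamiliar with the failure-injection idea in the first TODO item, here is a rough sketch. It assumes a worker-name suffix of the form `Fail<N>On<EntryName>`, inferred from the worker name in the error message of the oplog snippet; `FailingOplog`, `for_worker`, and `try_add` are hypothetical names, not the TestOplog API.

```rust
use std::collections::HashMap;

/// Hypothetical test oplog that fails selected appends instead of panicking.
struct FailingOplog {
    /// Entry name -> number of appends of that entry that should still fail.
    remaining_failures: HashMap<String, u32>,
    /// Names of the entries that were appended successfully.
    entries: Vec<String>,
}

#[allow(dead_code)]
impl FailingOplog {
    /// Reads a trailing `-Fail<N>On<EntryName>` segment from the worker name, if present.
    fn for_worker(worker_name: &str) -> Self {
        let mut remaining_failures = HashMap::new();
        let last_segment = worker_name.rsplit('-').next().unwrap_or("");
        if let Some(spec) = last_segment.strip_prefix("Fail") {
            if let Some((count, entry)) = spec.split_once("On") {
                if let Ok(n) = count.parse::<u32>() {
                    remaining_failures.insert(entry.to_string(), n);
                }
            }
        }
        Self { remaining_failures, entries: Vec::new() }
    }

    /// Appends an entry, returning Err when a failure is injected for it.
    fn try_add(&mut self, entry: &str) -> Result<(), String> {
        if let Some(n) = self.remaining_failures.get_mut(entry) {
            if *n > 0 {
                *n -= 1;
                return Err(format!("injected failure on {entry}"));
            }
        }
        self.entries.push(entry.to_string());
        Ok(())
    }
}
```

Returning a Result from the append lets the caller surface the failure as a normal error, which is the behaviour the TODO item proposes to discuss for the real Oplog functions.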

oplog snippet

[
    {
        "type": "BeginRemoteTransaction",
        "timestamp": "2025-06-04T17:51:36.626Z",
        "transactionId": "740"
    },
    {
        "type": "ImportedFunctionInvoked",
        "timestamp": "2025-06-04T17:51:36.638Z",
        "functionName": "rdbms::postgres::db-transaction::execute",
        "request": { ...
        },
        "response": {...
        },
        "wrappedFunctionType": {
            "type": "WriteRemoteTransaction",
            "index": 16
        }
    },
    {
        "type": "Error",
        "timestamp": "2025-06-04T17:51:36.662Z",
        "error": "error while executing at wasm backtrace:\n    0:  0x3fc08 - wit-component:shim!indirect-golem:rdbms/postgres@0.0.1-[method]db-transaction.commit\n    1:  0x2a6a1 - <unknown>!<wasm function 231>\n    2:  0x2af54 - <unknown>!<wasm function 238>: Runtime error: worker rdbms-service-postgres-763b5c94-2128-4fca-b130-682ffaa8ec10-Fail1OnPreCommitRemoteTransaction failed on PreCommitRemoteTransaction 1 times"
    },
    {
        "type": "Jump",
        "timestamp": "2025-06-04T17:51:36.842Z",
        "jump": {
            "start": 16,
            "end": 21
        }
    },
    {
        "type": "BeginRemoteTransaction",
        "timestamp": "2025-06-04T17:51:36.848Z",
        "transactionId": "744"
    },
    {
        "type": "ImportedFunctionInvoked",
        "timestamp": "2025-06-04T17:51:36.857Z",
        "functionName": "rdbms::postgres::db-transaction::execute",
        "request": {...
        },
        "response": {...
        },
        "wrappedFunctionType": {
            "type": "WriteRemoteTransaction",
            "index": 22
        }
    },
    {
        "type": "PreCommitRemoteTransaction",
        "timestamp": "2025-06-04T17:51:36.868Z",
        "beginIndex": 22
    },
    {
        "type": "CommitedRemoteTransaction",
        "timestamp": "2025-06-04T17:51:36.888Z",
        "beginIndex": 22
    }
]

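In this snippet the transaction failed during commit, while the PreCommitRemoteTransaction entry was being appended to the oplog. The transaction (transactionId 740) was therefore never committed, and after the Jump entry it was restarted as a new transaction (transactionId 744), which committed successfully.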
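
The recovery decision implied by the snippet can be sketched as a function of the last transaction-related oplog entry. This is an illustration under assumptions: the enum and function names are hypothetical, and the handling of the in-doubt case (a Pre* entry without its confirmation) is assumed here to require checking the transaction's state in the database rather than taken from this description.

```rust
/// Simplified stand-ins for the transaction-related oplog entries (hypothetical).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TxEntry {
    Begin,
    PreCommit,
    Committed,
    PreRollback,
    RolledBack,
}

/// What a recovering executor would do next (hypothetical).
#[allow(dead_code)]
#[derive(Debug, PartialEq, Eq)]
enum RecoveryAction {
    /// Commit or rollback was never attempted: restart the transaction.
    Restart,
    /// Commit or rollback was started but not confirmed (assumed: ask the database).
    CheckRemoteState,
    /// The transaction is finished; continue after it.
    Continue,
}

/// Chooses the recovery action from the last transaction-related entry found in the oplog.
#[allow(dead_code)]
fn recovery_action(last_tx_entry: TxEntry) -> RecoveryAction {
    match last_tx_entry {
        TxEntry::Begin => RecoveryAction::Restart,
        TxEntry::PreCommit | TxEntry::PreRollback => RecoveryAction::CheckRemoteState,
        TxEntry::Committed | TxEntry::RolledBack => RecoveryAction::Continue,
    }
}
```

In the snippet above, the failure happened before PreCommitRemoteTransaction was written, so the last transaction-related entry was BeginRemoteTransaction and the sketch would yield `Restart`, matching the observed Jump and second BeginRemoteTransaction.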

@vigoo
Contributor

vigoo commented Aug 21, 2025

From your TODO list, don't worry about backward compatibility. We decided to break many things in the 1.3 release so it won't be compatible with any older worker anyway.

justcoon and others added 18 commits August 21, 2025 18:43
# Conflicts:
#	cli/golem-cli/src/model/text/worker.rs
#	golem-api-grpc/proto/golem/worker/public_oplog.proto
#	golem-common/src/base_model.rs
#	golem-common/src/model/oplog.rs
#	golem-common/src/model/public_oplog/mod.rs
#	golem-common/src/model/public_oplog/protobuf.rs
#	golem-test-framework/src/dsl/debug_render.rs
#	golem-worker-executor/src/durable_host/mod.rs
#	golem-worker-executor/src/model/public_oplog/mod.rs
#	golem-worker-executor/src/model/public_oplog/wit.rs
#	golem-worker-executor/src/services/oplog/tests.rs
#	golem-worker-executor/src/worker/status.rs
#	golem-worker-executor/tests/common/mod.rs
#	openapi/golem-service.yaml
#	openapi/golem-worker-service.yaml
#	test-components/rdbms-service.wasm
@vigoo
Contributor

vigoo commented Sep 16, 2025

If we want to expose this functionality through host functions, we would need to add an additional WIT interface for RemoteTransactionHandler that would be implemented by the worker. The worker executor could then invoke it as needed, similar to how load/save snapshot operations work.

That kind of callback interface is not really possible with these WIT interfaces (I mean, what you describe is probably technically possible but very inconvenient). Instead, the version of this exposed as a host function should be refactored similarly to what I did with Durability (which, if you remember, was originally also a higher-order function somewhere, taking what to run as a parameter), so that it just returns data ("what to do") and control stays on the caller side. Then of course the logic can still be hidden in a library that implements it on top of the same host functions.

I don't want this to be implemented before this PR is merged though.

Because we are going to break compatibility at least once more in 1.4 according to the current plans, I am giving this another review now with the intent to have it in 1.3 as we planned. (And modify it later if needed)
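
As a rough illustration of the "returns data, control on the caller side" style described in the comment above: the names `Instruction`, `host_next_instruction`, and `run_once` are hypothetical, and the plain function stands in for a host function that does not exist in this PR.

```rust
/// Data returned by a hypothetical host function: it says what to do, it does not do it.
#[allow(dead_code)]
#[derive(Debug, PartialEq, Eq)]
enum Instruction {
    /// Execute the transaction body now.
    RunTransaction,
    /// The work is already recorded as done; reuse the stored result.
    ReuseResult,
}

/// Stand-in for the host function: inspects state and returns an instruction.
/// It never calls back into the worker.
#[allow(dead_code)]
fn host_next_instruction(already_committed: bool) -> Instruction {
    if already_committed {
        Instruction::ReuseResult
    } else {
        Instruction::RunTransaction
    }
}

/// Guest-side library wrapper: the caller owns the control flow and runs `body` itself.
#[allow(dead_code)]
fn run_once<F: FnOnce() -> i32>(already_committed: bool, cached: i32, body: F) -> i32 {
    match host_next_instruction(already_committed) {
        Instruction::RunTransaction => body(),
        Instruction::ReuseResult => cached,
    }
}
```

The higher-order convenience (`run_once` taking a closure) lives entirely in the guest library, so the host interface itself needs no callback into the worker.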

@vigoo vigoo enabled auto-merge September 16, 2025 08:34
@vigoo vigoo merged commit 98fc4a3 into golemcloud:main Sep 16, 2025
50 of 52 checks passed
Linked issue: Implement Exactly-Once Semantics for golem:rdbms Support
