Implement HALT lock & write forwarding #277
Conversation
I was planning on recording some more videos today, but then I saw this and now I'm thinking maybe I should wait because this impacts the migration script I would be teaching. I'm ok trying an experimental version of this if need be. How long before that is available do you suppose? (No pressure of course 😅, I can wait.)
Assuming I don't hit any major roadblocks, I expect to merge this early next week. It needs some cleanup and a bunch of testing, but it's not doing anything too crazy. Also, I can always merge fixes in with later PRs.
Sounds good! I'll wait. Thanks a lot!
Testing the "forwarding" branch, I encountered a "remote tx confirmation failed" error. The database was updated on node-2, which is a replica node. Logs from node-3 (primary):
@legionxiong Thanks for giving the branch a try. It's still really rough around the edges. I'll move the PR out of "Draft" state once I do some decent testing.
Well, that would be encouraging. I'll try it again and give some feedback once you remove the "Draft" state.
@kentcdodds @legionxiong I fixed up some issues on write forwarding, but I'm still having a lingering issue with read queries picking up an invalid version of the WAL index header and getting into a retry loop. I was hoping to have this merged by now, but it looks like there's still some debugging to do. Just wanted to give y'all a heads up.
1e66023 to
be0433d
Compare
@kentcdodds @legionxiong There's still an outstanding issue on write forwarding in that it performs a checkpoint on the writing replica after the WAL commits, but the replica client doesn't have the [...]. This is definitely still "alpha" quality, so please don't deploy it into production. I'm looking at you, Kent. :) I also need to add timeouts, but I'll do that in a separate PR.
Awesome! So this has been pushed to Docker Hub? Which version should I use?
@kentcdodds Yes, it's on Docker Hub. If you want to use the latest version built from this PR, then you can use the PR-tagged version: [...]
@benbjohnson It works well for my use case, so AWESOME! Great job, Ben!
@legionxiong Thanks for testing it out! I had some issues when I was doing load testing on Friday, so there are definitely still some kinks in there. I'm going to be doing some more testing & fixes today though.
@legionxiong @kentcdodds Just a heads up that I'm having a hard time getting this stable when trying to make it handle everything behind the scenes. I'm going to rework this to separate out the lock acquisition from the actual write forwarding. Basically, you'll need to wrap your write transaction code in a LiteFS-specific lock. Right now, there are a couple different options for how to implement this:

**Command-line migrations**

If someone is running a one-off command in a temporary VM to perform a migration, they could acquire the lock for the life of the mount:

```sh
# For JS/Prisma apps
$ litefs mount -with-lock -- prisma migrate deploy

# For Rails apps
$ litefs mount -with-lock -- bin/rails db:migrate
```

If you have a migration script, it could acquire the lock for a single command:

```sh
$ litefs run -with-lock -- prisma migrate deploy
```

or it could acquire the lock for the life of the script:

```sh
#!/bin/bash
litefs lock
prisma migrate deploy
# ... do other stuff ...
litefs unlock
```

**Library migrations**

We can also support acquiring the lock within a library. @kentcdodds has a litefs-js library that could hook into this lock mechanism, so it could look like:

```js
import { withWriteLock } from 'litefs-js'

withWriteLock(() => {
  // do transaction stuff here
})
```

Please excuse my terrible JavaScript. :) I'm planning on implementing this as another lock byte on the database file, so it should be easy for libraries to hook into the functionality. It should just be a [...]
That makes sense to me. 👍
@benbjohnson It's fine for me; I have nothing to migrate. Or does that mean the app/tool on replica nodes needs to acquire the lock itself before writing data to the database?
Yes, SQLite is a single-writer database, so only one node can write at a time. If it's a replica, it'll essentially borrow the lock from the primary, perform a write, and then push that change set back to the primary. It's a lot slower than simply writing on the primary directly, but it makes things simpler for situations like running migrations or low-write scenarios.
Thanks for your explanation! We are developing a maintenance tool using sqlite3 for a distributed system. Two or more maintainers will never perform the same operation on different nodes, so there's no concurrency and no heavy writing. It is basically a single-user scenario. LiteFS with write forwarding is exactly what we were looking for to handle database replication, thanks again! What are we supposed to do for this single-user scenario? Do we need to add some wrapper to our tool?
Ok, I refactored the internals into two separate parts: the "halt" lock & the transaction forwarding. There's an API on the HTTP server to acquire the halt lock [...]. On the write forwarding side, the primary checks that the caller is holding the current halt lock ID, and it verifies that the pre/post checksums and TXIDs match too. Next, I need to add the application's interface via FUSE. It'll use a [...]. This PR still needs a bunch of cleanup and testing. It isn't smart about recovery, it doesn't have a timeout on the halt lock, etc.
@legionxiong In your application code, you'll need to acquire a lock on a file byte before performing your write transaction on your replica. What language are you writing in?
golang :) |
@legionxiong I added the FUSE API for the [...]
This is doing pretty well in our long-running test. I'm going to go ahead and merge this, although it still needs additional testing before release. I still need to add some other functionality like pings and a shorter TTL. Otherwise, if a replica unexpectedly shuts down while holding the [...]
Also, I ended up moving non-SQLite locks (which is just the [...]). The reference implementation is in the [...]
So far so good! 👍 |
Overview
This pull request implements write forwarding for WAL mode only. With write forwarding, any node can write to the database by borrowing the write lock from the primary, catching up, performing the transaction, and then shipping the changeset LTX file back to the primary. The transaction is then propagated out to the other replicas just like any other transaction.

Write forwarding is useful for infrequent writes; it should not be used for regular, write-heavy usage. The main use case is being able to run migration scripts from any node. This is useful because deployments can roll over nodes in an unexpected order, so we want to be able to run on the first node deployed regardless of its primary state.
Fixes #56
Performance
Please note that the SQLite write lock is held for one round trip plus the time it takes to execute the transaction on the replica. This means that performing a forwarded write from Chennai when the primary is in Denver could hold the write lock for ~250ms, which would limit your write throughput to only 4 writes per second.
If you use forwarding for regular write usage, such as for a background jobs instance, ensure the node(s) are in the same region as the primary. For example, you can restrict candidacy to only your Denver nodes by setting it in `litefs.yml`.

TODO
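For reference, the candidacy restriction mentioned above might look something like this in `litefs.yml` (a sketch; the `candidate` expression and the `FLY_REGION` environment variable are assumptions based on common LiteFS deployments, not taken from this PR):

```yml
# Sketch: only nodes in the Denver region may become primary.
lease:
  type: "consul"
  candidate: ${FLY_REGION == "den"}
```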