Multi-instance concurrency issues with atomic operations
Problem Description
The current Valkeyrie implementation has critical concurrency issues when multiple instances share the same SQLite database file. This prevents safe deployment scenarios where multiple processes need to coordinate atomic operations.
Root Cause Analysis
1. Process-local Write Queue
- The
writeQueue and isProcessing variables in sqlite-driver.ts are global but process-local
- Multiple instances cannot coordinate through these variables
- Each process manages its own queue, leading to race conditions
2. Versionstamp Collision Risk
- Each instance generates versionstamps independently using timestamps + counter
- Under high load, different instances can generate identical versionstamps
- No cross-process coordination for versionstamp uniqueness
3. Atomic Operation Race Conditions
- Check-and-set operations can pass validation simultaneously in different instances
- The "check" phase happens in one transaction, "set" phase in another
- Multiple instances can read the same state and all pass validation
Issues with PR #56
While PR #56 attempts to address these issues, it has several problems:
- Breaking Change: Changes versionstamp format from 20 to 28 characters
- Unnecessary Complexity: Uses UUID hashes and complex exponential backoff
- Incomplete Solution: Still uses process-level mutex (
transactionMutex)
- Test Quality: Tests are flaky with one skipped test, don't validate core atomicity
Proposed Solution
1. Database-level Versionstamp Generation
Create a dedicated table for versionstamp generation:
CREATE TABLE IF NOT EXISTS versionstamp_sequence (
id INTEGER PRIMARY KEY CHECK (id = 1),
sequence INTEGER NOT NULL DEFAULT 0
);
INSERT OR IGNORE INTO versionstamp_sequence (id, sequence) VALUES (1, 0);
2. Maintain Backward Compatibility
- Keep 20-character versionstamp format
- Use database sequence + timestamp for uniqueness
- No breaking changes to existing APIs
3. Simplified Transaction Logic
- Remove process-local write queue entirely
- Use
BEGIN IMMEDIATE for database-level locking
- Simple retry logic for
SQLITE_BUSY errors
- Rely on SQLite's ACID properties
4. Implementation Approach
private generateVersionstamp(): string {
// Get next sequence number from database
const stmt = this.db.prepare(`
UPDATE versionstamp_sequence
SET sequence = sequence + 1
WHERE id = 1
RETURNING sequence
`);
const { sequence } = stmt.get();
// Combine timestamp + sequence for 20-character format
const timestamp = BigInt(Date.now() * 1000); // microseconds
const combined = (timestamp << 20n) | BigInt(sequence & 0xFFFFF);
return combined.toString(16).padStart(20, '0');
}
5. Benefits
- ✅ True cross-process atomicity
- ✅ Backward compatible
- ✅ Simpler implementation
- ✅ Better performance (no UUID overhead)
- ✅ Reliable versionstamp ordering
Test Requirements
- Multi-instance atomic increment tests
- Versionstamp uniqueness validation
- Check-and-set race condition tests
- High-concurrency stress tests
- Backward compatibility verification
Breaking Changes
None - this solution maintains full backward compatibility while fixing the concurrency issues.
Multi-instance concurrency issues with atomic operations
Problem Description
The current Valkeyrie implementation has critical concurrency issues when multiple instances share the same SQLite database file. This prevents safe deployment scenarios where multiple processes need to coordinate atomic operations.
Root Cause Analysis
1. Process-local Write Queue
writeQueueandisProcessingvariables insqlite-driver.tsare global but process-local2. Versionstamp Collision Risk
3. Atomic Operation Race Conditions
Issues with PR #56
While PR #56 attempts to address these issues, it has several problems:
transactionMutex)Proposed Solution
1. Database-level Versionstamp Generation
Create a dedicated table for versionstamp generation:
2. Maintain Backward Compatibility
3. Simplified Transaction Logic
BEGIN IMMEDIATEfor database-level lockingSQLITE_BUSYerrors4. Implementation Approach
5. Benefits
Test Requirements
Breaking Changes
None - this solution maintains full backward compatibility while fixing the concurrency issues.