Skip to content

Multi-instance concurrency issues with atomic operations #64

@fox1t

Description

@fox1t

Multi-instance concurrency issues with atomic operations

Problem Description

The current Valkeyrie implementation has critical concurrency issues when multiple instances share the same SQLite database file. This prevents safe deployment scenarios where multiple processes need to coordinate atomic operations.

Root Cause Analysis

1. Process-local Write Queue

  • The writeQueue and isProcessing variables in sqlite-driver.ts are global but process-local
  • Multiple instances cannot coordinate through these variables
  • Each process manages its own queue, leading to race conditions

2. Versionstamp Collision Risk

  • Each instance generates versionstamps independently using timestamps + counter
  • Under high load, different instances can generate identical versionstamps
  • No cross-process coordination for versionstamp uniqueness

3. Atomic Operation Race Conditions

  • Check-and-set operations can pass validation simultaneously in different instances
  • The "check" phase happens in one transaction, "set" phase in another
  • Multiple instances can read the same state and all pass validation

Issues with PR #56

While PR #56 attempts to address these issues, it has several problems:

  1. Breaking Change: Changes versionstamp format from 20 to 28 characters
  2. Unnecessary Complexity: Uses UUID hashes and complex exponential backoff
  3. Incomplete Solution: Still uses process-level mutex (transactionMutex)
  4. Test Quality: Tests are flaky with one skipped test, don't validate core atomicity

Proposed Solution

1. Database-level Versionstamp Generation

Create a dedicated table for versionstamp generation:

CREATE TABLE IF NOT EXISTS versionstamp_sequence (
  id INTEGER PRIMARY KEY CHECK (id = 1),
  sequence INTEGER NOT NULL DEFAULT 0
);
INSERT OR IGNORE INTO versionstamp_sequence (id, sequence) VALUES (1, 0);

2. Maintain Backward Compatibility

  • Keep 20-character versionstamp format
  • Use database sequence + timestamp for uniqueness
  • No breaking changes to existing APIs

3. Simplified Transaction Logic

  • Remove process-local write queue entirely
  • Use BEGIN IMMEDIATE for database-level locking
  • Simple retry logic for SQLITE_BUSY errors
  • Rely on SQLite's ACID properties

4. Implementation Approach

private generateVersionstamp(): string {
  // Get next sequence number from database
  const stmt = this.db.prepare(`
    UPDATE versionstamp_sequence 
    SET sequence = sequence + 1 
    WHERE id = 1 
    RETURNING sequence
  `);
  const { sequence } = stmt.get();
  
  // Combine timestamp + sequence for 20-character format
  const timestamp = BigInt(Date.now() * 1000); // microseconds
  const combined = (timestamp << 20n) | BigInt(sequence & 0xFFFFF);
  return combined.toString(16).padStart(20, '0');
}

5. Benefits

  • ✅ True cross-process atomicity
  • ✅ Backward compatible
  • ✅ Simpler implementation
  • ✅ Better performance (no UUID overhead)
  • ✅ Reliable versionstamp ordering

Test Requirements

  • Multi-instance atomic increment tests
  • Versionstamp uniqueness validation
  • Check-and-set race condition tests
  • High-concurrency stress tests
  • Backward compatibility verification

Breaking Changes

None - this solution maintains full backward compatibility while fixing the concurrency issues.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions