Skip to content

Implement wasm-rdbms interface for standardized database access #1016

@jdegoes

Description

@jdegoes

WASM RDBMS

WASM components do not have access to databases natively. Rather, they must obtain all functionality from their host, through the component model types.

Although there exists a wasi-sql project, which intends to provide a WASI standard for database access, it is still in its infancy and shows no signs of active development.

Rather than await the completion of wasi-sql, WASM startup Fermyon introduced a series of WIT interfaces postgres, sqlite, and mysql in their WASM-native framework Spin, which use a related series of database types. These interfaces are supported by the Spin host environment.

It is not clear if Fermyon’s interfaces will become standard or not, or if work will be resumed on the earlier (but far less complete) wasi-sql. Rather than wait for the lengthy process of standardization to produce something useful, Golem needs to support some level of database access now, so that developers using databases have options.

This specification calls for the development of a new wasm-rdbms interface, which is designed to support features common to mainstream databases, as well as a corresponding set of implementations in Golem, powered by a low-level Rust library.

💡

Currently, this design only facilitates connection with existing databases, rather than supporting the newer paradigm of provisioning databases based on their demonstrated usage inside the code. However, as connecting to existing databases is something that needs to be supported regardless, it makes sense to first start with this feature.

Overview

The basic approach of this language-agnostic interface to databases is to have a generalized core upon which all specific database interfaces can depend.

This allows new databases to be introduced cheaply, while still respecting the fact that developers must necessarily program against specific databases, due to differences in column types, SQL syntax, and other features.

Common Elements

Common elements belong to the same generic package and have generic interface names.

They include the following, as well as related supporting types:

  • Column Types. The types possible for a column. This is a superset of the possible types for any given database.
  • Columns. The combination of a column type, with a name, and column-specific flags, such as whether or not the column is nullable, unique, auto-generated, etc.
  • Value. A value is a value of some column type.
  • Row. A row is a collection of values.
  • Statement. A statement has SQL (text) as well as variables and values for those variables (prepared statement).
  • Result Set. A result set is the result of executing a query. It may be iterated over.

Unique Elements

Unique elements belong to database-specific packages, with database-specific interface names. They represent aspects of a database interface that benefit from tying to one specific database.

  • Connection. A connection may be opened, with a database-specific connection information; connections allow both query (which return result set) and execution (which returns integer).

WIT Sketch

Common Elements

Something like the following should work for most databases (other than SQLite):

type node-index = u32;
type table-name = string;

/// Node in the db-data-type structure
variant db-column-type-primitive{
    integer(option<u8>),
    decimal(u8, u8),
    float,
    boolean,
    datetime,
    interval,
    string(option<u32>),
    text,
    binary(option<u32>),
    blob,
    enum(list<string>),
    json,
    xml,
    uuid,
    spatial,
}

variant db-column-type {
    primitive(db-column-type-primitive),
    array(list<option<u32>>, db-column-type),
}

flags db-column-type-flags {
    // Key constraints
    primary-key,     // Single or part of composite primary key
    foreign-key,     // References another table's key
    unique,          // Unique constraint
    
    // Core properties
    nullable,        // If not present, column is NOT NULL
    generated,       // Value is computed rather than stored directly
    auto-increment,  // MySQL AUTO_INCREMENT/PostgreSQL SERIAL/SQL Server IDENTITY
    
    // Common behaviors
    default-value,   // Has a default value defined
    indexed,         // Participates in an index
}

/// Data types for database columns
record db-column-type-meta {
    db-type:     db-column-type,
    name:        string,
    flags:       db-column-type-flags,
    foreign-key: option<table-name>,
}

/// Value descriptor for a single database value
variant db-value-primitive {
    integer(s64),
    decimal(string),
    float(float64),
    boolean(bool),
    datetime(u64),
    interval(u64),
    string(string),
    text(string),
    binary(list<u8>),
    blob(list<u8>),
    enum(u32),
    json(string),
    xml(string),
    uuid(u128),
    spatial(list<float64>),
    other(string, list<u8>),
    db-null,
}

variant db-value {
    primitive(db-value-primitive),
    array(list<db-value-primitive>), // Flatteneed
}

/// A single row of values
record db-row {
    values: list<db-value>,
}

/// A potentially very large and lazy stream of rows:
resource db-result-set {
    get-column-metadata: func() -> list<db-column-type-meta>;
    get-next: func() -> option<list<db-row>>;
}

Unique Elements

Something like the following should work for databases that are not SQLite:

interface postgres {
  resource db-connection {
    open: static func(address: string) -> result<db-connection, error>;

    query: func(statement: string, params: list<db-value>) -> result<db-result-set, error>;

    execute: func(statement: string, params: list<db-value>) -> result<u64, error>;
  }
}

interface mysql {
  resource db-connection {
    open: static func(address: string) -> result<db-connection, error>;

    query: func(statement: string, params: list<db-value>) -> result<db-result-set, error>;

    execute: func(statement: string, params: list<db-value>) -> result<u64, error>;
  }
}

SQLite will require its own interface, perhaps inspired by the one in Spin.

Host Implementation

The host implementation will live in the worker-executor service and will use best-in-class async Rust libraries to provide implementations of the specified interfaces.

One slight complication is that a connection in the guest must not map to a connection in the host. The reason for this is that on a single worker-executor node, there may exist thousands of workers, and these workers should not maintain their own independent connections to the database.

Therefore, in order to improve performance, it is necessary to perform connection pooling in the host, so that if only 20 concurrent connections are needed across all workers, then only 20 concurrent connections are created and maintained.

Note that if the wasm-rdbms interface improves in power to allow interleaving database updates with side-effects as part of a single database transaction, then pooling becomes much more difficult. However, with the current interface, the only two things a worker can do with a connection are query and execute, which atomically return results; and if those results are small enough to fit in memory, then there is no need to keep an underlying connection dedicated to the worker (instead, it can be reused for another worker).

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions