Datasette metadata allows users to store extra descriptions, URLs, and styling of their Datasette instances/databases/tables in an easy way. Traditionally, users can bring in their metadata in one of two ways:
- With the `-m metadata.json` CLI option, where `metadata.json` is a nested JSON file of all metadata (YAML is also supported)
- Using the `get_metadata()` plugin hook
Internally, Datasette stores metadata in Python dictionaries, accessed through the (publicly undocumented) `.metadata()` method. The logic is quite complex: it handles "recursive" updates to combine `metadata.json` metadata with plugin hook metadata, fallback logic, and confusing database/table/key arguments.
Proposal: New `datasette_metadata_*` tables inside `internal.db`
We added a new `--internal internal.db` option to Datasette in a recent Datasette 1.0a release. This is a persistent instance-wide database that plugins can use to store data. I propose that Datasette core uses this database to store metadata, as a single source of truth for metadata resolution.
Datasette core will use these new `datasette_metadata_*` tables to source metadata for instances/databases/tables/columns. Plugins can write directly to these tables to store metadata, removing the need for the `get_metadata()` hook.
The `metadata.json` pattern can still be supported by just writing the contents of `metadata.json` to the `datasette_metadata_*` tables on startup.
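As a rough sketch of that startup step, here is what flattening a parsed `metadata.json` dictionary into the proposed tables could look like. The helper name `store_metadata` is hypothetical, the nesting handled is simplified (databases and tables only, no canned queries or columns), and non-string values are JSON-encoded since the value columns are plain `TEXT`:

```python
import json
import sqlite3


def store_metadata(conn: sqlite3.Connection, metadata: dict) -> None:
    # Hypothetical helper: flatten a parsed metadata.json dict into the
    # proposed datasette_metadata_* tables, upserting so repeated startups
    # overwrite earlier values for the same key.
    def encode(value):
        # Value columns are TEXT, so JSON-encode anything that isn't a string
        return value if isinstance(value, str) else json.dumps(value)

    for key, value in metadata.items():
        if key == "databases":
            continue
        conn.execute(
            "INSERT INTO datasette_metadata_instance_entries(key, value) "
            "VALUES (?, ?) ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            (key, encode(value)),
        )
    for db_name, db_meta in metadata.get("databases", {}).items():
        for key, value in db_meta.items():
            if key == "tables":
                continue
            conn.execute(
                "INSERT INTO datasette_metadata_database_entries"
                "(database_name, key, value) VALUES (?, ?, ?) "
                "ON CONFLICT(database_name, key) DO UPDATE SET value = excluded.value",
                (db_name, key, encode(value)),
            )
        for table_name, table_meta in db_meta.get("tables", {}).items():
            for key, value in table_meta.items():
                conn.execute(
                    "INSERT INTO datasette_metadata_resource_entries"
                    "(database_name, resource_name, key, value) VALUES (?, ?, ?, ?) "
                    "ON CONFLICT(database_name, resource_name, key) "
                    "DO UPDATE SET value = excluded.value",
                    (db_name, table_name, key, encode(value)),
                )
    conn.commit()
```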
Proposed SQL + Python API
The "internal tables" that Datasette uses for metadata can be described as follows:
```sql
-- Metadata key/values for the entire Datasette instance
CREATE TABLE datasette_metadata_instance_entries(
  key text,
  value text,
  unique(key)
);

-- Metadata key/values for specific databases
CREATE TABLE datasette_metadata_database_entries(
  database_name text,
  key text,
  value text,
  unique(database_name, key)
);

-- Metadata key/values for specific "resources" (tables, views, canned_queries)
CREATE TABLE datasette_metadata_resource_entries(
  database_name text,
  resource_name text,
  key text,
  value text,
  unique(database_name, resource_name, key)
);

-- Metadata key/values for specific columns
CREATE TABLE datasette_metadata_column_entries(
  database_name text,
  resource_name text,
  column_name text,
  key text,
  value text,
  unique(database_name, resource_name, column_name, key)
);
```

In Python, Datasette core will add the following methods on the Datasette class:
```python
class Datasette:
    # ...
    async def get_instance_metadata(self) -> dict[str, Any]:
        pass

    async def get_database_metadata(self, database_name: str) -> dict[str, Any]:
        pass

    async def get_resource_metadata(self, database_name: str, resource_name: str) -> dict[str, Any]:
        pass

    async def get_column_metadata(self, database_name: str, resource_name: str, column_name: str) -> dict[str, Any]:
        pass
```

These will be used internally by Datasette to wrap the SQL queries to the `datasette_metadata_*` tables. Though maybe plugins can use them as well?
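To make the "wrap the SQL queries" idea concrete, here is a standalone, synchronous sketch of what one of these getters might do internally. A plain `sqlite3` connection stands in for Datasette's `--internal` database, and the real method would of course be async:

```python
import sqlite3


def get_database_metadata(conn: sqlite3.Connection, database_name: str) -> dict:
    # Hypothetical sketch: collect all key/value metadata entries for one
    # database into a dictionary. The real Datasette method would be async
    # and run against the --internal database connection.
    rows = conn.execute(
        "SELECT key, value FROM datasette_metadata_database_entries "
        "WHERE database_name = ?",
        (database_name,),
    )
    return dict(rows)
```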
We could also add `set_*` methods, mainly for plugin authors, so they could avoid writing SQL.
```python
class Datasette:
    # ...
    async def set_instance_metadata(self, key: str, value: str):
        pass

    async def set_database_metadata(self, database_name: str, key: str, value: str):
        pass

    # etc.
```

Consequences
- The `get_metadata()` hook will be deprecated. Instead, plugins can write directly to the `datasette_metadata_*` tables on startup, and update them as they wish (on user request, on a scheduled basis, etc.)
- "Cascading metadata", aka the `fallback` option, will be deprecated. It only really makes sense in narrow use-cases (i.e. licensing an entire database), and plugins could define their own cascading logic if needed.
- Metadata fetching becomes an async operation.
- `metadata.json` can still be supported: it'll just overwrite the `datasette_metadata_*` entries on startup, meaning users will only need to run it once and can then delete their `metadata.json` (provided they include a persistent `--internal` database). Though "overwriting" may have unintended consequences...
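One way to see the "overwriting" caveat: if the startup step upserts each `metadata.json` key, it only replaces keys that `metadata.json` actually defines, so values a plugin wrote under other keys survive, while any colliding key is silently clobbered. A minimal demonstration of that upsert behavior against the proposed instance table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasette_metadata_instance_entries(key text, value text, unique(key))"
)


def upsert(key: str, value: str) -> None:
    # UPSERT against the unique(key) constraint, as the startup step might do
    conn.execute(
        "INSERT INTO datasette_metadata_instance_entries(key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, value),
    )


# A plugin wrote some metadata earlier
upsert("title", "Set by plugin")
upsert("license", "CC0")

# Startup re-applies metadata.json, which only defines "title":
# "title" gets clobbered, "license" survives untouched
upsert("title", "Set by metadata.json")
```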
How 3rd party plugins currently use the get_metadata() hook
There aren't many open-source usages of the `get_metadata()` hook, at least from what I could find via GitHub search. The ones I found:
- `datasette-metadata-editable`: Maintains an in-memory cache (a Python dictionary) that gets populated on `startup()` and refreshed on edits
- `datasette-remote-metadata`: Maintains an in-memory cache (a Python dictionary) that gets updated on a recurring basis
- `datasette-updated`: Reads from an on-disk file at request time
- `datasette-scraper`: Provides metadata for tables that the plugin creates/manages (kind of like shadow tables?)
- `datasette-live-config`
I think all of these use-cases can easily be supported with this new approach: writing to the `datasette_metadata_*` tables on startup, and updating them whenever needed. I'd also say it would simplify much of the code we see here, but only time will tell...
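For example, an editing plugin in the style of datasette-metadata-editable could drop its in-memory cache entirely: each edit would upsert straight into the resource table, and Datasette's own read path would see the change immediately. A hypothetical sketch (plain `sqlite3` connection standing in for the internal database, `save_edit` is an invented name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasette_metadata_resource_entries("
    "database_name text, resource_name text, key text, value text, "
    "unique(database_name, resource_name, key))"
)


def save_edit(database_name: str, resource_name: str, key: str, value: str) -> None:
    # What the plugin's edit handler would do instead of updating a cache:
    # repeated edits to the same key update the row in place
    conn.execute(
        "INSERT INTO datasette_metadata_resource_entries"
        "(database_name, resource_name, key, value) VALUES (?, ?, ?, ?) "
        "ON CONFLICT(database_name, resource_name, key) "
        "DO UPDATE SET value = excluded.value",
        (database_name, resource_name, key, value),
    )


save_edit("fixtures", "facetable", "description", "First draft")
save_edit("fixtures", "facetable", "description", "Edited description")
```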