Datasette metadata allows users to store extra descriptions, URLs, and styling of their Datasette instances/databases/tables in an easy way. Traditionally, users can bring in their metadata in one of two ways:
- With the `-m metadata.json` CLI option, where `metadata.json` is a nested JSON file of all metadata (YAML is also supported)
- Using the `get_metadata()` plugin hook
Internally, Datasette stores metadata in Python dictionaries, accessed through the (publicly undocumented) `.metadata()` method. The logic is quite complex: it handles "recursive" updates to combine `metadata.json` metadata with plugin hook metadata, fallback logic, and confusing database/table/key arguments.
Proposal: New `datasette_metadata_*` tables inside `internal.db`
We added a new `--internal internal.db` option to Datasette in a recent Datasette 1.0a release. This is a persistent instance-wide database that plugins can use to store data. I propose that Datasette core uses this database to store metadata, as a single source of truth for metadata resolution.
Datasette core will use these new `datasette_metadata_*` tables to source metadata for instances/databases/tables/columns. Plugins can write directly to these tables to store metadata, removing the need for the `get_metadata()` hook.
The `metadata.json` pattern can still be supported by just writing the contents of `metadata.json` to the `datasette_metadata_*` tables on startup.
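As a rough sketch of that startup step, here is what flattening a parsed `metadata.json` dictionary into the proposed tables could look like. The helper name `store_metadata` is hypothetical, the nesting handled is simplified (databases and tables only, no canned queries or columns), and non-string values are JSON-encoded since the value columns are plain `TEXT`:

```python
import json
import sqlite3


def store_metadata(conn: sqlite3.Connection, metadata: dict) -> None:
    # Hypothetical helper: flatten a parsed metadata.json dict into the
    # proposed datasette_metadata_* tables, upserting so repeated startups
    # overwrite earlier values for the same key.
    def encode(value):
        # Value columns are TEXT, so JSON-encode anything that isn't a string
        return value if isinstance(value, str) else json.dumps(value)

    for key, value in metadata.items():
        if key == "databases":
            continue
        conn.execute(
            "INSERT INTO datasette_metadata_instance_entries(key, value) "
            "VALUES (?, ?) ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            (key, encode(value)),
        )
    for db_name, db_meta in metadata.get("databases", {}).items():
        for key, value in db_meta.items():
            if key == "tables":
                continue
            conn.execute(
                "INSERT INTO datasette_metadata_database_entries"
                "(database_name, key, value) VALUES (?, ?, ?) "
                "ON CONFLICT(database_name, key) DO UPDATE SET value = excluded.value",
                (db_name, key, encode(value)),
            )
        for table_name, table_meta in db_meta.get("tables", {}).items():
            for key, value in table_meta.items():
                conn.execute(
                    "INSERT INTO datasette_metadata_resource_entries"
                    "(database_name, resource_name, key, value) VALUES (?, ?, ?, ?) "
                    "ON CONFLICT(database_name, resource_name, key) "
                    "DO UPDATE SET value = excluded.value",
                    (db_name, table_name, key, encode(value)),
                )
    conn.commit()
```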
Proposed SQL + Python API
The "internal tables" that Datasette uses for metadata can be described as follows:
```sql
-- Metadata key/values for the entire Datasette instance
CREATE TABLE datasette_metadata_instance_entries(
  key text,
  value text,
  unique(key)
);

-- Metadata key/values for specific databases
CREATE TABLE datasette_metadata_database_entries(
  database_name text,
  key text,
  value text,
  unique(database_name, key)
);

-- Metadata key/values for specific "resources" (tables, views, canned_queries)
CREATE TABLE datasette_metadata_resource_entries(
  database_name text,
  resource_name text,
  key text,
  value text,
  unique(database_name, resource_name, key)
);

-- Metadata key/values for specific columns
CREATE TABLE datasette_metadata_column_entries(
  database_name text,
  resource_name text,
  column_name text,
  key text,
  value text,
  unique(database_name, resource_name, column_name, key)
);
```

In Python, Datasette core will add the following methods on the Datasette class:
```python
class Datasette:
    # ...
    async def get_instance_metadata(self) -> dict[str, Any]:
        pass

    async def get_database_metadata(self, database_name: str) -> dict[str, Any]:
        pass

    async def get_resource_metadata(self, database_name: str, resource_name: str) -> dict[str, Any]:
        pass

    async def get_column_metadata(self, database_name: str, resource_name: str, column_name: str) -> dict[str, Any]:
        pass
```

These will be used internally by Datasette to wrap the SQL queries to the `datasette_metadata_*` tables. Though maybe plugins can use them as well?
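To make the "wrap the SQL queries" idea concrete, here is a standalone, synchronous sketch of what one of these getters might do internally. A plain `sqlite3` connection stands in for Datasette's `--internal` database, and the real method would of course be async:

```python
import sqlite3


def get_database_metadata(conn: sqlite3.Connection, database_name: str) -> dict:
    # Hypothetical sketch: collect all key/value metadata entries for one
    # database into a dictionary. The real Datasette method would be async
    # and run against the --internal database connection.
    rows = conn.execute(
        "SELECT key, value FROM datasette_metadata_database_entries "
        "WHERE database_name = ?",
        (database_name,),
    )
    return dict(rows)
```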
We could also add `set_*` methods, mainly for plugin authors, so they could avoid writing SQL.
```python
class Datasette:
    # ...
    async def set_instance_metadata(self, key: str, value: str):
        pass

    async def set_database_metadata(self, database_name: str, key: str, value: str):
        pass

    # etc.
```

Consequences
- The `get_metadata()` hook will be deprecated. Instead, plugins can write directly to the `datasette_metadata_*` tables on startup, and update them as they wish (on user request, on a scheduled basis, etc.)
- "Cascading metadata", aka the `fallback` option, will be deprecated. It only really makes sense in narrow use-cases (i.e. licensing an entire database), and plugins could define their own cascading logic if needed.
- Metadata fetching becomes an async operation.
- `metadata.json` can still be supported: it'll just overwrite the `datasette_metadata_*` entries on startup, meaning users will only need to run it once and can then delete their `metadata.json` (provided they include a persistent `--internal` database). Though "overwriting" may have unintended consequences...
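One way to see the "overwriting" caveat: if the startup step upserts each `metadata.json` key, it only replaces keys that `metadata.json` actually defines, so values a plugin wrote under other keys survive, while any colliding key is silently clobbered. A minimal demonstration of that upsert behavior against the proposed instance table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasette_metadata_instance_entries(key text, value text, unique(key))"
)


def upsert(key: str, value: str) -> None:
    # UPSERT against the unique(key) constraint, as the startup step might do
    conn.execute(
        "INSERT INTO datasette_metadata_instance_entries(key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, value),
    )


# A plugin wrote some metadata earlier
upsert("title", "Set by plugin")
upsert("license", "CC0")

# Startup re-applies metadata.json, which only defines "title":
# "title" gets clobbered, "license" survives untouched
upsert("title", "Set by metadata.json")
```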
How 3rd party plugins currently use the get_metadata() hook
There aren't many open-source usages of the `get_metadata()` hook, at least from what I could find via GitHub search. The ones I found:
- `datasette-metadata-editable`: Maintains an in-memory cache (a Python dictionary) that gets populated on `startup()` and refreshed on edits
- `datasette-remote-metadata`: Maintains an in-memory cache (a Python dictionary) that gets updated on a recurring basis
- `datasette-updated`: Reads from an on-disk file at request time
- `datasette-scraper`: Provides metadata for tables that the plugin creates/manages (kind of like shadow tables?)
- `datasette-live-config`
I think all of these use-cases can easily be supported with this new approach: writing to the `datasette_metadata_*` tables on startup, and updating them whenever needed. I'd also say it would simplify much of the code we see here, but only time will tell...
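For example, an editing plugin in the style of datasette-metadata-editable could drop its in-memory cache entirely: each edit would upsert straight into the resource table, and Datasette's own read path would see the change immediately. A hypothetical sketch (plain `sqlite3` connection standing in for the internal database, `save_edit` is an invented name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasette_metadata_resource_entries("
    "database_name text, resource_name text, key text, value text, "
    "unique(database_name, resource_name, key))"
)


def save_edit(database_name: str, resource_name: str, key: str, value: str) -> None:
    # What the plugin's edit handler would do instead of updating a cache:
    # repeated edits to the same key update the row in place
    conn.execute(
        "INSERT INTO datasette_metadata_resource_entries"
        "(database_name, resource_name, key, value) VALUES (?, ?, ?, ?) "
        "ON CONFLICT(database_name, resource_name, key) "
        "DO UPDATE SET value = excluded.value",
        (database_name, resource_name, key, value),
    )


save_edit("fixtures", "facetable", "description", "First draft")
save_edit("fixtures", "facetable", "description", "Edited description")
```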