Skip to content

Conversation

@uranusjr
Copy link
Member

@uranusjr uranusjr commented Aug 13, 2024

A part of AIP-76.

The Dataset class now accepts either name or uri, and fills the other by the provided value if both are not explicitly set.

Although not strictly required by this change, 'name' on DatasetAlias also now received the same parse-time check as Dataset's name and uri fields so they emit the same errors on incorrectly inputs.

This conflicts with #41348. One must be rebased if we merge the other.

TODO:

  • Modify the UI and CLI to show the name instead of/together with the URI.
  • Add tests.

@uranusjr
Copy link
Member Author

uranusjr commented Aug 28, 2024

Status:

  • Added name to Dataset. Model changes and migration added.
  • Both name and uri are optional, but the user must supply at least one of them. There’s a check when a Dataset is created.
  • The database enforces both name and uri must be unique. This means you can’t have two assets (different names) point to the same URI. It makes dataset resolution logic a lot simpler. Probably good enough in most cases?
  • The DAG processor (during bulk_write_to_db) collects all datasets and de-duplicate name and URI values. Currently this is done trivially (just randomly drop one of them). We might want to emit warnings, especially since the previous constraint can be confusing for users in edge cases.

Todo:

  • Fix existing tests. Many tests currently do something like this Dataset(uri). Should we fix all of them, or should we just make the positional argument the URI instead? Currently it’s the name.
  • DatasetAlias’s add interface (in OutletEventAccessor) currently only takes the URI. It should also accept the name and intelligently select the correct Dataset from the string value. Matching name over URI.
  • The yield Metadata interface might have the same issue?
  • Add tests for various name-URI combinations.

The Dataset class now accepts *either* name or uri, and fills the other
by the provided value if both are not explicitly set.

Although not strictly required by this change, 'name' on DatasetAlias
also now received the same parse-time check as Dataset's name and uri
fields so they emit the same errors on incorrectly inputs.
Instead of automatically fill from the other field, try to find in the
database if there's a best matching choice, and use that instead.

Various issues. Going to write them up later...
@uranusjr
Copy link
Member Author

I’ll do a separate PR now that things are refactored (and this conflicts like hell)

@uranusjr uranusjr closed this Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant