
We need a codified way to indicate what datasets are suitable for what tasks #60

@mmcdermott

Description


This is also a necessary component of ensuring viable testing, so that tests don't fail because they examine invalid task/dataset combinations.

There are a few considerations here:

  1. Some datasets are not suitable for certain tasks at a base make-up level (e.g., eICU is at the hospital-stay level, not the patient level, so it should not be used for 30d hospital readmission; MIMIC-IV is a cohort of patients who were at some point admitted to either the ED or ICU, so it is not suitable for 30d general hospital readmission (though we invalidate that now)).
  2. Some datasets are suitable for tasks but are not currently configured for them, because predicates have not been set up by local data owners. From a testing perspective, this means we likely need a way to indicate both "general viability" for dataset × task combinations and the "set-up" dataset × task combinations used in testing.
  3. Some datasets are suitable for tasks but do not permit appropriate predicate definitions without updates to ACES or similar changes. Technically this is a special case of 2, but it requires a different operationalization.
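
One possible minimal encoding of the three cases above is a registry keyed by (dataset, task) that records a status per pair, from which both "general viability" and the "set-up" combinations for tests can be derived. This is only a sketch: `Suitability`, `Entry`, `REGISTRY`, `generally_viable`, `testable_combinations`, and the names `dataset_a`, `dataset_b`, `task_x`, `task_y`, `readmission_30d` are all hypothetical and are not part of MEDS-DEV or ACES.

```python
from dataclasses import dataclass
from enum import Enum


class Suitability(Enum):
    """Hypothetical status for one (dataset, task) pair, mirroring cases 1-3."""

    UNSUITABLE = "unsuitable"  # case 1: dataset make-up rules the task out
    SUITABLE_NOT_CONFIGURED = "suitable_not_configured"  # case 2: predicates not set up yet
    NEEDS_ACES_UPDATE = "needs_aces_update"  # case 3: predicates not expressible yet
    CONFIGURED = "configured"  # viable and set up; safe to exercise in tests


@dataclass(frozen=True)
class Entry:
    status: Suitability
    reason: str = ""


# Hypothetical registry; only the eICU example comes from the issue text,
# the other dataset/task names are placeholders.
REGISTRY: dict[tuple[str, str], Entry] = {
    ("eICU", "readmission_30d"): Entry(
        Suitability.UNSUITABLE, "hospital-stay level, not patient level"
    ),
    ("dataset_a", "task_x"): Entry(Suitability.CONFIGURED),
    ("dataset_a", "task_y"): Entry(
        Suitability.SUITABLE_NOT_CONFIGURED, "predicates not yet defined by data owners"
    ),
    ("dataset_b", "task_x"): Entry(
        Suitability.NEEDS_ACES_UPDATE, "predicate needs an ACES feature"
    ),
}


def generally_viable(dataset: str, task: str) -> bool:
    """True if the pair is known and not ruled out by the dataset's make-up.

    Unlisted pairs are treated as not viable (a conservative default).
    """
    entry = REGISTRY.get((dataset, task))
    return entry is not None and entry.status is not Suitability.UNSUITABLE


def testable_combinations() -> list[tuple[str, str]]:
    """Pairs that are both viable and actually set up, i.e. safe to run in CI."""
    return sorted(
        key for key, entry in REGISTRY.items() if entry.status is Suitability.CONFIGURED
    )
```

A test suite could then feed `testable_combinations()` into something like `pytest.mark.parametrize`, so tests only ever examine configured pairs, while `generally_viable` answers the broader question from case 2. Keeping cases 2 and 3 as separate statuses leaves room to operationalize them differently later, in line with aiming for minimal improvements first.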

We should also aim for a series of minimal, incremental improvements rather than for perfection.

Metadata


Labels

- Blocking External Use: Blocking external use with tools, users, or models.
- Code Cleanliness/Tech Debt
- Datasets: Associated with the curated set of datasets in MEDS-DEV
- Experimental API: Associated with ease of running experiments within the MEDS-DEV framework.
- Tasks: Associated with the curated set of tasks in MEDS-DEV
- Testing: Associated with testing &/or CI practices to ensure validity
- Usability/Interface: Associated with usability by non-expert users
- help wanted: Extra attention is needed
- priority:high: High priority; should be included in subsequent release candidate.
