We need a codified way to indicate what datasets are suitable for what tasks #60
Closed
Labels
- Blocking External Use: Blocking external use with tools, users, or models.
- Code Cleanliness/Tech Debt
- Datasets: Associated with the curated set of datasets in MEDS-DEV
- Experimental API: Associated with ease of running experiments within the MEDS-DEV framework.
- Tasks: Associated with the curated set of tasks in MEDS-DEV
- Testing: Associated with testing &/or CI practices to ensure validity
- Usability/Interface: Associated with usability by non-expert users
- help wanted: Extra attention is needed
- priority:high: High priority; should be included in subsequent release candidate.
Description
This is also necessary for viable testing: without it, tests can fail simply because they examine invalid task and dataset combinations.
There are a few considerations here:
- Some datasets are unsuitable for certain tasks because of their basic make-up. For example, eICU is at the hospital-stay level, not the patient level, so it should not be used for 30d hospital readmission. MIMIC-IV is a cohort of patients who were at some point admitted to either the ED or ICU, so it is not suitable for 30d general hospital readmission (though we invalidate that now).
- Some datasets are suitable for tasks but are not currently configured for them, because local data owners have not set up the predicates. From a testing perspective, this means we likely need a way to indicate both "general viability" for dataset × task combinations and the "set-up" task × dataset combinations that are used in testing.
- Some datasets are suitable for tasks but do not permit appropriate predicate definitions without ACES updates or something similar. This is technically a variant of the previous case, but it requires different operationalization.
We should also aim for a series of minimal improvements rather than aiming for perfection.
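As one possible minimal first step, the three cases above could be recorded in a small lookup table that tests consult before running. The sketch below is only an illustration and not an existing MEDS-DEV API: the `Status` enum, the `COMPATIBILITY` table, the helper functions, the task name `readmission/general_hospital/30d`, and the placeholder datasets `dataset_A`, `dataset_B`, and `dataset_C` are all hypothetical. Only the eICU and MIMIC-IV entries reflect statements made in this issue.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical status for a (dataset, task) pair."""

    UNSUITABLE = "unsuitable"          # dataset make-up rules the task out
    NOT_CONFIGURED = "not_configured"  # viable, but predicates are not set up yet
    BLOCKED = "blocked"                # viable, but predicates need tooling updates
    CONFIGURED = "configured"          # viable and set up, so safe to test


# Hypothetical task identifier, used only for illustration.
READMISSION_30D = "readmission/general_hospital/30d"

# Hypothetical registry. dataset_A/B/C are placeholders, not real datasets.
COMPATIBILITY = {
    ("eICU", READMISSION_30D): Status.UNSUITABLE,
    ("MIMIC-IV", READMISSION_30D): Status.UNSUITABLE,
    ("dataset_A", READMISSION_30D): Status.NOT_CONFIGURED,
    ("dataset_B", READMISSION_30D): Status.BLOCKED,
    ("dataset_C", READMISSION_30D): Status.CONFIGURED,
}


def lookup(dataset: str, task: str) -> Status:
    """Return the recorded status, failing loudly for unlisted pairs."""
    try:
        return COMPATIBILITY[(dataset, task)]
    except KeyError:
        raise KeyError(f"No compatibility entry for {dataset!r} x {task!r}") from None


def is_generally_viable(dataset: str, task: str) -> bool:
    """'General viability': anything not ruled out by the dataset's make-up."""
    return lookup(dataset, task) is not Status.UNSUITABLE


def is_testable(dataset: str, task: str) -> bool:
    """'Set-up' pairs: only fully configured combinations should be tested."""
    return lookup(dataset, task) is Status.CONFIGURED


def testable_pairs() -> list[tuple[str, str]]:
    """All (dataset, task) pairs that tests could safely iterate over."""
    return [pair for pair, status in COMPATIBILITY.items() if status is Status.CONFIGURED]
```

A test suite could then parametrize over `testable_pairs()` so that unsuitable or not-yet-configured combinations are never exercised. Keeping `BLOCKED` separate from `NOT_CONFIGURED` reflects the point that the two cases need different follow-up even though both are generally viable.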