Format, YAML
YAML file
The YAML file (download YAML for SRP265240), distributed in a zip archive, contains the full metadata description of the dataset: how each sample is annotated and how the study as a whole is annotated. It starts with the relevant parts of the annotation trees, followed by the list of selected studies, each containing its selected samples. A number of annotation fields are included for each study and for each sample. The YAML content is shipped as a separate file in the archive and is also embedded in the parquet file as file-level metadata (bundle_yaml_b64_zstd).
Ontology trees
The YAML export includes the simplified ontology trees used for annotation (only entries relevant to the current export are present). Annotations in the dataset later reference the nodes of these trees, which both identifies each annotation uniquely and shows its place in the annotation hierarchy. Curators build the annotation trees at the time of data curation.
ontologies:
  - kind: Anatomy
    root: &a19
      name: Select by
      children:
        - &a20
          name: Whole plant
          children:
            - &a21
              name: Shoot
              children:
                - &a22
                  name: Fruit
                  children:
                    - &a23
                      name: Pericarp
                      children:
                        - &a24
                          name: Fruit wall
                          children:
                            - &a25
                              name: Exocarp
                            - &a26
                              name: Mesocarp
  - kind: Development
    root: &a27
      name: Select by
      children:
        - &a28
          name: "Modified BBCH (Feldmann & Rutikanga, 2021)"
          description: >
            Feldmann, F., & Rutikanga, A. (2021). Phenological growth stages and
            BBCH-identification keys of chilli (Capsicum annuum L., Capsicum chinense Jacq., Capsicum
            baccatum L.). Journal of Plant Diseases and Protection, 128(2), 549–555.
            https://doi.org/10.1007/s41348-020-00395-x
          children:
            - &a29
              name: Fruit development
              children:
                - &a30
                  name: Fruit has reached typical form and size (709)
                  description: >
                    Modified BBCH scale (F.Feldmann & A.Rutikanga, 2021)
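To illustrate how a node's place in the hierarchy can be recovered, here is a minimal Python sketch. It assumes the tree has already been loaded into nested dictionaries with name and children keys, as a YAML parser would typically produce; the helper find_path is hypothetical and not part of the export.

```python
def find_path(node, target, trail=()):
    """Return the names from the root down to the first node called `target`, or None."""
    trail = trail + (node["name"],)
    if node["name"] == target:
        return list(trail)
    # Leaf nodes have no "children" key, so default to an empty list.
    for child in node.get("children", []):
        found = find_path(child, target, trail)
        if found:
            return found
    return None


# Anatomy tree from the excerpt, written as the nested dicts a YAML loader would produce.
anatomy = {"name": "Select by", "children": [
    {"name": "Whole plant", "children": [
        {"name": "Shoot", "children": [
            {"name": "Fruit", "children": [
                {"name": "Pericarp", "children": [
                    {"name": "Fruit wall", "children": [
                        {"name": "Exocarp"}, {"name": "Mesocarp"}]}]}]}]}]}]}

print(" > ".join(find_path(anatomy, "Mesocarp")))
# Select by > Whole plant > Shoot > Fruit > Pericarp > Fruit wall > Mesocarp
```

The search is by name only, so it returns the first match; if the same name can occur in several branches, matching on node identity instead would be safer.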
To specify the exact location in the ontology tree hierarchy and, at the same time, keep the format reasonably compact, YAML anchors and aliases are used: each tree node carries an anchor (written with &), and annotations point to that node with the matching alias (written with *). To make this more human-readable, the name of the anchored tree node is repeated next to the alias as a YAML comment.
studies:
  - sr: SRP265240
    annotation:
      kind: Study
      annotations:
        - *a53 # Pericarp development
        - *a54 # Genotypes
        - *a55 # Fully annotated
    fields:
      library_layout: PAIRED
      submission_ID: SUB7398732; SUB7400708
      submission_date: 2020-05-07
    samples:
      - sr: SRR11873600
        study_sr: SRP265240
        annotations:
          - kind: Anatomy
            annotations:
              - *a31 # Exocarp
              - *a32 # Mesocarp
          - kind: Development
            annotations:
              - *a6 # Fruit has reached typical form and size (709)
              - *a15 # Mature green
              - *a10 # 30 days after anthesis
          - kind: Genotypes
            annotations:
              - *a37 # HJ10-1
              - *a40 # Wildtype
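YAML parsers discard comments, so the readable node names next to the aliases are only visible in the raw text. As a rough sketch, assuming each reference sits on its own list line in the form shown in the excerpt, the alias-to-name pairs can be collected with a regular expression. The function alias_labels is a hypothetical helper, and this is a text-level shortcut, not a substitute for loading the file with a YAML parser.

```python
import re

# Matches list items such as "- *a31 # Exocarp": an alias, then the node name as a comment.
ALIAS_LINE = re.compile(r"^\s*-\s*\*(?P<anchor>\w+)\s*#\s*(?P<name>.+?)\s*$")


def alias_labels(yaml_text: str) -> dict[str, str]:
    """Map each alias id to the node name given in its trailing comment."""
    labels = {}
    for line in yaml_text.splitlines():
        match = ALIAS_LINE.match(line)
        if match:
            labels[match.group("anchor")] = match.group("name")
    return labels


sample_fragment = """\
annotations:
  - kind: Anatomy
    annotations:
      - *a31 # Exocarp
      - *a32 # Mesocarp
"""
print(alias_labels(sample_fragment))
# {'a31': 'Exocarp', 'a32': 'Mesocarp'}
```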
Experiment groups and control
Experiments that study a treatment (stimulus) typically include several replicates of the same stimulus, along with corresponding control samples. This information is recorded in the sample entry, which can be marked as treatment, as control, or (if there is a sequence of treatments) as both.
The assigned number indicates the experimental group: all samples with the same number under treatment are biological replicates of that experiment.
If the keyword control is provided instead, the sample serves as the control and should be compared against the experiment indicated by that number.
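The grouping rule above can be sketched in Python as follows. The excerpt does not show how these markers are serialized, so the record layout here is an assumption: each sample is a dictionary with an sr identifier and optional treatment and control keys holding the group number. The sample identifiers are placeholders, and group_experiments is a hypothetical helper.

```python
from collections import defaultdict


def group_experiments(samples):
    """Collect, per experiment number, the treated replicates and their controls."""
    groups = defaultdict(lambda: {"treatment": [], "control": []})
    for sample in samples:
        # A sample may carry one role or, in a sequence of treatments, both.
        for role in ("treatment", "control"):
            if role in sample:
                groups[sample[role]][role].append(sample["sr"])
    return dict(groups)


# Hypothetical records; the key names and identifiers are assumptions for illustration.
samples = [
    {"sr": "sample_A", "control": 1},
    {"sr": "sample_B", "treatment": 1},
    {"sr": "sample_C", "treatment": 1, "control": 2},  # treated in 1, control for 2
    {"sr": "sample_D", "treatment": 2},
]

for number, members in sorted(group_experiments(samples).items()):
    print(number, members)
# 1 {'treatment': ['sample_B', 'sample_C'], 'control': ['sample_A']}
# 2 {'treatment': ['sample_D'], 'control': ['sample_C']}
```

Samples listed under treatment for the same number are the biological replicates of that experiment, and the samples under control are the ones they should be compared against.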