Format, YAML

The YAML file (download YAML for SRP265240), shipped in a zip archive, contains a full metadata description of the dataset: how each sample is annotated and how the study as a whole is annotated. It holds the relevant parts of the annotation trees, followed by the list of selected studies, each containing its selected samples. Several annotation fields are included both for each study and for each sample. The YAML content is included separately in the archive and is also embedded in the Parquet file as file-level metadata under the key bundle_yaml_b64_zstd.
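The key name bundle_yaml_b64_zstd suggests base64-encoded, zstd-compressed YAML. The sketch below assumes exactly that (the encoding is inferred from the key name, not stated in this section) and relies on the third-party pyarrow and zstandard packages. The helper names are hypothetical. The decompressor is passed in as a parameter so that the decoding step can be tried without zstd installed.

```python
import base64

# Key name as given in this documentation; Parquet key-value metadata uses bytes keys.
BUNDLE_KEY = b"bundle_yaml_b64_zstd"


def zstd_decompress(data: bytes) -> bytes:
    """Decompress one zstd frame with the third-party `zstandard` package."""
    import zstandard  # pip install zstandard

    # decompressobj() works even if the frame header omits the content size.
    return zstandard.ZstdDecompressor().decompressobj().decompress(data)


def decode_bundle_yaml(value: bytes, decompress=zstd_decompress) -> str:
    """Assumed encoding: base64 text wrapping a zstd-compressed UTF-8 YAML document."""
    return decompress(base64.b64decode(value)).decode("utf-8")


def read_bundle_yaml(parquet_path: str) -> str:
    """Return the YAML bundle stored in a Parquet file's key-value metadata."""
    import pyarrow.parquet as pq  # pip install pyarrow

    metadata = pq.read_schema(parquet_path).metadata or {}
    return decode_bundle_yaml(metadata[BUNDLE_KEY])
```

If the assumption holds, read_bundle_yaml("SRP265240.parquet") (a hypothetical file name) should return the same text as the YAML file in the archive.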

The YAML export includes the simplified ontology trees used for annotation (only entries relevant to the current export are present). Annotations later in the dataset reference the nodes of these trees, which both identifies each annotation uniquely and shows its place in the annotation hierarchy. Curators build the annotation trees at the time of data curation.

ontologies:
  - kind: Anatomy
    root: &a19
      name: Select by
      children:
        - &a20
          name: Whole plant
          children:
            - &a21
              name: Shoot
              children:
                - &a22
                  name: Fruit
                  children:
                    - &a23
                      name: Pericarp
                      children:
                        - &a24
                          name: Fruit wall
                          children:
                            - &a25
                              name: Exocarp
                            - &a26
                              name: Mesocarp
  - kind: Development
    root: &a27
      name: Select by
      children:
        - &a28
          name: "Modified BBCH (Feldmann & Rutikanga, 2021)"
          description: >
              Feldmann, F., & Rutikanga, A. (2021). Phenological growth stages and
              BBCH-identification keys of chilli (Capsicum annuum L., Capsicum chinense Jacq., Capsicum
              baccatum L.). Journal of Plant Diseases and Protection, 128(2), 549–555.
              https://doi.org/10.1007/s41348-020-00395-x
          children:
            - &a29
              name: Fruit development
              children:
                - &a30
                  name: Fruit has reached typical form and size (709)
                  description: >
                      Modified BBCH scale (F.Feldmann & A.Rutikanga, 2021)
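Because annotations point at tree nodes, a node's place in the hierarchy can be recovered by walking the tree. A minimal sketch, assuming the YAML has been parsed by a loader that turns each alias into the same object as its anchor (PyYAML's safe_load behaves this way). The nested dicts below hand-copy the Anatomy excerpt above, and path_to is a hypothetical helper. It matches nodes by identity rather than by name, since names such as "Select by" repeat across trees.

```python
from typing import List, Optional

# Hand-copied Anatomy excerpt, shaped as a YAML loader would return it.
exocarp = {"name": "Exocarp"}
mesocarp = {"name": "Mesocarp"}
anatomy_root = {
    "name": "Select by",
    "children": [{
        "name": "Whole plant",
        "children": [{
            "name": "Shoot",
            "children": [{
                "name": "Fruit",
                "children": [{
                    "name": "Pericarp",
                    "children": [{
                        "name": "Fruit wall",
                        "children": [exocarp, mesocarp],
                    }],
                }],
            }],
        }],
    }],
}


def path_to(root: dict, target: dict) -> Optional[List[str]]:
    """Return node names from root down to target, matching by object identity."""
    if root is target:
        return [root["name"]]
    for child in root.get("children", []):
        sub = path_to(child, target)
        if sub is not None:
            return [root["name"]] + sub
    return None  # target is not in this tree


print(" > ".join(path_to(anatomy_root, exocarp)))
# Select by > Whole plant > Shoot > Fruit > Pericarp > Fruit wall > Exocarp
```

With a real export, the object produced by an alias in a sample's annotations list would be passed as target.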

To specify the exact location in the ontology tree hierarchy while keeping the format reasonably compact, the export uses YAML anchors and aliases: each tree node is marked with an anchor (&aNN), and annotations point to it with the matching alias (*aNN). To make this more human-readable, the name of the anchored node is repeated next to each alias as a YAML comment.

studies:
  - sr: SRP265240
    annotation:
      kind: Study
      annotations:
        - *a53 # Pericarp development
        - *a54 # Genotypes
        - *a55 # Fully annotated
    fields:
      library_layout: PAIRED
      submission_ID: SUB7398732; SUB7400708
      submission_date: 2020-05-07
    samples:
      - sr: SRR11873600
        study_sr: SRP265240
        annotations:
          - kind: Anatomy
            annotations:
              - *a31 # Exocarp
              - *a32 # Mesocarp
          - kind: Development
            annotations:
              - *a6 # Fruit has reached typical form and size (709)
              - *a15 # Mature green
              - *a10 # 30 days after anthesis
          - kind: Genotypes
            annotations:
              - *a37 # HJ10-1
              - *a40 # Wildtype
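The comment after each alias is only a reading aid. YAML loaders discard it, so nothing in the format itself guarantees that it matches the anchored node. The heuristic sketch below checks this with regular expressions instead of a real YAML parser. It assumes the layout shown above: an anchor followed on the next line by name:, and aliases written as *aNN # Name. The function names are hypothetical.

```python
import re

# Text-level patterns for the layout shown in this document (not a YAML parser):
# "&aNN" at line end, then "name: ..." on the next line; and "*aNN # Comment".
ANCHOR_RE = re.compile(r"&(a\d+)[ \t]*\n[ \t]*name:[ \t]*(.+)")
ALIAS_RE = re.compile(r"\*(a\d+)[ \t]*#[ \t]*(.+)")


def anchor_names(yaml_text: str) -> dict:
    """Map anchor id -> node name, with surrounding double quotes stripped."""
    return {a: n.strip().strip('"') for a, n in ANCHOR_RE.findall(yaml_text)}


def check_alias_comments(yaml_text: str) -> list:
    """Return (alias, comment, anchored_name) for every alias whose comment disagrees."""
    names = anchor_names(yaml_text)
    problems = []
    for alias, comment in ALIAS_RE.findall(yaml_text):
        actual = names.get(alias)  # None if the anchor was not found
        if actual != comment.strip():
            problems.append((alias, comment.strip(), actual))
    return problems
```

Run on the full YAML from the archive, an empty result would mean every comment agrees with its anchored node.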

Experiments focusing on a treatment (stimulus) typically include several replicates of the same stimulus, along with corresponding control samples. This information is recorded in the sample entry, which can be marked as treatment, control, or (when there is a sequence of treatments) both.

The assigned number identifies the experimental group: all samples marked as treatment with the same number are biological replicates of that experiment.

If the keyword control is given instead, the sample is a control and should be compared against the experiment with that number.
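The exact syntax of these markers is not shown in this section, so the sketch below assumes, hypothetically, that each sample entry may carry an integer under a treatment key, a control key, or both. It collects, per experiment number, the replicate treatment samples and the controls to compare them against. The function name and key names are assumptions.

```python
from collections import defaultdict


def group_experiments(samples):
    """Group sample accessions by experiment number.

    Hypothetical input: dicts with an "sr" accession and optional integer
    "treatment" and/or "control" keys (the real export's keys may differ).
    """
    groups = defaultdict(lambda: {"treatment": [], "control": []})
    for sample in samples:
        for role in ("treatment", "control"):
            if role in sample:
                # Same number under treatment -> biological replicates;
                # same number under control -> controls for that experiment.
                groups[sample[role]][role].append(sample["sr"])
    return dict(groups)
```

A sample marked as both (for example, treated in step 2 and serving as the baseline for step 3) simply appears in two groups.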