Commit 1481fb4

manuelcorpas and claude committed
feat: add WES clinical report skill (EN) + validate ES skill conformance
- New skill: wes-clinical-report-en with English PDF generation (700 lines)
- Rewrite wes-clinical-report-es SKILL.md to pass 17/17 conformance checks
- Fix ES module-level mkdir that blocked imports when /Volumes unmounted
- Add test suites: 23 EN tests + 27 ES tests (50 total, all passing)
- Add demo markdown reports (synthetic 8 P/LP variants, 6 PGx markers)
- Add both skills to CLAUDE.md routing table, CLI reference, and demo data
- YAML inputs/outputs fields added to both SKILL.md files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 98be04b commit 1481fb4
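
The module-level `mkdir` fix mentioned in the commit message follows a common pattern: compute paths at import time, but create directories only when output is actually written. A minimal sketch, assuming hypothetical names (`DEFAULT_OUTPUT_DIR`, `ensure_output_dir`) that are not taken from the commit itself:

```python
import tempfile
from pathlib import Path

# Hypothetical default location; the real script's path is not shown in this view.
DEFAULT_OUTPUT_DIR = Path(tempfile.gettempdir()) / "wes_clinical_report_demo"

# Anti-pattern: running this at module level makes `import` fail whenever the
# parent location (for example an unmounted volume) cannot be written to.
# DEFAULT_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)


def ensure_output_dir(output_dir: Path = DEFAULT_OUTPUT_DIR) -> Path:
    """Create the output directory lazily, right before a report is written."""
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir
```

With this layout the import succeeds even when the target volume is unavailable; any failure surfaces only when a report is about to be written.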

9 files changed

Lines changed: 4151 additions & 1 deletion

File tree

CLAUDE.md

Lines changed: 25 additions & 1 deletion
@@ -38,10 +38,12 @@ When the user asks a question, match it to a skill and act:
 | Fine-mapping, SuSiE, SuSiE-inf, ABF, credible sets, PIP, posterior inclusion probability, causal variant, fine map locus, FINEMAP, polyfun, infinitesimal fine-mapping | `skills/fine-mapping/` | Run `fine_mapping.py` |
 | LLM benchmark, benchmark language models, biobank knowledge retrieval, coverage score, weighted coverage, model comparison biobank, semantic similarity benchmark | `skills/llm-biobank-bench/` | Read SKILL.md, apply methodology |
 | Cell segmentation, nucleus segmentation, microscopy, fluorescence microscopy, cellpose, cpsam, image segmentation, cell counting, segmentation mask | `skills/cell-detection/` | Run `cell_detection.py` |
+| WES clinical report English, exome PDF report, whole exome sequencing report, clinical exome PDF | `skills/wes-clinical-report-en/` | Run `wes_clinical_report_en.py` |
+| WES clinical report Spanish, informe clinico WES, exome PDF espanol, Predice, Inbiomedic, Novogene report | `skills/wes-clinical-report-es/` | Run `wes_clinical_report_es.py` |
 
 ## How to Use a Skill
 
-### Skills with Python scripts (pharmgx-reporter, equity-scorer, nutrigx_advisor, scrna-orchestrator, bio-orchestrator, clinpgx, gwas-prs, gwas-lookup, profile-report, ukb-navigator, galaxy-bridge, rnaseq-de, methylation-clock, protocols-io, soul2dna, genome-match, recombinator, labstep, fine-mapping, cell-detection)
+### Skills with Python scripts (pharmgx-reporter, equity-scorer, nutrigx_advisor, scrna-orchestrator, bio-orchestrator, clinpgx, gwas-prs, gwas-lookup, profile-report, ukb-navigator, galaxy-bridge, rnaseq-de, methylation-clock, protocols-io, soul2dna, genome-match, recombinator, labstep, fine-mapping, cell-detection, wes-clinical-report-en, wes-clinical-report-es)
 1. Read the skill's `SKILL.md` for domain context
 2. Run the Python script with correct CLI arguments (see below)
 3. Show the user the output — open any generated figures and explain results
@@ -189,6 +191,20 @@ python skills/labstep/labstep.py --experiment-id ID
 python skills/labstep/labstep.py --protocols [--search QUERY] [--count N]
 python skills/labstep/labstep.py --protocol-id ID
 python skills/labstep/labstep.py --inventory [--search QUERY]
+
+# WES Clinical Report (English) — professional PDF from WES markdown
+python skills/wes-clinical-report-en/wes_clinical_report_en.py \
+  --report-dir <reports_dir> --output-dir <pdf_dir>
+python skills/wes-clinical-report-en/wes_clinical_report_en.py \
+  --report-dir <reports_dir> --output-dir <pdf_dir> --samples Sample3
+python skills/wes-clinical-report-en/wes_clinical_report_en.py --demo
+
+# WES Clinical Report (Spanish) — informe clinico PDF desde WES markdown
+python skills/wes-clinical-report-es/wes_clinical_report_es.py \
+  --report-dir <reports_dir> --output-dir <pdf_dir>
+python skills/wes-clinical-report-es/wes_clinical_report_es.py \
+  --report-dir <reports_dir> --output-dir <pdf_dir> --samples Sample3
+python skills/wes-clinical-report-es/wes_clinical_report_es.py --demo
 ```
 
 ## Demo Data
@@ -221,6 +237,8 @@ For instant demos when the user has no data:
 | Labstep demo (3 experiments, protocols, inventory) | `--demo` flag | labstep |
 | Fine-mapping demo (200-variant locus, 2 causal signals, SuSiE) | `--demo` flag | fine-mapping |
 | CellposeSAM demo (synthetic 512×512 fluorescence nuclei image, ~67 cells) | `--demo` flag | cell-detection |
+| WES demo report (8 P/LP variants, 6 PGx, synthetic) | `skills/wes-clinical-report-en/examples/demo_WES_Report.md` | wes-clinical-report-en |
+| WES demo report (same, for Spanish output) | `skills/wes-clinical-report-es/examples/demo_WES_Report.md` | wes-clinical-report-es |
 | Corpas 30x chr20 SNPs + indels (WGS) | `corpas-30x/subsets/chr20_snps_indels.vcf.gz` | variant-annotation, equity-scorer |
 | Corpas 30x SV calls (WGS) | `corpas-30x/subsets/sv_calls.vcf.gz` | variant-annotation |
 | Corpas 30x CNV calls (WGS) | `corpas-30x/subsets/cnv_calls.vcf.gz` | variant-annotation |
@@ -307,6 +325,12 @@ python skills/fine-mapping/fine_mapping.py --demo --output /tmp/finemapping_demo
 # CellposeSAM demo
 python skills/cell-detection/cell_detection.py --demo --output /tmp/cell_detection_demo
 
+# WES Clinical Report (English) demo
+python skills/wes-clinical-report-en/wes_clinical_report_en.py --demo
+
+# WES Clinical Report (Spanish) demo
+python skills/wes-clinical-report-es/wes_clinical_report_es.py --demo
+
 ```
 
 ## Development Rules (STRICT)
Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
---
name: wes-clinical-report-en
description: >-
  Generates professional clinical PDF reports in English from WES (Whole Exome
  Sequencing) data with clinical interpretation summary, pharmacogenomic
  alerts, and follow-up recommendations.
version: 1.0.0
author: Manuel Corpas
license: PROPRIETARY
tags: [WES, exome, clinical-report, english, pharmacogenomics, PDF, ANNOVAR]
inputs:
  - name: WES markdown report
    format: markdown (.md)
    required: true
    description: Structured WES report with sections 1-7 (Exome Summary through Methods)
  - name: Logo left
    format: image (JPG/PNG)
    required: false
    description: Left institutional logo for cover and header
  - name: Logo right
    format: image (JPG/PNG)
    required: false
    description: Right institutional logo for cover and header
outputs:
  - name: Clinical PDF report
    format: PDF (A4)
    description: Professional clinical report with interpretation, tables, and disclaimer
metadata:
  openclaw:
    requires:
      bins:
        - python3
      env: []
      config: []
    always: false
    emoji: "🧬"
    homepage: https://github.com/ClawBio/ClawBio
    os: [darwin, linux]
    min_python: "3.9"
    dependencies:
      - reportlab
      - pandas
    private: true
    install: []
    trigger_keywords:
      - WES clinical report
      - exome PDF report english
      - clinical report english
      - Novogene report english
      - WES report PDF
---

# WES Clinical Report (English)

Skill for generating professional clinical PDF reports in English from
whole exome sequencing (WES) data. Designed for Novogene WES data
(GATK + ANNOVAR pipeline) but adaptable to any WES pipeline with
equivalent annotations.

## Trigger

**Fire this skill when the user says any of:**
- "generate WES clinical report in English"
- "English exome PDF report"
- "WES report PDF"
- "clinical report from exome data"
- "Novogene report English"
- "exome clinical PDF"

**Do NOT fire when:**
- User asks for a Spanish report (use `wes-clinical-report-es`)
- User asks for variant annotation only (use `variant-annotation`)
- User asks for ACMG classification only (use `clinical-variant-reporter`)

## Scope

One skill, one task: convert WES markdown reports into professional
English-language clinical PDFs with interpretation.

## Workflow

1. Parse WES markdown report (structured sections 1-7)
2. Extract KPI metrics from Exome Summary
3. Extract pathogenic variants, PGx alerts, rare damaging variants
4. Build interpretive summary paragraph
5. Render all sections as styled PDF with clinical tables
6. Add ancestry estimation (section 8) if data available
7. Add limitations section (section 9)
8. Add disclaimer and report metadata
9. Output PDF to specified directory
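
Steps 1 and 2 of the workflow can be sketched as follows. This is an illustration only, assuming `## N. Title` section headings and a two-column `| Metric | Count |` table; the helper names `split_sections` and `parse_kpis` are hypothetical and are not taken from `wes_clinical_report_en.py`.

```python
import re

# Matches headings such as "## 1. Exome Summary".
SECTION_RE = re.compile(r"^## (\d+)\. (.+)$", re.MULTILINE)


def split_sections(markdown):
    """Map section number -> (title, body) for every '## N. Title' heading."""
    matches = list(SECTION_RE.finditer(markdown))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        body = markdown[match.end():end].strip()
        sections[int(match.group(1))] = (match.group(2).strip(), body)
    return sections


def parse_kpis(section_body):
    """Read two-column table rows into {metric: int}, skipping header rows."""
    kpis = {}
    for line in section_body.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue
        digits = cells[1].replace(",", "")  # "25,432" -> "25432"
        if digits.isdigit():
            kpis[cells[0]] = int(digits)
    return kpis
```

For example, `parse_kpis(split_sections(text)[1][1])` would return the Exome Summary counts as integers, ready to be placed on the cover page.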

## Capabilities

1. **Clinical interpretation summary**: key findings, high-risk PGx alerts,
   prioritised rare variants, clinical follow-up recommendations.
2. **Clinically significant variants**: ClinVar P/LP, ACMG SF v3.2,
   cancer predisposition panel, conflicting variants.
3. **Pharmacogenomics**: CPIC star alleles, clinical effects, affected
   medications with contextualised high-risk alerts.
4. **Fitness and nutrition traits**: genotypes with evidence grades
   (Corpas et al. 2021).
5. **Rare damaging variant prioritisation**: REVEL, CADD, gnomAD AF.
6. **Disease and pathway context**: OMIM, GWAS, COSMIC, KEGG.
7. **Institutional logos**: configurable left/right logos on cover and header.

## Example Output

```
Page 1 (cover):
  [Logo Left] [Logo Right]
  +---------------------------------------------+
  | Whole Exome Sequencing Report [SampleN] |
  | Platform / Reference / Date |
  +---------------------------------------------+
  [KPIs: Total SNPs | Missense | Stopgain | Rare Damaging | ClinVar]

  Results Interpretation
  (auto-generated clinical summary paragraph)

Pages 2+:
  1. Exome Summary
  2. Clinically Significant Variants
  3. Pharmacogenomics
  4. Fitness and Nutrition Traits
  5. Prioritised Rare Damaging Variants
  6. Disease and Pathway Context
  7. Methods
  8. Ancestry Estimation
  9. Limitations
  [Disclaimer]
```

## Usage

```bash
# Generate reports for all samples
python skills/wes-clinical-report-en/wes_clinical_report_en.py \
  --report-dir /path/to/REPORTS/ \
  --output-dir /path/to/PDF-EN/ \
  --logo-left /path/to/logo_left.jpg \
  --logo-right /path/to/logo_right.jpg

# Generate report for a single sample
python skills/wes-clinical-report-en/wes_clinical_report_en.py \
  --report-dir /path/to/REPORTS/ \
  --output-dir /path/to/PDF-EN/ \
  --samples Sample3

# Demo with default Novogene data
python skills/wes-clinical-report-en/wes_clinical_report_en.py --demo
```
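
The flags used above could be declared with `argparse` roughly as follows. This is a sketch under assumptions, not the script's actual parser: the flag names come from the usage examples, while accepting several values for `--samples` and requiring both directories unless `--demo` is given are guesses.

```python
import argparse


def build_parser():
    """Declare the flags that appear in the usage examples."""
    parser = argparse.ArgumentParser(description="WES markdown to clinical PDF (sketch)")
    parser.add_argument("--report-dir", help="directory containing WES markdown reports")
    parser.add_argument("--output-dir", help="directory that receives the PDFs")
    # Assumption: one or more sample names may be listed after --samples.
    parser.add_argument("--samples", nargs="+", default=None)
    parser.add_argument("--logo-left", default=None)
    parser.add_argument("--logo-right", default=None)
    parser.add_argument("--demo", action="store_true")
    return parser


def parse_args(argv=None):
    """Parse argv and enforce the assumed rule for non-demo runs."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.demo and not (args.report_dir and args.output_dir):
        parser.error("--report-dir and --output-dir are required unless --demo is given")
    return args
```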

## Input format

The skill consumes WES reports in markdown format generated by the
analysis pipeline (scripts 02-12 in `ANALYSIS/SCRIPTS/`). Each markdown
report must follow this structure:

```markdown
# Whole Exome Sequencing Report: SampleN
> **Project** ... | **Platform** ... | ...
## 1. Exome Summary
## 2. Clinically Significant Variants
## 3. Pharmacogenomics
## 4. Fitness and Nutrition Traits
## 5. Prioritised Rare Damaging Variants
## 6. Disease and Pathway Context
## 7. Methods
```
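
A pre-flight check against this structure might look like the sketch below. The function name `validate_report` and its return shape are hypothetical; only the title pattern and the seven section titles come from the format shown above.

```python
import re

REQUIRED_SECTIONS = [
    "Exome Summary",
    "Clinically Significant Variants",
    "Pharmacogenomics",
    "Fitness and Nutrition Traits",
    "Prioritised Rare Damaging Variants",
    "Disease and Pathway Context",
    "Methods",
]
TITLE_RE = re.compile(r"^# Whole Exome Sequencing Report: (\S+)", re.MULTILINE)
HEADING_RE = re.compile(r"^## (\d+)\. (.+?)\s*$", re.MULTILINE)


def validate_report(markdown):
    """Return (sample_id or None, list of missing or mismatched elements)."""
    problems = []
    title = TITLE_RE.search(markdown)
    if title is None:
        problems.append("title line")
    found = {int(num): text for num, text in HEADING_RE.findall(markdown)}
    for number, expected in enumerate(REQUIRED_SECTIONS, start=1):
        if found.get(number) != expected:
            problems.append(f"{number}. {expected}")
    return (title.group(1) if title else None), problems
```

An empty problem list means the report has the title line and all seven numbered sections with the expected names.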

## Gotchas

1. **Missing logos are skipped silently**: if a logo file does not exist,
   the report is still generated, but without institutional branding.
   Check the logo paths if branding is expected.
2. **Table truncation**: tables with more than 20 rows are truncated in
   the PDF with a note to consult TSV files. Do not assume all data is
   visible in the PDF.
3. **Ancestry data is optional**: section 8 requires
   `ancestry_results.json` in the ancestry output directory. If absent,
   the section shows "No ancestry data available."
4. **ClinVar classifications are time-sensitive**: the report reflects
   ClinVar state at annotation time. Do not treat classifications as
   permanent.
5. **PGx star alleles from SNVs only**: CYP2D6 CNV analysis is not
   included. Do not claim complete metaboliser phenotyping.
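
The first two gotchas describe behaviour that can be sketched in a few lines. The helper names below are hypothetical and the wording of the truncation note is invented for illustration; only the 20-row threshold and the skip-if-missing behaviour come from the list above.

```python
from pathlib import Path

MAX_TABLE_ROWS = 20  # threshold stated in the table-truncation gotcha


def truncate_rows(rows, limit=MAX_TABLE_ROWS):
    """Return (visible_rows, note); note is None when nothing was cut."""
    if len(rows) <= limit:
        return list(rows), None
    note = f"Showing {limit} of {len(rows)} rows; consult the TSV files for the full table."
    return list(rows[:limit]), note


def existing_logo(path):
    """Return the logo Path if the file exists, otherwise None (no error raised)."""
    if not path:
        return None
    candidate = Path(path)
    return candidate if candidate.is_file() else None
```

Returning the note alongside the rows keeps the truncation visible to the caller, so the PDF can state that the table is incomplete.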

## Safety

ClawBio is a research and educational tool. It is not a medical device
and does not provide clinical diagnoses. Consult a healthcare professional
before making any medical decisions.

## Agent Boundary

The agent dispatches and explains; the skill executes. The agent should
not modify PDF generation logic inline. All report customisation goes
through CLI flags.

## Chaining Partners

- `variant-annotation`: upstream VCF annotation feeding markdown reports
- `clinical-variant-reporter`: ACMG classification for deeper analysis
- `wes-clinical-report-es`: Spanish language version of the same report

## Maintenance

- Review cadence: quarterly (aligned with ClinVar release cycle)
- Staleness signals: ClinVar version drift, CPIC guideline updates
- Deprecation: if WES is superseded by WGS-only clinical pipelines

## Requirements

- Python 3.9+
- reportlab >= 4.0
- WES markdown reports (see input format above)
- Institutional logos in JPG/PNG (optional)

## Privacy

This skill is **private** and not included in the ClawBio public catalog.
It contains institutional report templates that should not be distributed
publicly.

## References

- Corpas et al. (2021) "Whole Genome Interpretation for a Family of Five"
  *Frontiers in Genetics* 12:535123
- CPIC guidelines for pharmacogenomics
- ClinVar / gnomAD / OMIM / COSMIC / KEGG for variant annotation
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
# Whole Exome Sequencing Report: Sample1
> **Project** X202SC26016276-Z01-F001 | **Platform** Illumina NovaSeq PE150 | **Capture** Xplus WES (60.5 Mb) | **Reference** GRCh38/hg38

---

## 1. Exome Summary

| Metric | Count |
|--------|-------|
| Total SNP variants | 25,432 |
| Missense | 11,234 |
| Synonymous | 12,100 |
| Stopgain | 85 |
| Frameshift | 42 |
| Splicing | 120 |
| Loss-of-function (stopgain + frameshift) | 127 |
| Rare coding (gnomAD < 1%) | 3,210 |
| Rare + computationally damaging | 245 |
| ClinVar Pathogenic / Likely Pathogenic | 8 |

The homozygosity to heterozygosity ratio is **2.35**, which is above the expected ~1.5 for outbred populations.

## 2. Clinically Significant Variants

Sample1 carries 8 variant(s) classified as Pathogenic or Likely Pathogenic in ClinVar:

| Gene | Variant | Zygosity | Classification | Consequence | Associated Condition |
|------|---------|----------|----------------|-------------|---------------------|
| BRCA2 | c.5946delT | Het | Pathogenic | frameshift deletion | Hereditary breast/ovarian cancer |
| MUTYH | c.536A>G | Het | Pathogenic | missense SNV | MUTYH-associated polyposis |
| GJB2 | c.35delG | Hom | Pathogenic | frameshift deletion | Deafness, autosomal recessive 1A |
| SERPINA1 | c.1096G>A | Het | Pathogenic | missense SNV | Alpha-1-antitrypsin deficiency |
| HFE | c.845G>A | Het | Pathogenic | missense SNV | Hereditary hemochromatosis |
| DPYD | c.1905+1G>A | Het | Pathogenic/LP | splicing | Dihydropyrimidine dehydrogenase deficiency |
| MEFV | c.2080A>G | Het | Likely Pathogenic | missense SNV | Familial Mediterranean fever |
| GAA | c.525delT | Het | Pathogenic | frameshift deletion | Pompe disease |

An additional 12 variant(s) have conflicting or uncertain classifications.

The most clinically relevant are listed below (coding variants in disease-associated genes):

ACMG SF v3.2 actionable genes: 156 coding variants identified across 73 medically actionable genes, of which 2 have ClinVar P/LP classification.

Cancer predisposition panel: 3 with P/LP classification across 95 cancer predisposition genes.

## 3. Pharmacogenomics

The following pharmacogenomic markers were identified from CPIC-defined star-allele positions. Variants are reported where the genotype differs from the reference allele.

| Gene | Variant | Allele | Zygosity | Clinical Effect | Affected Medications |
|------|---------|--------|----------|-----------------|---------------------|
| CYP2D6 | rs3892097 | *4 | Het | Slow metaboliser | codeine, tramadol, tamoxifen |
| NAT2 | rs1801280 | *5 | Hom | Slow acetylator | isoniazid, hydralazine |
| CYP2C19 | rs4244285 | *2 | Het | Slow metaboliser | clopidogrel, omeprazole |
| SLCO1B1 | rs4149056 | *5 | Het | Decreased transport | simvastatin, atorvastatin |
| VKORC1 | rs9923231 | - | Het | Low-dose warfarin | warfarin |
| CYP1A2 | rs762551 | *1F | Hom | Ultra-rapid (inducible) | caffeine, clozapine |

## 4. Fitness and Nutrition Traits

Genotypes at positions associated with fitness and nutrition traits (Corpas et al. 2021, Tables 3-5). Only markers captured by the WES panel and with non-reference genotypes are shown.

Evidence grades: A = strong replication, B = moderate, C = preliminary.

### Fitness

| Gene | Variant | Trait | Interpretation | Ev. |
|------|---------|-------|----------------|-----|
| ACTN3 | rs1815739 | Muscle fibre type (power vs endurance) | XX - endurance phenotype | A |

### Nutrition

| Gene | Variant | Trait | Interpretation | Ev. |
|------|---------|-------|----------------|-----|
| MTHFR | rs1801133 | Folate metabolism (C677T) | CT - 35% reduced | A |
| GC | rs2282679 | Vitamin D binding protein | Lower vitamin D | A |
| FADS1 | rs174547 | Omega-3 conversion | Poor converter | B |
| TCF7L2 | rs7903146 | Type 2 diabetes risk | Moderate | A |
| ADH1B | rs1229984 | Alcohol metabolism speed | Ultra-rapid | A |
| TAS2R38 | rs713598 | Bitter taste perception | Medium taster | B |

## 5. Prioritised Rare Damaging Variants

245 variants pass all filters: coding, rare (gnomAD AF < 0.01), and computationally predicted damaging (CADD > 20 or REVEL > 0.5). Top 15 ranked by pathogenicity prediction score:

| Gene | Variant | Consequence | Zygosity | REVEL | CADD | gnomAD AF | OMIM Disease |
|------|---------|-------------|----------|-------|------|-----------|-------------|
| ABCA4 | c.5882G>A | missense SNV | Het | 0.92 | 33.0 | 0.0012 | Stargardt disease |
| GJB2 | c.35delG | frameshift deletion | Hom | - | 35.0 | 0.0089 | Deafness, autosomal recessive 1A |
| USH2A | c.2299delG | frameshift deletion | Het | - | 34.0 | 0.0034 | Usher syndrome type 2A |
| CFTR | c.1521_1523delCTT | nonframeshift deletion | Het | - | 32.0 | 0.0078 | Cystic fibrosis |
| ATP7B | c.3207C>A | missense SNV | Het | 0.88 | 29.5 | 0.0015 | Wilson disease |

## 6. Disease and Pathway Context

Across the full variant set: 1,245 variants map to OMIM disease entries, 456 overlap GWAS Catalog associations, and 89 have COSMIC somatic mutation records.

KEGG pathways enriched in rare coding variants:
- hsa04010: MAPK signalling pathway (12 variants)
- hsa04151: PI3K-Akt signalling pathway (9 variants)
- hsa04110: Cell cycle (7 variants)
- hsa04310: Wnt signalling pathway (5 variants)

## 7. Methods

Whole exome sequencing was performed on an Illumina NovaSeq 6000 platform using 150 bp paired-end reads with the Xplus capture kit (60.5 Mb target region). Reads were aligned to the GRCh38/hg38 reference genome using BWA-MEM. Variant calling was performed with GATK HaplotypeCaller v4.3.0 following GATK Best Practices. Functional annotation was performed with ANNOVAR, incorporating ClinVar (2024), gnomAD v3.1.2 (9 population groups), COSMIC, OMIM, SIFT, PolyPhen-2, CADD, REVEL, and 15 additional databases. Pharmacogenomic analysis used CPIC star-allele definitions with evidence enrichment from the ClinPGx API (PharmGKB). Fitness and nutrition trait interpretation followed the evidence framework of Corpas et al. (2021) Frontiers in Genetics 12:535123. Variant prioritisation applied sequential filters: coding consequence, population frequency (gnomAD AF < 0.01), computational pathogenicity (CADD > 20 or REVEL > 0.5).
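
The sequential filters named in the Methods paragraph can be illustrated with a short sketch. The `Variant` class and function names are hypothetical. Treating variants absent from gnomAD as rare and ranking by CADD are assumptions, since the report does not say how missing frequencies are handled or which score drives the ranking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    gene: str
    coding: bool
    gnomad_af: Optional[float]  # None = absent from gnomAD (assumed rare here)
    cadd: Optional[float]
    revel: Optional[float]


def passes_filters(v):
    """Coding, then gnomAD AF < 0.01, then CADD > 20 or REVEL > 0.5."""
    if not v.coding:
        return False
    if v.gnomad_af is not None and v.gnomad_af >= 0.01:
        return False
    cadd_hit = v.cadd is not None and v.cadd > 20
    revel_hit = v.revel is not None and v.revel > 0.5
    return cadd_hit or revel_hit


def prioritise(variants, top_n=15):
    """Keep passing variants, ranked by CADD (assumed ranking key), highest first."""
    kept = [v for v in variants if passes_filters(v)]
    kept.sort(key=lambda v: v.cadd if v.cadd is not None else float("-inf"), reverse=True)
    return kept[:top_n]
```

Applied to the ABCA4 and GJB2 rows from section 5, both variants pass, and GJB2 ranks first under this assumed key because its CADD score is higher.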

*Report prepared by ClawBio WES Analysis Pipeline on 2026-04-05*

> **Disclaimer**: This report is generated for research and educational purposes only. It is not a clinical diagnostic report and should not be used for making medical decisions without consulting a qualified healthcare professional.
