docs: add retriever SDG toolkit dev note #666
Signed-off-by: Steve Han <sthan@nvidia.com>
All contributors have signed the DCO ✍️ ✅
Review: PR #666 (docs: add retriever SDG toolkit dev note)

Summary

Docs-only PR adding a new "Retriever SDG Toolkit" dev note to both the MkDocs and Fern documentation sites. Changes:

The PR is +940 / -0 across 10 files. No code is touched.

Findings

Consistency with existing dev note conventions: good
Slug / filename mismatch: worth a note

The Fern file is

Either is fine; the asymmetry is the smell. The MkDocs post uses a third, longer slug.

External links: plausible but unverifiable from CI

The post links to several external resources that I cannot reach from this runner:
The reranking-recipe link uses the

Code-snippet accuracy

The post documents APIs from the external

The author has noted

Asset duplication
Minor copy notes
Security / sensitive content

Nothing concerning. No secrets, no internal hostnames, no embedded directives that look like injection attempts in the diff.

Verdict

Approve with minor revisions suggested. The PR is a clean, well-structured docs addition that follows existing dev-note conventions for both MkDocs and Fern. The pipeline SVG is well-crafted and accessible.
Optional: align em-dash style with neighboring posts and consider deduplicating
Greptile Summary

This PR adds a new dev note documenting the
| Filename | Overview |
|---|---|
| docs/devnotes/posts/retrieval-sdg-toolkit.md | New MkDocs dev note documenting the retrieval SDG plugin; all code examples, relative links, and author refs resolve correctly. |
| fern/versions/latest/pages/devnotes/posts/retriever-sdg-toolkit.mdx | Fern MDX counterpart of the dev note; slug, author IDs, and absolute asset path all match the registered data. |
| fern/components/devnotes/authors-data.ts | Adds sthan and oliverholworthy; nmulepati and jgreco were already present, so all four authors used in the new post resolve correctly. |
| docs/devnotes/.authors.yml | Adds sthan and oliverholworthy; all authors referenced in the new post already exist in the registry. |
| fern/components/devnotes/.authors.yml | Fern author registry updated consistently with the MkDocs .authors.yml; all four post authors are present. |
| fern/versions/latest/pages/devnotes/index.mdx | New BlogCard inserted first (most-recent ordering); href, authors array, and image asset path are all correct. |
| fern/versions/latest.yml | Adds Retriever SDG Toolkit navigation entry pointing to the correct MDX path, placed first in dev-notes order. |
| mkdocs.yml | Navigation entry added at the top of Dev Notes (most-recent-first), pointing to the correct file path. |
| docs/devnotes/posts/assets/retrieval-sdg-toolkit/pipeline.svg | New accessible SVG pipeline diagram with proper title/desc elements and ARIA attributes; referenced correctly from the MkDocs post. |
| fern/assets/retrieval-sdg-toolkit/pipeline.svg | Identical SVG placed in the Fern asset tree; referenced by the BlogCard image and the MDX post. |
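The table states that the Fern `pipeline.svg` is identical to the MkDocs copy, and the review flags asset duplication. A minimal sketch for confirming that locally, assuming the script is run from the repository root; the helper name `files_identical` is hypothetical and not part of the repo:

```python
import filecmp
from pathlib import Path

# Paths copied from the file table; assumes the current directory is the repo root.
MKDOCS_SVG = Path("docs/devnotes/posts/assets/retrieval-sdg-toolkit/pipeline.svg")
FERN_SVG = Path("fern/assets/retrieval-sdg-toolkit/pipeline.svg")


def files_identical(a: Path, b: Path) -> bool:
    """Hypothetical helper: compare two files byte for byte.

    shallow=False forces a content comparison instead of trusting
    matching os.stat() signatures (type, size, mtime).
    """
    return filecmp.cmp(a, b, shallow=False)


if __name__ == "__main__":
    if MKDOCS_SVG.is_file() and FERN_SVG.is_file():
        print("identical" if files_identical(MKDOCS_SVG, FERN_SVG) else "different")
    else:
        print("SVG not found; run this from the repository root")
```

If it prints `identical`, the duplication is a maintenance concern rather than a correctness one: the two copies can drift apart silently on future edits.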
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Source Documents] --> B[Stage 1: Bundle Docs\nsingle + multi-doc groups]
A --> C[Stage 1: Chunk Docs\nstable segment IDs]
B --> D[Stage 2: Extract Artifacts\nconcepts / entities / links]
C --> E[Stage 2: Generate QA\ngrounded multi-hop questions]
D --> F[Stage 3: Deduplicate\nnear-duplicate queries]
E --> G[Stage 3: Judge Quality\nrelevance / support / clarity]
F --> H[Stage 4: Convert\ntrain/val · BEIR qrels · AutoModel data]
G --> H
Reviews (8): Last reviewed commit: "docs: clarify retriever SDG wording"
MkDocs preview: https://31075f5e.dd-docs-preview.pages.dev
Fern preview: https://nvidia-preview-pr-666.docs.buildwithfern.com/nemo/datadesigner
I have read the DCO document and I hereby sign the DCO.
📋 Summary
Adds a new dev note for the `data-designer-retrieval-sdg` toolkit, explaining why retriever synthetic data generation matters and how the toolkit turns documents into retriever training and BEIR evaluation artifacts.

🔗 Related Issue
N/A
🔄 Changes
🧪 Testing
- `.venv/bin/mkdocs build` passes
- `make check-fern-docs-locally` passes

✅ Checklist