fix(docs): unbreak published Fern site by lbliii · Pull Request #615 · NVIDIA-NeMo/DataDesigner

lbliii · 2026-05-07T18:29:28Z

Summary

The published Fern site at datadesigner.docs.buildwithfern.com/nemo/datadesigner was broken: every page (e.g. /concepts/person-sampling) returned a Server Components render error in production while local previews worked. Two unrelated migration leftovers were the cause.

1. Notebook bundles too large for Fern's SSR bundler

fern/components/notebooks/{5,6}-*.ts shipped at 1.8 MB and 4.6 MB. The image notebooks emit IPython.display.HTML grids containing inline data:image/png;base64,... URIs, which bypassed the existing image/png MIME shrinker in ipynb-to-fern-json.py and were copied verbatim through the text/html branch. The resulting bundles exceeded Fern's hosted RSC payload limit, and because the version bundle is shared, the whole site went down, not just the notebook pages.

Fix: Added shrink_inline_b64_in_html() to the converter so the HTML branch reuses the same 800 px JPEG q=82 path that the bare-image branch already uses. Applied in place to the committed bundles, preserving every other cell output:

| Notebook | Before | After |
| --- | --- | --- |
| 5-generating-images | 1.8 MB | 423 KB |
| 6-editing-images-with-image-context | 4.6 MB | 1.3 MB |

The script change makes future `make generate-fern-notebooks-with-outputs` runs idempotent: full-resolution Flux outputs get downsized at conversion time before they hit any .ts file.
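As a rough illustration of the converter change described above, the sketch below rewrites inline data URIs inside an HTML string by handing each payload to a shrinker callback. The regex is the one quoted in this PR's review thread. The `shrinker` parameter and the `fake_shrinker` stand-in are assumptions made for this example; the real script calls its own `shrink_image_b64`, whose exact signature is not shown in this PR page.

```python
import re
from typing import Callable, Tuple

# Pattern as quoted in the review thread for this PR.
INLINE_DATA_URI_RE = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+?)(?=[\"'\s)])",
    re.IGNORECASE,
)

# Hypothetical callback type: (base64 payload, MIME) -> (new payload, new MIME).
Shrinker = Callable[[str, str], Tuple[str, str]]


def shrink_inline_b64_in_html(html: str, shrinker: Shrinker) -> str:
    """Replace each inline image data URI in `html` with a shrunk version."""

    def _sub(match: "re.Match[str]") -> str:
        subtype = match.group(1).lower()
        mime = "image/png" if subtype == "png" else "image/jpeg"
        payload = "".join(match.group(2).split())  # drop any whitespace
        new_payload, new_mime = shrinker(payload, mime)
        return f"data:{new_mime};base64,{new_payload}"

    return INLINE_DATA_URI_RE.sub(_sub, html)


def fake_shrinker(payload: str, mime: str) -> Tuple[str, str]:
    """Stand-in for a real image shrinker: truncates instead of resizing."""
    return payload[:8], "image/jpeg"


html = '<img src="data:image/png;base64,AAAABBBBCCCCDDDD"><p>caption</p>'
print(shrink_inline_b64_in_html(html, fake_shrinker))
# <img src="data:image/jpeg;base64,AAAABBBB"><p>caption</p>
```

HTML without inline data URIs passes through unchanged, because the pattern never matches.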

2. Leftover MkDocs tab syntax on the agent-rollout-ingestion page

fern/versions/v0.5.8/pages/concepts/agent-rollout-ingestion.mdx still used PyMdown === "Title" tab blocks from the original MkDocs source (missed during #581). Fern's MDX runtime doesn't recognize the syntax. Converted the five tabs to <Tabs> / <Tab title="..."> JSX components, preserving titles, intro text, and code snippets verbatim.
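The tab conversion in this PR was applied to one page by hand as far as the description shows. For reference, a minimal sketch of the same transformation is given below. It assumes the common PyMdown layout (a `=== "Title"` header followed by a body indented four spaces); the function name `pymdown_tabs_to_fern` and the sample text are hypothetical and not part of the repository.

```python
import re

TAB_HEADER_RE = re.compile(r'^=== "(?P<title>[^"]+)"\s*$')


def pymdown_tabs_to_fern(markdown: str) -> str:
    """Convert PyMdown tab blocks into <Tabs>/<Tab title="..."> JSX."""
    lines = markdown.splitlines()
    out = []
    in_tabs = False
    i = 0
    while i < len(lines):
        header = TAB_HEADER_RE.match(lines[i])
        if header is None:
            if in_tabs:
                # First non-tab line after a run of tabs closes the group.
                out.extend(["</Tabs>", ""])
                in_tabs = False
            out.append(lines[i])
            i += 1
            continue
        if not in_tabs:
            out.append("<Tabs>")
            in_tabs = True
        out.append(f'<Tab title="{header.group("title")}">')
        i += 1
        body = []
        # The tab body is every following line that is blank or indented.
        while i < len(lines) and (lines[i].startswith("    ") or not lines[i].strip()):
            body.append(lines[i][4:] if lines[i].startswith("    ") else "")
            i += 1
        while body and not body[-1]:
            body.pop()  # trim trailing blank lines
        out.extend(body)
        out.append("</Tab>")
    if in_tabs:
        out.append("</Tabs>")
    return "\n".join(out)


sample = "\n".join([
    "Pick your agent:",
    "",
    '=== "Claude Code"',
    "",
    "    Illustrative body text for the first tab.",
    "",
    '=== "Codex"',
    "",
    "    Illustrative body text for the second tab.",
    "",
    "Shared closing paragraph.",
])
print(pymdown_tabs_to_fern(sample))
```

The body is dedented to column 0, consecutive tab headers are grouped under a single `<Tabs>` wrapper, and titles are copied through unchanged.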

Test plan

cd fern && fern check — 0 errors.
Verified shrunk base64 payloads start with valid JPEG magic (/9j/4AA...).
Confirmed all non-image cell outputs preserved in notebooks 5 and 6 (only text/html cells with inline base64 were modified).
After merge, confirm datadesigner.docs.buildwithfern.com/nemo/datadesigner/concepts/person-sampling and other pages render.
Confirm tutorial pages for notebooks 5 and 6 show images (now JPEG, ≤800 px longest edge).
Confirm /concepts/agent-rollout-ingestion renders the five tabs correctly.
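The JPEG-magic check in the test plan can be scripted. The sketch below is one way to do it, not the check the author actually ran: it scans a text blob for inline image data URIs, decodes the first few bytes of each payload, and reports any that do not start with the JPEG signature `FF D8 FF`. The helper name `non_jpeg_payloads` and the sample payloads are made up for this example.

```python
import base64
import re

DATA_URI_RE = re.compile(r"data:image/[a-z]+;base64,([A-Za-z0-9+/=]+)", re.IGNORECASE)
JPEG_MAGIC = b"\xff\xd8\xff"


def non_jpeg_payloads(text: str) -> list[str]:
    """Return the first 12 characters of every inline payload that is not JPEG."""
    bad = []
    for match in DATA_URI_RE.finditer(text):
        payload = match.group(1)
        head_chars = payload[:8]  # 8 base64 characters decode to 6 bytes
        if len(head_chars) < 8:
            bad.append(payload[:12])  # too short to be a real image
            continue
        try:
            head = base64.b64decode(head_chars)
        except ValueError:  # malformed base64 (binascii.Error is a ValueError)
            bad.append(payload[:12])
            continue
        if not head.startswith(JPEG_MAGIC):
            bad.append(payload[:12])
    return bad


# Synthetic payloads: only the leading signature bytes are realistic.
jpeg_b64 = base64.b64encode(b"\xff\xd8\xff\xe0" + b"\x00" * 8).decode()
png_b64 = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 4).decode()
bundle_text = (
    f'<img src="data:image/jpeg;base64,{jpeg_b64}">'
    f'<img src="data:image/png;base64,{png_b64}">'
)
print(non_jpeg_payloads(bundle_text))
```

An empty list means every inline payload in the scanned text begins with a JPEG header. This checks only the signature, not that the image decodes or fits within 800 px.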

The image notebooks (5, 6) emit `IPython.display.HTML` blocks containing inline `data:image/png;base64,...` URIs to render side-by-side image grids. Those bypassed the existing `image/png` MIME shrinker and shipped multi-MB strings through the `text/html` branch, producing 1.8 MB and 4.6 MB .ts modules. Fern's hosted SSR bundler couldn't render the version, taking every page down with a Server Components error. Add `shrink_inline_b64_in_html()` so the html branch resizes embedded base64 images through the same 800px JPEG q=82 path the standalone image branch already uses. Apply in-place to the committed bundles: notebook 5 1.8 MB → 423 KB, notebook 6 4.6 MB → 1.3 MB. Other outputs preserved. Signed-off-by: Lawrence Lane <llane@nvidia.com>

The agent-rollout-ingestion concept page still used PyMdown `=== "Title"` tab blocks left over from the MkDocs source. Fern's MDX runtime doesn't recognize the syntax, breaking the published page. Convert the five tab blocks to Fern's <Tabs>/<Tab title="..."> JSX components, preserving titles, intro text, and code snippets verbatim. Signed-off-by: Lawrence Lane <llane@nvidia.com>

github-actions · 2026-05-07T18:31:33Z

Review of PR #615 — `fix(docs): unbreak published Fern site`

Summary

Two targeted fixes to unbreak the hosted Fern site:

fern/scripts/ipynb-to-fern-json.py — adds shrink_inline_b64_in_html() so inline data:image/...;base64,... URIs embedded in IPython.display.HTML outputs are funnelled through the existing shrink_image_b64 pipeline, instead of slipping past the image/png MIME branch and landing in the .ts bundle at full resolution.
fern/versions/v0.5.8/pages/concepts/agent-rollout-ingestion.mdx: converts five leftover PyMdown === "Title" tab blocks to Fern <Tabs>/<Tab> JSX (missed during the MkDocs→Fern migration in #581).
fern/components/notebooks/{5,6}-*.{ts,json} — regenerated artifacts showing the payload drop (1.8 MB → 423 KB, 4.6 MB → 1.3 MB).

Scope is narrow and matches the stated root cause. No runtime code paths are touched.

Findings

`fern/scripts/ipynb-to-fern-json.py`

Correctness: regex terminators look right. INLINE_DATA_URI_RE uses a non-greedy [A-Za-z0-9+/=\s]+? for the payload with a lookahead (?=[\"'\s)]). That covers the three realistic contexts (HTML attribute quotes, whitespace, CSS url(...)). Because the base64 alphabet is A-Za-z0-9+/=, none of the terminator characters can appear inside a contiguous (unwrapped) payload, so the non-greedy match won't prematurely stop mid-URI. Good.
Minor: "idempotent" is slightly overstated (fern/scripts/ipynb-to-fern-json.py:99–108 / summary). shrink_image_b64 unconditionally re-encodes to JPEG q=82 even when max(img.size) <= max_dim. On repeat runs of make generate-fern-notebooks-with-outputs, already-shrunk JPEG inline URIs will match INLINE_DATA_URI_RE again (the regex accepts jpeg) and be decoded and re-encoded, accumulating generational loss. Payload size will stabilize, but visual quality slowly degrades. Cheapest fix: short-circuit in shrink_image_b64 when img.format == "JPEG" and max(img.size) <= max_dim, returning the original (b64, "image/jpeg") untouched. Not blocking for this PR; worth a follow-up issue.
Nit: the lookahead class includes \" but not >. If a future cell emits <img src=data:image/png;base64,AAA...> (unquoted attribute), the URI would be followed by >, which is neither a payload character nor a terminator, so the regex would not match and the image would pass through unshrunk. Current notebook authors always quote attributes, so this is theoretical. Mentioning only because adding > to the terminator class is zero cost.
Missing: no unit test for shrink_inline_b64_in_html. The project generally tests new logic, but this is a build script under fern/scripts/ (outside the three installable packages) and there's no existing test harness for it. Consistent with the neighboring code; acceptable.
Docstring style (fern/scripts/ipynb-to-fern-json.py:90–94). The multi-line docstring here is longer than the STYLEGUIDE one-liner preference, but this matches the existing style of shrink_image_b64 directly above it, which is the right call: consistency with the neighbor wins.
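To make the unquoted-attribute nit concrete, the check below runs the pattern quoted in this thread against a quoted and an unquoted `<img>` tag, then repeats the test with `>` added to the lookahead. The `WIDENED` pattern is only the reviewer's suggestion written out; nothing on this page indicates it was merged. The sample tags are made up for the example.

```python
import re

# Pattern as quoted in the review thread.
ORIGINAL = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+?)(?=[\"'\s)])",
    re.IGNORECASE,
)
# Same pattern with ">" added to the terminator lookahead (suggested follow-up).
WIDENED = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+?)(?=[\"'\s)>])",
    re.IGNORECASE,
)

quoted = '<img src="data:image/png;base64,AAAA">'
unquoted = "<img src=data:image/png;base64,AAAA>"

for name, pattern in (("original", ORIGINAL), ("widened", WIDENED)):
    for label, html in (("quoted", quoted), ("unquoted", unquoted)):
        m = pattern.search(html)
        print(name, label, m.group(2) if m else None)
# original quoted AAAA
# original unquoted None
# widened quoted AAAA
# widened unquoted AAAA
```

With the original pattern, the unquoted URI is not matched at all, because `>` can be neither consumed as payload nor accepted as a terminator. The widened lookahead matches both forms.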

`fern/versions/v0.5.8/pages/concepts/agent-rollout-ingestion.mdx`

Content preserved. Spot-checked all five tab titles (Claude Code, Codex, Hermes Agent, Pi Coding Agent, ATIF), intro prose, and code blocks — titles and Python snippets are verbatim. Link to Harbor's ATIF docs preserved.
Indentation removed is correct for Fern. Old PyMdown syntax required 4-space indented content inside === "Title"; Fern <Tab> children sit at column 0. The diff correctly dedents.
Consistency check. Compared to other pages using <Tabs> under fern/versions/v0.5.8/ — matches the same JSX pattern. Good.

Notebook bundles (`5-*.{ts,json}`, `6-*.{ts,json}`)

Not reviewed line-by-line — these are generator output. Payload reductions in the PR description (1.8→0.4 MB, 4.6→1.3 MB) are consistent with 800 px JPEG q=82 on Flux outputs.
PR author confirms non-image cells are unmodified; the regex is scoped to inline data URIs only, so this is believable from the code.

Risks

Low blast radius. No Python package code touched; change is confined to the fern/ docs pipeline. Cannot regress runtime behavior.
Forward compatibility. The converter change is backward-compatible — notebooks without inline HTML base64 payloads are unaffected (the regex simply never matches).
Rollback is trivial — revert the script change; re-run the generator; previous bundles restored.

Suggestions (non-blocking)

Short-circuit shrink_image_b64 when the decoded image is already JPEG and within size limits, to make the converter truly idempotent across repeat generator runs.
Consider widening the lookahead character class to include > for defense against unquoted HTML attributes in future notebook outputs.
Follow-up issue: add a lightweight test for ipynb-to-fern-json.py (golden-file or regex-level) so future refactors don't silently regress.

Verdict

Approve intent. The fix is well-scoped, root-caused, and matches the PR description. The idempotency nit and regex-terminator nit are minor follow-ups, not blockers. Recommend merging once the test-plan boxes for the hosted site render are checked post-merge.

greptile-apps · 2026-05-07T18:32:31Z

Greptile Summary

This PR fixes a broken Fern hosted docs site caused by two migration leftovers: oversized notebook payloads that crashed Fern's SSR bundler, and leftover MkDocs tab and admonition syntax that Fern's MDX runtime couldn't parse.

Notebook payload shrinking: shrink_inline_b64_in_html() is added to ipynb-to-fern-json.py so that inline data:image/png;base64,... URIs embedded in IPython.display.HTML outputs are downsampled to 800 px JPEG (q=82) via the existing shrink_image_b64 path; notebooks 5 and 6 are regenerated with the fix applied (1.8 MB → 423 KB, 4.6 MB → 1.3 MB).
MDX syntax migration: the five PyMdown === "Title" blocks in agent-rollout-ingestion.mdx are converted to <Tabs>/<Tab title="..."> JSX, and the one !!! tip admonition in default-model-settings.mdx is converted to <Tip>; both are now valid Fern MDX.

Confidence Score: 5/5

Safe to merge — all changes are targeted doc fixes with no impact on library code or data-generation logic.

The script change is a straightforward additive shim that routes an already-validated code path (shrink_image_b64) for a new input source. The MDX edits are pure syntax replacements with content preserved verbatim. No library, API surface, or runtime logic is touched.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| fern/scripts/ipynb-to-fern-json.py | Adds INLINE_DATA_URI_RE regex and shrink_inline_b64_in_html() to intercept base64 images embedded in text/html MIME outputs, routing them through the existing JPEG-downsampler before they reach the .ts bundle. |
| fern/versions/v0.5.8/pages/concepts/agent-rollout-ingestion.mdx | Converts five MkDocs PyMdown === "Title" tab blocks to Fern `<Tabs>`/`<Tab>` JSX; all titles, prose, and code snippets are preserved verbatim. |
| fern/versions/v0.5.8/pages/concepts/models/default-model-settings.mdx | Converts a single MkDocs !!! tip admonition to a Fern `<Tip>` component; content unchanged. |
| fern/components/notebooks/5-generating-images.ts | Auto-generated notebook bundle regenerated after inline-image shrinking; reduced from ~1.8 MB to ~423 KB. |
| fern/components/notebooks/6-editing-images-with-image-context.ts | Auto-generated notebook bundle regenerated after inline-image shrinking; reduced from ~4.6 MB to ~1.3 MB. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Notebook cell output] --> B{output_type?}
    B -- display_data / execute_result --> C{MIME key?}
    C -- image/png --> D[shrink_image_b64\n→ JPEG ≤800px]
    C -- text/html --> E[shrink_inline_b64_in_html]
    E --> F{inline data: URIs\nin HTML?}
    F -- yes --> G[INLINE_DATA_URI_RE.sub\n→ shrink_image_b64 per match]
    F -- no --> H[pass-through HTML]
    G --> I[emit type=text, format=html]
    H --> I
    C -- text/plain --> J[emit type=text, format=plain]
    D --> K[emit type=image]
    B -- stream --> L[emit type=text, format=plain]

Reviews (2): Last reviewed commit: "Merge branch 'main' into lbliii/fix-fern..."

greptile-apps · 2026-05-07T18:32:35Z

+INLINE_DATA_URI_RE = re.compile(
+    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+?)(?=[\"'\s)])",
+    re.IGNORECASE,


Regex whitespace handling is self-contradicting

INLINE_DATA_URI_RE includes \s in the character class so the capture group can span whitespace-wrapped base64, and _sub even calls "".join(match.group(2).split()) to strip that whitespace. However, the lazy quantifier +? combined with \s in the lookahead (?=["'\s)]) means the match terminates at the first whitespace it encounters — so \s in the character class is never reachable and the whitespace-stripping in _sub is always a no-op. If a future notebook emits pretty-printed HTML with line-wrapped base64 (e.g. src='data:image/png;base64,\nAAA...), the regex captures only the empty prefix before the newline, passes garbage to shrink_image_b64, which fails silently and returns an empty string, and the replacement produces a broken data:image/jpeg;base64, URI.

Path: fern/scripts/ipynb-to-fern-json.py, lines 64-66
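The interaction between the lazy quantifier and the `\s` terminator can be checked directly. The sketch below feeds a line-wrapped payload to the pattern quoted in the comment, then to a variant that drops `\s` from the lookahead. The `RELAXED` variant is just one possible adjustment, shown as an assumption rather than the fix that was adopted, and the sample HTML is made up.

```python
import re

# Pattern as quoted in the review comment.
ORIGINAL = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+?)(?=[\"'\s)])",
    re.IGNORECASE,
)
# Hypothetical variant: whitespace no longer terminates the match, so a
# wrapped payload is captured up to the closing quote or parenthesis.
RELAXED = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+?)(?=[\"')])",
    re.IGNORECASE,
)

wrapped = '<img src="data:image/png;base64,AAAA\nBBBB">'


def payload(pattern, html):
    """Return the captured payload with whitespace stripped, or None."""
    m = pattern.search(html)
    return None if m is None else "".join(m.group(2).split())


print(payload(ORIGINAL, wrapped))  # AAAA (cut off at the newline)
print(payload(RELAXED, wrapped))   # AAAABBBB
```

The trade-off is that the relaxed pattern depends on a quote or parenthesis to end the URI. An unquoted attribute followed by a space could then over-match into neighboring attribute text, so this variant only suits output where attributes are always quoted.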

lbliii added 2 commits May 7, 2026 14:28

lbliii requested a review from a team as a code owner May 7, 2026 18:29

lbliii temporarily deployed to agentic-ci May 7, 2026 18:29 — with GitHub Actions Inactive

greptile-apps Bot reviewed May 7, 2026

fix(docs): convert leftover MkDocs admonition to Fern Tip

afa7f68

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>

andreatgretel approved these changes May 7, 2026

Merge branch 'main' into lbliii/fix-fern-published-docs

8340272

andreatgretel merged commit fba8f0b into main May 7, 2026
49 checks passed

andreatgretel deleted the lbliii/fix-fern-published-docs branch May 7, 2026 21:28

andreatgretel merged 4 commits into main from lbliii/fix-fern-published-docs
