fix(docs): unbreak published Fern site #615
The image notebooks (5, 6) emit `IPython.display.HTML` blocks containing inline `data:image/png;base64,...` URIs to render side-by-side image grids. Those bypassed the existing `image/png` MIME shrinker and shipped multi-MB strings through the `text/html` branch, producing 1.8 MB and 4.6 MB .ts modules. Fern's hosted SSR bundler couldn't render the version, taking every page down with a Server Components error. Add `shrink_inline_b64_in_html()` so the html branch resizes embedded base64 images through the same 800px JPEG q=82 path the standalone image branch already uses. Apply in place to the committed bundles: notebook 5 1.8 MB → 423 KB, notebook 6 4.6 MB → 1.3 MB. Other outputs preserved.

Signed-off-by: Lawrence Lane <llane@nvidia.com>
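A minimal sketch of the HTML-branch idea, not the script's actual code. The shrinker is passed in as a hypothetical `shrink` callable standing in for `shrink_image_b64` (its real signature isn't shown in this PR page), and the pattern is a greedy variant of the shipped `INLINE_DATA_URI_RE` that assumes every inline image is closed by a quote or `)`.

```python
import re
from typing import Callable

# Assumption: each inline image is terminated by a quote or ")" (a quoted src
# attribute or a CSS url(...)). The greedy class may span line-wrapped base64;
# the whitespace is stripped before the payload is shrunk.
INLINE_DATA_URI_RE = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+)(?=[\"')])",
    re.IGNORECASE,
)


def shrink_inline_b64_in_html(html: str, shrink: Callable[[str], str]) -> str:
    """Rewrite every inline base64 image in `html` as a shrunk JPEG data URI.

    `shrink` is a hypothetical stand-in for the converter's image shrinker: it
    takes raw base64 image data and returns base64 JPEG data, or "" on failure.
    """

    def _sub(match: re.Match) -> str:
        payload = "".join(match.group(2).split())  # drop embedded newlines/spaces
        shrunk = shrink(payload)
        if not shrunk:
            # Keep the original URI instead of emitting an empty data: URI.
            return match.group(0)
        return f"data:image/jpeg;base64,{shrunk}"

    return INLINE_DATA_URI_RE.sub(_sub, html)
```

On a shrink failure this sketch keeps the original URI rather than emitting an empty `data:image/jpeg;base64,` one. That is a design choice of the sketch: a failed image stays full-size instead of disappearing from the rendered grid.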
The agent-rollout-ingestion concept page still used PyMdown `=== "Title"` tab blocks left over from the MkDocs source. Fern's MDX runtime doesn't recognize the syntax, breaking the published page. Convert the five tab blocks to Fern's `<Tabs>`/`<Tab title="...">` JSX components, preserving titles, intro text, and code snippets verbatim.

Signed-off-by: Lawrence Lane <llane@nvidia.com>
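The commit doesn't say whether the conversion was scripted. As a hypothetical sketch of the mechanical transformation, assuming top-level `=== "Title"` headers with 4-space-indented bodies (the PyMdown convention) and no nested tabs, the following rewrites each run of tab blocks into one `<Tabs>` group of `<Tab title="...">` elements. `convert_pymdown_tabs` is an illustrative name, not a tool in the repo.

```python
from __future__ import annotations

import re

# Hypothetical converter from PyMdown tab blocks to Fern <Tabs>/<Tab> JSX.
# Assumes top-level `=== "Title"` headers whose bodies are indented by 4 spaces
# (the PyMdown convention); nested tabs are not handled.
TAB_HEADER_RE = re.compile(r'^=== "(?P<title>[^"]*)"\s*$')
INDENT = "    "


def convert_pymdown_tabs(text: str) -> str:
    lines = text.splitlines()
    out: list[str] = []
    i = 0
    while i < len(lines):
        if not TAB_HEADER_RE.match(lines[i]):
            out.append(lines[i])
            i += 1
            continue
        out.append("<Tabs>")
        # Consecutive headers, separated only by their bodies, form one group.
        while i < len(lines) and (m := TAB_HEADER_RE.match(lines[i])):
            i += 1
            body: list[str] = []
            # A tab body is every following line that is blank or indented.
            while i < len(lines) and (not lines[i].strip() or lines[i].startswith(INDENT)):
                body.append(lines[i][len(INDENT):])  # dedent one level
                i += 1
            # Trim surrounding blank lines, keep blank lines inside the body.
            while body and not body[0].strip():
                body.pop(0)
            while body and not body[-1].strip():
                body.pop()
            out += [f'<Tab title="{m["title"]}">', "", *body, "", "</Tab>"]
        out.append("</Tabs>")
        if i < len(lines):
            out.append("")  # trailing blank lines were consumed as tab body
    return "\n".join(out)
```

The blank lines around each body are a conservative choice so the dedented Markdown and code fences sit on their own lines inside the JSX. A real migration would also need to skip `===` lines that appear inside fenced code.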
Review of PR #615

Greptile Summary

This PR fixes the broken Fern-hosted docs site caused by two migration leftovers: oversized notebook payloads that crashed Fern's SSR bundler, and leftover MkDocs tab and admonition syntax that Fern's MDX runtime couldn't parse.

| Filename | Overview |
|---|---|
| fern/scripts/ipynb-to-fern-json.py | Adds the `INLINE_DATA_URI_RE` regex and `shrink_inline_b64_in_html()` to intercept base64 images embedded in `text/html` MIME outputs, routing them through the existing JPEG downsampler before they reach the .ts bundle. |
| fern/versions/v0.5.8/pages/concepts/agent-rollout-ingestion.mdx | Converts five MkDocs PyMdown `=== "Title"` tab blocks to Fern `<Tabs>`/`<Tab>` JSX components; all titles, prose, and code snippets are preserved verbatim. |
| fern/versions/v0.5.8/pages/concepts/models/default-model-settings.mdx | Converts a single MkDocs `!!! tip` admonition to a Fern component; content unchanged. |
| fern/components/notebooks/5-generating-images.ts | Auto-generated notebook bundle regenerated after inline-image shrinking; reduced from ~1.8 MB to ~423 KB. |
| fern/components/notebooks/6-editing-images-with-image-context.ts | Auto-generated notebook bundle regenerated after inline-image shrinking; reduced from ~4.6 MB to ~1.3 MB. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Notebook cell output] --> B{output_type?}
B -- display_data / execute_result --> C{MIME key?}
C -- image/png --> D[shrink_image_b64\n→ JPEG ≤800px]
C -- text/html --> E[shrink_inline_b64_in_html]
E --> F{inline data: URIs\nin HTML?}
F -- yes --> G[INLINE_DATA_URI_RE.sub\n→ shrink_image_b64 per match]
F -- no --> H[pass-through HTML]
G --> I[emit type=text, format=html]
H --> I
C -- text/plain --> J[emit type=text, format=plain]
D --> K[emit type=image]
B -- stream --> L[emit type=text, format=plain]
```
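As a sketch of the dispatch the flowchart describes, assuming nbformat's output fields (`output_type`, `data`, `text`) and a MIME priority of PNG, then HTML, then plain text, which the flowchart doesn't state. `shrink_image` and `shrink_html` are hypothetical parameters standing in for `shrink_image_b64` and `shrink_inline_b64_in_html`, and the returned dicts only mimic the `type`/`format` labels in the diagram; they are not the script's actual output schema.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


def _text(value: Any) -> str:
    # nbformat stores multiline strings either as one string or a list of lines.
    return "".join(value) if isinstance(value, list) else value


def convert_output(
    output: dict[str, Any],
    shrink_image: Callable[[str], str],
    shrink_html: Callable[[str], str],
) -> Optional[dict[str, str]]:
    """Map one nbformat cell output to a hypothetical Fern-side record."""
    kind = output.get("output_type")
    if kind == "stream":
        return {"type": "text", "format": "plain", "content": _text(output.get("text", ""))}
    if kind in ("display_data", "execute_result"):
        data = output.get("data", {})
        # Assumed priority: image/png, then text/html, then text/plain.
        if "image/png" in data:
            return {"type": "image", "content": shrink_image(_text(data["image/png"]))}
        if "text/html" in data:
            return {"type": "text", "format": "html", "content": shrink_html(_text(data["text/html"]))}
        if "text/plain" in data:
            return {"type": "text", "format": "plain", "content": _text(data["text/plain"])}
    return None  # e.g. "error" outputs or unhandled MIME bundles
```

Because `IPython.display.HTML` outputs carry both `text/html` and a `text/plain` repr, checking HTML before plain text is what routes the image grids through the HTML shrinker in this sketch.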
Reviews (2): Last reviewed commit: "Merge branch 'main' into lbliii/fix-fern..."
```python
import re

INLINE_DATA_URI_RE = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+?)(?=[\"'\s)])",
    re.IGNORECASE,
)
```
Regex whitespace handling is self-contradicting
`INLINE_DATA_URI_RE` includes `\s` in the character class so the capture group can span whitespace-wrapped base64, and `_sub` even calls `"".join(match.group(2).split())` to strip that whitespace. However, the lazy quantifier `+?` combined with `\s` in the lookahead `(?=["'\s)])` means the match terminates at the first whitespace after the first payload character, so `\s` in the character class can only ever match a single leading whitespace character, and the whitespace-stripping in `_sub` never sees an embedded line break. If a future notebook emits pretty-printed HTML with line-wrapped base64 (e.g. `src='data:image/png;base64,\nAAA...`), the regex captures only the first line of the payload, passes truncated base64 to `shrink_image_b64`, which fails silently and returns an empty string, and the replacement produces a broken `data:image/jpeg;base64,` URI with the remaining base64 lines left dangling after it.
Path: fern/scripts/ipynb-to-fern-json.py
Line: 64-66
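Running the shipped pattern against a line-wrapped payload shows the truncation directly. The greedy variant is one possible fix, assuming every inline image is closed by a quote or `)`; it is an illustration, not necessarily the change that followed this review.

```python
import re

# Pattern as shipped in the PR: lazy class, whitespace also ends the match.
SHIPPED_RE = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+?)(?=[\"'\s)])",
    re.IGNORECASE,
)
# One possible fix (an assumption, not necessarily what landed): greedy class,
# and only a quote or ")" may end the payload.
GREEDY_RE = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+)(?=[\"')])",
    re.IGNORECASE,
)

wrapped = "<img src='data:image/png;base64,\nAAAA\nBBBB\n'>"

shipped = SHIPPED_RE.search(wrapped).group(2)
greedy = GREEDY_RE.search(wrapped).group(2)
print(repr(shipped))                  # '\nAAAA'  (truncated after the first line)
print(repr("".join(greedy.split())))  # 'AAAABBBB' (whole payload, whitespace removed)
```

Both patterns behave identically on single-line payloads. Both also assume quoted attributes: with an unquoted `src=`, the greedy one can run on into the following attribute.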
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Summary
The published Fern site at `datadesigner.docs.buildwithfern.com/nemo/datadesigner` was broken: every page (e.g. `/concepts/person-sampling`) returned a Server Components render error in production while local previews worked. Two unrelated migration leftovers were the cause.

1. Notebook bundles too large for Fern's SSR bundler
`fern/components/notebooks/{5,6}-*.ts` shipped at 1.8 MB and 4.6 MB. The image notebooks emit `IPython.display.HTML` grids containing inline `data:image/png;base64,...` URIs, which bypassed the existing `image/png` MIME shrinker in `ipynb-to-fern-json.py` and were copied verbatim through the `text/html` branch. The resulting modules exceeded Fern's hosted RSC payload limit, and because the version bundle is shared, the whole site went down, not just the notebook pages.

Fix: Added `shrink_inline_b64_in_html()` to the converter so the HTML branch reuses the same 800 px JPEG q=82 path that the bare-image branch already uses. Applied in place to the committed bundles, preserving every other cell output:

| Bundle | Before | After |
|---|---|---|
| `5-generating-images.ts` | 1.8 MB | 423 KB |
| `6-editing-images-with-image-context.ts` | 4.6 MB | 1.3 MB |

The script change makes future `make generate-fern-notebooks-with-outputs` runs idempotent: full-resolution Flux outputs get downsized at conversion time, before they reach any `.ts` bundle.

2. Leftover MkDocs tab syntax on the agent-rollout-ingestion page
`fern/versions/v0.5.8/pages/concepts/agent-rollout-ingestion.mdx` still used PyMdown `=== "Title"` tab blocks from the original MkDocs source (missed during #581). Fern's MDX runtime doesn't recognize the syntax. Converted the five tabs to `<Tabs>`/`<Tab title="...">` JSX components, preserving titles, intro text, and code snippets verbatim.

Test plan
- `cd fern && fern check`: 0 errors.
- … (`/9j/4AA...`).
- … (`text/html` cells with inline base64 were modified).
- … `datadesigner.docs.buildwithfern.com/nemo/datadesigner/concepts/person-sampling` and other pages render.
- `/concepts/agent-rollout-ingestion` renders the five tabs correctly.
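The image-format checks in the test plan can be approximated with a small script. This is a hypothetical helper, not part of the PR, and it assumes the bundles contain inline images as literal `data:image/...;base64,` strings; images stored as bare base64 outside a data URI are not scanned.

```python
from __future__ import annotations

import re
from pathlib import Path

# Declared subtype plus the first four base64 characters of each inline image.
DATA_URI_RE = re.compile(r"data:image/([a-z]+);base64,([A-Za-z0-9+/=]{4})", re.IGNORECASE)
# "/9j/" is the base64 encoding of bytes FF D8 FF: the JPEG SOI marker (FF D8)
# followed by the first byte of the next marker.
JPEG_B64_PREFIX = "/9j/"


def inline_image_problems(text: str) -> list[str]:
    """Describe every inline data: URI that is not a JPEG payload."""
    problems = []
    for m in DATA_URI_RE.finditer(text):
        subtype, head = m.group(1).lower(), m.group(2)
        if subtype not in ("jpeg", "jpg") or head != JPEG_B64_PREFIX:
            problems.append(f"offset {m.start()}: image/{subtype}, payload starts {head!r}")
    return problems


def check_bundles(paths: list[str]) -> bool:
    """Print size and inline-image findings per bundle; True if none were flagged."""
    ok = True
    for path in map(Path, paths):
        problems = inline_image_problems(path.read_text(encoding="utf-8"))
        print(f"{path}: {path.stat().st_size / 1024:.0f} KB, {len(problems)} non-JPEG inline images")
        ok = ok and not problems
    return ok
```

For example, `check_bundles(["fern/components/notebooks/5-generating-images.ts", "fern/components/notebooks/6-editing-images-with-image-context.ts"])` prints one line per bundle and returns `False` if any non-JPEG inline payload survived the conversion.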