
fix(docs): unbreak published Fern site #615

Merged
andreatgretel merged 4 commits into main from lbliii/fix-fern-published-docs
May 7, 2026


@lbliii lbliii commented May 7, 2026


Summary

The published Fern site at datadesigner.docs.buildwithfern.com/nemo/datadesigner was broken: every page (e.g. /concepts/person-sampling) returned a Server Components render error in production while local previews worked. Two unrelated migration leftovers were the cause.

1. Notebook bundles too large for Fern's SSR bundler

fern/components/notebooks/{5,6}-*.ts shipped at 1.8 MB and 4.6 MB. The image notebooks emit IPython.display.HTML grids containing inline data:image/png;base64,... URIs, which bypassed the existing image/png MIME shrinker in ipynb-to-fern-json.py and were copied verbatim through the text/html branch. The resulting bundles exceeded Fern's hosted RSC payload limit, and because the version bundle is shared, the whole site went down, not just the notebook pages.

Fix: Added shrink_inline_b64_in_html() to the converter so the HTML branch reuses the same 800 px JPEG q=82 path that the bare-image branch already uses. Applied in place to the committed bundles, preserving every other cell output:

| Notebook | Before | After |
| --- | --- | --- |
| 5-generating-images | 1.8 MB | 423 KB |
| 6-editing-images-with-image-context | 4.6 MB | 1.3 MB |

The script change makes future make generate-fern-notebooks-with-outputs runs idempotent — full-resolution Flux outputs get downsized at conversion time before they hit any .ts.
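
For readers who have not opened the converter, here is a minimal sketch of how such a substitution could work. The regex is the one quoted later in the review thread. The `shrink` parameter and the `fake_shrink` stand-in are assumptions made for illustration: the real script reportedly routes matches through `shrink_image_b64`, whose exact signature is not shown in this PR.

```python
import re
from typing import Callable, Tuple

# Pattern as quoted in the review comment on lines +64 to +66.
INLINE_DATA_URI_RE = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+?)(?=[\"'\s)])",
    re.IGNORECASE,
)

# Assumed shape of the shrinker: base64 in, (base64, mime type) out.
Shrinker = Callable[[str], Tuple[str, str]]


def shrink_inline_b64_in_html(html: str, shrink: Shrinker) -> str:
    """Rewrite every inline image data URI in `html` using `shrink`."""

    def _sub(match: re.Match) -> str:
        # Drop any whitespace captured inside the payload before shrinking.
        payload = "".join(match.group(2).split())
        new_b64, mime = shrink(payload)
        return f"data:{mime};base64,{new_b64}"

    return INLINE_DATA_URI_RE.sub(_sub, html)


def fake_shrink(b64: str) -> Tuple[str, str]:
    """Hypothetical stand-in: a real shrinker would decode, resize, re-encode."""
    return b64[:8], "image/jpeg"


if __name__ == "__main__":
    cell_html = '<img src="data:image/png;base64,' + "A" * 40 + '">'
    print(shrink_inline_b64_in_html(cell_html, fake_shrink))
```

Because the terminator is a lookahead, the closing quote is not consumed, so the surrounding HTML stays intact and only the URI is replaced.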

2. Leftover MkDocs tab syntax on the agent-rollout-ingestion page

fern/versions/v0.5.8/pages/concepts/agent-rollout-ingestion.mdx still used PyMdown === "Title" tab blocks from the original MkDocs source (missed during #581). Fern's MDX runtime doesn't recognize the syntax. Converted the five tabs to <Tabs> / <Tab title="..."> JSX components, preserving titles, intro text, and code snippets verbatim.
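
The PR does not say whether the tabs were converted by hand or by script. As an illustration of the transformation only, the following sketch rewrites PyMdown tab blocks into `<Tabs>` / `<Tab title="...">` and dedents the 4-space body. The function name `convert_tabs` is hypothetical, and the sketch assumes well-formed input with no nested tabs.

```python
import re

# Matches a PyMdown tab header such as: === "Claude Code"
TAB_HEADER_RE = re.compile(r'^=== "(?P<title>[^"]+)"\s*$')


def convert_tabs(markdown: str) -> str:
    """Convert PyMdown `=== "Title"` blocks to <Tabs>/<Tab> JSX (sketch)."""
    out = []
    in_tabs = False  # currently inside a <Tabs> group
    in_tab = False   # currently inside a <Tab>
    for line in markdown.splitlines():
        header = TAB_HEADER_RE.match(line)
        if header:
            if in_tab:
                out.append("</Tab>")  # close the previous tab in the group
            if not in_tabs:
                out.append("<Tabs>")
                in_tabs = True
            out.append(f'<Tab title="{header.group("title")}">')
            in_tab = True
        elif in_tab and (line.startswith("    ") or not line.strip()):
            # Tab body: strip the 4-space indent, keep blank lines.
            out.append(line[4:] if line.startswith("    ") else "")
        else:
            if in_tab:
                # First unindented line ends the whole tab group.
                out.extend(["</Tab>", "</Tabs>"])
                in_tab = in_tabs = False
            out.append(line)
    if in_tab:
        out.extend(["</Tab>", "</Tabs>"])
    return "\n".join(out)
```

Consecutive tab headers are grouped under a single `<Tabs>` wrapper, and the group closes at the first non-indented, non-blank line.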

Test plan

  • cd fern && fern check — 0 errors.
  • Verified shrunk base64 payloads start with valid JPEG magic (/9j/4AA...).
  • Confirmed all non-image cell outputs preserved in notebooks 5 and 6 (only text/html cells with inline base64 were modified).
  • After merge, confirm datadesigner.docs.buildwithfern.com/nemo/datadesigner/concepts/person-sampling and other pages render.
  • Confirm tutorial pages for notebooks 5 and 6 show images (now JPEG, ≤800 px longest edge).
  • Confirm /concepts/agent-rollout-ingestion renders the five tabs correctly.
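
The JPEG-magic check in the test plan can be reproduced with the standard library. This is a small sketch, not the check the author ran: `looks_like_jpeg` is a hypothetical helper. It relies on the fact that JPEG files begin with the bytes FF D8 FF, which base64-encode to `/9j/`.

```python
import base64
import binascii

JPEG_MAGIC = b"\xff\xd8\xff"  # start-of-image marker plus first marker byte


def looks_like_jpeg(b64_payload: str) -> bool:
    """Return True if a base64 payload decodes to bytes starting with JPEG magic."""
    if len(b64_payload) < 4:
        return False
    try:
        # Four base64 characters decode to exactly three bytes.
        head = base64.b64decode(b64_payload[:4])
    except binascii.Error:
        return False
    return head == JPEG_MAGIC


if __name__ == "__main__":
    print(looks_like_jpeg("/9j/4AAQSkZJRg"))  # typical JFIF prefix
    print(looks_like_jpeg("iVBORw0KGgo"))     # typical PNG prefix
```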

lbliii added 2 commits May 7, 2026 14:28
The image notebooks (5, 6) emit `IPython.display.HTML` blocks containing
inline `data:image/png;base64,...` URIs to render side-by-side image
grids. Those bypassed the existing `image/png` MIME shrinker and shipped
multi-MB strings through the `text/html` branch, producing 1.8 MB and
4.6 MB .ts modules. Fern's hosted SSR bundler couldn't render the
version, taking every page down with a Server Components error.

Add `shrink_inline_b64_in_html()` so the html branch resizes embedded
base64 images through the same 800px JPEG q=82 path the standalone
image branch already uses. Apply in-place to the committed bundles:
notebook 5 1.8 MB → 423 KB, notebook 6 4.6 MB → 1.3 MB. Other outputs
preserved.

Signed-off-by: Lawrence Lane <llane@nvidia.com>

The agent-rollout-ingestion concept page still used PyMdown
`=== "Title"` tab blocks left over from the MkDocs source. Fern's MDX
runtime doesn't recognize the syntax, breaking the published page.
Convert the five tab blocks to Fern's <Tabs>/<Tab title="..."> JSX
components, preserving titles, intro text, and code snippets verbatim.

Signed-off-by: Lawrence Lane <llane@nvidia.com>
@lbliii lbliii requested a review from a team as a code owner May 7, 2026 18:29

github-actions Bot commented May 7, 2026


Review of PR #615: fix(docs): unbreak published Fern site

Summary

Two targeted fixes to unbreak the hosted Fern site, plus the regenerated notebook artifacts:

  1. fern/scripts/ipynb-to-fern-json.py: adds shrink_inline_b64_in_html() so inline data:image/...;base64,... URIs embedded in IPython.display.HTML outputs are funnelled through the existing shrink_image_b64 pipeline, instead of slipping past the image/png MIME branch and landing in the .ts bundle at full resolution.
  2. fern/versions/v0.5.8/pages/concepts/agent-rollout-ingestion.mdx: converts five leftover PyMdown === "Title" tab blocks to Fern <Tabs>/<Tab> JSX (missed during the MkDocs→Fern migration in #581, "docs: migrate documentation from MkDocs to Fern").
  3. fern/components/notebooks/{5,6}-*.{ts,json}: regenerated artifacts showing the payload drop (1.8 MB → 423 KB, 4.6 MB → 1.3 MB).

Scope is narrow and matches the stated root cause. No runtime code paths are touched.

Findings

fern/scripts/ipynb-to-fern-json.py

  • Correctness — regex terminators look right. INLINE_DATA_URI_RE uses a non-greedy [A-Za-z0-9+/=\s]+? for the payload with a lookahead (?=[\"'\s)]). That covers the three realistic contexts (HTML attribute quotes, whitespace, CSS url(...)). Because base64 alphabet is A-Za-z0-9+/=, none of the terminator characters can appear inside the payload, so the non-greedy match won't prematurely stop mid-URI. Good.
  • Minor — "idempotent" is slightly overstated (fern/scripts/ipynb-to-fern-json.py:99–108 / summary). shrink_image_b64 unconditionally re-encodes to JPEG q=82 even when max(img.size) <= max_dim. On repeat runs of make generate-fern-notebooks-with-outputs, already-shrunk JPEG inline URIs will match INLINE_DATA_URI_RE again (the regex accepts jpeg) and be decoded + re-encoded, accumulating generational loss. Payload size will stabilize, but visual quality slowly degrades. Cheapest fix: short-circuit in shrink_image_b64 when img.format == "JPEG" and max(img.size) <= max_dim — return the original (b64, "image/jpeg") untouched. Not blocking for this PR; worth a follow-up issue.
  • Nit: the lookahead class includes \" but not >. If a future cell emits <img src=data:image/png;base64,AAA...> (unquoted attribute), the URI would be followed by >, which is neither in the payload class nor in the lookahead, so the regex would not match and the image would pass through at full resolution. Current notebook authors always quote attributes, so this is theoretical. Mentioning only because adding > to the terminator class is zero cost.
  • Missing — no unit test for shrink_inline_b64_in_html. The project generally tests new logic, but this is a build script under fern/scripts/ (outside the three installable packages) and there's no existing test harness for it. Consistent with the neighboring code; acceptable.
  • Docstring style (fern/scripts/ipynb-to-fern-json.py:90–94). Multi-line docstring here is longer than the STYLEGUIDE one-liner preference, but this matches the existing style of shrink_image_b64 directly above it, which is the right call — consistency with the neighbor wins.
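
The short-circuit suggested for `shrink_image_b64` comes down to one decision, sketched below as a pure function so it can be checked without an imaging library. `needs_reencode` is a hypothetical name. In the real script, the format and size would presumably come from Pillow's `img.format` and `img.size`, and the 800 px default mirrors the limit stated in the PR description.

```python
from typing import Optional, Tuple


def needs_reencode(fmt: Optional[str], size: Tuple[int, int], max_dim: int = 800) -> bool:
    """Decide whether an image must be resized and/or re-encoded to JPEG.

    Returns False only when the image is already a JPEG whose longest edge
    fits within `max_dim`, so the original bytes can be returned untouched.
    """
    already_jpeg = (fmt or "").upper() == "JPEG"
    within_limit = max(size) <= max_dim
    return not (already_jpeg and within_limit)


if __name__ == "__main__":
    print(needs_reencode("PNG", (1024, 1024)))  # fresh full-resolution output
    print(needs_reencode("JPEG", (800, 600)))   # already shrunk on a prior run
```

Skipping the re-encode for already-shrunk JPEGs is what would keep repeat generator runs from accumulating compression loss.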

fern/versions/v0.5.8/pages/concepts/agent-rollout-ingestion.mdx

  • Content preserved. Spot-checked all five tab titles (Claude Code, Codex, Hermes Agent, Pi Coding Agent, ATIF), intro prose, and code blocks — titles and Python snippets are verbatim. Link to Harbor's ATIF docs preserved.
  • Indentation removed is correct for Fern. Old PyMdown syntax required 4-space indented content inside === "Title"; Fern <Tab> children sit at column 0. The diff correctly dedents.
  • Consistency check. Compared to other pages using <Tabs> under fern/versions/v0.5.8/ — matches the same JSX pattern. Good.

Notebook bundles (5-*.{ts,json}, 6-*.{ts,json})

  • Not reviewed line-by-line — these are generator output. Payload reductions in the PR description (1.8→0.4 MB, 4.6→1.3 MB) are consistent with 800 px JPEG q=82 on Flux outputs.
  • PR author confirms non-image cells are unmodified; the regex is scoped to inline data URIs only, so this is believable from the code.

Risks

  • Low blast radius. No Python package code touched; change is confined to the fern/ docs pipeline. Cannot regress runtime behavior.
  • Forward compatibility. The converter change is backward-compatible — notebooks without inline HTML base64 payloads are unaffected (the regex simply never matches).
  • Rollback is trivial — revert the script change; re-run the generator; previous bundles restored.

Suggestions (non-blocking)

  1. Short-circuit shrink_image_b64 when the decoded image is already JPEG and within size limits, to make the converter truly idempotent across repeat generator runs.
  2. Consider widening the lookahead character class to include > for defense against unquoted HTML attributes in future notebook outputs.
  3. Follow-up issue: add a lightweight test for ipynb-to-fern-json.py (golden-file or regex-level) so future refactors don't silently regress.
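
A regex-level check along the lines of suggestions 2 and 3 might look like the sketch below. `ORIGINAL_RE` is the pattern quoted in the review thread. `WIDENED_RE` is a hypothetical variant with `>` added to the lookahead, and `find_payload` is an illustrative helper, not part of the script.

```python
import re
from typing import Optional

# Pattern as quoted in the review comment.
ORIGINAL_RE = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+?)(?=[\"'\s)])",
    re.IGNORECASE,
)

# Hypothetical variant: `>` added to the terminator lookahead.
WIDENED_RE = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+?)(?=[\"'\s)>])",
    re.IGNORECASE,
)


def find_payload(pattern: "re.Pattern[str]", html: str) -> Optional[str]:
    """Return the captured base64 payload, or None if the pattern does not match."""
    match = pattern.search(html)
    return match.group(2) if match else None


QUOTED = '<img src="data:image/png;base64,AAAA">'
UNQUOTED = "<img src=data:image/png;base64,AAAA>"

if __name__ == "__main__":
    for name, pattern in (("original", ORIGINAL_RE), ("widened", WIDENED_RE)):
        print(name, find_payload(pattern, QUOTED), find_payload(pattern, UNQUOTED))
```

Both patterns handle the quoted attribute, while only the widened one matches the unquoted form.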

Verdict

Approve intent. The fix is well-scoped, root-caused, and matches the PR description. The idempotency nit and regex-terminator nit are minor follow-ups, not blockers. Recommend merging once the test-plan boxes for the hosted site render are checked post-merge.


greptile-apps Bot commented May 7, 2026


Greptile Summary

This PR fixes a broken Fern hosted docs site caused by two migration leftovers: oversized notebook payloads that crashed Fern's SSR bundler, and leftover MkDocs tab and admonition syntax that Fern's MDX runtime couldn't parse.

  • Notebook payload shrinking: shrink_inline_b64_in_html() is added to ipynb-to-fern-json.py so that inline data:image/png;base64,... URIs embedded in IPython.display.HTML outputs are downsampled to 800 px JPEG (q=82) via the existing shrink_image_b64 path; notebooks 5 and 6 are regenerated with the fix applied (1.8 MB → 423 KB, 4.6 MB → 1.3 MB).
  • MDX syntax migration: the five PyMdown === "Title" blocks in agent-rollout-ingestion.mdx are converted to <Tabs>/<Tab title="..."> JSX, and the one !!! tip admonition in default-model-settings.mdx is converted to <Tip>; both are now valid Fern MDX.

Confidence Score: 5/5

Safe to merge — all changes are targeted doc fixes with no impact on library code or data-generation logic.

The script change is a straightforward additive shim that routes an already-validated code path (shrink_image_b64) for a new input source. The MDX edits are pure syntax replacements with content preserved verbatim. No library, API surface, or runtime logic is touched.

No files require special attention.

Important Files Changed

Filename Overview
fern/scripts/ipynb-to-fern-json.py Adds INLINE_DATA_URI_RE regex and shrink_inline_b64_in_html() to intercept base64 images embedded in text/html MIME outputs, routing them through the existing JPEG-downsampler before they reach the .ts bundle.
fern/versions/v0.5.8/pages/concepts/agent-rollout-ingestion.mdx Converts five MkDocs PyMdown === "Title" tab blocks to Fern <Tabs>/<Tab> JSX; all titles, prose, and code snippets are preserved verbatim.
fern/versions/v0.5.8/pages/concepts/models/default-model-settings.mdx Converts a single MkDocs !!! tip admonition to a Fern <Tip> component; content unchanged.
fern/components/notebooks/5-generating-images.ts Auto-generated notebook bundle regenerated after inline-image shrinking; reduced from ~1.8 MB to ~423 KB.
fern/components/notebooks/6-editing-images-with-image-context.ts Auto-generated notebook bundle regenerated after inline-image shrinking; reduced from ~4.6 MB to ~1.3 MB.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Notebook cell output] --> B{output_type?}
    B -- display_data / execute_result --> C{MIME key?}
    C -- image/png --> D[shrink_image_b64\n→ JPEG ≤800px]
    C -- text/html --> E[shrink_inline_b64_in_html]
    E --> F{inline data: URIs\nin HTML?}
    F -- yes --> G[INLINE_DATA_URI_RE.sub\n→ shrink_image_b64 per match]
    F -- no --> H[pass-through HTML]
    G --> I[emit type=text, format=html]
    H --> I
    C -- text/plain --> J[emit type=text, format=plain]
    D --> K[emit type=image]
    B -- stream --> L[emit type=text, format=plain]

Reviews (2): Last reviewed commit: "Merge branch 'main' into lbliii/fix-fern..."

Comment on lines +64 to +66
INLINE_DATA_URI_RE = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+?)(?=[\"'\s)])",
    re.IGNORECASE,


P2 Regex whitespace handling is self-contradicting

INLINE_DATA_URI_RE includes \s in the character class so the capture group can span whitespace-wrapped base64, and _sub even calls "".join(match.group(2).split()) to strip that whitespace. However, the lazy quantifier +? combined with \s in the lookahead (?=["'\s)]) means the match terminates at the first whitespace that follows at least one payload character, so \s in the character class is effectively unreachable beyond a leading position and the whitespace-stripping in _sub is almost always a no-op. If a future notebook emits pretty-printed HTML with line-wrapped base64 (e.g. src='data:image/png;base64,AAA...\nBBB...'), the regex captures only the first wrapped line, passes that truncated payload to shrink_image_b64, which fails silently and returns an empty string, and the replacement produces a broken data:image/jpeg;base64, URI.
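
The behavior on line-wrapped input can be checked directly with `re`. In the sketch below, `ORIGINAL_RE` is the pattern from the snippet above. `VARIANT_RE` is one hypothetical adjustment that drops `\s` from the lookahead so wrapped payloads are captured whole. It is an illustration of the tradeoff, not the fix that was merged, and it assumes the URI is always closed by a quote or a parenthesis.

```python
import re
from typing import Optional

# Pattern as quoted in the review comment.
ORIGINAL_RE = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+?)(?=[\"'\s)])",
    re.IGNORECASE,
)

# Hypothetical variant: whitespace is no longer a terminator, so the lazy
# match keeps going across line breaks until a quote or closing paren.
VARIANT_RE = re.compile(
    r"data:image/(png|jpe?g);base64,([A-Za-z0-9+/=\s]+?)(?=[\"')])",
    re.IGNORECASE,
)


def captured_payload(pattern: "re.Pattern[str]", html: str) -> Optional[str]:
    """Return the raw captured payload (whitespace included), or None."""
    match = pattern.search(html)
    return match.group(2) if match else None


# Base64 wrapped across two lines inside a quoted attribute.
WRAPPED = "<img src='data:image/png;base64,AAAA\nBBBB'>"

if __name__ == "__main__":
    print(repr(captured_payload(ORIGINAL_RE, WRAPPED)))
    print(repr(captured_payload(VARIANT_RE, WRAPPED)))
```

With the variant, the existing `"".join(match.group(2).split())` cleanup would have whitespace to remove. The cost is that an unquoted attribute followed by a space could over-capture, because letters and `=` are also valid base64 characters.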


Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
@andreatgretel andreatgretel merged commit fba8f0b into main May 7, 2026
49 checks passed
@andreatgretel andreatgretel deleted the lbliii/fix-fern-published-docs branch May 7, 2026 21:28