Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Render Jupyter Notebooks#59685

Closed
bevzzz wants to merge 37 commits into
sourcegraph:mainfrom
bevzzz:feat/jupyter-notebooks
Closed

Render Jupyter Notebooks#59685
bevzzz wants to merge 37 commits into
sourcegraph:mainfrom
bevzzz:feat/jupyter-notebooks

Conversation

@bevzzz

@bevzzz bevzzz commented Jan 17, 2024

Copy link
Copy Markdown
Contributor

Hi everyone!

This PR uses nb, a package for rendering Jupyter notebooks, to address #57797.
I wrote nb because I could not find an existing implementation in Go and it also seemed like a fun project to do.
It's still lacking some important features, but I wanted to get your feedback early on.

Regarding styling and CSS: it is currently mirroring the standard Jupyter theme, and is more of a POC.
It needs some polish and should definitely be reconciled with the existing branding. I could use some help / advice here :)

This discussion has some reference links (I'll be sure to add these examples to the demo when the PR is ready).
@umpox @pdubroy (since you posted the above examples) would you give me some pointers?

It goes without saying that I would really appreciate any feedback regarding the package itself 🙌

Test plan

  • package nb has good unit test coverage
  • internal/htmlutils: unit tests in policy_test.go
  • unit test for loading and removing MathJax script in Markdown.tsx
  • other changes are tested manually, since there's very little logic there

TODO

  • upgrade nb to v0.2.0 (once it's released) as it will add some rich formatting capabilities
  • use goldmark.Markdown to render "markdown" cells in the notebooks
  • convert ANSI colors to HTML in "stream" output cells
  • add CSS styling
  • render mathematical expressions

@cla-bot cla-bot Bot added the cla-signed label Jan 17, 2024
@bevzzz bevzzz marked this pull request as draft January 17, 2024 21:55
@BolajiOlajide BolajiOlajide requested a review from a team January 25, 2024 18:00
@BolajiOlajide BolajiOlajide added the team/code-search Issues owned by the code search team label Jan 25, 2024
@bevzzz bevzzz force-pushed the feat/jupyter-notebooks branch from 718c87f to a4b445f Compare January 31, 2024 10:42
Introduces a new dependency:
- github.com/bevzzz/nb

And updates bazel buildfiles.

Issue: sourcegraph#57797
Common bluemonday.Policy defined in a shared package,
which presents a convenient API for working with the 3 main functions.
Key policy rules tested in policy_test.go to prevent regressions.

Switched from .Sanitize to .SanitizeReader in markdown.go:
bluemonday always uses the latter under the hood,
so we just cut to the chase and omit double conversion.

sourcegraph#57797
@bevzzz bevzzz force-pushed the feat/jupyter-notebooks branch from a4b445f to 23ec3ba Compare January 31, 2024 10:43
Bump nb version to 0.2.1 (introduces Extension API) and use extensions:
- render Markdown cells with goldmark.Markdown. internal/markdown
now exposes a pre-configured renderer with markdown.Goldmark().
- add syntax highlighting to code cells with nb-synth and organize
common chroma configurations in htmlutils/.
- replace ANSI characters in stream output cells with ansihtml. Here
I also updated the standard bluemonday.Policy to allow span classes
prefixed with 'ansi-'
- Added 2 lightweight extensions from goldmark-jupyter to goldmark
and nb. This adds support for cell attachments in markdown cells
(read more about it in https://github.com/bevzzz/nb/extension/extra/goldmark-jupyter README)
- updated BAZEL build config

sourcegraph#59685
global-styles/ansi.scss defines styling for classes
that ansihtml uses in its colorized spans.
For the moment, it does not differentiate between
light/dark themes (more of a POC solution).

These are only applicable to richHTML in "Markdown" module
(haven't yet found a good way to separate rendered Jupyter notebooks
into another component), we import them with the @use directiv
 in Markdown.module.scss.
As far I understand, that should limit the scope to which the styles apply.

See https://github.com/robert-nix/ansihtml/blob/master/html.go#L288 for reference.

sourcegraph#59685
Extended the bluemonday.Policy to allow "jp-" classes in "div" elements.

Most of the CSS in jupyter.scss is borrowed from the official Jupyter theme,
with a few adaptations to accomodate external classes, e.g. "chroma".

sourcegraph#59685
Deleted ipynb_test.go, only used it to debug some things locally.
When rendering Markdown (and Jupyter) files
we can load MathJax scripts on the fly and
have them render LaTeX/MathML.

Spike because I'm not sure if loading external scripts is
a good-enough solution. Works quite well though.
Doing this server-side does not seem feasible,
and I haven't found any reliable Go libraries to do it.
MathJax is something of an industry standard and the
go-to solution here.

NOTE: Markdown.tsx is used in a lot of places, but
I've added a parameter to only enable MathJax for "rendered file"
usecase, when we are displaying it in code mirror.
@bevzzz

bevzzz commented Feb 1, 2024

Copy link
Copy Markdown
Contributor Author

I think this PR is now ready to be reviewed.
In summary:

  • Markdown is rendered correctly
  • Code, JSON/XML, and console output (stream cells) are highlighted
  • Mathematical expressions are rendered correctly

Some changes that I still have in the pipeline for nb:

  • Support older notebook versions (v1, v2, v3, other minor v.4.*)
  • Customizable CSS (class prefix / classes)

Those are important, but should not affect this PR much when the changes are released.

@bevzzz bevzzz marked this pull request as ready for review February 1, 2024 16:04

@camdencheek camdencheek left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Left a few comments on the backend side of things. @vovakulikov or @fkling, could you take a look at the client changes?

Comment thread internal/markdown/markdown.go Outdated
Comment thread internal/htmlutil/policy.go Outdated
Comment thread internal/htmlutil/policy.go Outdated
Comment thread internal/htmlutil/syntax.go Outdated
Comment thread client/wildcard/src/components/Markdown/Markdown.tsx Outdated
Comment thread client/wildcard/src/components/Markdown/mathjax.ts Outdated
Comment thread client/wildcard/src/components/Markdown/mathjax.ts Outdated
vovakulikov
vovakulikov previously approved these changes Feb 13, 2024

@vovakulikov vovakulikov left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contribution, I left a few suggestions about MathJax loading, as soon as you can switch to the npm package instead of CDN you have greenlight from my side, thanks again.

@vovakulikov vovakulikov left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry misclicked Approve and comment radio

@vovakulikov vovakulikov self-requested a review February 13, 2024 17:57
@vovakulikov vovakulikov dismissed their stale review February 13, 2024 17:58

misclicked Approve and comment radio

@bevzzz

bevzzz commented Feb 15, 2024

Copy link
Copy Markdown
Contributor Author

Thanks for your review, I'm going to implement these suggestions in the coming days 🙌

bevzzz and others added 4 commits February 18, 2024 16:37
This more idiomatic way avoids declaring somewhat ambigiously named
"once" variables in every package.

htmlutil.Policy() is no longer exported and should be used via
SanitizeBuffer, SanitizeString, or SanitiseBytes instead.

GH sourcegraph#57797
Push "enabled" check to useMathJax and
return early if the hook should not be applied.

GH sourcegraph#57797
Instead of sourcing the scripts and fonts from the CDN,
we bundle required assets and serve them as static files.

mathjax-full is huge (approx. 30M), which is alright
because it is not imported anywhere and will not be
a part of the final bundle.

Changes to BUILD files from running 'bazel configure'.

Excluded 'pnpm-lock.yaml' from targets for 'check-yaml' pre-commit hook,
because one of the added deps uses syntax which pre-commit-hooks falsely
identifies as invalid YAML:

pre-commit/pre-commit-hooks#984

GH sourcegraph#57797
@camdencheek camdencheek force-pushed the feat/jupyter-notebooks branch from c621821 to 0ba15e2 Compare May 9, 2024 22:15
@camdencheek

Copy link
Copy Markdown
Member

Hey @bevzzz! Really sorry about taking so long to get back to ya. I went ahead and just pushed a commit with some changes so we can get this merged.

In short:

  • I removed the MathJax dependency for now. It was causing trouble with our Svelte rewrite and added quite a bit of complexity to builds and distribution. We will look at adding it back later, but for now just getting a rendered version of the code in notebooks is a big win.
  • I moved the styles to global so they can be applied in the Svelte webapp in addition to the React webapp
  • I cleaned up a handful of lint errors and bazel build errors

Will merge as soon as I can appease CI!

@camdencheek camdencheek enabled auto-merge (squash) May 9, 2024 22:52
@camdencheek camdencheek disabled auto-merge May 9, 2024 22:52
@bevzzz

bevzzz commented May 10, 2024

Copy link
Copy Markdown
Contributor Author

Hey @camdencheek! Nvmnd, I am glad you had the time to look at the PR again.
Regarding the Mathjax, it's a bit hairy, yes. Should you need help with adding it back to the project, let me know, I'd be happy to contribute.

Thanks for handling the CI issues too -- some errors there seem to be slightly "above my paygrade" (unauthorized upload in the scip- jobs and such)

@camdencheek

Copy link
Copy Markdown
Member

@bevzzz It's merged and live on sourcegraph.com! Sorry, we seem to be having issues merging PRs from forks, so I had to recreate this #62583.

Thanks again for all your work on this -- I know at least a few people who will be very excited about the ability to view jupyter notebooks on Sourcegraph 🙂

@bevzzz

bevzzz commented May 11, 2024

Copy link
Copy Markdown
Contributor Author

That's so cool! I also know a couple of people who're gonna love it :)
Thanks, @camdencheek

@jdorfman

jdorfman commented May 13, 2024

Copy link
Copy Markdown
Member

Thank you @bevzzz for your contribution! We gave you a shout-out: https://sourcegraph.com/blog/code-search-now-supports-jupyter-notebook-rendering

Would love to send you some swag too: community@sourcegraph.com

@bevzzz

bevzzz commented May 14, 2024

Copy link
Copy Markdown
Contributor Author

@jdorfman that's very kind of you! I've sent an email to community@sourcegraph.com.

Glad I could contribute :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed team/code-search Issues owned by the code search team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants