Page MenuHomeSoftware Heritage

Add support for client side rendering of Jupyter notebooks
ClosedPublic

Authored by anlambert on Apr 12 2019, 6:18 PM.

Details

Summary

That diff adds support for client side rendering of Jupyter notebooks inside
the content view of swh-web/browse.

Related T1641

Diff Detail

Repository
rDWAPPS Web applications
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

anlambert created this revision.Apr 12 2019, 6:18 PM
anlambert planned changes to this revision.Apr 12 2019, 6:25 PM
anlambert updated this revision to Diff 4601.Apr 16 2019, 5:28 PM

Update:

  • rebase
  • add support for math typesetting upon notebook rendering through the use of the MathJax library
  • prevent notebook rendering to override our default font
  • increase max content display size
anlambert updated this revision to Diff 4602.Apr 16 2019, 5:39 PM

Fix tests

anlambert planned changes to this revision.Apr 16 2019, 5:58 PM
anlambert updated this revision to Diff 4623.Apr 17 2019, 10:38 PM

Update:

  • Rebase
  • Add support for image rendering in the notebook when the src url is relative to a directory present in the swh archive
  • Put XSS filtering related code in a dedicated file
  • Fix numerous issues regarding math typesetting (it was tough to handle all possible math formula as some LaTeX code escaping and small hacks were required in markdown code prior converting it to HTML, but I ended up with something that works great everywhere)
  • CSS improvements and fixes

Currently this diff contains a single commit. I intend to improve the git history
as future planned changes.

anlambert planned changes to this revision.Apr 17 2019, 10:39 PM
anlambert updated this revision to Diff 4627.Apr 18 2019, 5:02 PM

Update:

  • move the XSS filtering improvements in a separate diff (D1425)
  • more CSS fixes and improvements (better handle small screens for instance)
  • factorize WebLabels configuration for MathJax
anlambert edited the summary of this revision. (Show Details)Apr 18 2019, 5:02 PM
anlambert retitled this revision from [WIP] Add support for client side rendering of Jupyter notebooks to Add support for client side rendering of Jupyter notebooks.
vlorentz requested changes to this revision.Apr 18 2019, 6:03 PM
vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/web/assets/config/mathjax-js-files.js
21–23 ↗(On Diff #4627)

Do we really want to start pulling JS from two other domains/orgs?

swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
181–183

This seems like a bad way to check for relativity, eg. an absolute URL starting with ftp:// or a relative starting with http:blahblah.

Maybe search for :// in the URL instead?

195–199

Same.

(Maybe there should be a utility function to check for relativity?)

195–200

In the chunk above, you do:

if relative:
   url = base + url

and here you do:

if relative:
    foo = base + url
else:
    foo = url
358–361

Same

swh/web/assets/src/bundles/webapp/notebook-rendering.js
18–20 ↗(On Diff #4627)

yum

24–26 ↗(On Diff #4627)

JS is weird...

61 ↗(On Diff #4627)

Do we use transpiling to older JS versions? If not, this will break on not-so-old browsers.

75–85 ↗(On Diff #4627)

Why don't we want these underscores to be interprested by showdown?

129–136 ↗(On Diff #4627)

Let's open a task to remember that

swh/web/assets/src/bundles/webapp/notebook.css
107 ↗(On Diff #4627)

Why?

This revision now requires changes to proceed.Apr 18 2019, 6:03 PM
anlambert added inline comments.Apr 18 2019, 6:37 PM
swh/web/assets/config/mathjax-js-files.js
21–23 ↗(On Diff #4627)

First one is the CDN from which we get the minified js source files of MathJax, second one enables to get the unminified source code and is only used in the WebLabels page.
The goal is to respect LibreJS specification here.

swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
181–183

Seems better indeed

swh/web/assets/src/bundles/webapp/notebook-rendering.js
24–26 ↗(On Diff #4627)

Tell me about it

61 ↗(On Diff #4627)

Of course we do

75–85 ↗(On Diff #4627)

To get correct math typesetting when using MathJax.

LaTeX formula containing substrings like { A }_{ 1 } will be turned by showdown to { A }<em>{ 1 }
(even with the literalMidWordUnderscores option set to true) and MathJax will fail to properly
render it afterwards.

anlambert updated this revision to Diff 4640.Apr 19 2019, 3:36 PM

Update:

  • rebase
  • address comments
  • more CSS fixes / improvements
  • unescape HTML from text that will be inserted in pre elements
anlambert added inline comments.Apr 19 2019, 4:00 PM
swh/web/assets/src/bundles/webapp/notebook-rendering.js
129–136 ↗(On Diff #4627)
swh/web/assets/src/bundles/webapp/notebook.css
107 ↗(On Diff #4627)

I adapted https://github.com/jsvine/nbpreview/blob/master/css/vendor/notebook.css for our web application.

The CSS rule

div[style="max-height:1000px;max-width:1500px;overflow:auto;"] {
    max-height: none !important;    
}

was related to pandas dataframe formatting.

I removed it as it effectively seems weird to have such a rule here.

anlambert updated this revision to Diff 4644.Apr 19 2019, 4:02 PM

Update: Fix url in notebook.css comment

anlambert updated this revision to Diff 4646.Apr 19 2019, 5:53 PM

Update: last CSS polishing

anlambert updated this revision to Diff 4647.Apr 19 2019, 7:40 PM

Update: Check that hljs supports the notebook language prior calling the highlight function

zack added a subscriber: zack.EditedApr 19 2019, 7:49 PM

I think we need to have a conversation about how we decide to add support for rendering specific file formats in the webapp.

  • There are a million different file formats out there, why are we rendering Markdown and Jupyter notebooks and not something else?
  • When we have support for a million (or even just a dozen, really) file formats, are we going to test for which-is-which in a long chain of IFs in a template or do we need a proper plugin system?
  • How do we decide if something is a given file format or not, especially considering that getting it wrong might even entail security vulnerabilities, injections, etc.? (because we're rendering content we do not control directly)

In summary, this feature is really cool. But I think we rushed a bit into getting it in. I'm not opposed to actually landing this, but we really need to sit down and think about the process, the architecture, and the policy for deciding how (and if) to add support for other formats in the future.

There are a million different file formats out there, why are we rendering Markdown and Jupyter notebooks and not something else?

It was a user request so I think it is wise to listen to their need, plus the proposed use case is really interesting for showcasing the
scientific knowledge contained into the archive.
Regarding the rendering implementation, that's why I tagged this as easy hack as we already had all the required software components
client side to implement this in a few line of code (I did not see coming the LaTeX escaping hell tough).

When we have support for a million (or even just a dozen, really), are we going to test for which-is-which in a long chained series of IFs in a template? That doesn't seem really wise…

Sure, this calls for clever refactoring.

How do we decide if something is a given file format or not, especially considering that getting it wrong might even entail security vulnerabilities, injections, etc.? (because we're rendering content we do not control directly)

All produced HTML gets filtered for XSS injection before inserting it into the DOM. So apart some external image bytes, nothing sensible will be loaded.
@olasd suggested to have some kind of user input to approve the loading of external resources, this could be added to ensure more safety.

we really need to sit down and think about the process, the architecture, and the policy for deciding how (and if) add support for other formats in the future

I will create a task on this next week and start brainstorming about it.

anlambert edited the summary of this revision. (Show Details)Apr 26 2019, 11:35 PM
vlorentz accepted this revision.Apr 29 2019, 2:28 PM
This revision is now accepted and ready to land.Apr 29 2019, 2:28 PM
This revision was automatically updated to reflect the committed changes.