Details

Reviewers

Group Reviewers

Commits

rDWAPPSf5f921c6b50f: browse: Add Jupyter notebook rendering in content views
rDWAPPSb1186de6bfb5: generate-weblabels-webpack-plugin: Fixes and improvements

Summary

This diff adds support for client-side rendering of Jupyter notebooks inside
the content view of swh-web/browse.

Related T1641

Diff Detail

Repository

rDWAPPS Web applications

Branch

jupyter-nb-rendering

Lint

No Linters Available

Unit

No Unit Test Coverage

Build Status

Buildable 5499
Build 7465: tox-on-jenkins	Jenkins
Build 7464: arc lint + arc unit

Event Timeline

anlambert created this revision. Apr 12 2019, 6:18 PM

Herald added a reviewer: Reviewers. Apr 12 2019, 6:18 PM

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/387/ for more details.

Harbormaster completed remote builds in B5408: Diff 4557. Apr 12 2019, 6:21 PM

anlambert planned changes to this revision. Apr 12 2019, 6:25 PM

anlambert mentioned this in T1641: Jupyter notebooks rendering.

Update:

rebase
add support for math typesetting upon notebook rendering through the use of the MathJax library
prevent notebook rendering from overriding our default font
increase max content display size

Build has FAILED

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/391/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/391/console

Harbormaster failed remote builds in B5467: Diff 4601! Apr 16 2019, 5:32 PM

Fix tests

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/392/ for more details.

Harbormaster completed remote builds in B5468: Diff 4602. Apr 16 2019, 5:43 PM

anlambert planned changes to this revision. Apr 16 2019, 5:58 PM

Update:

Rebase
Add support for image rendering in the notebook when the src url is relative to a directory present in the swh archive
Put XSS filtering related code in a dedicated file
Fix numerous issues regarding math typesetting (it was tough to handle all possible math formulas, as some LaTeX code escaping and small hacks were required in the markdown code prior to converting it to HTML, but I ended up with something that works great everywhere)
CSS improvements and fixes

Currently this diff contains a single commit. I intend to improve the git history
in a future planned change.

anlambert planned changes to this revision. Apr 17 2019, 10:39 PM

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/393/ for more details.

Harbormaster completed remote builds in B5499: Diff 4623. Apr 17 2019, 10:42 PM

Update:

move the XSS filtering improvements into a separate diff (D1425)
more CSS fixes and improvements (better handle small screens for instance)
factorize WebLabels configuration for MathJax

anlambert edited the summary of this revision. Apr 18 2019, 5:02 PM

anlambert added a parent revision: D1425: assets: XSS filtering improvements.

anlambert retitled this revision from [WIP] Add support for client side rendering of Jupyter notebooks to Add support for client side rendering of Jupyter notebooks.

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/395/ for more details.

Harbormaster completed remote builds in B5505: Diff 4627. Apr 18 2019, 5:05 PM

vlorentz requested changes to this revision. Apr 18 2019, 6:03 PM

vlorentz added a subscriber: vlorentz.

vlorentz added inline comments.

swh/web/assets/config/mathjax-js-files.js
21–23 ↗	(On Diff #4627)	Do we really want to start pulling JS from two other domains/orgs?
swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
180–182	This seems like a bad way to check for relativity: it mishandles, e.g., an absolute URL starting with `ftp://` or a relative one starting with `http:blahblah`. Maybe search for `://` in the URL instead?
191–195	Same. (Maybe there should be a utility function to check for relativity?)
191–196	In the chunk above, you do `if relative: url = base + url`, and here you do `if relative: foo = base + url else: foo = url`.
354–357	Same
swh/web/assets/src/bundles/webapp/notebook-rendering.js
18–20	yum
24–26	JS is weird...
61	Do we use transpiling to older JS versions? If not, this will break on not-so-old browsers.
75–85	Why don't we want these underscores to be interpreted by showdown?
129–136	Let's open a task to remember that
swh/web/assets/src/bundles/webapp/notebook.css
107	Why?
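
The utility function suggested in the comments on generate-weblabels-webpack-plugin/index.js could look roughly like the sketch below. It is not taken from the diff: `isRelativeUrl` and `resolveUrl` are hypothetical names, and the rule applied (a URL is absolute only if it starts with `scheme://` or with `//`) is an assumption based on the `://` suggestion above.

```javascript
// Hypothetical helper, not the code from the diff.
// A URL is treated as absolute when it starts with a scheme followed by
// "//" (e.g. "https://", "ftp://") or when it is protocol-relative
// ("//host/path"). Everything else is treated as relative.
function isRelativeUrl(url) {
  return !/^([a-z][a-z0-9+.-]*:)?\/\//i.test(url);
}

// Prefix relative URLs with a base path; leave absolute URLs untouched.
function resolveUrl(base, url) {
  return isRelativeUrl(url) ? base + url : url;
}
```

With a single helper like this, every call site applies the same rule, which addresses the inconsistency noted between the two chunks.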

This revision now requires changes to proceed. Apr 18 2019, 6:03 PM

anlambert added inline comments. Apr 18 2019, 6:37 PM

swh/web/assets/config/mathjax-js-files.js
21–23 ↗	(On Diff #4627)	The first one is the CDN from which we get the minified JS source files of MathJax; the second one provides the unminified source code and is only used in the WebLabels page. The goal is to comply with the LibreJS specification here.
swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
180–182	Seems better indeed
swh/web/assets/src/bundles/webapp/notebook-rendering.js
24–26	Tell me about it
61	Of course we do
75–85	To get correct math typesetting when using MathJax. LaTeX formulas containing substrings like `{ A }_{ 1 }` will be turned by showdown into `{ A }<em>{ 1 }` (even with the literalMidWordUnderscores option set to true) and MathJax will then fail to render them properly.
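
One way to picture the underscore workaround described in the last comment: escape underscores inside math spans before the Markdown text reaches the converter, so that they are not read as emphasis markers. The sketch below is illustrative only and is not the code from notebook-rendering.js. `escapeUnderscoresInMath` is a hypothetical name, and the sketch assumes math is delimited by `$...$` or `$$...$$` and that the Markdown converter turns `\_` back into a literal underscore.

```javascript
// Hypothetical pre-processing step, not the code from the diff.
// Escapes "_" as "\_" inside $...$ and $$...$$ spans so that a Markdown
// converter does not turn pairs of underscores into <em> tags.
function escapeUnderscoresInMath(markdown) {
  // Try display math ($$...$$) first, then inline math ($...$).
  const mathSpan = /\$\$[\s\S]+?\$\$|\$[^$\n]+?\$/g;
  return markdown.replace(mathSpan, span => span.replace(/_/g, '\\_'));
}
```

Underscores outside math spans are left alone, so regular Markdown emphasis keeps working.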

Rebase

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/397/ for more details.

Harbormaster completed remote builds in B5510: Diff 4632. Apr 19 2019, 12:09 PM

Update:

rebase
address comments
more CSS fixes / improvements
unescape HTML from text that will be inserted in pre elements

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/399/ for more details.

Harbormaster completed remote builds in B5519: Diff 4640. Apr 19 2019, 3:41 PM

anlambert mentioned this in T1680: Properly bundle MathJax once version 3.0 is released. Apr 19 2019, 3:55 PM

anlambert added inline comments. Apr 19 2019, 4:00 PM

swh/web/assets/src/bundles/webapp/notebook-rendering.js
129–136	T1680
swh/web/assets/src/bundles/webapp/notebook.css
107	I adapted https://github.com/jsvine/nbpreview/blob/master/css/vendor/notebook.css for our web application. The CSS rule div[style="max-height:1000px;max-width:1500px;overflow:auto;"] { max-height: none !important; } was related to pandas dataframe formatting. I removed it, as it does indeed seem weird to have such a rule here.

Update: Fix url in notebook.css comment

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/400/ for more details.

Harbormaster completed remote builds in B5523: Diff 4644. Apr 19 2019, 4:06 PM

Update: last CSS polishing

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/401/ for more details.

Harbormaster completed remote builds in B5525: Diff 4646. Apr 19 2019, 5:57 PM

Update: Check that hljs supports the notebook language prior to calling the highlight function
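
A guard of this kind might look like the following sketch. It is an assumption about the shape of the check, not the code from the diff: `highlightIfSupported` and `escapeHtml` are hypothetical names, the highlight.js instance is passed in as a parameter to keep the example self-contained, and the `hljs.highlight(language, code)` argument order is the one used by highlight.js 9.x (later versions changed it).

```javascript
// Minimal HTML escaping for the fallback path.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical wrapper: only call hljs.highlight when the language declared
// by the notebook is known to highlight.js; otherwise return escaped text.
function highlightIfSupported(hljs, language, code) {
  if (language && hljs.getLanguage(language)) {
    // highlight.js 9.x signature: highlight(languageName, code)
    return hljs.highlight(language, code).value;
  }
  return escapeHtml(code);
}
```

Falling back to escaped plain text keeps a notebook with an unknown or missing kernel language readable instead of failing during rendering.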

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/402/ for more details.

Harbormaster completed remote builds in B5526: Diff 4647. Apr 19 2019, 7:44 PM

I think we need to have a conversation about how we decide to add support for rendering specific file formats in the webapp.

There are a million different file formats out there, why are we rendering Markdown and Jupyter notebooks and not something else?
When we have support for a million (or even just a dozen, really) file formats, are we going to test for which-is-which in a long chain of IFs in a template or do we need a proper plugin system?
How do we decide if something is a given file format or not, especially considering that getting it wrong might even entail security vulnerabilities, injections, etc.? (because we're rendering content we do not control directly)

In summary, this feature is really cool. But I think we rushed a bit into getting it in. I'm not opposed to actually landing this, but we really need to sit down and think about the process, the architecture, and the policy for deciding how (and if) to add support for other formats in the future.

There are a million different file formats out there, why are we rendering Markdown and Jupyter notebooks and not something else?

It was a user request, so I think it is wise to listen to their needs; plus, the proposed use case is really interesting for showcasing the
scientific knowledge contained in the archive.
Regarding the rendering implementation, that's why I tagged this as an easy hack: we already had all the required software components
client side to implement this in a few lines of code (I did not see the LaTeX escaping hell coming, though).

When we have support for a million (or even just a dozen, really), are we going to test for which-is-which in a long chained series of IFs in a template? That doesn't seem really wise…

Sure, this calls for clever refactoring.
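
As a rough illustration of what such a refactoring could look like, here is a sketch of a renderer registry that replaces a chain of IFs with a lookup. Nothing in it comes from swh-web: the function names, the file descriptor shape and the extension-based matching are all assumptions made for the example.

```javascript
// Hypothetical registry: each entry pairs a predicate with a renderer name.
const renderers = [];

// Register a renderer together with the test deciding whether it applies.
function registerRenderer(name, matches) {
  renderers.push({ name, matches });
}

// Return the first registered renderer that accepts the file,
// or fall back to plain-text display.
function pickRenderer(file) {
  const found = renderers.find(renderer => renderer.matches(file));
  return found ? found.name : 'plain-text';
}

// Example registrations (matching on file extension is an assumption).
registerRenderer('notebook', file => /\.ipynb$/i.test(file.filename));
registerRenderer('markdown', file => /\.(md|markdown)$/i.test(file.filename));
```

Adding a new format then means registering one more entry rather than extending a conditional in a template. How a format is detected reliably remains the open question raised above.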

How do we decide if something is a given file format or not, especially considering that getting it wrong might even entail security vulnerabilities, injections, etc.? (because we're rendering content we do not control directly)

All produced HTML gets filtered for XSS injection before being inserted into the DOM. So apart from some external image bytes, nothing sensitive will be loaded.
@olasd suggested requiring some kind of user approval before loading external resources; this could be added for extra safety.

we really need to sit down and think about the process, the architecture, and the policy for deciding how (and if) to add support for other formats in the future

I will create a task on this next week and start brainstorming about it.

anlambert mentioned this in T1688: Refactor content rendering. Apr 24 2019, 11:52 AM

Rebase

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/403/ for more details.

Harbormaster completed remote builds in B5539: Diff 4659. Apr 24 2019, 5:12 PM

anlambert edited the summary of this revision. Apr 26 2019, 11:35 PM

vlorentz accepted this revision. Apr 29 2019, 2:28 PM

This revision is now accepted and ready to land. Apr 29 2019, 2:28 PM

Rebase

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/411/ for more details.

Harbormaster completed remote builds in B5557: Diff 4677. May 2 2019, 11:23 AM

Closed by commit rDWAPPSb1186de6bfb5: generate-weblabels-webpack-plugin: Fixes and improvements (authored by anlambert). May 2 2019, 11:35 AM

This revision was automatically updated to reflect the committed changes.

Add support for client side rendering of Jupyter notebooks
Closed, Public

Revision Contents
Changeset List

Diff 4623

package.json

swh/web/assets/config/.eslintrc

swh/web/assets/config/bootstrap-pre-customize.scss

swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js

swh/web/assets/config/webpack.config.development.js

swh/web/assets/src/bundles/webapp/index.js

swh/web/assets/src/bundles/webapp/notebook-rendering.js

swh/web/assets/src/bundles/webapp/notebook.css

swh/web/assets/src/bundles/webapp/readme-rendering.js

swh/web/assets/src/bundles/webapp/webapp-utils.js

swh/web/assets/src/bundles/webapp/xss-filtering.js

swh/web/browse/views/directory.py

swh/web/browse/views/revision.py

swh/web/browse/views/utils/snapshot_context.py

swh/web/config.py

swh/web/templates/includes/content-display.html

swh/web/templates/includes/show-metadata.html

swh/web/tests/browse/views/test_content.py

yarn.lock
