
Jupyter notebooks rendering
Closed, Migrated

Description

A Jupyter notebook contains markdown, metadata and code, but its file format is just JSON.
It would be nice if the Software Heritage archive (SHA) were "notebook-aware" and rendered notebooks nicely.
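
To illustrate the "just JSON" point, here is a minimal sketch that reads a notebook with the standard `json` module and counts its cell types. It assumes the nbformat v4 layout (a top-level `cells` list whose entries carry a `cell_type`); the helper name `summarize_notebook` and the tiny sample notebook are made up for illustration.

```python
import json
from collections import Counter


def summarize_notebook(raw_json):
    """Return the nbformat version and a count of cells per cell type."""
    nb = json.loads(raw_json)
    # nbformat v4 keeps all cells in a flat top-level "cells" list.
    counts = Counter(cell.get("cell_type", "unknown") for cell in nb.get("cells", []))
    return {"nbformat": nb.get("nbformat"), "cell_types": dict(counts)}


# A tiny hand-written notebook, only for demonstration.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "source": ["print(1)"],
         "outputs": [], "execution_count": None},
    ],
})

print(summarize_notebook(example))
```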

The easy way is to show a link to nbviewer for every notebook file.

Just combine the nbviewer URL prefix with the raw-file link, as in:

https://nbviewer.jupyter.org/urls/archive.softwareheritage.org/browse/content/sha1_git%3Ad55160e7c923017b137e9a4bbe43234596008bc3/raw//%3Ffilename%3Dconvert.ipynb
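
A rough sketch of how such a link could be built from a raw-content URL. The escaping rules here are an assumption inferred only from the shape of the example URL above (path percent-encoded, query string appended as an encoded extra segment); the function name `nbviewer_url` is hypothetical.

```python
from urllib.parse import quote, urlsplit

# nbviewer serves notebooks fetched over https under the /urls/ prefix.
NBVIEWER_PREFIX = "https://nbviewer.jupyter.org/urls/"


def nbviewer_url(raw_url):
    """Build an nbviewer link for an https raw-content URL (assumed escaping)."""
    parts = urlsplit(raw_url)
    if parts.scheme != "https":
        raise ValueError("this sketch only handles https URLs")
    # Keep slashes, percent-encode everything else in the path (e.g. ':').
    encoded = parts.netloc + quote(parts.path, safe="/")
    if parts.query:
        # Assumption: the query is appended as an extra, fully encoded segment,
        # which reproduces the "raw//%3Ffilename%3D..." form seen above.
        encoded += "/" + quote("?" + parts.query, safe="")
    return NBVIEWER_PREFIX + encoded


raw = ("https://archive.softwareheritage.org/browse/content/"
       "sha1_git:d55160e7c923017b137e9a4bbe43234596008bc3/raw/?filename=convert.ipynb")
print(nbviewer_url(raw))
```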

The hard way is to convert notebooks on the fly to HTML (as GitHub does).

Event Timeline

anlambert renamed this task from Jupyter notebooks to Jupyter notebooks rendering. Apr 10 2019, 4:56 PM
anlambert triaged this task as Normal priority.
anlambert added projects: Web app, Easy hack.

There is a complication when a notebook refers to images via a relative link.
For example, the notebook above (it is a real one) refers to an image by the relative path images/dans.png, but when I view the SHA copy of it in nbviewer, that gets translated to the URL

https://nbviewer.jupyter.org/urls/archive.softwareheritage.org/browse/content/sha1_git%3Ad55160e7c923017b137e9a4bbe43234596008bc3/raw//images/dans.png

while the image resides in the SHA under

https://archive.softwareheritage.org/browse/content/sha1_git:0c0869a149271cbff4fde6059a3ee26509036e39/raw/?filename=dans.png

nbviewer cannot know how to get from one hash to the other (I think).
So it is probably more difficult: you need to call nbconvert yourself and then postprocess the result, transforming all local hyperlinks into links that work inside the SHA.
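
A minimal sketch of that postprocessing step, assuming the HTML has already been produced by nbconvert and that a mapping from relative paths to content identifiers is available from somewhere (how to obtain that mapping is exactly the open question here). The function name `rewrite_local_links` and the `path_to_id` argument are hypothetical; the target URL form is copied from the raw image link above.

```python
import re
from posixpath import basename
from urllib.parse import urlparse

# URL form taken from the raw-content link for dans.png shown above.
ARCHIVE_RAW = ("https://archive.softwareheritage.org/browse/content/"
               "{content_id}/raw/?filename={name}")


def rewrite_local_links(html, path_to_id):
    """Point relative src/href attributes at archive raw-content URLs.

    path_to_id maps a relative path as written in the notebook
    (e.g. "images/dans.png") to a content identifier (e.g. "sha1_git:...").
    """
    def replace(match):
        attr, quote_char, target = match.groups()
        parsed = urlparse(target)
        # Leave absolute URLs, data: URIs, anchors and root-relative links alone.
        if parsed.scheme or parsed.netloc or target.startswith(("#", "/")):
            return match.group(0)
        content_id = path_to_id.get(target)
        if content_id is None:
            return match.group(0)  # unknown file: keep the original link
        url = ARCHIVE_RAW.format(content_id=content_id, name=basename(target))
        return f"{attr}={quote_char}{url}{quote_char}"

    # A regex is enough for a sketch; a real implementation would parse the HTML.
    return re.sub(r'\b(src|href)=(["\'])(.*?)\2', replace, html)


mapping = {"images/dans.png": "sha1_git:0c0869a149271cbff4fde6059a3ee26509036e39"}
page = '<img src="images/dans.png"> <a href="https://example.org/x">x</a>'
print(rewrite_local_links(page, mapping))
```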

I started working on this by adding client-side rendering of notebooks in the archive web application (see D1415).

This is still a work in progress but I made great progress today (see F3499510 for a preview).

Images are currently not displayed but I know how to fix that. I will continue this work next week.

Antoine, I am really impressed! This already makes a big difference. And there are more than a million notebooks out there on GitHub.

@dirkroorda, this is now deployed to production. See [1] for an example.

Please note that, for images included in the notebook and stored in the archive to load, you need to browse the notebook file from either its origin context or its directory context.

Regarding rendering notebooks stored in the archive through nbviewer, I think we should consider adding the Software Heritage archive as a provider [2] (like GitHub).

[1] https://archive.softwareheritage.org/browse/origin/https://github.com/annotation/tutorials/content/uruk/imagery.ipynb/
[2] https://github.com/jupyter/nbviewer/tree/master/nbviewer/providers

Antoine, outright marvellous! The notebooks render beautifully. And already in production!
Because of this, I consider the SHA the premier choice for citing GitHub repos.
Many thanks!