swh web ui: first implementation of source tree graphical navigation feature by providing HTML views for swh directory and content objects
Closed, Public

Authored by anlambert on Sep 15 2017, 8:04 PM.

Details

Summary

First attempt at implementing HTML views for easily browsing a source tree archived by swh (T759).

This is the result of my work this week: a first draft of HTML views for browsing directory and content objects.
It is pretty basic at this point, but it makes it possible to smoothly browse a source tree from any root
directory identified in the swh archive.

New web endpoints added:

  • /ui/ -> points to the linux kernel source tree
  • /ui/directory/<sha1_git>/ and /ui/directory/<sha1_git>/<path>/ -> browse a directory
  • /ui/content/<sha1_git>/ -> display content

The content UI endpoint tries to highlight code using Pygments before displaying it.
To get correct highlighting, it is better to reach it from a directory view, since
the filename is then used as a hint to pick the adequate Pygments lexer. Otherwise, the MIME type
is used instead, which leads to poor results as almost everything is identified as text/plain
or equivalent.

Let the reviews begin!
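
As a rough illustration of the lexer selection described above, here is a minimal sketch. It is not the code from this diff: the function names `pick_lexer` and `highlight_content` are hypothetical, and the final fallback to plain text is an assumption. Only the Pygments calls themselves are real API.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import (
    TextLexer,
    get_lexer_for_filename,
    get_lexer_for_mimetype,
)
from pygments.util import ClassNotFound


def pick_lexer(filename, mime_type):
    """Prefer the filename hint, then the MIME type, then plain text."""
    if filename:
        try:
            return get_lexer_for_filename(filename)
        except ClassNotFound:
            pass  # unknown extension: fall through to the MIME type
    try:
        return get_lexer_for_mimetype(mime_type)
    except ClassNotFound:
        # Assumed last resort: render the content without highlighting.
        return TextLexer()


def highlight_content(raw_text, filename=None, mime_type="text/plain"):
    """Return an HTML fragment with the content highlighted by Pygments."""
    return highlight(raw_text, pick_lexer(filename, mime_type), HtmlFormatter())
```

When no filename is available and the MIME type is text/plain, the sketch ends up with the plain-text lexer, which matches the poor results mentioned above.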

Diff Detail

Repository
rDWAPPS Web applications
Branch
swh-web-ui-file-browser
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 1030
Build 1362: Software Heritage Python tests
Build 1361: arc lint + arc unit

Event Timeline

I fixed the broken build due to a new missing dependency (python3-magic).
Now it's all green.

zack requested changes to this revision.Sep 19 2017, 11:27 AM

The main thing we agreed upon when discussing all this F2F yesterday is that we want to do rendering on the client side, leveraging the information our indexer has extracted from the blob if available.
We mentioned highlight.js, but it might make sense to check what GitHub is using (if it's client side), given we already use it on the indexer.

Aside from that, I've added above only a few extra minor comments.

swh/web/tests/ui/views/data/mascot.png
1 ↗(On Diff #811)

uh? where does this nice bird come from? :)
(just make sure the copyright/license of this thing is OK)

swh/web/urls.py
44

"/ui" makes sense from a developer point of view (and hence it is perfectly fine as Python module name), but not as much to final users.
I'm guessing we cannot mount it at "/" due to conflicts, right?
If so, I suggest using "/browse" as a user visible endpoint.

This revision now requires changes to proceed.Sep 19 2017, 11:27 AM

I finally found out how syntax highlighting works on GitHub (based on http://linguist.readthedocs.io/en/latest/README/). Everything is done server-side using Ruby:

  • language detection is performed by the linguist library
  • syntax highlighting is done using Pygments through a Ruby wrapper

Regarding how to handle it in the swh web ui: as we do not want to perform server-side highlighting, I was thinking about
a hybrid solution based on Pygments for detecting the language and then using client-side highlighting with highlight.js.

highlight.js can automatically try to detect the language based on the content to highlight, but it adds overhead to the page load
and does not work very well in all situations.

In order to detect the language server-side, as we don't want to reinvent the wheel, we could use the Pygments API to detect it based on the extension of the file
to highlight (not its content) and thus give a hint to highlight.js when generating the HTML view. The only difficulty here is to compute a mapping
between Pygments lexers and highlight.js language classes.
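
To make the mapping problem concrete, here is a minimal sketch under stated assumptions. `HLJS_LANGUAGES` and `ALIAS_OVERRIDES` are hypothetical, hand-written tables (the real ones would depend on the highlight.js build being served), and the function names are invented for illustration. The idea is to try each Pygments lexer alias in turn and keep the first one that highlight.js knows about.

```python
# Hypothetical subset of highlight.js language names; the actual set
# depends on which languages are bundled in the served highlight.js build.
HLJS_LANGUAGES = frozenset(
    {"python", "cpp", "javascript", "bash", "xml", "makefile"}
)

# Hypothetical manual overrides for Pygments aliases whose highlight.js
# counterpart has a different name.
ALIAS_OVERRIDES = {
    "c++": "cpp",
    "html": "xml",
    "make": "makefile",
    "sh": "bash",
    "js": "javascript",
}


def hljs_class_for_aliases(aliases):
    """Return the first highlight.js language matching a lexer alias."""
    for alias in aliases:
        name = ALIAS_OVERRIDES.get(alias, alias)
        if name in HLJS_LANGUAGES:
            return name
    return None  # no hint: let the caller decide what to do


def hljs_class_for_filename(filename):
    """Detect the language from the file extension only, via Pygments."""
    # Imported here so the mapping logic above stays usable without Pygments.
    from pygments.lexers import get_lexer_for_filename
    from pygments.util import ClassNotFound

    try:
        lexer = get_lexer_for_filename(filename)
    except ClassNotFound:
        return None
    return hljs_class_for_aliases(lexer.aliases)
```

The returned name could then be set as the class of the code element in the generated HTML, so that highlight.js skips its own auto-detection.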

swh/web/tests/ui/views/data/mascot.png
1 ↗(On Diff #811)

It comes from the kate text editor repository. I will replace the image with the swh logo.

swh/web/urls.py
44

OK for using /browse as the visible endpoint, this is indeed more straightforward.
Nevertheless, this could be mounted at / too in the future by changing the redirection in default_view.

Updating D246: swh web browse ui: first implementation of source tree graphical navigation feature by providing HTML views for swh directory and content objects

Changes since last diff:

  • The root URL endpoint is now /browse instead of /ui
  • Syntax highlighting of textual content is now performed client-side through the use of the highlight.js library. The language to highlight is determined server-side by using either the content's filename or its MIME type.
  • Image contents are now displayed with an img HTML tag when the image type is supported by the browser
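
For the image case, a minimal sketch of the idea, not the code from this diff: the set of supported MIME types and the choice of embedding the bytes as a base64 data URI are both assumptions made for illustration, and `image_tag_for` is a hypothetical name.

```python
import base64

# Assumed list of image MIME types that browsers commonly render.
BROWSER_IMAGE_MIME_TYPES = frozenset(
    {"image/png", "image/jpeg", "image/gif", "image/webp", "image/svg+xml"}
)


def image_tag_for(raw_bytes, mime_type):
    """Return an img tag embedding the content, or None if unsupported."""
    if mime_type not in BROWSER_IMAGE_MIME_TYPES:
        return None  # the caller falls back to another kind of display
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return '<img src="data:{};base64,{}"/>'.format(mime_type, encoded)
```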

This is still a WIP, notably the design still needs to be improved.

Updating D246: swh web ui: first implementation of source tree graphical navigation feature by providing HTML views for swh directory and content objects

Result of this week's work: Django views for browsing the metadata, directories and contents associated with the visit of an origin.

The new directory and content views notably make it easy to switch between the origin branches found during the visit.

Also some slight design improvements, and line numbers are now added when displaying textual content.

This is still a WIP though; I would love some feedback on the current implementation and URI scheme.

LGTM.

I've added some minor comments in-line.

Aside from that, and since you asked, the URL scheme looks good too. Does it differ in any way from the one documented in README-uri-scheme.md (it doesn't seem so, but I haven't checked)?
More generally, it is now kind of annoying to have the URL scheme documented both in that README file and in the docstrings, because they will get out of sync eventually.
At this point I'd be totally fine getting rid of the README, if the docstrings now contain all the corresponding information, but I'd still love to have a way to see at a glance all available URLs; if the Django mapping file provides that, by all means get rid of the README once you no longer need it.

requirements.txt
12

There is some weirdness around the naming of python-magic in Debian and on PyPI: in particular, there are *multiple* packages sharing the suffix "magic" on PyPI and only one is packaged in Debian as python*-magic.
Please check that python3-magic corresponds to what you'd get from PyPI as "magic", if you haven't done so already.

swh/web/browse/views/content.py
198–202

just a nit, the empty spaces between the bullets here can be removed, right?
(same in other "URL scheme" lists below)

swh/web/common/highlightjs.py
243

This is run when importing the module, which isn't great.
Can this be run either lazily when needed, or moved to an explicit init() method?
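
One possible shape for the lazy variant, as a sketch only: `get_lexer_mapping` and its contents are hypothetical stand-ins for whatever is currently computed at import time, and `BUILD_LOG` exists purely to show that the work runs once.

```python
import functools

# Only here to demonstrate that the build step runs a single time.
BUILD_LOG = []


@functools.lru_cache(maxsize=None)
def get_lexer_mapping():
    """Build the (hypothetical) mapping on first use, then reuse it."""
    BUILD_LOG.append("built")
    # Placeholder for the work currently done when the module is imported.
    return {"Python": "python", "C++": "cpp"}
```

Importing the module then costs nothing; the first caller of `get_lexer_mapping()` pays for the build and later callers get the cached dictionary.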

swh/web/static/js/highlight/highlightjs-line-numbers.min.js
3

for both this and the other minimized javascript file coming from highlight.js, we should also keep the non-minimized versions around, for reproducibility and editability purposes

This revision was automatically updated to reflect the committed changes.