archive page: visually show archive coverage
Closed, Migrated

Description

Now that our archive coverage is steadily increasing, it has become important to show it visually to users and let them see how it changes over time.
Our current way of doing so is suboptimal (to put it mildly): we only have a bulleted list at https://www.softwareheritage.org/archive/ , which is often out of date.

I think we need something like libraries.io has, where we list the logos of the source code distribution places we crawl, e.g., one logo for each of GitHub, Debian, PyPI, etc.

One difficulty in doing so is our distinction between a forge backend and its instances. E.g., we crawl GitLab.com, but we probably do not want one GitLab logo for each GitLab instance we crawl (and we already crawl more than one).

Another minor difficulty is where to put this: we have a duplication between https://www.softwareheritage.org/archive/ and https://archive.softwareheritage.org/ which we should probably fix anyway.

Short term proposal:

  1. add the list of logos of crawled forges to https://archive.softwareheritage.org/ ; for multiple instances of major services we can have an "and also" section where we only put the names of each instance, without repeating the logo
  2. embed that part of the page into https://www.softwareheritage.org/archive/ with some sort of widget, so that it is not duplicated

Long term proposal:

  1. export the list of crawled places from the DB that contains the scheduling tasks. This probably requires both distinguishing listing tasks from others and adding metadata such as where to find the relevant logo, for places for which we want to have a logo. (I'm assuming this would be easier/better than making the web app access the scheduler DB directly.)
  2. make the web app access that information to dynamically generate the coverage representation
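
A minimal sketch of how the exported list could feed the coverage view, assuming a hypothetical export format in which each crawled place records its backend type, a display name, and its URL. The `ListedInstance` type, the `BACKEND_LOGOS` mapping, the logo paths, and the example.org URLs are all illustrative assumptions, not the actual scheduler schema. The sketch groups instances by backend so that each backend gets one logo and its instances are listed by name only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ListedInstance:
    """One crawled place, as a hypothetical export from the scheduler DB."""
    backend: str  # forge software or service type, e.g. "gitlab"
    name: str     # human-readable instance name
    url: str      # upstream origin URL


# Hypothetical metadata: logo location for the backends that should have one.
BACKEND_LOGOS = {
    "github": "img/logos/github.png",
    "gitlab": "img/logos/gitlab.png",
    "debian": "img/logos/debian.png",
}


def group_by_backend(instances):
    """Return one entry per backend: a single logo plus all instance names."""
    grouped = defaultdict(list)
    for inst in instances:
        grouped[inst.backend].append(inst)

    coverage = []
    for backend in sorted(grouped):
        members = sorted(grouped[backend], key=lambda i: i.name.lower())
        coverage.append({
            "backend": backend,
            "logo": BACKEND_LOGOS.get(backend),  # None when no logo is defined
            "instances": [m.name for m in members],
        })
    return coverage


if __name__ == "__main__":
    # Illustrative data only; the second GitLab URL is a placeholder.
    exported = [
        ListedInstance("gitlab", "GitLab.com", "https://gitlab.com"),
        ListedInstance("github", "GitHub", "https://github.com"),
        ListedInstance("gitlab", "Another GitLab", "https://gitlab.example.org"),
    ]
    for entry in group_by_backend(exported):
        print(entry)
```

Keeping the logo metadata keyed by backend rather than by instance is one way to avoid repeating the GitLab logo for every GitLab instance; the web app would only need the grouped list, not direct access to the scheduler DB.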

Event Timeline

zack triaged this task as High priority. Oct 8 2018, 3:48 PM
zack created this task.

This is what I have implemented so far: F3322096

@zack, is this OK for you before I deploy it? Should I add HAL deposits too? Did I miss something else?

That is awesome, thanks a lot!

I suggest leaving only the logos on the page and moving the texts you currently have in each box to hover tooltips.
To avoid perceived unfairness in the ordering I also suggest sorting them alphabetically.

Once T1117 is fixed, we can probably link each logo to relevant searches. For now let's just point to the upstream origin.

Adding HAL there would be nice too (with a tooltip saying it's push deposit rather than pull crawling).
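
The suggestions above (logos only, text in hover tooltips, alphabetical order, each logo linking to its upstream origin) could be sketched roughly as follows. This is only an illustration: the `render_logo` and `render_coverage` helpers, the entry keys, the logo paths, and the HAL placeholder URL are assumptions, not the web app's actual code.

```python
from html import escape


def render_logo(name, logo_path, origin_url, tooltip):
    """Render one logo linked to its upstream origin, with a hover tooltip."""
    return (
        f'<a href="{escape(origin_url, quote=True)}" '
        f'title="{escape(tooltip, quote=True)}">'
        f'<img src="{escape(logo_path, quote=True)}" '
        f'alt="{escape(name, quote=True)}"></a>'
    )


def render_coverage(entries):
    """Render all logos, sorted alphabetically by name (case-insensitive)."""
    ordered = sorted(entries, key=lambda e: e["name"].lower())
    return "\n".join(
        render_logo(e["name"], e["logo"], e["url"], e["tooltip"])
        for e in ordered
    )


if __name__ == "__main__":
    # Illustrative entries; logo paths and the HAL URL are placeholders.
    entries = [
        {"name": "PyPI", "logo": "img/pypi.png", "url": "https://pypi.org",
         "tooltip": "Python packages, pull crawling"},
        {"name": "HAL", "logo": "img/hal.png", "url": "https://hal.example.org",
         "tooltip": "Push deposit rather than pull crawling"},
        {"name": "GitHub", "logo": "img/github.png", "url": "https://github.com",
         "tooltip": "Git repositories, pull crawling"},
    ]
    print(render_coverage(entries))
```

The tooltip text goes into the standard HTML `title` attribute, so no JavaScript is needed for the hover behaviour in this sketch.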

Looks great!

+1 for adding HAL.

Are the logos clickable?

Thanks for the review! HAL has been added and this is how it looks now: F3322266.
I am currently working on integrating this into the main website.
I should deploy it during the afternoon.

Can you keep the texts in the boxes you had with F3322096?
I find it clearer than just the logo.
Also, it helps to distinguish gitlab-Inria and HAL, because HAL has a hal-inria instance.

As @zack suggested, the texts are now in the tooltips in order to save display space.
For the Inria logo, I think the best option is to add a small GitLab logo to it, to disambiguate it from HAL.

Thanks for the update, looks even greater now :-)

Further minor suggestions (i.e., it can be improved upon later if you're ready to go as is):

  • the boxes now look too "wide", maybe you can increase the number of columns from 3 to 5 or 6
  • also, do we still have a reason to keep the solid line borders now that the text is gone? I haven't tried, but I guess that without borders the page will feel more slick. YMMV

You were right, this is much better this way.

This is now deployed to the homepage of https://archive.softwareheritage.org and also to the main website at https://www.softwareheritage.org/archive/.