Now that our archive coverage is steadily increasing, it has become important to show it to users visually and to let them appreciate how it changes over time.
Our current way of doing so is suboptimal (to put it mildly): we only have a bulleted list at https://www.softwareheritage.org/archive/ , which is often out of date.
I think we need something like libraries.io has, where we list the logos of the source code distribution places we crawl, e.g., one logo for each of GitHub, Debian, PyPI, etc.
One difficulty in doing so is our distinction between a forge backend and its instances. E.g., we crawl GitLab.com, but we probably do not want one GitLab logo for each GitLab instance we crawl (and we already crawl more than one).
Another minor difficulty is where to put this: we have a duplication between https://www.softwareheritage.org/archive/ and https://archive.softwareheritage.org/ which we should probably fix anyway.
Short term proposal:
- add the list of logos of crawled forges to https://archive.softwareheritage.org/ ; for multiple instances of major services we can have an "and also" section where we only list the name of each instance, without repeating the logo
- embed that part of the page into https://www.softwareheritage.org/archive/ with some sort of widget, so that it is not duplicated
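To make the "and also" idea concrete, here is a minimal sketch of grouping crawled instances by backend so that each backend gets a single logo. The input shape, the instance name `gitlab.example.org`, and the function name `group_by_backend` are all made up for illustration; this is not our actual data model.

```python
from collections import defaultdict

# Hypothetical input: (backend, instance) pairs, in display order.
CRAWLED = [
    ("gitlab", "gitlab.com"),
    ("github", "github.com"),
    ("gitlab", "gitlab.example.org"),  # made-up secondary instance
    ("debian", "debian.org"),
]


def group_by_backend(crawled):
    """Group instances per backend.

    The first instance seen for a backend is shown next to the logo;
    the remaining ones go into the "and also" list, names only.
    """
    instances = defaultdict(list)
    for backend, instance in crawled:
        instances[backend].append(instance)
    return {
        backend: {"main": names[0], "and_also": names[1:]}
        for backend, names in instances.items()
    }


if __name__ == "__main__":
    for backend, entry in group_by_backend(CRAWLED).items():
        print(backend, entry["main"], entry["and_also"])
```

Picking the first instance as the "main" one is an arbitrary choice for the sketch; we may prefer an explicit flag per backend instead.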
Long term proposal:
- export the list of crawled places from the DB that contains the scheduling tasks. This probably requires both distinguishing listing tasks from other tasks and adding metadata, such as where to find the relevant logo, for the places for which we want to show one. (I'm assuming this would be easier/better than making the web app access the scheduler DB directly.)
- make the web app access that information to dynamically generate the coverage representation
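As a rough illustration of the long term proposal, the export step could look something like the sketch below. Everything here is an assumption: the task rows, the `kind`, `forge`, and `instance` fields, the `LOGOS` mapping, and the logo paths are hypothetical and do not reflect the current scheduler schema.

```python
import json

# Hypothetical scheduler rows; the real schema will differ.
TASKS = [
    {"kind": "lister", "forge": "github", "instance": "github.com"},
    {"kind": "lister", "forge": "gitlab", "instance": "gitlab.com"},
    {"kind": "loader", "forge": "github", "instance": "github.com"},
    {"kind": "lister", "forge": "pypi", "instance": "pypi.org"},
]

# Hypothetical logo metadata, only for places where we want a logo.
LOGOS = {
    "github": "logos/github.png",
    "gitlab": "logos/gitlab.png",
}


def export_coverage(tasks, logos):
    """Keep only listing tasks and attach the logo path, if any."""
    entries = []
    for task in tasks:
        if task["kind"] != "lister":
            continue  # loading and other tasks do not describe coverage
        entries.append(
            {
                "forge": task["forge"],
                "instance": task["instance"],
                "logo": logos.get(task["forge"]),  # None when no logo is set
            }
        )
    return json.dumps(entries, indent=2)


if __name__ == "__main__":
    print(export_coverage(TASKS, LOGOS))
```

The web app would then only need to read this exported document to render the coverage section, without touching the scheduler DB.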