
revamp archive coverage page to list instances of mentioned listers
Open, High, Public

Description

Now that we are increasing archive coverage quite a bit, the archive coverage page is starting to show some limits. In particular, we need a structured way to list the various instances of supported listers.

  • as a first approximation, we can make the tooltip of each listed logo include a list of instances; this would work for now, but it won't scale much, because there is only so much usable space in a tooltip
  • alternatively, we can make each logo link to a dedicated page listing all deployed instances of the lister (which will probably mean solving T1266 as a prerequisite)
  • alternatively, maybe something in between? e.g., a box that opens when clicking on each logo, containing a proper <div> with links to each instance?

A proper solution (T1266) would require some work, but the current page is no longer clear enough about what we actually archive… Thoughts?

Event Timeline

zack triaged this task as Normal priority. Jul 1 2019, 6:11 PM
zack created this task.
olasd added a subscriber: olasd. Jul 1 2019, 6:31 PM

My gut feeling is that we're already past the point where maintaining the list by hand is workable: in the last week or two, we've added a dozen new sources and we're going to keep adding more (at a slower pace, but probably more on an "on the fly" basis).

I'm also not quite sure where to draw the line between "platform software supported for archival" (e.g. gitlab, bitbucket, PyPI archives, Debian archives, CGit) and "currently archived hosting platforms" (e.g. gitlab.com, framagit, bitbucket.org, debian.org, kernel.org, gnu.org). Showing the distinction will probably make more sense when/if we decide to tackle T1538.

For the question of showing from which actual hosters we archive code, I think we can have a middle ground where we curate a list of "prominent" sources (which we could hard-code in a first iteration, then programmatically determine by just taking the top N sources after solving T1266), kept on top of the section, and then pick a random sample of other origins to show on a second line below.
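A minimal sketch of the selection described above, assuming we already have a per-hoster count of archived origins (the `origin_counts` mapping and the `pick_sources` helper are hypothetical). It supports both iterations: a hard-coded `prominent` list first, then the top N hosters by origin count once T1266 provides the data, with a random sample of the remaining hosters for the second line.

```python
import random
from typing import Dict, List, Optional, Tuple


def pick_sources(
    origin_counts: Dict[str, int],
    prominent: Optional[List[str]] = None,
    top_n: int = 3,
    sample_size: int = 3,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Return (first_line, second_line) of hosters to display.

    origin_counts: hypothetical mapping hoster -> number of archived origins.
    prominent: hand-curated list (first iteration); if None, fall back to
    the top_n hosters by origin count (programmatic iteration).
    """
    if prominent is None:
        # Highest count first; ties broken alphabetically for stable output.
        prominent = sorted(origin_counts, key=lambda h: (-origin_counts[h], h))[:top_n]
    # Every hoster not shown on the first line is a candidate for the sample.
    others = sorted(h for h in origin_counts if h not in prominent)
    rng = random.Random(seed)
    sample = rng.sample(others, min(sample_size, len(others)))
    return list(prominent), sample
```

Passing a fixed `seed` makes the second line reproducible (e.g. for caching the rendered page); leaving it as `None` gives a different sample on each render.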

zack added a subscriber: vlorentz. Jul 4 2019, 10:29 AM
In T1870#34563, @olasd wrote:

My gut feeling is that we're already past the point where maintaining the list by hand is workable

I concur.

I'm also not quite sure where to draw the line between "platform software supported for archival" (e.g. gitlab, bitbucket, PyPI archives, Debian archives, CGit) and "currently archived hosting platforms" (e.g. gitlab.com, framagit, bitbucket.org, debian.org, kernel.org, gnu.org). Showing the distinction will probably make more sense when/if we decide to tackle T1538.

Yeah, that too.

As a way forward, let me note down here a proposal (by @vlorentz on IRC) which I quite like and which would allow us to make progress on this task without having to go all the way down this rabbit hole:

  • create a new page, e.g., archive.s.o/coverage that is automatically generated by the web app and contains:
    • a table of all the listers currently in production, one per row
    • for each lister we give the lister type (e.g., gitlab) and the instance URL (e.g. https://gitlab.com)
    • rows are grouped by lister type, so that all, say, gitlab listers come together
    • we add a section heading for each group of listers of the same type, where we can have
      • the lister type logo (this information should hence be made machine-readable somewhere)
      • an anchor that we can link to, e.g., archive.s.o/coverage/gitlab
  • the current list of logos on archive.s.o remains hand-curated for now, and
    • for logos of listers that do appear in the table, we make them link to the corresponding archive.s.o/coverage anchors, so that one can easily access the list of all instances of a given lister
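The generation step proposed above could look roughly like the following sketch. Everything here is an assumption for illustration: the `ListerInstance` record, the `render_coverage` helper, and the logo path template are hypothetical, and the per-type anchor is emitted as an HTML `id` (so reachable as a `#gitlab` fragment); how the proposed archive.s.o/coverage/gitlab URL maps onto it is left open.

```python
import html
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ListerInstance:
    lister_type: str   # e.g. "gitlab"
    instance_url: str  # e.g. "https://gitlab.com"


def group_by_type(listers: Iterable[ListerInstance]) -> Dict[str, List[str]]:
    """Group instance URLs by lister type, so that all gitlab listers come together."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for lister in listers:
        groups[lister.lister_type].append(lister.instance_url)
    # Sort both groups and instances so the generated page is stable.
    return {t: sorted(urls) for t, urls in sorted(groups.items())}


def render_coverage(
    listers: Iterable[ListerInstance],
    logo_url_template: str = "/static/img/logos/{}.png",  # hypothetical path
) -> str:
    """Render one heading (logo + anchor) and one table per lister type."""
    parts: List[str] = []
    for lister_type, urls in group_by_type(listers).items():
        t = html.escape(lister_type)
        logo = html.escape(logo_url_template.format(lister_type))
        parts.append(f'<h2 id="{t}"><img src="{logo}" alt="{t}"> {t}</h2>')
        parts.append("<table>")
        for url in urls:
            u = html.escape(url)
            parts.append(f'<tr><td>{t}</td><td><a href="{u}">{u}</a></td></tr>')
        parts.append("</table>")
    return "\n".join(parts)
```

Keeping the logo as a per-type template is one way to make "the lister type logo" machine-readable; a explicit type-to-logo mapping would work just as well.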

This way we can still have "exceptions" on the main archive.s.o page (e.g., HAL) while progressing toward a more automated solution.

rdicosmo raised the priority of this task from Normal to High. Sep 18 2019, 2:32 PM
rdicosmo added a subscriber: rdicosmo.

I fully support this last proposal; it makes total sense.
I would like to see an API entrypoint that provides the information that will go in archive.s.o/coverage.
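As a rough illustration of such an entrypoint, here is a minimal stdlib-only WSGI sketch serving the same grouped data as JSON. The `/api/1/coverage/` path, the payload shape, and the hard-coded `LISTERS` list are all hypothetical; in practice the data would come from wherever the production lister configuration lives.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical sample data: (lister type, instance URL) pairs.
LISTERS: List[Tuple[str, str]] = [
    ("gitlab", "https://gitlab.com"),
    ("gitlab", "https://framagit.org"),
    ("cgit", "https://git.kernel.org"),
]


def coverage_payload(listers: Iterable[Tuple[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group lister instances by type, mirroring the rows of the coverage page."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for lister_type, instance_url in listers:
        groups[lister_type].append({"type": lister_type, "instance": instance_url})
    return {t: sorted(rows, key=lambda r: r["instance"]) for t, rows in sorted(groups.items())}


def coverage_app(environ, start_response):
    """Minimal WSGI app exposing the payload at a hypothetical /api/1/coverage/ path."""
    if environ.get("PATH_INFO") != "/api/1/coverage/":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body = json.dumps(coverage_payload(LISTERS)).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

For a quick local try, `wsgiref.simple_server.make_server("", 8000, coverage_app).serve_forever()` serves it. Having both the HTML page and the API consume the same grouped structure would keep the two views consistent.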