
Reference bitbucket mercurial origins
Closed, Resolved, Public

Description

The Bitbucket Mercurial origins have been loaded, so they are present in the archive and in the scheduler db (table origin_visit_stats).
However, as those origins were not listed through a lister, they are filtered out during the scheduler metrics update (an SQL routine which does an inner join).
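
To illustrate why an inner join hides these origins, here is a minimal sketch using an in-memory SQLite database. The two-column schema is a simplifying assumption made for the example, not the real scheduler schema, and the example.org URL is hypothetical.

```python
import sqlite3

# Assumption: a heavily simplified schema; the real scheduler tables
# have more columns and live in PostgreSQL, not SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table listed_origins (url text, visit_type text);
    create table origin_visit_stats (url text, visit_type text);

    -- An origin that was listed by a lister and then visited.
    insert into listed_origins values ('https://example.org/listed', 'git');
    insert into origin_visit_stats values ('https://example.org/listed', 'git');

    -- An origin that was bulk-loaded (visited) but never listed.
    insert into origin_visit_stats values ('https://bitbucket.org/user/repo', 'hg');
""")

# The inner join only keeps visit stats that have a matching listed origin.
inner = conn.execute("""
    select ovs.url
    from origin_visit_stats ovs
    inner join listed_origins lo
        on lo.url = ovs.url and lo.visit_type = ovs.visit_type
    order by ovs.url
""").fetchall()

# A left join with a null check shows what the inner join dropped.
missing = conn.execute("""
    select ovs.url
    from origin_visit_stats ovs
    left join listed_origins lo
        on lo.url = ovs.url and lo.visit_type = ovs.visit_type
    where lo.url is null
    order by ovs.url
""").fetchall()

print(inner)
print(missing)
```

In this sketch, the bulk-loaded origin only appears in `missing`, which is the situation described above for the metrics update.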

We need to reference those origins so they show up properly in the scheduler metrics (and thus on the https://archive.softwareheritage.org main page).

The simplest way seems to be to attach those origins to the bitbucket lister, backfilling the listed origins from origin_visit_stats as disabled ones,
and then let the remaining cogs do the rest.

Alternatively, simply reference the Bitbucket Mercurial origins as a discontinued entry on the archive's main page (coverage section) [1].

[1] D6475

Event Timeline

ardumont created this task.

I was thinking of something ad-hoc such as:

-- Backfill the bulk-loaded Bitbucket Mercurial origins as disabled listed origins.
insert into listed_origins
    (lister_id, url, visit_type, last_update, enabled)
select
    (select id from listers where name = 'bitbucket') as lister_id,
    url,
    'hg' as visit_type,
    last_visit as last_update,
    false as enabled
from origin_visit_stats
where visit_type = 'hg'
  -- prefix range: '0' is the character right after '/' in ASCII
  and url > 'https://bitbucket.org/'
  and url < 'https://bitbucket.org0';
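
As a rough check of the range condition in that query, the sketch below runs the same statement against an in-memory SQLite database. The reduced schema, the sample URLs and the dates are assumptions made for the example. SQLite compares text bytewise by default; under a non-C collation in PostgreSQL the range might behave differently, so this is only an illustration of the intent.

```python
import sqlite3

# Assumption: reduced versions of the scheduler tables, with only the
# columns the query touches. Sample rows are made up for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table listers (id integer primary key, name text);
    create table listed_origins (
        lister_id integer, url text, visit_type text,
        last_update text, enabled boolean
    );
    create table origin_visit_stats (url text, visit_type text, last_visit text);

    insert into listers (id, name) values (1, 'bitbucket');
    insert into origin_visit_stats values
        ('https://bitbucket.org/user/repo', 'hg', '2020-06-01'),
        ('https://bitbucket.org/user/other', 'git', '2021-01-01'),
        ('https://bitbucket.organic.example/x', 'hg', '2020-06-01'),
        ('https://example.org/repo', 'hg', '2020-06-01');
""")

# Same statement as above, except that 0 stands in for false so the
# example also runs on older SQLite versions.
conn.execute("""
    insert into listed_origins
        (lister_id, url, visit_type, last_update, enabled)
    select
        (select id from listers where name = 'bitbucket') as lister_id,
        url,
        'hg' as visit_type,
        last_visit as last_update,
        0 as enabled
    from origin_visit_stats
    where visit_type = 'hg'
      and url > 'https://bitbucket.org/'
      and url < 'https://bitbucket.org0'
""")

rows = conn.execute("""
    select lister_id, url, visit_type, last_update, enabled
    from listed_origins
    order by url
""").fetchall()
print(rows)
```

Only the Mercurial origin under https://bitbucket.org/ is backfilled here: the git origin is excluded by the visit type, and the two other hosts fall outside the URL range.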

We could argue that adding a separate, "virtual" lister instance for these bulk archived origins would make sense, but I don't know if it's worth the bother.

In T3658#72284, @olasd wrote:

We could argue that adding a separate, "virtual" lister instance for these bulk archived origins would make sense, but I don't know if it's worth the bother.

Either way we should make sure that the rendering on the main page of the archive "works". In that sense, we should maybe have a separate lister "name" so that these origins can be rendered in the "discontinued hosting" section. @anlambert, do you have a suggestion?

ardumont renamed this task from Reference bitbucket mercurial origins in scheduler metrics to Reference bitbucket mercurial origins. Oct 15 2021, 9:45 AM

A first simple solution has been implemented in the webapp for now [1].
It's not deployed yet.

For now, the actual stats in the scheduler db have not been backfilled.
Do we still want to do that?

[1] D6475

ardumont changed the task status from Open to Work in Progress. Oct 15 2021, 9:49 AM
ardumont moved this task from Backlog to in-progress on the System administration board.
ardumont updated the task description.

The new webapp version is deployed [1]. We can see the mercurial origins referenced as a discontinued service there.

[1] https://archive.softwareheritage.org/

I've opened T3674 to discuss how to properly reference origins that are not the output of listers.