
Use forge URL network location as default lister instance name
Status: Closed, Migrated

Description

To simplify mapping a forge lister to the number of origins loaded after its execution (as computed by swh-counters), a lister instance name should default to the network location of the listed forge's URL.

For instance, below are the current gitlab lister instance names in production:

gitlab        | riseup
gitlab        | lip6
gitlab        | inria
gitlab        | freedesktop
gitlab        | ow2
gitlab        | common-lisp
gitlab        | gnome
gitlab        | gite.lirmm
gitlab        | gitlab
gitlab        | framagit

We would like to have the following instance names instead:

gitlab        | 0xacab.org
gitlab        | gitlab.lip6.fr
gitlab        | gitlab.inria.fr
gitlab        | gitlab.freedesktop.org
gitlab        | gitlab.ow2.org
gitlab        | gitlab.common-lisp.net
gitlab        | gitlab.gnome.org
gitlab        | gite.lirmm.fr
gitlab        | gitlab.com
gitlab        | framagit.org

Looking at the swh-lister code, we could make the instance parameter optional and fall back to the URL network location when it is not provided.
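A minimal sketch of that fallback, assuming a base class whose constructor receives the forge `url` and an optional `instance`. `BaseLister`, `GitLabLister`, and the API URL used below are hypothetical illustrations, not the actual swh-lister API:

```python
from typing import Optional
from urllib.parse import urlparse


class BaseLister:
    """Hypothetical stand-in for the swh-lister base pattern lister."""

    LISTER_NAME = "base"

    def __init__(self, url: str, instance: Optional[str] = None) -> None:
        self.url = url
        # Fall back to the network location of the forge URL when no
        # explicit instance name is given.
        if instance is None:
            instance = urlparse(url).netloc
        if not instance:
            raise ValueError(f"cannot derive an instance name from {url!r}")
        self.instance = instance


class GitLabLister(BaseLister):
    LISTER_NAME = "gitlab"


lister = GitLabLister(url="https://gitlab.inria.fr/api/v4/")
print(lister.LISTER_NAME, lister.instance)  # gitlab gitlab.inria.fr
```

An explicit `instance` argument still wins, so existing deployments keep working until their names are migrated. Note that `netloc` keeps any port (e.g. `example.org:8080`), which keeps two instances on the same host distinguishable.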

Looking at the swh-scheduler code, if we change lister instance names, we will have to modify the values of the instance_name column in the listers table of the production scheduler database (hopefully, the lister id is not derived from the instance name).

Event Timeline

anlambert triaged this task as Normal priority. Jun 23 2021, 11:47 AM
anlambert created this task.
olasd changed the task status from Open to Work in Progress. Jul 9 2021, 3:37 PM
olasd claimed this task.
olasd added a subscriber: olasd.

I've updated the lister instances that don't require credentials:

begin;

create function update_instance_name(lister_type text, old_name text, new_name text) returns setof task language sql as $$
  update listers set instance_name = new_name where name = lister_type and instance_name = old_name;
  update task
    set arguments = jsonb_set(arguments, '{kwargs,instance}', to_jsonb(new_name))
    where type in ('list-' || lister_type, 'list-' || lister_type || '-full', 'list-' || lister_type || '-incremental')
          and arguments#>>'{kwargs, instance}' = old_name
    returning *;
$$;

select * from update_instance_name('cgit', 'eclipse', 'git.eclipse.org');
select * from update_instance_name('cgit', 'happyassassin', 'www.happyassassin.net');
select * from update_instance_name('cgit', 'tor', 'gitweb.torproject.org');
select * from update_instance_name('cgit', 'hdiff.luite', 'hdiff.luite.com');
select * from update_instance_name('cgit', 'alpinelinux', 'git.alpinelinux.org');
select * from update_instance_name('cgit', 'openembedded', 'git.openembedded.org');
select * from update_instance_name('cgit', 'yoctoproject', 'git.yoctoproject.org');
select * from update_instance_name('cgit', 'zx2c4', 'git.zx2c4.com');
select * from update_instance_name('cgit', 'git-kernel', 'git.kernel.org');
select * from update_instance_name('cgit', 'fedora', 'fedorapeople.org');
select * from update_instance_name('cgit', 'baserock', 'git.baserock.org');
select * from update_instance_name('cgit', 'qt.io', 'code.qt.io');

select * from update_instance_name('gitlab', 'lip6', 'gitlab.lip6.fr');
select * from update_instance_name('gitlab', 'freedesktop', 'gitlab.freedesktop.org');
select * from update_instance_name('gitlab', 'riseup', '0xacab.org');
select * from update_instance_name('gitlab', 'framagit', 'framagit.org');
select * from update_instance_name('gitlab', 'debian', 'salsa.debian.org');
select * from update_instance_name('gitlab', 'common-lisp', 'gitlab.common-lisp.net');
select * from update_instance_name('gitlab', 'ow2', 'gitlab.ow2.org');
select * from update_instance_name('gitlab', 'inria', 'gitlab.inria.fr');
select * from update_instance_name('gitlab', 'gnome', 'gitlab.gnome.org');
select * from update_instance_name('gitlab', 'gite.lirmm', 'gite.lirmm.fr');

drop function update_instance_name(lister_type text, old_name text, new_name text);
commit;
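To make the helper's matching rules explicit, here is an in-memory Python model of what `update_instance_name` does: rename the row in `listers`, then rewrite `kwargs.instance` only for the three task types tied to that lister. The row layout (dicts with `name`, `instance_name`, `type`, `arguments`) is an assumption mirroring the columns used in the SQL, not the scheduler's Python API:

```python
from typing import Any, Dict, List


def update_instance_name(
    listers: List[Dict[str, Any]],
    tasks: List[Dict[str, Any]],
    lister_type: str,
    old_name: str,
    new_name: str,
) -> List[Dict[str, Any]]:
    """In-memory model of the SQL helper: rename, then return touched tasks."""
    # Step 1: rename the instance in the listers table.
    for row in listers:
        if row["name"] == lister_type and row["instance_name"] == old_name:
            row["instance_name"] = new_name

    # Step 2: only the three task types tied to this lister are considered.
    task_types = {
        f"list-{lister_type}",
        f"list-{lister_type}-full",
        f"list-{lister_type}-incremental",
    }
    updated = []
    for task in tasks:
        kwargs = task["arguments"].get("kwargs", {})
        # Equivalent of: arguments#>>'{kwargs,instance}' = old_name
        if task["type"] in task_types and kwargs.get("instance") == old_name:
            # Equivalent of: jsonb_set(arguments, '{kwargs,instance}', new_name)
            kwargs["instance"] = new_name
            updated.append(task)
    return updated  # like RETURNING * on the task update


listers = [{"name": "gitlab", "instance_name": "lip6"}]
tasks = [
    {"type": "list-gitlab-incremental",
     "arguments": {"args": [], "kwargs": {"instance": "lip6"}}},
    {"type": "list-gitlab-full",
     "arguments": {"args": [], "kwargs": {"instance": "inria"}}},
]
changed = update_instance_name(listers, tasks, "gitlab", "lip6", "gitlab.lip6.fr")
```

In the SQL version, both updates run inside the same `begin; ... commit;` transaction, so the listers table and the task arguments cannot drift apart if one of the calls fails.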

I'll handle the lister instances that need credentials next.

It seems the remaining lister instances to process are the Phabricator ones, which also need credentials.
This is what we currently have in the listers table of the scheduler database:

phabricator   | wikimedia
phabricator   | llvm
phabricator   | kde
phabricator   | swh
phabricator   | blender

Replacing instance names with network locations will give us:

phabricator   | phabricator.wikimedia.org
phabricator   | reviews.llvm.org
phabricator   | phabricator.kde.org
phabricator   | forge.softwareheritage.org
phabricator   | developer.blender.org

I am currently modifying the swh-lister code to make the instance parameter optional in the base pattern
lister and to use the URL network location when it is not provided.

I've duplicated the credentials for the relevant forges, and updated the following instance names (using the same update_instance_name helper as above):

select * from update_instance_name('gitlab', 'gitlab', 'gitlab.com');

select * from update_instance_name('phabricator', 'wikimedia', 'phabricator.wikimedia.org');
select * from update_instance_name('phabricator', 'llvm', 'reviews.llvm.org');
select * from update_instance_name('phabricator', 'kde', 'phabricator.kde.org');
select * from update_instance_name('phabricator', 'softwareheritage', 'forge.softwareheritage.org');
select * from update_instance_name('phabricator', 'blender', 'developer.blender.org');

Now, we should update the instance names for all listers for which there's a single, hardcoded instance (npm, pypi, github, ...). That needs to happen in the swh.lister code.
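For listers bound to a single forge, one way to apply the same convention is to give each class a default URL and derive the instance from it. The class names, the `DEFAULT_URL`/`DEFAULT_INSTANCE` attributes, and the URLs below are illustrative assumptions, not the swh-lister implementation:

```python
from typing import Optional
from urllib.parse import urlparse


class SingleInstanceLister:
    """Hypothetical base class for listers of one well-known forge."""

    LISTER_NAME = ""
    DEFAULT_URL = ""
    # Optional explicit name, for forges whose API host differs from the forge.
    DEFAULT_INSTANCE: Optional[str] = None

    def __init__(self, url: Optional[str] = None,
                 instance: Optional[str] = None) -> None:
        self.url = url or self.DEFAULT_URL
        # Precedence: explicit argument, class default, then the URL netloc.
        self.instance = (
            instance or self.DEFAULT_INSTANCE or urlparse(self.url).netloc
        )


class PyPILister(SingleInstanceLister):
    LISTER_NAME = "pypi"
    DEFAULT_URL = "https://pypi.org/simple/"


class GitHubLister(SingleInstanceLister):
    LISTER_NAME = "github"
    DEFAULT_URL = "https://api.github.com/repositories"
    # The netloc would be "api.github.com"; pin the user-facing host instead.
    DEFAULT_INSTANCE = "github.com"


print(PyPILister().instance)    # pypi.org
print(GitHubLister().instance)  # github.com
```

The `DEFAULT_INSTANCE` escape hatch matters for forges such as GitHub, whose API is served from `api.github.com` rather than the forge host; whether such a lister should use the API netloc or the forge host is a naming decision this sketch does not settle.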

This task has been complete for a while now; closing it.