The launchpad and gitea listers have been deployed on the staging environment for one week. There has been no negative feedback on them, so they can be deployed in production.
Description
Status | Assigned | Task
---|---|---
Migrated | gitlab-migration | T1734 Create a Lister for launchpad.net
Migrated | gitlab-migration | T2313 Archive git.fsfe.org (Gitea)
Migrated | gitlab-migration | T2608 Deploy launchpad and gitea listers on production
Event Timeline
Actions :
- deploy the new version of the lister on each worker
- update the lister data model
- create the new task-type on the scheduler
- manually launch a listing that creates high-priority loading tasks for the launchpad and gitea repositories, so that they are ingested soon rather than at the end of the current git queue
- truncate lister cache to allow the recurring loading tasks to be created
- schedule the recurring listing tasks for both repositories
- make a first listing with 'high' priority 'oneshot' output tasks [1]
swhworker@worker01 $ SWH_CONFIG_FILENAME=/etc/softwareheritage/lister.yml swh lister run --lister launchpad --priority high ...
- Add scheduler tasks for the forges to list (these will output standard recurring tasks with no priority)
swhscheduler@saatchi $ SWH_CONFIG_FILENAME=/etc/softwareheritage/lister.yml swh scheduler
[1] so they get ingested soon (and not when the current full load-git queue is empty <- never happens)
New version of the lister package deployed :
- on workers
root@pergamon:~# clush -b -w @swh-workers 'apt-get update; apt install -y python3-swh.lister'
...
root@pergamon:~# clush -b -w @swh-workers "dpkg -l python3-swh.lister"
---------------
worker[01-16] (16)
---------------
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name               Version              Architecture Description
+++-==================-====================-============-=================================================================
ii  python3-swh.lister 0.1.4-1~swh1~bpo10+1 all          Software Heritage Listers (bitbucket, git(lab|hub), pypi, etc...)
- on the scheduler :
root@saatchi:~# apt update && apt install python3-swh.lister
...
Restarting services...
 systemctl restart gunicorn-swh-scheduler.service icinga2.service journalbeat.service postfix@-.service rabbitmq-server.service rpcbind.service ssh.service swh-scheduler-runner.service unbound.service
- lister model updated from worker01:
swhworker@worker01:/etc/softwareheritage$ swh lister --db-url postgresql://*****@db.internal.softwareheritage.org:5432/swh-lister db-init
INFO:swh.lister.cli:Loading lister bitbucket
INFO:swh.lister.cli:Loading lister cgit
INFO:swh.lister.cli:Loading lister cran
INFO:swh.lister.cli:Loading lister debian
INFO:swh.lister.cli:Loading lister gitea
INFO:swh.lister.cli:Loading lister github
INFO:swh.lister.cli:Loading lister gitlab
INFO:swh.lister.cli:Loading lister gnu
INFO:swh.lister.cli:Loading lister launchpad
INFO:swh.lister.cli:Loading lister npm
INFO:swh.lister.cli:Loading lister packagist
INFO:swh.lister.cli:Loading lister phabricator
INFO:swh.lister.cli:Loading lister pypi
INFO:swh.lister.cli:Initializing database
INFO:swh.lister.core.models:Creating tables
INFO:swh.lister.cli:Calling init hook for debian
- user guest granted access to the new tables :
swh-lister=> grant select
swh-lister-> on all tables in schema public
swh-lister-> to guest;
GRANT
- scheduler task-types created:
swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task-type register -p lister.gitea
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.gitea
INFO:swh.scheduler.cli.task_type:Create task type list-gitea-full in scheduler
INFO:swh.scheduler.cli.task_type:Create task type list-gitea-incremental in scheduler
swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task-type register -p lister.launchpad
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.launchpad
INFO:swh.scheduler.cli.task_type:Create task type list-launchpad-full in scheduler
INFO:swh.scheduler.cli.task_type:Create task type list-launchpad-incremental in scheduler
INFO:swh.scheduler.cli.task_type:Create task type list-launchpad-new in scheduler
- initial manual launchpad listing launched :
swhworker@worker02:~$ SWH_CONFIG_FILENAME=/etc/softwareheritage/lister.yml swh lister run --lister launchpad --priority high
INFO:swh.core.config:Loading config file /etc/softwareheritage/lister.yml
INFO:swh.core.config:Loading config file /etc/softwareheritage/global.ini
INFO:swh.core.config:Loading config file /etc/softwareheritage/lister.yml
- initial gitea lister launched :
swhworker@worker02:~$ SWH_CONFIG_FILENAME=/etc/softwareheritage/lister.yml swh lister run --lister gitea --priority high url=https://codeberg.org/api/v1/ limit=100
...
INFO:root:listing repos starting at 1198
INFO:root:listing repos starting at 1199
INFO:root:listing repos starting at 1200
INFO:root:stopping after page 1200, no next link found
Results :
swh-lister=> select count(*) from gitea_repo limit 10;
 count
-------
  3598
(1 row)

swh-lister=> select count(*) from launchpad_repo limit 10;
 count
-------
 19602
(1 row)
- Task types registered on the scheduler :
swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task-type register -p lister.launchpad
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.launchpad
swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task-type register -p lister.gitea
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.gitea

softwareheritage-scheduler=> select * from task_type where type like 'list-launchpad%' or type like 'list-gitea%';
            type            |              description               |                     backend_name                      | default_interval | min_interval | max_interval | backoff_factor | max_queue_length | num_retries | retry_delay
----------------------------+----------------------------------------+-------------------------------------------------------+------------------+--------------+--------------+----------------+------------------+-------------+-------------
 list-gitea-full            | Full update of a Gitea instance        | swh.lister.gitea.tasks.FullGiteaRelister              | 90 days          | 90 days      | 90 days      |              1 |                  |             |
 list-gitea-incremental     | Incremental update of a Gitea instance | swh.lister.gitea.tasks.IncrementalGiteaLister         | 1 day            | 1 day        | 1 day        |              1 |                  |             |
 list-launchpad-full        | Full update of Launchpad               | swh.lister.launchpad.tasks.FullLaunchpadLister        | 90 days          | 90 days      | 90 days      |              1 |                  |             |
 list-launchpad-incremental | Incremental update                     | swh.lister.launchpad.tasks.IncrementalLaunchpadLister | 1 day            | 1 day        | 1 day        |              1 |                  |             |
 list-launchpad-new         | Update new entries of Launchpad        | swh.lister.launchpad.tasks.NewLaunchpadLister         | 1 day            | 1 day        | 1 day        |              1 |                  |             |
(5 rows)
- lister's cache truncated :
swh-lister=> truncate gitea_repo;
TRUNCATE TABLE
swh-lister=> truncate launchpad_repo;
TRUNCATE TABLE
- recurring task for full listing created :
- gitea
swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task add list-gitea-full url=https://codeberg.org/api/v1/ limit=100
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
Created 1 tasks

Task 337306005
  Next run: in 3 months (2020-12-16 12:43:30+00:00)
  Interval: 90 days, 0:00:00
  Type: list-gitea-full
  Policy: recurring
  Args:
  Keyword args:
    limit: 100
    url: 'https://codeberg.org/api/v1/'
- launchpad:
swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task add list-launchpad-full
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
Created 1 tasks

Task 337306006
  Next run: just now (2020-09-17 12:46:37+00:00)
  Interval: 90 days, 0:00:00
  Type: list-launchpad-full
  Policy: recurring
  Args:
  Keyword args:
- recurring task for incremental listing created :
- gitea
swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task add list-gitea-incremental url=https://codeberg.org/api/v1/ limit=100
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
Created 1 tasks

Task 337315168
  Next run: just now (2020-09-17 12:51:44+00:00)
  Interval: 1 day, 0:00:00
  Type: list-gitea-incremental
  Policy: recurring
  Args:
  Keyword args:
    limit: 100
    url: 'https://codeberg.org/api/v1/'
- launchpad
swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler.yml task add list-launchpad-incremental
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
Created 1 tasks

Task 337314502
  Next run: just now (2020-09-17 12:51:21+00:00)
  Interval: 1 day, 0:00:00
  Type: list-launchpad-incremental
  Policy: recurring
  Args:
  Keyword args:
Ingestion from the launchpad instance is currently ongoing:
softwareheritage=> select count(*) from origin where url like 'https://git.launchpad.net%';
 count
-------
  3753
(1 row)
We can see some with [1]
The codeberg gitea instance should come next after this one:
softwareheritage=> select count(*) from origin where url like 'https://codeberg.org%';
 count
-------
    17
(1 row)
Progress status:
- launchpad: 14022 / 19605
- codeberg: 17 / 3599
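To put these counts in proportion, a small shell sketch can turn each "ingested / listed" pair into an integer percentage. The `progress` helper is a hypothetical name used only for illustration (it is not part of any swh tool), and the figures are the ones reported above.

```shell
#!/bin/sh
# Hypothetical helper: integer percentage of origins ingested versus
# repositories listed (integer division, so the result is rounded down).
progress() {
    ingested=$1
    listed=$2
    echo $(( ingested * 100 / listed ))
}

# Figures taken from the progress status above.
echo "launchpad: $(progress 14022 19605)%"
echo "codeberg: $(progress 17 3599)%"
```

With the numbers above this prints 71% for launchpad and 0% for codeberg, which matches the observation that codeberg ingestion has barely started.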
As we triggered the listing first for launchpad (oneshot tasks with high
priority), then for the gitea instance codeberg (oneshot tasks with the same
high priority), the codeberg repositories will be ingested once the launchpad
ones are done.
(TIL) suggestion for later: run the lister command with a different --priority
for each forge (possible values are high, normal, low) so that both forges can
be ingested more in parallel.
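A minimal sketch of that suggestion follows. It only prints the commands, it does not run them. The `lister_cmd` helper is hypothetical, and which forge gets `high` versus `normal` is an arbitrary choice for illustration; the command line itself mirrors the invocations shown earlier in this task.

```shell
#!/bin/sh
# Dry run: build one lister command per forge, each with its own priority.
# Nothing is executed; the commands are only echoed for review.
CONFIG=/etc/softwareheritage/lister.yml

# Hypothetical helper: lister_cmd <lister> <priority> [key=value ...]
lister_cmd() {
    lister=$1
    priority=$2
    shift 2
    cmd="SWH_CONFIG_FILENAME=$CONFIG swh lister run --lister $lister --priority $priority"
    # Append any extra key=value arguments (e.g. the gitea instance url).
    for arg in "$@"; do
        cmd="$cmd $arg"
    done
    echo "$cmd"
}

# Assumed split: launchpad keeps 'high', codeberg gets 'normal'.
lister_cmd launchpad high
lister_cmd gitea normal url=https://codeberg.org/api/v1/ limit=100
```

Printing first makes it easy to check the priority assigned to each forge before pasting the commands on a worker.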
Progress status:
|-----------+-----------------+--------+------------+-----------------------+-----------------------------------------------|
| forge     | origin ingested | out of | difference | table (swh-lister db) | Note                                          |
|-----------+-----------------+--------+------------+-----------------------+-----------------------------------------------|
| launchpad |           19568 |  19605 |        -37 | launchpad_repo [1]    | probably failure in repositories              |
|           |                 |        |            |                       | (T2373 possible or 401 ¯\_(ツ)_/¯)            |
|-----------+-----------------+--------+------------+-----------------------+-----------------------------------------------|
| codeberg  |            3612 |   3599 |        +13 | gitea_repo [2]        | more origins because existing                 |
|           |                 |        |            |                       | origins prior to run (probably save code now) |
|-----------+-----------------+--------+------------+-----------------------+-----------------------------------------------|
[1]
softwareheritage=> select count(*) from origin where url like 'https://git.launchpad.net%';
 count
-------
 19568
[2]
softwareheritage=> select count(*) from origin where url like 'https://codeberg.org%';
 count
-------
  3612
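The "difference" column is simply the ingested origin count minus the listed repository count. As a sketch, with a hypothetical `delta` helper and the figures from the table and the two queries above:

```shell
#!/bin/sh
# Hypothetical helper: signed difference between origins ingested and
# repositories listed; '%+d' prints an explicit sign for positive values.
delta() {
    printf '%+d\n' "$(( $1 - $2 ))"
}

# ingested, listed (figures from the table above)
echo "launchpad: $(delta 19568 19605)"
echo "codeberg: $(delta 3612 3599)"
```

A negative value means some listed repositories were not ingested; a positive one means origins already existed before the listing ran.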
T2616 for the analysis part.
The gist of this task is done.
Remaining part must be common enough with our loader stack.