Ingest AUR repository (Arch User Repository)
Open, Normal, Public

Event Timeline

AUR lister run in Docker: report

The AUR lister runs fine in Docker, but it takes quite long (roughly 30 minutes) to list origins.

Found 78702 AUR packages in aur_index
Successfully removed /tmp/aur_archive directory
Task swh.lister.aur.tasks.AurListerTask[a7ed0b48-3d3b-4aad-b158-6d888ff9aab5] succeeded in 1619.0577569839952s: {'pages': 78702, 'origins': 78702}

swh-scheduler=# select count(*) from listed_origins where visit_type = 'aur';
 count 
-------
 78702
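
As a rough illustration of how the duration quoted above follows from the task summary line, here is a minimal sketch that parses a Celery "succeeded" line and derives the listing time and throughput. The regular expression and the function name lister_throughput are assumptions made for this example, not part of swh-lister.

```python
import re

# Pattern for the Celery "succeeded" line shown above; the group names
# (task, seconds, pages, origins) are labels chosen for this sketch.
SUCCESS_RE = re.compile(
    r"Task (?P<task>[\w.]+)\[[0-9a-f-]+\] succeeded in (?P<seconds>[\d.]+)s: "
    r"\{'pages': (?P<pages>\d+), 'origins': (?P<origins>\d+)\}"
)


def lister_throughput(line: str) -> tuple[int, float, float]:
    """Return (origins, duration in minutes, origins per second)."""
    match = SUCCESS_RE.search(line)
    if match is None:
        raise ValueError("not a task success line")
    origins = int(match["origins"])
    seconds = float(match["seconds"])
    return origins, seconds / 60, origins / seconds


line = (
    "Task swh.lister.aur.tasks.AurListerTask[a7ed0b48-3d3b-4aad-b158-6d888ff9aab5] "
    "succeeded in 1619.0577569839952s: {'pages': 78702, 'origins': 78702}"
)
origins, minutes, rate = lister_throughput(line)
print(f"{origins} origins in {minutes:.1f} min ({rate:.1f} origins/s)")
```

The same function can be applied to any later lister run to compare durations between runs.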

AUR loader run in Docker: report

The AUR loader runs in Docker, but I don't understand why it starts loading origins as soon as the lister has completed (i.e. I have not run "origin schedule-next aur <qty>").

For now it looks good and is quite fast, because the packages it downloads are very small. It loads roughly 25000 origins in an hour without errors:

swh-scheduler=# select count(*) from origin_visit_stats where visit_type='aur' and last_visit_status='successful';
 count
-------
 27057
(1 row)

swh-scheduler=# select count(*) from origin_visit_stats where visit_type='aur' and last_visit_status='failed';
 count
-------
     0
(1 row)
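
For a back-of-the-envelope idea of how long a full load would take at this pace, the sketch below divides the number of listed origins by the observed hourly rate. It assumes the rate stays constant for the whole run, which is not guaranteed; the function name estimated_hours is made up for this example.

```python
def estimated_hours(total_origins: int, origins_per_hour: float) -> float:
    """Estimate the time to load every listed origin at a constant rate."""
    if origins_per_hour <= 0:
        raise ValueError("rate must be positive")
    return total_origins / origins_per_hour


# 78702 listed origins and roughly 25000 origins loaded per hour,
# both figures taken from the report above.
hours = estimated_hours(78702, 25000)
print(f"about {hours:.1f} hours")
```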

vlorentz triaged this task as Normal priority. Aug 26 2022, 5:36 PM

I've made a complete run in Docker.

Lister:

[2022-08-30 10:31:30,328: INFO/ForkPoolWorker-1] Task swh.lister.aur.tasks.AurListerTask[a24d7a3d-81ea-4ef9-90e7-e9cad8a3ffec] succeeded in 946.656092988007s: {'pages': 78803, 'origins': 78803}

swh-scheduler=# select count(*) from listed_origins where visit_type='aur';
-[ RECORD 1 ]
count | 78803

Loader (it takes between 2 and 3 hours to load everything):

swh-scheduler=# select count(*) from origin_visit_stats where visit_type='aur' and last_visit_status='successful';
-[ RECORD 1 ]
count | 78799

swh-scheduler=# select count(*) from origin_visit_stats where visit_type='aur' and last_visit_status='failed';
-[ RECORD 1 ]
count | 4

swh-scheduler=# select count(*) from origin_visit_stats where visit_type='aur' and last_visit_status='not_found';
-[ RECORD 1 ]
count | 0
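
The per-status counts above could also be fetched with a single GROUP BY query instead of one query per status. The sketch below is only an illustration: it uses Python's sqlite3 module as an in-memory stand-in for the swh-scheduler PostgreSQL database, the table holds just the two columns used in the queries above plus a hypothetical url column, and the rows are made-up sample data.

```python
import sqlite3

# In-memory stand-in for the scheduler database (assumption: the real
# database is PostgreSQL, queried through psql as in the report above).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table origin_visit_stats (url text, visit_type text, last_visit_status text)"
)

# Hypothetical sample rows, not real AUR data.
rows = [
    ("https://example.org/pkg-a", "aur", "successful"),
    ("https://example.org/pkg-b", "aur", "successful"),
    ("https://example.org/pkg-c", "aur", "failed"),
    ("https://example.org/repo-d", "git", "successful"),
]
conn.executemany("insert into origin_visit_stats values (?, ?, ?)", rows)

# One query returning a count per status for the 'aur' visit type.
query = """
    select last_visit_status, count(*)
    from origin_visit_stats
    where visit_type = 'aur'
    group by last_visit_status
    order by last_visit_status
"""
status_counts = dict(conn.execute(query).fetchall())
print(status_counts)
```

Note that a status with no matching rows (such as not_found in the run above) simply does not appear in the grouped result, rather than showing up with a count of 0.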