
Ingest bower.io (Javascript package manager)
Closed, Migrated. Edits Locked.

Description

  • D8333, D8336: Implement lister
  • D8336: Implement loader (a dedicated Bower package loader is not needed because the origin URLs are Git repositories)
  • Run lister in Docker
  • Run loader in Docker
  • D8333: Document lister
  • D8336: Document loader (a dedicated Bower package loader is not needed because the origin URLs are Git repositories)
  • T4555: Deploy on staging
  • Call for public review
  • Deploy on production

Event Timeline

The lister runs fine on Docker and takes about 30 seconds to list 68864 origins:

swh-lister_1                        | [2022-08-30 06:31:51,945: INFO/ForkPoolWorker-1] Fetching URL https://registry.bower.io/packages with params {}
swh-lister_1                        | [2022-08-30 06:32:20,070: INFO/ForkPoolWorker-1] Task swh.lister.bower.tasks.BowerListerTask[20414db4-682b-4b77-b8be-cac95ec3cf83] succeeded in 28.162076047010487s: {'pages': 1, 'origins': 68864}
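The log above shows a single page fetched from the registry and turned into 68864 origins. As a rough illustration of that step, here is a minimal sketch, not the actual swh.lister.bower code. It assumes the registry answers with a JSON array of objects carrying `name` and `url` keys; the function name `origins_from_registry` and the sample payload are hypothetical, and no network call is made.

```python
import json
from typing import Dict, Iterator

# Hypothetical sample shaped like the assumed registry response:
# a JSON array of {"name": ..., "url": ...} objects.
SAMPLE_PAYLOAD = """
[
  {"name": "example-a", "url": "https://example.org/a.git"},
  {"name": "example-b", "url": ""}
]
"""


def origins_from_registry(payload: str) -> Iterator[Dict[str, str]]:
    """Yield one Git origin per registry entry that has a non-empty URL."""
    for entry in json.loads(payload):
        url = entry.get("url", "")
        if not url:
            # Skip entries without a repository URL: nothing to visit.
            continue
        # Every Bower origin is a Git repository, so the visit type is "git"
        # and the generic Git loader can handle it.
        yield {"name": entry["name"], "url": url, "visit_type": "git"}


origins = list(origins_from_registry(SAMPLE_PAYLOAD))
print(len(origins), "origin(s) listed")
```

Because each listed origin is tagged with the `git` visit type, no Bower-specific loader is involved, which matches the note in the task description.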

I launched 100 git tasks after running the lister on Docker.
Looking at the logs, the visits with status 'not_found' are either 404 responses for repositories that no longer exist or "No valid credentials provided" errors.
The visits with status 'failed' are mainly timeouts.

swh-scheduler=# select count(*) from origin_visit_stats where visit_type='git' and last_visit_status='successful';
-[ RECORD 1 ]
count | 68

swh-scheduler=# select count(*) from origin_visit_stats where visit_type='git' and last_visit_status='failed';
-[ RECORD 1 ]
count | 4

swh-scheduler=# select count(*) from origin_visit_stats where visit_type='git' and last_visit_status='not_found';
-[ RECORD 1 ]
count | 28
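The three counts above add up to the 100 launched tasks (68 + 4 + 28). The same breakdown can be obtained with one grouped query instead of three. The sketch below demonstrates the idea against an in-memory SQLite table; the real scheduler database is PostgreSQL, the table here is reduced to three columns, and the rows are synthetic ones generated only to reproduce the counts reported above. The helper name `status_breakdown` is hypothetical.

```python
import sqlite3
from typing import Dict

# Counts reported in the psql session above.
REPORTED = {"successful": 68, "failed": 4, "not_found": 28}


def status_breakdown(conn: sqlite3.Connection, visit_type: str) -> Dict[str, int]:
    """Return {last_visit_status: count} for one visit type in a single query."""
    rows = conn.execute(
        "SELECT last_visit_status, count(*) "
        "FROM origin_visit_stats "
        "WHERE visit_type = ? "
        "GROUP BY last_visit_status",
        (visit_type,),
    ).fetchall()
    return dict(rows)


# Simplified stand-in for the scheduler table (assumption: only the columns
# needed for this query are modelled).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE origin_visit_stats (url TEXT, visit_type TEXT, last_visit_status TEXT)"
)
for status, count in REPORTED.items():
    conn.executemany(
        "INSERT INTO origin_visit_stats VALUES (?, ?, ?)",
        [(f"https://example.org/{status}/{i}.git", "git", status) for i in range(count)],
    )

breakdown = status_breakdown(conn, "git")
print(breakdown, "total:", sum(breakdown.values()))
```

The grouped form also surfaces any status that was not queried explicitly, which the three separate counts would miss.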
ardumont triaged this task as Normal priority. Aug 30 2022, 11:10 AM
ardumont added a project: Archive coverage.
bchauvet added a parent task: Unknown Object (Maniphest Task). Sep 2 2022, 10:50 AM
ardumont updated the task description.
ardumont changed the status of subtask T4555: staging: Deploy bower lister from Open to Work in Progress. Sep 26 2022, 1:39 PM