- D8333, D8336: Implement lister
- D8336: Implement loader (a Bower package loader is not needed, as origin URLs are Git repositories)
- Lister runs in Docker
- Loader runs in Docker
- D8333: Document lister
- D8336: Document loader (a Bower package loader is not needed, as origin URLs are Git repositories)
- T4555: Deploy on staging
- Call for public review
- Deploy on production
Description
Revisions and Commits
Status | Assigned | Task
---|---|---
Unknown Object (Maniphest Task) | |
Migrated | gitlab-migration | T4475 Ingest bower.io (Javascript package manager)
Migrated | gitlab-migration | T4555 staging: Deploy bower lister
Event Timeline
The lister runs fine on Docker and takes about 30 seconds to list 68864 origins:
swh-lister_1 | [2022-08-30 06:31:51,945: INFO/ForkPoolWorker-1] Fetching URL https://registry.bower.io/packages with params {}
swh-lister_1 | [2022-08-30 06:32:20,070: INFO/ForkPoolWorker-1] Task swh.lister.bower.tasks.BowerListerTask[20414db4-682b-4b77-b8be-cac95ec3cf83] succeeded in 28.162076047010487s: {'pages': 1, 'origins': 68864}
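As an illustration of the listing step logged above, here is a minimal sketch of how a single-page registry response could be turned into a list of Git origin URLs. It assumes the registry returns a JSON array of objects with `name` and `url` keys; the helper name `origins_from_registry` and the sample entries are hypothetical and are not taken from the swh-lister code.

```python
from typing import Iterable, Iterator


def origins_from_registry(packages: Iterable[dict]) -> Iterator[str]:
    """Yield the Git origin URL of each registry entry (hypothetical helper)."""
    for package in packages:
        url = package.get("url")
        # Entries without a URL cannot be scheduled for a git visit, so skip them.
        if url:
            yield url


# Made-up sample shaped like the assumed registry payload.
sample = [
    {"name": "example-a", "url": "https://example.org/a.git"},
    {"name": "example-b", "url": "https://example.org/b.git"},
    {"name": "no-url-entry"},
]
origins = list(origins_from_registry(sample))
print(len(origins), "origins")
```

Because each URL already points at a Git repository, the output of such a step can be handed to the existing git loader, which is why no dedicated Bower loader is needed.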
I've launched 100 git tasks after running the lister on Docker.
Looking at the logs, the visits with status 'not_found' are either 404 responses for repositories that no longer exist or "No valid credentials provided" errors.
Those with status 'failed' are mainly timeouts.
swh-scheduler=# select count(*) from origin_visit_stats where visit_type='git' and last_visit_status='successful';
-[ RECORD 1 ]
count | 68
swh-scheduler=# select count(*) from origin_visit_stats where visit_type='git' and last_visit_status='failed';
-[ RECORD 1 ]
count | 4
swh-scheduler=# select count(*) from origin_visit_stats where visit_type='git' and last_visit_status='not_found';
-[ RECORD 1 ]
count | 28
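The three counts add up to the 100 git tasks that were launched. A small sketch of turning such counts into percentages follows; the function name `visit_status_shares` is hypothetical, and the numbers are copied from the query results above.

```python
from typing import Dict


def visit_status_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each status count as a percentage of all visits (hypothetical helper)."""
    total = sum(counts.values())
    if total == 0:
        # No visits recorded yet: report 0% everywhere instead of dividing by zero.
        return {status: 0.0 for status in counts}
    return {status: 100.0 * n / total for status, n in counts.items()}


# Counts taken from the origin_visit_stats queries above.
counts = {"successful": 68, "failed": 4, "not_found": 28}
shares = visit_status_shares(counts)
for status, share in shares.items():
    print(f"{status}: {share:.0f}%")
```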