As per the parent task, but focusing on Mercurial repositories (for which we don't have a loader yet).
Description
Status | Assigned | Task
---|---|---
Migrated | gitlab-migration | T561 ingest bitbucket (meta task)
Migrated | gitlab-migration | T593 ingest bitbucket hg/mercurial repositories
Migrated | gitlab-migration | T329 hg / mercurial loader
Migrated | gitlab-migration | T906 mercurial loader: Debian package
Migrated | gitlab-migration | T907 mercurial loader: Align mercurial loader with other loaders
Migrated | gitlab-migration | T908 mercurial loader: Define scheduler task(s)
Migrated | gitlab-migration | T909 mercurial loader: Define puppet manifest for actual deployment
Migrated | gitlab-migration | T964 2018-02-16 worker disk full postmortem
Migrated | gitlab-migration | T982 failing worker consumes remaining tasks without processing them
Migrated | gitlab-migration | T985 loader*: Make prepare method resilient to error and origin visit status compliant
Event Timeline
Given the recent announcement by bitbucket about dropping mercurial support, the priority of this task has just increased.
We do have a mercurial loader, which we have already used; it's time to run it on Bitbucket!
The mercurial loader and the bitbucket lister have been running all summer.
- The bitbucket lister knows of 252402 mercurial origins.
- The mercurial loader has visited 251755 of these origins, with the following results:
 latest_status | count
---------------+--------
               | 251755
 partial       |  18004
 full          | 232641
 ongoing       |   1110
(there's clearly a bug somewhere in the loader as we don't have 1110 parallel workers ;))
I'd be tempted to consider that this task is done, and that a followup should be made to investigate and fix the failing repositories.
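As a quick sanity check on the figures above, the sketch below recomputes the rollup total and the share of each status. The numbers are copied from the table; the variable names and the `share` helper are illustrative only.

```python
# Counts copied from the comment above; names are illustrative.
listed_origins = 252402  # mercurial origins known to the bitbucket lister
status_counts = {"partial": 18004, "full": 232641, "ongoing": 1110}

# The rollup row (null latest_status) should equal the sum of the other rows.
visited = sum(status_counts.values())

# Origins the lister knows about but the loader has not visited yet.
not_visited = listed_origins - visited


def share(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {status: share(n, visited) for status, n in status_counts.items()}

print("visited:", visited)
print("not visited:", not_visited)
for status, pct in shares.items():
    print(f"{status}: {pct}%")
```

The sum matches the 251755 reported by the rollup row, so the three statuses account for every visited origin.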
SQL queries for reference
Count bitbucket lister origins
(on the swh-lister database)
select origin_type, count(*) from bitbucket_repo group by origin_type;
Count latest visits for mercurial bitbucket origins
(on the softwareheritage database)
with origin_latest_visit as (
  select origin.url,
         (select status
            from origin_visit
           where origin_visit.origin = origin.id
             and origin_visit.type = 'hg'
           order by date desc
           limit 1) as latest_status
    from origin
   where origin.url like 'https://bitbucket.org/%'
)
select latest_status, count(*)
  from origin_latest_visit
 where latest_status is not null -- filter out non-mercurial origins
 group by rollup(latest_status); -- rollup adds a row with a null latest_status containing the sum of all rows
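To see how the latest-visit query behaves, here is a small sketch that runs a close variant of it against an in-memory SQLite database. The table and column names come from the query above; the three example origins, their visits, and the use of SQLite are assumptions for illustration (the real database is PostgreSQL). SQLite has no `rollup`, so the total is summed in Python instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy schema and data: hypothetical origins, not real Bitbucket repositories.
conn.executescript("""
    create table origin (id integer primary key, url text);
    create table origin_visit (origin integer, type text, status text, date text);

    insert into origin values
        (1, 'https://bitbucket.org/example/a'),
        (2, 'https://bitbucket.org/example/b'),
        (3, 'https://bitbucket.org/example/c');

    insert into origin_visit values
        (1, 'hg',  'partial', '2019-07-01'),
        (1, 'hg',  'full',    '2019-08-01'),
        (2, 'hg',  'ongoing', '2019-08-15'),
        (3, 'git', 'full',    '2019-08-20');
""")

# Same shape as the query above, minus rollup: the correlated subquery
# keeps only the most recent 'hg' visit status for each origin.
rows = conn.execute("""
    with origin_latest_visit as (
      select origin.url,
             (select status
                from origin_visit
               where origin_visit.origin = origin.id
                 and origin_visit.type = 'hg'
               order by date desc
               limit 1) as latest_status
        from origin
       where origin.url like 'https://bitbucket.org/%'
    )
    select latest_status, count(*)
      from origin_latest_visit
     where latest_status is not null
     group by latest_status
     order by latest_status
""").fetchall()

counts = dict(rows)
total = sum(counts.values())  # stands in for the rollup row
print(counts, total)
```

Origin 1 is counted once, under its most recent status, and origin 3 is dropped because it has no `hg` visit, which is what the `latest_status is not null` filter is for.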
I've rescheduled the tasks for the repositories that had not been loaded. We'll need to follow up separately on failing tasks.