As per the parent task, but focusing on Mercurial repositories (for which we don't have a loader yet).
|Status|Assigned|Task|
|---|---|---|
|Resolved|None|T561 ingest bitbucket (meta task)|
|Resolved|fiendish|T593 ingest bitbucket hg/mercurial repositories|
|Resolved|ardumont|T329 hg / mercurial loader|
|Resolved|ardumont|T906 mercurial loader: Debian package|
|Resolved|ardumont|T907 mercurial loader: Align mercurial loader with other loaders|
|Resolved|ardumont|T908 mercurial loader: Define scheduler task(s)|
|Resolved|ardumont|T909 mercurial loader: Define puppet manifest for actual deployment|
|Resolved|ardumont|T964 2018-02-16 worker disk full postmortem|
|Resolved|ardumont|T982 failing worker consumes remaining tasks without processing them|
|Resolved|ardumont|T985 loader*: Make prepare method resilient to error and origin visit status compliant|
Given Bitbucket's recent announcement that it is dropping Mercurial support, the priority of this task has just increased.
We already have a Mercurial loader that we have used before; it's time to run it on Bitbucket!
The Mercurial loader and the Bitbucket lister have been running all summer.
- The Bitbucket lister knows of 252402 Mercurial origins.
- The Mercurial loader has visited 251755 of these origins, with the following results:
 latest_status | count
---------------+--------
               | 251755
 partial       |  18004
 full          | 232641
 ongoing       |   1110
(There's clearly a bug somewhere in the loader: we don't have 1110 parallel workers, so most of these "ongoing" visits must actually be stuck ;))
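The figures above can be cross-checked with a little arithmetic. The Python sketch below uses only the counts reported in this comment; the constant and function names are illustrative and not part of any Software Heritage tool. It checks that the rollup total equals the sum of the per-status rows, then derives the number of listed origins not yet visited and the share of each status.

```python
# Figures copied from the lister and loader counts reported above.
LISTED_ORIGINS = 252402  # Mercurial origins known to the Bitbucket lister
STATUS_COUNTS = {"partial": 18004, "full": 232641, "ongoing": 1110}
ROLLUP_TOTAL = 251755    # the rollup row (empty latest_status)


def summarize(listed, counts, total):
    """Return (unvisited origins, visited fraction, per-status fractions)."""
    # The rollup row must equal the sum of the individual status rows.
    assert sum(counts.values()) == total, "rollup total does not match rows"
    unvisited = listed - total
    visited_fraction = total / listed
    shares = {status: n / total for status, n in counts.items()}
    return unvisited, visited_fraction, shares


if __name__ == "__main__":
    unvisited, visited, shares = summarize(LISTED_ORIGINS, STATUS_COUNTS, ROLLUP_TOTAL)
    print(f"not yet visited: {unvisited}")
    print(f"visited: {visited:.2%}")
    for status, share in sorted(shares.items()):
        print(f"{status:>8}: {share:.2%}")
```

With these figures it reports 647 listed origins not yet visited (about 99.74% coverage), with roughly 92.41% full, 7.15% partial and 0.44% ongoing visits.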
I'd be tempted to consider this task done, and to open a follow-up task to investigate and fix the failing repositories.
SQL queries for reference
Count Bitbucket lister origins
(on the swh-lister database)
select origin_type, count(*) from bitbucket_repo group by origin_type;
Count latest visits for Mercurial Bitbucket origins
(on the softwareheritage database)
with origin_latest_visit as (
  select
    origin.url,
    (select status
       from origin_visit
      where origin_visit.origin = origin.id
        and origin_visit.type = 'hg'
      order by date desc
      limit 1) as latest_status
  from origin
  where origin.url like 'https://bitbucket.org/%'
)
select latest_status, count(*)
from origin_latest_visit
where latest_status is not null -- filter out non-mercurial origins
group by rollup(latest_status); -- rollup adds a row with a null latest_status containing the sum of all rows
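To make the logic of this query explicit, here is a Python sketch of the same computation over in-memory rows. The `Visit` record and `count_latest_statuses` function are hypothetical stand-ins for the `origin` and `origin_visit` tables, and the sample URLs are made up for illustration; this is not how the statistics above were produced. For each Bitbucket origin it keeps the status of the most recent `hg` visit, skips origins with no such visit (the `is not null` filter), and adds a grand-total entry under `None`, as `rollup` does.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, NamedTuple, Optional


class Visit(NamedTuple):
    """Hypothetical in-memory stand-in for a row of origin_visit."""
    origin_url: str
    visit_type: str  # mirrors origin_visit.type ('hg', 'git', ...)
    date: datetime
    status: str


def count_latest_statuses(
    origins: Iterable[str], visits: Iterable[Visit]
) -> Dict[Optional[str], int]:
    """Count the latest hg visit status per Bitbucket origin, plus a grand total."""
    # Most recent hg visit per origin, stored as (date, status).
    latest: Dict[str, tuple] = {}
    for v in visits:
        if v.visit_type != "hg":
            continue
        current = latest.get(v.origin_url)
        if current is None or v.date > current[0]:
            latest[v.origin_url] = (v.date, v.status)

    counts: Counter = Counter()
    for url in origins:
        # Equivalent of: where origin.url like 'https://bitbucket.org/%'
        if not url.startswith("https://bitbucket.org/"):
            continue
        # Equivalent of: where latest_status is not null
        if url in latest:
            counts[latest[url][1]] += 1

    # Equivalent of rollup(latest_status): an extra row keyed by NULL/None
    # holding the sum of all the other rows.
    result: Dict[Optional[str], int] = dict(counts)
    result[None] = sum(counts.values())
    return result


if __name__ == "__main__":
    sample_visits = [
        Visit("https://bitbucket.org/example/x", "hg", datetime(2019, 6, 1), "partial"),
        Visit("https://bitbucket.org/example/x", "hg", datetime(2019, 7, 1), "full"),
        Visit("https://bitbucket.org/example/y", "git", datetime(2019, 7, 1), "full"),
        Visit("https://bitbucket.org/example/z", "hg", datetime(2019, 8, 1), "ongoing"),
    ]
    sample_origins = [
        "https://bitbucket.org/example/x",
        "https://bitbucket.org/example/y",
        "https://bitbucket.org/example/z",
    ]
    print(count_latest_statuses(sample_origins, sample_visits))
    # prints {'full': 1, 'ongoing': 1, None: 2}
```

If two hg visits of the same origin share a date, the SQL `order by date desc limit 1` picks one arbitrarily; the sketch simply keeps the first one it encounters.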