The last origins are still ongoing. They are taking their time...
Oct 14 2021
Oct 11 2021
Oct 7 2021
I'm gonna attend to this soon.
A first run of bitbucket origins has been scheduled and is mostly ingested now [1]
(only 13 large ones are still ongoing).
Oct 4 2021
Sep 29 2021
Deployed.
Sep 28 2021
Sep 27 2021
Sep 23 2021
Deployed v2.3.1 with that fix.
@Alphare any clues as to why the format here is not in sync? ^
Ok, found where the wrong format comes from: branching_info.bookmarks is not in the right format.
The test helped: it's a format mismatch problem.
Uncomment the test, place the right pdb stanza in the code and behold:
Sep 22 2021
Sep 20 2021
I've patched the systemd swh-worker@loader_oneshot to actually lift --autoscale 10,20
from celery cli. It's actually holding fine. And that coupled with the filtering server
side makes for a huge bump in speed. The archive db does not seem to mind at all.
10:02:40 softwareheritage@belvedere:5432=> select now(), count(distinct url) from origin o inner join origin_visit ov on o.id=ov.origin where o.url like 'https://bitbucket.org/%' and ov.type='hg';
+------------------------------+--------+
|             now              | count  |
+------------------------------+--------+
| 2021-09-20 10:04:30.89072+00 | 280995 |
+------------------------------+--------+
(1 row)
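For a rough idea of the pace, this count can be compared with the 2021-07-30 data point recorded further down this feed (253848). A minimal sketch of that arithmetic follows; the function name `ingestion_rate` is made up for illustration, and the timestamps are the `now()` values from the two queries truncated to the second. It only gives an average over the whole period, which included stops and restarts.

```python
from datetime import datetime, timezone


def ingestion_rate(first, second):
    """Return (new origins, average new origins per day) between two data points.

    Each data point is a (timestamp, distinct origin count) pair.
    """
    (t0, c0), (t1, c1) = first, second
    elapsed_days = (t1 - t0).total_seconds() / 86400
    new_origins = c1 - c0
    return new_origins, new_origins / elapsed_days


# Data points taken from the two psql outputs in this feed.
july = (datetime(2021, 7, 30, 11, 39, 37, tzinfo=timezone.utc), 253848)
september = (datetime(2021, 9, 20, 10, 4, 30, tzinfo=timezone.utc), 280995)

new_origins, per_day = ingestion_rate(july, september)
print(f"{new_origins} new origins, about {per_day:.0f} per day on average")
```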
Deployed the loader mercurial v2.3 (with filtering server side).
As expected, less time is spent in the method fetching the new changesets.
Sep 17 2021
Now most of the time can be spent reading the actual mapping extids -> hgnode-id [1] to filter out what we have already seen.
This does not change much for visits which already ended up in a snapshot.
However, it changes a lot for visits on forks, where we can bypass work already done for those forks.
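The idea can be sketched as below. This is not the loader's actual code: `filter_new_changesets` and the shape of `known_extids` (a mapping keyed by hg node id) are assumptions made for illustration only.

```python
def filter_new_changesets(remote_nodes, known_extids):
    """Return the hg node ids that still need loading, preserving order.

    remote_nodes: iterable of hg node ids found in the repository.
    known_extids: hypothetical mapping of hg node id -> archived object id,
                  standing in for the extid lookup mentioned above.
    """
    return [node for node in remote_nodes if node not in known_extids]


# A fork shares most of its history with a repository that was already loaded.
already_archived = {"aaa1": "rev-1", "bbb2": "rev-2", "ccc3": "rev-3"}
fork_nodes = ["aaa1", "bbb2", "ccc3", "ddd4"]

# Only the changeset specific to the fork remains to be loaded.
print(filter_new_changesets(fork_nodes, already_archived))
```

The cost moves from loading changesets to looking them up, which matches the observation that most of the time is now spent reading the mapping.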
Actually restarted the loader_oneshot, which now makes use of the latest v2.2.0 loader mercurial.
Another run is ongoing on a large repository (which cannot finish, though the error is
unrelated) [1]
Packaged within the v0.37 version.
Deployed both in staging and production.
Deployed on staging and everything looked good.
So deployed on production as well.
Sep 16 2021
First round of checks validates the new diff to make the current loader mercurial faster.
Sep 15 2021
Sep 14 2021
By the way, I forgot:
Sep 13 2021
Sep 9 2021
Sep 7 2021
New day, new datapoint [1]
Sep 1 2021
^ Temporarily disabled puppet agent and bumped the concurrency for that worker to 10 (around the time of the previous comment).
Aug 31 2021
Ongoing ingestion is rather slow [1]
Because, as with git origins, we can't know in advance which repositories are large.
So sometimes, ingestion appears stuck because we are dealing with large repositories (more than 2 hours of loading [2]).
Aug 30 2021
It's currently ongoing on some large repositories (and some other large sourceforge svn repository).
Aug 27 2021
It's currently running (data point ongoing...).
So this ingestion got stopped or crashed, at some point.
Probably around the db outage from last week (which emptied the rabbitmq queue).
Aug 6 2021
All messages are queued in the oneshot:swh.loader.mercurial.tasks.LoadArchive queue [1]
Those are concurrently ingested by the worker17.
Aug 5 2021
Concurrent deployment is ongoing ^ so this should go faster now, datapoint for later.
(mozilla-central fork from previous comment still ongoing...)
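To illustrate why concurrency helps here, the toy simulation below shows how a single large repository delays everything queued behind it when origins are loaded one at a time. The durations are invented and `completion_times` is a hypothetical helper, not part of the deployment.

```python
import heapq


def completion_times(durations, workers):
    """Simulate a FIFO queue drained by `workers` parallel slots.

    Returns the time at which each task finishes, in queue order.
    """
    slots = [0] * workers  # time at which each slot becomes free
    heapq.heapify(slots)
    finished = []
    for duration in durations:
        start = heapq.heappop(slots)  # earliest available slot
        end = start + duration
        finished.append(end)
        heapq.heappush(slots, end)
    return finished


# One big repository (120 minutes) queued ahead of four small ones (1 minute each).
durations = [120, 1, 1, 1, 1]

print(completion_times(durations, 1))  # [120, 121, 122, 123, 124]
print(completion_times(durations, 2))  # [120, 1, 2, 3, 4]
```

With a single slot the small origins wait for the big one; with a second slot they complete while the big one is still loading.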
That will also allow the systemd logs to be pushed to elasticsearch.
With the following change in the snippet code (check commit ^):
Aug 3 2021
Still progressing. It's not fast though since it's deployed rather simply. It's doing
one origin at a time which can end up seemingly stuck behind a big repository (currently
[1]):
Jul 30 2021
13:40:06 softwareheritage@belvedere:5432=> select now(), count(distinct url) from origin o inner join origin_visit ov on o.id=ov.origin where o.url like 'https://bitbucket.org/%' and ov.type='hg';
+-------------------------------+--------+
|              now              | count  |
+-------------------------------+--------+
| 2021-07-30 11:39:37.122152+00 | 253848 |
+-------------------------------+--------+
(1 row)
(Claiming, I said ;)
(Claiming the task to find it back more easily through my activity view.)
Started in the same tmux session [2] as the sourceforge ingestion [1]
It would probably make sense to set up a new worker instance for this to avoid interfering with the regular loading.
Smoke test it with local bitbucket repositories
that's next.
Smoke test it with remote repositories.
Jul 29 2021
For the sake of history readability: this must be the v2.1 git-patched version (not a release per se).
A more recent release, tagged v2.1.0 [1], has been built with the work @olasd started
to solve the extid version inconsistency issue.
Latest mercurial loader v2.1 deployed [1] [2]
We should be able to continue with this now.
Deployed \o/
Closing.
It's working but the check does not go green [1]. As far as I could tell, the unsuccessful
event [1] is seen as a failure by the check.
At the end of it all though, the final production check end-to-end for mercurial origin should go green.