I've patched the systemd unit swh-worker@loader_oneshot to actually use the celery cli
option --autoscale 10,20. It's holding fine. Coupled with the server-side filtering,
that makes for a huge bump in speed. The archive db does not seem to mind at all.

Sep 20 2021, 6:13 PM · System administration, Mercurial loader

ardumont moved T3563: Analyze and make the bitbucket ingestion faster from in-progress to code-review/await-feedback/pause on the System administration board.

Sep 20 2021, 12:11 PM · System administration, Mercurial loader

ardumont added a comment to T3563: Analyze and make the bitbucket ingestion faster.

10:02:40 softwareheritage@belvedere:5432=> select now(), count(distinct url) from origin o inner join origin_visit ov on o.id=ov.origin where o.url like 'https://bitbucket.org/%' and ov.type='hg';
+------------------------------+--------+
|             now              | count  |
+------------------------------+--------+
| 2021-09-20 10:04:30.89072+00 | 280995 |
+------------------------------+--------+
(1 row)
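
For a rough sense of progress, this count can be compared with the 253848 returned by the same query on 2021-07-30. Below is a minimal Python sketch of that arithmetic. The function and variable names are illustrative only, and the average hides the period during which the ingestion was stopped.

```python
from datetime import datetime, timezone

# Two datapoints copied from the query outputs in this feed:
# (timestamp returned by now(), count of distinct bitbucket hg origins visited)
first = (datetime(2021, 7, 30, 11, 39, 37, tzinfo=timezone.utc), 253848)
latest = (datetime(2021, 9, 20, 10, 4, 30, tzinfo=timezone.utc), 280995)


def ingestion_rate(start, end):
    """Return (new origins, average new origins per day) between two datapoints."""
    delta_origins = end[1] - start[1]
    elapsed_days = (end[0] - start[0]).total_seconds() / 86400
    return delta_origins, delta_origins / elapsed_days


new_origins, per_day = ingestion_rate(first, latest)
print(f"{new_origins} new origins, about {per_day:.0f} per day on average")
```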

Sep 20 2021, 12:11 PM · System administration, Mercurial loader

ardumont added a comment to T3563: Analyze and make the bitbucket ingestion faster.

Deployed the loader mercurial v2.3 (with filtering server side).
As expected, less time is spent in the method fetching the new changesets.
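
To illustrate why filtering server side helps, here is a minimal Python sketch. It does not use the swh.storage API: `ExtID`, `FakeStorage`, `extids_for_targets` and the sample values are hypothetical stand-ins, and an in-memory list replaces the real database. The only point is that filtering in the storage reduces how many rows are sent back to the loader.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class ExtID:
    """Hypothetical, simplified external-identifier record."""
    extid_type: str
    extid_version: int
    extid: bytes
    target: bytes


class FakeStorage:
    """In-memory stand-in for a storage server (not the real swh.storage)."""

    def __init__(self, rows: List[ExtID]):
        self._rows = rows
        self.rows_sent = 0  # counts rows "transferred" to the client

    def extids_for_targets(
        self,
        targets: Set[bytes],
        extid_type: Optional[str] = None,
        extid_version: Optional[int] = None,
    ) -> List[ExtID]:
        # Filters left as None are simply not applied.
        result = [
            r for r in self._rows
            if r.target in targets
            and (extid_type is None or r.extid_type == extid_type)
            and (extid_version is None or r.extid_version == extid_version)
        ]
        self.rows_sent += len(result)
        return result


# Illustrative rows only.
rows = [
    ExtID("hg-nodeid", 0, b"old-node", b"rev1"),
    ExtID("hg-nodeid", 1, b"node-1", b"rev1"),
    ExtID("git-sha1", 0, b"sha", b"rev1"),
    ExtID("hg-nodeid", 1, b"node-2", b"rev2"),
]
targets = {b"rev1", b"rev2"}

# Client-side filtering: fetch everything, then discard what is not needed.
client = FakeStorage(rows)
client_side = [
    r for r in client.extids_for_targets(targets)
    if r.extid_type == "hg-nodeid" and r.extid_version == 1
]

# Server-side filtering: the storage only returns matching rows.
server = FakeStorage(rows)
server_side = server.extids_for_targets(targets, "hg-nodeid", 1)

print(client.rows_sent, server.rows_sent)
```

Both approaches yield the same records; the difference is only in how many rows the storage has to send.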

Sep 20 2021, 12:08 PM · System administration, Mercurial loader

Sep 17 2021

ardumont added a revision to T3584: loader mercurial edge case about missing mapping from revision to hgnode-id: D6300: Capture missing revision <-> hgnode-id scenario in a xfail test.

Sep 17 2021, 5:24 PM · Mercurial loader

ardumont added a comment to T3563: Analyze and make the bitbucket ingestion faster.

Most of the time can now be spent reading the actual extid -> hgnode-id mapping [1], to filter out what we have already seen.
That does not change much for visits which already ended up in a snapshot.
However, it changes a lot for visits on forks, where we can bypass work already done.

Sep 17 2021, 1:56 PM · System administration, Mercurial loader

ardumont added a comment to T3563: Analyze and make the bitbucket ingestion faster.

Restarted the loader_oneshot, which now uses the latest loader mercurial v2.2.0.

Sep 17 2021, 12:29 PM · System administration, Mercurial loader

ardumont added a project to T3584: loader mercurial edge case about missing mapping from revision to hgnode-id: Mercurial loader.

Sep 17 2021, 12:18 PM · Mercurial loader

ardumont added a comment to T3563: Analyze and make the bitbucket ingestion faster.

Another run on a large repository (which cannot finish, though the error is
unrelated) [1]

Sep 17 2021, 11:13 AM · System administration, Mercurial loader

ardumont closed T3567: storage: Allow extid reading with filter on extid version, a subtask of T3563: Analyze and make the bitbucket ingestion faster, as Resolved.

Sep 17 2021, 10:48 AM · System administration, Mercurial loader

ardumont closed T3567: storage: Allow extid reading with filter on extid version as Resolved.

Packaged within the v0.37 version.
Deployed both in staging and production.

Sep 17 2021, 10:48 AM · System administration, Mercurial loader

ardumont closed T3571: mercurial loader: Fix snapshot creation as Resolved.

Sep 17 2021, 10:45 AM · Mercurial loader

ardumont added a comment to T3571: mercurial loader: Fix snapshot creation.

Deployed on staging and everything looked good.
So deployed on production as well.

Sep 17 2021, 10:45 AM · Mercurial loader

Sep 16 2021

ardumont added a comment to T3563: Analyze and make the bitbucket ingestion faster.

First round of checks validates the new diff to make the current loader mercurial faster.

Sep 16 2021, 5:15 PM · System administration, Mercurial loader

ardumont added a revision to T3567: storage: Allow extid reading with filter on extid version: D6275: Adapt extid filtering so it happens server side.

Sep 16 2021, 8:44 AM · System administration, Mercurial loader

ardumont updated the task description for T3567: storage: Allow extid reading with filter on extid version.

Sep 16 2021, 8:43 AM · System administration, Mercurial loader

Sep 15 2021

ardumont added a revision to T3571: mercurial loader: Fix snapshot creation: D6268: mercurial: Build snapshot on visits.

Sep 15 2021, 2:55 PM · Mercurial loader

ardumont added a project to T3572: mercurial loader: Refactor / clean up old implementations and rename appropriately the official one: Mercurial loader.

Sep 15 2021, 10:54 AM · Mercurial loader

ardumont added a project to T3571: mercurial loader: Fix snapshot creation: Mercurial loader.

Sep 15 2021, 10:53 AM · Mercurial loader

Sep 14 2021

ardumont added a comment to T3563: Analyze and make the bitbucket ingestion faster.

By the way, i forgot:

Sep 14 2021, 5:18 PM · System administration, Mercurial loader

Sep 13 2021

ardumont added a revision to T3567: storage: Allow extid reading with filter on extid version: D6249: Allow filtering extids per extid_version/extid_type when reading.

Sep 13 2021, 5:15 PM · System administration, Mercurial loader

ardumont triaged T3567: storage: Allow extid reading with filter on extid version as Normal priority.

Sep 13 2021, 10:53 AM · System administration, Mercurial loader

Sep 9 2021

ardumont changed the status of T3563: Analyze and make the bitbucket ingestion faster, a subtask of T3338: Load the archived bitbucket mercurial repositories, from Open to Work in Progress.

Sep 9 2021, 12:32 PM · System administration, Mercurial loader

ardumont changed the status of T3563: Analyze and make the bitbucket ingestion faster from Open to Work in Progress.

Sep 9 2021, 12:32 PM · System administration, Mercurial loader

Sep 7 2021

ardumont added a comment to T3338: Load the archived bitbucket mercurial repositories.

New day, new datapoint [1]

Sep 7 2021, 11:03 AM · System administration, Mercurial loader

ardumont updated the task description for T3563: Analyze and make the bitbucket ingestion faster.

Sep 7 2021, 10:36 AM · System administration, Mercurial loader

ardumont triaged T3563: Analyze and make the bitbucket ingestion faster as Normal priority.

Sep 7 2021, 10:33 AM · System administration, Mercurial loader

Sep 1 2021

ardumont added a comment to T3338: Load the archived bitbucket mercurial repositories.

^ Temporarily disabled puppet agent and bumped the concurrency for that worker to 10 (around the time of the previous comment).

Sep 1 2021, 9:21 AM · System administration, Mercurial loader

Aug 31 2021

ardumont added a comment to T3338: Load the archived bitbucket mercurial repositories.

Ongoing ingestion is rather slow [1].
As with git origins, we can't know in advance which repositories are large.
So sometimes, ingestion appears stuck because we are dealing with large repositories (more than 2 hours of loading [2]).

Aug 31 2021, 4:57 PM · System administration, Mercurial loader

Aug 30 2021

ardumont added a comment to T3338: Load the archived bitbucket mercurial repositories.

It's currently ongoing on some large repositories (and some other large sourceforge svn repository).

Aug 30 2021, 3:12 PM · System administration, Mercurial loader

Aug 27 2021

ardumont added a comment to T3338: Load the archived bitbucket mercurial repositories.

It's currently running (data point ongoing...).

Aug 27 2021, 5:22 PM · System administration, Mercurial loader

ardumont added a comment to T3338: Load the archived bitbucket mercurial repositories.

So this ingestion got stopped or crashed, at some point.
Probably around the db outage from last week (which emptied the rabbitmq queue).

Aug 27 2021, 5:21 PM · System administration, Mercurial loader

Aug 6 2021

ardumont closed T3455: Make bitbucket origins ingestion concurrent as Resolved.

Aug 6 2021, 9:23 AM · System administration, Mercurial loader

ardumont closed T3455: Make bitbucket origins ingestion concurrent, a subtask of T3338: Load the archived bitbucket mercurial repositories, as Resolved.

Aug 6 2021, 9:23 AM · System administration, Mercurial loader

ardumont updated the task description for T3455: Make bitbucket origins ingestion concurrent.

Aug 6 2021, 9:22 AM · System administration, Mercurial loader

ardumont added a comment to T3455: Make bitbucket origins ingestion concurrent.

All messages are queued in the oneshot:swh.loader.mercurial.tasks.LoadArchive queue [1]
Those are concurrently ingested by the worker17.
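
The benefit of concurrent ingestion over one origin at a time can be sketched with a small Python simulation. The durations and the worker count below are made up for illustration, and the scheduling model (each task goes to the first worker that becomes free) is an assumption, not a description of how celery dispatches messages.

```python
import heapq
from typing import List


def makespan(durations: List[float], workers: int) -> float:
    """Total time to process `durations` in queue order with `workers` slots.

    Each task goes to whichever worker becomes free first.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    free_at = [0.0] * workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# Hypothetical loading times in minutes: one very large repository
# followed by many small ones.
durations = [120.0] + [1.0] * 60

sequential = makespan(durations, workers=1)
concurrent = makespan(durations, workers=10)
print(f"sequential: {sequential:.0f} min, concurrent: {concurrent:.0f} min")
```

In this model the large repository still takes its full loading time, but the small ones no longer wait behind it.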

Aug 6 2021, 9:17 AM · System administration, Mercurial loader

Aug 5 2021

ardumont added a comment to T3338: Load the archived bitbucket mercurial repositories.

Concurrent deployment is ongoing ^ so this should go faster now, datapoint for later.
(mozilla-central fork from previous comment still ongoing...)

Aug 5 2021, 5:14 PM · System administration, Mercurial loader

ardumont updated the task description for T3455: Make bitbucket origins ingestion concurrent.

Aug 5 2021, 12:47 PM · System administration, Mercurial loader

ardumont changed the status of T3455: Make bitbucket origins ingestion concurrent, a subtask of T3338: Load the archived bitbucket mercurial repositories, from Open to Work in Progress.

Aug 5 2021, 12:45 PM · System administration, Mercurial loader

ardumont changed the status of T3455: Make bitbucket origins ingestion concurrent from Open to Work in Progress.

Aug 5 2021, 12:45 PM · System administration, Mercurial loader

ardumont added a comment to T3455: Make bitbucket origins ingestion concurrent.

That will also allow the systemd logs to be pushed to elasticsearch.

Aug 5 2021, 12:44 PM · System administration, Mercurial loader

ardumont moved T3455: Make bitbucket origins ingestion concurrent from Backlog to code-review/await-feedback/pause on the System administration board.

Aug 5 2021, 12:33 PM · System administration, Mercurial loader

ardumont added a comment to T3455: Make bitbucket origins ingestion concurrent.

With the following change in the snippet code (check commit ^):

Aug 5 2021, 12:32 PM · System administration, Mercurial loader

Aug 3 2021

ardumont updated the task description for T3455: Make bitbucket origins ingestion concurrent.

Aug 3 2021, 10:20 AM · System administration, Mercurial loader

ardumont renamed T3455: Make bitbucket origins ingestion concurrent from Make bitbucket origins ingestion go faster to Make bitbucket origins ingestion concurrent.

Aug 3 2021, 10:07 AM · System administration, Mercurial loader

ardumont added a comment to T3338: Load the archived bitbucket mercurial repositories.

Still progressing. It's not fast though since it's deployed rather simply. It's doing
one origin at a time which can end up seemingly stuck behind a big repository (currently
[1]):

Aug 3 2021, 10:06 AM · System administration, Mercurial loader

ardumont renamed T3455: Make bitbucket origins ingestion concurrent from Schedule properly the bitbucket origins to Make bitbucket origins ingestion go faster.

Aug 3 2021, 10:05 AM · System administration, Mercurial loader

ardumont triaged T3455: Make bitbucket origins ingestion concurrent as High priority.

Aug 3 2021, 10:05 AM · System administration, Mercurial loader

Jul 30 2021

ardumont added a comment to T3338: Load the archived bitbucket mercurial repositories.

13:40:06 softwareheritage@belvedere:5432=> select now(), count(distinct url) from origin o inner join origin_visit ov on o.id=ov.origin where o.url like 'https://bitbucket.org/%' and ov.type='hg';
+-------------------------------+--------+
|              now              | count  |
+-------------------------------+--------+
| 2021-07-30 11:39:37.122152+00 | 253848 |
+-------------------------------+--------+
(1 row)

Jul 30 2021, 2:49 PM · System administration, Mercurial loader

ardumont claimed T3338: Load the archived bitbucket mercurial repositories.

(claiming i said ;)

Jul 30 2021, 2:42 PM · System administration, Mercurial loader

ardumont placed T3338: Load the archived bitbucket mercurial repositories up for grabs.

(Claiming the task to find it back more easily through my activity view.)

Jul 30 2021, 1:42 PM · System administration, Mercurial loader

ardumont moved T3338: Load the archived bitbucket mercurial repositories from in-progress to code-review/await-feedback/pause on the System administration board.

Jul 30 2021, 1:33 PM · System administration, Mercurial loader

ardumont added a comment to T3338: Load the archived bitbucket mercurial repositories.

Started in the same tmux session [2] as the sourceforge ingestion [1]

Jul 30 2021, 1:33 PM · System administration, Mercurial loader

ardumont changed the status of T3338: Load the archived bitbucket mercurial repositories from Open to Work in Progress.

Jul 30 2021, 1:04 PM · System administration, Mercurial loader

ardumont added a comment to T3338: Load the archived bitbucket mercurial repositories.

It would probably make sense to set up a new worker instance for this to avoid interfering with the regular loading.

Jul 30 2021, 1:04 PM · System administration, Mercurial loader

ardumont closed T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial, a subtask of T3338: Load the archived bitbucket mercurial repositories, as Resolved.

Jul 30 2021, 1:00 PM · System administration, Mercurial loader

ardumont closed T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial as Resolved.

Jul 30 2021, 1:00 PM · System administration, Mercurial loader

ardumont added a comment to T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial.

Smoke test it with local bitbucket repositories

that's next.

Jul 30 2021, 12:59 PM · System administration, Mercurial loader

ardumont claimed T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial.

Jul 30 2021, 12:22 PM · System administration, Mercurial loader

ardumont changed the status of T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial from Open to Work in Progress.

Jul 30 2021, 12:16 PM · System administration, Mercurial loader

ardumont changed the status of T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial, a subtask of T3338: Load the archived bitbucket mercurial repositories, from Open to Work in Progress.

Jul 30 2021, 12:16 PM · System administration, Mercurial loader

ardumont added a comment to T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial.

Smoke test it with remote repositories.

Jul 30 2021, 12:16 PM · System administration, Mercurial loader

Jul 29 2021

ardumont renamed T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial from Deploy swh.loader.mercurial 1.0 in production to Smoke test ingestion of bitbucket repositories with latest loader mercurial.

Jul 29 2021, 6:36 PM · System administration, Mercurial loader

ardumont added a comment to T3336: Deploy swh.loader.mercurial 2.1 in staging.

For the sake of history readability: this must be a git-patched v2.1 version (not a release per se).
A more recent release, tagged v2.1.0 [1], has since been built with the work
@olasd started to solve the extid version inconsistency issue.

Jul 29 2021, 6:33 PM · System administration, Mercurial loader

ardumont added a subtask for T3338: Load the archived bitbucket mercurial repositories: T3418: Decide a consistent policy on having multiple archived objects for the same extid.

Jul 29 2021, 6:30 PM · System administration, Mercurial loader

ardumont added a parent task for T3418: Decide a consistent policy on having multiple archived objects for the same extid: T3338: Load the archived bitbucket mercurial repositories.

Jul 29 2021, 6:30 PM · Storage manager, Mercurial loader

ardumont moved T3338: Load the archived bitbucket mercurial repositories from Backlog to Weekly backlog on the System administration board.

Latest mercurial loader v2.1 deployed [1] [2]
We should be able to continue with this now.

Jul 29 2021, 6:29 PM · System administration, Mercurial loader

ardumont closed T3418: Decide a consistent policy on having multiple archived objects for the same extid as Resolved.

Jul 29 2021, 6:21 PM · Storage manager, Mercurial loader

ardumont moved T3448: production: Deploy swh.loader.mercurial v2.1.0 from deployed/landed/monitoring to done on the System administration board.

Jul 29 2021, 6:21 PM · System administration, Storage manager, Mercurial loader