
Deploy Bzr loader to the staging environment
Closed, Migrated (Edits Locked)

Description

Now that the incremental loader has been committed, it seems like a good time to test it
in the staging environment. While a more optimized version of the loader is still under
development, the current version should be robust enough to handle anything archivable.

Plan:

  • D7117: swh-loader-bzr: Declare the swh.loader.bzr.tasks module
  • Reference new swh-loader-bzr project in sentry [1]
  • Make loader run in docker
  • Prepare the necessary debian metadata files to allow CI package build
  • Debian packages built
    • unstable
    • D7132, D7133: stable is KO because of a breezy version conflict (we need breezy > 3.1; only 3.0 is packaged in Debian stable).
    • Backport python3-breezy and friends so that python3-swh.loader.bzr finally installs on stable.
  • D7112: Prepare puppet manifest to deploy the worker service
  • D7124: Make the loader run in docker reproducible for dev
  • Actually deploy service on staging workers
  • Register the new load-bzr task-type in the scheduler (requires the python3-swh.loader.bzr install on the scheduler node)
  • Restart swh-scheduler-scheduler-recurrent service
  • Make it consume bzr tasks (D7172: sourceforge ones, launchpad ones?). T3945 to track remaining issues.

[1] https://sentry.softwareheritage.org/organizations/swh/issues/?project=22#welcome

Event Timeline

Alphare created this object in space S1 Public.
Alphare added a parent task: T3610: Bazaar/Breezy loader.
ardumont triaged this task as Normal priority. Feb 7 2022, 3:34 PM

We got those bzr origins currently listed in staging (and prod) infra [1].
Do they sound good enough as dataset?

[1] P1278

ardumont updated the task description.
  • Make it run within docker

Well, that needs some work as there is nothing there yet.

But I finally have something ready to run:

swh-doco exec swh-loader swh loader run | grep bzr
+ cd /home/tony/work/inria/repo/swh/swh-environment/docker
+ docker-compose -f docker-compose.yml -f docker-compose.override.yml exec swh-loader swh loader run
Usage: swh loader run [OPTIONS] [archive|bzr|cran|
Error: Missing argument '[archive|bzr|cran|debian|deposit|git|git_disk|maven|mercurial|nixguix|npm|opam|pypi|svn]'.  Choose from:
        bzr,

Which made me realize we need to create the debian package. I'll attend to it once
the docker part is ok.
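
For readers without the local swh-doco wrapper: judging only from the "+" trace lines above, it seems to change into the swh-environment docker directory and forward its arguments to docker-compose with both compose files. A minimal sketch under that assumption follows. The function name swh_doco, the SWH_DOCKER_DIR variable and the SWH_DOCO_DRY_RUN switch are hypothetical, and the real script may well differ.

```shell
# Hypothetical reconstruction of a swh-doco-like wrapper, inferred from the
# "+" trace lines in the transcript; not the actual script.
# SWH_DOCKER_DIR (assumed name) points at the swh-environment/docker checkout.
swh_doco() {
    dir="${SWH_DOCKER_DIR:-$HOME/swh-environment/docker}"
    if [ -n "${SWH_DOCO_DRY_RUN:-}" ]; then
        # Dry run (assumed convenience): print the command instead of running it.
        echo "docker-compose -f docker-compose.yml -f docker-compose.override.yml $*"
        return 0
    fi
    # Subshell so the caller's working directory is left untouched.
    ( cd "$dir" && docker-compose -f docker-compose.yml -f docker-compose.override.yml "$@" )
}
```

With this, `swh_doco exec swh-loader swh loader run | grep bzr` would mirror the check above: bzr showing up in the usage message means the loader entry point is registered in the container.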

ardumont changed the task status from Open to Work in Progress. Feb 8 2022, 5:08 PM
ardumont moved this task from Weekly backlog to in-progress on the System administration board.

loader-bzr runs in docker \o/:

$ swh-doco exec swh-loader swh loader run bzr https://launchpad.net/wayland-protocols
+ cd /home/tony/work/inria/repo/swh/swh-environment/docker
+ docker-compose -f docker-compose.yml -f docker-compose.override.yml exec swh-loader swh loader run bzr https://launchpad.net/wayland-protocols
INFO:swh.loader.bzr.Loader:Load origin 'https://launchpad.net/wayland-protocols' with type 'bzr'
INFO:brz:Created new control directory.
{'status': 'eventful'}

The repository is configured correctly for the CI, but the post-receive hook was
missing on the repository.
Fixed [1].
Let's do the first release.

Hook post-receive-swh-modules successfully installed on /srv/phabricator/repos/240/:
lrwxrwxrwx 1 phabricator phabricator 39 Feb  9 09:19 post-receive -> ../../../hooks/post-receive-swh-modules

[1] https://docs.softwareheritage.org/sysadm/deployment/howto-debian-packaging.html#setting-up-the-repository-on-phabricator

Here we go, first release is up. [1]

Now on to the debian part.

[1] https://pypi.org/project/swh.loader.bzr/

  • Prepare the necessary debian metadata files to allow CI package build [1]

Build is fine on unstable [2] (python3-breezy ~3.2).
Less so in stable [3] (current loader wants to use some function not present yet in python3-breezy 3.0.0)

[1]

$ ../bin/debpkg-bootstrap-branches v0.1.0 bzr python3-swh.core.db.pytestplugin
$ swh-debian-build-unstable  # local wrapper that builds the package locally, as the ci does
...
$ swh-debian-build-sign  # sign stuff to trigger the build on ci
$ git push origin --follow-tags  # push and let the ci build stuff

[2] https://jenkins.softwareheritage.org/job/debian/job/packages/job/DLDBZR/job/gbp-buildpackage/1/console

[3] https://jenkins.softwareheritage.org/job/debian/job/packages/job/DLDBZR/job/gbp-buildpackage/2/console
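
The stable failure comes down to a version gate: the plan mentions needing breezy newer than 3.1, while stable ships python3-breezy 3.0.0. A small sketch of that comparison, assuming GNU `sort -V` is available. The helper name version_gt is made up for illustration, and 3.1 is used as the threshold only because the plan states it; the loader's exact minimum is not given here.

```shell
# version_gt A B: succeeds when version A is strictly newer than version B.
# Uses GNU sort's version ordering (-V); this is an approximation and does not
# implement full Debian version semantics (epochs, "~" suffixes, ...).
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example: the breezy versions mentioned for unstable and stable.
for v in 3.2.1 3.0.0; do
    if version_gt "$v" 3.1; then
        echo "breezy $v: new enough"
    else
        echo "breezy $v: too old"
    fi
done
```

On an actual Debian machine, `dpkg --compare-versions 3.0.0 gt 3.1` is the authoritative way to do the same check on full package versions.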

D7130 has been committed, so a new version of the loader for faster ingestion will need to be released.

Yes, thanks for the heads up.

For information, I still need to debug the other debian package failure first anyway (buster) [1].

[1] https://jenkins.softwareheritage.org/job/debian/job/packages/job/DLDBZR/job/gbp-buildpackage/5/console

Previous one fixed.
pypi-upload job failed for some reason.
Fixed.

And now, another error [1]

[1] https://jenkins.softwareheritage.org/job/debian/job/packages/job/DLDBZR/job/gbp-buildpackage/7/console

After some more fighting and backporting of dependencies [1] [2], the build is now happy on
stable as well [3].

[1]

swhdebianrepo@pergamon:/srv/softwareheritage/repository$ for p in python3-fastbencode python3-patiencediff python3-breezy; do reprepro ls $p; done
python3-fastbencode | 0.0.5-1~swh1 | buster-swh | amd64
python3-patiencediff | 0.2.1-1~swh1 | buster-swh | amd64
python3-breezy | 3.2.1+bzr7585-1~swh1 | buster-swh | amd64

[2] I spent some time trying to figure things out (several paths taken). Only one path
worked: rebuilding the packages locally with their python3 dependency lowered to the
stable version (3.7), then uploading them to our reprepro instance. That finally made
the swh-loader-bzr build ok on stable. In short, I took the easy way: rebuilt manually
and uploaded the deb packages.

[3] https://jenkins.softwareheritage.org/job/debian/job/packages/job/DLDBZR/job/gbp-buildpackage/10/
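
The reprepro listing in [1] can also be checked mechanically. The sketch below reads lines in the "name | version | suite | arch" shape shown there and reports whether a package is present in a given suite. The helper name in_suite is hypothetical, and the parsing assumes exactly the column layout seen in the transcript.

```shell
# in_suite PACKAGE SUITE: reads `reprepro ls`-style lines on stdin
# ("name | version | suite | arch") and succeeds if PACKAGE is listed in SUITE.
in_suite() {
    awk -F' *[|] *' -v pkg="$1" -v suite="$2" '
        $1 == pkg && $3 == suite { found = 1 }
        END { exit !found }
    '
}
```

For example, `reprepro ls python3-breezy | in_suite python3-breezy buster-swh && echo ok`, run from the repository directory as in [1], would confirm the backport is published for stable.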

Of course, the build was ok, but not the actual install [1]
Another build release ongoing to fix that... [2]

And it's now ok [3].
Landing D7112 now.

[1] P1287

[2] https://jenkins.softwareheritage.org/job/debian/job/packages/job/DLDBZR/job/gbp-buildpackage/

[3]

 vagrant node:
root@worker0:~# apt install python3-swh.loader.bzr
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  python3-swh.loader.bzr
0 upgraded, 1 newly installed, 0 to remove and 10 not upgraded.
Need to get 52.9 kB of archives.
After this operation, 123 kB of additional disk space will be used.
Get:1 https://debian.softwareheritage.org buster-swh/main amd64 python3-swh.loader.bzr all 1.1.0-1~swh3~bpo10+1 [52.9 kB]
Fetched 52.9 kB in 0s (881 kB/s)
(Reading database ... 85784 files and directories currently installed.)
Preparing to unpack .../python3-swh.loader.bzr_1.1.0-1~swh3~bpo10+1_all.deb ...
Unpacking python3-swh.loader.bzr (1.1.0-1~swh3~bpo10+1) ...
Setting up python3-swh.loader.bzr (1.1.0-1~swh3~bpo10+1) ...
root@worker0:~# dpkg -l python3-swh.loader.bzr | grep ii
ii  python3-swh.loader.bzr 1.1.0-1~swh3~bpo10+1 all          Software Heritage Bazaar/Breezy intent

And deployed.
Next step: Actually feeding them some bzr origins.
We'll need to discuss some more with @Alphare.

root@pergamon:/srv/softwareheritage/repository# clush -b -w @staging-loader-workers "systemctl status swh-worker@loader_bzr" | grep Active
     Active: active (running) since Fri 2022-02-11 16:55:54 UTC; 13s ago
     Active: active (running) since Fri 2022-02-11 16:55:54 UTC; 14s ago
     Active: active (running) since Fri 2022-02-11 16:55:54 UTC; 14s ago
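
To turn that eyeball check into a number, the "Active:" lines can be counted. A tiny sketch, where the helper name count_running is made up for illustration:

```shell
# count_running: reads `systemctl status` output on stdin and prints how many
# units report "Active: active (running)".
count_running() {
    grep -c 'Active: active (running)'
}
```

Piping the clush command above into count_running should print 3 for the three staging workers. Since clush -b groups nodes whose output is identical, running it without -b makes the count easier to interpret.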

ardumont updated the task description.

@Alphare fixed the sourceforge lister to list correctly bzr origins.
Tagged v2.6.4 with that fix, with the intent of deploying it on staging.

New lister did its work, 74 new origins got seen [1] [2].

Scheduler did not schedule them though.
Ah, I missed registering the new task type in the scheduler db.
It's done now.

And now the bzr origins are scheduled properly [3].
The old and invalid ones will fail.
Hopefully some of the new ones will work.

[1] P1289

[2]

10:04:28 swh-scheduler@db1:5432=> select now(), count(*) from listed_origins where visit_type='bzr';
select now(), count(*) from listed_origins where visit_type='bzr';
+-------------------------------+-------+
|              now              | count |
+-------------------------------+-------+
| 2022-02-15 09:05:26.474038+00 |   364 |
+-------------------------------+-------+
(1 row)

Time: 705.564 ms

[3]

Feb 15 09:11:07 scheduler0 swh[1373001]: INFO:swh.scheduler.celery_backend.recurrent_visits:bzr: 364 visits scheduled in queue swh.loader.bzr.tasks.LoadBazaar
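
As a sanity check, the number of visits in the scheduler log [3] can be compared with the listed_origins count from [2] (both 364 here). A minimal sketch for pulling the number out of the log line, assuming the message format stays as shown above. The helper name scheduled_visits is hypothetical.

```shell
# scheduled_visits VISIT_TYPE: reads scheduler log lines on stdin and prints
# the N from "recurrent_visits:<type>: N visits scheduled" for that visit type.
scheduled_visits() {
    sed -n "s/.*recurrent_visits:$1: \([0-9][0-9]*\) visits scheduled.*/\1/p"
}
```

Feeding it the scheduler's log output and calling `scheduled_visits bzr` should print 364 for the line in [3].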

ardumont claimed this task.
ardumont updated the task description.
ardumont moved this task from deployed/landed/monitoring to done on the System administration board.