
Deploy swh.loader.mercurial 2.1 in staging
Closed, ResolvedPublic

Description

Deploy swh.loader.mercurial 2.1 in staging, and smoke test it with remote and local (archived from BitBucket) repositories

Event Timeline

So that was done on Friday, and things seem to work in general, but there are a couple of issues:

  • loading the same origin again and again results in the snapshot flapping between two values; this has been noticed while loading the following repository: refugees/data/76/76f9bde3-96fb-40df-84e4-2a74536b5dec/main-repo.mercurial https://bitbucket.org/pbui/weaver. Archive contents: P1049
  • loading repositories archived with the old loader makes the new loader break with duplicate nodeids in the extid table. That's... Pretty bad. Noticed this issue with https://foss.heptapod.net/mercurial/hgview, which is our "save code now" test repository for mercurial. Loader output: P1050
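
To make the "flapping" symptom from the first item concrete, here is a minimal sketch of how one could classify the snapshot ids produced by repeated loads of the same origin. The function name and the idea of feeding it a plain list of ids are assumptions for illustration; it does not query swh.storage.

```python
from typing import Sequence


def classify_snapshots(snapshot_ids: Sequence[str]) -> str:
    """Classify the snapshot ids produced by repeated loads of one origin.

    Returns "stable" if all loads produced the same id, "flapping" if
    consecutive loads keep alternating between exactly two ids, and
    "other" for anything else (including fewer than two loads).
    """
    if len(snapshot_ids) < 2:
        return "other"
    distinct = set(snapshot_ids)
    if len(distinct) == 1:
        return "stable"
    # Flapping: exactly two ids, and no two consecutive loads agree.
    alternating = all(a != b for a, b in zip(snapshot_ids, snapshot_ids[1:]))
    if len(distinct) == 2 and alternating:
        return "flapping"
    return "other"
```

An unchanged repository is expected to give "stable"; the behaviour described above would show up as "flapping".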

Reproduction for the duplicate nodeids in the extid table:

loader.yml

---
storage:
  cls: pipeline
  steps:
    - cls: buffer
      min_batch_size:
        content: 1000
        content_bytes: 52428800  # 50 MB
        directory: 1000
        revision: 1000
        release: 1000
    - cls: filter
    - cls: postgresql
      db: ""
      objstorage:
        cls: memory

Initialize database:

dropdb swh-test-hg; swh db create -d swh-test-hg storage;  swh db init -d swh-test-hg storage

Load hgview with legacy loader:

PGDATABASE=swh-test-hg SWH_CONFIG_FILENAME=loader.yml swh --log-level=DEBUG loader run mercurial https://foss.heptapod.net/mercurial/hgview

Then load hgview with the new loader:

PGDATABASE=swh-test-hg SWH_CONFIG_FILENAME=~/work/swh-environment/scratch/loader.db.yml swh --log-level=DEBUG loader run mercurial_from_disk https://foss.heptapod.net/mercurial/hgview

The first *two* loads with the new loader complete successfully, and the third load fails with:

WARNING:swh.core.cli:Could not load subcommand scanner: No package metadata was found for swh-scanner
DEBUG:swh.loader.cli:ctx: <click.core.Context object at 0x7f5c7e87db50>
DEBUG:swh.core.config:Loading config file /home/nicolasd/work/swh-environment/scratch/loader.db.yml
DEBUG:swh.loader.cli:config_file: /home/nicolasd/work/swh-environment/scratch/loader.db.yml
DEBUG:swh.loader.cli:config: 
DEBUG:swh.loader.cli:kw: {}
DEBUG:swh.loader.cli:registry: {'task_modules': ['swh.loader.mercurial.tasks_from_disk'], 'loader': <class 'swh.loader.mercurial.from_disk.HgLoaderFromDisk'>}
DEBUG:swh.loader.cli:loader class: <class 'swh.loader.mercurial.from_disk.HgLoaderFromDisk'>
DEBUG:swh.core.statsd:Error submitting statsd packet. Dropping the packet and closing the socket.
DEBUG:swh.core.statsd:Error submitting statsd packet. Dropping the packet and closing the socket.
INFO:swh.loader.mercurial.LoaderFromDisk:Load origin 'https://foss.heptapod.net/mercurial/hgview' with type 'hg'
DEBUG:swh.core.statsd:Error submitting statsd packet. Dropping the packet and closing the socket.
DEBUG:swh.core.statsd:Error submitting statsd packet. Dropping the packet and closing the socket.
DEBUG:swh.core.statsd:Error submitting statsd packet. Dropping the packet and closing the socket.
DEBUG:swh.loader.mercurial.LoaderFromDisk:Cloning https://foss.heptapod.net/mercurial/hgview to None with timeout 7200 seconds
requesting all changes
adding changesets
adding manifests
adding file changes
added 1245 changesets with 2634 changes to 312 files
new changesets 7bf307e75523:6cd8ebdbc158
INFO:swh.loader.mercurial.LoaderFromDisk:New revisions found: 5
ERROR:swh.loader.mercurial.LoaderFromDisk:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/home/nicolasd/work/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 339, in load
    self.store_data()
  File "/home/nicolasd/work/swh-environment/swh-loader-mercurial/swh/loader/mercurial/from_disk.py", line 332, in store_data
    self.store_revision(repo[rev])
  File "/home/nicolasd/work/swh-environment/swh-loader-mercurial/swh/loader/mercurial/from_disk.py", line 485, in store_revision
    parents=self.get_revision_parents(rev_ctx),
  File "/home/nicolasd/work/swh-environment/swh-loader-mercurial/swh/loader/mercurial/from_disk.py", line 432, in get_revision_parents
    revision_id = self.get_revision_id_from_hg_nodeid(parent_hg_nodeid)
  File "/home/nicolasd/work/swh-environment/swh-loader-mercurial/swh/loader/mercurial/from_disk.py", line 414, in get_revision_id_from_hg_nodeid
    assert len(from_storage) == 1, msg % (hg_nodeid, len(from_storage))
AssertionError: Expected 1 match from storage for hg node b"\t\x91j\xa0L\xf5\xc8\xb4?y\xa6\xe3\xa8'\x14\xa6\x9c@?\x14", got 2
DEBUG:swh.core.statsd:Error submitting statsd packet. Dropping the packet and closing the socket.
DEBUG:swh.core.statsd:Error submitting statsd packet. Dropping the packet and closing the socket.
DEBUG:swh.core.statsd:Error submitting statsd packet. Dropping the packet and closing the socket.
DEBUG:swh.core.statsd:Error submitting statsd packet. Dropping the packet and closing the socket.
DEBUG:swh.core.statsd:Error submitting statsd packet. Dropping the packet and closing the socket.
DEBUG:swh.loader.mercurial.LoaderFromDisk:Cleanup up repository /tmp/swh.loader.mercurial.from_diskedakj9pt-177021
{'status': 'failed'}
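
The assertion in the traceback expects exactly one revision per hg nodeid, and here storage returned two. A minimal sketch of a check for that situation, assuming the relevant rows have already been exported as plain (nodeid, revision id) pairs; the row shape and function name are hypothetical and this is not the swh.storage API:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical, simplified stand-in for a row of the extid table:
# (hg nodeid, id of the revision it maps to).
ExtIDRow = Tuple[bytes, bytes]


def find_ambiguous_nodeids(rows: Iterable[ExtIDRow]) -> Dict[bytes, List[bytes]]:
    """Return the nodeids that map to more than one distinct revision id."""
    targets: Dict[bytes, Set[bytes]] = defaultdict(set)
    for nodeid, revision_id in rows:
        targets[nodeid].add(revision_id)
    # Exact duplicate rows collapse in the set; only conflicting targets remain.
    return {
        nodeid: sorted(revisions)
        for nodeid, revisions in targets.items()
        if len(revisions) > 1
    }
```

Any nodeid returned by this helper would trip the `len(from_storage) == 1` assertion when it is looked up as a parent.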

D5793 should fix the remaining issue. We had a discussion with @marmoute about whether considering closed branches (done in D5790) is *actually* a good idea in terms of presentation, but D5793 fixes the underlying issue, so we'll see about this issue next week.

olasd renamed this task from Deploy swh.loader.mercurial 1.0 in staging to Deploy swh.loader.mercurial 1.1 in staging. May 28 2021, 4:43 PM
olasd changed the task status from Open to Work in Progress.
olasd updated the task description.

After packaging swh.loader.mercurial 1.1 with @Alphare's changes, all seems well on the staging environment (at least the inconsistencies I had noticed are not there anymore).

I've now mounted the bitbucket archive on a staging worker, and I'll be running a larger scale test over more archived bitbucket repositories over the weekend, to see if we're looking good to deploy this further.

base_dir=/srv/storage/space/mirrors/boatbucket
# Process lines 10000-19999 of the mapping file (one "<dir> <url>" pair per line).
tail -n +10000 "$base_dir/mapping-to-repos.txt" | head -10000 | while read -r dir url; do
    repo_dir="$base_dir/$dir"
    # Use the ctime of the repository's blackbox.log as the visit date.
    visit_date=$(stat -c %z "$repo_dir/.hg/blackbox.log" | sed -E 's/ \+0000/+0000/')
    SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_mercurial.yml swh --log-level=DEBUG loader run mercurial_from_disk "$url" directory="$repo_dir" visit_date="\"$visit_date\""
done 2>&1 | tee -a bitbucket-archive.2.log
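
For reference, the batch selection and timestamp clean-up done by the loop can be sketched in Python as follows. This only mirrors the `tail`/`head`/`read`/`sed` steps; the function names are made up for illustration, and the sample timestamp in the usage note is an assumed example of `stat -c %z` output, not taken from the archive.

```python
import re
from itertools import islice
from typing import Iterable, Iterator, Tuple


def select_batch(
    lines: Iterable[str], start: int, count: int
) -> Iterator[Tuple[str, str]]:
    """Yield (dir, url) pairs for `count` lines starting at 1-indexed line `start`.

    Mirrors `tail -n +START | head -COUNT` followed by `read dir url`.
    Lines without a second field are skipped (the shell loop would instead
    run with an empty url).
    """
    for line in islice(lines, start - 1, start - 1 + count):
        parts = line.split(None, 1)
        if len(parts) == 2:
            yield parts[0], parts[1].strip()


def normalize_ctime(stat_output: str) -> str:
    """Same substitution as the sed call: drop the space before a +0000 offset."""
    return re.sub(r" \+0000", "+0000", stat_output.strip(), count=1)
```

With `start=10000` and `count=10000` this selects lines 10000 to 19999 of the mapping file, and `normalize_ctime("2021-05-28 16:43:12.123456789 +0000")` gives a string with the offset attached directly to the time.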

After the weekend, the loader ran a few thousand loading tasks (out of 235k total). Out of those, only 2 failed for already known concurrency reasons. We should be good to go to production on this loader.
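
A rough way to get such counts out of the tee'd log is to tally the final result line of each run. This is a minimal sketch that assumes every run ends with a dict literal on its own line, like the `{'status': 'failed'}` shown in the output earlier in this task; status values other than 'failed' are assumed to follow the same shape.

```python
import ast
from collections import Counter
from typing import Iterable


def count_statuses(log_lines: Iterable[str]) -> Counter:
    """Tally the `status` value of every result line found in the log."""
    counts: Counter = Counter()
    for line in log_lines:
        line = line.strip()
        # Only consider lines that look like a standalone dict literal.
        if not (line.startswith("{") and line.endswith("}")):
            continue
        try:
            result = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue
        if isinstance(result, dict) and isinstance(result.get("status"), str):
            counts[result["status"]] += 1
    return counts
```

Usage would be along the lines of `count_statuses(open("bitbucket-archive.2.log"))`, then following up on whatever is counted under 'failed'.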

Great news! Let me know if I can help in any way.

olasd renamed this task from Deploy swh.loader.mercurial 1.1 in staging to Deploy swh.loader.mercurial 2.1 in staging. Jun 21 2021, 12:05 PM
olasd closed this task as Resolved.
olasd updated the task description.

Now that the branch structure has landed, I've deployed this latest version. After some cleanup of the duplicate extids left over from an earlier deployment, everything seems to be fine and the loader is ready for production.

For the record: the version deployed here must have been a git-patched v2.1, not a release per se.
A more recent release, tagged v2.1.0 [1], has since been built with the work @olasd started
to solve the extid version inconsistency issue.

See T3447, T3448 and related tasks.

[1] https://forge.softwareheritage.org/source/swh-loader-mercurial/history/master/;v2.1.0