Software Heritage

mercurial.loader: Make it run within docker
Closed, Public

Authored by ardumont on Jun 10 2020, 4:32 PM.

Details

Summary

Without this diff, the mercurial loader fails when run within docker [1], with several errors
in sequence (each one appears only once the previous one is fixed):

  • TypeError: can not serialize 'map' object
  • TypeError: can not serialize 'set' object

This diff fixes both:

  • a map object cannot be serialized when calling storage.content_missing
  • a set cannot be serialized when calling storage.{revision|release}_missing
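Both errors come from the request-encoding step of the RPC storage client: a lazy `map` iterator and a `set` have no wire representation, while a `list` does. The sketch below reproduces this with the standard library's `json` module standing in for the real encoder (an assumption: the messages above match those of msgpack, which rejects the same types). The hashes and ids are made up for illustration.

```python
import json
from typing import Any


def encode(payload: Any) -> str:
    # Stand-in for the RPC client's request encoding. json is an assumption:
    # like a msgpack encoder, it only knows concrete containers such as lists.
    return json.dumps(payload)


def try_encode(payload: Any) -> str:
    """Return "ok" if the payload serializes, else the TypeError text."""
    try:
        encode(payload)
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


# Hypothetical hex-encoded sha1 digests and revision ids.
hashes = ["01" * 20, "02" * 20]
revision_ids = {"03" * 20, "04" * 20}

# Before the fix: a lazy map iterator and a set are rejected.
print(try_encode(map(lambda h: {"sha1": h}, hashes)))
print(try_encode(revision_ids))

# After the fix: a list comprehension and an explicit list() serialize fine.
print(try_encode([{"sha1": h} for h in hashes]))
print(try_encode(list(revision_ids)))
```

Materializing the payload as a list costs little here, since it has to be fully encoded before being sent anyway.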

It is unclear why the tests do not catch any of these issues.
This diff simply unblocks the loader so that people can run it within docker.
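One possible reason, not verified here, is that the tests exercise an in-process storage that never serializes its arguments, so a `map` or `set` is silently accepted. A minimal sketch of a test helper that would surface such bugs: the hypothetical `SerializingProxy` wraps any backend and round-trips each call's arguments through `json` (again a stand-in for the real wire format) before delegating. `InMemoryStorage` and its `revision_missing` are toy stand-ins, not the swh-storage classes.

```python
import json
from typing import Any, Callable, Iterable, List


class InMemoryStorage:
    """Hypothetical in-process backend: accepts any iterable, so map/set slip through."""

    def __init__(self, known: Iterable[str]) -> None:
        self.known = set(known)

    def revision_missing(self, revisions: Iterable[str]) -> List[str]:
        return [r for r in revisions if r not in self.known]


class SerializingProxy:
    """Hypothetical test wrapper: round-trips every call's arguments through
    json (standing in for the real wire format) before delegating."""

    def __init__(self, backend: Any) -> None:
        self._backend = backend

    def __getattr__(self, name: str) -> Callable[..., Any]:
        method = getattr(self._backend, name)

        def call(*args: Any, **kwargs: Any) -> Any:
            # Raises TypeError for map/set arguments, as a remote client would.
            args_list, kwargs_dict = json.loads(json.dumps([args, kwargs]))
            return method(*args_list, **kwargs_dict)

        return call


storage = SerializingProxy(InMemoryStorage(known=["aa"]))
print(storage.revision_missing(["aa", "bb"]))  # ['bb']

try:
    storage.revision_missing({"aa", "bb"})
except TypeError:
    print("set rejected, as it would be over the wire")
```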

[1] The initial problem was along these lines (exactly as in D3258#79482):

- swh.core.api.RemoteException: <RemoteException 500 AttributeError: ["'dict' object has no attribute 'url'"]>

where the self.origin being written to storage was a dict instead of an Origin
model object.

That error no longer occurs with the current loader-core (v0.2.0 or later).
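The attribute error can be reproduced in isolation. In this sketch, `Origin` is a hypothetical frozen dataclass standing in for the Origin model object, and `origin_add_one` only mimics the `origin.url` access visible in the traceback; neither is the real swh-model or swh-storage code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Origin:
    """Hypothetical stand-in for the Origin model object."""

    url: str


def origin_add_one(origin: Origin) -> List[str]:
    # Mimics the storage's attribute access: origin.url, not origin["url"].
    return [origin.url]


url = "https://www.mercurial-scm.org/repo/evolve/"

# Works with a model object.
print(origin_add_one(Origin(url=url)))

try:
    # A plain dict, as the loader used to pass, has no .url attribute.
    origin_add_one({"url": url})
except AttributeError as exc:
    print(exc)  # 'dict' object has no attribute 'url'
```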

Test Plan

Run tox, then run the loader within docker:

docker-compose.override.yml:

```
version: '2'

services:
  swh-loader:
    volumes:
      # - "$SWH_ENVIRONMENT_HOME/swh-loader-core:/src/swh-loader-core"
      - "$SWH_ENVIRONMENT_HOME/swh-loader-mercurial:/src/swh-loader-mercurial"
```

```
$ doco up
$ doco exec swh-loader run mercurial https://www.mercurial-scm.org/repo/evolve/
```

Finally:

```
$ time doco exec swh-loader swh loader run mercurial https://www.mercurial-scm.org/repo/evolve/
WARNING:swh.core.cli:Could not load subcommand search: cannot import name 'get_journal_client' from 'swh.journal.cli' (/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/journal/cli.py)
INFO:swh.core.config:Loading config file /loader.yml
WARNING:swh.loader.mercurial.Bundle20Loader:No matching revision for tag 5.6.1 (hg changeset: 70694b2621ba9d919bc38303f8901e84caf5da0f). Skipping
{'status': 'eventful'}
docker-compose exec swh-loader swh loader run mercurial   0.59s user 0.61s system 0% cpu 2:10.59 total
$ time doco exec swh-loader swh loader run mercurial https://www.mercurial-scm.org/repo/evolve/
WARNING:swh.core.cli:Could not load subcommand search: cannot import name 'get_journal_client' from 'swh.journal.cli' (/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/journal/cli.py)
INFO:swh.core.config:Loading config file /loader.yml
WARNING:swh.loader.mercurial.Bundle20Loader:No matching revision for tag 5.6.1 (hg changeset: 70694b2621ba9d919bc38303f8901e84caf5da0f). Skipping
{'status': 'uneventful'}
docker-compose exec swh-loader swh loader run mercurial   0.59s user 0.53s system 2% cpu 40.954 total
```

Diff Detail

Repository: rDLDHG Mercurial loader
Lint: Automatic diff as part of commit; lint not applicable.
Unit: Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D3258 (id=11549)

Rebasing onto 03c34b9efd...

Current branch diff-target is up to date.
Changes applied before test
commit f20891013265ed64094e763c75cf2a4d3ff330cd
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 10 16:26:34 2020 +0200

    mercurial.loader: Add missing type annotation to respect base class

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/7/ for more details.

  • mercurial.loader: Use list comprehension over map
  • mercurial.loader: Wrap list when calling <object>_missing endpoints

Build is green

Patch application report for D3258 (id=11550)

Rebasing onto 03c34b9efd...

Current branch diff-target is up to date.
Changes applied before test
commit f1866671a417194e94a158583924985fedaae293
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 10 16:49:35 2020 +0200

    mercurial.loader: Wrap list when calling <object>_missing endpoints
    
    Prior to this commit, those calls were raising type error:

```
TypeError: can not serialize 'set' object
```

commit 1cbcc8ddb59ed8c6a37df78ad003a58a80f2cdc4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Wed Jun 10 16:37:09 2020 +0200

mercurial.loader: Use list comprehension over map

Prior to this commit, map was raising type error during serialization step

```
TypeError: can not serialize 'map' object
```

commit f20891013265ed64094e763c75cf2a4d3ff330cd
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Wed Jun 10 16:26:34 2020 +0200

mercurial.loader: Add missing type annotation to respect base class

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/8/ for more details.
ardumont retitled this revision from "mercurial.loader: Add missing type annotation to respect base class" to "mercurial.loader: Make it run within docker".
Jun 10 2020, 4:52 PM
ardumont edited the summary of this revision.
ardumont edited the test plan for this revision.
anlambert added a subscriber: anlambert.

Looks good to me. I have just tested with docker: prior to this diff, the mercurial loader was failing with this error:

swh-loader_1                    | [2020-06-10 15:19:48,291: ERROR/ForkPoolWorker-1] Task swh.loader.mercurial.tasks.LoadMercurial[92e86f02-f56c-4cdd-8c59-580d9850b739] raised unexpected: RemoteException({'type': 'AttributeError', 'args': ["'dict' object has no attribute 'url'"], 'message': "'dict' object has no attribute 'url'", 'traceback': ['Traceback (most recent call last):\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request\n    rv = self.dispatch_request()\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request\n    return self.view_functions[rule.endpoint](**req.view_args)\n', '  File "<decorator-gen-110>", line 2, in origin_add_one\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 148, in _negotiate\n    return f.negotiator(*args, **kwargs)\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 82, in __call__\n    result = self.func(*args, **kwargs)\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 453, in _f\n    return obj_meth(**kw)\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/metrics.py", line 24, in d\n    return f(*a, **kw)\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/db/common.py", line 62, in _meth\n    return meth(self, *args, db=db, cur=cur, **kwargs)\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/storage.py", line 1201, in origin_add_one\n    origin_row = list(db.origin_get_by_url([origin.url], cur))[0]\n', "AttributeError: 'dict' object has no attribute 'url'\n"]})
swh-loader_1                    | Traceback (most recent call last):
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 412, in trace_task
swh-loader_1                    |     R = retval = fun(*args, **kwargs)
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 51, in __call__
swh-loader_1                    |     result = super().__call__(*args, **kwargs)
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 704, in __protected_call__
swh-loader_1                    |     return self.run(*args, **kwargs)
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/mercurial/tasks.py", line 22, in load_hg
swh-loader_1                    |     return loader.load()
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 293, in load
swh-loader_1                    |     self._store_origin_visit()
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 170, in _store_origin_visit
swh-loader_1                    |     self.storage.origin_add_one(self.origin)
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 181, in meth_
swh-loader_1                    |     return self.post(meth._endpoint_path, post_data)
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 278, in post
swh-loader_1                    |     return self._decode_response(response)
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 352, in _decode_response
swh-loader_1                    |     self.raise_for_status(response)
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/api/client.py", line 30, in raise_for_status
swh-loader_1                    |     super().raise_for_status(response)
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 342, in raise_for_status
swh-loader_1                    |     raise exception from None
swh-loader_1                    | swh.core.api.RemoteException: <RemoteException 500 AttributeError: ["'dict' object has no attribute 'url'"]>

Applying arc patch D3258 in swh-loader-mercurial and using it through docker-compose.override.yml makes the issue go away.

I think it is time to add a docker test for the mercurial loader, as we currently only have one for the git loader.

This revision is now accepted and ready to land.
Jun 10 2020, 5:39 PM

> I think it is time to add a docker test for the mercurial loader, as we currently only have one for the git loader.

I agree, but my understanding is that it will be completely rewritten soon, so
maybe not immediately ;)

I fixed it so that @azecar (irc) could work without first having to debug this
;)