Page MenuHomeSoftware Heritage

Make origin_add() handle multiple occurences of an origin properly
ClosedPublic

Authored by douardda on Sep 17 2020, 2:04 PM.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D3975 (id=14007)

Rebasing onto 37ce2c4205...

First, rewinding head to replay your work on top of it...
Applying: Make origin_add() handle multiple occurences of an origin properly
Changes applied before test
commit 40c0f1fce959c9704a50bbc7c17d3506487784d7
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 17 13:44:42 2020 +0200

    Make origin_add() handle multiple occurences of an origin properly

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/946/ for more details.

olasd added a subscriber: olasd.

It took me a bit to understand what the sorteds were achieving (dedup the list while keeping it in the same order), so I wouldn't mind if you turned them into an explicit loop, or at least if you added a comment.

I also think the current algorithm is O(n^2) but n is usually tiny so *shrug*.

This revision is now accepted and ready to land.Sep 17 2020, 2:15 PM
In D3975#98089, @olasd wrote:

It took me a bit to understand what the sorteds were achieving (dedup the list while keeping it in the same order), so I wouldn't mind if you turned them into an explicit loop, or at least if you added a comment.

sure

I also think the current algorithm is O(n^2) but n is usually tiny so *shrug*.

yep I did make this assumption

Build is green

Patch application report for D3975 (id=14013)

Rebasing onto 37ce2c4205...

Current branch diff-target is up to date.
Changes applied before test
commit 5b3bce171479b9a3ae7828ac25d2b022c8c1be00
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 17 13:44:42 2020 +0200

    Make origin_add() handle multiple occurences of an origin properly

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/947/ for more details.

ardumont added inline comments.
swh/storage/cassandra/storage.py
846

keeping

swh/storage/postgresql/storage.py
1211–1213

keeping

Why do we need this?

because of

swh-mirror_storage.1.yz0sp9pdp7in@libra    | Traceback (most recent call last):
swh-mirror_storage.1.yz0sp9pdp7in@libra    |   File "/usr/lib/python3/dist-packages/flask/app.py", line 1813, in full_dispatch_request
swh-mirror_storage.1.yz0sp9pdp7in@libra    |     rv = self.dispatch_request()
swh-mirror_storage.1.yz0sp9pdp7in@libra    |   File "/usr/lib/python3/dist-packages/flask/app.py", line 1799, in dispatch_request
swh-mirror_storage.1.yz0sp9pdp7in@libra    |     return self.view_functions[rule.endpoint](**req.view_args)
swh-mirror_storage.1.yz0sp9pdp7in@libra    |   File "<decorator-gen-102>", line 2, in origin_add
swh-mirror_storage.1.yz0sp9pdp7in@libra    |   File "/usr/lib/python3/dist-packages/swh/core/api/negotiation.py", line 148, in _negotiate
swh-mirror_storage.1.yz0sp9pdp7in@libra    |     return f.negotiator(*args, **kwargs)
swh-mirror_storage.1.yz0sp9pdp7in@libra    |   File "/usr/lib/python3/dist-packages/swh/core/api/negotiation.py", line 82, in __call__
swh-mirror_storage.1.yz0sp9pdp7in@libra    |     result = self.func(*args, **kwargs)
swh-mirror_storage.1.yz0sp9pdp7in@libra    |   File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 457, in _f
swh-mirror_storage.1.yz0sp9pdp7in@libra    |     return obj_meth(**kw)
swh-mirror_storage.1.yz0sp9pdp7in@libra    |   File "/usr/lib/python3/dist-packages/swh/storage/metrics.py", line 24, in d
swh-mirror_storage.1.yz0sp9pdp7in@libra    |     return f(*a, **kw)
swh-mirror_storage.1.yz0sp9pdp7in@libra    |   File "/usr/lib/python3/dist-packages/swh/storage/metrics.py", line 77, in d
swh-mirror_storage.1.yz0sp9pdp7in@libra    |     r = f(*a, **kw)
swh-mirror_storage.1.yz0sp9pdp7in@libra    |   File "/usr/lib/python3/dist-packages/swh/core/db/common.py", line 62, in _meth
swh-mirror_storage.1.yz0sp9pdp7in@libra    |     return meth(self, *args, db=db, cur=cur, **kwargs)
swh-mirror_storage.1.yz0sp9pdp7in@libra    |   File "/usr/lib/python3/dist-packages/swh/storage/postgresql/storage.py", line 1217, in origin_add
swh-mirror_storage.1.yz0sp9pdp7in@libra    |     if db.origin_add(url, cur):
swh-mirror_storage.1.yz0sp9pdp7in@libra    |   File "/usr/lib/python3/dist-packages/swh/storage/postgresql/db.py", line 909, in origin_add
swh-mirror_storage.1.yz0sp9pdp7in@libra    |     cur.execute(insert, (url,))
swh-mirror_storage.1.yz0sp9pdp7in@libra    | psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "origin_url_idx"
swh-mirror_storage.1.yz0sp9pdp7in@libra    | DETAIL:  Key (url)=(https://codeberg.org/softwerke-magdeburg/website.git) already exists.

typos (thx ardumont) + better ci message (thx vlorentz)

Build is green

Patch application report for D3975 (id=14016)

Rebasing onto 37ce2c4205...

Current branch diff-target is up to date.
Changes applied before test
commit 469c38c019439ddbe1a0d99a53b9a103f9873cb2
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 17 13:44:42 2020 +0200

    Make origin_add() handle multiple occurences of an origin properly
    
    this is needed to prevent some traceback in case an origin is present
    several times in the same origin_add batch, which situation has been
    seen in some mirror tests.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/948/ for more details.