there is no need for an url insertion in the origin table to result in a
unicity error. Conflicting insertion of the same URL in this table may
happen in case of concurrent process (loading or in a replayer session).
Details
- Reviewers
vlorentz ardumont - Group Reviewers
Reviewers - Commits
- rDSTO77ef651d9582: Make postgresql's origin_add not raise an error in case of conflict
Diff Detail
- Repository
- rDSTO Storage manager
- Branch
- origin_add
- Lint
Lint Skipped - Unit
Unit Tests Skipped - Build Status
Buildable 21274 Build 33036: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 33035: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D5671 (id=20257)
Rebasing onto 051b771523...
Current branch diff-target is up to date.
Changes applied before test
commit 94456f66618b03de818c00ef22b0c122b8140cd4 Author: David Douard <david.douard@sdfa3.org> Date: Tue May 4 16:06:02 2021 +0200 Make postgresql's origin_add not raise an error in case of conflict there is no need for an url insertion in the origin table to result in a unicity error. Conflicting insertion of the same URL in this table may happen in case of concurrent process (loading or in a replayer session).
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1302/ for more details.
swh/storage/postgresql/db.py | ||
---|---|---|
978 | ok for the main change but that one makes me wonder. Also, please add type to the impacted method. [1] that's somehow implicit here as the meaning in the returned value is not the same. |
swh/storage/postgresql/db.py | ||
---|---|---|
978 | The return value is used in postgresql.Storage.origin_add to return a counter of added origins: [...] added = 0 for url in to_add: if db.origin_add(url, cur): added += 1 return {"origin:add": added} |
swh/storage/postgresql/db.py | ||
---|---|---|
978 | right, i had forgotten that part. Thanks. So i gather cur.rowcount somehow returns something falsy [1] when the on conflict [1] this looks fine: In [11]: 0 if None else 1 Out[11]: 1 In [12]: 0 if False else 1 Out[12]: 1 In [13]: 0 if 0 else 1 Out[13]: 1 |
swh/storage/postgresql/db.py | ||
---|---|---|
978 | I was unsure of the behavior of cur.rowcount with the 'on conflict', so I checked (pg13 in a pifpaf session) and it seems to work as expected (aka only count the number of inserted rows): In [22]: c.execute("select * from origin") In [23]: c.fetchall() Out[23]: [(1, 'http://foo'), (2, 'http://bar'), (8, 'http://baz'), (11, 'http://biz')] In [31]: execute_values(c, "INSERT INTO origin (url) values %s ON CONFLICT DO NOTHING", (("http://bar",),("http://toto",), ("http://foo",), ("http://tutu",) )) In [32]: c.rowcount Out[32]: 2 |
Build is green
Patch application report for D5671 (id=20288)
Rebasing onto ffb38f71d9...
Current branch diff-target is up to date.
Changes applied before test
commit 77ef651d958285c3d91b10961f5a62a7612f9354 Author: David Douard <david.douard@sdfa3.org> Date: Tue May 4 16:06:02 2021 +0200 Make postgresql's origin_add not raise an error in case of conflict there is no need for an url insertion in the origin table to result in a unicity error. Conflicting insertion of the same URL in this table may happen in case of concurrent process (loading or in a replayer session).
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1306/ for more details.