There is no need for a URL insertion in the origin table to result in a
uniqueness error. Conflicting insertions of the same URL in this table may
happen with concurrent processes (loading or a replayer session).
Details
- Reviewers: vlorentz, ardumont
- Group Reviewers: Reviewers
- Commits: rDSTO77ef651d9582: Make postgresql's origin_add not raise an error in case of conflict
Diff Detail
- Repository: rDSTO Storage manager
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build is green
Patch application report for D5671 (id=20257)
Rebasing onto 051b771523...
Current branch diff-target is up to date.
Changes applied before test
commit 94456f66618b03de818c00ef22b0c122b8140cd4
Author: David Douard <david.douard@sdfa3.org>
Date: Tue May 4 16:06:02 2021 +0200
Make postgresql's origin_add not raise an error in case of conflict
there is no need for an url insertion in the origin table to result in a
unicity error. Conflicting insertion of the same URL in this table may
happen in case of concurrent process (loading or in a replayer session).
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1302/ for more details.
| swh/storage/postgresql/db.py | ||
|---|---|---|
| 978 | OK for the main change, but that one makes me wonder. Also, please add types to the impacted method. [1] That is somewhat implicit here, as the meaning of the returned value is not the same. | |
| swh/storage/postgresql/db.py | ||
|---|---|---|
| 978 | The return value is used in postgresql.Storage.origin_add to return a counter of added origins: [...]
    added = 0
    for url in to_add:
        if db.origin_add(url, cur):
            added += 1
    return {"origin:add": added} | |
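The counting logic quoted above can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.24+ accepts the same ON CONFLICT DO NOTHING clause), and the helper names `origin_add` and `add_origins` as well as the table definition are hypothetical, not the actual swh.storage code.

```python
import sqlite3


def origin_add(cur: sqlite3.Cursor, url: str) -> int:
    """Insert url, silently skipping it on conflict.

    Returns the cursor's rowcount: 1 if a row was inserted, 0 if the
    URL was already present (hypothetical helper, for illustration).
    """
    cur.execute(
        "INSERT INTO origin (url) VALUES (?) ON CONFLICT DO NOTHING", (url,)
    )
    return cur.rowcount


def add_origins(cur: sqlite3.Cursor, urls: list) -> dict:
    """Count only the URLs that were actually inserted."""
    added = 0
    for url in urls:
        # A rowcount of 0 is falsy, so conflicting URLs are not counted.
        if origin_add(cur, url):
            added += 1
    return {"origin:add": added}


# Assumed minimal schema: a unique constraint on url, as the review implies.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE origin (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL)"
)

first = add_origins(cur, ["http://foo", "http://bar"])
second = add_origins(cur, ["http://bar", "http://baz"])  # http://bar conflicts
```

Here `second` counts only `http://baz`, and no uniqueness error is raised for the repeated `http://bar`.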
| swh/storage/postgresql/db.py | ||
|---|---|---|
| 978 | Right, I had forgotten that part. Thanks. So I gather cur.rowcount somehow returns something falsy [1] when the on conflict
[1] this looks fine:
In [11]: 0 if None else 1
Out[11]: 1
In [12]: 0 if False else 1
Out[12]: 1
In [13]: 0 if 0 else 1
Out[13]: 1 | |
| swh/storage/postgresql/db.py | ||
|---|---|---|
| 978 | I was unsure of the behavior of cur.rowcount with the 'on conflict', so I checked (pg13 in a pifpaf session) and it seems to work as expected (i.e. it only counts the number of inserted rows):
In [22]: c.execute("select * from origin")
In [23]: c.fetchall()
Out[23]: [(1, 'http://foo'), (2, 'http://bar'), (8, 'http://baz'), (11, 'http://biz')]
In [31]: execute_values(c, "INSERT INTO origin (url) values %s ON CONFLICT DO NOTHING", (("http://bar",),("http://toto",), ("http://foo",), ("http://tutu",) ))
In [32]: c.rowcount
Out[32]: 2 | |
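The experiment above can be approximated without a PostgreSQL server. The following is a rough sketch, assuming Python's built-in sqlite3 module as a stand-in for pg13 and a single multi-row INSERT in place of psycopg2's execute_values; the schema is an assumption, and the result says nothing authoritative about psycopg2 itself, which the reviewer checked separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Assumed schema: url carries the unique constraint that triggers conflicts.
cur.execute(
    "CREATE TABLE origin (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL)"
)
cur.executemany(
    "INSERT INTO origin (url) VALUES (?)", [("http://foo",), ("http://bar",)]
)

# Two URLs already exist, two are new.
urls = ["http://bar", "http://toto", "http://foo", "http://tutu"]
placeholders = ", ".join("(?)" for _ in urls)
cur.execute(
    f"INSERT INTO origin (url) VALUES {placeholders} ON CONFLICT DO NOTHING",
    urls,
)
# Read rowcount before running another statement on the same cursor.
inserted = cur.rowcount

stored = sorted(row[0] for row in cur.execute("SELECT url FROM origin"))
```

In this sketch `inserted` reflects only the two new URLs, matching the count the reviewer observed on PostgreSQL.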
Build is green
Patch application report for D5671 (id=20288)
Rebasing onto ffb38f71d9...
Current branch diff-target is up to date.
Changes applied before test
commit 77ef651d958285c3d91b10961f5a62a7612f9354
Author: David Douard <david.douard@sdfa3.org>
Date: Tue May 4 16:06:02 2021 +0200
Make postgresql's origin_add not raise an error in case of conflict
there is no need for an url insertion in the origin table to result in a
unicity error. Conflicting insertion of the same URL in this table may
happen in case of concurrent process (loading or in a replayer session).
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1306/ for more details.