
Race condition on person insertion in pgsql storage
Closed, Migrated

Description

While running a parallel script that inserts many revisions with the same author (\x, i.e. the empty string) into an empty database, I got this error:

Traceback (most recent call last):
  File "./scripts/cassandra-bench-tools.py", line 264, in <module>
    cli(auto_envvar_prefix='SWH_BENCH')
  File "/home/vlorentz/.local/lib/python3.5/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/vlorentz/.local/lib/python3.5/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/vlorentz/.local/lib/python3.5/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/vlorentz/.local/lib/python3.5/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/vlorentz/.local/lib/python3.5/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/vlorentz/.local/lib/python3.5/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/vlorentz/.local/lib/python3.5/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "./scripts/cassandra-bench-tools.py", line 259, in pgsql_import_dataset
    push_revisions('/dev/stdin', storage)
  File "./scripts/cassandra-bench-tools.py", line 127, in push_revisions
    stats = storage.revision_add(batch)
  File "/home/vlorentz/swh-environment/swh-core/swh/core/db/common.py", line 49, in _meth
    return meth(self, *args, db=db, cur=cur, **kwargs)
  File "/home/vlorentz/swh-environment/swh-storage/swh/storage/storage.py", line 717, in revision_add
    db.revision_add_from_temp(cur)
  File "/home/vlorentz/swh-environment/swh-core/swh/core/db/db_utils.py", line 33, in _meth
    self._cursor(cur).execute('SELECT %s()' % stored_proc)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "person_fullname_idx"
DETAIL:  Key (fullname)=(\x) already exists.
CONTEXT:  SQL statement "with t as (
        select author_fullname as fullname, author_name as name, author_email as email from tmp_revision
    union
        select committer_fullname as fullname, committer_name as name, committer_email as email from tmp_revision
    ) insert into person (fullname, name, email)
    select distinct on (fullname) fullname, name, email from t
    where not exists (
        select 1
        from person p
        where t.fullname = p.fullname
    )"
PL/pgSQL function swh_person_add_from_revision() line 3 at SQL statement
SQL statement "SELECT swh_person_add_from_revision()"
PL/pgSQL function swh_revision_add() line 3 at PERFORM
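The failing statement is a check-then-insert: it looks for an existing person with `where not exists (...)` and only then inserts. Under PostgreSQL's MVCC, that subquery does not see rows inserted by concurrent transactions that have not committed yet. So two workers can both conclude that the empty fullname is missing and both insert it, and the second one fails on person_fullname_idx. The sketch below is a toy Python model of that interleaving, not PostgreSQL itself. The classes and functions (UniqueIndex, DuplicateKeyError, run, and the two worker bodies) are hypothetical, and blocking and commits are collapsed into a single lock. A threading.Barrier forces both workers to finish the existence check before either inserts, which makes the race deterministic.

```python
import threading


class DuplicateKeyError(Exception):
    """Toy analogue of a unique-constraint violation (hypothetical)."""


class UniqueIndex:
    """Toy stand-in for a unique index on person.fullname (hypothetical model)."""

    def __init__(self):
        self._rows = set()
        self._lock = threading.Lock()

    def exists(self, key):
        with self._lock:
            return key in self._rows

    def insert(self, key):
        # Like a unique constraint: reject a second row with the same key.
        with self._lock:
            if key in self._rows:
                raise DuplicateKeyError(f"duplicate key value: {key!r}")
            self._rows.add(key)

    def insert_if_absent(self, key):
        # Check and insert as one atomic step, analogous to ON CONFLICT DO NOTHING.
        with self._lock:
            if key in self._rows:
                return False
            self._rows.add(key)
            return True


def check_then_insert(index, barrier):
    # Mirrors "insert ... where not exists (...)": the check and the insert
    # are separate steps, so both workers can pass the check.
    absent = not index.exists(b"")
    barrier.wait()  # both workers have checked before either inserts
    if absent:
        index.insert(b"")


def insert_on_conflict_do_nothing(index, barrier):
    barrier.wait()
    index.insert_if_absent(b"")


def run(worker_body, n_workers=2):
    """Run n_workers threads against one shared index; return the errors raised."""
    index = UniqueIndex()
    barrier = threading.Barrier(n_workers)
    errors = []

    def worker():
        try:
            worker_body(index, barrier)
        except DuplicateKeyError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


racy_errors = run(check_then_insert)
safe_errors = run(insert_on_conflict_do_nothing)
print(len(racy_errors), len(safe_errors))  # prints: 1 0
```

In SQL terms, one possible fix (not necessarily the one adopted for swh-storage) is to drop the `where not exists (...)` guard and end the insert with `on conflict (fullname) do nothing`, available since PostgreSQL 9.5. With that clause, a conflicting concurrent insert waits for the other transaction and then skips the row instead of raising a unique violation.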