Investigate and fix:
Jun 30 09:00:54 worker15 python3[23590]: [2019-06-30 09:00:54,448: ERROR/ForkPoolWorker-2] Task swh.lister.gitlab.tasks.RangeGitLabLister[474d600e-ff5c-43b0-83f9-afc29b1cfd88] raised unexpected: IntegrityError('(psycopg2.IntegrityError) duplicate key value violates unique constraint "gitlab_repo_pkey"\nDETAIL: Key (uid)=(debian/nathanruiz-guest/apt) already exists.\n',)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
psycopg2.IntegrityError: duplicate key value violates unique constraint "gitlab_repo_pkey"
DETAIL: Key (uid)=(debian/nathanruiz-guest/apt) already exists.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/scheduler/task.py", line 45, in __call__
    return super().__call__(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/lister/gitlab/tasks.py", line 36, in range_gitlab_lister
    lister.run(min_bound=start, max_bound=end)
  File "/usr/lib/python3/dist-packages/swh/lister/core/page_by_page_lister.py", line 123, in run
    checks=check_existence)
  File "/usr/lib/python3/dist-packages/swh/lister/core/lister_base.py", line 492, in ingest_data
    injected = self.inject_repo_data_into_db(models_list)
  File "/usr/lib/python3/dist-packages/swh/lister/core/lister_base.py", line 435, in inject_repo_data_into_db
    injected_repos[m['uid']] = self.db_inject_repo(m)
  File "/usr/lib/python3/dist-packages/swh/lister/core/lister_base.py", line 372, in db_inject_repo
    sql_repo = self.db_query_equal('uid', model_dict['uid'])
  File "/usr/lib/python3/dist-packages/swh/lister/core/lister_base.py", line 335, in db_query_equal
    .filter(key == value).first()
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2659, in first
    ret = list(self[0:1])
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2457, in __getitem__
    return list(res)
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2760, in __iter__
    self.session._autoflush()
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 1303, in _autoflush
    util.raise_from_cause(e)
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 202, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 1293, in _autoflush
    self.flush()
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 2019, in flush
    self._flush(objects)
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 2137, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 2101, in _flush
    flush_context.execute()
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
    rec.execute(self)
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute
    uow
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/persistence.py", line 174, in save_obj
    mapper, table, insert)
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/persistence.py", line 767, in _emit_insert_statements
    execute(statement, multiparams)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/usr/lib/python3/dist-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 202, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 185, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (psycopg2.IntegrityError) duplicate key value violates unique constraint "gitlab_repo_pkey"
DETAIL: Key (uid)=(debian/nathanruiz-guest/apt) already exists.
[SQL: 'INSERT INTO gitlab_repo (name, full_name, html_url, origin_url, origin_type, last_seen, task_id, uid, instance) VALUES (%(name)s, %(full_name)s, %(html_url)s, %(origin_url)s, %(origin_type)s, %(last_seen)s, %(task_id)s, %(uid)s, %(instance)s)'] [parameters: {'instance': 'debian', 'last_seen': datetime.datetime(2019, 6, 30, 9, 0, 36, 155540), 'origin_url': 'https://salsa.debian.org/nathanruiz-guest/apt.git', 'full_name': 'nathanruiz-guest/apt', 'name': 'apt', 'html_url': 'https://salsa.debian.org/nathanruiz-guest/apt', 'task_id': None, 'origin_type': 'git', 'uid': 'debian/nathanruiz-guest/apt'}]
Jun 30 09:00:54 worker15 python3[23574]: [2019-06-30 09:00:54,518: INFO/MainProcess] Received task: swh.lister.gitlab.tasks.RangeGitLabLister[71da1490-b1ac-4d93-bc7f-5402472e05d1]
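The traceback suggests a check-then-insert pattern: `db_inject_repo` first looks the uid up through `db_query_equal`, and the pending INSERT only reaches PostgreSQL later, during a query-invoked autoflush. If another worker commits the same uid in between, the INSERT violates `gitlab_repo_pkey`. The sketch below reproduces that interleaving with the standard-library `sqlite3` module rather than SQLAlchemy/PostgreSQL. The reduced table and the helpers `create_db`, `exists`, `insert` and `simulate_race` are hypothetical, and the two connections merely stand in for two Celery workers. It illustrates one plausible mechanism, not a confirmed diagnosis.

```python
import os
import sqlite3
import tempfile

# Hypothetical, reduced stand-in for the gitlab_repo table: only the primary
# key (uid) and one extra column are kept.
SCHEMA = "CREATE TABLE gitlab_repo (uid TEXT PRIMARY KEY, last_seen TEXT)"


def create_db(path: str) -> None:
    """Create the reduced table in a fresh SQLite file."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    conn.close()


def exists(conn: sqlite3.Connection, uid: str) -> bool:
    """Check step, analogous to db_query_equal('uid', ...)."""
    rows = conn.execute(
        "SELECT 1 FROM gitlab_repo WHERE uid = ?", (uid,)
    ).fetchall()
    return bool(rows)


def insert(conn: sqlite3.Connection, uid: str, last_seen: str) -> None:
    """Act step: insert the row and commit it."""
    conn.execute(
        "INSERT INTO gitlab_repo (uid, last_seen) VALUES (?, ?)",
        (uid, last_seen),
    )
    conn.commit()


def simulate_race(path: str, uid: str) -> str:
    """Interleave two 'workers' so that both check before either inserts."""
    worker_a = sqlite3.connect(path)
    worker_b = sqlite3.connect(path)
    try:
        a_found = exists(worker_a, uid)  # A: not found
        b_found = exists(worker_b, uid)  # B: not found either
        if not b_found:
            insert(worker_b, uid, "seen by B")  # B commits first
        if not a_found:
            try:
                insert(worker_a, uid, "seen by A")  # A's check is now stale
            except sqlite3.IntegrityError as exc:
                worker_a.rollback()
                return f"worker A failed: {exc}"
        return "no conflict"
    finally:
        worker_a.close()
        worker_b.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "lister.db")
        create_db(db_path)
        print(simulate_race(db_path, "debian/nathanruiz-guest/apt"))
```

Running it reports a `UNIQUE constraint failed` error for worker A, the SQLite counterpart of the `gitlab_repo_pkey` violation above.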
@douardda and I might have encountered these occurrences before.
If my memory serves me well, they were possibly due to overlapping range intervals.
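The task calls `lister.run(min_bound=start, max_bound=end)`. If both bounds are inclusive, consecutive ranges such as (1, 10) and (10, 20) would both list page 10, and two workers could then insert the same repositories concurrently. Whether the bounds are inclusive, and how the task ranges are currently computed, are assumptions to verify against the swh-lister code. The sketch below uses hypothetical helpers (`split_pages`, `check_partition`, not part of swh-lister) to build disjoint inclusive page ranges and to detect both overlaps and holes in a given split.

```python
from typing import Iterable, Iterator, Tuple

PageRange = Tuple[int, int]  # inclusive (start, end)


def split_pages(first: int, last: int, chunk: int) -> Iterator[PageRange]:
    """Yield inclusive (start, end) ranges covering first..last exactly once."""
    if chunk < 1:
        raise ValueError("chunk must be >= 1")
    start = first
    while start <= last:
        end = min(start + chunk - 1, last)
        yield start, end
        start = end + 1  # next range begins strictly after this one ends


def check_partition(ranges: Iterable[PageRange], first: int, last: int) -> None:
    """Raise ValueError if the inclusive ranges overlap or leave a hole."""
    expected = first  # first page not yet covered
    for start, end in sorted(ranges):
        if end < start:
            raise ValueError(f"empty range ({start}, {end})")
        if start < expected:
            raise ValueError(f"overlap: range ({start}, {end}) starts before page {expected}")
        if start > expected:
            raise ValueError(f"hole: pages {expected}..{start - 1} are never listed")
        expected = end + 1
    if expected != last + 1:
        raise ValueError(f"ranges end at page {expected - 1}, expected {last}")


if __name__ == "__main__":
    print(list(split_pages(1, 25, 10)))  # [(1, 10), (11, 20), (21, 25)]
    check_partition(split_pages(1, 25, 10), 1, 25)  # passes silently
```

Running `check_partition([(1, 10), (10, 20)], 1, 20)` raises the overlap error, which is the situation suspected here.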
In any case, this must be dealt with:
- either by checking the range computations to avoid overlaps,
- or, as a fallback (for example if the source of the error is not found), by trapping those errors and making sure the main process continues, so that no holes are left in the listing.
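For the fallback, one option is to isolate each repository's write so that a duplicate key does not abort the whole batch. The sketch below uses stdlib `sqlite3` with a per-row `SAVEPOINT`; `inject_repos` and the reduced table are hypothetical. In PostgreSQL a failed statement aborts the enclosing transaction, which is why a savepoint matters there (in SQLAlchemy, `Session.begin_nested()` emits one). SQLite only aborts the failing statement, so here the savepoint mirrors the PostgreSQL pattern rather than being strictly required. PostgreSQL's `INSERT ... ON CONFLICT (uid) DO UPDATE` would be another way to avoid the error altogether.

```python
import sqlite3
from typing import Dict, Iterable, List

# Hypothetical, reduced stand-in for the gitlab_repo table.
SCHEMA = "CREATE TABLE gitlab_repo (uid TEXT PRIMARY KEY, last_seen TEXT)"


def inject_repos(conn: sqlite3.Connection, repos: Iterable[Dict[str, str]]) -> List[str]:
    """Insert every repo in one transaction; on a duplicate uid, refresh the
    existing row instead of aborting the batch. Returns the duplicate uids.

    `conn` must be opened with isolation_level=None so that the explicit
    BEGIN/SAVEPOINT statements below control the transaction.
    """
    duplicates: List[str] = []
    conn.execute("BEGIN")
    try:
        for repo in repos:
            # One savepoint per row: a failure only undoes this row.
            conn.execute("SAVEPOINT repo")
            try:
                conn.execute(
                    "INSERT INTO gitlab_repo (uid, last_seen) VALUES (:uid, :last_seen)",
                    repo,
                )
            except sqlite3.IntegrityError:
                # The uid already exists (e.g. stored by another worker):
                # undo the failed insert, update the row and keep going.
                conn.execute("ROLLBACK TO SAVEPOINT repo")
                conn.execute(
                    "UPDATE gitlab_repo SET last_seen = :last_seen WHERE uid = :uid",
                    repo,
                )
                duplicates.append(repo["uid"])
            conn.execute("RELEASE SAVEPOINT repo")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return duplicates


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO gitlab_repo VALUES ('debian/nathanruiz-guest/apt', 'old')")
    batch = [
        {"uid": "debian/nathanruiz-guest/apt", "last_seen": "new"},
        {"uid": "debian/someone/other", "last_seen": "new"},
    ]
    print(inject_repos(conn, batch))  # ['debian/nathanruiz-guest/apt']
```

Returning the duplicate uids lets the caller log them, so the occurrences stay visible while the rest of the range is still listed.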