
lister: Failure during listing on a fresh new db (unicity constraint)
Open, NormalPublic

Description

Failure while listing the Debian distribution (staging):

                                            return super().__call__(*args, **kwargs)
                                          File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 641, in __protected_call__
                                            return self.run(*args, **kwargs)
                                          File "/usr/lib/python3/dist-packages/swh/lister/debian/tasks.py", line 13, in list_debian_distribution
                                            DebianLister(distribution=distribution, **lister_args).run()
                                          File "/usr/lib/python3/dist-packages/swh/lister/debian/lister.py", line 243, in run
                                            _, new_area_packages = self.ingest_data(None)
                                          File "/usr/lib/python3/dist-packages/swh/lister/core/lister_base.py", line 496, in ingest_data
                                            injected = self.inject_repo_data_into_db(models_list)
                                          File "/usr/lib/python3/dist-packages/swh/lister/debian/lister.py", line 177, in inject_repo_data_into_db
                                            .filter(~exists_tmp_pkg(self.db_session, Package))\
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2925, in all
                                            return list(self)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 3080, in __iter__
                                            self.session._autoflush()
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 1582, in _autoflush
                                            util.raise_from_cause(e)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 296, in raise_from_cause
                                            reraise(type(exception), exception, tb=exc_tb, cause=cause)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 277, in reraise
                                            raise value
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 1571, in _autoflush
                                            self.flush()
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 2446, in flush
                                            self._flush(objects)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 2584, in _flush
                                            transaction.rollback(_capture_exception=True)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/util/langhelpers.py", line 67, in __exit__
                                            compat.reraise(exc_type, exc_value, exc_tb)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 277, in reraise
                                            raise value
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 2544, in _flush
                                            flush_context.execute()
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/unitofwork.py", line 416, in execute
                                            rec.execute(self)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/unitofwork.py", line 583, in execute
                                            uow,
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
                                            insert,
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/persistence.py", line 1116, in _emit_insert_statements
                                            statement, params
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 980, in execute
                                            return meth(self, multiparams, params)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/sql/elements.py", line 273, in _execute_on_connection
                                            return connection._execute_clauseelement(self, multiparams, params)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1099, in _execute_clauseelement
                                            distilled_params,
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1240, in _execute_context
                                            e, statement, parameters, cursor, context
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1458, in _handle_dbapi_exception
                                            util.raise_from_cause(sqlalchemy_exception, exc_info)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 296, in raise_from_cause
                                            reraise(type(exception), exception, tb=exc_tb, cause=cause)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 276, in reraise
                                            raise value.with_traceback(tb)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
                                            cursor, statement, parameters, context
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 536, in do_execute
                                            cursor.execute(statement, parameters)
                                        sqlalchemy.exc.IntegrityError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "package_area_id_name_version_key"
                                        DETAIL:  Key (area_id, name, version)=(1, 0ad, 0.0.21-2) already exists.
                                         [SQL: 'INSERT INTO package (area_id, name, version, directory, files, origin_id, task_id, revision_id) VALUES (%(area_id)s, %(name)s, %(version)s, %(directory)s, %(files)s, %(origin_id)s, %(task_id)s, %(revision_id)s) RETURNING package.id'] [parameters: {'area_id': 1, 'name': '0ad', 'version': '0.0.21-2', 'directory': 'pool/main/0/0ad', 'files': '{"0ad_0.0.21-2.dsc": {"name": "0ad_0.0.21-2.dsc", "size": 2363, "md5sum": "5f2af935f4537ede6169db8946d18d81", "sha256": "ee98572de81be0ffbf039951111f ... (303 characters truncated) ... .tar.xz", "size": 71420, "md5sum": "01d28e643619455fef8d40f1d1e7da7d", "sha256": "2f6e5b751872932971c4dbf618c32ddef1021f195d0457f57030b814cb1749c7"}}', 'origin_id': None, 'task_id': None, 'revision_id': None}] (Background on this error at: http://sqlalche.me/e/gkpj)
Nov 08 09:06:32 worker0 python3[20658]: [2019-11-08 09:06:32,853: INFO/ForkPoolWorker-1] Task swh.lister.debian.tasks.DebianListerTask[36bc65e0-d11a-4d4e-89f1-aef02f54d0d1] succeeded in 273.4153532029595s: None

Event Timeline

ardumont triaged this task as Normal priority. Nov 8 2019, 11:58 AM
ardumont created this task.
ardumont renamed this task from lister-debian: Failure during listing (unicity constraint) to lister-debian: Failure during listing on a fresh new db (unicity constraint). Nov 8 2019, 12:17 PM
olasd added a subscriber: olasd. Nov 8 2019, 1:38 PM

I guess it's time to get rid of this database? 0:-)

Hmm, it seems this is not specific to the debian lister...

Nov 15 10:12:11 worker2 python3[25641]: [2019-11-15 10:12:11,825: ERROR/ForkPoolWorker-5] Task swh.lister.gnu.tasks.GNUListerTask[c850696a-0941-4063-b1da-1fb73ac21b40] raised unexpected: IntegrityError('(psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "gnu_repo_pkey"\nDETAIL:  Key (uid)=(3dldf) already exists.\n')
                                        Traceback (most recent call last):
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
                                            cursor, statement, parameters, context
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 536, in do_execute
                                            cursor.execute(statement, parameters)
                                        psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "gnu_repo_pkey"
                                        DETAIL:  Key (uid)=(3dldf) already exists.


                                        The above exception was the direct cause of the following exception:

                                        Traceback (most recent call last):
                                          File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 382, in trace_task
                                            R = retval = fun(*args, **kwargs)
                                          File "/usr/lib/python3/dist-packages/swh/scheduler/task.py", line 45, in __call__
                                            return super().__call__(*args, **kwargs)
                                          File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 641, in __protected_call__
                                            return self.run(*args, **kwargs)
                                          File "/usr/lib/python3/dist-packages/swh/lister/gnu/tasks.py", line 13, in list_gnu_full
                                            GNULister(**lister_args).run()
                                          File "/usr/lib/python3/dist-packages/swh/lister/core/simple_lister.py", line 80, in run
                                            response, injected_repos = self.ingest_data(dump_not_used_identifier)
                                          File "/usr/lib/python3/dist-packages/swh/lister/core/simple_lister.py", line 54, in ingest_data
                                            injected = self.inject_repo_data_into_db(models)
                                          File "/usr/lib/python3/dist-packages/swh/lister/core/lister_base.py", line 433, in inject_repo_data_into_db
                                            injected_repos[m['uid']] = self.db_inject_repo(m)
                                          File "/usr/lib/python3/dist-packages/swh/lister/core/lister_base.py", line 367, in db_inject_repo
                                            sql_repo = self.db_query_equal('uid', model_dict['uid'])
                                          File "/usr/lib/python3/dist-packages/swh/lister/core/lister_base.py", line 330, in db_query_equal
                                            .filter(key == value).first()
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2979, in first
                                            ret = list(self[0:1])
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2771, in __getitem__
                                            return list(res)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 3080, in __iter__
                                            self.session._autoflush()
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 1582, in _autoflush
                                            util.raise_from_cause(e)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 296, in raise_from_cause
                                            reraise(type(exception), exception, tb=exc_tb, cause=cause)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 277, in reraise
                                            raise value
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 1571, in _autoflush
                                            self.flush()
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 2446, in flush
                                            self._flush(objects)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 2584, in _flush
                                            transaction.rollback(_capture_exception=True)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/util/langhelpers.py", line 67, in __exit__
                                            compat.reraise(exc_type, exc_value, exc_tb)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 277, in reraise
                                            raise value
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/session.py", line 2544, in _flush
                                            flush_context.execute()
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/unitofwork.py", line 416, in execute
                                            rec.execute(self)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/unitofwork.py", line 583, in execute
                                            uow,
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
                                            insert,
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/orm/persistence.py", line 1063, in _emit_insert_statements
                                            c = cached_connections[connection].execute(statement, multiparams)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 980, in execute
                                            return meth(self, multiparams, params)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/sql/elements.py", line 273, in _execute_on_connection
                                            return connection._execute_clauseelement(self, multiparams, params)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1099, in _execute_clauseelement
                                            distilled_params,
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1240, in _execute_context
                                            e, statement, parameters, cursor, context
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1458, in _handle_dbapi_exception
                                            util.raise_from_cause(sqlalchemy_exception, exc_info)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 296, in raise_from_cause
                                            reraise(type(exception), exception, tb=exc_tb, cause=cause)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 276, in reraise
                                            raise value.with_traceback(tb)
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
                                            cursor, statement, parameters, context
                                          File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 536, in do_execute
                                            cursor.execute(statement, parameters)
                                        sqlalchemy.exc.IntegrityError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "gnu_repo_pkey"
                                        DETAIL:  Key (uid)=(3dldf) already exists.
                                         [SQL: 'INSERT INTO gnu_repo (name, full_name, html_url, origin_url, origin_type, last_seen, task_id, uid, time_last_updated) VALUES (%(name)s, %(full_name)s, %(html_url)s, %(origin_url)s, %(origin_type)s, %(last_seen)s, %(task_id)s, %(uid)s, %(time_last_updated)s)'] [parameters: {'name': '3dldf', 'full_name': '3dldf', 'html_url': 'https://ftp.gnu.org/gnu/3dldf/', 'origin_url': 'https://ftp.gnu.org/gnu/3dldf/', 'origin_type': 'tar', 'last_seen': datetime.datetime(2019, 11, 14, 15, 21, 7, 260446), 'task_id': None, 'uid': '3dldf', 'time_last_updated': '2013-12-13T19:00:36+00:00'}] (Background on this error at: http://sqlalche.me/e/gkpj)
ardumont renamed this task from lister-debian: Failure during listing on a fresh new db (unicity constraint) to lister: Failure during listing on a fresh new db (unicity constraint). Nov 15 2019, 11:17 AM

My feeling is that, before we remove the db altogether, the lister should flush more often and handle unique-constraint violations more gracefully.
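
To make the "flush more often and handle the constraint failure" idea concrete, here is a minimal sketch. It is not the lister's code: it uses the standard-library sqlite3 module as a stand-in for the PostgreSQL/SQLAlchemy setup, and the `inject_packages` function and the reduced `package` schema are hypothetical. Each record is written in its own small transaction, so a duplicate is rejected on its own instead of failing a whole batch at a later flush.

```python
import sqlite3

# Reduced, hypothetical version of the "package" table: only the columns
# covered by the unique constraint seen in the traceback are kept.
SCHEMA = """
CREATE TABLE package (
    id INTEGER PRIMARY KEY,
    area_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    UNIQUE (area_id, name, version)
)
"""


def inject_packages(conn, packages):
    """Insert (area_id, name, version) tuples one at a time.

    Returns (inserted, skipped); a tuple lands in `skipped` when it
    violates the unique constraint.
    """
    inserted, skipped = [], []
    for pkg in packages:
        try:
            # "with conn" commits on success and rolls back on error, so a
            # duplicate only discards its own insert, not earlier ones.
            with conn:
                conn.execute(
                    "INSERT INTO package (area_id, name, version) "
                    "VALUES (?, ?, ?)",
                    pkg,
                )
            inserted.append(pkg)
        except sqlite3.IntegrityError:
            # Unique-constraint violation: record it and keep listing.
            skipped.append(pkg)
    return inserted, skipped


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
```

Calling `inject_packages(conn, [...])` with a list that contains the same `(area_id, name, version)` twice stores the first occurrence and reports the second as skipped. Whether the real lister should skip, update, or log such rows is a separate decision that this sketch does not settle.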

But first, the model needs fixing...
For gnu, the uid (primary key) should at least be the url: the name 3dldf, for example, exists in at least both the gnu and old-gnu entries, so using the name as uid cannot work.
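
The key collision can be reproduced in a few lines. This is only a sketch under assumptions: it uses the standard-library sqlite3 module instead of the real database, the `count_stored` helper and the reduced `gnu_repo` schema are hypothetical, and the old-gnu URL is assumed for illustration (only the gnu URL appears in the log above).

```python
import sqlite3

# Two listed entries sharing the name "3dldf".
ENTRIES = [
    ("3dldf", "https://ftp.gnu.org/gnu/3dldf/"),      # URL from the log
    ("3dldf", "https://ftp.gnu.org/old-gnu/3dldf/"),  # assumed URL
]


def count_stored(uid_of):
    """Insert ENTRIES using uid_of(name, url) as primary key.

    Returns the number of rows that end up in the table.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gnu_repo ("
        " uid TEXT PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " origin_url TEXT NOT NULL)"
    )
    for name, url in ENTRIES:
        try:
            with conn:  # commit on success, roll back on error
                conn.execute(
                    "INSERT INTO gnu_repo (uid, name, origin_url) "
                    "VALUES (?, ?, ?)",
                    (uid_of(name, url), name, url),
                )
        except sqlite3.IntegrityError:
            pass  # primary-key collision: the entry is lost
    (stored,) = conn.execute("SELECT COUNT(*) FROM gnu_repo").fetchone()
    conn.close()
    return stored
```

With `count_stored(lambda name, url: name)` the second entry collides with the first, as in the gnu_repo_pkey error above; with `count_stored(lambda name, url: url)` both entries are kept.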