In a freshly created SWH DB, with SQL_ASCII encoding and C ctype/collate, Git loading failed for me at the first revision ingestion like this:
2018-01-06 19:19:35,719 9439 Sending 100000 revisions
Exception in thread Thread-2417:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-storage/swh/storage/db.py", line 185, in writer
    tblname, ', '.join(columns)), f)
psycopg2.DataError: unsupported Unicode escape sequence
DETAIL: Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
CONTEXT: JSON data, line 1: {"extra_headers": [["mergetag",...
COPY tmp_revision, line 540, column metadata: "{"extra_headers": [["mergetag", "object 7333b5aca412d6ad02667b5a513485838a91b136\ntype commit\ntag p..."

2018-01-06 19:19:40,757 9439 Loading failure, updating to `partial` status
Traceback (most recent call last):
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 896, in load
    self.store_data()
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 1001, in store_data
    self.send_batch_revisions(self.get_revisions())
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 681, in send_batch_revisions
    send_in_packets(revisions, self.send_revisions, packet_size)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 42, in send_in_packets
    sender(formatted_objects)
  File "/usr/lib/python3/dist-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/lib/python3/dist-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/usr/lib/python3/dist-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 450, in send_revisions
    self.storage.revision_add(revision_list)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-storage/swh/storage/storage.py", line 550, in revision_add
    db.revision_add_from_temp(cur)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-storage/swh/storage/db.py", line 38, in _meth
    self._cursor(cur).execute('SELECT %s()' % stored_proc)
psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block
2018-01-06 19:19:40,766 9439 Updating origin_visit for origin 1 with status partial
2018-01-06 19:19:40,768 9439 Done updating origin_visit for origin 1 with status partial
{'status': 'failed'}
For comparison, the in-production DB has encoding UTF8 and C.UTF8 ctype/collate.
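
To illustrate the kind of input PostgreSQL is complaining about, here is a minimal Python sketch. It assumes (I have not verified this in swh-storage) that the metadata column is serialized with something like json.dumps and its default ensure_ascii=True, which turns every non-ASCII character into a \uXXXX escape. The helper has_non_ascii_escape and the sample metadata value are hypothetical, made up for this example only.

```python
import json
import re

# Matches a JSON \uXXXX escape, skipping cases where the backslash is itself
# escaped (an even run of backslashes before "\u" is plain text, not an escape).
_ESCAPE = re.compile(r'(?<!\\)(?:\\\\)*\\u([0-9a-fA-F]{4})')


def has_non_ascii_escape(json_text):
    """Return True if json_text contains a \\uXXXX escape above U+007F."""
    return any(int(m.group(1), 16) > 0x7F for m in _ESCAPE.finditer(json_text))


# Hypothetical metadata with one non-ASCII character (U+00EB).
metadata = {"extra_headers": [["mergetag", "tagger Zo\u00eb"]]}

# Default serialization escapes the character as \u00eb.
escaped = json.dumps(metadata)

# With ensure_ascii=False the character is emitted as-is, with no escape.
raw = json.dumps(metadata, ensure_ascii=False)

print(has_non_ascii_escape(escaped))
print(has_non_ascii_escape(raw))
```

This only shows where such escapes can come from; it does not claim that switching to ensure_ascii=False would be accepted by a SQL_ASCII database, which I have not tested.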
Do we actually require a UTF8-encoded DB or, at least, a non-ASCII one?
If so, I'd like to update sql/bin/db-init accordingly and document this requirement.