Page MenuHomeSoftware Heritage

Empty partial visits when we fail to load due to externals property
Closed, MigratedEdits Locked

Description

When swh.loader.svn loads a repository which contains svn:externals properties, the loading interrupts itself, and a partial visit is registered in the archive.

However, the revisions that have been converted up to the one containing externals are not loaded, only the underlying contents and directories. The registered partial visit is empty.

Steps to reproduce on a local storage:

Use /srv/storage/space/mirrors/code.google.com/sources/v2/code.google.com/c/cpcsdk/cpcsdk-repo.svndump.gz

import logging
logging.basicConfig(level=logging.DEBUG)

from swh.loader.svn.tasks import MountAndLoadSvnRepositoryTsk

t = MountAndLoadSvnRepositoryTsk()
t.run(archive_path='cpcsdk-repo.svndump.gz', origin_url='http://cpcsdk.googlecode.com/svn/', visit_date='2016-05-03T15:16:32+00:00')

rev: 456, swhrev: 975f88a53cec719fe0f3041f618bfae7528ac2be, dir: 2dedb203719e482743d4bd6822ae65cde41cf607

The loading b0rks at revision 457 because of svn:externals; when done, note that the revision 975f88a53cec719fe0f3041f618bfae7528ac2be is not loaded to the archive, but the directory 2dedb203719e482743d4bd6822ae65cde41cf607 is.

Event Timeline

Ok, just so you know, i did not reproduce this behavior immediately (even though: swh-env updated, db rebuilt, configuration file 'almost' identical to prod).
I had another issue prior to the one described (about DentryPerms.directory).

The 'almost' identical configuration diverged only on the storage (which was local).

Here is the stacktrace:

$ python3
Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>>
>>> from swh.loader.svn.tasks import MountAndLoadSvnRepositoryTsk
>>>
>>> t = MountAndLoadSvnRepositoryTsk()
>>> t.run(archive_path='cpcsdk-repo.svndump.gz', origin_url='http://cpcsdk.googlecode.com/svn/', visit_date='2016-05-03T15:16:32+00:00')
INFO:swh.loader.svn.SvnLoader:Archive to mount and load cpcsdk-repo.svndump.gz
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Creating svn origin for http://cpcsdk.googlecode.com/svn/
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done creating svn origin for http://cpcsdk.googlecode.com/svn/
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Creating svn origin for http://cpcsdk.googlecode.com/svn/
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done creating svn origin for http://cpcsdk.googlecode.com/svn/
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Creating origin_visit for origin 1 at time 2016-05-03T15:16:32+00:00
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done Creating origin_visit for origin 1 at time 2016-05-03T15:16:32+00:00
INFO:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:[revision_start-revision_end]: [1-623]
INFO:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Processing {'swh-origin': 1, 'uuid': b'dd9cee65-5dde-4192-945e-cf0bec8a92d6', 'remote_url': 'file:///tmp/tmp.wmko7y37.swh.loader.svn', 'local_url': b'/tmp/tmp.yp2jitma.swh.loader/tmp.wmko7y37.swh.loader.svn'}.
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 1, swhrev: 8c00e01da274ded154d25b5b77d7e4dc077195d3, dir: 75ed58f260bfa4102d0e09657803511f5f0ab372
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 2, swhrev: e5b5466c376da890e2642dc42d6c6dd8807d2943, dir: d5a2863eb43c43292db4f804c320bc16a1f17375
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 3, swhrev: 936dc143f984c810b6234bff4717c6795a2d3c74, dir: 49a3946d60a6bd324f301ad8632c843fe4e09951
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 4, swhrev: 687abf57995e18c3df48132d8e82e5a8efd5df8a, dir: 1893c4ef234bc876840793f04dad13a6edead5c7
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 5, swhrev: 7cb9e8974a090f477de7528e4bbd818ff7f659b2, dir: e35aed32f4ad5d6ed8e28876373c198a531edc32
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 6, swhrev: e452dcd84af5530a484304373f2d0b5deb65cd69, dir: 4ee26a044b21470b477a18ac6a84fbb3eda31ea4
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 7, swhrev: 5345d0018ba076e8fe4ef2f3ed5b5fb5ad7f7ce3, dir: d143acafc843b621c7ccb7e6e6e41d59cbacb66a
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Sending 1195 contents
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done sending 1195 contents
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 8, swhrev: 584d9ff73981cfcab44f3f2e15c91fdf7984e92b, dir: 2862ce4c4ab79660c34a46bf3aa29b7bf639e96a
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 9, swhrev: c4d19fb3e11a454239f468996ef2a4e5bbe65b26, dir: a2555ad5cba6f8514140746a2f2048c5192134a8
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 10, swhrev: 4435a9c2715e45b5334b4a51f5a3c7c649818c28, dir: 70d694bc06cf3165ab5aeefe9f4e2e3e84165293
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 11, swhrev: 22a2cfe9259295f586e13202c6874bc486f25b70, dir: af4bf5e750480398c9031e94c0dd2429f402b36e
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 12, swhrev: 751ad5674d4cfa137b11e403facba6124974a19f, dir: 262185f44411173cb13987f81da51237fb20687f
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 13, swhrev: 22e63e5088103c83f79000bbb20954938aaa3ea0, dir: 0723781250956b12961c04a10433e6ab2baabc9f
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 14, swhrev: d54b1ceed255fe7075c8f6c60a2153dcaaff7365, dir: 2fae94a18650760281732e38437e7d1f2dc54244
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 15, swhrev: 5cb584f26d008661af665d0c1db7dabed6a085dc, dir: 4538ecb42a2315407da7a97bdc9f8308586e64b4
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 16, swhrev: 4c8e6de61330ef0b99d5007ceaf746d7db173a28, dir: ea09ae8d1c83eccd4ab54b4a68af5ac9b49fed64
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 17, swhrev: 4c0012e0cb85146d00ef49b6203248590ef1c4b1, dir: 42c3c256eb962bdb61c82820588c409d3f758701
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 18, swhrev: 9012cbdf7c6ed045e8587a8687e856f2380a9a1d, dir: 17586a9e6e6fc9f82723c43d4368135fa9d6c891
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 19, swhrev: d2f0cb2d5ec820c924625e9a754600c347887fc3, dir: 9c149031fa310991cf1a774e869e153ca8cd0f89
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 20, swhrev: 694e29b5c4b8acc01490fd8d906bb66eb9e3aca8, dir: ffd7e9c62d44cc69a36da387d9f3bd6bc9a616ff
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 21, swhrev: d69c680394270bb9e742ccc362c60516a08440ea, dir: 8890ca5f9cd9621b19ff0ffc25345f5385e839ac
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 22, swhrev: f9a54f0605677a247c59a1806dcfb82f82a01721, dir: ec4ee5e23f27483b7c9bae3683158cc5470ada37
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Sending 223 contents
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done sending 223 contents
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Sending 262 directories
ERROR:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Loading failure, updating to `partial` status
Traceback (most recent call last):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 627, in load
    self.store_data()
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 303, in store_data
    self.last_known_swh_revision)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 492, in process_repository
    svnrepo, revision_start, revision_end, revision_parents)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 232, in process_swh_revisions
    raise e
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 216, in process_swh_revisions
    self.config['revision_packet_size']):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-core/swh/core/utils.py", line 40, in grouper
    for _data in itertools.zip_longest(*args, fillvalue=None):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 165, in process_svn_revisions
    new_objects.get('directory', {}).values())
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 494, in maybe_load_directories
    self.bulk_send_directories(directories)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 417, in bulk_send_directories
    self.send_directories(self.directories.pop())
  File "/usr/lib/python3/dist-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/lib/python3/dist-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/usr/lib/python3/dist-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 276, in send_directories
    self.storage.directory_add(directory_list)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-storage/swh/storage/storage.py", line 427, in directory_add
    cur,
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-storage/swh/storage/db.py", line 183, in copy_to
    tblname, ', '.join(columns)), f)
psycopg2.DataError: invalid input syntax for integer: "DentryPerms.directory"
CONTEXT:  COPY tmp_directory_entry_dir, line 1, column perms: "DentryPerms.directory"

DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Updating origin_visit for origin 1 with status partial
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done updating origin_visit for origin 1 with status partial
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Clean up temp directory /tmp/tmp.wmko7y37.swh.loader.svn for project
DEBUG:amqp:Start from server, version: 0.9, properties: {'cluster_name': 'rabbit@corellia.lan', 'product': 'RabbitMQ','version': '3.6.6', 'platform': 'Erlang/OTP', 'capabilities': {'consumer_priorities': True, 'consumer_cancel_notify': True, 'publisher_confirms': True, 'exchange_exchange_bindings': True, 'direct_reply_to': True, 'connection.blocked': True, 'authentication_failure_close': True, 'basic.nack': True, 'per_consumer_qos': True}, 'information': 'Licensed underthe MPL.  See http://www.rabbitmq.com/', 'copyright': 'Copyright (C) 2007-2016 Pivotal Software, Inc.'}, mechanisms: ['AMQPLAIN', 'PLAIN'], locales: ['en_US']
DEBUG:amqp:Open OK!
DEBUG:amqp:using channel_id: 1
DEBUG:amqp:Channel open

Switching the configuration to use a remote storage, and now I reproduce.

Stacktrace of the reproduced error:

DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 456, swhrev: 975f88a53cec719fe0f3041f618bfae7528ac2be, dir:2dedb203719e482743d4bd6822ae65cde41cf607
ERROR:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Eventful partial visit. Detail: Property 'svn:externals' detected. Not implemented yet.
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:occ: {'branch': 'master', 'origin': 1, 'visit': 2, 'target': b'0\xd86S\x8d01f\x1fj\xf1\x08tW\xe5.f7\xf0\x0b', 'target_type': 'revision'}
ERROR:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Loading failure, updating to `partial` status
Traceback (most recent call last):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 216, in process_swh_revisions
    self.config['revision_packet_size']):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-core/swh/core/utils.py", line 40, in grouper
    for _data in itertools.zip_longest(*args, fillvalue=None):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 160, in process_svn_revisions
    for rev, nextrev, commit, new_objects, root_directory in gen_revs:
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/svn.py", line 266, in swh_hash_data_per_revision
    objects = self.swhreplay.compute_hashes(rev)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/ra.py", line 333, in compute_hashes
    self.replay(rev)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/ra.py", line 318, in replay
    self.conn.replay(rev, rev+1, self.editor)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/ra.py", line 225, in change_prop
    "Property '%s' detected. Not implemented yet." % key)
ValueError: Property 'svn:externals' detected. Not implemented yet.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 627, in load
    self.store_data()
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 303, in store_data
    self.last_known_swh_revision)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 492, in process_repository
    svnrepo, revision_start, revision_end, revision_parents)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 230, in process_swh_revisions
    'id': known_swh_rev['id'],
swh.loader.svn.loader.SvnLoaderEventful: Property 'svn:externals' detected. Not implemented yet.
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Updating origin_visit for origin 1 with status partial
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done updating origin_visit for origin 1 with status partial
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Sending 12 contents
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done sending 12 contents
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Sending 14 directories
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done sending 14 directories
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Sending 1 occurrences
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done sending 1 occurrences
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Clean up temp directory /tmp/tmp.5rgc3odd.swh.loader.svn for project
DEBUG:amqp:Start from server, version: 0.9, properties: {'platform': 'Erlang/OTP', 'product': 'RabbitMQ', 'version': '3.6.6', 'information': 'Licensed under the MPL.  See http://www.rabbitmq.com/', 'cluster_name': 'rabbit@corellia.lan', 'capabilities': {'authentication_failure_close': True, 'connection.blocked': True, 'consumer_priorities': True, 'basic.nack': True, 'exchange_exchange_bindings': True, 'direct_reply_to': True, 'consumer_cancel_notify': True, 'per_consumer_qos': True, 'publisher_confirms': True}, 'copyright': 'Copyright (C) 2007-2016 Pivotal Software, Inc.'}, mechanisms: ['AMQPLAIN', 'PLAIN'], locales: ['en_US']
DEBUG:amqp:Open OK!
DEBUG:amqp:using channel_id: 1
DEBUG:amqp:Channel open

An exception within an exception is raised...
And indeed, data are flushed but not the revisions.

Looking into it.

Well (remembering now), the fact that only 400 revisions is stored is normal (as in implemented that way).

To avoid having too much in memory, swh-loader-svn walks the history log and groups revision by block size (configured by yaml with key revision_packet_size, which for this test, is 100-revision blocks).
When an exception is raised, it's supposed to take the last revision seen (by blocks so here 400 as in 4th iteration of 100 revisions) to make it the next starting point (for the next visit).
Somehow the first time though, an exception is raised within that edge case scenario.

What's not normal here is that the first time we pass on that origin, another exception is raised (thus breaking what i've just described...).
Looking further into it.

Note:
The fact that the contents/directories are already sent is not problematic.
We will make links (as in revision targeting those directories) later when visiting again that origin (idempotency making it real).

Well, well, at that moment, the revision has no 'id' key yet...
Ok, fixing it.