googlecode import: Loading failure on symbolic link edge cases
Closed, Migrated

Description

Loading some repositories fails because of symbolic link edge cases.
Here, the case is a symbolic link with the executable flag set.
It appears that in this case, the properties must be set not on the symlink but on its source.

dump: /srv/storage/space/mirrors/code.google.com/sources/v2/code.google.com/s/skal/skal-repo.svndump.gz

$ python3
>>> repo = 'skal-repo'
>>> origin_url = 'http://%s.googlecode.com' % repo
>>>
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>>
>>> from swh.loader.svn.tasks import LoadSWHSvnRepositoryTsk
>>>
>>> t = LoadSWHSvnRepositoryTsk()
>>> t.run(svn_url='file:///home/storage/svn/repo/latest/%s' % repo,
...       destination_path='/tmp',
...       origin_url=origin_url, visit_date='2016-05-03T15:16:32+00:00',
...       start_from_scratch=True)
DEBUG:swh.scheduler.task.LoadSWHSvnRepositoryTsk:Creating svn origin for http://skal-repo.googlecode.com
DEBUG:swh.scheduler.task.LoadSWHSvnRepositoryTsk:Done creating svn origin for http://skal-repo.googlecode.com
DEBUG:swh.scheduler.task.LoadSWHSvnRepositoryTsk:Creating origin_visit for origin 1684 at time 2016-05-03T15:16:32+00:00
DEBUG:swh.scheduler.task.LoadSWHSvnRepositoryTsk:Done Creating origin_visit for origin 1684 at time 2016-05-03T15:16:32+00:00
INFO:swh.scheduler.task.LoadSWHSvnRepositoryTsk:Processing revisions [1-103] for {'swh-origin': 1684, 'remote_url': 'file:///home/storage/svn/repo/latest/skal-repo', 'local_url': b'/tmp/swh.loader.svn.1kb89qaj.tmp/skal-repo', 'uuid': b'2e175d46-24b1-41b6-a7ad-ac97b4d0c617'}
...
DEBUG:swh.scheduler.task.LoadSWHSvnRepositoryTsk:rev: 41, swhrev: f15e2f3f20705b7a0b98701c4839567c118aaa6d, dir: d252483f1521386e85d11346024b69a79ef44161
DEBUG:swh.scheduler.task.LoadSWHSvnRepositoryTsk:rev: 42, swhrev: 8ec6eb017fbcdb291faab4f42a971a9a836e7020, dir: 92da2f8ce6fd67fb065e74c5a7ea567d39186d69
DEBUG:swh.scheduler.task.LoadSWHSvnRepositoryTsk:snapshot: {'id': b'Q\x9d\x80-1\xa4\xcdc\xbd\xd6\xedhU$\xa1\x00oO\\\xc7', 'branches': {b'master': {'target': b'\xc5\xbcF\xb7\xa0\x9e\x90\xc5r5\x9d\xa8\x05\xb8mk\xc8\x7f\x11O', 'target_type': 'revision'}}}
ERROR:swh.scheduler.task.LoadSWHSvnRepositoryTsk:Loading failure, updating to `partial` status
Traceback (most recent call last):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 862, in load
    self.store_data()
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 476, in store_data
    start_from_scratch=self.start_from_scratch)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 295, in process_repository
    svnrepo, revision_start, revision_end, revision_parents)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 395, in process_swh_revisions
    raise e
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 375, in process_swh_revisions
    self.config['revision_packet_size']):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-core/swh/core/utils.py", line 40, in grouper
    for _data in itertools.zip_longest(*args, fillvalue=None):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 319, in process_svn_revisions
    for rev, nextrev, commit, new_objects, root_directory in gen_revs:
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/svn.py", line 232, in swh_hash_data_per_revision
    objects = self.swhreplay.compute_hashes(rev)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/ra.py", line 374, in compute_hashes
    self.replay(rev)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/ra.py", line 359, in replay
    self.conn.replay(rev, rev+1, self.editor)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/ra.py", line 175, in close
    os.chmod(self.fullpath, 0o755)
FileNotFoundError: [Errno 2] No such file or directory: b'/tmp/swh.loader.svn.1kb89qaj.tmp/skal-repo/trunk/script/users'
DEBUG:swh.scheduler.task.LoadSWHSvnRepositoryTsk:Updating origin_visit for origin 1684 with status partial
DEBUG:swh.scheduler.task.LoadSWHSvnRepositoryTsk:Done updating origin_visit for origin 1684 with status partial
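
The traceback ends in os.chmod raising FileNotFoundError on a path that the description identifies as a symbolic link. One plausible reading, not confirmed in this task, is that the link's target did not exist at that point: os.chmod follows symlinks by default, so it fails on a dangling link. The sketch below reproduces that behaviour on a POSIX system and shows one possible guard. The helper set_executable and the file names are hypothetical and are not taken from swh-loader-svn.

```python
import os
import tempfile


def set_executable(path):
    """Set mode 0o755 on path unless it is a symbolic link.

    Hypothetical helper: returns True if chmod was applied,
    False if the path was skipped because it is a symlink.
    """
    if os.path.islink(path):
        # os.chmod follows symlinks by default, so a dangling link
        # would raise FileNotFoundError; this sketch skips links.
        return False
    os.chmod(path, 0o755)
    return True


def demo():
    """Create a dangling symlink and compare both behaviours."""
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "users")
        # The target is deliberately never created.
        os.symlink(os.path.join(tmp, "missing-target"), link)
        try:
            os.chmod(link, 0o755)
            raised = False
        except FileNotFoundError:
            raised = True
        skipped = not set_executable(link)
        return raised, skipped


if __name__ == "__main__":
    print(demo())
```

Whether skipping the link is the right behaviour for the loader is a separate question; the guard only illustrates why a plain chmod on a symlink path can fail.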

Event Timeline

It appears that in this case, the properties must be set not on the symlink but on its source.

Fortunately, no.
I checked out the wrong commit without noticing, so I was not comparing the right commits.