googlecode import: Some dumps are just empty repository
Closed, Resolved, Public

Description

Unexpectedly, these svn repositories start their commit log range at revision 0.

dump: /srv/storage/space/mirrors/code.google.com/sources/v2/code.google.com/b/bbs-proj/bbs-proj-repo.svndump.gz

>>> dump = '2/bbs-proj-repo.svndump.gz'
>>> origin_url = 'http://%s.googlecode.com' % dump
>>>
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>>
>>> from swh.loader.svn.tasks import MountAndLoadSvnRepositoryTsk
>>>
>>> t = MountAndLoadSvnRepositoryTsk()
>>> t.run(archive_path=dump, origin_url=origin_url, visit_date='2016-05-03T15:16:32+00:00')
INFO:swh.loader.svn.SvnLoader:Archive to mount and load 2/bbs-proj-repo.svndump.gz
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Creating svn origin for http://2/bbs-proj-repo.svndump.gz.googlecode.com
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done creating svn origin for http://2/bbs-proj-repo.svndump.gz.googlecode.com
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Creating origin_visit for origin 1553 at time 2016-05-03T15:16:32+00:00
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done Creating origin_visit for origin 1553 at time 2016-05-03T15:16:32+00:00
INFO:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:[revision_start-revision_end]: [1-0]
INFO:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Processing {'remote_url': 'file:///tmp/swh.loader.svn.pttpj1p8.tmp/2', 'local_url': b'/tmp/swh.loader.svn.7zs3m9df.tmp/2', 'uuid': b'85271a8c-1f52-4d72-8b51-1d813f2a6efb', 'swh-origin': 1553}.
ERROR:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Loading failure, updating to `partial` status
Traceback (most recent call last):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 862, in load
    self.store_data()
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 475, in store_data
    start_from_scratch=self.start_from_scratch)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 293, in process_repository
    svnrepo, revision_start, revision_end, revision_parents)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 394, in process_swh_revisions
    raise e
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 373, in process_swh_revisions
    self.config['revision_packet_size']):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-core/swh/core/utils.py", line 40, in grouper
    for _data in itertools.zip_longest(*args, fillvalue=None):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 317, in process_svn_revisions
    for rev, nextrev, commit, new_objects, root_directory in gen_revs:
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/svn.py", line 234, in swh_hash_data_per_revision
    for commit in self.logs(start_revision, end_revision):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/svn.py", line 184, in logs
    discover_changed_paths=False):
subvertpy.SubversionException: ('No such revision 1', 160006)
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Updating origin_visit for origin 1553 with status partial
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done updating origin_visit for origin 1553 with status partial
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Clean up temp directory /tmp/swh.loader.svn.pttpj1p8.tmp for project 2
{'status': 'failed'}

Event Timeline

ardumont created this task. Feb 5 2018, 11:43 AM

This is more a case of an empty repository than of a repository whose commit range starts at 0.
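
For illustration, here is a minimal sketch of a guard that would avoid the `No such revision 1` error for the `[1-0]` range reported in the log. It is not the loader's actual code: `revision_range_is_empty`, `load_revisions` and the `fetch_logs` callable are hypothetical names standing in for the real log iteration.

```python
def revision_range_is_empty(revision_start, revision_end):
    """Return True when there is no revision to load.

    In the log above the range is [1-0]: loading would start at
    revision 1, but the repository head is still revision 0.
    """
    return revision_start > revision_end


def load_revisions(revision_start, revision_end, fetch_logs):
    """Yield log entries, skipping the fetch for an empty range.

    `fetch_logs` is a hypothetical callable (an assumption, not the
    loader's API); it is only called when the range is non-empty.
    """
    if revision_range_is_empty(revision_start, revision_end):
        return  # empty repository: nothing to fetch, nothing to fail on
    yield from fetch_logs(revision_start, revision_end)
```

With such a guard, an empty repository would produce no revisions instead of an exception. Which visit status to record in that case is a separate decision that this sketch does not cover.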

Another dump matching this case: /srv/storage/space/mirrors/code.google.com/sources/v2/code.google.com/f/flylinkdc-update/flylinkdc-update-repo.svndump.gz
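
To spot other dumps of this kind before loading them, one rough option is to scan the gzipped dump for its `Revision-number:` headers and check whether anything beyond revision 0 exists. The sketch below assumes the standard svn dump layout, where each revision record starts with such a header line. The helper names are hypothetical, and a versioned file whose content happens to contain a matching line could cause a false negative.

```python
import gzip
import re

# Header line that starts each revision record in an svn dump stream.
REVISION_HEADER = re.compile(rb"^Revision-number: (\d+)\s*$")


def highest_revision(dump_path):
    """Return the highest revision number seen in a gzipped svn dump,
    or None when no revision header is found at all."""
    highest = None
    with gzip.open(dump_path, "rb") as dump:
        for line in dump:
            match = REVISION_HEADER.match(line)
            if match:
                revision = int(match.group(1))
                if highest is None or revision > highest:
                    highest = revision
    return highest


def looks_like_empty_repository(dump_path):
    """Heuristic: a dump holding only revision 0 (or no revision
    header at all) has no commit to load."""
    highest = highest_revision(dump_path)
    return highest is None or highest == 0
```

This reads the whole dump line by line, so it is slow on large dumps, but it does not require mounting the repository.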

ardumont renamed this task from "googlecode import: Some dumps starts their log to revision 0" to "googlecode import: Some dumps are just empty repository". Feb 5 2018, 1:45 PM
ardumont changed the task status from Open to Work in Progress.
ardumont updated the task description.