
Gitorious import: Release time conversion issue when no release date is provided
Closed, Resolved (Public)

Description

On some on-disk repositories, loading fails because of a time conversion issue.

Steps to reproduce, on a local storage with the latest swh-loader-git:

Use /srv/storage/space/mirrors/gitorious.org/mnt/repositories/nmbscan/nmbscan.git:

repo = 'nmbscan.git'
origin_url = 'http://foo/bar/git'

import logging
logging.basicConfig(level=logging.DEBUG)

from swh.loader.git.tasks import LoadDiskGitRepository

t = LoadDiskGitRepository()
t.run(origin_url=origin_url, directory=repo, date='2016-05-03T15:16:32+00:00')

Output:

DEBUG:swh.scheduler.task.LoadDiskGitRepository:Creating git origin for http://nmbscan.git
DEBUG:swh.scheduler.task.LoadDiskGitRepository:Done creating git origin for http://nmbscan.git
DEBUG:swh.scheduler.task.LoadDiskGitRepository:Sending 50 contents
DEBUG:swh.scheduler.task.LoadDiskGitRepository:Done sending 50 contents
DEBUG:swh.scheduler.task.LoadDiskGitRepository:Sending 42 directories
DEBUG:swh.scheduler.task.LoadDiskGitRepository:Done sending 42 directories
DEBUG:swh.scheduler.task.LoadDiskGitRepository:Sending 37 revisions
DEBUG:swh.scheduler.task.LoadDiskGitRepository:Done sending 37 revisions
Traceback (most recent call last):
  File "./load-git-disk.py", line 20, in <module>
    main()
  File "/usr/lib/python3/dist-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "./load-git-disk.py", line 17, in main
    t.run(origin_url=origin_url, directory=repo, date='2016-05-03T15:16:32+00:00')
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-scheduler/swh/scheduler/task.py", line 35, in run
    raise e from None
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-scheduler/swh/scheduler/task.py", line 32, in run
    result = self.run_task(*args, **kwargs)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-git/swh/loader/git/tasks.py", line 39, in run_task
    return loader.load(origin_url, directory, dateutil.parser.parse(date))
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-git/swh/loader/git/base.py", line 437, in load
    self.send_all_releases(self.get_releases())
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-git/swh/loader/git/base.py", line 377, in send_all_releases
    send_in_packets(releases, self.send_releases, packet_size)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-git/swh/loader/git/base.py", line 28, in send_in_packets
    for obj in objects:
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-git/swh/loader/git/loader.py", line 128, in get_releases
    self.repo[hashutil.hash_to_bytehex(oid)], log=self.log)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-git/swh/loader/git/converters.py", line 224, in dulwich_tag_to_release
    tag._tag_timezone_neg_utc,
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-git/swh/loader/git/converters.py", line 143, in dulwich_tsinfo_to_timestamp
    'offset': timezone // 60,
TypeError: unsupported operand type(s) for //: 'NoneType' and 'int'

Note: load-git-disk.py is a wrapper around the scenario described (cf. P185)

Event Timeline

In that particular repository, the tag has no time: tag.tag_time and tag.tag_timezone are None and tag._tag_timezone_neg_utc is False (those are the default values for that object).
However, the swh-loader-git code expects those values to be set.
In our data model, though, it is fine for that date to be missing.

So I think the fix here is simply to check for the time and, if it is not present, set the release date to None (similarly to the 'no author' case).
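A minimal sketch of that check, for illustration only. The function name tsinfo_to_timestamp, the returned dict keys other than 'offset', and the SimpleNamespace stand-in for a dulwich tag are assumptions and not the actual swh-loader-git code; only the `timezone // 60` expression comes from the traceback above.

```python
from types import SimpleNamespace


def tsinfo_to_timestamp(timestamp, timezone, timezone_neg_utc):
    """Convert dulwich-style time fields to a dict, or None if no time is set.

    Hypothetical helper: it mirrors the shape of the failing converter
    but is not the real swh-loader-git implementation.
    """
    if timestamp is None or timezone is None:
        # Tag without a date: store no date at all, as for the 'no author' case.
        return None
    return {
        'timestamp': timestamp,
        'offset': timezone // 60,  # dulwich gives seconds, we keep minutes
        'negative_utc': timezone_neg_utc,
    }


# Stand-in for a tag object carrying the default values described above.
tag = SimpleNamespace(tag_time=None, tag_timezone=None,
                      _tag_timezone_neg_utc=False)

date = tsinfo_to_timestamp(tag.tag_time, tag.tag_timezone,
                           tag._tag_timezone_neg_utc)
print(date)
```

With the guard in place, a tag that has no time yields a None date instead of raising the TypeError, while tags that do carry a time are converted as before.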

ardumont renamed this task from Gitorious import: Time conversion issue to Gitorious import: Release time conversion issue when none is provided.Oct 26 2017, 11:53 AM
ardumont renamed this task from Gitorious import: Release time conversion issue when none is provided to Gitorious import: Release time conversion issue when no release date is provided.Oct 26 2017, 1:09 PM