
googlecode import: hglib.error.CommandError during loading
Closed, Migrated

Description

When creating a bundle, hglib raises a CommandError: Mercurial aborts because the revlog for one file is empty or missing.

archive: /srv/storage/space/mirrors/code.google.com/sources/v2/code.google.com/e/exposong/exposong-source-archive.zip

Stacktrace:

$ python3
Python 3.6.4 (default, Jan  5 2018, 02:13:53)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> rootpath = '/home/storage/hg/repo/latest'
>>> archive_name = 'exposong-source-archive.zip'
>>> origin_url = 'https://%s/googlecode/hg' % archive_name
>>>
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>>
>>> from swh.loader.mercurial.tasks import LoadArchiveMercurialTsk
>>>
>>> archive_path = '%s/%s' % (rootpath, archive_name)
>>> t = LoadArchiveMercurialTsk()
>>> t.run(origin_url=origin_url, archive_path=archive_path, visit_date='2016-05-03T15:16:32+00:00')
patool: Extracting /home/storage/hg/repo/latest/exposong-source-archive.zip ...
patool: running /usr/bin/7z x -y -o/tmp/swh.loader.mercurial.75hjki0o -- /home/storage/hg/repo/latest/exposong-source-archive.zip
patool: ... /home/storage/hg/repo/latest/exposong-source-archive.zip extracted to `/tmp/swh.loader.mercurial.75hjki0o'.
INFO:swh.scheduler.task.LoadArchiveMercurialTsk:From https://exposong-source-archive.zip/googlecode/hg - Uncompressing archive exposong-source-archive.zip at /tmp/swh.loader.mercurial.75hjki0o/exposong-source-archive
DEBUG:swh.scheduler.task.LoadArchiveMercurialTsk:Bundling at /tmp/swh.loader.mercurial.75hjki0o/exposong/HG20_none_bundle
DEBUG:amqp:Start from server, version: 0.9, properties: {'capabilities': {'publisher_confirms': True, 'exchange_exchange_bindings': True, 'basic.nack': True, 'consumer_cancel_notify': True, 'connection.blocked': True, 'consumer_priorities': True, 'authentication_failure_close': True, 'per_consumer_qos': True, 'direct_reply_to': True}, 'cluster_name': 'rabbit@corellia.lan', 'copyright': 'Copyright (C) 2007-2017 Pivotal Software, Inc.', 'information': 'Licensed under the MPL.  See http://www.rabbitmq.com/', 'platform': 'Erlang/OTP', 'product': 'RabbitMQ', 'version': '3.6.10'}, mechanisms: [b'PLAIN', b'AMQPLAIN'], locales: ['en_US']
DEBUG:amqp:using channel_id: 1
DEBUG:amqp:Channel open
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-scheduler/swh/scheduler/task.py", line 161, in run
    raise e from None
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-scheduler/swh/scheduler/task.py", line 158, in run
    result = self.run_task(*args, **kwargs)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-mercurial/swh/loader/mercurial/tasks.py", line 43, in run_task
    visit_date=visit_date)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 839, in load
    self.prepare(*args, **kwargs)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-mercurial/swh/loader/mercurial/bundle20_loader.py", line 442, in prepare
    super().prepare(origin_url, visit_date, directory=directory)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-mercurial/swh/loader/mercurial/bundle20_loader.py", line 118, in prepare
    type=b'none-v2')
  File "/usr/lib/python3/dist-packages/hglib/client.py", line 501, in bundle
    self.rawcommand(args, eh=eh)
  File "/usr/lib/python3/dist-packages/hglib/client.py", line 185, in rawcommand
    return eh(ret, out, err)
  File "/usr/lib/python3/dist-packages/hglib/util.py", line 163, in __call__
    raise error.CommandError(self.args, ret, out, err)
hglib.error.CommandError: (255, b'814 changesets found', b'abort: empty or missing revlog for data/sword/mods.d/kjv.conf')

Reproducibility (from the latest swh-environment, in a local environment):

import os
rootpath = '/home/storage/hg/repo/latest'  # well, you'd need to adapt this ;)
archive_name = 'exposong-source-archive.zip'
origin_url = 'https://%s/googlecode/hg' % archive_name

import logging
logging.basicConfig(level=logging.DEBUG)

from swh.loader.mercurial.tasks import LoadArchiveMercurialTsk

archive_path = os.path.join(rootpath, archive_name)
t = LoadArchiveMercurialTsk()
t.run(origin_url=origin_url, archive_path=archive_path, visit_date='2016-05-03T15:16:32+00:00')

Event Timeline

ardumont renamed this task from import googlecode: hglib.error.CommandError during loading to googlecode import: hglib.error.CommandError during loading. Feb 13 2018, 12:21 PM

Basic checks on the archive are fine:

  • the archive integrity check passes
  • the archive uncompresses without error
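
For reference, a basic archive check of this kind can be sketched with Python's standard zipfile module. This is only an illustration: the checks above may well have been done with another tool, and the helper name check_archive is hypothetical.

```python
import zipfile


def check_archive(path):
    """Return (ok, detail) for a zip archive (hypothetical helper)."""
    if not zipfile.is_zipfile(path):
        return False, "not a zip file"
    with zipfile.ZipFile(path) as archive:
        # testzip() reads every member and returns the name of the first
        # one with a bad CRC or header, or None when all members are fine.
        bad_member = archive.testzip()
    if bad_member is not None:
        return False, "corrupt member: %s" % bad_member
    return True, "ok"
```

Note that a passing check only says the zip container is sound. It says nothing about the Mercurial repository stored inside it.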

After uncompressing the archive, I checked the hg repository.
hg verify reports integrity errors on the same file, starting at changeset 761:

$ pwd
/home/storage/hg/repo/latest/exposong
$ hg verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
 warning: revlog 'data/data/sword/mods.d/kjv.conf.i' not in fncache!
 761: empty or missing data/sword/mods.d/kjv.conf
 data/sword/mods.d/kjv.conf@761: manifest refers to unknown revision 4c0f370237f8
347 files, 814 changesets, 2283 total revisions
1 warnings encountered!
hint: run "hg debugrebuildfncache" to recover from corrupt fncache
2 integrity errors encountered!
(first damaged changeset appears to be 761)
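
If this check had to be run over many archives, the summary lines of hg verify could be parsed roughly as follows. This is a sketch only: the helper name summarize_verify is hypothetical, and the patterns assume the English messages shown above, which may differ across Mercurial versions and locales.

```python
import re


def summarize_verify(output):
    """Extract counts and the first damaged changeset from `hg verify` text."""
    summary = {"warnings": 0, "integrity_errors": 0, "first_damaged": None}
    # Patterns are modelled on the output above; they are an assumption,
    # not a documented, stable format.
    patterns = {
        "warnings": r"^(\d+) warnings? encountered!",
        "integrity_errors": r"^(\d+) integrity errors? encountered!",
        "first_damaged": r"first damaged changeset appears to be (\d+)",
    }
    for key, pattern in patterns.items():
        match = re.search(pattern, output, flags=re.MULTILINE)
        if match:
            summary[key] = int(match.group(1))
    return summary
```

A non-zero integrity_errors count would be enough to flag the repository as corrupted before attempting to bundle it.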

Doing what the tool is hinting at...:

$ hg debugrebuildfncache
removing data/data/sword/mods.d/kjv.conf.i
0 items added, 1 removed from fncache

... is not enough; it does not solve the issue.

Trying again to ingest the repository (as a local repository this time, rather than from the archive).
The same error occurs:

$ python3
Python 3.6.4 (default, Jan  5 2018, 02:13:53)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> repo = 'exposong'
>>>
>>> import os
>>> directory = '/home/storage/hg/repo/latest/%s' % repo
>>> origin_url = 'https://%s/googlecode/local/hg/' % repo
>>>
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>>
>>> from swh.loader.mercurial.tasks import LoadMercurialTsk
>>>
>>> t = LoadMercurialTsk()
>>> t.run(origin_url=origin_url, directory=directory, visit_date='2016-05-03T15:16:32+00:00')
DEBUG:swh.scheduler.task.LoadMercurialTsk:Bundling at /home/storage/hg/repo/latest/exposong/HG20_none_bundle
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-scheduler/swh/scheduler/task.py", line 161, in run
    raise e from None
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-scheduler/swh/scheduler/task.py", line 158, in run
    result = self.run_task(*args, **kwargs)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-mercurial/swh/loader/mercurial/tasks.py", line 27, in run_task
    visit_date=visit_date)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 839, in load
    self.prepare(*args, **kwargs)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-mercurial/swh/loader/mercurial/bundle20_loader.py", line 118, in prepare
    type=b'none-v2')
  File "/usr/lib/python3/dist-packages/hglib/client.py", line 501, in bundle
    self.rawcommand(args, eh=eh)
  File "/usr/lib/python3/dist-packages/hglib/client.py", line 185, in rawcommand
    return eh(ret, out, err)
  File "/usr/lib/python3/dist-packages/hglib/util.py", line 163, in __call__
    raise error.CommandError(self.args, ret, out, err)
hglib.error.CommandError: (255, b'814 changesets found', b'abort: empty or missing revlog for data/sword/mods.d/kjv.conf')

That did not help either.
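
Both tracebacks end with the same abort message on stderr, so a loader could recognize this failure instead of crashing on it. Below is a minimal sketch, not what the loader currently does: the helper name missing_revlog_path is hypothetical, and the usage comment assumes the hglib exception exposes stderr as its err attribute, as the tuple in the traceback suggests.

```python
import re

# Matches the abort message seen in the tracebacks above.
MISSING_REVLOG = re.compile(rb"abort: empty or missing revlog for (?P<path>.+)")


def missing_revlog_path(stderr):
    """Return the file path named in a 'missing revlog' abort, or None."""
    match = MISSING_REVLOG.search(stderr)
    if match is None:
        return None
    return match.group("path").strip().decode("utf-8", "replace")


# Possible use around the bundle call (assumption, not executed here):
#
#     try:
#         client.bundle(...)
#     except hglib.error.CommandError as e:
#         path = missing_revlog_path(e.err)
#         if path is None:
#             raise
#         # otherwise: record the origin as corrupted and skip it
```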

Conclusion: the repository is corrupted.
Repairing it would probably mean digging into https://www.mercurial-scm.org/wiki/RepositoryCorruption, which hints at reusing other Mercurial clones to repair it, and we don't have any.

The git repository at https://github.com/exposong/exposong looks very much like a mirror.
A local clone and a recursive diff confirm this:

$ git clone https://github.com/exposong/exposong exposong-git
Cloning into 'exposong-git'...
remote: Counting objects: 6219, done.
remote: Compressing objects: 100% (1580/1580), done.
remote: Total 6219 (delta 4041), reused 6186 (delta 4041), pack-reused 0
Receiving objects: 100% (6219/6219), 15.11 MiB | 17.00 KiB/s, done.
Resolving deltas: 100% (4041/4041), done.
$ diff -r -x .git -x .hg exposong exposong-git
Only in exposong: .hgtags
Only in exposong-git: readme.md
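
The same comparison can be sketched in Python with the standard filecmp module, for instance to run it over several candidate mirrors. The helper name tree_differences is hypothetical, and the default ignore list simply mirrors the -x options used above.

```python
import filecmp
import os


def tree_differences(left, right, ignore=(".git", ".hg")):
    """Recursively list entries present on one side only or reported as different."""
    differences = []

    def walk(comparison, prefix):
        for name in comparison.left_only:
            differences.append(("only in left", os.path.join(prefix, name)))
        for name in comparison.right_only:
            differences.append(("only in right", os.path.join(prefix, name)))
        # dircmp compares files shallowly: identical type, size and mtime
        # count as equal; otherwise the contents are compared.
        for name in comparison.diff_files:
            differences.append(("differs", os.path.join(prefix, name)))
        for name, sub in comparison.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right, ignore=list(ignore)), "")
    return sorted(differences)
```

Called as tree_differences('exposong', 'exposong-git'), it should report the same kind of one-sided entries as the diff above. Unlike diff -r, it can miss a content change when size and modification time are identical.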

I am closing this, since another mirror exists that we have already browsed multiple times.

Also, nothing is set in stone; we can always reopen this if another instance occurs.