Page MenuHomeSoftware Heritage

Gitorious import: Examine ingestion logs for errors and list them if any
Closed, ResolvedPublic

Description

When the ingestion is done, retrieve errors immediately from softwareheritage-log between the start and end date of the ingestion and reference them (e.g paste referenced from that thread or directly as comment here).

The errors should be addressable as per git repository.

Event Timeline

olasd renamed this task from Reference errors after ingestion to Gitorious import: Reference errors after ingestion.Feb 10 2017, 2:39 PM
ardumont renamed this task from Gitorious import: Reference errors after ingestion to Gitorious import: Examine ingestion logs for errors and list them if any.Feb 10 2017, 3:18 PM

After much learning on how to read and extract logs from our kibana instance, here is the error repartition.

"gitorious": {
  "total": 634,
  "errors": {
    "NotGitRepository('No git repository was found at /": 305,
    "IntegrityError('duplicate key value violates uniqu": 156,
    "FileNotFoundError(2, \"No usable temporary director": 121,
    "KeyError(b'b2tmp_obj_SfIbZp',)": 17,
    "TypeError(\"unsupported operand type(s) for //: 'No": 8,
    "KeyError(b'f2bd56289283900cd1dcf2c52b93952f98e4514": 5,
    "ObjectFormatException(ValueError(\"invalid literal ": 3,
    "ValueError('year is out of range',)": 3,
    "TransactionRollbackError('deadlock detected\\nDETAI": 3,
    "IsADirectoryError(21, 'Is a directory')": 2,
    "KeyError(b'5btmp_obj_D3f6nk',)": 2,
    "TypeError('ord() expected a character, but string ": 2,
    "StorageAPIError(ConnectionError(ProtocolError('Con": 2,
    "PermissionError(13, 'Permission denied')": 2,
    "ObjectFormatException(\"Unknown field b'Thu'\",)": 2,
    "OverflowError('timestamp out of range for platform": 1
  }

It's mostly either that:

  • we send something that was not a git repository.
  • integrity error (which is expected for now)
  • disk space problem

The other marginal errors sounds like implementation errors.
Those need more investigation.

Note that we have less errors than previously mentioned since i cover a lesser period (multiple schedulings took place).

source:

  • we send something that was not a git repository.
  • integrity error (which is expected for now)

These two are *shrug*, we can't do much about it (but we should have a list of those repos, for posterity).

  • disk space problem

The other marginal errors sounds like implementation errors.
Those need more investigation.

These should be rescheduled and driven to successful completion.

We have been doing "more investigation" for a very long time now. We really need closure closure here.

Note that we have less errors than previously mentioned since i cover a lesser period (multiple schedulings took place).

"yay" :-)

These should be rescheduled and driven to successful completion.

And now it has been queued (same goes for T675).
Putting aside the errors we cannot do much about (not a git repository, etc...)

We have been doing "more investigation" for a very long time now. We really need closure closure here.

I don't know if investigation was the right word so far.
We rescheduled (as in queued) massively since there have been infrastructure bumps along the way (db problems -> solved since, queue emptied, etc....).

Also, there were some me bump as well (foot accident and all) which delayed analysis :/

We really need closure closure here.

Yes, we do :)

(but we should have a list of those repos, for posterity).

On uffizi, :/srv/storage/space/lists/loader-git-disk/origin-with-errors.gitorious.tar.gz (respectively for googlecode)

For information, the last injection has been done. The remaining errors:

"gitorious": {
  "errors": {
    "KeyError(b'b2tmp_obj_SfIbZp',)": 10,
    "KeyError(b'f2bd56289283900cd1dcf2c52b93952f98e4514": 5,
    "TypeError(\"unsupported operand type(s) for //: 'No": 5,
    "ObjectFormatException(ValueError(\"invalid literal ": 2,
    "ValueError('year is out of range',)": 2,
    "StorageAPIError(ConnectionError(ProtocolError('Con": 1,
    "ObjectFormatException(\"Unknown field b'Thu'\",)": 1,
    "TypeError('ord() expected a character, but string ": 1,
    "KeyError(b'5btmp_obj_D3f6nk',)": 1,
    "IsADirectoryError(21, 'Is a directory')": 1
  },
  "total": 29
}

Note: P170 for more information on error.

Updated and scheduled the last 170 repositories.
Now, remains those to be checked for errors.

Only 2 errors left:

  • 1 about bad transaction in db
  • 1 about unicode error:

Details

{
  "took": 65,
  "timed_out": false,
  "_shards": {
    "total": 30,
    "successful": 30,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "logstash-2017.12.18",
        "_type": "journal",
        "_id": "AWBqQGrgl67zVNRhVsQ-",
        "_score": null,
        "_source": {
          "message": "[2017-12-18 15:31:33,690: ERROR/Worker-1] Loading failure, updating to `partial` status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 862, in load
    self.store_data()
  File "/usr/lib/python3/dist-packages/swh/loader/git/base.py", line 164, in store_data
    self.send_batch_revisions(self.get_revisions())
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 647, in send_batch_revisions
    send_in_packets(revisions, self.send_revisions, packet_size)
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 46, in send_in_packets
    sender(formatted_objects)
  File "/usr/lib/python3/dist-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/lib/python3/dist-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/usr/lib/python3/dist-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 416, in send_revisions
    self.storage.revision_add(revision_list)
  File "/usr/lib/python3/dist-packages/swh/storage/api/client.py", line 80, in revision_add
    return self.post('revision/add', {'revisions': revisions})
  File "/usr/lib/python3/dist-packages/swh/core/api.py", line 62, in post
    return self._decode_response(response)
  File "/usr/lib/python3/dist-packages/swh/core/api.py", line 91, in _decode_response
    raise pickle.loads(decode_response(response))
psycopg2.extensions.TransactionRollbackError: deadlock detected
DETAIL:  Process 44058 waits for ShareLock on transaction 1430906690; blocked by process 40853.
Process 40853 waits for ShareLock on transaction 1430906691; blocked by process 44058.
HINT:  See server log for query details.
CONTEXT:  while inserting index tuple (26995597,21) in relation "revision_pkey"
SQL statement "insert into revision (id, date, date_offset, date_neg_utc_offset, committer_date, committer_date_offset, committer_date_neg_utc_offset, type, directory, message, author, committer, metadata, synthetic)
    select t.id, t.date, t.date_offset, t.date_neg_utc_offset, t.committer_date, t.committer_date_offset, t.committer_date_neg_utc_offset, t.type, t.directory, t.message, a.id, c.id, t.metadata, t.synthetic
    from tmp_revision t
    left join person a on a.fullname = t.author_fullname
    left join person c on c.fullname = t.committer_fullname"
PL/pgSQL function swh_revision_add() line 5 at SQL statement
"
        },
        "sort": [
          1513611093695
        ]
      },
      {
        "_index": "logstash-2017.12.18",
        "_type": "journal",
        "_id": "AWBqRZr-l67zVNRhVxzM",
        "_score": null,
        "_source": {
          "message": "[2017-12-18 15:37:13,969: ERROR/Worker-3] Loading failure, updating to `partial` status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 862, in load
    self.store_data()
  File "/usr/lib/python3/dist-packages/swh/loader/git/base.py", line 168, in store_data
    self.send_batch_occurrences(self.get_occurrences())
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 659, in send_batch_occurrences
    send_in_packets(occurrences, self.send_occurrences, packet_size)
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 34, in send_in_packets
    for obj in objects:
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 214, in get_occurrences
    for refs, target in self.repo.refs.as_dict().items()
  File "/usr/lib/python3/dist-packages/dulwich/refs.py", line 164, in as_dict
    keys = self.keys(base)
  File "/usr/lib/python3/dist-packages/dulwich/refs.py", line 143, in keys
    return self.allkeys()
  File "/usr/lib/python3/dist-packages/dulwich/refs.py", line 470, in allkeys
    sys.getfilesystemencoding())
UnicodeEncodeError: 'utf-8' codec can't encode character '\udccd' in position 11: surrogates not allowed"
        },
        "sort": [
          1513611433974
        ]
      }
    ]
  }
}

The origins in question:

softwareheritage=> select o.id, o.url, o.type, ov.status from origin o inner join origin_visit ov on ov.origin=o.id where type='git' and url like 'https://gitorious.org%' and ov.visit = (select max(visit) from origin_visit where origin=ov.
origin) and ov.status='partial';
    id    |                             url                             | type | status
----------+-------------------------------------------------------------+------+---------
 58141917 | https://gitorious.org/test-project2009/test-project2009.git | git  | partial
 58227648 | https://gitorious.org/qt/tolszaks-qtsensors.git             | git  | partial
(2 rows)

The sql error was sheer bad luck, tested locally and no problem, so it was rescheduled, loaded successfully.

Now, only 1 origin to go, this is a bug in the underlying brick (dulwich) and T911 is opened for it.

softwareheritage=> select o.id, o.url, o.type, ov.status from origin o inner join origin_visit ov on ov.origin=o.id where type='git' and url like 'https://gitorious.org%' and ov.visit = (select max(visit) from origin_visit where origin=ov.origin) and ov.status='partial';
    id    |                             url                             | type | status
----------+-------------------------------------------------------------+------+---------
 58141917 | https://gitorious.org/test-project2009/test-project2009.git | git  | partial
(1 row)
ardumont claimed this task.

Last origin rescheduled and injected.

No more gitorious origin with status different than 'full'.

softwareheritage=> select o.id, o.url, o.type, ov.status from origin o inner join origin_visit ov on ov.origin=o.id where type='git' and url like 'https://gitorious.org%' and ov.visit = (select max(visit) from origin_visit where origin=ov.origin) and ov.status<>'full';
id | url | type | status
----+-----+------+--------
(0 rows)