Page MenuHomeSoftware Heritage

git origins: latest failure reports
Closed, MigratedEdits Locked

Description

Symmetrically to other loader reports, here is one on the current redundant failures we have on our git origins.
The period covered is from August to October 2018:

$ cat git-output-august-october-2018.txt | python3 -m group_by_exception --loader-type git --no-aggregate  | jq .
{
  "total": 1167689,
  "errors": {
    "FileNotFoundError(2, 'No such file or directory')": 648242,   // <- worker03 had a missing mount point, had the issue with mercurial loader
    "dulwich.errors.GitProtocolError: unexpected http resp 401 for": 403090,
    "psycopg2.extensions.QueryCanceledError: canceling statement due to statement timeout": 36628,
    "502 Bad Gateway": 34300,
    "gaierror(-3, 'Temporary failure in name resolution')": 17493,
    "consumer: Cannot connect to amqp": 6649,
    "pgbouncer cannot connect to server": 4811,
    "StorageAPIError(ConnectionError(MaxRetryError('None: Max retries exceeded with url: ": 3307,
    "Worker exited prematurely: signal 9": 2749,
    "MemoryError": 2164,
    "ValueError: invalid literal for int() with base 16: b''": 1898,
    "swh.objstorage.exc.ObjStorageAPIError: An unexpected error occurred in the api backend:": 1309,
    "OSError: Pack file too big for repository": 978,
    "TypeError: snapshot_add() got an unexpected keyword argument 'back_compat'": 890,
    "ConnectionResetError: [Errno 104] Connection reset by peer": 597,
    "socket.gaierror: [Errno -3] Temporary failure in name resolution": 563,
    "http.client.IncompleteRead:": 509,
    "dulwich.errors.NotGitRepository": 231,
    "BrokenPipeError: [Errno 32] Broken pipe": 208,
    "dulwich.errors.GitProtocolError: unexpected http resp 503": 168,
    "504 Gateway Time-out": 127,
    "dulwich.errors.ObjectFormatException: invalid literal for int() with base 10:": 115,
    "StorageDBError(OperationalError('could not connect to server: Connection refused": 113,
    "psycopg2.IntegrityError: duplicate key value violates unique constraint ": 102,
    "ConnectionResetError(104, 'Connection reset by peer')": 89,
    "psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block": 53,
    "psycopg2.extensions.TransactionRollbackError: deadlock detected": 39,
    "psycopg2.IntegrityError: null value in column ": 38,
    "SSL connection has been closed unexpectedly": 29,
    "socket.gaierror: [Errno -5] No address associated with hostname": 25,
    "psycopg2.DatabaseError: server conn crashed?": 22,
    "ValueError: invalid literal for int() with base 16:": 20,
    "OSError(ConnectionRefusedError(111, 'Connection refused')": 15,
    "ValueError: year is out of range": 14,
    "KeyError": 14,
    "OSError(timeout('timed out',),)": 12,
    "OSError(113, 'No route to host')": 12,
    "dulwich.errors.GitProtocolError: unexpected http resp 500": 11,
    "TimeoutError: [Errno 110] Connection timed out": 8,
    "Worker exited prematurely: signal 15 (SIGTERM).',)": 6,
    "socket.gaierror: [Errno -2] Name or service not known": 5,
    "dulwich.errors.GitProtocolError: unexpected http resp 502 for": 4,
    "dulwich.errors.HangupException: The remote server unexpectedly closed the connection.": 4,
    "TimeoutError(110, 'Connection timed out')": 4,
    "OSError: [Errno 113] No route to host": 3,
    "ValueError: Wrong content type `text/html` for API response": 3,
    "psycopg2.OperationalError: terminating connection due to administrator command": 2,
    "OverflowError: timestamp out of range for platform time_t": 2,
    "amqp.exceptions.NotFound: Basic.publish: (404) NOT_FOUND": 2,
    "CONNECTION_FORCED - broker forced connection closure with reason": 2,
    "OSError(OSError(113, 'No route to host'),)": 2,
    "Timed out waiting for UP message from": 1,
    "psycopg2.OperationalError: index row size": 1,
    "dulwich.errors.ObjectFormatException: Unknown field": 1,
    "dulwich.errors.GitProtocolError: unexpected http resp 429": 1,
    "gaierror(-2, 'Name or service not known')": 1,
    "OSError(5, 'Input/output error')": 1,
    "msgpack.exceptions.UnpackValueError: Unpack failed: error = 0": 1,
    "ValueError: not enough values to unpack": 1
  }
}

As usual, some are issues, some are not (e.g 401)

Associated to the error is the number of occurrences.
That will help to determine the next issue to fix.

This is a meta-task to open and track the associated issue if it is deemed one.

dashboard: http://kibana0.internal.softwareheritage.org:5601/app/kibana#/dashboard/22195930-d36e-11e8-913b-077937c6a5ef

Related T1246#23774
Related T1159#23772

Event Timeline

ardumont triaged this task as Normal priority.Oct 19 2018, 11:49 AM
ardumont created this task.
olasd added a subscriber: olasd.

Thanks for the analysis!

These look like bugs in the loader rather than the environment:

  • "StorageAPIError(ConnectionError(MaxRetryError('None: Max retries exceeded with url: ": 3305, (probably already fixed)
  • "psycopg2.extensions.QueryCanceledError: canceling statement due to statement timeout": 2125,
  • "ValueError: invalid literal for int() with base 16: b''": 426, (comes from the HTTP lib)
  • "dulwich.errors.ObjectFormatException: invalid literal for int() with base 10:": 30,
  • "psycopg2.IntegrityError: duplicate key value violates unique constraint ": 17,
  • "ValueError: year is out of range": 7,
  • "psycopg2.IntegrityError: null value in column ": 7,
  • "psycopg2.OperationalError: index row size": 1,
  • "OverflowError: timestamp out of range for platform time_t": 1,
  • "KeyError": 1,
  • "dulwich.errors.ObjectFormatException: Unknown field": 1,

the other ones look like environmental issues

"dulwich.errors.ObjectFormatException: invalid literal for int() with base 10:" is https://github.com/dulwich/dulwich/pull/664

Heads up, i amended the description.
I did it that way to not destroy the analysis thread (adding a bloated comment in the middle of it seemed wrong to me).

  • "FileNotFoundError(2, 'No such file or directory')": 648242,

For the git loader, that error is raised when trying to save the git pack.
It was an infra problem. worker03 had a missing mount point (/srv/storage/space....).
That one got fixed after rescheduling the mercurial loader origins (and see the errors).

All the errors of type psycopg2.IntegrityError: duplicate key value violates unique constraint "content_pkey" [1] are all about sha1 collisions, mainly from repositories
testing the attack uncovered by SHAttered [2]

[1] http://kibana0.internal.softwareheritage.org:5601/app/kibana#/dashboard/22195930-d36e-11e8-913b-077937c6a5ef?_g=(refreshInterval%3A(pause%3A!t%2Cvalue%3A0)%2Ctime%3A(from%3Anow-60d%2Cmode%3Aquick%2Cto%3Anow))

[2] https://shattered.io/

Errors of type ValueError: year is out of range [1] are related to commit dates that can not be represented using standard datetime.datetime Python object (minyear = 0, maxyear = 9999).
See for instance:

One solution would be to use time.struct_time [2] as an alternative to represent those weird dates as it does not restrict the years range.
But is it worth it regarding the few encountered cases ?

[1] http://kibana0.internal.softwareheritage.org:5601/app/kibana#/dashboard/22195930-d36e-11e8-913b-077937c6a5ef?_g=(refreshInterval%3A(pause%3A!t%2Cvalue%3A0)%2Ctime%3A(from%3Anow-60d%2Cmode%3Aquick%2Cto%3Anow))

[2] https://docs.python.org/3.5/library/time.html#time.struct_time

Errors of type dulwich.errors.NotGitRepository [1] are likely related to a bug in dulwich regarding redirected repository urls not correctly handled.
A pull request [2] has been submitted to fix that issue.

[1] http://kibana0.internal.softwareheritage.org:5601/app/kibana#/dashboard/e3fbbb60-1d9c-11e9-b8ce-cf95f437ce37?_g=h@a0fef55&_a=h@e10a772
[2] https://github.com/dulwich/dulwich/pull/680

New dashboards with latest errors as of 09/2019 [1]

(old dashboards seem to have disappeared)

[1] http://kibana.internal.softwareheritage.org:5601/app/kibana#/dashboard/AW2BLAA06Tb3WKn3xR0y

To ease the analysis, here is an aggregate of the 09/2019 latest failures:

$ ./kibana_fetch_logs.py > 09-2019-loader-git.txt
$ cat 09-2019-loader-git.txt| python3 -m group_by_exception --loader-type git --no-aggregate  | jq .
{
  "total": 207764,
  "errors": {
    "dulwich.errors.GitProtocolError: unexpected http resp 401 for": 86514,
    "502 Bad Gateway": 46759,
    "pgbouncer cannot connect to server": 24330,
    "ValueError: I/O operation on closed file.": 13706,
    "Unicode-objects must be encoded before hashing": 10619,
    "dulwich.errors.NotGitRepository": 9211,
    "ImportError: cannot import name 'RPCClient'": 5957,
    "bitbucket.org": 5594,
    "swh.storage.HashCollision": 3277,
    "psycopg2.pool.PoolError: connection pool exhausted": 971,
    "MemoryError": 112,
    "SSL connection has been closed unexpectedly": 91,
    "billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).": 90,
    "consumer: Cannot connect to amqp": 70,
    "psycopg2.extensions.QueryCanceledError: canceling statement due to statement timeout": 68,
    "ImportError: No module named 'azure'": 63,
    ".git/info/refs?service=git-upload-pack": 61,
    "OSError: Pack file too big for repository": 60,
    "git.yoctoproject.org": 33,
    "504 Gateway Time-out": 26,
    "psycopg2.OperationalError: terminating connection due to administrator command": 23,
    "dulwich.errors.ObjectFormatException: invalid literal for int() with base 10:": 19,
    "psycopg2.extensions.TransactionRollbackError: deadlock detected": 14,
    "celery.exceptions.WorkerShutdown": 14,
    "ValueError: not enough values to unpack": 10,
    "dulwich.errors.GitProtocolError: unexpected http resp 500": 8,
    "msgpack.exceptions.UnpackValueError: Unpack failed: error = 0": 7,
    "KeyError": 6,
    "AssertionError": 5,
    "dulwich.errors.ObjectFormatException: Unknown field": 4,
    "ValueError: invalid literal for int() with base 16: b''": 4,
    "column date_offset": 4,
    "operand type(s) for": 3,
    "does not match length prefix": 3,
    "Connection aborted": 3,
    "dulwich.errors.GitProtocolError: unexpected http resp 503": 3,
    "psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block": 2,
    "psycopg2.IntegrityError: null value in column ": 2,
    "socket.gaierror: [Errno -2] Name or service not known": 2,
    "dulwich.errors.GitProtocolError: unexpected http resp 502 for": 2,
    "dulwich.errors.HangupException: The remote server unexpectedly closed the connection.": 2,
    "swh.storage.exc.StorageAPIError: An unexpected error occurred in the api backend": 2,
    "Timed out waiting for UP message from": 2,
    "Not a gzipped file": 1,
    "NameError: name 'origin' is not defined": 1,
    "ce buffer size: 18446744071562067968 vs 2147483648": 1,
    "ce buffer size: 18446744072796632783 vs 3382048463": 1,
    "OSError: [Errno 113] No route to host": 1,
    "tension-visualeditor-mediawiki.git/git-upload-pack": 1,
    "ConnectionResetError: [Errno 104] Connection reset by peer": 1,
    "heritage/objects/2d/6f/b5/hex_obj_id.3af7qqmc.tmp'": 1
  }
}
`

To use in conjunction with the dashboard to find corresponding urls [1]

[1] http://kibana.internal.softwareheritage.org:5601/app/kibana#/dashboard/AW2BLAA06Tb3WKn3xR0y

Related P320
Related P536

olasd removed olasd as the assignee of this task.Sep 11 2020, 2:25 PM