
Revisions in the journal with out of range dates
Status: Closed, Migrated

Description

Five revisions in the journal have dates that are out of range for a Python datetime:

swh:1:rev:7b0cbe27c852efa8a40d76d4e5e95452df3dd17a -> seconds=1472213758000
swh:1:rev:fb21c344c6d2b875a50576dcf1991df0b05f62cd -> seconds=9223372036854775
swh:1:rev:23c3584bd6030c4f8b9ad9ce5419d218990b75a5 -> seconds=1472212421000
swh:1:rev:0000009707af34dadb4278f8321a112373f73d99 -> seconds=1390465279142
swh:1:rev:03e039cc75bdf4fe0ba50330ccb2fbad27135d5e -> seconds=9223372036854775807
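A quick way to confirm that these values really are unrepresentable: Python's datetime only covers years 1 through 9999, so converting any of them raises OverflowError. This is a minimal sketch. It assumes the journal timestamp is seconds since the Unix epoch plus a UTC offset in minutes, as in the objects further down. The helper names `to_datetime` and `is_representable` are hypothetical and do not come from swh.model.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_datetime(seconds, microseconds=0, offset_minutes=0):
    """Convert a journal-style timestamp and UTC offset (hypothetical helper)
    into an aware datetime. Raises OverflowError when the value falls outside
    datetime's supported range (years 1 through 9999)."""
    tz = timezone(timedelta(minutes=offset_minutes))
    return (EPOCH + timedelta(seconds=seconds, microseconds=microseconds)).astimezone(tz)


def is_representable(seconds, microseconds=0, offset_minutes=0):
    """Return False if the timestamp cannot be turned into a datetime."""
    try:
        to_datetime(seconds, microseconds, offset_minutes)
    except OverflowError:
        return False
    return True


# (seconds, offset in minutes) taken from the `date` field of each object.
SUSPECT = {
    "swh:1:rev:7b0cbe27c852efa8a40d76d4e5e95452df3dd17a": (1472213758000, -420),
    "swh:1:rev:fb21c344c6d2b875a50576dcf1991df0b05f62cd": (9223372036854775, 0),
    "swh:1:rev:23c3584bd6030c4f8b9ad9ce5419d218990b75a5": (1472212416000, -420),
    "swh:1:rev:0000009707af34dadb4278f8321a112373f73d99": (1390465279142, 0),
    "swh:1:rev:03e039cc75bdf4fe0ba50330ccb2fbad27135d5e": (9223372036854775807, 0),
}

for swhid, (seconds, offset) in SUSPECT.items():
    print(swhid, is_representable(seconds, offset_minutes=offset))
```

All five report False. The smaller values overflow when they are added to the epoch. The two largest already overflow when the timedelta is built.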

We need to:

  1. remove them from the journal
  2. potentially add a case to the journal object fixer so that these objects are handled when we encounter them, for compatibility purposes
  3. investigate consistency across the different databases (are these objects present in some but not all databases?)
  4. make sure that these are rejected in the future and can never be added to the journal again
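Items 1 and 4 both depend on recognizing these revisions. The sketch below checks them against the integer bounds of datetime, with no try/except needed. It assumes revisions are handled as dicts shaped like the exporter output below, where `date` and `committer_date` carry `timestamp.seconds`. The names `invalid_date_fields` and `split_revisions` are hypothetical and are not part of swh.model or swh.journal.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Whole-second bounds of a UTC datetime (0001-01-01 .. 9999-12-31T23:59:59).
# A non-zero UTC offset can narrow the usable range slightly at either end.
_MIN = datetime.min.replace(tzinfo=timezone.utc) - EPOCH
_MAX = datetime.max.replace(tzinfo=timezone.utc) - EPOCH
MIN_SECONDS = _MIN.days * 86400 + _MIN.seconds
MAX_SECONDS = _MAX.days * 86400 + _MAX.seconds


def invalid_date_fields(revision):
    """Return the names of date fields whose seconds fall outside the bounds."""
    bad = []
    for name in ("date", "committer_date"):
        value = revision.get(name)
        if value is None:
            continue
        seconds = value["timestamp"]["seconds"]
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            bad.append(name)
    return bad


def split_revisions(revisions):
    """Separate revisions that can be kept from those to drop or reject."""
    valid, rejected = [], []
    for rev in revisions:
        (rejected if invalid_date_fields(rev) else valid).append(rev)
    return valid, rejected
```

The same predicate could serve both purposes: filtering existing topics (item 1) and refusing new writes (item 4). Keeping a single check in one place would help prevent the two from drifting apart.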

Here are the full objects I found:

ERROR:root:Exporter ORCExporter: error while exporting the object: {'message': b'Another future commit.\n', 'author': {'fullname': b'\xd0(\x0b\xbfhA\x0b\x9b\\\xaa8^\xadO\x9f\x1d\x8e\x82\x02j\x8c\xd7Lw\xa4\xc2\x9ay:\xf8bR', 'name': None, 'email': None}, 'committer': {'fullname': b'\xd0(\x0b\xbfhA\x0b\x9b\\\xaa8^\xadO\x9f\x1d\x8e\x82\x02j\x8c\xd7Lw\xa4\xc2\x9ay:\xf8bR', 'name': None, 'email': None}, 'date': {'timestamp': {'seconds': 1472213758000, 'microseconds': 0}, 'offset': -420, 'negative_utc': False}, 'committer_date': {'timestamp': {'seconds': 1472213758000, 'microseconds': 0}, 'offset': -420, 'negative_utc': False}, 'type': 'git', 'directory': b'\xfb\xf9\xbeq\xab\x97b\xbd\xf4\xd5\x95\x9f\xf9\x159S\x8a\xb5\x06\x06', 'synthetic': False, 'metadata': None, 'parents': [b'#\xc3XK\xd6\x03\x0cO\x8b\x9a\xd9\xceT\x19\xd2\x18\x99\x0bu\xa5'], 'id': b"{\x0c\xbe'\xc8R\xef\xa8\xa4\rv\xd4\xe5\xe9TR\xdf=\xd1z", 'extra_headers': []}

ERROR:root:Exporter ORCExporter: error while exporting the object: {'message': b'future\n', 'author': {'fullname': b'\x1f\xfc\x04\xb1\xe7\xd7\xec\x00~\xa8\xa8\xcc\xa6\x0f\xb0gptP\x8b>\x19E!\xf1U/\x1c\t\xb6\xbdN', 'name': None, 'email': None}, 'committer': {'fullname': b'\x1f\xfc\x04\xb1\xe7\xd7\xec\x00~\xa8\xa8\xcc\xa6\x0f\xb0gptP\x8b>\x19E!\xf1U/\x1c\t\xb6\xbdN', 'name': None, 'email': None}, 'date': {'timestamp': {'seconds': 9223372036854775, 'microseconds': 0}, 'offset': 0, 'negative_utc': False}, 'committer_date': {'timestamp': {'seconds': 9223372036854775, 'microseconds': 0}, 'offset': 0, 'negative_utc': False}, 'type': 'git', 'directory': b'\xe4q\x8e\xce\xc4\xd8\x90d\xea=\xf1K\xe71\x11e\xb1\xfdY\x02', 'synthetic': False, 'metadata': None, 'parents': [b'\xd3\xc2E\xd3\xf4\xef`\xf3\xe44=K\xf9\xa3\xafs-\x81\xa1_'], 'id': b'\xfb!\xc3D\xc6\xd2\xb8u\xa5\x05v\xdc\xf1\x99\x1d\xf0\xb0_b\xcd', 'extra_headers': []}

ERROR:root:Exporter ORCExporter: error while exporting the object: {'message': b'Time travel! Spooky!\n', 'author': {'fullname': b'\xd0(\x0b\xbfhA\x0b\x9b\\\xaa8^\xadO\x9f\x1d\x8e\x82\x02j\x8c\xd7Lw\xa4\xc2\x9ay:\xf8bR', 'name': None, 'email': None}, 'committer': {'fullname': b'\xd0(\x0b\xbfhA\x0b\x9b\\\xaa8^\xadO\x9f\x1d\x8e\x82\x02j\x8c\xd7Lw\xa4\xc2\x9ay:\xf8bR', 'name': None, 'email': None}, 'date': {'timestamp': {'seconds': 1472212416000, 'microseconds': 0}, 'offset': -420, 'negative_utc': False}, 'committer_date': {'timestamp': {'seconds': 1472212421000, 'microseconds': 0}, 'offset': -420, 'negative_utc': False}, 'type': 'git', 'directory': b'\x93<\xa5\xc0\x97\x9f\xb6PuK\xb1\\?*\xf0\xb9\x88\xb6\xea\xbe', 'synthetic': False, 'metadata': None, 'parents': [b'K\xde\xfe\x8d\x8f\x95\x00\x94\xf1r\xc0|\xcf\rM<V\xf0?\xd6'], 'id': b'#\xc3XK\xd6\x03\x0cO\x8b\x9a\xd9\xceT\x19\xd2\x18\x99\x0bu\xa5', 'extra_headers': []}

ERROR:root:Exporter ORCExporter: error while exporting the object: {'message': b'Mined a Gitcoin!\nnonce 00092b0a', 'author': {'fullname': b'Q\xaa\xce\xdf\xc5\xb0r\x17\x8a\xbf\xce\x15\x97sh,\xc7:\x0c_\x85#\xb2(i\xab<\xf0\x02\xb9P\xa8', 'name': None, 'email': None}, 'committer': {'fullname': b'Q\xaa\xce\xdf\xc5\xb0r\x17\x8a\xbf\xce\x15\x97sh,\xc7:\x0c_\x85#\xb2(i\xab<\xf0\x02\xb9P\xa8', 'name': None, 'email': None}, 'date': {'timestamp': {'seconds': 1390465279142, 'microseconds': 0}, 'offset': 0, 'negative_utc': False}, 'committer_date': {'timestamp': {'seconds': 1390465279142, 'microseconds': 0}, 'offset': 0, 'negative_utc': False}, 'type': 'git', 'directory': b'\xee\x19\xe1\x85=F\xf45\xd0k\xb2\xcc\xc1\x9a\xfc>\xf5\xccG\xc9', 'synthetic': False, 'metadata': None, 'parents': [b'\x00\x00\x00@\xba\xb4\xb99B\x01\xf4\xa2k\x801\x93\xfd=\xfdV'], 'id': b'\x00\x00\x00\x97\x07\xaf4\xda\xdbBx\xf82\x1a\x11#s\xf7=\x99', 'extra_headers': []}

ERROR:root:Exporter ORCExporter: error while exporting the object: {'message': b'future-max\n', 'author': {'fullname': b'\x1f\xfc\x04\xb1\xe7\xd7\xec\x00~\xa8\xa8\xcc\xa6\x0f\xb0gptP\x8b>\x19E!\xf1U/\x1c\t\xb6\xbdN', 'name': None, 'email': None}, 'committer': {'fullname': b'\x1f\xfc\x04\xb1\xe7\xd7\xec\x00~\xa8\xa8\xcc\xa6\x0f\xb0gptP\x8b>\x19E!\xf1U/\x1c\t\xb6\xbdN', 'name': None, 'email': None}, 'date': {'timestamp': {'seconds': 9223372036854775807, 'microseconds': 0}, 'offset': 0, 'negative_utc': False}, 'committer_date': {'timestamp': {'seconds': 9223372036854775807, 'microseconds': 0}, 'offset': 0, 'negative_utc': False}, 'type': 'git', 'directory': b'\xf7\xa9~\x13\xe1\xf3\xe1S\x8b\x1a\xd0\x06\xb4\rtV\x8f\xc7w\x1e', 'synthetic': False, 'metadata': None, 'parents': [b"Z\xc1\x1f\r\x86\x19\xbay\x81x'\t\xbe{\x1fzf@\x95N"], 'id': b"\x03\xe09\xccu\xbd\xf4\xfe\x0b\xa5\x030\xcc\xb2\xfb\xad'\x13]^", 'extra_headers': []}

Unfortunately, I don't know their partition IDs.

Event Timeline

seirl triaged this task as Normal priority.Mar 24 2021, 1:13 PM
seirl created this task.
seirl updated the task description.

Note that none of their parent revisions can be found in the archive either. I suppose that one invalid revision in a batch of ingested revisions prevents any of them from being inserted into the database, but by that point they have already been written to Kafka.

I wonder whether we should have a MissingRevision or InvalidRevision object in the model to handle cases like this. It would keep a record of revision objects we already know about but are unable to ingest into our data model.
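A minimal sketch of what such a placeholder might look like. `InvalidRevision`, `from_revision` and `swhid` are all hypothetical names, not existing swh.model classes or methods. The sketch stores the revision id, a human-readable reason, and the raw object so that nothing is lost. Equality and hashing use only the id and reason, which lets known-bad revisions be kept in a set and skipped cheaply.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class InvalidRevision:
    """Hypothetical record of a revision we know exists but cannot represent."""

    id: bytes  # the revision's sha1_git, as in the journal objects
    reason: str  # e.g. "date out of range"
    # Raw journal dict, kept for later repair; excluded from eq/hash.
    raw: Dict[str, Any] = field(compare=False, repr=False)

    @classmethod
    def from_revision(cls, revision, reason):
        return cls(id=revision["id"], reason=reason, raw=dict(revision))

    def swhid(self):
        # Core SWHID for a revision: "swh:1:rev:" + hex-encoded sha1_git.
        return "swh:1:rev:" + self.id.hex()
```

For example, wrapping the `future-max` object above with `InvalidRevision.from_revision(obj, "date out of range")` yields `swh:1:rev:03e039cc75bdf4fe0ba50330ccb2fbad27135d5e` from `swhid()`.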