
Indexer - Retrieval error when contents is too big
Open, Normal, Public


When the content size is large, presumably above the 100*1024*1024-byte limit (the limit imposed on our loaders), the objstorage retrieval fails:

Oct 10 12:49:26 python3[15204]: [2017-10-10 12:49:26,600: INFO/Worker-1] sha1: b'\r5~\xe6\xb9\r\x86\nz\xb1\xa7S\x04\x03\xb3+\xbc\x97\x7f`'
Oct 10 12:51:06 python3[15204]: [2017-10-10 12:51:06,094: ERROR/Worker-1] Problem when reading contents metadata.
                                                      Traceback (most recent call last):
                                                        File "/usr/lib/python3/dist-packages/swh/indexer/", line 216, in run
                                                          raw_content = self.objstorage.get(sha1)
                                                        File "/usr/lib/python3/dist-packages/swh/objstorage/multiplexer/", line 134, in get
                                                          return storage.get(obj_id)
                                                        File "/usr/lib/python3/dist-packages/swh/objstorage/multiplexer/filter/", line 69, in get
                                                          return, *args, **kwargs)
                                                        File "/usr/lib/python3/dist-packages/swh/objstorage/multiplexer/", line 134, in get
                                                          return storage.get(obj_id)
                                                        File "/usr/lib/python3/dist-packages/swh/objstorage/multiplexer/filter/", line 56, in get
                                                          return*args, obj_id=obj_id, **kwargs)
                                                        File "/usr/lib/python3/dist-packages/swh/objstorage/cloud/", line 105, in get
                                                          return gzip.decompress(blob.content)
                                                        File "/usr/lib/python3.4/", line 632, in decompress
                                                        File "/usr/lib/python3.4/", line 360, in read
                                                          while self._read(readsize):
                                                        File "/usr/lib/python3.4/", line 454, in _read
                                                          self._add_read_data( uncompress )
                                                        File "/usr/lib/python3.4/", line 472, in _add_read_data
                                                          self.extrabuf = self.extrabuf[offset:] + data
Oct 10 12:51:06 python3[15204]: [2017-10-10 12:51:06,099: WARNING/Worker-1] Rescheduling batch
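
One way to avoid this failure would be to apply the same size limit on the indexer side, before retrieval. The sketch below is hypothetical: the names `safe_get`, `ContentTooLarge`, and the `length` parameter are illustrative, not part of the actual swh API.

```python
# Hypothetical guard (illustrative names, not the swh API): refuse to
# fetch a content whose known length exceeds the loader limit, instead
# of letting gzip.decompress blow up in memory.
MAX_CONTENT_SIZE = 100 * 1024 * 1024  # 100 MiB, the loader limit

class ContentTooLarge(Exception):
    """Raised instead of attempting an oversized retrieval."""

def safe_get(objstorage, sha1, length=None):
    """Fetch a content only if its known length fits under the limit."""
    if length is not None and length > MAX_CONTENT_SIZE:
        raise ContentTooLarge('%r: %d bytes' % (sha1, length))
    return objstorage.get(sha1)
```

With such a guard, oversized contents could be skipped (or flagged) instead of being rescheduled forever.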

Here, the failing hash is b'\r5~\xe6\xb9\r\x86\nz\xb1\xa7S\x04\x03\xb3+\xbc\x97\x7f`'.

Converting it to a readable hex form:

$ python3
>>> h = b'\r5~\xe6\xb9\r\x86\nz\xb1\xa7S\x04\x03\xb3+\xbc\x97\x7f`'
>>> from swh.model import hashutil
>>> hashutil.hash_to_hex(h)

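The output of the session above is not captured. A dependency-free equivalent using only the standard library (`binascii.hexlify`, available on the Python 3.4 seen in the traceback) should produce the same readable form that `hash_to_hex` returns for raw bytes:

```python
import binascii

# The failing hash, copied from the log above.
h = b'\r5~\xe6\xb9\r\x86\nz\xb1\xa7S\x04\x03\xb3+\xbc\x97\x7f`'

# hexlify yields the readable 40-character sha1.
print(binascii.hexlify(h).decode())  # 0d357ee6b90d860a7ab1a7530403b32bbc977f60
```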
Checking its length in the storage confirms that it is indeed a very large file:


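The storage query result is likewise not reproduced above. For reference, the loader limit quoted in the description works out as follows; any content whose stored length exceeds this value would hit the failing code path:

```python
# The loader limit from the task description, expressed in bytes.
limit = 100 * 1024 * 1024
print(limit)           # 104857600 bytes
print(limit // 2**20)  # 100 (MiB)
```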
Event Timeline

ardumont created this task. Oct 10 2017, 3:04 PM