replayer: Crashes because of directories with duplicated entry names in journal
Closed, Migrated

Description

https://sentry.softwareheritage.org/organizations/swh/issues/104660/?referrer=phabricator_plugin

ValueError: swh:1:dir:96f959bba77e02a9161462a07a4e0f10c7fbfa3a has duplicated entry name: b'1'
(13 additional frame(s) were not displayed)
...
  File "swh/storage/replay.py", line 118, in convert
    obj = OBJECT_CONVERTERS[object_type](dict_repr)
  File "swh/model/model.py", line 1194, in from_dict
    return cls(
  File "<attrs generated init swh.model.model.Directory>", line 6, in __init__
    """
  File "attr/_make.py", line 2946, in __call__
    v(inst, attr, value)
  File "swh/model/model.py", line 1185, in check_entries
    raise ValueError(

Event Timeline

vlorentz triaged this task as Normal priority.
vlorentz added a subscriber: vlorentz.

New objects with duplicated entries are fixed by going through this method: https://docs.softwareheritage.org/devel/apidoc/swh.model.model.html#swh.model.model.Directory.from_possibly_duplicated_entries (currently only used by swh.storage.backfill). Old Kafka messages still contain duplicated entries, though, and those are what cause the crash above when the replayer converts them.
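
To make the failure mode concrete, here is a minimal, self-contained sketch of the two steps involved: detecting duplicated entry names (what the validator in the traceback rejects) and renaming the duplicates so the entry list becomes valid. It does not use swh.model; entries are plain dicts, the helper names `duplicated_names` and `dedup_entries` are hypothetical, and the renaming scheme (appending `_<n>`) is an assumption for illustration, not necessarily what `from_possibly_duplicated_entries` does.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for a directory entry; only the "name" key matters here.
Entry = Dict[str, Any]


def duplicated_names(entries: List[Entry]) -> List[bytes]:
    """Return the entry names that appear more than once, sorted."""
    counts = Counter(entry["name"] for entry in entries)
    return sorted(name for name, count in counts.items() if count > 1)


def dedup_entries(entries: List[Entry]) -> Tuple[bool, List[Entry]]:
    """Rename repeated entries so every name is unique.

    Returns (changed, new_entries). The first occurrence of a name is kept
    as-is; later occurrences get a numeric suffix (an assumed scheme).
    """
    taken = {entry["name"] for entry in entries}  # names that must not be reused
    seen = set()
    changed = False
    result = []
    for entry in entries:
        name = entry["name"]
        if name in seen:
            # Find the first suffixed name that does not collide with any
            # existing or previously generated name.
            suffix = 1
            new_name = name + b"_" + str(suffix).encode()
            while new_name in taken:
                suffix += 1
                new_name = name + b"_" + str(suffix).encode()
            taken.add(new_name)
            entry = {**entry, "name": new_name}  # copy, leave the input untouched
            name = new_name
            changed = True
        seen.add(name)
        result.append(entry)
    return changed, result


entries = [
    {"name": b"1", "type": "file"},
    {"name": b"1", "type": "dir"},
    {"name": b"1_1", "type": "file"},
]
changed, fixed = dedup_entries(entries)
```

With this input, the second `b'1'` cannot become `b'1_1'` because that name is already taken, so it is renamed to `b'1_2'`. A replayer could apply this kind of fix-up to old messages before building the model object, instead of letting the validator raise.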