
indexer/extrinsic_metadata: ParseError: syntax error: line 1, column 0
Closed, Migrated

Description

https://sentry.softwareheritage.org/organizations/swh/issues/104852/?referrer=phabricator_plugin

ParseError: syntax error: line 1, column 0
(12 additional frame(s) were not displayed)
...
  File "swh/indexer/cli.py", line 373, in worker_fn
    fn(objects)
  File "swh/indexer/metadata.py", line 85, in process_journal_objects
    results[remd.target] = self.index(remd.id, data=remd)
  File "swh/indexer/metadata.py", line 127, in index
    metadata_item = mapping.translate(data.metadata)
  File "swh/indexer/metadata_dictionary/codemeta.py", line 111, in translate
    root = ET.fromstring(content)
  File "xml/etree/ElementTree.py", line 1345, in XML
    return parser.close()

Event Timeline

ardumont triaged this task as Normal priority. Oct 21 2022, 6:13 PM

This happens on staging. It dates back to the time we accidentally created N one-byte documents instead of one N-byte document, so each of those documents fails to parse as XML.
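
To illustrate the failure mode, here is a minimal sketch using only the standard library. It assumes the metadata is an XML document passed to `ET.fromstring`, as in the traceback above; the sample document and the helpers `split_into_bytes` and `try_parse` are made up for illustration and are not part of swh-indexer.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def split_into_bytes(document: bytes) -> List[bytes]:
    """Hypothetical helper: mimic the bug by cutting one N-byte document
    into N one-byte documents."""
    return [document[i:i + 1] for i in range(len(document))]


def try_parse(content: bytes) -> Optional[ET.Element]:
    """Return the root element, or None if the content is not well-formed XML."""
    try:
        return ET.fromstring(content)
    except ET.ParseError:
        return None


# Made-up sample document, standing in for a real metadata payload.
document = b"<entry><name>example</name></entry>"

whole = try_parse(document)            # parses fine as a single document
fragments = split_into_bytes(document)
parsed = [try_parse(fragment) for fragment in fragments]

print(whole.tag if whole is not None else "unparseable")
print(sum(1 for root in parsed if root is None), "of", len(fragments), "fragments failed")
```

No single byte can be a well-formed XML document, so every fragment raises `ParseError`, while the original document parses normally.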

I'll write a script to purge those from the journal and storage.
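
A rough sketch of how such a purge could select its targets, assuming the bad documents can be recognised as one-byte payloads that do not parse as XML. The `Record` type and `select_purge_candidates` are hypothetical stand-ins; the actual journal and storage access is not shown, and this is not the script that was eventually run.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a stored metadata object."""
    id: str
    metadata: bytes


def is_parseable_xml(content: bytes) -> bool:
    """True if the content is a well-formed XML document."""
    try:
        ET.fromstring(content)
    except ET.ParseError:
        return False
    return True


def select_purge_candidates(records: Iterable[Record]) -> List[Record]:
    """Keep only one-byte payloads that fail to parse (assumed purge criterion)."""
    return [
        record
        for record in records
        if len(record.metadata) == 1 and not is_parseable_xml(record.metadata)
    ]


# Made-up records: two one-byte fragments and one valid document.
records = [
    Record("a", b"<"),
    Record("b", b"<entry/>"),
    Record("c", b"e"),
]
candidates = select_purge_candidates(records)
print([record.id for record in candidates])
```

Checking the length as well as parseability keeps the selection narrow, so unrelated malformed documents are left for separate investigation.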

vlorentz added a subscriber: olasd.

@olasd did it :)