swh-indexer produces dates not supported by swh-search/ElasticSearch
Closed, Migrated, Edits Locked

Description

e.g. 2022-12-2 (the day is not zero-padded)

https://sentry.softwareheritage.org/organizations/swh/issues/104824/?referrer=phabricator_plugin

BulkIndexError: ('8 document(s) failed to index.', [{'update': {'_index': 'origin-v0.11', '_type': '_doc', '_id': '155291d5b9ada4570672510509f93fcfd9809882', 'status': 400, 'error': {'type': 'mapper_parsing_exception', 'reason': "failed to parse field [jsonld.http://schema.org/dateModified.@value] of type [date] in document with id '155291d5b9ada4570672510509f93fcfd9809882'. Preview of field's value: '2020-12-2'", 'caused_by': {'type': 'illegal_argument_exception', 'reason': 'failed to parse date field [2020-12-2] with ...
(5 additional frame(s) were not displayed)
...
  File "swh/search/metrics.py", line 21, in d
    return f(*a, **kw)
  File "swh/search/elasticsearch.py", line 382, in origin_update
    indexed_count, errors = helpers.bulk(self._backend, actions, index=write_index)
  File "elasticsearch/helpers/actions.py", line 300, in bulk
    for ok, item in streaming_bulk(client, actions, *args, **kwargs):
  File "elasticsearch/helpers/actions.py", line 230, in streaming_bulk
    **kwargs
  File "elasticsearch/helpers/actions.py", line 158, in _process_bulk_chunk
    raise BulkIndexError("%i document(s) failed to index." % len(errors), errors)
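One possible mitigation, sketched here as an illustration only, would be to zero-pad date-only values before they are sent to ElasticSearch. The helper name normalize_date is hypothetical and is not part of swh-indexer or swh-search; it handles only plain year-month-day strings and returns None for anything else (including values with a time component), leaving the choice between dropping and passing through such values to the caller.

```python
import datetime
import re
from typing import Optional

# Hypothetical helper for illustration; not an existing swh-indexer/swh-search API.
# Accepts year-month-day strings whose month or day may lack a leading zero.
_LOOSE_DATE = re.compile(r"([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})")


def normalize_date(value: str) -> Optional[str]:
    """Return the date as zero-padded YYYY-MM-DD, or None if it is not a
    date-only string or not a valid calendar date."""
    match = _LOOSE_DATE.fullmatch(value.strip())
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        # datetime.date validates the calendar date; isoformat() zero-pads it.
        return datetime.date(year, month, day).isoformat()
    except ValueError:
        return None


if __name__ == "__main__":
    print(normalize_date("2020-12-2"))
```

With this sketch, the value from the error above, '2020-12-2', would become '2020-12-02', while an impossible date such as '2020-13-2' would be rejected rather than forwarded to the index.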

Event Timeline

vlorentz edited projects, added Archive search; removed Restricted Project.