
Error on date parsing on the deposit
Closed, Migrated

Description

Parsing fails for some dates on the deposit:

2020-12-20 19:02:44 [1812383] django.request:ERROR Internal Server Error: /1/private/1290/meta/
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/usr/lib/python3/dist-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/lib/python3/dist-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python3/dist-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/django/views/generic/base.py", line 71, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/rest_framework/views.py", line 495, in dispatch
    response = self.handle_exception(exc)
  File "/usr/lib/python3/dist-packages/rest_framework/views.py", line 455, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/lib/python3/dist-packages/rest_framework/views.py", line 492, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/deposit/api/private/__init__.py", line 84, in get
    return super().get(request, collection_name, deposit_id)
  File "/usr/lib/python3/dist-packages/swh/deposit/api/common.py", line 1010, in get
    r = self.process_get(request, collection_name, deposit)
  File "/usr/lib/python3/dist-packages/swh/deposit/api/private/deposit_read.py", line 198, in process_get
    data = self.metadata_read(deposit)
  File "/usr/lib/python3/dist-packages/swh/deposit/api/private/deposit_read.py", line 161, in metadata_read
    author_date, commit_date = self._normalize_dates(deposit, metadata)
  File "/usr/lib/python3/dist-packages/swh/deposit/api/private/deposit_read.py", line 132, in _normalize_dates
    return (normalize_date(author_date), normalize_date(commit_date))
  File "/usr/lib/python3/dist-packages/swh/deposit/utils.py", line 118, in normalize_date
    date = iso8601.parse_date(date)
  File "/usr/lib/python3/dist-packages/iso8601/iso8601.py", line 190, in parse_date
    raise ParseError("Unable to parse date string %r" % datestring)
iso8601.iso8601.ParseError: Unable to parse date string '2014–09–16'

The affected deposits remain in the verified status.

Full error:
http://kibana0.internal.softwareheritage.org:5601/app/kibana#/discover/doc/3f8dbf80-18cc-11e9-b8ce-cf95f437ce37/systemlogs-2020.12.20?id=WjyIgXYBAwcZKgjfWkPY

Sentry does not seem to catch this problem.

Event Timeline

vsellier triaged this task as Normal priority. Dec 21 2020, 10:28 AM
vsellier created this task.
vsellier raised the priority of this task from Normal to High. Dec 21 2020, 10:32 AM

Changing to high priority (@ardumont's recommendation).

In case it's not clear: the issue is that this deposit uses the wrong kind of dash in the date (en dash, 0x2013, instead of hyphen-minus, 0x2d).
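
The two dashes look nearly identical in a log, so here is a minimal sketch for checking which characters a date string really contains. The helper name `describe_non_ascii` is made up for illustration and is not part of swh-deposit; it only uses the standard library.

```python
import unicodedata


def describe_non_ascii(text):
    """List (index, char, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# The date string from the traceback, with its en dash separators spelled out.
bad_date = "2014\u201309\u201316"

for entry in describe_non_ascii(bad_date):
    print(entry)
```

A date written with regular hyphens ("2014-09-16") yields an empty list.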

vlorentz renamed this task from Error on data parsing on the deposit to Error on date parsing on the deposit. Dec 21 2020, 12:36 PM

@vlorentz @vsellier You should notify the ipol people.

At least technically, I'd revert the status of the affected deposits from verified to deposited and reschedule a check [1].

With the new check introduced in D4773, those deposits will be marked as rejected with the new 'invalid date' reason, and then the ipol people will know.
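
For illustration only, a check of that kind could look roughly like the sketch below. This is not the code from D4773: `check_date` and `INVALID_DATE` are hypothetical names, and the standard library's `date.fromisoformat` stands in for the `iso8601.parse_date` call seen in the traceback.

```python
from datetime import date
from typing import Optional

# Hypothetical label, echoing the 'invalid date' wording used in this task.
INVALID_DATE = "invalid date"


def check_date(value: str) -> Optional[str]:
    """Return None if value parses as an ISO calendar date, else a rejection reason."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return INVALID_DATE
    return None


print(check_date("2014-09-16"))
print(check_date("2014\u201309\u201316"))  # en dashes, as in the failing deposit
```

The point of running it at check time is that the date is rejected with an explicit reason before the loader asks for the metadata, rather than failing later with an Internal Server Error.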

[1] https://docs.softwareheritage.org/devel/swh-deposit/sys-info.html#reschedule-a-deposit
https://docs.softwareheritage.org/devel/swh-deposit/sys-info.html#environment-production