After this week's session with Bruno, we saw that multiple requests for the same deposit that are waiting for the workers create a corner case: each request is treated as a different deposit and each is loaded into the archive separately. For example, this deposit has 9 visits that are not related through the parent history: https://archive.softwareheritage.org/browse/origin/https://hal.archives-ouvertes.fr/hal-01862659/visits/
Procedure:
- if external id exists
  - if md5 identical
    - calculate metadata hash
    - if metadata hash identical
      - return 400 //we have already received this deposit
  - mark deposit with last identical external-id as parent-id
  - if parent is 'rejected' status iterate until last non-rejected parent
- if md5 identical
- return 201 with new deposit-id
Comment: when the parent is not in status 'done', the deposit can't be loaded.
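
To make the procedure easier to discuss, here is a minimal Python sketch of it. It is not the deposit server's actual code: the `Deposit` dataclass, the in-memory `store` list, and the functions `metadata_hash`, `receive_deposit` and `can_be_loaded` are all hypothetical names made up for illustration. It also assumes that the md5 and metadata hash are compared against the most recent deposit with the same external id, that the metadata hash is a SHA-256 of canonical JSON, and that the second "if md5 identical" step adds no extra branch, since the list does not say what it should do.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Deposit:
    # Hypothetical in-memory stand-in for a deposit row.
    id: int
    external_id: str
    archive_md5: str
    metadata_hash: str
    status: str = "deposited"
    parent_id: Optional[int] = None


def metadata_hash(metadata: dict) -> str:
    # Assumption: hash a canonical JSON form so key order does not matter.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def receive_deposit(
    store: List[Deposit], external_id: str, archive_md5: str, metadata: dict
) -> Tuple[int, Optional[Deposit]]:
    """Return (HTTP status, new deposit or None) for an incoming request."""
    new_hash = metadata_hash(metadata)
    # Earlier deposits with the same external id, oldest first.
    previous = [d for d in store if d.external_id == external_id]
    parent_id = None
    if previous:
        last = previous[-1]
        # Same archive and same metadata: we have already received this deposit.
        if last.archive_md5 == archive_md5 and last.metadata_hash == new_hash:
            return 400, None
        # Walk back from the latest deposit to the last non-rejected one.
        for candidate in reversed(previous):
            if candidate.status != "rejected":
                parent_id = candidate.id
                break
    deposit = Deposit(
        id=len(store) + 1,
        external_id=external_id,
        archive_md5=archive_md5,
        metadata_hash=new_hash,
        parent_id=parent_id,
    )
    store.append(deposit)
    return 201, deposit


def can_be_loaded(store: List[Deposit], deposit: Deposit) -> bool:
    # Per the comment above: a deposit waits until its parent is 'done'.
    if deposit.parent_id is None:
        return True
    parent = next(d for d in store if d.id == deposit.parent_id)
    return parent.status == "done"
```

With this shape, a repeated request for the same external id either gets rejected as a duplicate or is chained to an existing deposit through `parent_id`, so the visits stay connected through the parent history.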