[HAL] Fix published zip in zip
Open, Normal, Public

Description

HAL publishes zip-in-zip content (a zip archive whose content is another zip archive), which we don't accept:
https://hal.inria.fr/hal-02355563v1

Event Timeline

moranegg triaged this task as Normal priority. Nov 8 2019, 12:30 PM
moranegg created this task.
moranegg created this object in space Restricted Space.
moranegg shifted this object from the Restricted Space space to the S1 Public space.

I'm not completely sure I get this ticket; I understand that HAL may (always?) produce invalid zip files when it uploads a deposit to SWH. So my questions are:

  • is this bug fixed on the HAL side?
  • should we just mark such a deposit as invalid (and thus refuse it)? or should we implement a slightly smarter approach where we identify the case of an archive file containing a single archive file and deal with it (recursively unzipping in the single-archive-in-archive case)?
  • is there some cleanup to do in the swh archive or the deposit (as the title suggests)?
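The "slightly smarter approach" from the second question could look roughly like the sketch below. It is only an illustration built on Python's standard `zipfile` module, not the deposit code itself: the function name `unwrap_single_nested_zip` and the `max_depth` guard are hypothetical, and it assumes the whole archive fits in memory.

```python
import io
import zipfile


def unwrap_single_nested_zip(data: bytes, max_depth: int = 5) -> bytes:
    """Return the innermost archive when `data` is a zip wrapping a single zip.

    Hypothetical helper: any archive that is not a pure single-zip wrapper
    is returned unchanged.
    """
    depth = 0
    while True:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Directory entries do not count as content.
            files = [m for m in archive.infolist() if not m.is_dir()]
            if len(files) != 1:
                return data
            inner = archive.read(files[0])
        # The single member must itself look like a zip archive.
        if not zipfile.is_zipfile(io.BytesIO(inner)):
            return data
        if depth == max_depth:
            # Guard against pathological nesting (assumed limit).
            raise ValueError("archive nesting exceeds max_depth")
        data = inner
        depth += 1
```

Note that `zipfile.is_zipfile` only looks for the zip end-of-central-directory record, so a stricter check may be needed in practice.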

The deposit is visible and accessible on HAL without a link to SWH (because we rejected it).

  1. this ticket is open on the HAL side as well, but I don't know at what priority level.
  2. we do check and then we reject it (this is why there is no link)
  3. I tagged it with [HAL] because I assume we want to keep the rejection but we need to make sure it is fixed

I can change the title to "Make sure this is fixed".

I am still not sure how to understand this.

I mean "this" in "make sure this is fixed" is not clear (also no need to retitle).
Does "this" refer to "published" in "published zip in zip", or does it refer to the "zip in zip" part?

I.e., how exactly do we want to deal with the situation where a deposit user sends an archive (zip) file whose content is a single archive (zip) file?

You write "we do check and th[e]n we reject", so what step should be fixed here? I understand "we do check" as "we check and the check is approved" (during the check-deposit step handled asynchronously by a first worker task).
Then I understand 'we reject' as "the loader-deposit task fails".

Is this correct?

If so, how do we want to fix this? Do we improve the first step (check-deposit) so that it does not stamp the zip-in-zip case as valid? Or do we improve the second step (load-deposit) so that it properly handles the zip-in-zip case?
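For the first option, a check that refuses the zip-in-zip case up front might be sketched as follows. This is an assumption-laden illustration, not the actual check-deposit code: `check_archive` and its reason strings are hypothetical, and only Python's standard `zipfile` module is used.

```python
import io
import zipfile
from typing import Tuple


def check_archive(data: bytes) -> Tuple[bool, str]:
    """Hypothetical check step: (is_valid, reason) for an uploaded archive."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return False, "not a zip archive"
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Only real files count; directory entries are ignored.
        files = [m for m in archive.infolist() if not m.is_dir()]
        if not files:
            return False, "empty archive"
        if len(files) == 1:
            inner = archive.read(files[0])
            if zipfile.is_zipfile(io.BytesIO(inner)):
                return False, "archive contains only another archive"
    return True, "ok"
```

With such a check, the deposit would be refused during the check step with an explicit reason, instead of being stamped valid and failing later at load time.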