deposit: Keep raw metadata received
Closed, Resolved (Public)

Description

As explained in T1144#21336, it would simplify post-mortem analysis of problems if we kept all the metadata received from the client.

Event Timeline

ardumont triaged this task as Normal priority. (Jul 19 2018, 12:04 PM)
ardumont created this task.

Good idea!

Similarly to what we do for the loaders (e.g., with the Git pack files), we should just keep everything (metadata + tarball) received from a deposit in raw format somewhere, to allow further re-processing. So I think this issue should not be only about keeping the raw metadata, but about keeping the entire (raw) deposit.

For the raw archive(s), that's already the case: we have to keep them because we do not know in advance when the deposit will actually be completed.
A deposit can stay 'partial' for a long time (meaning it is not done yet and the user can still update it), so the archive(s) are kept as received. The archive(s) and metadata are only used from status 'deposited' onward.

We just did not do it for the metadata.

I don't remember the reason(s) that made us inconsistent there. Probably a missing reasoning step.
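
To make the proposal concrete, here is a minimal sketch of what "keep the raw metadata too" could look like. It is not the actual swh-deposit code: the `Deposit` class, its fields, and the `receive` method are all hypothetical names, and the assumption is that metadata arrives as XML bytes. The point it illustrates is that the raw bytes are stored before any parsing, so they remain available even when parsing fails.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Deposit:
    """Hypothetical in-memory deposit record (not the real swh-deposit model)."""

    status: str = "partial"
    raw_archives: List[bytes] = field(default_factory=list)
    raw_metadata: List[bytes] = field(default_factory=list)
    parsed_metadata: List[Optional[dict]] = field(default_factory=list)

    def receive(self, archive: Optional[bytes] = None,
                metadata: Optional[bytes] = None,
                final: bool = False) -> None:
        # Only a 'partial' deposit can still be updated by the client.
        if self.status != "partial":
            raise ValueError("deposit can no longer be updated")
        if archive is not None:
            # Archives are kept exactly as received.
            self.raw_archives.append(archive)
        if metadata is not None:
            # Store the raw bytes first, so they survive a parsing failure.
            self.raw_metadata.append(metadata)
            try:
                root = ET.fromstring(metadata)
                parsed = {child.tag: child.text for child in root}
            except ET.ParseError:
                parsed = None  # raw copy is still available for post-analysis
            self.parsed_metadata.append(parsed)
        if final:
            # From 'deposited' onward, archives and metadata get used.
            self.status = "deposited"
```

With something along these lines, a malformed metadata document would still be inspectable afterwards through `raw_metadata`, which is the post-analysis use case described in the task.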