
draft specs for deposit with incomplete tarball
Closed, Migrated

Description

Enable the deposit of a tarball containing empty paths whose content is defined by the associated metadata.

Event Timeline

Goal: deposit a tarball for which part or all of the content is already in the SWH archive.
The paths to the missing directories/contents must be provided as empty paths in the tarball.
The list linking each path to the corresponding object in the archive will be provided with the metadata:

./path/to/file.txt    swh:1:cnt:aaaaaaaaaaaaaaaaaaaaa...
./path/to/dir/        swh:1:dir:aaaaaaaaaaaaaaaaaaaaa...

Note: the name of the file or the directory is given by the path and is not part of the identified object.
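
To illustrate the path/identifier list, here is a minimal Python sketch that parses such a list and validates the identifier syntax. It assumes a plain-text form with one `path identifier` pair per line; the actual metadata format is not fixed by this description. The function name `parse_mapping` and the sample identifiers are hypothetical.

```python
import re
from typing import Dict

# Syntax of a core SWH identifier: scheme, version 1, object type, 40 hex digits.
SWHID_RE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")


def parse_mapping(text: str) -> Dict[str, str]:
    """Parse 'path identifier' lines into a {path: identifier} dict.

    Hypothetical helper: the line-based input format is an assumption.
    """
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last run of whitespace so paths may contain spaces.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            raise ValueError(f"expected 'path identifier', got: {line!r}")
        path, swhid = parts
        if not SWHID_RE.fullmatch(swhid):
            raise ValueError(f"malformed identifier for {path!r}: {swhid!r}")
        mapping[path] = swhid
    return mapping


# Usage with made-up identifiers (not real archive objects).
example = (
    "./path/to/file.txt swh:1:cnt:" + "a" * 40 + "\n"
    "./path/to/dir/ swh:1:dir:" + "b" * 40 + "\n"
)
mapping = parse_mapping(example)
```

The name of each object comes only from the path key; the identifier on the right never encodes it.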

Checks on deposit:

  1. every path listed in the metadata is explicitly present in the tarball
  2. the kind of path (file or directory) corresponds to the object type (cnt or dir)
  3. the paths in the tarball are empty
  4. the identifiers exist in the SWH archive
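
The four checks above could be sketched as follows. This is only an illustration under stated assumptions: the function name `check_partial_deposit` is hypothetical, `exists_in_archive` is a caller-supplied callable standing in for whatever archive lookup is eventually used, and check 2 is read as "files map to `cnt`, directories map to `dir`".

```python
import posixpath
import tarfile
from typing import Callable, Dict, List

# Assumed reading of check 2: which object type each kind of entry may point to.
EXPECTED_TYPE = {"file": "cnt", "dir": "dir"}


def check_partial_deposit(
    tar: tarfile.TarFile,
    mapping: Dict[str, str],
    exists_in_archive: Callable[[str], bool],
) -> List[str]:
    """Return error messages; an empty list means all four checks pass."""
    errors: List[str] = []
    # Normalize names so "./path/to/dir/" and "path/to/dir" compare equal.
    members = {posixpath.normpath(m.name): m for m in tar.getmembers()}
    for raw_path, swhid in mapping.items():
        path = posixpath.normpath(raw_path)
        member = members.get(path)
        # Check 1: the path is explicit in the tarball.
        if member is None:
            errors.append(f"{raw_path}: not present in the tarball")
            continue
        # Check 2: the kind of entry corresponds to the object type.
        kind = "dir" if member.isdir() else "file"
        parts = swhid.split(":")
        object_type = parts[2] if len(parts) == 4 else None
        if object_type != EXPECTED_TYPE[kind]:
            errors.append(f"{raw_path}: {kind} cannot map to type {object_type!r}")
        # Check 3: the path is empty in the tarball.
        if member.isdir():
            if any(name.startswith(path + "/") for name in members):
                errors.append(f"{raw_path}: directory is not empty")
        elif member.size != 0:
            errors.append(f"{raw_path}: file is not empty")
        # Check 4: the identifier exists in the archive (lookup is injected).
        if not exists_in_archive(swhid):
            errors.append(f"{raw_path}: {swhid} not found in the archive")
    return errors
```

Collecting all errors instead of stopping at the first one lets a depositor fix a rejected deposit in a single round trip.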

Load the data from the deposit:

  • load the existing data
  • create links from the path to the SWH object through the identifier
  • calculate the identifiers of the new objects
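
The last step works without the missing data because a directory identifier depends only on the names, types and identifiers of its entries, not on their contents. Below is a minimal sketch, assuming the git-compatible hashing used for SWH content and directory identifiers; the function names are hypothetical, and permissions are simplified (every file is treated as a regular non-executable file, symlinks are ignored).

```python
import hashlib
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (name, "cnt" or "dir", 40-char hex identifier)


def content_id(data: bytes) -> str:
    """Hex identifier of a content object, hashed like a git blob."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def directory_id(entries: List[Entry]) -> str:
    """Hex identifier of a directory, hashed like a git tree.

    Only the child identifiers are needed, so an entry may refer to an
    object that is already archived and absent from the tarball.
    """
    def sort_key(entry: Entry) -> bytes:
        name, kind, _ = entry
        # Git orders tree entries as if directory names ended with "/".
        return name.encode() + (b"/" if kind == "dir" else b"")

    body = b""
    for name, kind, hex_id in sorted(entries, key=sort_key):
        mode = b"40000" if kind == "dir" else b"100644"  # simplified modes
        body += mode + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()


# Usage: a new directory mixing a freshly loaded file with a linked,
# already-archived directory (the "b" * 40 identifier is made up).
new_file = content_id(b"print('hello')\n")
root = directory_id([("main.py", "cnt", new_file), ("vendor", "dir", "b" * 40)])
root_swhid = "swh:1:dir:" + root
```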

This is the direction for the swiss-deposit. The description above will be included in docs/specs/half-deposit.rst
and docs/specs/meta-deposit.rst (deposits with only metadata).


Board discussion with zack and ardumont

great, thanks!

I'll be AFK for a while, so I can't check the diff, but if you (@moranegg) can point me to the current version (on docs.s.o, if it's deployed?), I'll be happy to have a look before it's implemented.

It is not yet deployed on the docs web pages, but I'll put a link here when it is.
Also, I'm planning some modifications to the swh xml schema for the swh-id properties (mentioned in T1152).

Regarding implementation, there are no plans to implement it on the horizon; it is something to consider for the priority/yearly planning.
I can also open a review documentation subtask.

Oh, OK, thanks for clarifying. Yes, a dedicated review task assigned to me would be welcome (once you have a pointer to a version of the spec that you think is ready for review).
We will indeed prioritize later but, as an advance sneak peek, we will want to have a working prototype of this by the end of 2018 (for the CCS deposit use case).