To support the use case of scientists who want to document the software relevant to a given paper, we want a manifest format capable of fully describing software archived by Software Heritage.
In essence, a manifest should point to Software Heritage objects using the URI scheme of T335.
However, different levels of detail are possible:
- (minimum detail) just point to a manifest file archived somewhere (possibly on SWH), using a manifest ID
- point to a directory
- fully describe the directory content, with a pair <pathname, content id> for each file
- (maximum detail) as above + a revision history, describing each revision in full (this will be extremely large for long histories)
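
As a rough illustration of the levels above, here is a minimal Python sketch of a manifest structure. Everything in it is an assumption made for illustration: the class and field names (`Manifest`, `Revision`, `manifest_id`, `directory_id`, `entries`, `revisions`) are hypothetical, and the identifier strings are opaque placeholders rather than identifiers in the T335 scheme.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Revision:
    """One revision, described in full (hypothetical structure)."""
    revision_id: str
    directory_id: str
    # <pathname, content id> pairs for the directory of this revision
    entries: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Manifest:
    """A manifest that can carry any of the four levels of detail."""
    # Minimum detail: ID of a manifest file archived somewhere
    manifest_id: Optional[str] = None
    # Next level: pointer to a directory
    directory_id: Optional[str] = None
    # Next level: <pathname, content id> pair for each file
    entries: List[Tuple[str, str]] = field(default_factory=list)
    # Maximum detail: full revision history (can grow very large)
    revisions: List[Revision] = field(default_factory=list)

    def detail_level(self) -> int:
        """Return 1 (minimum) to 4 (maximum) based on which fields are set."""
        if self.revisions:
            return 4
        if self.entries:
            return 3
        if self.directory_id is not None:
            return 2
        if self.manifest_id is not None:
            return 1
        raise ValueError("empty manifest: nothing is referenced")


# Placeholder identifiers, not real archive IDs
m = Manifest(
    directory_id="<directory id>",
    entries=[("README", "<content id 1>"), ("src/main.c", "<content id 2>")],
)
level = m.detail_level()
```

Keeping every level optional in one structure is only one possible design; separate formats per level would work as well.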
In addition to the above, various kinds of metadata could be added:
- SWH-specific metadata: when and where the archived code was found
- user-provided metadata, submitted to SWH at the time of the ingestion request (e.g., Dublin Core, paper references)
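
The two kinds of metadata could, for instance, be kept under separate keys so that archive-provided facts and user claims are never mixed. The sketch below is only an assumption about how that might look: the function `attach_metadata`, the key names (`swh_metadata`, `user_metadata`, `origin`, `visited`), and all values are hypothetical placeholders.

```python
import json


def attach_metadata(manifest: dict, swh_metadata: dict, user_metadata: dict) -> dict:
    """Return a copy of the manifest with both metadata kinds under separate keys."""
    enriched = dict(manifest)  # shallow copy; the input manifest is left untouched
    enriched["swh_metadata"] = dict(swh_metadata)
    enriched["user_metadata"] = dict(user_metadata)
    return enriched


manifest = {"directory": "<directory id>"}  # placeholder, not a real identifier
enriched = attach_metadata(
    manifest,
    # SWH-specific: where and when the code was found (placeholder values)
    swh_metadata={"origin": "https://example.org/repo.git", "visited": "<timestamp>"},
    # User-provided: e.g. Dublin Core terms and a paper reference (placeholder values)
    user_metadata={"dc:title": "<software title>", "paper": "<paper reference>"},
)
serialized = json.dumps(enriched, indent=2, sort_keys=True)
```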