D5247.diff
diff --git a/docs/extrinsic-metadata-specification.rst b/docs/extrinsic-metadata-specification.rst
--- a/docs/extrinsic-metadata-specification.rst
+++ b/docs/extrinsic-metadata-specification.rst
@@ -136,7 +136,7 @@
The authority and fetcher must be known to the storage before using this
endpoint.
`format` is a text field indicating the format of the content of the
- `metadata` byte string.
+ `metadata` byte string, see `extrinsic-metadata-formats`_.
* Getting latest metadata::
@@ -259,3 +259,87 @@
In all cases, ``visit`` should only be provided if ``origin`` is
(as visit ids are only unique with respect to an origin).
+
+
+.. _extrinsic-metadata-formats:
+
+Extrinsic metadata formats
+--------------------------
+
+Here is the list of all metadata formats currently stored:
+
+``pypi-project-json``
+ The metadata is a release entry from a PyPI project's
+ JSON file, extracted and re-serialized.
+``replicate-npm-package-json``
+ ditto, but from a replicate.npmjs.com project
+``nixguix-sources-json``
+ ditto, but from https://nix-community.github.io/nixpkgs-swh/
+``original-artifacts-json``
+ tarball data, see below
+``sword-v2-atom-codemeta``
+ XML Atom document, with Codemeta metadata,
+ as sent by a deposit client, see the
+ :ref:`Deposit protocol reference <deposit-protocol>`.
+``sword-v2-atom-codemeta-v2``
+ Deprecated alias of ``sword-v2-atom-codemeta``
+``sword-v2-atom-codemeta-v2-in-json``
+ Deprecated, JSON serialization of a ``sword-v2-atom-codemeta`` document.
+``xml-deposit-info``
+ Information about a deposit, to identify the provenance of
+ a metadata object sent via swh-deposit, see below
+
+Details on some of these formats:
+
+
+original-artifacts-json
+^^^^^^^^^^^^^^^^^^^^^^^
+
+This is a loosely defined format that changed over the years; it was
+originally stored in the ``metadata`` column of the ``revision`` table.
+
+It is a JSON array, and each entry is a JSON object representing an archive
+(tarball, zipball, ...) that was unpacked by the SWH loader
+before loading its content into Software Heritage.
+
+At the time this specification was written, it was stabilized to this format::
+
+    [
+        {
+            "length": <int>,
+            "filename": "<original filename>",
+            "checksums": {
+                "sha1": "<hex-encoded string>",
+                "sha256": "<hex-encoded string>"
+            },
+            "url": "<URL the archive was downloaded from>"
+        },
+        ...
+    ]
+
+Older ``original-artifacts-json`` objects were migrated to this format,
+but some of them may be missing some of the keys.
+
+
+xml-deposit-info
+^^^^^^^^^^^^^^^^
+
+Deposits with code objects are loaded as their own origin, so we can
+look them up in the deposit database from their metadata (which holds the
+origin as a context).
+
+This is not true for metadata-only deposits, because we don't create an
+origin for them; so we need to store this information somewhere.
+The naive solution would be to insert it into the Atom entry provided by
+the client, but that would mean altering a document before we archive it,
+which could corrupt it or lose part of its data.
+
+Therefore, for each metadata-only deposit, swh-deposit creates an extra
+"metametadata" object, with the original metadata object as target,
+and using this format::
+
+    <deposit xmlns="https://www.softwareheritage.org/schema/2018/deposit">
+        <deposit_id>{{ deposit.id }}</deposit_id>
+        <deposit_client>{{ deposit.client.provider_url }}</deposit_client>
+        <deposit_collection>{{ deposit.collection.name }}</deposit_collection>
+    </deposit>
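
As an illustration of the two formats detailed in this diff, here is a minimal Python sketch of how a consumer might read them. It is not part of the patch and not code from any Software Heritage package: the function names ``missing_artifact_keys`` and ``parse_deposit_info`` are hypothetical, and the only things taken from the diff are the key names of an ``original-artifacts-json`` entry and the namespace and element names of ``xml-deposit-info``.

```python
import json
import xml.etree.ElementTree as ET

# Namespace copied from the xml-deposit-info template in the diff.
DEPOSIT_NS = "https://www.softwareheritage.org/schema/2018/deposit"

# Keys of a stabilized original-artifacts-json entry, as listed in the diff.
ARTIFACT_KEYS = ("length", "filename", "checksums", "url")


def missing_artifact_keys(raw):
    """Return, for each entry of an original-artifacts-json document,
    the list of expected keys it lacks (hypothetical helper).

    Older, migrated objects may be missing some keys, so a consumer
    should check for them instead of assuming they are present.
    """
    entries = json.loads(raw)
    if not isinstance(entries, list):
        raise ValueError("original-artifacts-json must be a JSON array")
    result = []
    for entry in entries:
        if not isinstance(entry, dict):
            raise ValueError("each entry must be a JSON object")
        result.append([key for key in ARTIFACT_KEYS if key not in entry])
    return result


def parse_deposit_info(raw):
    """Return the child elements of an xml-deposit-info document as a
    dict mapping local element names to their text (hypothetical helper)."""
    root = ET.fromstring(raw)
    if root.tag != f"{{{DEPOSIT_NS}}}deposit":
        raise ValueError(f"unexpected root element: {root.tag}")
    # ElementTree reports tags as "{namespace}name"; keep only the name.
    return {
        child.tag.split("}", 1)[-1]: (child.text or "").strip()
        for child in root
    }
```

With made-up sample values, ``missing_artifact_keys(b'[{"filename": "b.zip"}]')`` reports ``length``, ``checksums`` and ``url`` as missing for the single entry, and ``parse_deposit_info`` applied to a rendered template yields the ``deposit_id``, ``deposit_client`` and ``deposit_collection`` values as strings.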
Attached To
D5247: Document the existing metadata formats