docs/extrinsic-metadata-specification.rst
Metadata authorities are entities that provide metadata about an
:term:`origin`. Metadata authorities include code hosting places,
:term:`deposit` submitters, and registries (e.g. Wikidata).

An authority is uniquely defined by these properties:

* its type, representing the kind of authority, which is one of these values:

  * ``deposit``, for metadata pushed to Software Heritage at the same time
    as a software artifact
  * ``forge``, for metadata pulled from the same source as the one hosting
    the software artifacts (which includes package managers)
  * ``registry``, for metadata pulled from a third party

* its URL, which unambiguously identifies an instance of the authority type.
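
As an illustration only, an authority can be modelled as an immutable value whose identity is exactly the pair (type, URL). The ``AuthorityType`` and ``MetadataAuthority`` names below are hypothetical and are not part of the storage API; only the dict shape with keys ``type`` and ``url`` comes from this specification.

```python
from dataclasses import dataclass
from enum import Enum


class AuthorityType(Enum):
    """The three kinds of authority listed in this specification."""

    DEPOSIT = "deposit"
    FORGE = "forge"
    REGISTRY = "registry"


@dataclass(frozen=True)
class MetadataAuthority:
    """Hypothetical model: an authority is identified by its type and URL alone."""

    type: AuthorityType
    url: str

    def to_dict(self) -> dict:
        # Same shape (keys "type" and "url") as the `authority` dict used by the API
        return {"type": self.type.value, "url": self.url}


hal = MetadataAuthority(AuthorityType.DEPOSIT, "https://hal.archives-ouvertes.fr/")
# frozen=True gives value-based equality and hashing, so the same
# (type, url) pair always denotes the same authority
same_hal = MetadataAuthority(AuthorityType("deposit"), "https://hal.archives-ouvertes.fr/")
assert hal == same_hal
assert len({hal, same_hal}) == 1
```

Using a frozen dataclass mirrors the "uniquely defined by these properties" rule: no other attribute can influence equality, so authorities can safely be used as dictionary keys.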

Examples:

=============== =================================
type            url
=============== =================================
deposit         https://hal.archives-ouvertes.fr/
* Getting latest metadata::

    origin_metadata_get_latest(origin_url, authority)

  where ``authority`` must be a dict containing the keys ``type`` and ``url``.
  It returns a dictionary corresponding to the latest metadata entry
  added for this origin, in the format::

    {
      'origin_url': ...,
      'authority': {'type': ..., 'url': ...},
      'fetcher': {'name': ..., 'version': ...},
      'discovery_date': ...,
      'format': '...',
      'metadata': b'...'
    }

The parameters ``after`` and ``limit`` are used for pagination based on the
order defined by the ``discovery_date``.
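
A minimal in-memory sketch of this pagination rule, assuming that ``after`` is an exclusive lower bound on ``discovery_date`` and that ``limit`` is a maximum page size; the exact semantics are not stated here. The helpers ``get_page`` and ``iter_all`` are hypothetical and operate on entries already restricted to one origin and one authority.

```python
from datetime import datetime, timezone
from typing import Iterator, List, Optional


def get_page(entries: List[dict], after: Optional[datetime] = None,
             limit: Optional[int] = None) -> List[dict]:
    """Hypothetical helper: one page of entries ordered by ``discovery_date``.

    Assumes ``after`` is an exclusive lower bound and ``limit`` a maximum page size.
    """
    ordered = sorted(entries, key=lambda e: e["discovery_date"])
    if after is not None:
        ordered = [e for e in ordered if e["discovery_date"] > after]
    return ordered if limit is None else ordered[:limit]


def iter_all(entries: List[dict], page_size: int) -> Iterator[dict]:
    """Walk every page, using the last ``discovery_date`` seen as the next cursor."""
    after = None
    while True:
        page = get_page(entries, after=after, limit=page_size)
        if not page:
            return
        yield from page
        after = page[-1]["discovery_date"]


entries = [
    {"discovery_date": datetime(2020, 1, day, tzinfo=timezone.utc), "metadata": b"..."}
    for day in (3, 1, 2)
]
first = get_page(entries, limit=2)                                      # Jan 1, Jan 2
second = get_page(entries, after=first[-1]["discovery_date"], limit=2)  # Jan 3 only
```

One caveat of a purely date-based cursor: under the exclusive-bound assumption, two entries sharing the same ``discovery_date`` on either side of a page boundary would cause the second one to be skipped, so a real implementation may need a tie-breaker.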

``metadata`` is a byte array (possibly encoded using Base64).
Its format is specific to each authority, and it is treated as an opaque value
by the storage.
Unifying these various formats into a common language is outside the scope
of this specification.
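
To illustrate what "opaque" means in practice, the sketch below assumes a JSON transport (which this section does not specify) and Base64-encodes ``metadata`` so that arbitrary bytes survive the round trip without ever being parsed. ``encode_entry``, ``decode_entry`` and the ``format`` value are hypothetical.

```python
import base64
import json


def encode_entry(entry: dict) -> str:
    """Hypothetical: serialize an entry to JSON, Base64-encoding the opaque bytes."""
    payload = dict(entry)
    # JSON has no bytes type, so the blob is carried as Base64 ASCII text
    payload["metadata"] = base64.b64encode(entry["metadata"]).decode("ascii")
    return json.dumps(payload)


def decode_entry(text: str) -> dict:
    """Inverse of encode_entry; the metadata bytes themselves are never interpreted."""
    payload = json.loads(text)
    payload["metadata"] = base64.b64decode(payload["metadata"])
    return payload


# Non-UTF-8 bytes survive the round trip, since nothing interprets them
entry = {"format": "hypothetical-format", "metadata": b"\x00\xff<raw>"}
assert decode_entry(encode_entry(entry)) == entry
```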

moranegg: typo: "a URL" instead of "an URL"
[Done] moranegg: I would prefer saying that:
> extrinsic metadata can be given on a specific artifact within a…
and add the note:
[Done] rdicosmo: I agree that we want to be able to store metadata that does not depend on a context. Using an empty context field, or a special distinguished value designating "all contexts", fills the bill.
[Done] moranegg: I don't understand that... Which authorship? Of the file?
> so you can't add authors metadata on a…
I would change this example because it raises questions about authorship that are outside the scope of this documentation. If we have a HAL deposit on an artifact developed on GitHub, the deposit metadata enhances the information we might have on the repository (it does not necessarily contradict the information on GitHub). So I would say,
[Done] rdicosmo: Agree with this clarification of the text. Small improvement:
> This is why it is important to qualify the metadata with the complete context for which it is intended.
[Done] moranegg: I think that `Therefore` is not needed.
[Done] moranegg: Do we get all metadata on the identified artifact if the context is empty? If yes, this is…
[Done] rdicosmo: I agree it is important to be able to formulate a query that requests the metadata available for all contexts.
[Done] vlorentz: Sorry, that's a typo. I didn't mean to add this `context` argument.
[Not done] rdicosmo: For the snapshot key in the dictionary, use the `snapshot core SWHID`, not the `visit integer` (that is an internal implementation detail).
[Done] vlorentz: What do you mean? Make snapshots have a snapshot context?
[Done] rdicosmo: Here also, we do not want to commit to using `sha1_git`, as in the near future we might need to store metadata for SWHIDs that use different hashes. Let's use "(a release SWHID)", "(a revision SWHID)", etc.
[Done] ardumont: SWHID of a directory
[Done] rdicosmo: It seems that the XXXX in "(the core SWHID of a XXXX)" are off by one in lines 228, 230, 232: in 228, XXXX should be snapshot, not release, etc.
[Not done] moranegg: typo: "an URL" should be "a URL".