docs/extrinsic-metadata-specification.rst
The parameters ``after`` and ``limit`` are used for pagination based on the
order defined by the ``discovery_date``.

``metadata`` is a bytes array (possibly encoded using Base64).
Its format is specific to each authority, and is treated as an opaque value
by the storage.
Unifying these various formats into a common language is outside the scope
of this specification.
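As a minimal sketch of the pagination and encoding described above, the following uses an in-memory list of records. The ``Record`` type and the ``paginate`` and ``encode_metadata`` helpers are hypothetical names introduced for illustration, not part of the storage API; treating ``after`` as an exclusive lower bound is also an assumption.

```python
import base64
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical record type, for illustration only.
    discovery_date: datetime
    metadata: bytes  # opaque to the storage


def paginate(records: List[Record], after: Optional[datetime], limit: int) -> List[Record]:
    """Return at most `limit` records, ordered by discovery_date.

    Assumption: `after` is an exclusive lower bound on discovery_date.
    """
    ordered = sorted(records, key=lambda r: r.discovery_date)
    if after is not None:
        ordered = [r for r in ordered if r.discovery_date > after]
    return ordered[:limit]


def encode_metadata(metadata: bytes) -> str:
    """Base64-encode opaque metadata bytes, e.g. for a text-based transport."""
    return base64.b64encode(metadata).decode("ascii")


# Three made-up records, deliberately inserted out of order.
records = [
    Record(datetime(2020, 1, day, tzinfo=timezone.utc), b'{"n": %d}' % day)
    for day in (3, 1, 2)
]
first_page = paginate(records, after=None, limit=2)
# The last discovery_date of a page is used as `after` for the next page.
second_page = paginate(records, after=first_page[-1].discovery_date, limit=2)
```

The payload is never parsed here: it is only sorted alongside its ``discovery_date`` and encoded, which matches its treatment as an opaque value.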
Artifact metadata
^^^^^^^^^^^^^^^^^

In addition to origin metadata, the storage database stores metadata on
all software artifacts supported by the data model.

This works similarly to origin metadata, with one major difference:
extrinsic metadata can be given on a specific artifact within a specified
moranegg: I would prefer saying that:
> extrinsic metadata can be given on a specific artifact within a…
rdicosmo (Done): I agree that we want to be able to store metadata that does not depend on a context. Using an empty context field, or a special distinguished value for designating "all contexts" fills the bill.
context (for example: a directory in a specific revision from a specific
visit on a specific origin) which will be stored alongside the metadata itself.
moranegg (Done): I don't understand that. Which authorship? Of the file?
so you can't add authors metadata on a…
I would change this example because it opens questions on authorship that are not in the scope of this documentation. If we have a HAL deposit on an artifact developed in GitHub, the deposit metadata is enhancing the information we might have on the repository (it might not be in contradiction to the one on GitHub). So I would say,
rdicosmo (Done): Agree with this clarification of the text.
Small improvement:
//This is why it is important to qualify the metadata with the complete context for which it is intended.//
For example, two origins may develop the same file independently;
the authorship, licensing, or even description of the same artifact
may differ from one context to another.
moranegg (Done): I think that `Therefore` is not needed.
This is why it is important to qualify the metadata with the complete
context for which it is intended, if any.
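To illustrate why the context matters, here is a minimal sketch in which the same artifact carries different metadata depending on its context. The identifier, URLs, and payloads below are made up, and the ``MetadataRecord`` layout and ``metadata_for`` helper are assumptions for illustration rather than the storage's actual schema or API.

```python
from typing import Dict, List, NamedTuple


class MetadataRecord(NamedTuple):
    # Hypothetical record layout, for illustration only.
    id: str                  # core SWHID of the artifact
    context: Dict[str, str]  # may be empty if the metadata is context-independent
    metadata: bytes          # opaque payload


# Placeholder SWHID (does not designate a real object).
FILE_ID = "swh:1:cnt:" + "0" * 40

records: List[MetadataRecord] = [
    MetadataRecord(FILE_ID, {"origin": "https://example.org/project-a"}, b"license: MIT"),
    MetadataRecord(FILE_ID, {"origin": "https://example.org/project-b"}, b"license: GPL-3.0"),
    MetadataRecord(FILE_ID, {}, b"language: C"),
]


def metadata_for(records: List[MetadataRecord], id: str, origin: str) -> List[bytes]:
    """Return the payloads that apply to `id` within the given origin.

    Assumption: a record without an `origin` key applies to every origin.
    """
    return [
        r.metadata
        for r in records
        if r.id == id and r.context.get("origin") in (None, origin)
    ]
```

With these records, the same file is reported under a different license in each origin, while the context-independent record is returned for both.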
For each artifact type ``<X>``, there are two endpoints
to manipulate metadata associated with artifacts of that type:

* Adding metadata::

      <X>_metadata_add(id, context, discovery_date,
                       authority, fetcher,
                       format, metadata)
moranegg (Done): do we get all metadata on the identified artifact, if the context is empty?
If yes, this is…
rdicosmo (Done): I agree it is important to be able to formulate a query that allows to request the metadata available for all contexts.
vlorentz (Done): Sorry, that's a typo. I didn't mean to add this `context` argument.
* Getting all metadata::

      <X>_metadata_get(id,
                       authority,
                       after, limit)
defined similarly to ``origin_metadata_add`` and ``origin_metadata_get``,
but where ``id`` is a core SWHID (with type matching ``<X>``),
moranegg (Done): typo: a URL instead of an URL
rdicosmo (Not Done): For the snapshot key in the dictionary, use the `snapshot core SWHID`, not the `visit integer` (that is an internal implementation detail)
vlorentz (Done): What do you mean? Make snapshots have a snapshot context?
and with an extra ``context`` (argument when adding metadata, and dictionary
key when getting them) that is a dictionary with keys
depending on the artifact type ``<X>``:
moranegg (Not Done): typo: an URL should be a URL
* for ``snapshot``: ``origin`` (a URL) and ``visit`` (an integer)
* for ``release``: those above, plus ``snapshot``
rdicosmo (Done): Here also, we do not want to commit to using `sha1_git`, as in the near future we might need to store metadata for SWHIDs that use different hashes. Let's use "(a release SWHID)", "(a revision SWHID)" etc.
rdicosmo (Done): It seems that the XXXX in "(the core SWHID of a XXXX)" are off by one in lines 228, 230, 232: in 228 XXXX should be snapshot, not release, etc.
  (the core SWHID of a snapshot)
* for ``revision``: all those above, plus ``release``
  (the core SWHID of a release)
* for ``directory``: all those above, plus ``revision``
  (the core SWHID of a revision)
  and ``path`` (a byte string), representing the path to this directory
  from the root of the ``revision``
ardumont (Done): SWHID of a directory
* for ``content``: all those above, plus ``directory``
  (the core SWHID of a directory)
All keys are optional, but should be provided whenever possible.
The dictionary may be empty if the metadata is fully independent of context.

In all cases, ``visit`` should only be provided if ``origin`` is
(as visit ids are only unique with respect to an origin).
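The rules above on context keys can be sketched as a small validation helper. This is an illustration under stated assumptions, not the storage's implementation: ``ALLOWED_CONTEXT_KEYS`` and ``check_context`` are hypothetical names, and rejecting unexpected keys with ``ValueError`` is a design choice made for this example.

```python
from typing import Any, Dict

# Context keys allowed for each artifact type, following the list above.
# Each type accepts the keys of the previous one, plus its own additions.
ALLOWED_CONTEXT_KEYS = {
    "snapshot": {"origin", "visit"},
    "release": {"origin", "visit", "snapshot"},
    "revision": {"origin", "visit", "snapshot", "release"},
    "directory": {"origin", "visit", "snapshot", "release", "revision", "path"},
    "content": {
        "origin", "visit", "snapshot", "release", "revision", "path", "directory",
    },
}


def check_context(artifact_type: str, context: Dict[str, Any]) -> None:
    """Raise ValueError if `context` is not valid for `artifact_type`.

    All keys are optional, so an empty dictionary is always accepted.
    An unknown artifact type raises KeyError.
    """
    allowed = ALLOWED_CONTEXT_KEYS[artifact_type]
    unexpected = set(context) - allowed
    if unexpected:
        raise ValueError(
            f"unexpected context keys for {artifact_type}: {sorted(unexpected)}"
        )
    # Visit ids are only unique with respect to an origin.
    if "visit" in context and "origin" not in context:
        raise ValueError("'visit' requires 'origin'")
```

For instance, ``check_context("snapshot", {})`` passes, while ``check_context("snapshot", {"visit": 1})`` raises because the visit is given without its origin.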
I would prefer saying that:
and add the note: