Validate extrinsic metadata terminology
Closed, Migrated

Description

With the extrinsic metadata specification, we have introduced two new terms that replace existing terms in SWH.
authority which is replacing provider

Metadata authorities are entities that provide metadata about an origin. Metadata authorities include: code hosting places, deposit submitters, and registries (e.g. Wikidata).

fetcher which is replacing tool

Metadata fetchers are software components used to fetch metadata from a metadata authority and ingest it into the Software Heritage archive.
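To make the two definitions concrete, here is a minimal sketch of how a piece of extrinsic metadata could reference both an authority and a fetcher. The class names, field names, and example values are assumptions made for illustration only; they are not taken from the specification or from any SWH module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataAuthority:
    """Entity that provides metadata about an origin (hypothetical shape)."""

    kind: str  # e.g. "forge", "deposit_client", "registry" (illustrative values)
    url: str


@dataclass(frozen=True)
class MetadataFetcher:
    """Software component that fetches metadata from an authority (hypothetical shape)."""

    name: str
    version: str


@dataclass(frozen=True)
class ExtrinsicMetadata:
    """Metadata about an origin, tagged with who provided it and what fetched it."""

    origin: str
    authority: MetadataAuthority
    fetcher: MetadataFetcher
    payload: bytes


# Example record: a registry acts as the authority, a made-up fetcher ingests it.
record = ExtrinsicMetadata(
    origin="https://example.org/project.git",
    authority=MetadataAuthority(kind="registry", url="https://www.wikidata.org/"),
    fetcher=MetadataFetcher(name="example-fetcher", version="0.0.1"),
    payload=b'{"name": "project"}',
)
```

In this sketch the authority answers "who says so?" and the fetcher answers "which software retrieved it?", which is the distinction the two terms are meant to carry.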

See more details here: https://docs.softwareheritage.org/devel/swh-storage/extrinsic-metadata-specification.html

To ensure consistency, these terms should be used for extrinsic metadata in all modules and in all documentation (wiki, website, etc.).

Before landing the complete implementation of the extrinsic metadata diff D2988, we need to validate that these terms are still the ones we want to use everywhere.

Here is a list of all affected modules, documentation and wiki pages:

Event Timeline

moranegg triaged this task as Normal priority. Apr 27 2020, 11:57 AM
moranegg created this task.

swh-loader-tar (used in the revision creation with a field called extrinsic with two properties provide and tool)

I don't think we should change this right now; we should instead convert them once we have proper storage for extrinsic revision metadata.

I prefer doing a complete change of terminology. We shouldn't have one part using the old terms while new implementations use the new terms.
Doing it now will prevent future ambiguities.

You correctly mention uniformity as a positive thing to have.
But term correctness is also a positive value, in my opinion.

So, before deciding: do you (@moranegg) consider the old terms superior/more correct than the new proposed ones?
FWIW, "authority" seems to me more appropriate than "provider"; and "tool" is a term so generic as to be meaningless (it strikes me as very similar to the historical mistake we made when we decided to call blobs "contents").

If we agree that the initially proposed terms are more correct than the legacy ones we have, we need to weigh whether consistency is a more important argument than correctness.
If it isn't a better argument, we should maybe plan to adapt the legacy terms to the new ones instead?
(I'm mentioning this because the risk here is having to do the transition you suggest now, and later on doing another transition in the other direction.)

Thanks for the feedback!

When searching for correctness, it seems that we also use provider for the source code provider (which is more accurate, IMO).
We might want to accentuate this difference: authority for metadata and provider for code.

The deposit today has a provider (of the source code package) and an authority (for the associated metadata), which are the same actor: the deposit client.

With this choice, a metadata deposit does not have a provider; it has only an authority.

So when going to the loader and deposit, we need to keep this distinction in mind: use the term provider where it's about the software artifacts, and authority where we manipulate the metadata objects.
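As a rough illustration of that distinction, the sketch below derives the two roles for a deposit. The `Deposit` class, its fields, and the `roles` helper are hypothetical names invented for this example; they do not reflect the actual swh-deposit data model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Deposit:
    """Hypothetical deposit: always carries metadata, optionally carries code."""

    client: str  # the deposit client, e.g. a URL identifying it
    has_code: bool  # False for a metadata-only deposit


def roles(deposit: Deposit) -> Dict[str, Optional[str]]:
    """Return who plays each role for this deposit.

    The client is always the metadata authority. It is also the provider,
    but only when the deposit actually includes software artifacts.
    """
    return {
        "provider": deposit.client if deposit.has_code else None,
        "authority": deposit.client,
    }


code_deposit = roles(Deposit(client="https://client.example.org", has_code=True))
metadata_deposit = roles(Deposit(client="https://client.example.org", has_code=False))
```

Here `code_deposit` has the same actor in both roles, while `metadata_deposit` has an authority and no provider.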

Regarding tool and fetcher, I intuitively think that fetcher is not the "correct" term for the deposit (because the metadata is pushed),
but I can get used to that.
I agree that tool is too generic, and I don't have a better proposition.

The reason for this task, validating these terms, was to avoid a discussion after implementation about whether we all agree on the terminology and, if not, whether we need to search for new terms or go back to the old ones.

Bottom line

We agree that we are moving to authority and fetcher for extrinsic metadata.
Now we need to identify the modules and documentation affected by this change of terminology so they can be updated (with the distinction I have presented above).

For the code itself, we need to transition either way, as the interface changed. So changing the terminology in the code is free.

@vlorentz what do you mean by free?

I mean it doesn't require any extra work.

moranegg claimed this task.