
Document metadata providers.
Closed, Public

Authored by vlorentz on Nov 8 2018, 2:59 PM.


Diff Detail

Repository
rDSTO Storage manager
Branch
doc-metadata-providers
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 2370
Build 2892: tox-on-jenkins (Jenkins)
Build 2891: arc lint + arc unit

Event Timeline

moranegg added inline comments.
swh/storage/storage.py
1304

It might be pertinent to give an example type, or to list the potential types.

  • Add an example provider_type.
douardda added a subscriber: douardda.
douardda added inline comments.
swh/storage/storage.py
1304

I'll go one step further:

  • what are the possible values for this provider_type (I've not checked the code for this)? If it's a limited list of possible values, give them; otherwise, explain how valid values can be guessed.
  • add a complete and valid example of arguments (or give a clue on where such a usage example can be found in the code)

Also, there should be a description somewhere of what this metadata object stands for. I guess it is the metadata describing the metadata provider (which might be a source of confusion: which metadata are we talking about here?)

This revision now requires changes to proceed. Nov 13 2018, 12:01 PM
vlorentz added inline comments.
swh/storage/storage.py
1304

tbh, I don't know the answers to these questions

I'm not sure why the build is failing.
Can I relaunch a build as a reviewer, or is a build only relaunched when the diff changes?

swh/storage/storage.py
1304

what are the possible values for this provider_type (I've not checked the code for this)? If it's a limited list of possible values, give them; otherwise, explain how valid values can be guessed.

It isn't a limited list for now. At the moment we have only one provider_type, deposit_client, but the metadata provider entity should also be used in the following future cases:

  • when listing: lister type
  • when loading: loader type
  • when fetching metadata from registries: registry type

Because these are not implemented yet, I'm not sure whether we should list the types above in the docs.
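
To make the list above concrete, here is a minimal sketch of how the provider types could be enumerated. It assumes a hypothetical `ProviderType` enum and a hypothetical `is_known_provider_type` helper, neither of which is taken from swh-storage; per the discussion, only `deposit_client` is in use today and the other three values are the proposed future cases.

```python
from enum import Enum


class ProviderType(Enum):
    """Hypothetical enumeration of provider types (not part of swh-storage)."""

    DEPOSIT_CLIENT = "deposit_client"  # the only type in use at the moment
    LISTER = "lister"  # proposed: when listing
    LOADER = "loader"  # proposed: when loading
    REGISTRY = "registry"  # proposed: when fetching metadata from registries


def is_known_provider_type(value: str) -> bool:
    """Return True if `value` matches one of the types enumerated above."""
    return value in {t.value for t in ProviderType}
```

Since the list is not limited for now, a check like this would only be advisory; rejecting unknown values would be a separate design decision.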

add a complete and valid example of arguments (or give a clue on where such a usage example can be found in the code)

Here are the current metadata_provider rows in the storage:

id | provider_name | provider_type  | provider_url                      | metadata
3  | hal           | deposit_client | https://hal.archives-ouvertes.fr/ | {}
4  | swh           | deposit_client | https://www.softwareheritage.org  | {}
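
As a usage example for the docstring, the `hal` row could be written as a set of arguments along these lines. This is only a sketch: the keys simply mirror the column names of the rows above, and `describe_provider` is a hypothetical helper added for illustration, not a function of swh-storage.

```python
# Arguments mirroring the existing `hal` row; keys follow the column names.
hal_provider = {
    "provider_name": "hal",
    "provider_type": "deposit_client",
    "provider_url": "https://hal.archives-ouvertes.fr/",
    "metadata": {},  # empty in the existing rows
}


def describe_provider(provider: dict) -> str:
    """Hypothetical helper: build a one-line summary of a provider record."""
    # str.format ignores keyword arguments it does not use (here: metadata).
    return "{provider_name} ({provider_type}) at {provider_url}".format(**provider)


print(describe_provider(hal_provider))
```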

With the use cases I mentioned, we could add:

id | provider_name | provider_type | provider_url             | metadata
x  | gitlab_lister | lister        | https://gitlab.com/      | {}
y  | gitlab_loader | loader        | https://gitlab.com/      | {}
z  | wikidata      | registry      | https://www.wikidata.org | {}

Also, there should be a description somewhere of what this metadata object stands for. I guess it is the metadata describing the metadata provider (which might be a source of confusion: which metadata are we talking about here?)

You are right! At the moment we do not use the metadata property of a metadata_provider. This property was added for registries, where we want to keep more information about the registry itself.

@vlorentz you can add the existing hal deposit_client example as is.
The rest isn't documented. We had a couple of emails on swh-devel, but we might need specs on this subject.

For a larger discussion:
@douardda where would you put specs on future features?
For the deposit, we used \docs\specs, but maybe wiki pages are more appropriate.

I think that @douardda's comments should be resolved in a specs document and not in the current docs.
So I'm opening a task about that, referring to the discussion here [T1344]

I think that @douardda's comments should be resolved in a specs document and not in the current docs.
So I'm opening a task about that, referring to the discussion here [T1344]

Fine with me, as long as T1344 is not forgotten forever. As it stands, this docstring is pretty much useless, so we need to make sure it is also improved as part of T1344.

This revision is now accepted and ready to land. Nov 14 2018, 4:17 PM
This revision was automatically updated to reflect the committed changes.