
docs: Explain the indexation workflow for extrinsic metadata
Closed, Public

Authored by vlorentz on Jul 6 2022, 4:19 PM.

Diff Detail

Repository: rDCIDX Metadata indexer
Branch: extrinsic
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 30375
Build 47484: Phabricator diff pipeline on Jenkins
Build 47483: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8087 (id=29192)

Could not rebase; Attempt merge onto c2742b5b75...

Updating c2742b5..847bc29
Fast-forward
 docs/images/tasks-metadata-indexers.uml            |  57 +++++-----
 docs/metadata-workflow.rst                         | 118 ++++++++++++++++-----
 swh/indexer/codemeta.py                            |  57 ++++++----
 swh/indexer/metadata.py                            |  10 +-
 swh/indexer/metadata_detector.py                   |   4 +-
 swh/indexer/metadata_dictionary/__init__.py        |  12 ++-
 swh/indexer/metadata_dictionary/base.py            |  63 +++++++----
 swh/indexer/metadata_dictionary/cff.py             |   4 +-
 swh/indexer/metadata_dictionary/codemeta.py        |   4 +-
 swh/indexer/metadata_dictionary/composer.py        |   4 +-
 swh/indexer/metadata_dictionary/github.py          |  68 ++++++++++--
 swh/indexer/metadata_dictionary/maven.py           |   4 +-
 swh/indexer/metadata_dictionary/npm.py             |   4 +-
 swh/indexer/metadata_dictionary/python.py          |   4 +-
 swh/indexer/metadata_dictionary/ruby.py            |  11 +-
 .../tests/metadata_dictionary/test_github.py       |  26 ++++-
 16 files changed, 307 insertions(+), 143 deletions(-)
Changes applied before test
commit 847bc29eb5e77bc70380336c8dce480be5019251
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 6 16:17:22 2022 +0200

    docs: Explain the indexation workflow for extrinsic metadata

commit 724034de625f3a388a261e1eed3e6a2c9620c539
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 5 17:46:33 2022 +0200

    docs: Update description of the metadata workflow
    
    1. indexers call themselves directly instead of going through the scheduler
    2. metadata is attached to directories instead of revisions

commit 3458892274226aabe490a795abe5d6fce990be99
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 5 16:43:02 2022 +0200

    github: Translate stargazers_count and watchers_count

commit e177c77baf48b69e420a3eed6b9125a7f209947f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Jul 4 18:26:24 2022 +0200

    Simplify codemeta.make_absolute_uri()

commit dd9adebeca15c697cc27011693c8d84f6ec1544e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Jul 4 18:25:25 2022 +0200

    Document codemeta.make_absolute_uri()

commit 358ee08416dd847d7ebbddd0c721d7a287149175
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Jul 4 13:55:19 2022 +0200

    Use compact URIs for ForgeFed and ActivityStreams
    
    It makes resulting documents (usually) shorter, and tests more readable.

commit d41f26eef0561fd41932eb688bc6908f2253ef4c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Jul 4 13:37:58 2022 +0200

    Use separate base classes for intrinsic and extrinsic mappings
    
    detect_metadata_files and extrinsic_metadata_formats (respectively) are somewhat
    mutually exclusive, so it does not make much sense to have them in the same
    class and MAPPINGS dict

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/343/ for more details.
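
Note on commit d41f26ee ("Use separate base classes for intrinsic and extrinsic mappings"): the split it describes can be sketched roughly as below. This is only an illustration, not the code in swh/indexer/metadata_dictionary/base.py. The method names detect_metadata_files and extrinsic_metadata_formats come from the commit message; every class name, registry name, signature, and the format string are assumptions made for the example.

```python
from typing import Dict, List, Tuple, Type


class IntrinsicMapping:
    """Hypothetical base class: metadata read from files inside the source tree."""

    name: str = ""

    @classmethod
    def detect_metadata_files(cls, file_names: List[bytes]) -> List[bytes]:
        """Return the file names this mapping knows how to translate."""
        raise NotImplementedError


class ExtrinsicMapping:
    """Hypothetical base class: metadata fetched from outside the source tree."""

    name: str = ""

    @classmethod
    def extrinsic_metadata_formats(cls) -> Tuple[str, ...]:
        """Return the metadata format identifiers this mapping accepts."""
        raise NotImplementedError


class ToyNpmMapping(IntrinsicMapping):
    name = "toy-npm"

    @classmethod
    def detect_metadata_files(cls, file_names: List[bytes]) -> List[bytes]:
        # Only a file named exactly package.json is of interest here.
        return [n for n in file_names if n == b"package.json"]


class ToyGitHubMapping(ExtrinsicMapping):
    name = "toy-github"

    @classmethod
    def extrinsic_metadata_formats(cls) -> Tuple[str, ...]:
        # Format identifier assumed for the example.
        return ("application/vnd.github.v3+json",)


# Two registries instead of a single shared dict (names are illustrative).
INTRINSIC_REGISTRY: Dict[str, Type[IntrinsicMapping]] = {
    ToyNpmMapping.name: ToyNpmMapping,
}
EXTRINSIC_REGISTRY: Dict[str, Type[ExtrinsicMapping]] = {
    ToyGitHubMapping.name: ToyGitHubMapping,
}


def mappings_for_format(metadata_format: str) -> List[Type[ExtrinsicMapping]]:
    """Select the extrinsic mappings that declare support for a format."""
    return [
        mapping
        for mapping in EXTRINSIC_REGISTRY.values()
        if metadata_format in mapping.extrinsic_metadata_formats()
    ]
```

With this layout, an extrinsic mapping never has to stub out file detection, and an intrinsic mapping never has to declare formats it does not consume, which seems to be the mutual exclusivity the commit message refers to.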

anlambert added a subscriber: anlambert.
anlambert added inline comments.
docs/metadata-workflow.rst

Line 98: s/pull/pulled/

Line 120: "owners" or owners

Line 180: s/Gitlab/GitLab/

This revision is now accepted and ready to land. Jul 7 2022, 4:36 PM
vlorentz marked 3 inline comments as done.

Fix typos

This revision was landed with ongoing or failed builds. Jul 11 2022, 5:30 PM
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D8087 (id=29289)

Rebasing onto 724034de62...

Current branch diff-target is up to date.
Changes applied before test
commit 2dd2be9c1b17cdd8ac5520acc3935a2f34be98c8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 6 16:17:22 2022 +0200

    docs: Explain the indexation workflow for extrinsic metadata

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/356/ for more details.