docs: Update description of the metadata workflow
Closed, Public

Authored by vlorentz on Jul 6 2022, 4:05 PM.

Details

Summary
  1. indexers call themselves directly instead of going through the scheduler
  2. metadata is attached to directories instead of revisions

Depends on D8084
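
To illustrate the two points above, here is a minimal sketch. It is not the actual swh.indexer code: the class names, method signatures, and the toy file detection are all hypothetical and only show the shape of the change (a direct in-process call instead of a scheduled task, and results keyed by directory rather than by revision).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DirectoryMetadataIndexer:
    """Hypothetical indexer that stores metadata keyed by directory id."""

    results: Dict[str, dict] = field(default_factory=dict)

    def index(self, directory_id: str, files: Dict[str, str]) -> dict:
        # Toy stand-in for metadata detection: list the known metadata
        # files found among the directory's entries.
        known = {"package.json", "codemeta.json", "pom.xml"}
        metadata = {"detected_files": sorted(known.intersection(files))}
        # Point 2: the result is attached to the directory, not a revision.
        self.results[directory_id] = metadata
        return metadata


@dataclass
class OriginMetadataIndexer:
    """Hypothetical indexer that calls the directory indexer directly."""

    directory_indexer: DirectoryMetadataIndexer

    def index(
        self, origin_url: str, root_directory_id: str, files: Dict[str, str]
    ) -> dict:
        # Point 1: a direct call; no task is handed to a scheduler.
        metadata = self.directory_indexer.index(root_directory_id, files)
        return {
            "origin": origin_url,
            "from_directory": root_directory_id,
            "metadata": metadata,
        }
```

In this sketch, indexing an origin returns immediately with the metadata of its root directory, and the same metadata can be looked up again through the directory id alone.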

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D8085 (id=29188)

Could not rebase; Attempt merge onto c2742b5b75...

Updating c2742b5..085837e
Fast-forward
 docs/images/tasks-metadata-indexers.uml            | 57 ++++++++----------
 docs/metadata-workflow.rst                         | 29 +++++----
 swh/indexer/codemeta.py                            | 57 +++++++++++-------
 swh/indexer/metadata.py                            | 10 ++--
 swh/indexer/metadata_detector.py                   |  4 +-
 swh/indexer/metadata_dictionary/__init__.py        | 12 +++-
 swh/indexer/metadata_dictionary/base.py            | 63 ++++++++++++--------
 swh/indexer/metadata_dictionary/cff.py             |  4 +-
 swh/indexer/metadata_dictionary/codemeta.py        |  4 +-
 swh/indexer/metadata_dictionary/composer.py        |  4 +-
 swh/indexer/metadata_dictionary/github.py          | 68 +++++++++++++++++++---
 swh/indexer/metadata_dictionary/maven.py           |  4 +-
 swh/indexer/metadata_dictionary/npm.py             |  4 +-
 swh/indexer/metadata_dictionary/python.py          |  4 +-
 swh/indexer/metadata_dictionary/ruby.py            | 11 +---
 .../tests/metadata_dictionary/test_github.py       | 26 +++++++--
 16 files changed, 229 insertions(+), 132 deletions(-)
Changes applied before test
commit 085837edeea62d1439b9cfeb265b28ab858a5e7e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 5 17:46:33 2022 +0200

    docs: Update description of the metadata workflow
    
    1. indexers call themselves directly instead of going through the scheduler
    2. metadata is attached to directories instead of revisions

commit 3458892274226aabe490a795abe5d6fce990be99
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 5 16:43:02 2022 +0200

    github: Translate stargazers_count and watchers_count

commit e177c77baf48b69e420a3eed6b9125a7f209947f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Jul 4 18:26:24 2022 +0200

    Simplify codemeta.make_absolute_uri()

commit dd9adebeca15c697cc27011693c8d84f6ec1544e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Jul 4 18:25:25 2022 +0200

    Document codemeta.make_absolute_uri()

commit 358ee08416dd847d7ebbddd0c721d7a287149175
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Jul 4 13:55:19 2022 +0200

    Use compact URIs for ForgeFed and ActivityStreams
    
    It makes resulting documents (usually) shorter, and tests more readable.

commit d41f26eef0561fd41932eb688bc6908f2253ef4c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Jul 4 13:37:58 2022 +0200

    Use separate base classes for intrinsic and extrinsic mappings
    
    detect_metadata_files and extrinsic_metadata_formats (respectively) are somewhat
    mutually exclusive, so it does not make much sense to have them in the same
    class and MAPPINGS dict

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/341/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/341/console

Harbormaster returned this revision to the author for changes because remote builds failed. Jul 6 2022, 4:12 PM
Harbormaster failed remote builds in B30277: Diff 29188!

Build is green

Patch application report for D8085 (id=29191)

Could not rebase; Attempt merge onto c2742b5b75...

Updating c2742b5..724034d
Fast-forward
 docs/images/tasks-metadata-indexers.uml            | 57 ++++++++----------
 docs/metadata-workflow.rst                         | 31 +++++-----
 swh/indexer/codemeta.py                            | 57 +++++++++++-------
 swh/indexer/metadata.py                            | 10 ++--
 swh/indexer/metadata_detector.py                   |  4 +-
 swh/indexer/metadata_dictionary/__init__.py        | 12 +++-
 swh/indexer/metadata_dictionary/base.py            | 63 ++++++++++++--------
 swh/indexer/metadata_dictionary/cff.py             |  4 +-
 swh/indexer/metadata_dictionary/codemeta.py        |  4 +-
 swh/indexer/metadata_dictionary/composer.py        |  4 +-
 swh/indexer/metadata_dictionary/github.py          | 68 +++++++++++++++++++---
 swh/indexer/metadata_dictionary/maven.py           |  4 +-
 swh/indexer/metadata_dictionary/npm.py             |  4 +-
 swh/indexer/metadata_dictionary/python.py          |  4 +-
 swh/indexer/metadata_dictionary/ruby.py            | 11 +---
 .../tests/metadata_dictionary/test_github.py       | 26 +++++++--
 16 files changed, 230 insertions(+), 133 deletions(-)
Changes applied before test
commit 724034de625f3a388a261e1eed3e6a2c9620c539
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 5 17:46:33 2022 +0200

    docs: Update description of the metadata workflow
    
    1. indexers call themselves directly instead of going through the scheduler
    2. metadata is attached to directories instead of revisions

commit 3458892274226aabe490a795abe5d6fce990be99
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 5 16:43:02 2022 +0200

    github: Translate stargazers_count and watchers_count

commit e177c77baf48b69e420a3eed6b9125a7f209947f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Jul 4 18:26:24 2022 +0200

    Simplify codemeta.make_absolute_uri()

commit dd9adebeca15c697cc27011693c8d84f6ec1544e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Jul 4 18:25:25 2022 +0200

    Document codemeta.make_absolute_uri()

commit 358ee08416dd847d7ebbddd0c721d7a287149175
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Jul 4 13:55:19 2022 +0200

    Use compact URIs for ForgeFed and ActivityStreams
    
    It makes resulting documents (usually) shorter, and tests more readable.

commit d41f26eef0561fd41932eb688bc6908f2253ef4c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Jul 4 13:37:58 2022 +0200

    Use separate base classes for intrinsic and extrinsic mappings
    
    detect_metadata_files and extrinsic_metadata_formats (respectively) are somewhat
    mutually exclusive, so it does not make much sense to have them in the same
    class and MAPPINGS dict

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/342/ for more details.

This revision is now accepted and ready to land. Jul 7 2022, 4:38 PM