Add support for origin_extrinsic_metadata to the storage
Closed, Public

Authored by vlorentz on Jun 30 2022, 3:28 PM.

Details

Summary

This is entirely copy-pasted from the origin_intrinsic_metadata-related code...

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8058 (id=29078)

Could not rebase; Attempt merge onto 1be4e184d4...

Updating 1be4e18..49221e1
Fast-forward
 swh/indexer/codemeta.py                            |    2 +
 swh/indexer/metadata.py                            |   41 +-
 swh/indexer/metadata_detector.py                   |    8 +-
 swh/indexer/metadata_dictionary/__init__.py        |   12 +-
 swh/indexer/metadata_dictionary/base.py            |   80 +-
 swh/indexer/metadata_dictionary/github.py          |   78 ++
 swh/indexer/metadata_dictionary/npm.py             |    4 +-
 swh/indexer/metadata_dictionary/ruby.py            |    7 +-
 swh/indexer/sql/30-schema.sql                      |   16 +
 swh/indexer/sql/50-func.sql                        |  121 +-
 swh/indexer/sql/60-indexes.sql                     |   10 +
 swh/indexer/storage/__init__.py                    |   47 +
 swh/indexer/storage/db.py                          |   29 +
 swh/indexer/storage/in_memory.py                   |   13 +
 swh/indexer/storage/interface.py                   |   29 +
 swh/indexer/storage/model.py                       |   12 +
 swh/indexer/tests/metadata_dictionary/__init__.py  |    0
 swh/indexer/tests/metadata_dictionary/test_cff.py  |  220 ++++
 .../tests/metadata_dictionary/test_codemeta.py     |  175 +++
 .../tests/metadata_dictionary/test_github.py       |  126 ++
 .../tests/metadata_dictionary/test_maven.py        |  365 ++++++
 swh/indexer/tests/metadata_dictionary/test_npm.py  |  322 +++++
 .../tests/metadata_dictionary/test_python.py       |  114 ++
 swh/indexer/tests/metadata_dictionary/test_ruby.py |  134 ++
 swh/indexer/tests/storage/test_storage.py          |  248 ++++
 swh/indexer/tests/test_cli.py                      |    1 +
 swh/indexer/tests/test_metadata.py                 | 1277 --------------------
 swh/indexer/tests/zz_celery/README                 |    2 +
 swh/indexer/tests/zz_celery/__init__.py            |    0
 swh/indexer/tests/{ => zz_celery}/test_tasks.py    |    0
 30 files changed, 2149 insertions(+), 1344 deletions(-)
 create mode 100644 swh/indexer/metadata_dictionary/github.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/__init__.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_cff.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_codemeta.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_github.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_maven.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_npm.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_python.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_ruby.py
 create mode 100644 swh/indexer/tests/zz_celery/README
 create mode 100644 swh/indexer/tests/zz_celery/__init__.py
 rename swh/indexer/tests/{ => zz_celery}/test_tasks.py (100%)
Changes applied before test
commit 49221e13a976009cd9d2306404987b541239567c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jun 30 15:27:45 2022 +0200

    Add support for origin_extrinsic_metadata to the storage

commit 1b0eb35fb8e4cc41d5718bc947084a4a827cbde6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jun 30 14:52:08 2022 +0200

    reorder SQL functions

commit 2da2bc3ce7b73b775dbba22d3763cea1526a544b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jun 30 10:54:03 2022 +0200

    github mapping: Add support for more terms from the Codemeta crosswalk

commit a20610b2503ec503e9c0d3b26e809d779c3dbc8e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jun 29 18:08:00 2022 +0200

    github mapping: Add support for terms outside the codemeta context

commit 07074b9eec29880698469a623133a94a7122b731
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jun 29 17:53:03 2022 +0200

    Add minimal GitHub metadata mapping
    
    This introduces the scaffolding for extrinsic metadata mappings

commit 244bf36f55fb919f9b9da8503db309a6d816fd30
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jun 29 13:43:14 2022 +0200

    Move mapping-specific tests to a new directory
    
    We have many of those now, and keeping all their tests in the same file
    is messy
    
    This causes these tests to run after Celery tests, which breaks them;
    so this commit also renames Celery tests to make them run last.

commit e002b2ee66b305c98a153cc2b57088c179a3fc68
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jun 29 12:19:00 2022 +0200

    Remove given/when/then comments

commit 65edef32831949de7b8e14846ecd4fa43bc619ee
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jun 29 12:08:49 2022 +0200

    Remove SingleFileMapping from JsonMapping's base classes
    
    Extrinsic metadata indexers will not use a 'file' as input,
    but will typically use RawExtrinsicMetadata containing formats
    in JSON.

commit f7a4bf4e04b3ac4c2fa89cf9b8a5c22e5f0c4d12
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jun 29 11:01:35 2022 +0200

    Add typing to detect_metadata() and related functions

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/298/ for more details.
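
The commit above that removes SingleFileMapping from JsonMapping's base classes can be illustrated with a small sketch. Only the names `SingleFileMapping` and `JsonMapping` come from the commit message. `BaseMapping`, `PackageJsonMapping`, `ForgeApiMapping`, and every method and attribute below are hypothetical and are not taken from the swh.indexer code. The idea, under those assumptions, is that JSON parsing and "read one named file" become independent traits, so a mapping fed by an API payload can reuse the former without inheriting the latter.

```python
import json
from typing import Any, Dict, List, Optional


class BaseMapping:
    """Hypothetical root class shared by all mappings."""

    name: str = "base"

    def translate(self, raw_content: bytes) -> Optional[Dict[str, Any]]:
        raise NotImplementedError


class SingleFileMapping(BaseMapping):
    """Hypothetical trait for mappings that read one named file."""

    filename: bytes = b""

    def detect(self, filenames: List[bytes]) -> bool:
        # True when the expected file is present in a directory listing.
        return self.filename in filenames


class JsonMapping(BaseMapping):
    """Parses JSON bytes; deliberately NOT a subclass of SingleFileMapping."""

    def translate(self, raw_content: bytes) -> Optional[Dict[str, Any]]:
        try:
            data = json.loads(raw_content.decode())
        except (UnicodeDecodeError, json.JSONDecodeError):
            return None
        if not isinstance(data, dict):
            return None
        return self.translate_dict(data)

    def translate_dict(self, data: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


class PackageJsonMapping(JsonMapping, SingleFileMapping):
    """Intrinsic case (assumed): JSON read from a file in the source tree."""

    name = "package-json"
    filename = b"package.json"

    def translate_dict(self, data: Dict[str, Any]) -> Dict[str, Any]:
        return {"name": data.get("name")}


class ForgeApiMapping(JsonMapping):
    """Extrinsic case (assumed): JSON from a forge API payload, no file."""

    name = "forge-api"

    def translate_dict(self, data: Dict[str, Any]) -> Dict[str, Any]:
        return {"name": data.get("full_name")}
```

In this sketch a file-based mapping opts into both traits explicitly, while the extrinsic mapping has no `detect()` or `filename` at all, which matches the commit's stated reason that extrinsic metadata indexers do not take a file as input.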

douardda added a subscriber: douardda.

Shouldn't there be a migration script as well?

This revision is now accepted and ready to land. Jul 4 2022, 12:21 PM

Build is green

Patch application report for D8058 (id=29125)

Could not rebase; Attempt merge onto 3074268b1b...

Updating 3074268..f26b4c8
Fast-forward
 swh/indexer/codemeta.py                            |   2 +
 swh/indexer/metadata_dictionary/__init__.py        |  12 +-
 swh/indexer/metadata_dictionary/base.py            |  54 ++++-
 swh/indexer/metadata_dictionary/github.py          |  78 +++++++
 swh/indexer/sql/30-schema.sql                      |  16 ++
 swh/indexer/sql/50-func.sql                        | 121 ++++++++--
 swh/indexer/sql/60-indexes.sql                     |  10 +
 swh/indexer/sql/upgrades/135.sql                   | 106 +++++++++
 swh/indexer/storage/__init__.py                    |  49 +++-
 swh/indexer/storage/db.py                          |  29 +++
 swh/indexer/storage/in_memory.py                   |  13 ++
 swh/indexer/storage/interface.py                   |  29 +++
 swh/indexer/storage/model.py                       |  12 +
 .../tests/metadata_dictionary/test_github.py       | 126 +++++++++++
 swh/indexer/tests/storage/test_storage.py          | 248 +++++++++++++++++++++
 swh/indexer/tests/test_cli.py                      |   1 +
 16 files changed, 873 insertions(+), 33 deletions(-)
 create mode 100644 swh/indexer/metadata_dictionary/github.py
 create mode 100644 swh/indexer/sql/upgrades/135.sql
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_github.py
Changes applied before test
commit f26b4c8b1ca771fa73cad78416001a464706445f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jun 30 15:27:45 2022 +0200

    Add support for origin_extrinsic_metadata to the storage

commit db02285bee9b4a6d017ef040753573647b04f930
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jun 30 14:52:08 2022 +0200

    reorder SQL functions

commit 151a3b8a2b698c999a0efb4f2ee7f5076d8a3076
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jun 30 10:54:03 2022 +0200

    github mapping: Add support for more terms from the Codemeta crosswalk

commit 8948c83972512326bd11eebaf0354b92747a8718
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jun 29 18:08:00 2022 +0200

    github mapping: Add support for terms outside the codemeta context

commit 9085cae01009f19a00a9c3b1e56eeb138e4f2775
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jun 29 17:53:03 2022 +0200

    Add minimal GitHub metadata mapping
    
    This introduces the scaffolding for extrinsic metadata mappings

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/317/ for more details.