this is entirely copy-pasted from the origin_intrinsic_metadata-related code...
Details
- Reviewers: douardda
- Group Reviewers: Reviewers
- Maniphest Tasks: T2073: Index extrinsic metadata from the journal in swh-search/Elasticsearch
- Commits: rDCIDXf26b4c8b1ca7: Add support for origin_extrinsic_metadata to the storage
Diff Detail
- Repository: rDCIDX Metadata indexer
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 30217
- Build 47233: Phabricator diff pipeline on jenkins
- Build 47232: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D8058 (id=29078)
Could not rebase; Attempt merge onto 1be4e184d4...
Updating 1be4e18..49221e1
Fast-forward
swh/indexer/codemeta.py | 2 +
swh/indexer/metadata.py | 41 +-
swh/indexer/metadata_detector.py | 8 +-
swh/indexer/metadata_dictionary/__init__.py | 12 +-
swh/indexer/metadata_dictionary/base.py | 80 +-
swh/indexer/metadata_dictionary/github.py | 78 ++
swh/indexer/metadata_dictionary/npm.py | 4 +-
swh/indexer/metadata_dictionary/ruby.py | 7 +-
swh/indexer/sql/30-schema.sql | 16 +
swh/indexer/sql/50-func.sql | 121 +-
swh/indexer/sql/60-indexes.sql | 10 +
swh/indexer/storage/__init__.py | 47 +
swh/indexer/storage/db.py | 29 +
swh/indexer/storage/in_memory.py | 13 +
swh/indexer/storage/interface.py | 29 +
swh/indexer/storage/model.py | 12 +
swh/indexer/tests/metadata_dictionary/__init__.py | 0
swh/indexer/tests/metadata_dictionary/test_cff.py | 220 ++++
.../tests/metadata_dictionary/test_codemeta.py | 175 +++
.../tests/metadata_dictionary/test_github.py | 126 ++
.../tests/metadata_dictionary/test_maven.py | 365 ++++++
swh/indexer/tests/metadata_dictionary/test_npm.py | 322 +++++
.../tests/metadata_dictionary/test_python.py | 114 ++
swh/indexer/tests/metadata_dictionary/test_ruby.py | 134 ++
swh/indexer/tests/storage/test_storage.py | 248 ++++
swh/indexer/tests/test_cli.py | 1 +
swh/indexer/tests/test_metadata.py | 1277 --------------------
swh/indexer/tests/zz_celery/README | 2 +
swh/indexer/tests/zz_celery/__init__.py | 0
swh/indexer/tests/{ => zz_celery}/test_tasks.py | 0
30 files changed, 2149 insertions(+), 1344 deletions(-)
create mode 100644 swh/indexer/metadata_dictionary/github.py
create mode 100644 swh/indexer/tests/metadata_dictionary/__init__.py
create mode 100644 swh/indexer/tests/metadata_dictionary/test_cff.py
create mode 100644 swh/indexer/tests/metadata_dictionary/test_codemeta.py
create mode 100644 swh/indexer/tests/metadata_dictionary/test_github.py
create mode 100644 swh/indexer/tests/metadata_dictionary/test_maven.py
create mode 100644 swh/indexer/tests/metadata_dictionary/test_npm.py
create mode 100644 swh/indexer/tests/metadata_dictionary/test_python.py
create mode 100644 swh/indexer/tests/metadata_dictionary/test_ruby.py
create mode 100644 swh/indexer/tests/zz_celery/README
create mode 100644 swh/indexer/tests/zz_celery/__init__.py
rename swh/indexer/tests/{ => zz_celery}/test_tasks.py (100%)
Changes applied before test
commit 49221e13a976009cd9d2306404987b541239567c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jun 30 15:27:45 2022 +0200
Add support for origin_extrinsic_metadata to the storage
commit 1b0eb35fb8e4cc41d5718bc947084a4a827cbde6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jun 30 14:52:08 2022 +0200
reorder SQL functions
commit 2da2bc3ce7b73b775dbba22d3763cea1526a544b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jun 30 10:54:03 2022 +0200
github mapping: Add support for more terms from the Codemeta crosswalk
commit a20610b2503ec503e9c0d3b26e809d779c3dbc8e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Jun 29 18:08:00 2022 +0200
github mapping: Add support for terms outside the codemeta context
commit 07074b9eec29880698469a623133a94a7122b731
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Jun 29 17:53:03 2022 +0200
Add minimal GitHub metadata mapping
This introduces the scaffolding for extrinsic metadata mappings
commit 244bf36f55fb919f9b9da8503db309a6d816fd30
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Jun 29 13:43:14 2022 +0200
Move mapping-specific tests to a new directory
We have many of those now, and keeping all their tests in the same file
is messy.
This causes these tests to run after Celery tests, which breaks them;
so this commit also renames Celery tests to make them run last.
commit e002b2ee66b305c98a153cc2b57088c179a3fc68
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Jun 29 12:19:00 2022 +0200
Remove given/when/then comments
commit 65edef32831949de7b8e14846ecd4fa43bc619ee
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Jun 29 12:08:49 2022 +0200
Remove SingleFileMapping from JsonMapping's base classes
Extrinsic metadata indexers will not use a 'file' as input,
but will typically use RawExtrinsicMetadata containing formats
in JSON.
commit f7a4bf4e04b3ac4c2fa89cf9b8a5c22e5f0c4d12
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Jun 29 11:01:35 2022 +0200
Add typing to detect_metadata() and related functions
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/298/ for more details.
Build is green
Patch application report for D8058 (id=29125)
Could not rebase; Attempt merge onto 3074268b1b...
Updating 3074268..f26b4c8
Fast-forward
swh/indexer/codemeta.py | 2 +
swh/indexer/metadata_dictionary/__init__.py | 12 +-
swh/indexer/metadata_dictionary/base.py | 54 ++++-
swh/indexer/metadata_dictionary/github.py | 78 +++++++
swh/indexer/sql/30-schema.sql | 16 ++
swh/indexer/sql/50-func.sql | 121 ++++++++--
swh/indexer/sql/60-indexes.sql | 10 +
swh/indexer/sql/upgrades/135.sql | 106 +++++++++
swh/indexer/storage/__init__.py | 49 +++-
swh/indexer/storage/db.py | 29 +++
swh/indexer/storage/in_memory.py | 13 ++
swh/indexer/storage/interface.py | 29 +++
swh/indexer/storage/model.py | 12 +
.../tests/metadata_dictionary/test_github.py | 126 +++++++++++
swh/indexer/tests/storage/test_storage.py | 248 +++++++++++++++++++++
swh/indexer/tests/test_cli.py | 1 +
16 files changed, 873 insertions(+), 33 deletions(-)
create mode 100644 swh/indexer/metadata_dictionary/github.py
create mode 100644 swh/indexer/sql/upgrades/135.sql
create mode 100644 swh/indexer/tests/metadata_dictionary/test_github.py
Changes applied before test
commit f26b4c8b1ca771fa73cad78416001a464706445f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jun 30 15:27:45 2022 +0200
Add support for origin_extrinsic_metadata to the storage
commit db02285bee9b4a6d017ef040753573647b04f930
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jun 30 14:52:08 2022 +0200
reorder SQL functions
commit 151a3b8a2b698c999a0efb4f2ee7f5076d8a3076
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Jun 30 10:54:03 2022 +0200
github mapping: Add support for more terms from the Codemeta crosswalk
commit 8948c83972512326bd11eebaf0354b92747a8718
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Jun 29 18:08:00 2022 +0200
github mapping: Add support for terms outside the codemeta context
commit 9085cae01009f19a00a9c3b1e56eeb138e4f2775
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Jun 29 17:53:03 2022 +0200
Add minimal GitHub metadata mapping
This introduces the scaffolding for extrinsic metadata mappings
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/317/ for more details.