Depends on D8279.
Details
- Reviewers
ardumont
- Group Reviewers
Reviewers
- Commits
- rDCIDX39bb4211697b: Add class attribute 'uri_fields' to deduplicate simply URI fields
Diff Detail
- Repository
- rDCIDX Metadata indexer
- Lint
No Linters Available
- Unit
No Unit Test Coverage
- Build Status
Buildable 30958
Build 48423: Phabricator diff pipeline on jenkins
Build 48422: arc lint + arc unit
Event Timeline
Build has FAILED
Patch application report for D8282 (id=29903)
Could not rebase; Attempt merge onto 7d7f29fb6a...
Updating 7d7f29f..416543b
Fast-forward
docs/metadata-workflow.rst | 2 +-
mypy.ini | 3 +
requirements.txt | 1 +
swh/indexer/codemeta.py | 64 +++------
swh/indexer/metadata_dictionary/base.py | 146 +++++++++++++++++----
swh/indexer/metadata_dictionary/cff.py | 82 +++++++-----
swh/indexer/metadata_dictionary/composer.py | 41 +++---
swh/indexer/metadata_dictionary/dart.py | 47 +++----
swh/indexer/metadata_dictionary/github.py | 135 +++++++++----------
swh/indexer/metadata_dictionary/maven.py | 99 +++++++-------
swh/indexer/metadata_dictionary/npm.py | 141 +++++++++++++-------
swh/indexer/metadata_dictionary/nuget.py | 76 +++++------
swh/indexer/metadata_dictionary/python.py | 44 ++++---
swh/indexer/metadata_dictionary/ruby.py | 47 +++----
swh/indexer/metadata_dictionary/utils.py | 72 ++++++++++
swh/indexer/namespaces.py | 12 ++
swh/indexer/tests/metadata_dictionary/test_cff.py | 11 +-
.../tests/metadata_dictionary/test_composer.py | 7 +-
swh/indexer/tests/metadata_dictionary/test_dart.py | 17 ++-
.../tests/metadata_dictionary/test_github.py | 2 +-
.../tests/metadata_dictionary/test_maven.py | 32 ++---
swh/indexer/tests/metadata_dictionary/test_npm.py | 7 +-
.../tests/metadata_dictionary/test_nuget.py | 53 ++++----
.../tests/metadata_dictionary/test_python.py | 9 +-
swh/indexer/tests/metadata_dictionary/test_ruby.py | 2 +
swh/indexer/tests/test_codemeta.py | 32 +----
swh/indexer/tests/test_origin_metadata.py | 6 +-
swh/indexer/tests/utils.py | 13 --
28 files changed, 668 insertions(+), 535 deletions(-)
create mode 100644 swh/indexer/metadata_dictionary/utils.py
create mode 100644 swh/indexer/namespaces.py
Changes applied before test
commit 416543b631693d37328e2f7d068df54729858914
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 16:03:05 2022 +0200
Add class attribute 'uri_fields' to deduplicate simply URI fields
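A minimal sketch of what a class-level `uri_fields` attribute can buy, assuming a base mapping that loops over the listed fields and emits one triple per valid URI value. The class names, the `mapping` dict, the `translate(subject, content)` signature and the http(s)-only validity check are hypothetical illustrations, not the swh-indexer code; the property IRIs are schema.org terms picked as examples.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlparse

# (subject, predicate, object); plain strings keep the sketch small
Triple = Tuple[str, str, str]


class HypotheticalBaseMapping:
    # Hypothetical: source field name -> output property IRI
    mapping: Dict[str, str] = {}
    # Fields whose values are plain URIs and need no field-specific code
    uri_fields: List[str] = []

    def translate(self, subject: str, content: Dict[str, Any]) -> List[Triple]:
        triples: List[Triple] = []
        # One generic loop instead of a near-identical method per URI field
        for field in self.uri_fields:
            value = content.get(field)
            if not isinstance(value, str):
                continue
            parsed = urlparse(value)
            # Keep only values that look like absolute http(s) URIs
            if parsed.scheme in ("http", "https") and parsed.netloc:
                triples.append((subject, self.mapping[field], value))
        return triples


class HypotheticalPackageMapping(HypotheticalBaseMapping):
    mapping = {
        "homepage": "http://schema.org/url",
        "download": "http://schema.org/downloadUrl",
    }
    uri_fields = ["homepage", "download"]
```

With this layout, supporting another plain-URI field is a one-line change to `uri_fields` (plus its `mapping` entry) rather than a new per-field method.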
commit d9732bd541099164a1ac0bba4e176270a631c172
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 14:20:20 2022 +0200
Refactor metadata mappings using rdflib.Graph instead of JSON-LD internally
Motivation:
1. It makes it easier to visualize what is actually happening when modifying
the graph, by working explicitly on triples instead of a JSON-LD (a tree
serialization of the graph).
2. Remove the need for the hacky `merge_values()` function (and possibly
`merge_documents()` in a future commit)
3. It also catches malformed data exactly where it is added in the document
(the call to rdflib.Graph.add()) instead of at the end of the mapping
when running compaction/expansion.
Downsides:
1. Tests are clunkier, because they relied on deterministic order of
unordered lists; but rdflib does not guarantee it
2. Code is longer
3. Extra dependency (which we will need at some point if we want to
import from RDF datasets, anyway)
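The commit builds the document with rdflib's `Graph`, whose `add()` method takes a `(subject, predicate, object)` tuple. The stdlib-only toy below is not rdflib; it only models the two properties the motivation and downsides above rely on: a graph is an unordered set of triples (so duplicate values collapse without a `merge_values()`-style step, and no order is preserved), and malformed input can be rejected at the `add()` call. `TripleSet` and the `URIRef`/`Literal` dataclasses are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class URIRef:
    """Toy stand-in for an IRI node (not rdflib.URIRef)."""
    value: str


@dataclass(frozen=True)
class Literal:
    """Toy stand-in for a literal value (not rdflib.Literal)."""
    value: str


Node = Union[URIRef, Literal]


class TripleSet:
    """Toy model of a graph: an unordered set of (s, p, o) triples."""

    def __init__(self) -> None:
        self._triples: Set[Tuple[URIRef, URIRef, Node]] = set()

    def add(self, triple: Tuple[object, object, object]) -> None:
        s, p, o = triple
        # Malformed input fails here, where it is added, instead of
        # surfacing later during JSON-LD compaction/expansion.
        if not isinstance(s, URIRef) or not isinstance(p, URIRef):
            raise TypeError(f"subject and predicate must be URIRefs: {triple!r}")
        if not isinstance(o, (URIRef, Literal)):
            raise TypeError(f"object must be a URIRef or Literal: {o!r}")
        # Set semantics: re-adding an existing triple is a no-op, so
        # duplicate values never need an explicit merge step.
        self._triples.add((s, p, o))

    def objects(self, s: URIRef, p: URIRef) -> Set[Node]:
        # Returned as a set: no ordering is promised
        return {o for (s2, p2, o) in self._triples if s2 == s and p2 == p}

    def __len__(self) -> int:
        return len(self._triples)
```

For example, adding the same `(root, schema:keywords, "foo")` triple twice leaves a single triple, and passing a plain `str` as the object raises `TypeError` immediately.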
commit b9f206bfe0a8e2592708e7f1728643654564f32a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 15:47:49 2022 +0200
Remove 'keywords' from test files
Their order is nondeterministic; it just happens to work with
the way we use PyLD.
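The commit sidesteps the problem by dropping the field from the test files. For comparison only, a test could instead compare documents order-insensitively; `normalize_unordered` below is a hypothetical helper, not part of swh-indexer, and it treats every list as unordered, which is only correct for fields whose order carries no meaning.

```python
import json
from typing import Any


def normalize_unordered(doc: Any) -> Any:
    """Recursively replace lists with sorted tuples, so that two JSON-like
    documents compare equal regardless of the order of multi-valued fields.

    Items are sorted by a canonical JSON dump (sort_keys=True), so dicts
    and mixed value types can still be ordered deterministically.
    """
    if isinstance(doc, dict):
        return {key: normalize_unordered(value) for key, value in doc.items()}
    if isinstance(doc, list):
        items = [normalize_unordered(item) for item in doc]
        return tuple(sorted(items, key=lambda item: json.dumps(item, sort_keys=True)))
    return doc
```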
commit d5207e9521d982d2c170399b4855de3b6d9e8005
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 13:20:14 2022 +0200
Add base XmlMapping to deduplicate between MavenMapping and NugetMapping
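A sketch of the kind of shared base class this commit describes, assuming the common part is "parse the XML, check the root element, hand over the children". `HypotheticalXmlMapping`, its `root_tag` attribute and the flat dict it returns are assumptions for illustration (the real mappings produce CodeMeta metadata); the root element names `project` (Maven `pom.xml`) and `package` (NuGet `.nuspec`) are the standard ones.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, Optional


class HypotheticalXmlMapping:
    """Shared XML handling; subclasses only declare the expected root."""

    root_tag: str = ""  # local (namespace-free) name of the root element

    def parse(self, raw: bytes) -> Optional[Dict[str, Any]]:
        try:
            root = ET.fromstring(raw)
        except ET.ParseError:
            return None  # malformed XML: nothing to translate
        # Drop the "{namespace}" prefix ElementTree puts on qualified tags
        if root.tag.rsplit("}", 1)[-1] != self.root_tag:
            return None
        # Flatten direct children only: local tag name -> stripped text
        return {
            child.tag.rsplit("}", 1)[-1]: (child.text or "").strip()
            for child in root
        }


class HypotheticalMavenMapping(HypotheticalXmlMapping):
    root_tag = "project"


class HypotheticalNugetMapping(HypotheticalXmlMapping):
    root_tag = "package"
```

Only direct children of the root are flattened here; a `.nuspec` keeps its fields under `<metadata>`, so a real NuGet subclass would need to descend one level further. Note that `xml.etree` is not hardened against maliciously crafted XML; the Python documentation points to `defusedxml` for untrusted input.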
commit c4cd68f6ae0ea92be6e312a37fcf0fe597e7616f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 13:18:39 2022 +0200
nuget: Remove test-specific code from the main class
commit b09e2bcfc73d72a670dbedfe5f8334d0036ce195
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 13:14:05 2022 +0200
nuget: Inherit directly from BaseIntrinsicMapping
commit c8a4571c8763c84c064e832f598e949407e0e429
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 19 12:18:30 2022 +0200
Replace 'normalize' parameter of _translate_dict() with a hook method
This parameter was only used to execute extra code before
`normalize_translation` is called. This caused some duplication, and
will not work when switching to a non-JSON-LD internal representation.
Removing it also makes the code of mappings more consistent, by removing
specific field handling from their implementation of the `translate`
method itself.
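A minimal sketch of the template-method shape described above: the base class owns the translation loop and calls an overridable hook just before `normalize_translation`, so subclasses no longer pass a `normalize` argument. `_translate_dict` and `normalize_translation` are names taken from the commit message; their bodies here, the `extra_translation` hook name and all signatures are hypothetical.

```python
from typing import Any, Dict


class HypotheticalDictMapping:
    """Base class: owns the loop, exposes a hook instead of a parameter."""

    mapping: Dict[str, str] = {}  # source field -> output property

    def _translate_dict(self, content: Dict[str, Any]) -> Dict[str, Any]:
        translated = {
            self.mapping[k]: v for k, v in content.items() if k in self.mapping
        }
        # Hook point: subclasses adjust the result here, before normalization
        self.extra_translation(translated, content)
        return self.normalize_translation(translated)

    def extra_translation(
        self, translated: Dict[str, Any], content: Dict[str, Any]
    ) -> None:
        """Default hook: no extra processing."""

    def normalize_translation(self, translated: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder normalization: sort keys for a stable result
        return dict(sorted(translated.items()))


class HypotheticalPythonMapping(HypotheticalDictMapping):
    mapping = {"name": "name", "version": "version"}

    def extra_translation(
        self, translated: Dict[str, Any], content: Dict[str, Any]
    ) -> None:
        # Field-specific logic lives in the hook, not in translate() itself
        if "author" in content:
            translated["author"] = {"name": content["author"]}
```

For instance, `HypotheticalPythonMapping()._translate_dict({"name": "foo", "author": "Jane"})` returns `{"author": {"name": "Jane"}, "name": "foo"}`: the author-specific rule sits in the hook rather than in `translate`.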
commit 466108c1667c88be7ff272e565ffe076e16064d8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 19 12:13:33 2022 +0200
python: Simplify translation logic of author metadata
commit 92b53419f6f9d699451609cb23a946978ecb6b07
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 19 11:37:06 2022 +0200
metadata_dictionary: Simplify code using rdflib-style namespace classes
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/447/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/447/console
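The last commit in this report refers to rdflib-style namespace classes. rdflib's `Namespace` is a `str` subclass where `SCHEMA.name` or `SCHEMA["name"]` expands to a full IRI; the stdlib sketch below imitates that behaviour. `HypotheticalNamespace` is illustrative and is not the contents of `swh/indexer/namespaces.py`; the schema.org and CodeMeta base IRIs are the usual ones.

```python
class HypotheticalNamespace(str):
    """rdflib-style namespace: attribute or item access appends a term to
    the base IRI. Names that clash with str methods (e.g. ``title``) must
    use item syntax, since normal attribute lookup finds the str method."""

    def __getattr__(self, name: str) -> str:
        # Only reached when normal lookup fails (i.e. not a str attribute)
        if name.startswith("__"):
            raise AttributeError(name)  # leave dunder protocols alone
        return str(self) + name

    def __getitem__(self, name: str) -> str:
        # Note: this replaces ordinary str indexing on the namespace object
        return str(self) + name


SCHEMA = HypotheticalNamespace("http://schema.org/")
CODEMETA = HypotheticalNamespace("https://codemeta.github.io/terms/")
```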
Build is green
Patch application report for D8282 (id=29905)
Could not rebase; Attempt merge onto 7d7f29fb6a...
Updating 7d7f29f..42e663d
Fast-forward
docs/metadata-workflow.rst | 2 +-
mypy.ini | 3 +
requirements.txt | 1 +
swh/indexer/codemeta.py | 66 +++-------
swh/indexer/metadata_dictionary/base.py | 146 +++++++++++++++++----
swh/indexer/metadata_dictionary/cff.py | 82 +++++++-----
swh/indexer/metadata_dictionary/composer.py | 41 +++---
swh/indexer/metadata_dictionary/dart.py | 47 +++----
swh/indexer/metadata_dictionary/github.py | 135 +++++++++----------
swh/indexer/metadata_dictionary/maven.py | 99 +++++++-------
swh/indexer/metadata_dictionary/npm.py | 141 +++++++++++++-------
swh/indexer/metadata_dictionary/nuget.py | 76 +++++------
swh/indexer/metadata_dictionary/python.py | 44 ++++---
swh/indexer/metadata_dictionary/ruby.py | 47 +++----
swh/indexer/metadata_dictionary/utils.py | 72 ++++++++++
swh/indexer/namespaces.py | 12 ++
swh/indexer/tests/metadata_dictionary/test_cff.py | 11 +-
.../tests/metadata_dictionary/test_composer.py | 7 +-
swh/indexer/tests/metadata_dictionary/test_dart.py | 17 ++-
.../tests/metadata_dictionary/test_github.py | 2 +-
.../tests/metadata_dictionary/test_maven.py | 32 ++---
swh/indexer/tests/metadata_dictionary/test_npm.py | 7 +-
.../tests/metadata_dictionary/test_nuget.py | 53 ++++----
.../tests/metadata_dictionary/test_python.py | 9 +-
swh/indexer/tests/metadata_dictionary/test_ruby.py | 2 +
swh/indexer/tests/test_codemeta.py | 32 +----
swh/indexer/tests/test_origin_metadata.py | 6 +-
swh/indexer/tests/utils.py | 13 --
28 files changed, 669 insertions(+), 536 deletions(-)
create mode 100644 swh/indexer/metadata_dictionary/utils.py
create mode 100644 swh/indexer/namespaces.py
Changes applied before test
commit 42e663d21818e7a2df80f5fe2d1cfd22ec90db6a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 16:03:05 2022 +0200
Add class attribute 'uri_fields' to deduplicate simply URI fields
commit 885094479a97fd37ec624f781c4c2f87fc87a1f2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 14:20:20 2022 +0200
Refactor metadata mappings using rdflib.Graph instead of JSON-LD internally
Motivation:
1. It makes it easier to visualize what is actually happening when modifying
the graph, by working explicitly on triples instead of a JSON-LD (a tree
serialization of the graph).
2. Remove the need for the hacky `merge_values()` function (and possibly
`merge_documents()` in a future commit)
3. It also catches malformed data exactly where it is added in the document
(the call to rdflib.Graph.add()) instead of at the end of the mapping
when running compaction/expansion.
Downsides:
1. Tests are clunkier, because they relied on deterministic order of
unordered lists; but rdflib does not guarantee it
2. Code is longer
3. Extra dependency (which we will need at some point if we want to
import from RDF datasets, anyway)
commit b9f206bfe0a8e2592708e7f1728643654564f32a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 15:47:49 2022 +0200
Remove 'keywords' from test files
Their order is nondeterministic; it just happens to work with
the way we use PyLD.
commit d5207e9521d982d2c170399b4855de3b6d9e8005
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 13:20:14 2022 +0200
Add base XmlMapping to deduplicate between MavenMapping and NugetMapping
commit c4cd68f6ae0ea92be6e312a37fcf0fe597e7616f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 13:18:39 2022 +0200
nuget: Remove test-specific code from the main class
commit b09e2bcfc73d72a670dbedfe5f8334d0036ce195
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 13:14:05 2022 +0200
nuget: Inherit directly from BaseIntrinsicMapping
commit c8a4571c8763c84c064e832f598e949407e0e429
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 19 12:18:30 2022 +0200
Replace 'normalize' parameter of _translate_dict() with a hook method
This parameter was only used to execute extra code before
`normalize_translation` is called. This caused some duplication, and
will not work when switching to a non-JSON-LD internal representation.
Removing it also makes the code of mappings more consistent, by removing
specific field handling from their implementation of the `translate`
method itself.
commit 466108c1667c88be7ff272e565ffe076e16064d8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 19 12:13:33 2022 +0200
python: Simplify translation logic of author metadata
commit 92b53419f6f9d699451609cb23a946978ecb6b07
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 19 11:37:06 2022 +0200
metadata_dictionary: Simplify code using rdflib-style namespace classes
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/449/ for more details.
Build was aborted
Patch application report for D8282 (id=29924)
Could not rebase; Attempt merge onto d5207e9521...
Updating d5207e9..39bb421
Fast-forward
docs/metadata-workflow.rst | 2 +-
mypy.ini | 3 +
requirements.txt | 1 +
swh/indexer/codemeta.py | 51 ++------
swh/indexer/metadata_dictionary/base.py | 98 +++++++++++----
swh/indexer/metadata_dictionary/cff.py | 81 ++++++------
swh/indexer/metadata_dictionary/composer.py | 40 +++---
swh/indexer/metadata_dictionary/dart.py | 40 +++---
swh/indexer/metadata_dictionary/github.py | 119 ++++++++----------
swh/indexer/metadata_dictionary/maven.py | 72 ++++++-----
swh/indexer/metadata_dictionary/npm.py | 138 +++++++++++++--------
swh/indexer/metadata_dictionary/nuget.py | 53 ++++----
swh/indexer/metadata_dictionary/python.py | 40 +++---
swh/indexer/metadata_dictionary/ruby.py | 44 +++----
swh/indexer/metadata_dictionary/utils.py | 72 +++++++++++
swh/indexer/namespaces.py | 20 +--
swh/indexer/tests/metadata_dictionary/test_cff.py | 11 +-
.../tests/metadata_dictionary/test_composer.py | 7 +-
swh/indexer/tests/metadata_dictionary/test_dart.py | 17 +--
.../tests/metadata_dictionary/test_github.py | 2 +-
.../tests/metadata_dictionary/test_maven.py | 32 ++---
swh/indexer/tests/metadata_dictionary/test_npm.py | 7 +-
.../tests/metadata_dictionary/test_nuget.py | 41 +++---
.../tests/metadata_dictionary/test_python.py | 9 +-
swh/indexer/tests/metadata_dictionary/test_ruby.py | 2 +
swh/indexer/tests/test_codemeta.py | 32 +----
swh/indexer/tests/test_origin_metadata.py | 6 +-
swh/indexer/tests/utils.py | 13 --
28 files changed, 573 insertions(+), 480 deletions(-)
create mode 100644 swh/indexer/metadata_dictionary/utils.py
Changes applied before test
commit 39bb4211697b8890148e4c3b719f612ac7673bdf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 16:03:05 2022 +0200
Add class attribute 'uri_fields' to deduplicate simply URI fields
commit f72d095f425224f16be1bc564f5cc4ed709fb47a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 14:20:20 2022 +0200
Refactor metadata mappings using rdflib.Graph instead of JSON-LD internally
Motivation:
1. It makes it easier to visualize what is actually happening when modifying
the graph, by working explicitly on triples instead of a JSON-LD (a tree
serialization of the graph).
2. Remove the need for the hacky `merge_values()` function (and possibly
`merge_documents()` in a future commit)
3. It also catches malformed data exactly where it is added in the document
(the call to rdflib.Graph.add()) instead of at the end of the mapping
when running compaction/expansion.
Downsides:
1. Tests are clunkier, because they relied on deterministic order of
unordered lists; but rdflib does not guarantee it
2. Code is longer
3. Extra dependency (which we will need at some point if we want to
import from RDF datasets, anyway)
commit 97f5fdcdcc3ac76d2b4680dbcd4f2b5d4c557293
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Aug 22 15:47:49 2022 +0200
Remove 'keywords' from test files
Their order is nondeterministic; it just happens to work with
the way we use PyLD.
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/457/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/457/console