Software Heritage

metadata_dictionary: Deduplicate docstrings of the translate() method
Closed, Public

Authored by vlorentz on Aug 23 2022, 10:13 AM.
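The revision title refers to keeping a single docstring for translate() instead of repeating it in every mapping subclass. Here is a minimal sketch of why each subclass still ends up documented, assuming the subclasses simply omit their own docstring. Since Python 3.5, inspect.getdoc() (and therefore help()) falls back to the docstring found in the inheritance hierarchy. The class names and docstring text are hypothetical and are not the indexer's actual code.

```python
import inspect


class BaseMappingSketch:
    """Hypothetical stand-in for a base mapping class."""

    def translate(self, raw_content: bytes) -> dict:
        """Translate raw_content into a metadata dictionary.

        Written once on the base class; subclasses inherit it.
        """
        raise NotImplementedError


class NpmMappingSketch(BaseMappingSketch):
    # No docstring here: inspect.getdoc() walks the MRO and finds the base one.
    def translate(self, raw_content: bytes) -> dict:
        return {"name": raw_content.decode()}


# Look the docstring up through a bound method (the class is taken from the instance).
doc = inspect.getdoc(NpmMappingSketch().translate)
print(doc.splitlines()[0])  # Translate raw_content into a metadata dictionary.
```

The subclass's own `__doc__` stays `None`. Only tools that go through `inspect.getdoc()`, such as `help()`, show the inherited text.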

Diff Detail

Repository: rDCIDX Metadata indexer
Lint: Automatic diff as part of commit; lint not applicable.
Unit: Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8284 (id=29912)

Could not rebase; Attempt merge onto 466108c166...

Updating 466108c..9c7d9d5
Fast-forward
 docs/metadata-workflow.rst                         |   2 +-
 mypy.ini                                           |   3 +
 requirements.txt                                   |   1 +
 swh/indexer/codemeta.py                            |  51 ++-----
 swh/indexer/metadata_dictionary/base.py            | 158 +++++++++++++++------
 swh/indexer/metadata_dictionary/cff.py             |  81 ++++++-----
 swh/indexer/metadata_dictionary/composer.py        |  40 +++---
 swh/indexer/metadata_dictionary/dart.py            |  40 +++---
 swh/indexer/metadata_dictionary/github.py          | 119 +++++++---------
 swh/indexer/metadata_dictionary/maven.py           |  96 ++++++-------
 swh/indexer/metadata_dictionary/npm.py             | 138 +++++++++++-------
 swh/indexer/metadata_dictionary/nuget.py           |  72 ++++------
 swh/indexer/metadata_dictionary/python.py          |  44 +++---
 swh/indexer/metadata_dictionary/ruby.py            |  44 +++---
 swh/indexer/metadata_dictionary/utils.py           |  72 ++++++++++
 swh/indexer/namespaces.py                          |  20 +--
 swh/indexer/tests/metadata_dictionary/test_cff.py  |  11 +-
 .../tests/metadata_dictionary/test_composer.py     |   7 +-
 swh/indexer/tests/metadata_dictionary/test_dart.py |  17 ++-
 .../tests/metadata_dictionary/test_github.py       |   2 +-
 .../tests/metadata_dictionary/test_maven.py        |  32 ++---
 swh/indexer/tests/metadata_dictionary/test_npm.py  |   7 +-
 .../tests/metadata_dictionary/test_nuget.py        |  53 +++----
 .../tests/metadata_dictionary/test_python.py       |   9 +-
 swh/indexer/tests/metadata_dictionary/test_ruby.py |   2 +
 swh/indexer/tests/test_codemeta.py                 |  32 +----
 swh/indexer/tests/test_origin_metadata.py          |   6 +-
 swh/indexer/tests/utils.py                         |  13 --
 28 files changed, 632 insertions(+), 540 deletions(-)
 create mode 100644 swh/indexer/metadata_dictionary/utils.py
Changes applied before test
commit 9c7d9d54360983fb2ea208c6aaab664c89ee4913
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 23 10:12:54 2022 +0200

    metadata_dictionary: Deduplicate docstrings of the translate() method

commit 42e663d21818e7a2df80f5fe2d1cfd22ec90db6a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 22 16:03:05 2022 +0200

    Add class attribute 'uri_fields' to deduplicate simply URI fields

commit 885094479a97fd37ec624f781c4c2f87fc87a1f2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 22 14:20:20 2022 +0200

    Refactor metadata mappings using rdflib.Graph instead of JSON-LD internally
    
    Motivation:
    
    1. It makes it easier to visualize what is actually happening when modifying
       the graph, by working explicitly on triples instead of a JSON-LD (a tree
       serialization of the graph).
    
    2. Remove the need for the hacky `merge_values()` function (and possibly
       `merge_documents()` in a future commit)
    
    3. It also catches malformed data exactly where it is added in the document
       (the call to rdflib.Graph.add()) instead of at the end of the mapping
       when running compaction/expansion.
    
    Downsides:
    
    1. Tests are clunkier, because they relied on deterministic order of
       unordered lists; but rdflib does not guarantee it
    
    2. Code is longer
    
    3. Extra dependency (which we will need at some point if we want to
       import from RDF datasets, anyway)

commit b9f206bfe0a8e2592708e7f1728643654564f32a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 22 15:47:49 2022 +0200

    Remove 'keywords' from test files
    
    Their order is nondeterministic, it just happens to work with
    the way we use PyLD.

commit d5207e9521d982d2c170399b4855de3b6d9e8005
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 22 13:20:14 2022 +0200

    Add base XmlMapping to deduplicate between MavenMapping and NugetMapping

commit c4cd68f6ae0ea92be6e312a37fcf0fe597e7616f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 22 13:18:39 2022 +0200

    nuget: Remove test-specific code from the main class

commit b09e2bcfc73d72a670dbedfe5f8334d0036ce195
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 22 13:14:05 2022 +0200

    nuget: Inherit directly from BaseIntrinsicMapping

commit c8a4571c8763c84c064e832f598e949407e0e429
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 19 12:18:30 2022 +0200

    Replace 'normalize' parameter of _translate_dict() with a hook method
    
    This parameter was only used to execute extra code before
    `normalize_translation` is called. This caused some duplication, and
    will not work when switching to a non-JSON-LD internal representation.
    
    Removing it also makes the code of mappings more consistent, by removing
    specific field handling from their implementation of the `translate`
    method itself.

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/453/ for more details.
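Commit c8a4571 replaces the `normalize` parameter of `_translate_dict()` with a hook method. The sketch below shows the general template-method pattern under assumed names. `pre_normalize_hook`, `DictMappingSketch` and `PackageJsonSketch` are hypothetical, and the toy `normalize_translation` only sorts keys. Field-specific handling moves out of `translate()` into an overridable hook, which `_translate_dict()` calls right before normalization.

```python
from typing import Any, Dict


class DictMappingSketch:
    """Hypothetical base class: maps source keys to CodeMeta-like keys."""

    mapping: Dict[str, str] = {}

    def _translate_dict(self, content: Dict[str, Any]) -> Dict[str, Any]:
        translated = {
            self.mapping[key]: value
            for key, value in content.items()
            if key in self.mapping
        }
        # Hook runs before normalization, replacing a normalize=False flag
        # followed by ad-hoc post-processing in each subclass.
        self.pre_normalize_hook(translated, content)
        return self.normalize_translation(translated)

    def pre_normalize_hook(
        self, translated: Dict[str, Any], content: Dict[str, Any]
    ) -> None:
        """Default hook: no extra processing."""

    def normalize_translation(self, translated: Dict[str, Any]) -> Dict[str, Any]:
        # Toy normalization: sort keys so the output order is stable.
        return dict(sorted(translated.items()))


class PackageJsonSketch(DictMappingSketch):
    mapping = {"name": "name", "description": "description"}

    def pre_normalize_hook(self, translated, content):
        # Field-specific handling lives in the hook, not in translate().
        if "author" in content:
            translated["author"] = {"name": content["author"]}

    def translate(self, content: Dict[str, Any]) -> Dict[str, Any]:
        return self._translate_dict(content)


print(PackageJsonSketch().translate({"name": "foo", "author": "Jane", "x": 1}))
# {'author': {'name': 'Jane'}, 'name': 'foo'}
```

Subclasses that need nothing extra inherit the no-op default. No caller has to pass a flag to opt out of normalization.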

This revision is now accepted and ready to land. (Aug 23 2022, 10:26 AM)

Build is green

Patch application report for D8284 (id=29926)

Could not rebase; Attempt merge onto d5207e9521...

Updating d5207e9..e8cafad
Fast-forward
 docs/metadata-workflow.rst                         |   2 +-
 mypy.ini                                           |   3 +
 requirements.txt                                   |   1 +
 swh/indexer/codemeta.py                            |  51 ++------
 swh/indexer/metadata_dictionary/base.py            | 139 +++++++++++++--------
 swh/indexer/metadata_dictionary/cff.py             |  81 ++++++------
 swh/indexer/metadata_dictionary/composer.py        |  40 +++---
 swh/indexer/metadata_dictionary/dart.py            |  40 +++---
 swh/indexer/metadata_dictionary/github.py          | 119 ++++++++----------
 swh/indexer/metadata_dictionary/maven.py           |  72 ++++++-----
 swh/indexer/metadata_dictionary/npm.py             | 138 ++++++++++++--------
 swh/indexer/metadata_dictionary/nuget.py           |  53 ++++----
 swh/indexer/metadata_dictionary/python.py          |  40 +++---
 swh/indexer/metadata_dictionary/ruby.py            |  44 +++----
 swh/indexer/metadata_dictionary/utils.py           |  72 +++++++++++
 swh/indexer/namespaces.py                          |  20 +--
 swh/indexer/tests/metadata_dictionary/test_cff.py  |  11 +-
 .../tests/metadata_dictionary/test_composer.py     |   7 +-
 swh/indexer/tests/metadata_dictionary/test_dart.py |  17 +--
 .../tests/metadata_dictionary/test_github.py       |   2 +-
 .../tests/metadata_dictionary/test_maven.py        |  32 ++---
 swh/indexer/tests/metadata_dictionary/test_npm.py  |   7 +-
 .../tests/metadata_dictionary/test_nuget.py        |  41 +++---
 .../tests/metadata_dictionary/test_python.py       |   9 +-
 swh/indexer/tests/metadata_dictionary/test_ruby.py |   2 +
 swh/indexer/tests/test_codemeta.py                 |  32 +----
 swh/indexer/tests/test_origin_metadata.py          |   6 +-
 swh/indexer/tests/utils.py                         |  13 --
 28 files changed, 587 insertions(+), 507 deletions(-)
 create mode 100644 swh/indexer/metadata_dictionary/utils.py
Changes applied before test
commit e8cafad61c92228a00cbdd410ebf66b166ed6aaf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 23 10:12:54 2022 +0200

    metadata_dictionary: Deduplicate docstrings of the translate() method

commit 39bb4211697b8890148e4c3b719f612ac7673bdf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 22 16:03:05 2022 +0200

    Add class attribute 'uri_fields' to deduplicate simply URI fields

commit f72d095f425224f16be1bc564f5cc4ed709fb47a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 22 14:20:20 2022 +0200

    Refactor metadata mappings using rdflib.Graph instead of JSON-LD internally
    
    Motivation:
    
    1. It makes it easier to visualize what is actually happening when modifying
       the graph, by working explicitly on triples instead of a JSON-LD (a tree
       serialization of the graph).
    
    2. Remove the need for the hacky `merge_values()` function (and possibly
       `merge_documents()` in a future commit)
    
    3. It also catches malformed data exactly where it is added in the document
       (the call to rdflib.Graph.add()) instead of at the end of the mapping
       when running compaction/expansion.
    
    Downsides:
    
    1. Tests are clunkier, because they relied on deterministic order of
       unordered lists; but rdflib does not guarantee it
    
    2. Code is longer
    
    3. Extra dependency (which we will need at some point if we want to
       import from RDF datasets, anyway)

commit 97f5fdcdcc3ac76d2b4680dbcd4f2b5d4c557293
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 22 15:47:49 2022 +0200

    Remove 'keywords' from test files
    
    Their order is nondeterministic, it just happens to work with
    the way we use PyLD.

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/458/ for more details.
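The rdflib commit argues that working on triples has several effects. It removes the need for merging JSON-LD values and catches malformed data at insertion time. The cost is that results come back unordered. The sketch below keeps this runnable with the standard library alone by substituting a toy `ToyGraph` class for rdflib.Graph. It is not rdflib's API, only an illustration of those three properties. The `https://example.org/...` subject is made up.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


Triple = Tuple[URI, URI, object]


class ToyGraph:
    """Toy stand-in for rdflib.Graph: an unordered set of (s, p, o) triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        s, p, o = triple
        # Reject malformed terms at insertion time, where the bad value is added.
        if not isinstance(s, URI) or not isinstance(p, URI):
            raise TypeError(f"subject and predicate must be URIs: {triple!r}")
        if not isinstance(o, (URI, Literal)):
            raise TypeError(f"object must be a URI or Literal: {triple!r}")
        self.triples.add((s, p, o))

    def objects(self, subject: URI, predicate: URI) -> Set[object]:
        # Returns a set: as with rdflib per the commit message, no order is promised.
        return {o for (s, p, o) in self.triples if s == subject and p == predicate}


SCHEMA = "http://schema.org/"
root = URI("https://example.org/some-package")  # hypothetical subject
keywords_p = URI(SCHEMA + "keywords")

g = ToyGraph()
g.add((root, keywords_p, Literal("metadata")))
g.add((root, keywords_p, Literal("indexer")))
g.add((root, keywords_p, Literal("indexer")))  # already present: no-op

keywords = g.objects(root, keywords_p)
# Compare as a set, not a list: the order of multi-valued properties is undefined.
assert keywords == {Literal("metadata"), Literal("indexer")}
```

Adding a second keyword is just another add(), with no list merging. Because results come back as a set, tests have to compare values without relying on order. This matches both the "Tests are clunkier" downside and the removal of 'keywords' from the test files.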