metadata_dictionary: Simplify code using rdflib-style namespace classes
Closed, Public

Authored by vlorentz on Aug 19 2022, 11:37 AM.

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8263 (id=29841)

Could not rebase; Attempt merge onto bb9082a6e5...

Updating bb9082a..8714625
Fast-forward
 swh/indexer/codemeta.py                            |  23 ++--
 swh/indexer/data/nuget.csv                         |  68 ++++++++++
 swh/indexer/metadata_dictionary/__init__.py        |   3 +-
 swh/indexer/metadata_dictionary/base.py            |  11 +-
 swh/indexer/metadata_dictionary/cff.py             |  17 +--
 swh/indexer/metadata_dictionary/composer.py        |   9 +-
 swh/indexer/metadata_dictionary/dart.py            |  13 +-
 swh/indexer/metadata_dictionary/github.py          |  34 ++---
 swh/indexer/metadata_dictionary/maven.py           |   7 +-
 swh/indexer/metadata_dictionary/npm.py             |  11 +-
 swh/indexer/metadata_dictionary/nuget.py           | 101 ++++++++++++++
 swh/indexer/metadata_dictionary/python.py          |  15 +-
 swh/indexer/metadata_dictionary/ruby.py            |   7 +-
 swh/indexer/namespaces.py                          |  28 ++++
 .../tests/metadata_dictionary/test_nuget.py        | 151 +++++++++++++++++++++
 swh/indexer/tests/test_cli.py                      |   1 +
 16 files changed, 423 insertions(+), 76 deletions(-)
 create mode 100644 swh/indexer/data/nuget.csv
 create mode 100644 swh/indexer/metadata_dictionary/nuget.py
 create mode 100644 swh/indexer/namespaces.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_nuget.py
Changes applied before test
commit 8714625ae452880738bac55ea7e353419109f3ac
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 19 11:37:06 2022 +0200

    metadata_dictionary: Simplify code using rdflib-style namespace classes

commit cbec71379bd556dc126180d7844d410dd89fc314
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 10:52:34 2022 +0530

    metadata_dictionary: Add mappings for "*.nuspec" files

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/429/ for more details.

anlambert added a subscriber: anlambert.

Nice! Did you consider adding a check for term validity in the namespace? I see that the JSON-LD file for codemeta is present in this repository; it should be easy to parse it and extract the set of terms, no?

This revision is now accepted and ready to land. Aug 19 2022, 12:02 PM

Codemeta does not have a formal definition of its namespace; codemeta.jsonld is only a context file (used to expand shorthands).

Namespaces are rarely formally defined, and when they are, the definition is based on OWL, which is a complex format to parse. So it's out of scope for now (but I may add it as a side effect of other changes I am planning).
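
For readers following the thread, here is a minimal sketch of what an rdflib-style namespace class with an optional term check could look like. It is not the code in swh/indexer/namespaces.py: the `Namespace` class, its `terms` parameter, and the short term set passed to `CODEMETA` are all hypothetical and chosen only for illustration. When no term set is given, any term is accepted, which matches the "out of scope for now" position above.

```python
from typing import Iterable, Optional


class Namespace:
    """Hypothetical rdflib-style namespace: attribute access builds a full URI."""

    def __init__(self, base: str, terms: Optional[Iterable[str]] = None):
        self._base = base
        # None means "no known term list": every term is accepted.
        self._terms = frozenset(terms) if terms is not None else None

    def __getitem__(self, term: str) -> str:
        # Item access also works for terms that are not valid Python identifiers.
        if self._terms is not None and term not in self._terms:
            raise KeyError(f"{term!r} is not a known term of {self._base}")
        return self._base + term

    def __getattr__(self, term: str) -> str:
        # Only reached when normal attribute lookup fails, so _base and
        # _terms are never routed through here once __init__ has run.
        if term.startswith("_"):
            raise AttributeError(term)
        try:
            return self[term]
        except KeyError as exc:
            raise AttributeError(str(exc)) from None


# No term list: nothing is validated.
SCHEMA = Namespace("http://schema.org/")

# Illustrative (incomplete) term list, assumed for this example only.
CODEMETA = Namespace(
    "https://codemeta.github.io/terms/",
    terms={"issueTracker", "maintainer"},
)

print(SCHEMA.name)
print(CODEMETA.maintainer)
```

With a term set in place, a typo such as `CODEMETA.maintaner` raises `AttributeError` instead of silently producing a URI that no vocabulary defines. The hard part discussed above is where a trustworthy term set would come from, since a JSON-LD context file only maps shorthands to URIs.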

Build is green

Patch application report for D8263 (id=29887)

Rebasing onto 7d7f29fb6a...

Current branch diff-target is up to date.
Changes applied before test
commit 92b53419f6f9d699451609cb23a946978ecb6b07
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 19 11:37:06 2022 +0200

    metadata_dictionary: Simplify code using rdflib-style namespace classes

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/434/ for more details.