diff --git a/PKG-INFO b/PKG-INFO index 7c37e58..e04be35 100644 --- a/PKG-INFO +++ b/PKG-INFO @@ -1,71 +1,71 @@ Metadata-Version: 2.1 Name: swh.indexer -Version: 2.3.0 +Version: 2.4.0 Summary: Software Heritage Content Indexer Home-page: https://forge.softwareheritage.org/diffusion/78/ Author: Software Heritage developers Author-email: swh-devel@inria.fr Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest Project-URL: Funding, https://www.softwareheritage.org/donate Project-URL: Source, https://forge.softwareheritage.org/source/swh-indexer Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-indexer/ Classifier: Programming Language :: Python :: 3 Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3) Classifier: Operating System :: OS Independent Classifier: Development Status :: 5 - Production/Stable Requires-Python: >=3.7 Description-Content-Type: text/markdown Provides-Extra: testing License-File: LICENSE License-File: AUTHORS swh-indexer ============ Tools to compute multiple indexes on SWH's raw contents: - content: - mimetype - ctags - language - fossology-license - metadata - revision: - metadata An indexer is in charge of: - looking up objects - extracting information from those objects - store those information in the swh-indexer db There are multiple indexers working on different object types: - content indexer: works with content sha1 hashes - revision indexer: works with revision sha1 hashes - origin indexer: works with origin identifiers Indexation procedure: - receive batch of ids - retrieve the associated data depending on object type - compute for that object some index - store the result to swh's storage Current content indexers: - mimetype (queue swh_indexer_content_mimetype): detect the encoding and mimetype - language (queue swh_indexer_content_language): detect the programming language - ctags (queue swh_indexer_content_ctags): compute tags information - fossology-license (queue swh_indexer_fossology_license): compute the license - metadata: translate file into translated_metadata dict Current revision indexers: - metadata: detects files containing metadata and retrieves translated_metadata in content_metadata table in storage or run content indexer to translate files. diff --git a/docs/metadata-workflow.rst b/docs/metadata-workflow.rst index 299ea19..4d99106 100644 --- a/docs/metadata-workflow.rst +++ b/docs/metadata-workflow.rst @@ -1,274 +1,274 @@ Metadata workflow ================= Intrinsic metadata ------------------ Indexing :term:`intrinsic metadata` requires extracting information from the lowest levels of the :ref:`Merkle DAG ` (directories, files, and content blobs) and associate them to the highest ones (origins). In order to deduplicate the work between origins, we split this work between multiple indexers, which coordinate with each other and save their results at each step in the indexer storage. Indexer architecture ^^^^^^^^^^^^^^^^^^^^ .. thumbnail:: images/tasks-metadata-indexers.svg Origin-Head Indexer ^^^^^^^^^^^^^^^^^^^ First, the Origin-Head indexer gets called externally, with an origin as argument (or multiple origins, that are handled sequentially). For now, its tasks are scheduled manually via recurring Scheduler tasks; but in the near future, the :term:`journal` will be used to do that. 
It first looks up the last :term:`snapshot` and determines what the main
branch of the origin is (the "Head branch") and what revision it points to
(the "Head"). Intrinsic metadata for that origin will be extracted from that
revision.

It schedules a Directory Metadata Indexer task for the root directory of
that revision.

Directory and Content Metadata Indexers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These two indexers do the hard part of the work. The Directory Metadata
Indexer fetches the root directory associated with a revision, then extracts
the metadata from that directory.

To do so, it lists files in that directory, and looks for known names, such
as :file:`codemeta.json`, :file:`package.json`, or :file:`pom.xml`. If there
are any, it runs the Content Metadata Indexer on them, which in turn fetches
their contents and runs them through extraction dictionaries/mappings. See
below for details.

Their results are saved in a database (the indexer storage), associated with
the content and directory hashes.

Origin Metadata Indexer
^^^^^^^^^^^^^^^^^^^^^^^

The job of this indexer is very simple: it takes an origin identifier, uses
the Origin-Head and Directory indexers to get metadata from the head
directory of that origin, then copies the directory's metadata to a new
table to associate it with the origin.

The reason for this is to be able to perform searches on metadata, and
efficiently find out which origins matched the pattern. Running that search
on the ``directory_metadata`` table would require a costly reverse lookup
from directories to origins.

Translation from ecosystem-specific metadata to CodeMeta
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Intrinsic metadata is extracted from files provided with a project's source
code, and translated using `CodeMeta`_'s `crosswalk table`_.

All input formats supported so far are straightforward dictionaries (eg. JSON)
or can be accessed as such (eg. XML); and the first part of the translation is
to map their keys to a term in the CodeMeta vocabulary. This is done by
parsing the crosswalk table's `CSV file`_ and using it as a map between these
two vocabularies; this does not require any format-specific code in the
indexers.

The second part is to normalize values. As language-specific metadata files
each have their way(s) of formatting these values, we need to turn them into
the data type required by CodeMeta. This normalization makes up most of the
code of :py:mod:`swh.indexer.metadata_dictionary`.

.. _CodeMeta: https://codemeta.github.io/
.. _crosswalk table: https://codemeta.github.io/crosswalk/
.. _CSV file: https://github.com/codemeta/codemeta/blob/master/crosswalk.csv

Extrinsic metadata
------------------

The :term:`extrinsic metadata` indexer works very differently from the
:term:`intrinsic metadata` indexers we saw above. While the latter extract
metadata from software artefacts (files and directories) which are already a
core part of the archive, the former extracts such data from API calls pulled
from forges and package managers, or pushed via the :ref:`SWORD deposit `.

In order to preserve original information verbatim, Software Heritage itself
stores the result of these calls, independently of indexers, in its own
archive as described in the :ref:`extrinsic-metadata-specification`. In this
section, we assume this information is already present in the archive, but in
the "raw extrinsic metadata" form, which needs to be translated to a common
vocabulary to be useful, as with intrinsic metadata.
The common vocabulary we chose is JSON-LD, with both CodeMeta and
`ForgeFed's vocabulary`_ (including `ActivityStream's vocabulary`_).

.. _ForgeFed's vocabulary: https://forgefed.org/vocabulary.html
.. _ActivityStream's vocabulary: https://www.w3.org/TR/activitystreams-vocabulary/

Instead of the four-step architecture above, the extrinsic-metadata indexer
is standalone: it reads "raw extrinsic metadata" from the :ref:`swh-journal`,
and produces new indexed entries in the database as they come.

The caveat is that, while intrinsic metadata are always unambiguously
authoritative (they are contained by their own origin repository, therefore
they were added by the origin's "owners"), extrinsic metadata can be authored
by third parties. Support for third-party authorities is currently not
implemented for this reason, so extrinsic metadata is only indexed when
provided by the same forge/package-repository as the origin the metadata is
about. Metadata on non-origin objects (typically, directories) is also
ignored for this reason, for now.

Assuming the metadata was provided by such an authority, it is then passed to
metadata mappings, which are identified by a mimetype (or custom format name)
they declare, rather than by filenames.

Implementation status
---------------------

Supported intrinsic metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following sources of intrinsic metadata are supported:

* CodeMeta's `codemeta.json`_,
* Maven's `pom.xml`_,
* NPM's `package.json`_,
* Python's `PKG-INFO`_,
* Ruby's `.gemspec`_

.. _codemeta.json: https://codemeta.github.io/terms/
.. _pom.xml: https://maven.apache.org/pom.html
.. _package.json: https://docs.npmjs.com/files/package.json
.. _PKG-INFO: https://www.python.org/dev/peps/pep-0314/
.. _.gemspec: https://guides.rubygems.org/specification-reference/

Supported extrinsic metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following sources of extrinsic metadata are supported:

* GitHub's `"repo" API `__

Supported JSON-LD properties
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following terms may be found in the output of the metadata translation
(other than the `codemeta` mapping, which is the identity function, and
therefore supports all properties):

-.. program-output:: python3 -m swh.indexer.cli mapping list-terms --exclude-mapping codemeta
+.. program-output:: python3 -m swh.indexer.cli mapping list-terms --exclude-mapping codemeta --exclude-mapping json-sword-codemeta --exclude-mapping sword-codemeta
    :nostderr:

Tutorials
---------

The rest of this page consists of two tutorials: one to index
:term:`intrinsic metadata` (ie. from a file in a VCS or in a tarball), and
one to index :term:`extrinsic metadata` (ie. obtained via external means,
such as GitHub's or GitLab's APIs).

Adding support for additional ecosystem-specific intrinsic metadata
-------------------------------------------------------------------

This section will guide you through adding code to the metadata indexer to
detect and translate new metadata formats.

First, pick one of the `CodeMeta crosswalks`_. Then create a new file in
:file:`swh-indexer/swh/indexer/metadata_dictionary/`, which will contain
your code, and create a new class that inherits from helper classes, with
some documentation about your indexer:
.. code-block:: python

    from .base import DictMapping, SingleFileIntrinsicMapping
    from swh.indexer.codemeta import CROSSWALK_TABLE

    class MyMapping(DictMapping, SingleFileIntrinsicMapping):
        """Dedicated class for ..."""
        name = 'my-mapping'
        filename = b'the-filename'
        mapping = CROSSWALK_TABLE['Name of the CodeMeta crosswalk']

.. _CodeMeta crosswalks: https://github.com/codemeta/codemeta/tree/master/crosswalks

And reference it from :const:`swh.indexer.metadata_dictionary.INTRINSIC_MAPPINGS`.

Then, add a ``string_fields`` attribute, which is the list of all keys whose
values are simple text values. For instance, to `translate Python PKG-INFO`_,
it's:

.. code-block:: python

    string_fields = ['name', 'version', 'description', 'summary',
                     'author', 'author-email']

These values will be automatically added to the above list of supported terms.

.. _translate Python PKG-INFO: https://forge.softwareheritage.org/source/swh-indexer/browse/master/swh/indexer/metadata_dictionary/python.py

Last step to get your code working: add a ``translate`` method that takes a
single byte string as argument, turns it into a Python dictionary whose keys
are the ones of the input document, and passes it to ``_translate_dict``.
For instance, if the input document is in JSON, it can be as simple as:

.. code-block:: python

    def translate(self, raw_content):
        raw_content = raw_content.decode()  # bytes to str
        content_dict = json.loads(raw_content)  # str to dict
        return self._translate_dict(content_dict)  # convert to CodeMeta

``_translate_dict`` will do the heavy work of reading the crosswalk table for
each of ``string_fields``, reading the corresponding value in the
``content_dict``, and building a CodeMeta dictionary with the corresponding
names from the crosswalk table.

One last thing to run your code: add it to the list in
:file:`swh-indexer/swh/indexer/metadata_dictionary/__init__.py`, so the rest
of the code is aware of it.

Now, you can run it:

.. code-block:: shell

    python3 -m swh.indexer.metadata_dictionary MyMapping path/to/input/file

and it will (hopefully) return a CodeMeta object.

If it works, well done!

You can now improve your translation code further, by adding methods that do
more advanced conversion. For example, if there is a field named ``license``
containing an SPDX identifier, you must convert it to a URI, like this:

.. code-block:: python

    def normalize_license(self, s):
        if isinstance(s, str):
-           return {"@id": "https://spdx.org/licenses/" + s}
+           return rdflib.URIRef("https://spdx.org/licenses/" + s)

This method will automatically get called by ``_translate_dict`` when it
finds a ``license`` field in ``content_dict``.
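Putting these steps together, a complete mapping might look like the
following minimal sketch. It simply combines the snippets above; the class
name, ``filename``, crosswalk name, and field list are placeholders rather
than a mapping that actually ships with swh-indexer:

.. code-block:: python

    import json

    import rdflib

    from swh.indexer.codemeta import CROSSWALK_TABLE

    from .base import DictMapping, SingleFileIntrinsicMapping


    class MyMapping(DictMapping, SingleFileIntrinsicMapping):
        """Hypothetical mapping for metadata files named ``the-filename``."""

        name = 'my-mapping'
        filename = b'the-filename'
        mapping = CROSSWALK_TABLE['Name of the CodeMeta crosswalk']

        # Keys whose values are plain strings; they are translated
        # automatically using the crosswalk table.
        string_fields = ['name', 'version', 'description']

        def translate(self, raw_content):
            raw_content = raw_content.decode()  # bytes to str
            content_dict = json.loads(raw_content)  # str to dict
            return self._translate_dict(content_dict)  # convert to CodeMeta

        def normalize_license(self, s):
            # Turn an SPDX identifier into a URI node.
            if isinstance(s, str):
                return rdflib.URIRef("https://spdx.org/licenses/" + s)

If the input format is JSON or YAML, it may be simpler to inherit from
``JsonMapping`` or ``YamlMapping`` (see :file:`metadata_dictionary/base.py`),
which already provide this ``translate`` boilerplate.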
Adding support for additional ecosystem-specific extrinsic metadata ------------------------------------------------------------------- [this section is a work in progress] diff --git a/mypy.ini b/mypy.ini index 0df07a7..d63e789 100644 --- a/mypy.ini +++ b/mypy.ini @@ -1,30 +1,33 @@ [mypy] namespace_packages = True warn_unused_ignores = True # 3rd party libraries without stubs (yet) [mypy-celery.*] ignore_missing_imports = True [mypy-confluent_kafka.*] ignore_missing_imports = True [mypy-magic.*] ignore_missing_imports = True [mypy-pkg_resources.*] ignore_missing_imports = True [mypy-psycopg2.*] ignore_missing_imports = True [mypy-pyld.*] ignore_missing_imports = True [mypy-pytest.*] ignore_missing_imports = True +[mypy-rdflib.*] +ignore_missing_imports = True + [mypy-xmltodict.*] ignore_missing_imports = True diff --git a/requirements.txt b/requirements.txt index d9532ee..4dd61a2 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,10 +1,11 @@ python-magic >= 0.4.13 click # frozendict: dependency of pyld # the version 2.1.2 is causing segmentation faults # cf https://forge.softwareheritage.org/T3815 frozendict != 2.1.2 pyld +rdflib sentry-sdk typing-extensions xmltodict diff --git a/swh.indexer.egg-info/PKG-INFO b/swh.indexer.egg-info/PKG-INFO index 7c37e58..e04be35 100644 --- a/swh.indexer.egg-info/PKG-INFO +++ b/swh.indexer.egg-info/PKG-INFO @@ -1,71 +1,71 @@ Metadata-Version: 2.1 Name: swh.indexer -Version: 2.3.0 +Version: 2.4.0 Summary: Software Heritage Content Indexer Home-page: https://forge.softwareheritage.org/diffusion/78/ Author: Software Heritage developers Author-email: swh-devel@inria.fr Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest Project-URL: Funding, https://www.softwareheritage.org/donate Project-URL: Source, https://forge.softwareheritage.org/source/swh-indexer Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-indexer/ Classifier: Programming Language :: Python :: 3 Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3) Classifier: Operating System :: OS Independent Classifier: Development Status :: 5 - Production/Stable Requires-Python: >=3.7 Description-Content-Type: text/markdown Provides-Extra: testing License-File: LICENSE License-File: AUTHORS swh-indexer ============ Tools to compute multiple indexes on SWH's raw contents: - content: - mimetype - ctags - language - fossology-license - metadata - revision: - metadata An indexer is in charge of: - looking up objects - extracting information from those objects - store those information in the swh-indexer db There are multiple indexers working on different object types: - content indexer: works with content sha1 hashes - revision indexer: works with revision sha1 hashes - origin indexer: works with origin identifiers Indexation procedure: - receive batch of ids - retrieve the associated data depending on object type - compute for that object some index - store the result to swh's storage Current content indexers: - mimetype (queue swh_indexer_content_mimetype): detect the encoding and mimetype - language (queue swh_indexer_content_language): detect the programming language - ctags (queue swh_indexer_content_ctags): compute tags information - fossology-license (queue swh_indexer_fossology_license): compute the license - metadata: translate file into translated_metadata dict Current revision indexers: - metadata: detects files containing metadata and retrieves translated_metadata in content_metadata 
table in storage or run content indexer to translate files. diff --git a/swh.indexer.egg-info/SOURCES.txt b/swh.indexer.egg-info/SOURCES.txt index 5c309da..3dddac5 100644 --- a/swh.indexer.egg-info/SOURCES.txt +++ b/swh.indexer.egg-info/SOURCES.txt @@ -1,161 +1,166 @@ .git-blame-ignore-revs .gitignore .pre-commit-config.yaml AUTHORS CODE_OF_CONDUCT.md CONTRIBUTORS LICENSE MANIFEST.in Makefile Makefile.local README.md codemeta.json conftest.py mypy.ini pyproject.toml pytest.ini requirements-swh.txt requirements-test.txt requirements.txt setup.cfg setup.py tox.ini docs/.gitignore docs/Makefile docs/Makefile.local docs/README.md docs/cli.rst docs/conf.py docs/dev-info.rst docs/index.rst docs/metadata-workflow.rst docs/_static/.placeholder docs/_templates/.placeholder docs/images/.gitignore docs/images/Makefile docs/images/tasks-metadata-indexers.uml sql/bin/db-upgrade sql/bin/dot_add_content sql/doc/json sql/doc/json/.gitignore sql/doc/json/Makefile sql/doc/json/indexer_configuration.tool_configuration.schema.json sql/doc/json/revision_metadata.translated_metadata.json sql/json/.gitignore sql/json/Makefile sql/json/indexer_configuration.tool_configuration.schema.json sql/json/revision_metadata.translated_metadata.json swh/__init__.py swh.indexer.egg-info/PKG-INFO swh.indexer.egg-info/SOURCES.txt swh.indexer.egg-info/dependency_links.txt swh.indexer.egg-info/entry_points.txt swh.indexer.egg-info/requires.txt swh.indexer.egg-info/top_level.txt swh/indexer/__init__.py swh/indexer/cli.py swh/indexer/codemeta.py swh/indexer/fossology_license.py swh/indexer/indexer.py swh/indexer/journal_client.py swh/indexer/metadata.py swh/indexer/metadata_detector.py swh/indexer/mimetype.py +swh/indexer/namespaces.py swh/indexer/origin_head.py swh/indexer/py.typed swh/indexer/rehash.py swh/indexer/tasks.py swh/indexer/data/composer.csv +swh/indexer/data/nuget.csv swh/indexer/data/pubspec.csv swh/indexer/data/codemeta/CITATION swh/indexer/data/codemeta/LICENSE swh/indexer/data/codemeta/codemeta.jsonld swh/indexer/data/codemeta/crosswalk.csv swh/indexer/metadata_dictionary/__init__.py swh/indexer/metadata_dictionary/base.py swh/indexer/metadata_dictionary/cff.py swh/indexer/metadata_dictionary/codemeta.py swh/indexer/metadata_dictionary/composer.py swh/indexer/metadata_dictionary/dart.py swh/indexer/metadata_dictionary/github.py swh/indexer/metadata_dictionary/maven.py swh/indexer/metadata_dictionary/npm.py +swh/indexer/metadata_dictionary/nuget.py swh/indexer/metadata_dictionary/python.py swh/indexer/metadata_dictionary/ruby.py +swh/indexer/metadata_dictionary/utils.py swh/indexer/sql/10-superuser-init.sql swh/indexer/sql/20-enums.sql swh/indexer/sql/30-schema.sql swh/indexer/sql/50-data.sql swh/indexer/sql/50-func.sql swh/indexer/sql/60-indexes.sql swh/indexer/sql/upgrades/115.sql swh/indexer/sql/upgrades/116.sql swh/indexer/sql/upgrades/117.sql swh/indexer/sql/upgrades/118.sql swh/indexer/sql/upgrades/119.sql swh/indexer/sql/upgrades/120.sql swh/indexer/sql/upgrades/121.sql swh/indexer/sql/upgrades/122.sql swh/indexer/sql/upgrades/123.sql swh/indexer/sql/upgrades/124.sql swh/indexer/sql/upgrades/125.sql swh/indexer/sql/upgrades/126.sql swh/indexer/sql/upgrades/127.sql swh/indexer/sql/upgrades/128.sql swh/indexer/sql/upgrades/129.sql swh/indexer/sql/upgrades/130.sql swh/indexer/sql/upgrades/131.sql swh/indexer/sql/upgrades/132.sql swh/indexer/sql/upgrades/133.sql swh/indexer/sql/upgrades/134.sql swh/indexer/sql/upgrades/135.sql swh/indexer/storage/__init__.py swh/indexer/storage/converters.py 
swh/indexer/storage/db.py swh/indexer/storage/exc.py swh/indexer/storage/in_memory.py swh/indexer/storage/interface.py swh/indexer/storage/metrics.py swh/indexer/storage/model.py swh/indexer/storage/writer.py swh/indexer/storage/api/__init__.py swh/indexer/storage/api/client.py swh/indexer/storage/api/serializers.py swh/indexer/storage/api/server.py swh/indexer/tests/__init__.py swh/indexer/tests/conftest.py swh/indexer/tests/tasks.py swh/indexer/tests/test_cli.py swh/indexer/tests/test_codemeta.py swh/indexer/tests/test_fossology_license.py swh/indexer/tests/test_indexer.py swh/indexer/tests/test_journal_client.py swh/indexer/tests/test_metadata.py swh/indexer/tests/test_mimetype.py swh/indexer/tests/test_origin_head.py swh/indexer/tests/test_origin_metadata.py swh/indexer/tests/utils.py swh/indexer/tests/metadata_dictionary/__init__.py swh/indexer/tests/metadata_dictionary/test_cff.py swh/indexer/tests/metadata_dictionary/test_codemeta.py swh/indexer/tests/metadata_dictionary/test_composer.py swh/indexer/tests/metadata_dictionary/test_dart.py swh/indexer/tests/metadata_dictionary/test_github.py swh/indexer/tests/metadata_dictionary/test_maven.py swh/indexer/tests/metadata_dictionary/test_npm.py +swh/indexer/tests/metadata_dictionary/test_nuget.py swh/indexer/tests/metadata_dictionary/test_python.py swh/indexer/tests/metadata_dictionary/test_ruby.py swh/indexer/tests/storage/__init__.py swh/indexer/tests/storage/conftest.py swh/indexer/tests/storage/generate_data_test.py swh/indexer/tests/storage/test_api_client.py swh/indexer/tests/storage/test_converters.py swh/indexer/tests/storage/test_in_memory.py swh/indexer/tests/storage/test_init.py swh/indexer/tests/storage/test_metrics.py swh/indexer/tests/storage/test_model.py swh/indexer/tests/storage/test_server.py swh/indexer/tests/storage/test_storage.py swh/indexer/tests/zz_celery/README swh/indexer/tests/zz_celery/__init__.py swh/indexer/tests/zz_celery/test_tasks.py \ No newline at end of file diff --git a/swh.indexer.egg-info/requires.txt b/swh.indexer.egg-info/requires.txt index a418f0b..462c191 100644 --- a/swh.indexer.egg-info/requires.txt +++ b/swh.indexer.egg-info/requires.txt @@ -1,23 +1,24 @@ python-magic>=0.4.13 click frozendict!=2.1.2 pyld +rdflib sentry-sdk typing-extensions xmltodict swh.core[db,http]>=2.9 swh.model>=0.0.15 swh.objstorage>=0.2.2 swh.scheduler>=0.5.2 swh.storage>=0.22.0 swh.journal>=0.1.0 [testing] confluent-kafka hypothesis>=3.11.0 pytest pytest-mock swh.scheduler[testing]>=0.5.0 swh.storage[testing]>=0.10.0 types-click types-pyyaml diff --git a/swh/indexer/codemeta.py b/swh/indexer/codemeta.py index 6c4ef58..f1d00b1 100644 --- a/swh/indexer/codemeta.py +++ b/swh/indexer/codemeta.py @@ -1,220 +1,189 @@ -# Copyright (C) 2018 The Software Heritage developers +# Copyright (C) 2018-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import collections import csv import itertools import json import os.path import re from typing import Any, List from pyld import jsonld +import rdflib import swh.indexer +from swh.indexer.namespaces import ACTIVITYSTREAMS, CODEMETA, FORGEFED, SCHEMA _DATA_DIR = os.path.join(os.path.dirname(swh.indexer.__file__), "data") CROSSWALK_TABLE_PATH = os.path.join(_DATA_DIR, "codemeta", "crosswalk.csv") CODEMETA_CONTEXT_PATH = os.path.join(_DATA_DIR, "codemeta", "codemeta.jsonld") with open(CODEMETA_CONTEXT_PATH) as 
fd: CODEMETA_CONTEXT = json.load(fd) _EMPTY_PROCESSED_CONTEXT: Any = {"mappings": {}} _PROCESSED_CODEMETA_CONTEXT = jsonld.JsonLdProcessor().process_context( _EMPTY_PROCESSED_CONTEXT, CODEMETA_CONTEXT, None ) CODEMETA_CONTEXT_URL = "https://doi.org/10.5063/schema/codemeta-2.0" CODEMETA_ALTERNATE_CONTEXT_URLS = { ("https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld") } -CODEMETA_URI = "https://codemeta.github.io/terms/" -SCHEMA_URI = "http://schema.org/" -FORGEFED_URI = "https://forgefed.org/ns#" -ACTIVITYSTREAMS_URI = "https://www.w3.org/ns/activitystreams#" PROPERTY_BLACKLIST = { # CodeMeta properties that we cannot properly represent. - SCHEMA_URI + "softwareRequirements", - CODEMETA_URI + "softwareSuggestions", + SCHEMA.softwareRequirements, + CODEMETA.softwareSuggestions, # Duplicate of 'author' - SCHEMA_URI + "creator", + SCHEMA.creator, } _codemeta_field_separator = re.compile(r"\s*[,/]\s*") def make_absolute_uri(local_name): """Parses codemeta.jsonld, and returns the @id of terms it defines. >>> make_absolute_uri("name") 'http://schema.org/name' >>> make_absolute_uri("downloadUrl") 'http://schema.org/downloadUrl' >>> make_absolute_uri("referencePublication") 'https://codemeta.github.io/terms/referencePublication' """ uri = jsonld.JsonLdProcessor.get_context_value( _PROCESSED_CODEMETA_CONTEXT, local_name, "@id" ) - assert uri.startswith(("@", CODEMETA_URI, SCHEMA_URI)), (local_name, uri) + assert uri.startswith(("@", CODEMETA, SCHEMA)), (local_name, uri) return uri def _read_crosstable(fd): reader = csv.reader(fd) try: header = next(reader) except StopIteration: raise ValueError("empty file") data_sources = set(header) - {"Parent Type", "Property", "Type", "Description"} codemeta_translation = {data_source: {} for data_source in data_sources} terms = set() for line in reader: # For each canonical name local_name = dict(zip(header, line))["Property"] if not local_name: continue canonical_name = make_absolute_uri(local_name) - if canonical_name in PROPERTY_BLACKLIST: + if rdflib.URIRef(canonical_name) in PROPERTY_BLACKLIST: continue terms.add(canonical_name) for (col, value) in zip(header, line): # For each cell in the row if col in data_sources: # If that's not the parentType/property/type/description for local_name in _codemeta_field_separator.split(value): # For each of the data source's properties that maps # to this canonical name if local_name.strip(): - codemeta_translation[col][local_name.strip()] = canonical_name + codemeta_translation[col][local_name.strip()] = rdflib.URIRef( + canonical_name + ) return (terms, codemeta_translation) with open(CROSSWALK_TABLE_PATH) as fd: (CODEMETA_TERMS, CROSSWALK_TABLE) = _read_crosstable(fd) def _document_loader(url, options=None): """Document loader for pyld. Reads the local codemeta.jsonld file instead of fetching it from the Internet every single time.""" - if url == CODEMETA_CONTEXT_URL or url in CODEMETA_ALTERNATE_CONTEXT_URLS: + if ( + url.lower() == CODEMETA_CONTEXT_URL.lower() + or url in CODEMETA_ALTERNATE_CONTEXT_URLS + ): return { "contextUrl": None, "documentUrl": url, "document": CODEMETA_CONTEXT, } - elif url == CODEMETA_URI: + elif url == CODEMETA: raise Exception( "{} is CodeMeta's URI, use {} as context url".format( - CODEMETA_URI, CODEMETA_CONTEXT_URL + CODEMETA, CODEMETA_CONTEXT_URL ) ) else: raise Exception(url) def compact(doc, forgefed: bool): """Same as `pyld.jsonld.compact`, but in the context of CodeMeta. Args: forgefed: Whether to add ForgeFed and ActivityStreams as compact URIs. 
This is typically used for extrinsic metadata documents, which frequently use properties from these namespaces. """ contexts: List[Any] = [CODEMETA_CONTEXT_URL] if forgefed: - contexts.append({"as": ACTIVITYSTREAMS_URI, "forge": FORGEFED_URI}) + contexts.append({"as": str(ACTIVITYSTREAMS), "forge": str(FORGEFED)}) return jsonld.compact(doc, contexts, options={"documentLoader": _document_loader}) def expand(doc): """Same as `pyld.jsonld.expand`, but in the context of CodeMeta.""" return jsonld.expand(doc, options={"documentLoader": _document_loader}) -def merge_values(v1, v2): - """If v1 and v2 are of the form `{"@list": l1}` and `{"@list": l2}`, - returns `{"@list": l1 + l2}`. - Otherwise, make them lists (if they are not already) and concatenate - them. - - >>> merge_values('a', 'b') - ['a', 'b'] - >>> merge_values(['a', 'b'], 'c') - ['a', 'b', 'c'] - >>> merge_values({'@list': ['a', 'b']}, {'@list': ['c']}) - {'@list': ['a', 'b', 'c']} - """ - if v1 is None: - return v2 - elif v2 is None: - return v1 - elif isinstance(v1, dict) and set(v1) == {"@list"}: - assert isinstance(v1["@list"], list) - if isinstance(v2, dict) and set(v2) == {"@list"}: - assert isinstance(v2["@list"], list) - return {"@list": v1["@list"] + v2["@list"]} - else: - raise ValueError("Cannot merge %r and %r" % (v1, v2)) - else: - if isinstance(v2, dict) and "@list" in v2: - raise ValueError("Cannot merge %r and %r" % (v1, v2)) - if not isinstance(v1, list): - v1 = [v1] - if not isinstance(v2, list): - v2 = [v2] - return v1 + v2 - - def merge_documents(documents): """Takes a list of metadata dicts, each generated from a different metadata file, and merges them. Removes duplicates, if any.""" documents = list(itertools.chain.from_iterable(map(expand, documents))) merged_document = collections.defaultdict(list) for document in documents: for (key, values) in document.items(): if key == "@id": # @id does not get expanded to a list value = values # Only one @id is allowed, move it to sameAs if "@id" not in merged_document: merged_document["@id"] = value elif value != merged_document["@id"]: - if value not in merged_document[SCHEMA_URI + "sameAs"]: - merged_document[SCHEMA_URI + "sameAs"].append(value) + if value not in merged_document[SCHEMA.sameAs]: + merged_document[SCHEMA.sameAs].append(value) else: for value in values: if isinstance(value, dict) and set(value) == {"@list"}: # Value is of the form {'@list': [item1, item2]} # instead of the usual [item1, item2]. # We need to merge the inner lists (and mostly # preserve order). merged_value = merged_document.setdefault(key, {"@list": []}) for subvalue in value["@list"]: # merged_value must be of the form # {'@list': [item1, item2]}; as it is the same # type as value, which is an @list. if subvalue not in merged_value["@list"]: merged_value["@list"].append(subvalue) elif value not in merged_document[key]: merged_document[key].append(value) # XXX: we should set forgefed=True when merging extrinsic_metadata documents. 
# however, this function is only used to merge multiple files of the same # directory (which is only for intrinsic-metadata), so it is not an issue for now return compact(merged_document, forgefed=False) diff --git a/swh/indexer/data/nuget.csv b/swh/indexer/data/nuget.csv new file mode 100644 index 0000000..2155f10 --- /dev/null +++ b/swh/indexer/data/nuget.csv @@ -0,0 +1,68 @@ +Property,NuGet +codeRepository,repository.url +programmingLanguage, +runtimePlatform, +targetProduct, +applicationCategory, +applicationSubCategory, +downloadUrl, +fileSize, +installUrl, +memoryRequirements, +operatingSystem, +permissions, +processorRequirements, +releaseNotes,releaseNotes +softwareHelp, +softwareRequirements, +softwareVersion, +storageRequirements, +supportingData, +author,authors +citation, +contributor, +copyrightHolder, +copyrightYear, +dateCreated, +dateModified, +datePublished, +editor, +encoding, +fileFormat, +funder, +keywords,tags +license,license/licenseUrl +producer, +provider, +publisher, +sponsor, +version,version +isAccessibleForFree, +isPartOf, +hasPart, +position, +description,description/summary +identifier, +name,name +sameAs, +url,projectUrl +relatedLink, +givenName, +familyName, +email, +affiliation, +identifier,id +name, +address, +type, +id, +softwareSuggestions, +maintainer, +contIntegration, +buildInstructions, +developmentStatus, +embargoDate, +funding, +issueTracker, +referencePublication, +readme, diff --git a/swh/indexer/metadata_dictionary/__init__.py b/swh/indexer/metadata_dictionary/__init__.py index 2d67c15..99c2504 100644 --- a/swh/indexer/metadata_dictionary/__init__.py +++ b/swh/indexer/metadata_dictionary/__init__.py @@ -1,56 +1,59 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import collections from typing import Dict, Type import click -from . import cff, codemeta, composer, dart, github, maven, npm, python, ruby +from . 
import cff, codemeta, composer, dart, github, maven, npm, nuget, python, ruby from .base import BaseExtrinsicMapping, BaseIntrinsicMapping, BaseMapping INTRINSIC_MAPPINGS: Dict[str, Type[BaseIntrinsicMapping]] = { "CffMapping": cff.CffMapping, "CodemetaMapping": codemeta.CodemetaMapping, "GemspecMapping": ruby.GemspecMapping, "MavenMapping": maven.MavenMapping, "NpmMapping": npm.NpmMapping, "PubMapping": dart.PubspecMapping, "PythonPkginfoMapping": python.PythonPkginfoMapping, "ComposerMapping": composer.ComposerMapping, + "NuGetMapping": nuget.NuGetMapping, } EXTRINSIC_MAPPINGS: Dict[str, Type[BaseExtrinsicMapping]] = { "GitHubMapping": github.GitHubMapping, + "JsonSwordCodemetaMapping": codemeta.JsonSwordCodemetaMapping, + "SwordCodemetaMapping": codemeta.SwordCodemetaMapping, } MAPPINGS: Dict[str, Type[BaseMapping]] = {**INTRINSIC_MAPPINGS, **EXTRINSIC_MAPPINGS} def list_terms(): """Returns a dictionary with all supported CodeMeta terms as keys, and the mappings that support each of them as values.""" d = collections.defaultdict(set) for mapping in MAPPINGS.values(): for term in mapping.supported_terms(): d[term].add(mapping) return d @click.command() @click.argument("mapping_name") @click.argument("file_name") def main(mapping_name: str, file_name: str): from pprint import pprint with open(file_name, "rb") as fd: file_content = fd.read() res = MAPPINGS[mapping_name]().translate(file_content) pprint(res) if __name__ == "__main__": main() diff --git a/swh/indexer/metadata_dictionary/base.py b/swh/indexer/metadata_dictionary/base.py index 2ac4adc..657c6a4 100644 --- a/swh/indexer/metadata_dictionary/base.py +++ b/swh/indexer/metadata_dictionary/base.py @@ -1,270 +1,347 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import json import logging from typing import Any, Callable, Dict, List, Optional, Tuple, TypeVar +import uuid +import xml.parsers.expat +from pyld import jsonld +import rdflib from typing_extensions import TypedDict +import xmltodict import yaml -from swh.indexer.codemeta import SCHEMA_URI, compact, merge_values +from swh.indexer.codemeta import _document_loader, compact +from swh.indexer.namespaces import RDF, SCHEMA from swh.indexer.storage.interface import Sha1 class DirectoryLsEntry(TypedDict): target: Sha1 sha1: Sha1 name: bytes type: str TTranslateCallable = TypeVar( - "TTranslateCallable", bound=Callable[[Any, Dict[str, Any], Any], None] + "TTranslateCallable", + bound=Callable[[Any, rdflib.Graph, rdflib.term.BNode, Any], None], ) -def produce_terms( - namespace: str, terms: List[str] -) -> Callable[[TTranslateCallable], TTranslateCallable]: +def produce_terms(*uris: str) -> Callable[[TTranslateCallable], TTranslateCallable]: """Returns a decorator that marks the decorated function as adding the given terms to the ``translated_metadata`` dict""" def decorator(f: TTranslateCallable) -> TTranslateCallable: if not hasattr(f, "produced_terms"): f.produced_terms = [] # type: ignore - f.produced_terms.extend(namespace + term for term in terms) # type: ignore + f.produced_terms.extend(uris) # type: ignore return f return decorator class BaseMapping: """Base class for :class:`BaseExtrinsicMapping` and :class:`BaseIntrinsicMapping`, not to be inherited directly.""" def __init__(self, log_suffix=""): self.log_suffix = log_suffix self.log = logging.getLogger( "%s.%s" % 
(self.__class__.__module__, self.__class__.__name__) ) @property def name(self): """A name of this mapping, used as an identifier in the indexer storage.""" raise NotImplementedError(f"{self.__class__.__name__}.name") - def translate(self, file_content: bytes) -> Optional[Dict]: - """Translates metadata, from the content of a file or of a RawExtrinsicMetadata - object.""" + def translate(self, raw_content: bytes) -> Optional[Dict]: + """ + Translates content by parsing content from a bytestring containing + mapping-specific data and translating with the appropriate mapping + to JSON-LD using the Codemeta and ForgeFed vocabularies. + + Args: + raw_content: raw content to translate + + Returns: + translated metadata in JSON friendly form needed for the content + if parseable, :const:`None` otherwise. + + """ raise NotImplementedError(f"{self.__class__.__name__}.translate") def normalize_translation(self, metadata: Dict[str, Any]) -> Dict[str, Any]: raise NotImplementedError(f"{self.__class__.__name__}.normalize_translation") class BaseExtrinsicMapping(BaseMapping): """Base class for extrinsic_metadata mappings to inherit from To implement a new mapping: - inherit this class - override translate function """ @classmethod def extrinsic_metadata_formats(cls) -> Tuple[str, ...]: """ Returns the list of extrinsic metadata formats which can be translated by this mapping """ raise NotImplementedError(f"{cls.__name__}.extrinsic_metadata_formats") def normalize_translation(self, metadata: Dict[str, Any]) -> Dict[str, Any]: return compact(metadata, forgefed=True) class BaseIntrinsicMapping(BaseMapping): """Base class for intrinsic-metadata mappings to inherit from To implement a new mapping: - inherit this class - override translate function """ @classmethod def detect_metadata_files(cls, file_entries: List[DirectoryLsEntry]) -> List[Sha1]: """ Returns the sha1 hashes of files which can be translated by this mapping """ raise NotImplementedError(f"{cls.__name__}.detect_metadata_files") def normalize_translation(self, metadata: Dict[str, Any]) -> Dict[str, Any]: return compact(metadata, forgefed=False) class SingleFileIntrinsicMapping(BaseIntrinsicMapping): """Base class for all intrinsic metadata mappings that use a single file as input.""" @property def filename(self): """The .json file to extract metadata from.""" raise NotImplementedError(f"{self.__class__.__name__}.filename") @classmethod def detect_metadata_files(cls, file_entries: List[DirectoryLsEntry]) -> List[Sha1]: for entry in file_entries: if entry["name"].lower() == cls.filename: return [entry["sha1"]] return [] class DictMapping(BaseMapping): """Base class for mappings that take as input a file that is mostly a key-value store (eg. 
a shallow JSON dict).""" - string_fields = [] # type: List[str] + string_fields: List[str] = [] """List of fields that are simple strings, and don't need any normalization.""" + uri_fields: List[str] = [] + """List of fields that are simple URIs, and don't need any + normalization.""" + @property def mapping(self): """A translation dict to map dict keys into a canonical name.""" raise NotImplementedError(f"{self.__class__.__name__}.mapping") @staticmethod def _normalize_method_name(name: str) -> str: return name.replace("-", "_") @classmethod def supported_terms(cls): # one-to-one mapping from the original key to a CodeMeta term simple_terms = { - term + str(term) for (key, term) in cls.mapping.items() - if key in cls.string_fields + if key in cls.string_fields + cls.uri_fields or hasattr(cls, "normalize_" + cls._normalize_method_name(key)) } # more complex mapping from the original key to JSON-LD complex_terms = { - term + str(term) for meth_name in dir(cls) if meth_name.startswith("translate_") for term in getattr(getattr(cls, meth_name), "produced_terms", []) } return simple_terms | complex_terms - def _translate_dict( - self, content_dict: Dict, *, normalize: bool = True - ) -> Dict[str, str]: + def _translate_dict(self, content_dict: Dict) -> Dict[str, Any]: """ Translates content by parsing content from a dict object and translating with the appropriate mapping Args: content_dict (dict): content dict to translate Returns: dict: translated metadata in json-friendly form needed for the indexer """ - translated_metadata = {"@type": SCHEMA_URI + "SoftwareSourceCode"} + graph = rdflib.Graph() + + # The main object being described (the SoftwareSourceCode) does not necessarily + # may or may not have an id. + # Either way, we temporarily use this URI to identify it. 
Unfortunately, + # we cannot use a blank node as we need to use it for JSON-LD framing later, + # and blank nodes cannot be used for framing in JSON-LD >= 1.1 + root_id = ( + "https://www.softwareheritage.org/schema/2022/indexer/tmp-node/" + + str(uuid.uuid4()) + ) + root = rdflib.URIRef(root_id) + graph.add((root, RDF.type, SCHEMA.SoftwareSourceCode)) + for k, v in content_dict.items(): # First, check if there is a specific translation # method for this key translation_method = getattr( self, "translate_" + self._normalize_method_name(k), None ) if translation_method: - translation_method(translated_metadata, v) + translation_method(graph, root, v) elif k in self.mapping: # if there is no method, but the key is known from the # crosswalk table codemeta_key = self.mapping[k] - # if there is a normalization method, use it on the value + # if there is a normalization method, use it on the value, + # and add its results to the triples normalization_method = getattr( self, "normalize_" + self._normalize_method_name(k), None ) if normalization_method: v = normalization_method(v) + if v is None: + pass + elif isinstance(v, list): + for item in reversed(v): + graph.add((root, codemeta_key, item)) + else: + graph.add((root, codemeta_key, v)) elif k in self.string_fields and isinstance(v, str): - pass + graph.add((root, codemeta_key, rdflib.Literal(v))) elif k in self.string_fields and isinstance(v, list): - v = [x for x in v if isinstance(x, str)] + for item in v: + graph.add((root, codemeta_key, rdflib.Literal(item))) + elif k in self.uri_fields and isinstance(v, str): + graph.add((root, codemeta_key, rdflib.URIRef(v))) + elif k in self.uri_fields and isinstance(v, list): + for item in v: + graph.add((root, codemeta_key, rdflib.URIRef(item))) else: continue - # set the translation metadata with the normalized value - if codemeta_key in translated_metadata: - translated_metadata[codemeta_key] = merge_values( - translated_metadata[codemeta_key], v - ) - else: - translated_metadata[codemeta_key] = v + self.extra_translation(graph, root, content_dict) - if normalize: - return self.normalize_translation(translated_metadata) - else: - return translated_metadata + # Convert from rdflib's internal graph representation to JSON + s = graph.serialize(format="application/ld+json") + # Load from JSON to a list of Python objects + jsonld_graph = json.loads(s) -class JsonMapping(DictMapping): - """Base class for all mappings that use JSON data as input.""" + # Use JSON-LD framing to turn the graph into a rooted tree + # frame = {"@type": str(SCHEMA.SoftwareSourceCode)} + translated_metadata = jsonld.frame( + jsonld_graph, + {"@id": root_id}, + options={ + "documentLoader": _document_loader, + "processingMode": "json-ld-1.1", + }, + ) - def translate(self, raw_content: bytes) -> Optional[Dict]: + # Remove the temporary id we added at the beginning + if isinstance(translated_metadata["@id"], list): + translated_metadata["@id"].remove(root_id) + else: + del translated_metadata["@id"] + + return self.normalize_translation(translated_metadata) + + def extra_translation( + self, graph: rdflib.Graph, root: rdflib.term.Node, d: Dict[str, Any] + ): + """Called at the end of the translation process, and may add arbitrary triples + to ``graph`` based on the input dictionary (passed as ``d``). 
""" - Translates content by parsing content from a bytestring containing - json data and translating with the appropriate mapping + pass - Args: - raw_content (bytes): raw content to translate - Returns: - dict: translated metadata in json-friendly form needed for - the indexer +class JsonMapping(DictMapping): + """Base class for all mappings that use JSON data as input.""" - """ + def translate(self, raw_content: bytes) -> Optional[Dict]: try: raw_content_string: str = raw_content.decode() except UnicodeDecodeError: self.log.warning("Error unidecoding from %s", self.log_suffix) return None try: content_dict = json.loads(raw_content_string) except json.JSONDecodeError: self.log.warning("Error unjsoning from %s", self.log_suffix) return None if isinstance(content_dict, dict): return self._translate_dict(content_dict) return None +class XmlMapping(DictMapping): + """Base class for all mappings that use XML data as input.""" + + def translate(self, raw_content: bytes) -> Optional[Dict]: + try: + d = xmltodict.parse(raw_content) + except xml.parsers.expat.ExpatError: + self.log.warning("Error parsing XML from %s", self.log_suffix) + return None + except UnicodeDecodeError: + self.log.warning("Error unidecoding XML from %s", self.log_suffix) + return None + except (LookupError, ValueError): + # unknown encoding or multi-byte encoding + self.log.warning("Error detecting XML encoding from %s", self.log_suffix) + return None + if not isinstance(d, dict): + self.log.warning("Skipping ill-formed XML content: %s", raw_content) + return None + return self._translate_dict(d) + + class SafeLoader(yaml.SafeLoader): yaml_implicit_resolvers = { k: [r for r in v if r[0] != "tag:yaml.org,2002:timestamp"] for k, v in yaml.SafeLoader.yaml_implicit_resolvers.items() } class YamlMapping(DictMapping, SingleFileIntrinsicMapping): """Base class for all mappings that use Yaml data as input.""" def translate(self, raw_content: bytes) -> Optional[Dict[str, str]]: raw_content_string: str = raw_content.decode() try: content_dict = yaml.load(raw_content_string, Loader=SafeLoader) except yaml.scanner.ScannerError: return None if isinstance(content_dict, dict): return self._translate_dict(content_dict) return None diff --git a/swh/indexer/metadata_dictionary/cff.py b/swh/indexer/metadata_dictionary/cff.py index 286ec77..12121cc 100644 --- a/swh/indexer/metadata_dictionary/cff.py +++ b/swh/indexer/metadata_dictionary/cff.py @@ -1,53 +1,63 @@ -from typing import Dict, List, Optional, Union +# Copyright (C) 2021-2022 The Software Heritage developers +# See the AUTHORS file at the top-level directory of this distribution +# License: GNU General Public License version 3, or any later version +# See top-level LICENSE file for more information -from swh.indexer.codemeta import CROSSWALK_TABLE, SCHEMA_URI +from typing import List + +from rdflib import BNode, Graph, Literal, URIRef +import rdflib.term + +from swh.indexer.codemeta import CROSSWALK_TABLE +from swh.indexer.namespaces import RDF, SCHEMA from .base import YamlMapping +from .utils import add_map + +DOI = URIRef("https://doi.org/") +SPDX = URIRef("https://spdx.org/licenses/") class CffMapping(YamlMapping): """Dedicated class for Citation (CITATION.cff) mapping and translation""" name = "cff" filename = b"CITATION.cff" mapping = CROSSWALK_TABLE["Citation File Format Core (CFF-Core) 1.0.2"] string_fields = ["keywords", "license", "abstract", "version", "doi"] - - def normalize_authors(self, d: List[dict]) -> Dict[str, list]: - result = [] - for author in d: - author_data: 
Dict[str, Optional[Union[str, Dict]]] = { - "@type": SCHEMA_URI + "Person" - } - if "orcid" in author and isinstance(author["orcid"], str): - author_data["@id"] = author["orcid"] - if "affiliation" in author and isinstance(author["affiliation"], str): - author_data[SCHEMA_URI + "affiliation"] = { - "@type": SCHEMA_URI + "Organization", - SCHEMA_URI + "name": author["affiliation"], - } - if "family-names" in author and isinstance(author["family-names"], str): - author_data[SCHEMA_URI + "familyName"] = author["family-names"] - if "given-names" in author and isinstance(author["given-names"], str): - author_data[SCHEMA_URI + "givenName"] = author["given-names"] - - result.append(author_data) - - result_final = {"@list": result} - return result_final - - def normalize_doi(self, s: str) -> Dict[str, str]: - if isinstance(s, str): - return {"@id": "https://doi.org/" + s} - - def normalize_license(self, s: str) -> Dict[str, str]: + uri_fields = ["repository-code"] + + def _translate_author(self, graph: Graph, author: dict) -> rdflib.term.Node: + node: rdflib.term.Node + if "orcid" in author and isinstance(author["orcid"], str): + node = URIRef(author["orcid"]) + else: + node = BNode() + graph.add((node, RDF.type, SCHEMA.Person)) + if "affiliation" in author and isinstance(author["affiliation"], str): + affiliation = BNode() + graph.add((node, SCHEMA.affiliation, affiliation)) + graph.add((affiliation, RDF.type, SCHEMA.Organization)) + graph.add((affiliation, SCHEMA.name, Literal(author["affiliation"]))) + if "family-names" in author and isinstance(author["family-names"], str): + graph.add((node, SCHEMA.familyName, Literal(author["family-names"]))) + if "given-names" in author and isinstance(author["given-names"], str): + graph.add((node, SCHEMA.givenName, Literal(author["given-names"]))) + return node + + def translate_authors( + self, graph: Graph, root: URIRef, authors: List[dict] + ) -> None: + add_map(graph, root, SCHEMA.author, self._translate_author, authors) + + def normalize_doi(self, s: str) -> URIRef: if isinstance(s, str): - return {"@id": "https://spdx.org/licenses/" + s} + return DOI + s - def normalize_repository_code(self, s: str) -> Dict[str, str]: + def normalize_license(self, s: str) -> URIRef: if isinstance(s, str): - return {"@id": s} + return SPDX + s - def normalize_date_released(self, s: str) -> Dict[str, str]: + def normalize_date_released(self, s: str) -> Literal: if isinstance(s, str): - return {"@value": s, "@type": SCHEMA_URI + "Date"} + return Literal(s, datatype=SCHEMA.Date) diff --git a/swh/indexer/metadata_dictionary/codemeta.py b/swh/indexer/metadata_dictionary/codemeta.py index f0f0d09..4da5eb6 100644 --- a/swh/indexer/metadata_dictionary/codemeta.py +++ b/swh/indexer/metadata_dictionary/codemeta.py @@ -1,31 +1,149 @@ -# Copyright (C) 2018-2019 The Software Heritage developers +# Copyright (C) 2018-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information +import collections import json -from typing import Any, Dict, List, Optional +import re +from typing import Any, Dict, List, Optional, Tuple +import xml.etree.ElementTree as ET -from swh.indexer.codemeta import CODEMETA_TERMS, expand +import xmltodict -from .base import SingleFileIntrinsicMapping +from swh.indexer.codemeta import CODEMETA_CONTEXT_URL, CODEMETA_TERMS, compact, expand + +from .base import BaseExtrinsicMapping, 
SingleFileIntrinsicMapping + +ATOM_URI = "http://www.w3.org/2005/Atom" + +_TAG_RE = re.compile(r"\{(?P.*?)\}(?P.*)") +_IGNORED_NAMESPACES = ("http://www.w3.org/2005/Atom",) class CodemetaMapping(SingleFileIntrinsicMapping): """ dedicated class for CodeMeta (codemeta.json) mapping and translation """ name = "codemeta" filename = b"codemeta.json" string_fields = None @classmethod def supported_terms(cls) -> List[str]: return [term for term in CODEMETA_TERMS if not term.startswith("@")] def translate(self, content: bytes) -> Optional[Dict[str, Any]]: try: return self.normalize_translation(expand(json.loads(content.decode()))) except Exception: return None + + +class SwordCodemetaMapping(BaseExtrinsicMapping): + """ + dedicated class for mapping and translation from JSON-LD statements + embedded in SWORD documents, optionally using Codemeta contexts, + as described in the :ref:`deposit-protocol`. + """ + + name = "sword-codemeta" + + @classmethod + def extrinsic_metadata_formats(cls) -> Tuple[str, ...]: + return ( + "sword-v2-atom-codemeta", + "sword-v2-atom-codemeta-v2", + ) + + @classmethod + def supported_terms(cls) -> List[str]: + return [term for term in CODEMETA_TERMS if not term.startswith("@")] + + def xml_to_jsonld(self, e: ET.Element) -> Dict[str, Any]: + doc: Dict[str, List[Dict[str, Any]]] = collections.defaultdict(list) + for child in e: + m = _TAG_RE.match(child.tag) + assert m, f"Tag with no namespace: {child}" + namespace = m.group("namespace") + localname = m.group("localname") + if namespace == ATOM_URI and localname in ("title", "name"): + # Convert Atom to Codemeta name; in case codemeta:name + # is not provided or different + doc["name"].append(self.xml_to_jsonld(child)) + elif namespace == ATOM_URI and localname in ("author", "email"): + # ditto for these author properties (note that author email is also + # covered by the previous test) + doc[localname].append(self.xml_to_jsonld(child)) + elif namespace in _IGNORED_NAMESPACES: + # SWORD-specific namespace that is not interesting to translate + pass + elif namespace.lower() == CODEMETA_CONTEXT_URL: + # It is a term defined by the context; write is as-is and JSON-LD + # expansion will convert it to a full URI based on + # "@context": CODEMETA_CONTEXT_URL + doc[localname].append(self.xml_to_jsonld(child)) + else: + # Otherwise, we already know the URI + doc[f"{namespace}{localname}"].append(self.xml_to_jsonld(child)) + + # The above needed doc values to be list to work; now we allow any type + # of value as key "@value" cannot have a list as value. + doc_: Dict[str, Any] = doc + + text = e.text.strip() if e.text else None + if text: + # TODO: check doc is empty, and raise mixed-content error otherwise? 
+ doc_["@value"] = text + + return doc_ + + def translate(self, content: bytes) -> Optional[Dict[str, Any]]: + # Parse XML + root = ET.fromstring(content) + + # Transform to JSON-LD document + doc = self.xml_to_jsonld(root) + + # Add @context to JSON-LD expansion replaces the "codemeta:" prefix + # hash (which uses the context URL as namespace URI for historical + # reasons) into properties in `http://schema.org/` and + # `https://codemeta.github.io/terms/` namespaces + doc["@context"] = CODEMETA_CONTEXT_URL + + # Normalize as a Codemeta document + return self.normalize_translation(expand(doc)) + + def normalize_translation(self, metadata: Dict[str, Any]) -> Dict[str, Any]: + return compact(metadata, forgefed=False) + + +class JsonSwordCodemetaMapping(SwordCodemetaMapping): + """ + Variant of :class:`SwordCodemetaMapping` that reads the legacy + ``sword-v2-atom-codemeta-v2-in-json`` format and converts it back to + ``sword-v2-atom-codemeta-v2`` XML + """ + + name = "json-sword-codemeta" + + @classmethod + def extrinsic_metadata_formats(cls) -> Tuple[str, ...]: + return ("sword-v2-atom-codemeta-v2-in-json",) + + def translate(self, content: bytes) -> Optional[Dict[str, Any]]: + # ``content`` was generated by calling ``xmltodict.parse()`` on a XML document, + # so ``xmltodict.unparse()`` is guaranteed to return a document that is + # semantically equivalent to the original and pass it to SwordCodemetaMapping. + json_doc = json.loads(content) + + if json_doc.get("@xmlns") != ATOM_URI: + # Technically, non-default XMLNS were allowed, but it does not seem like + # anyone used them, so they do not need to be implemented here. + raise NotImplementedError(f"Unexpected XMLNS set: {json_doc}") + + # Root tag was stripped by swh-deposit + json_doc = {"entry": json_doc} + + return super().translate(xmltodict.unparse(json_doc)) diff --git a/swh/indexer/metadata_dictionary/composer.py b/swh/indexer/metadata_dictionary/composer.py index c02f5d8..a43fc23 100644 --- a/swh/indexer/metadata_dictionary/composer.py +++ b/swh/indexer/metadata_dictionary/composer.py @@ -1,56 +1,61 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import os.path +from typing import Optional -from swh.indexer.codemeta import _DATA_DIR, SCHEMA_URI, _read_crosstable +from rdflib import BNode, Graph, Literal, URIRef + +from swh.indexer.codemeta import _DATA_DIR, _read_crosstable +from swh.indexer.namespaces import RDF, SCHEMA from .base import JsonMapping, SingleFileIntrinsicMapping +from .utils import add_map + +SPDX = URIRef("https://spdx.org/licenses/") + COMPOSER_TABLE_PATH = os.path.join(_DATA_DIR, "composer.csv") with open(COMPOSER_TABLE_PATH) as fd: (CODEMETA_TERMS, COMPOSER_TABLE) = _read_crosstable(fd) class ComposerMapping(JsonMapping, SingleFileIntrinsicMapping): """Dedicated class for Packagist(composer.json) mapping and translation""" name = "composer" mapping = COMPOSER_TABLE["Composer"] filename = b"composer.json" string_fields = [ "name", "description", "version", "keywords", - "homepage", "license", "author", "authors", ] - - def normalize_homepage(self, s): - if isinstance(s, str): - return {"@id": s} + uri_fields = ["homepage"] def normalize_license(self, s): if isinstance(s, str): - return {"@id": "https://spdx.org/licenses/" + s} + return SPDX + s - def normalize_authors(self, author_list): - authors = [] - for author 
in author_list: - author_obj = {"@type": SCHEMA_URI + "Person"} + def _translate_author(self, graph: Graph, author) -> Optional[BNode]: + if not isinstance(author, dict): + return None + node = BNode() + graph.add((node, RDF.type, SCHEMA.Person)) - if isinstance(author, dict): - if isinstance(author.get("name", None), str): - author_obj[SCHEMA_URI + "name"] = author.get("name", None) - if isinstance(author.get("email", None), str): - author_obj[SCHEMA_URI + "email"] = author.get("email", None) + if isinstance(author.get("name"), str): + graph.add((node, SCHEMA.name, Literal(author["name"]))) + if isinstance(author.get("email"), str): + graph.add((node, SCHEMA.email, Literal(author["email"]))) - authors.append(author_obj) + return node - return {"@list": authors} + def translate_authors(self, graph: Graph, root: URIRef, authors) -> None: + add_map(graph, root, SCHEMA.author, self._translate_author, authors) diff --git a/swh/indexer/metadata_dictionary/dart.py b/swh/indexer/metadata_dictionary/dart.py index 26cd7d5..ec6dfb2 100644 --- a/swh/indexer/metadata_dictionary/dart.py +++ b/swh/indexer/metadata_dictionary/dart.py @@ -1,74 +1,75 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import os.path import re -from swh.indexer.codemeta import _DATA_DIR, SCHEMA_URI, _read_crosstable +from rdflib import RDF, BNode, Graph, Literal, URIRef + +from swh.indexer.codemeta import _DATA_DIR, _read_crosstable +from swh.indexer.namespaces import SCHEMA from .base import YamlMapping +from .utils import add_map + +SPDX = URIRef("https://spdx.org/licenses/") PUB_TABLE_PATH = os.path.join(_DATA_DIR, "pubspec.csv") with open(PUB_TABLE_PATH) as fd: (CODEMETA_TERMS, PUB_TABLE) = _read_crosstable(fd) def name_to_person(name): return { - "@type": SCHEMA_URI + "Person", - SCHEMA_URI + "name": name, + "@type": SCHEMA.Person, + SCHEMA.name: name, } class PubspecMapping(YamlMapping): name = "pubspec" filename = b"pubspec.yaml" mapping = PUB_TABLE["Pubspec"] string_fields = [ "repository", "keywords", "description", "name", - "homepage", "issue_tracker", "platforms", "license" # license will only be used with the SPDX Identifier ] + uri_fields = ["homepage"] def normalize_license(self, s): if isinstance(s, str): - return {"@id": "https://spdx.org/licenses/" + s} - - def normalize_homepage(self, s): - if isinstance(s, str): - return {"@id": s} + return SPDX + s - def normalize_author(self, s): - name_email_regex = "(?P.*?)( <(?P.*)>)" - author = {"@type": SCHEMA_URI + "Person"} + def _translate_author(self, graph, s): + name_email_re = re.compile("(?P.*?)( <(?P.*)>)") if isinstance(s, str): - match = re.search(name_email_regex, s) + author = BNode() + graph.add((author, RDF.type, SCHEMA.Person)) + match = name_email_re.search(s) if match: name = match.group("name") email = match.group("email") - author[SCHEMA_URI + "email"] = email + graph.add((author, SCHEMA.email, Literal(email))) else: name = s - author[SCHEMA_URI + "name"] = name + graph.add((author, SCHEMA.name, Literal(name))) - return {"@list": [author]} + return author - def normalize_authors(self, authors_list): - authors = {"@list": []} + def translate_author(self, graph: Graph, root, s) -> None: + add_map(graph, root, SCHEMA.author, self._translate_author, [s]) - if isinstance(authors_list, list): - for s in authors_list: - author = 
self.normalize_author(s)["@list"] - authors["@list"] += author - return authors + def translate_authors(self, graph: Graph, root, authors) -> None: + if isinstance(authors, list): + add_map(graph, root, SCHEMA.author, self._translate_author, authors) diff --git a/swh/indexer/metadata_dictionary/github.py b/swh/indexer/metadata_dictionary/github.py index 020c8d0..fe3b87e 100644 --- a/swh/indexer/metadata_dictionary/github.py +++ b/swh/indexer/metadata_dictionary/github.py @@ -1,130 +1,113 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information -import json -from typing import Any, Dict, Tuple +from typing import Any, Tuple -from swh.indexer.codemeta import ACTIVITYSTREAMS_URI, CROSSWALK_TABLE, FORGEFED_URI +from rdflib import RDF, BNode, Graph, Literal, URIRef -from .base import BaseExtrinsicMapping, JsonMapping, produce_terms +from swh.indexer.codemeta import CROSSWALK_TABLE +from swh.indexer.namespaces import ACTIVITYSTREAMS, FORGEFED, SCHEMA +from .base import BaseExtrinsicMapping, JsonMapping, produce_terms +from .utils import prettyprint_graph # noqa -def _prettyprint(d): - print(json.dumps(d, indent=4)) +SPDX = URIRef("https://spdx.org/licenses/") class GitHubMapping(BaseExtrinsicMapping, JsonMapping): name = "github" mapping = CROSSWALK_TABLE["GitHub"] string_fields = [ "archive_url", "created_at", "updated_at", "description", "full_name", "html_url", "issues_url", ] @classmethod def extrinsic_metadata_formats(cls) -> Tuple[str, ...]: return ("application/vnd.github.v3+json",) - def _translate_dict(self, content_dict: Dict[str, Any], **kwargs) -> Dict[str, Any]: - d = super()._translate_dict(content_dict, **kwargs) - d["type"] = FORGEFED_URI + "Repository" - return d + def extra_translation(self, graph, root, content_dict): + graph.remove((root, RDF.type, SCHEMA.SoftwareSourceCode)) + graph.add((root, RDF.type, FORGEFED.Repository)) - @produce_terms(FORGEFED_URI, ["forks"]) - @produce_terms(ACTIVITYSTREAMS_URI, ["totalItems"]) - def translate_forks_count( - self, translated_metadata: Dict[str, Any], v: Any - ) -> None: + @produce_terms(FORGEFED.forks, ACTIVITYSTREAMS.totalItems) + def translate_forks_count(self, graph: Graph, root: BNode, v: Any) -> None: """ - >>> translated_metadata = {} - >>> GitHubMapping().translate_forks_count(translated_metadata, 42) - >>> _prettyprint(translated_metadata) + >>> graph = Graph() + >>> root = URIRef("http://example.org/test-software") + >>> GitHubMapping().translate_forks_count(graph, root, 42) + >>> prettyprint_graph(graph, root) { - "https://forgefed.org/ns#forks": [ - { - "@type": "https://www.w3.org/ns/activitystreams#OrderedCollection", - "https://www.w3.org/ns/activitystreams#totalItems": 42 - } - ] + "@id": ..., + "https://forgefed.org/ns#forks": { + "@type": "https://www.w3.org/ns/activitystreams#OrderedCollection", + "https://www.w3.org/ns/activitystreams#totalItems": 42 + } } """ if isinstance(v, int): - translated_metadata.setdefault(FORGEFED_URI + "forks", []).append( - { - "@type": ACTIVITYSTREAMS_URI + "OrderedCollection", - ACTIVITYSTREAMS_URI + "totalItems": v, - } - ) - - @produce_terms(ACTIVITYSTREAMS_URI, ["likes"]) - @produce_terms(ACTIVITYSTREAMS_URI, ["totalItems"]) - def translate_stargazers_count( - self, translated_metadata: Dict[str, Any], v: Any - ) -> None: + collection = BNode() + graph.add((root, FORGEFED.forks, collection)) + 
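# A rough sketch of the shape being built here, assuming a ``forks_count`` of
# 42 as in the doctest above: the blank ``collection`` node is typed as an
# ActivityStreams ``OrderedCollection`` and carries the item count, so the
# framed JSON-LD comes out as
#   {"https://forgefed.org/ns#forks": {
#       "@type": "https://www.w3.org/ns/activitystreams#OrderedCollection",
#       "https://www.w3.org/ns/activitystreams#totalItems": 42}}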
graph.add((collection, RDF.type, ACTIVITYSTREAMS.OrderedCollection)) + graph.add((collection, ACTIVITYSTREAMS.totalItems, Literal(v))) + + @produce_terms(ACTIVITYSTREAMS.likes, ACTIVITYSTREAMS.totalItems) + def translate_stargazers_count(self, graph: Graph, root: BNode, v: Any) -> None: """ - >>> translated_metadata = {} - >>> GitHubMapping().translate_stargazers_count(translated_metadata, 42) - >>> _prettyprint(translated_metadata) + >>> graph = Graph() + >>> root = URIRef("http://example.org/test-software") + >>> GitHubMapping().translate_stargazers_count(graph, root, 42) + >>> prettyprint_graph(graph, root) { - "https://www.w3.org/ns/activitystreams#likes": [ - { - "@type": "https://www.w3.org/ns/activitystreams#Collection", - "https://www.w3.org/ns/activitystreams#totalItems": 42 - } - ] + "@id": ..., + "https://www.w3.org/ns/activitystreams#likes": { + "@type": "https://www.w3.org/ns/activitystreams#Collection", + "https://www.w3.org/ns/activitystreams#totalItems": 42 + } } """ if isinstance(v, int): - translated_metadata.setdefault(ACTIVITYSTREAMS_URI + "likes", []).append( - { - "@type": ACTIVITYSTREAMS_URI + "Collection", - ACTIVITYSTREAMS_URI + "totalItems": v, - } - ) - - @produce_terms(ACTIVITYSTREAMS_URI, ["followers"]) - @produce_terms(ACTIVITYSTREAMS_URI, ["totalItems"]) - def translate_watchers_count( - self, translated_metadata: Dict[str, Any], v: Any - ) -> None: + collection = BNode() + graph.add((root, ACTIVITYSTREAMS.likes, collection)) + graph.add((collection, RDF.type, ACTIVITYSTREAMS.Collection)) + graph.add((collection, ACTIVITYSTREAMS.totalItems, Literal(v))) + + @produce_terms(ACTIVITYSTREAMS.followers, ACTIVITYSTREAMS.totalItems) + def translate_watchers_count(self, graph: Graph, root: BNode, v: Any) -> None: """ - >>> translated_metadata = {} - >>> GitHubMapping().translate_watchers_count(translated_metadata, 42) - >>> _prettyprint(translated_metadata) + >>> graph = Graph() + >>> root = URIRef("http://example.org/test-software") + >>> GitHubMapping().translate_watchers_count(graph, root, 42) + >>> prettyprint_graph(graph, root) { - "https://www.w3.org/ns/activitystreams#followers": [ - { - "@type": "https://www.w3.org/ns/activitystreams#Collection", - "https://www.w3.org/ns/activitystreams#totalItems": 42 - } - ] + "@id": ..., + "https://www.w3.org/ns/activitystreams#followers": { + "@type": "https://www.w3.org/ns/activitystreams#Collection", + "https://www.w3.org/ns/activitystreams#totalItems": 42 + } } """ if isinstance(v, int): - translated_metadata.setdefault( - ACTIVITYSTREAMS_URI + "followers", [] - ).append( - { - "@type": ACTIVITYSTREAMS_URI + "Collection", - ACTIVITYSTREAMS_URI + "totalItems": v, - } - ) + collection = BNode() + graph.add((root, ACTIVITYSTREAMS.followers, collection)) + graph.add((collection, RDF.type, ACTIVITYSTREAMS.Collection)) + graph.add((collection, ACTIVITYSTREAMS.totalItems, Literal(v))) def normalize_license(self, d): """ >>> GitHubMapping().normalize_license({'spdx_id': 'MIT'}) - {'@id': 'https://spdx.org/licenses/MIT'} + rdflib.term.URIRef('https://spdx.org/licenses/MIT') """ if isinstance(d, dict) and isinstance(d.get("spdx_id"), str): - return {"@id": "https://spdx.org/licenses/" + d["spdx_id"]} + return SPDX + d["spdx_id"] diff --git a/swh/indexer/metadata_dictionary/maven.py b/swh/indexer/metadata_dictionary/maven.py index 419eb74..a374a5e 100644 --- a/swh/indexer/metadata_dictionary/maven.py +++ b/swh/indexer/metadata_dictionary/maven.py @@ -1,162 +1,159 @@ -# Copyright (C) 2018-2021 The Software Heritage developers +# 
Copyright (C) 2018-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import os -from typing import Any, Dict, Optional -import xml.parsers.expat +from typing import Any, Dict -import xmltodict +from rdflib import Graph, Literal, URIRef -from swh.indexer.codemeta import CROSSWALK_TABLE, SCHEMA_URI +from swh.indexer.codemeta import CROSSWALK_TABLE +from swh.indexer.namespaces import SCHEMA -from .base import DictMapping, SingleFileIntrinsicMapping +from .base import SingleFileIntrinsicMapping, XmlMapping +from .utils import prettyprint_graph # noqa -class MavenMapping(DictMapping, SingleFileIntrinsicMapping): +class MavenMapping(XmlMapping, SingleFileIntrinsicMapping): """ dedicated class for Maven (pom.xml) mapping and translation """ name = "maven" filename = b"pom.xml" mapping = CROSSWALK_TABLE["Java (Maven)"] string_fields = ["name", "version", "description", "email"] - def translate(self, content: bytes) -> Optional[Dict[str, Any]]: - try: - d = xmltodict.parse(content).get("project") or {} - except xml.parsers.expat.ExpatError: - self.log.warning("Error parsing XML from %s", self.log_suffix) - return None - except UnicodeDecodeError: - self.log.warning("Error unidecoding XML from %s", self.log_suffix) - return None - except (LookupError, ValueError): - # unknown encoding or multi-byte encoding - self.log.warning("Error detecting XML encoding from %s", self.log_suffix) - return None - if not isinstance(d, dict): - self.log.warning("Skipping ill-formed XML content: %s", content) - return None - metadata = self._translate_dict(d, normalize=False) - metadata[SCHEMA_URI + "codeRepository"] = self.parse_repositories(d) - metadata[SCHEMA_URI + "license"] = self.parse_licenses(d) - return self.normalize_translation(metadata) - _default_repository = {"url": "https://repo.maven.apache.org/maven2/"} - def parse_repositories(self, d): + def _translate_dict(self, d: Dict[str, Any]) -> Dict[str, Any]: + return super()._translate_dict(d.get("project") or {}) + + def extra_translation(self, graph: Graph, root, d): + self.parse_repositories(graph, root, d) + + def parse_repositories(self, graph: Graph, root, d): """https://maven.apache.org/pom.html#Repositories + >>> import rdflib >>> import xmltodict >>> from pprint import pprint >>> d = xmltodict.parse(''' ... ... ... codehausSnapshots ... Codehaus Snapshots ... http://snapshots.maven.codehaus.org/maven2 ... default ... ... ... ''') - >>> MavenMapping().parse_repositories(d) + >>> MavenMapping().parse_repositories(rdflib.Graph(), rdflib.BNode(), d) """ repositories = d.get("repositories") if not repositories: - results = [self.parse_repository(d, self._default_repository)] + self.parse_repository(graph, root, d, self._default_repository) elif isinstance(repositories, dict): repositories = repositories.get("repository") or [] if not isinstance(repositories, list): repositories = [repositories] - results = [self.parse_repository(d, repo) for repo in repositories] - else: - results = [] - return [res for res in results if res] or None + for repo in repositories: + self.parse_repository(graph, root, d, repo) - def parse_repository(self, d, repo): + def parse_repository(self, graph: Graph, root, d, repo): if not isinstance(repo, dict): return if repo.get("layout", "default") != "default": return # TODO ? 
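# The code that follows assembles the repository URL from the base ``url``, the
# dot-separated ``groupId`` expanded into path components, and the
# ``artifactId``; a minimal sketch, using the default repository and the
# coordinates that appear in the tests below:
#   os.path.join("https://repo.maven.apache.org/maven2/",
#                *"com.mycompany.app".split("."), "my-app")
#   # -> 'https://repo.maven.apache.org/maven2/com/mycompany/app/my-app'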
url = repo.get("url") group_id = d.get("groupId") artifact_id = d.get("artifactId") if ( isinstance(url, str) and isinstance(group_id, str) and isinstance(artifact_id, str) ): repo = os.path.join(url, *group_id.split("."), artifact_id) - return {"@id": repo} + graph.add((root, SCHEMA.codeRepository, URIRef(repo))) def normalize_groupId(self, id_): """https://maven.apache.org/pom.html#Maven_Coordinates >>> MavenMapping().normalize_groupId('org.example') - {'@id': 'org.example'} + rdflib.term.Literal('org.example') """ if isinstance(id_, str): - return {"@id": id_} + return Literal(id_) - def parse_licenses(self, d): + def translate_licenses(self, graph, root, licenses): """https://maven.apache.org/pom.html#Licenses >>> import xmltodict >>> import json >>> d = xmltodict.parse(''' ... ... ... Apache License, Version 2.0 ... https://www.apache.org/licenses/LICENSE-2.0.txt ... ... ... ''') >>> print(json.dumps(d, indent=4)) { "licenses": { "license": { "name": "Apache License, Version 2.0", "url": "https://www.apache.org/licenses/LICENSE-2.0.txt" } } } - >>> MavenMapping().parse_licenses(d) - [{'@id': 'https://www.apache.org/licenses/LICENSE-2.0.txt'}] + >>> graph = Graph() + >>> root = URIRef("http://example.org/test-software") + >>> MavenMapping().translate_licenses(graph, root, d["licenses"]) + >>> prettyprint_graph(graph, root) + { + "@id": ..., + "http://schema.org/license": { + "@id": "https://www.apache.org/licenses/LICENSE-2.0.txt" + } + } or, if there are more than one license: >>> import xmltodict >>> from pprint import pprint >>> d = xmltodict.parse(''' ... ... ... Apache License, Version 2.0 ... https://www.apache.org/licenses/LICENSE-2.0.txt ... ... ... MIT License ... https://opensource.org/licenses/MIT ... ... ... ''') - >>> pprint(MavenMapping().parse_licenses(d)) - [{'@id': 'https://www.apache.org/licenses/LICENSE-2.0.txt'}, - {'@id': 'https://opensource.org/licenses/MIT'}] + >>> graph = Graph() + >>> root = URIRef("http://example.org/test-software") + >>> MavenMapping().translate_licenses(graph, root, d["licenses"]) + >>> pprint(set(graph.triples((root, URIRef("http://schema.org/license"), None)))) + {(rdflib.term.URIRef('http://example.org/test-software'), + rdflib.term.URIRef('http://schema.org/license'), + rdflib.term.URIRef('https://opensource.org/licenses/MIT')), + (rdflib.term.URIRef('http://example.org/test-software'), + rdflib.term.URIRef('http://schema.org/license'), + rdflib.term.URIRef('https://www.apache.org/licenses/LICENSE-2.0.txt'))} """ - licenses = d.get("licenses") if not isinstance(licenses, dict): return licenses = licenses.get("license") if isinstance(licenses, dict): licenses = [licenses] elif not isinstance(licenses, list): return - return [ - {"@id": license["url"]} - for license in licenses - if isinstance(license, dict) and isinstance(license.get("url"), str) - ] or None + for license in licenses: + if isinstance(license, dict) and isinstance(license.get("url"), str): + graph.add((root, SCHEMA.license, URIRef(license["url"]))) diff --git a/swh/indexer/metadata_dictionary/npm.py b/swh/indexer/metadata_dictionary/npm.py index 00231dc..1540ef6 100644 --- a/swh/indexer/metadata_dictionary/npm.py +++ b/swh/indexer/metadata_dictionary/npm.py @@ -1,243 +1,282 @@ # Copyright (C) 2018-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import re import urllib.parse -from 
swh.indexer.codemeta import CROSSWALK_TABLE, SCHEMA_URI +from rdflib import RDF, BNode, Graph, Literal, URIRef + +from swh.indexer.codemeta import CROSSWALK_TABLE +from swh.indexer.namespaces import SCHEMA from .base import JsonMapping, SingleFileIntrinsicMapping +from .utils import add_list, prettyprint_graph # noqa + +SPDX = URIRef("https://spdx.org/licenses/") class NpmMapping(JsonMapping, SingleFileIntrinsicMapping): """ dedicated class for NPM (package.json) mapping and translation """ name = "npm" mapping = CROSSWALK_TABLE["NodeJS"] filename = b"package.json" - string_fields = ["name", "version", "homepage", "description", "email"] + string_fields = ["name", "version", "description", "email"] + uri_fields = ["homepage"] _schema_shortcuts = { "github": "git+https://github.com/%s.git", "gist": "git+https://gist.github.com/%s.git", "gitlab": "git+https://gitlab.com/%s.git", # Bitbucket supports both hg and git, and the shortcut does not # tell which one to use. # 'bitbucket': 'https://bitbucket.org/', } def normalize_repository(self, d): """https://docs.npmjs.com/files/package.json#repository >>> NpmMapping().normalize_repository({ ... 'type': 'git', ... 'url': 'https://example.org/foo.git' ... }) - {'@id': 'git+https://example.org/foo.git'} + rdflib.term.URIRef('git+https://example.org/foo.git') >>> NpmMapping().normalize_repository( ... 'gitlab:foo/bar') - {'@id': 'git+https://gitlab.com/foo/bar.git'} + rdflib.term.URIRef('git+https://gitlab.com/foo/bar.git') >>> NpmMapping().normalize_repository( ... 'foo/bar') - {'@id': 'git+https://github.com/foo/bar.git'} + rdflib.term.URIRef('git+https://github.com/foo/bar.git') """ if ( isinstance(d, dict) and isinstance(d.get("type"), str) and isinstance(d.get("url"), str) ): url = "{type}+{url}".format(**d) elif isinstance(d, str): if "://" in d: url = d elif ":" in d: (schema, rest) = d.split(":", 1) if schema in self._schema_shortcuts: url = self._schema_shortcuts[schema] % rest else: return None else: url = self._schema_shortcuts["github"] % d else: return None - return {"@id": url} + return URIRef(url) def normalize_bugs(self, d): """https://docs.npmjs.com/files/package.json#bugs >>> NpmMapping().normalize_bugs({ ... 'url': 'https://example.org/bugs/', ... 'email': 'bugs@example.org' ... }) - {'@id': 'https://example.org/bugs/'} + rdflib.term.URIRef('https://example.org/bugs/') >>> NpmMapping().normalize_bugs( ... 'https://example.org/bugs/') - {'@id': 'https://example.org/bugs/'} + rdflib.term.URIRef('https://example.org/bugs/') """ if isinstance(d, dict) and isinstance(d.get("url"), str): - return {"@id": d["url"]} + return URIRef(d["url"]) elif isinstance(d, str): - return {"@id": d} + return URIRef(d) else: return None _parse_author = re.compile( r"^ *" r"(?P.*?)" r"( +<(?P.*)>)?" r"( +\((?P.*)\))?" r" *$" ) - def normalize_author(self, d): + def translate_author(self, graph: Graph, root, d): r"""https://docs.npmjs.com/files/package.json#people-fields-author-contributors' >>> from pprint import pprint - >>> pprint(NpmMapping().normalize_author({ + >>> root = URIRef("http://example.org/test-software") + >>> graph = Graph() + >>> NpmMapping().translate_author(graph, root, { ... 'name': 'John Doe', ... 'email': 'john.doe@example.org', ... 'url': 'https://example.org/~john.doe', - ... 
})) - {'@list': [{'@type': 'http://schema.org/Person', - 'http://schema.org/email': 'john.doe@example.org', - 'http://schema.org/name': 'John Doe', - 'http://schema.org/url': {'@id': 'https://example.org/~john.doe'}}]} - >>> pprint(NpmMapping().normalize_author( + ... }) + >>> prettyprint_graph(graph, root) + { + "@id": ..., + "http://schema.org/author": { + "@list": [ + { + "@type": "http://schema.org/Person", + "http://schema.org/email": "john.doe@example.org", + "http://schema.org/name": "John Doe", + "http://schema.org/url": { + "@id": "https://example.org/~john.doe" + } + } + ] + } + } + >>> graph = Graph() + >>> NpmMapping().translate_author(graph, root, ... 'John Doe (https://example.org/~john.doe)' - ... )) - {'@list': [{'@type': 'http://schema.org/Person', - 'http://schema.org/email': 'john.doe@example.org', - 'http://schema.org/name': 'John Doe', - 'http://schema.org/url': {'@id': 'https://example.org/~john.doe'}}]} - >>> pprint(NpmMapping().normalize_author({ + ... ) + >>> prettyprint_graph(graph, root) + { + "@id": ..., + "http://schema.org/author": { + "@list": [ + { + "@type": "http://schema.org/Person", + "http://schema.org/email": "john.doe@example.org", + "http://schema.org/name": "John Doe", + "http://schema.org/url": { + "@id": "https://example.org/~john.doe" + } + } + ] + } + } + >>> graph = Graph() + >>> NpmMapping().translate_author(graph, root, { ... 'name': 'John Doe', ... 'email': 'john.doe@example.org', ... 'url': 'https:\\\\example.invalid/~john.doe', - ... })) - {'@list': [{'@type': 'http://schema.org/Person', - 'http://schema.org/email': 'john.doe@example.org', - 'http://schema.org/name': 'John Doe'}]} + ... }) + >>> prettyprint_graph(graph, root) + { + "@id": ..., + "http://schema.org/author": { + "@list": [ + { + "@type": "http://schema.org/Person", + "http://schema.org/email": "john.doe@example.org", + "http://schema.org/name": "John Doe" + } + ] + } + } """ # noqa - author = {"@type": SCHEMA_URI + "Person"} + author = BNode() + graph.add((author, RDF.type, SCHEMA.Person)) if isinstance(d, dict): name = d.get("name", None) email = d.get("email", None) url = d.get("url", None) elif isinstance(d, str): match = self._parse_author.match(d) if not match: return None name = match.group("name") email = match.group("email") url = match.group("url") else: return None if name and isinstance(name, str): - author[SCHEMA_URI + "name"] = name + graph.add((author, SCHEMA.name, Literal(name))) if email and isinstance(email, str): - author[SCHEMA_URI + "email"] = email + graph.add((author, SCHEMA.email, Literal(email))) if url and isinstance(url, str): # Workaround for https://github.com/digitalbazaar/pyld/issues/91 : drop # URLs that are blatantly invalid early, so PyLD does not crash. parsed_url = urllib.parse.urlparse(url) if parsed_url.netloc: - author[SCHEMA_URI + "url"] = {"@id": url} + graph.add((author, SCHEMA.url, URIRef(url))) - return {"@list": [author]} + add_list(graph, root, SCHEMA.author, [author]) def normalize_description(self, description): r"""Try to re-decode ``description`` as UTF-16, as this is a somewhat common mistake that causes issues in the database because of null bytes in JSON. >>> NpmMapping().normalize_description("foo bar") - 'foo bar' + rdflib.term.Literal('foo bar') >>> NpmMapping().normalize_description( ... "\ufffd\ufffd#\x00 \x00f\x00o\x00o\x00 \x00b\x00a\x00r\x00\r\x00 \x00" ... ) - 'foo bar' + rdflib.term.Literal('foo bar') >>> NpmMapping().normalize_description( ... 
"\ufffd\ufffd\x00#\x00 \x00f\x00o\x00o\x00 \x00b\x00a\x00r\x00\r\x00 " ... ) - 'foo bar' + rdflib.term.Literal('foo bar') >>> NpmMapping().normalize_description( ... # invalid UTF-16 and meaningless UTF-8: ... "\ufffd\ufffd\x00#\x00\x00\x00 \x00\x00\x00\x00f\x00\x00\x00\x00" ... ) is None True >>> NpmMapping().normalize_description( ... # ditto (ut looks like little-endian at first) ... "\ufffd\ufffd#\x00\x00\x00 \x00\x00\x00\x00f\x00\x00\x00\x00\x00" ... ) is None True >>> NpmMapping().normalize_description(None) is None True """ if not isinstance(description, str): return None # XXX: if this function ever need to support more cases, consider # switching to https://pypi.org/project/ftfy/ instead of adding more hacks if description.startswith("\ufffd\ufffd") and "\x00" in description: # 2 unicode replacement characters followed by '# ' encoded as UTF-16 # is a common mistake, which indicates a README.md was saved as UTF-16, # and some NPM tool opened it as UTF-8 and used the first line as # description. description_bytes = description.encode() # Strip the the two unicode replacement characters assert description_bytes.startswith(b"\xef\xbf\xbd\xef\xbf\xbd") description_bytes = description_bytes[6:] # If the following attempts fail to recover the description, discard it # entirely because the current indexer storage backend (postgresql) cannot # store zero bytes in JSON columns. description = None if not description_bytes.startswith(b"\x00"): # try UTF-16 little-endian (the most common) first try: description = description_bytes.decode("utf-16le") except UnicodeDecodeError: pass if description is None: # if it fails, try UTF-16 big-endian try: description = description_bytes.decode("utf-16be") except UnicodeDecodeError: pass if description: if description.startswith("# "): description = description[2:] - return description.rstrip() - return description + return Literal(description.rstrip()) + else: + return None + return Literal(description) def normalize_license(self, s): """https://docs.npmjs.com/files/package.json#license >>> NpmMapping().normalize_license('MIT') - {'@id': 'https://spdx.org/licenses/MIT'} - """ - if isinstance(s, str): - return {"@id": "https://spdx.org/licenses/" + s} - - def normalize_homepage(self, s): - """https://docs.npmjs.com/files/package.json#homepage - - >>> NpmMapping().normalize_homepage('https://example.org/~john.doe') - {'@id': 'https://example.org/~john.doe'} + rdflib.term.URIRef('https://spdx.org/licenses/MIT') """ if isinstance(s, str): - return {"@id": s} + return SPDX + s def normalize_keywords(self, lst): """https://docs.npmjs.com/files/package.json#homepage >>> NpmMapping().normalize_keywords(['foo', 'bar']) - ['foo', 'bar'] + [rdflib.term.Literal('foo'), rdflib.term.Literal('bar')] """ if isinstance(lst, list): - return [x for x in lst if isinstance(x, str)] + return [Literal(x) for x in lst if isinstance(x, str)] diff --git a/swh/indexer/metadata_dictionary/nuget.py b/swh/indexer/metadata_dictionary/nuget.py new file mode 100644 index 0000000..62f7ea9 --- /dev/null +++ b/swh/indexer/metadata_dictionary/nuget.py @@ -0,0 +1,95 @@ +# Copyright (C) 2022 The Software Heritage developers +# See the AUTHORS file at the top-level directory of this distribution +# License: GNU General Public License version 3, or any later version +# See top-level LICENSE file for more information + +import os.path +import re +from typing import Any, Dict, List + +from rdflib import RDF, BNode, Graph, Literal, URIRef + +from swh.indexer.codemeta import _DATA_DIR, 
_read_crosstable +from swh.indexer.namespaces import SCHEMA +from swh.indexer.storage.interface import Sha1 + +from .base import BaseIntrinsicMapping, DirectoryLsEntry, XmlMapping +from .utils import add_list + +NUGET_TABLE_PATH = os.path.join(_DATA_DIR, "nuget.csv") + +with open(NUGET_TABLE_PATH) as fd: + (CODEMETA_TERMS, NUGET_TABLE) = _read_crosstable(fd) + +SPDX = URIRef("https://spdx.org/licenses/") + + +class NuGetMapping(XmlMapping, BaseIntrinsicMapping): + """ + dedicated class for NuGet (.nuspec) mapping and translation + """ + + name = "nuget" + mapping = NUGET_TABLE["NuGet"] + mapping["copyright"] = URIRef("http://schema.org/copyrightNotice") + mapping["language"] = URIRef("http://schema.org/inLanguage") + string_fields = [ + "description", + "version", + "name", + "tags", + "license", + "summary", + "copyright", + "language", + ] + uri_fields = ["projectUrl", "licenseUrl"] + + @classmethod + def detect_metadata_files(cls, file_entries: List[DirectoryLsEntry]) -> List[Sha1]: + for entry in file_entries: + if entry["name"].endswith(b".nuspec"): + return [entry["sha1"]] + return [] + + def _translate_dict(self, d: Dict[str, Any]) -> Dict[str, Any]: + return super()._translate_dict(d.get("package", {}).get("metadata", {})) + + def translate_repository(self, graph, root, v): + if isinstance(v, dict) and isinstance(v["@url"], str): + codemeta_key = URIRef(self.mapping["repository.url"]) + graph.add((root, codemeta_key, URIRef(v["@url"]))) + + def normalize_license(self, v): + if isinstance(v, dict) and v["@type"] == "expression": + license_string = v["#text"] + if not bool( + re.search(r" with |\(|\)| and ", license_string, re.IGNORECASE) + ): + return [ + SPDX + license_type.strip() + for license_type in re.split( + r" or ", license_string, flags=re.IGNORECASE + ) + ] + else: + return None + + def translate_authors(self, graph: Graph, root, s): + if isinstance(s, str): + authors = [] + for author_name in s.split(","): + author_name = author_name.strip() + author = BNode() + graph.add((author, RDF.type, SCHEMA.Person)) + graph.add((author, SCHEMA.name, Literal(author_name))) + authors.append(author) + add_list(graph, root, SCHEMA.author, authors) + + def translate_releaseNotes(self, graph: Graph, root, s): + if isinstance(s, str): + graph.add((root, SCHEMA.releaseNotes, Literal(s))) + + def normalize_tags(self, s): + if isinstance(s, str): + return [Literal(tag) for tag in s.split(" ")] diff --git a/swh/indexer/metadata_dictionary/python.py b/swh/indexer/metadata_dictionary/python.py index 686deed..b16d681 100644 --- a/swh/indexer/metadata_dictionary/python.py +++ b/swh/indexer/metadata_dictionary/python.py @@ -1,76 +1,80 @@ -# Copyright (C) 2018-2019 The Software Heritage developers +# Copyright (C) 2018-2021 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import email.parser import email.policy -import itertools -from swh.indexer.codemeta import CROSSWALK_TABLE, SCHEMA_URI +from rdflib import BNode, Literal, URIRef + +from swh.indexer.codemeta import CROSSWALK_TABLE +from swh.indexer.namespaces import RDF, SCHEMA from .base import DictMapping, SingleFileIntrinsicMapping +from .utils import add_list _normalize_pkginfo_key = str.lower class LinebreakPreservingEmailPolicy(email.policy.EmailPolicy): def header_fetch_parse(self, name, value): if hasattr(value, "name"): return value value = value.replace("\n ", 
"\n") return self.header_factory(name, value) class PythonPkginfoMapping(DictMapping, SingleFileIntrinsicMapping): """Dedicated class for Python's PKG-INFO mapping and translation. https://www.python.org/dev/peps/pep-0314/""" name = "pkg-info" filename = b"PKG-INFO" mapping = { _normalize_pkginfo_key(k): v for (k, v) in CROSSWALK_TABLE["Python PKG-INFO"].items() } string_fields = [ "name", "version", "description", "summary", "author", "author-email", ] _parser = email.parser.BytesHeaderParser(policy=LinebreakPreservingEmailPolicy()) def translate(self, content): msg = self._parser.parsebytes(content) d = {} for (key, value) in msg.items(): key = _normalize_pkginfo_key(key) if value != "UNKNOWN": d.setdefault(key, []).append(value) - metadata = self._translate_dict(d, normalize=False) - if SCHEMA_URI + "author" in metadata or SCHEMA_URI + "email" in metadata: - metadata[SCHEMA_URI + "author"] = { - "@list": [ - { - "@type": SCHEMA_URI + "Person", - SCHEMA_URI - + "name": metadata.pop(SCHEMA_URI + "author", [None])[0], - SCHEMA_URI - + "email": metadata.pop(SCHEMA_URI + "email", [None])[0], - } - ] - } - return self.normalize_translation(metadata) + return self._translate_dict(d) + + def extra_translation(self, graph, root, d): + author_names = list(graph.triples((root, SCHEMA.author, None))) + author_emails = list(graph.triples((root, SCHEMA.email, None))) + graph.remove((root, SCHEMA.author, None)) + graph.remove((root, SCHEMA.email, None)) + if author_names or author_emails: + author = BNode() + graph.add((author, RDF.type, SCHEMA.Person)) + for (_, _, author_name) in author_names: + graph.add((author, SCHEMA.name, author_name)) + for (_, _, author_email) in author_emails: + graph.add((author, SCHEMA.email, author_email)) + add_list(graph, root, SCHEMA.author, [author]) def normalize_home_page(self, urls): - return [{"@id": url} for url in urls] + return [URIRef(url) for url in urls] def normalize_keywords(self, keywords): - return list(itertools.chain.from_iterable(s.split(" ") for s in keywords)) + return [Literal(keyword) for s in keywords for keyword in s.split(" ")] def normalize_license(self, licenses): - return [{"@id": license} for license in licenses] + return [URIRef("https://spdx.org/licenses/" + license) for license in licenses] diff --git a/swh/indexer/metadata_dictionary/ruby.py b/swh/indexer/metadata_dictionary/ruby.py index bdb06aa..71a0b10 100644 --- a/swh/indexer/metadata_dictionary/ruby.py +++ b/swh/indexer/metadata_dictionary/ruby.py @@ -1,135 +1,130 @@ # Copyright (C) 2018-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import ast import itertools import re from typing import List -from swh.indexer.codemeta import CROSSWALK_TABLE, SCHEMA_URI +from rdflib import RDF, BNode, Graph, Literal, URIRef + +from swh.indexer.codemeta import CROSSWALK_TABLE from swh.indexer.metadata_dictionary.base import DirectoryLsEntry +from swh.indexer.namespaces import SCHEMA from swh.indexer.storage.interface import Sha1 from .base import BaseIntrinsicMapping, DictMapping +from .utils import add_map + +SPDX = URIRef("https://spdx.org/licenses/") -def name_to_person(name): - return { - "@type": SCHEMA_URI + "Person", - SCHEMA_URI + "name": name, - } +def name_to_person(graph: Graph, name): + if not isinstance(name, str): + return None + author = BNode() + graph.add((author, RDF.type, SCHEMA.Person)) + 
graph.add((author, SCHEMA.name, Literal(name))) + return author class GemspecMapping(BaseIntrinsicMapping, DictMapping): name = "gemspec" mapping = CROSSWALK_TABLE["Ruby Gem"] string_fields = ["name", "version", "description", "summary", "email"] + uri_fields = ["homepage"] _re_spec_new = re.compile(r".*Gem::Specification.new +(do|\{) +\|.*\|.*") _re_spec_entry = re.compile(r"\s*\w+\.(?P\w+)\s*=\s*(?P.*)") @classmethod def detect_metadata_files(cls, file_entries: List[DirectoryLsEntry]) -> List[Sha1]: for entry in file_entries: if entry["name"].endswith(b".gemspec"): return [entry["sha1"]] return [] def translate(self, raw_content): try: raw_content = raw_content.decode() except UnicodeDecodeError: self.log.warning("Error unidecoding from %s", self.log_suffix) return # Skip lines before 'Gem::Specification.new' lines = itertools.dropwhile( lambda x: not self._re_spec_new.match(x), raw_content.split("\n") ) try: next(lines) # Consume 'Gem::Specification.new' except StopIteration: self.log.warning("Could not find Gem::Specification in %s", self.log_suffix) return content_dict = {} for line in lines: match = self._re_spec_entry.match(line) if match: value = self.eval_ruby_expression(match.group("expr")) if value: content_dict[match.group("key")] = value return self._translate_dict(content_dict) def eval_ruby_expression(self, expr): """Very simple evaluator of Ruby expressions. >>> GemspecMapping().eval_ruby_expression('"Foo bar"') 'Foo bar' >>> GemspecMapping().eval_ruby_expression("'Foo bar'") 'Foo bar' >>> GemspecMapping().eval_ruby_expression("['Foo', 'bar']") ['Foo', 'bar'] >>> GemspecMapping().eval_ruby_expression("'Foo bar'.freeze") 'Foo bar' >>> GemspecMapping().eval_ruby_expression( \ "['Foo'.freeze, 'bar'.freeze]") ['Foo', 'bar'] """ def evaluator(node): if isinstance(node, ast.Str): return node.s elif isinstance(node, ast.List): res = [] for element in node.elts: val = evaluator(element) if not val: return res.append(val) return res expr = expr.replace(".freeze", "") try: # We're parsing Ruby expressions here, but Python's # ast.parse works for very simple Ruby expressions # (mainly strings delimited with " or ', and lists # of such strings). 
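# For instance, matching the doctests above, "['Foo', 'bar']" parses to an
# ast.List of string constants that the evaluator above turns back into a
# plain Python list; input that is not also valid Python raises SyntaxError
# and is discarded by the except clause below.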
tree = ast.parse(expr, mode="eval") except (SyntaxError, ValueError): return if isinstance(tree, ast.Expression): return evaluator(tree.body) - def normalize_homepage(self, s): - if isinstance(s, str): - return {"@id": s} - def normalize_license(self, s): if isinstance(s, str): - return [{"@id": "https://spdx.org/licenses/" + s}] + return SPDX + s def normalize_licenses(self, licenses): if isinstance(licenses, list): - return [ - {"@id": "https://spdx.org/licenses/" + license} - for license in licenses - if isinstance(license, str) - ] + return [SPDX + license for license in licenses if isinstance(license, str)] - def normalize_author(self, author): + def translate_author(self, graph: Graph, root, author): if isinstance(author, str): - return {"@list": [name_to_person(author)]} + add_map(graph, root, SCHEMA.author, name_to_person, [author]) - def normalize_authors(self, authors): + def translate_authors(self, graph: Graph, root, authors): if isinstance(authors, list): - return { - "@list": [ - name_to_person(author) - for author in authors - if isinstance(author, str) - ] - } + add_map(graph, root, SCHEMA.author, name_to_person, authors) diff --git a/swh/indexer/metadata_dictionary/utils.py b/swh/indexer/metadata_dictionary/utils.py new file mode 100644 index 0000000..173b146 --- /dev/null +++ b/swh/indexer/metadata_dictionary/utils.py @@ -0,0 +1,72 @@ +# Copyright (C) 2022 The Software Heritage developers +# See the AUTHORS file at the top-level directory of this distribution +# License: GNU General Public License version 3, or any later version +# See top-level LICENSE file for more information + + +import json +from typing import Callable, Iterable, Optional, Sequence, TypeVar + +from pyld import jsonld +from rdflib import RDF, Graph, URIRef +import rdflib.term + +from swh.indexer.codemeta import _document_loader + + +def prettyprint_graph(graph: Graph, root: URIRef): + s = graph.serialize(format="application/ld+json") + jsonld_graph = json.loads(s) + translated_metadata = jsonld.frame( + jsonld_graph, + {"@id": str(root)}, + options={ + "documentLoader": _document_loader, + "processingMode": "json-ld-1.1", + }, + ) + print(json.dumps(translated_metadata, indent=4)) + + +def add_list( + graph: Graph, + subject: rdflib.term.Node, + predicate: rdflib.term.Identifier, + objects: Sequence[rdflib.term.Node], +) -> None: + """Adds triples to the ``graph`` so that they are equivalent to this + JSON-LD object:: + + { + "@id": subject, + predicate: {"@list": objects} + } + + This is a naive implementation of + https://json-ld.org/spec/latest/json-ld-api/#list-to-rdf-conversion + """ + # JSON-LD's @list is syntactic sugar for a linked list / chain in the RDF graph, + # which is what we are going to construct, starting from the end: + last_link: rdflib.term.Node + last_link = RDF.nil + for item in reversed(objects): + link = rdflib.BNode() + graph.add((link, RDF.first, item)) + graph.add((link, RDF.rest, last_link)) + last_link = link + graph.add((subject, predicate, last_link)) + + +TValue = TypeVar("TValue") + + +def add_map( + graph: Graph, + subject: rdflib.term.Node, + predicate: rdflib.term.Identifier, + f: Callable[[Graph, TValue], Optional[rdflib.term.Node]], + values: Iterable[TValue], +) -> None: + """Helper for :func:`add_list` that takes a mapper function ``f``.""" + nodes = [f(graph, value) for value in values] + add_list(graph, subject, predicate, [node for node in nodes if node]) diff --git a/swh/indexer/namespaces.py b/swh/indexer/namespaces.py new file mode 100644 index 
0000000..65ab826 --- /dev/null +++ b/swh/indexer/namespaces.py @@ -0,0 +1,12 @@ +# Copyright (C) 2022 The Software Heritage developers +# See the AUTHORS file at the top-level directory of this distribution +# License: GNU General Public License version 3, or any later version +# See top-level LICENSE file for more information + +from rdflib import Namespace as _Namespace +from rdflib import RDF # noqa + +SCHEMA = _Namespace("http://schema.org/") +CODEMETA = _Namespace("https://codemeta.github.io/terms/") +FORGEFED = _Namespace("https://forgefed.org/ns#") +ACTIVITYSTREAMS = _Namespace("https://www.w3.org/ns/activitystreams#") diff --git a/swh/indexer/tests/metadata_dictionary/test_cff.py b/swh/indexer/tests/metadata_dictionary/test_cff.py index f91a689..fb50ba5 100644 --- a/swh/indexer/tests/metadata_dictionary/test_cff.py +++ b/swh/indexer/tests/metadata_dictionary/test_cff.py @@ -1,220 +1,225 @@ -# Copyright (C) 2017-2022 The Software Heritage developers +# Copyright (C) 2021-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information from swh.indexer.metadata_dictionary import MAPPINGS def test_compute_metadata_cff(): """ testing CITATION.cff translation """ content = """# YAML 1.2 --- abstract: "Command line program to convert from Citation File \ Format to various other formats such as BibTeX, EndNote, RIS, \ schema.org, CodeMeta, and .zenodo.json." authors: - affiliation: "Netherlands eScience Center" family-names: Klaver given-names: Tom - affiliation: "Humboldt-Universität zu Berlin" family-names: Druskat given-names: Stephan orcid: https://orcid.org/0000-0003-4925-7248 cff-version: "1.0.3" date-released: 2019-11-12 doi: 10.5281/zenodo.1162057 keywords: - "citation" - "bibliography" - "cff" - "CITATION.cff" license: Apache-2.0 message: "If you use this software, please cite it using these metadata." 
repository-code: "https://github.com/citation-file-format/cff-converter-python" title: cffconvert version: "1.4.0-alpha0" """.encode( "utf-8" ) + result = MAPPINGS["CffMapping"]().translate(content) + assert set(result.pop("keywords")) == { + "citation", + "bibliography", + "cff", + "CITATION.cff", + } expected = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "author": [ { "type": "Person", "affiliation": { "type": "Organization", "name": "Netherlands eScience Center", }, "familyName": "Klaver", "givenName": "Tom", }, { "id": "https://orcid.org/0000-0003-4925-7248", "type": "Person", "affiliation": { "type": "Organization", "name": "Humboldt-Universität zu Berlin", }, "familyName": "Druskat", "givenName": "Stephan", }, ], "codeRepository": ( "https://github.com/citation-file-format/cff-converter-python" ), "datePublished": "2019-11-12", "description": """Command line program to convert from \ Citation File Format to various other formats such as BibTeX, EndNote, \ RIS, schema.org, CodeMeta, and .zenodo.json.""", "identifier": "https://doi.org/10.5281/zenodo.1162057", - "keywords": ["citation", "bibliography", "cff", "CITATION.cff"], "license": "https://spdx.org/licenses/Apache-2.0", "version": "1.4.0-alpha0", } - result = MAPPINGS["CffMapping"]().translate(content) assert expected == result def test_compute_metadata_cff_invalid_yaml(): """ test yaml translation for invalid yaml file """ content = """cff-version: 1.0.3 message: To cite the SigMF specification, please include the following: authors: - name: The GNU Radio Foundation, Inc. """.encode( "utf-8" ) expected = None result = MAPPINGS["CffMapping"]().translate(content) assert expected == result def test_compute_metadata_cff_empty(): """ test yaml translation for empty yaml file """ content = """ """.encode( "utf-8" ) expected = None result = MAPPINGS["CffMapping"]().translate(content) assert expected == result def test_compute_metadata_cff_list(): """ test yaml translation for empty yaml file """ content = """ - Foo - Bar """.encode( "utf-8" ) expected = None result = MAPPINGS["CffMapping"]().translate(content) assert expected == result def test_cff_empty_fields(): """ testing CITATION.cff translation """ content = """# YAML 1.2 authors: - affiliation: "Hogwarts" family-names: given-names: Harry - affiliation: "Ministry of Magic" family-names: Weasley orcid: given-names: Arthur """.encode( "utf-8" ) expected = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "author": [ { "type": "Person", "affiliation": { "type": "Organization", "name": "Hogwarts", }, "givenName": "Harry", }, { "type": "Person", "affiliation": { "type": "Organization", "name": "Ministry of Magic", }, "familyName": "Weasley", "givenName": "Arthur", }, ], } result = MAPPINGS["CffMapping"]().translate(content) assert expected == result def test_cff_invalid_fields(): """ testing CITATION.cff translation """ content = """# YAML 1.2 authors: - affiliation: "Hogwarts" family-names: - Potter - James given-names: Harry """.encode( "utf-8" ) expected = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "author": [ { "type": "Person", "affiliation": { "type": "Organization", "name": "Hogwarts", }, "givenName": "Harry", }, ], } result = MAPPINGS["CffMapping"]().translate(content) assert expected == result diff --git a/swh/indexer/tests/metadata_dictionary/test_codemeta.py b/swh/indexer/tests/metadata_dictionary/test_codemeta.py index 383b4a7..21865ee 
100644 --- a/swh/indexer/tests/metadata_dictionary/test_codemeta.py +++ b/swh/indexer/tests/metadata_dictionary/test_codemeta.py @@ -1,175 +1,367 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import json from hypothesis import HealthCheck, given, settings from swh.indexer.codemeta import CODEMETA_TERMS from swh.indexer.metadata_detector import detect_metadata from swh.indexer.metadata_dictionary import MAPPINGS from ..utils import json_document_strategy def test_compute_metadata_valid_codemeta(): raw_content = b"""{ "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "@type": "SoftwareSourceCode", "identifier": "CodeMeta", "description": "CodeMeta is a concept vocabulary that can be used to standardize the exchange of software metadata across repositories and organizations.", "name": "CodeMeta: Minimal metadata schemas for science software and code, in JSON-LD", "codeRepository": "https://github.com/codemeta/codemeta", "issueTracker": "https://github.com/codemeta/codemeta/issues", "license": "https://spdx.org/licenses/Apache-2.0", "version": "2.0", "author": [ { "@type": "Person", "givenName": "Carl", "familyName": "Boettiger", "email": "cboettig@gmail.com", "@id": "http://orcid.org/0000-0002-1642-628X" }, { "@type": "Person", "givenName": "Matthew B.", "familyName": "Jones", "email": "jones@nceas.ucsb.edu", "@id": "http://orcid.org/0000-0003-0077-4738" } ], "maintainer": { "@type": "Person", "givenName": "Carl", "familyName": "Boettiger", "email": "cboettig@gmail.com", "@id": "http://orcid.org/0000-0002-1642-628X" }, "contIntegration": "https://travis-ci.org/codemeta/codemeta", "developmentStatus": "active", "downloadUrl": "https://github.com/codemeta/codemeta/archive/2.0.zip", "funder": { "@id": "https://doi.org/10.13039/100000001", "@type": "Organization", "name": "National Science Foundation" }, "funding":"1549758; Codemeta: A Rosetta Stone for Metadata in Scientific Software", "keywords": [ "metadata", "software" ], "version":"2.0", "dateCreated":"2017-06-05", "datePublished":"2017-06-05", "programmingLanguage": "JSON-LD" }""" # noqa expected_result = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "identifier": "CodeMeta", "description": "CodeMeta is a concept vocabulary that can " "be used to standardize the exchange of software metadata " "across repositories and organizations.", "name": "CodeMeta: Minimal metadata schemas for science " "software and code, in JSON-LD", "codeRepository": "https://github.com/codemeta/codemeta", "issueTracker": "https://github.com/codemeta/codemeta/issues", "license": "https://spdx.org/licenses/Apache-2.0", "version": "2.0", "author": [ { "type": "Person", "givenName": "Carl", "familyName": "Boettiger", "email": "cboettig@gmail.com", "id": "http://orcid.org/0000-0002-1642-628X", }, { "type": "Person", "givenName": "Matthew B.", "familyName": "Jones", "email": "jones@nceas.ucsb.edu", "id": "http://orcid.org/0000-0003-0077-4738", }, ], "maintainer": { "type": "Person", "givenName": "Carl", "familyName": "Boettiger", "email": "cboettig@gmail.com", "id": "http://orcid.org/0000-0002-1642-628X", }, "contIntegration": "https://travis-ci.org/codemeta/codemeta", "developmentStatus": "active", "downloadUrl": "https://github.com/codemeta/codemeta/archive/2.0.zip", "funder": { "id": 
"https://doi.org/10.13039/100000001", "type": "Organization", "name": "National Science Foundation", }, "funding": "1549758; Codemeta: A Rosetta Stone for Metadata " "in Scientific Software", "keywords": ["metadata", "software"], "version": "2.0", "dateCreated": "2017-06-05", "datePublished": "2017-06-05", "programmingLanguage": "JSON-LD", } result = MAPPINGS["CodemetaMapping"]().translate(raw_content) assert result == expected_result def test_compute_metadata_codemeta_alternate_context(): raw_content = b"""{ "@context": "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld", "@type": "SoftwareSourceCode", "identifier": "CodeMeta" }""" # noqa expected_result = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "identifier": "CodeMeta", } result = MAPPINGS["CodemetaMapping"]().translate(raw_content) assert result == expected_result @settings(suppress_health_check=[HealthCheck.too_slow]) @given(json_document_strategy(keys=CODEMETA_TERMS)) def test_codemeta_adversarial(doc): raw = json.dumps(doc).encode() MAPPINGS["CodemetaMapping"]().translate(raw) def test_detect_metadata_codemeta_json_uppercase(): df = [ { "sha1_git": b"abc", "name": b"index.html", "target": b"abc", "length": 897, "status": "visible", "type": "file", "perms": 33188, "dir_id": b"dir_a", "sha1": b"bcd", }, { "sha1_git": b"aab", "name": b"CODEMETA.json", "target": b"aab", "length": 712, "status": "visible", "type": "file", "perms": 33188, "dir_id": b"dir_a", "sha1": b"bcd", }, ] results = detect_metadata(df) expected_results = {"CodemetaMapping": [b"bcd"]} assert expected_results == results + + +def test_sword_default_xmlns(): + content = """ + + My Software + + Author 1 + foo@example.org + + + Author 2 + + + """ + + result = MAPPINGS["SwordCodemetaMapping"]().translate(content) + assert result == { + "@context": "https://doi.org/10.5063/schema/codemeta-2.0", + "name": "My Software", + "author": [ + {"name": "Author 1", "email": "foo@example.org"}, + {"name": "Author 2"}, + ], + } + + +def test_sword_basics(): + content = """ + + My Software + + Author 1 + foo@example.org + + + Author 2 + + + Author 3 + bar@example.org + + + """ + + result = MAPPINGS["SwordCodemetaMapping"]().translate(content) + assert result == { + "@context": "https://doi.org/10.5063/schema/codemeta-2.0", + "name": "My Software", + "author": [ + {"name": "Author 1", "email": "foo@example.org"}, + {"name": "Author 2"}, + {"name": "Author 3", "email": "bar@example.org"}, + ], + } + + +def test_sword_mixed(): + content = """ + + My Software + blah + 1.2.3 + blih + + """ + + result = MAPPINGS["SwordCodemetaMapping"]().translate(content) + assert result == { + "@context": "https://doi.org/10.5063/schema/codemeta-2.0", + "name": "My Software", + "version": "1.2.3", + } + + +def test_sword_schemaorg_in_codemeta(): + content = """ + + My Software + 1.2.3 + + """ + + result = MAPPINGS["SwordCodemetaMapping"]().translate(content) + assert result == { + "@context": "https://doi.org/10.5063/schema/codemeta-2.0", + "name": "My Software", + "version": "1.2.3", + } + + +def test_sword_schemaorg_in_codemeta_constrained(): + """Resulting property has the compact URI 'schema:url' instead of just + the term 'url', because term 'url' is defined by the Codemeta schema + has having type '@id'.""" + content = """ + + My Software + http://example.org/my-software + + """ + + result = MAPPINGS["SwordCodemetaMapping"]().translate(content) + assert result == { + "@context": "https://doi.org/10.5063/schema/codemeta-2.0", + 
"name": "My Software", + "schema:url": "http://example.org/my-software", + } + + +def test_sword_schemaorg_not_in_codemeta(): + content = """ + + My Software + http://example.org/my-software + + """ + + result = MAPPINGS["SwordCodemetaMapping"]().translate(content) + assert result == { + "@context": "https://doi.org/10.5063/schema/codemeta-2.0", + "name": "My Software", + "schema:sameAs": "http://example.org/my-software", + } + + +def test_sword_atom_name(): + content = """ + + My Software + + """ + + result = MAPPINGS["SwordCodemetaMapping"]().translate(content) + assert result == { + "@context": "https://doi.org/10.5063/schema/codemeta-2.0", + "name": "My Software", + } + + +def test_sword_multiple_names(): + content = """ + + Atom Name 1 + Atom Name 2 + Atom Title 1 + Atom Title 2 + Codemeta Name 1 + Codemeta Name 2 + + """ + + result = MAPPINGS["SwordCodemetaMapping"]().translate(content) + assert result == { + "@context": "https://doi.org/10.5063/schema/codemeta-2.0", + "name": [ + "Atom Name 1", + "Atom Name 2", + "Atom Title 1", + "Atom Title 2", + "Codemeta Name 1", + "Codemeta Name 2", + ], + } + + +def test_json_sword(): + content = """{"id": "hal-01243573", "@xmlns": "http://www.w3.org/2005/Atom", "author": {"name": "Author 1", "email": "foo@example.org"}, "client": "hal", "codemeta:url": "http://example.org/", "codemeta:name": "The assignment problem", "@xmlns:codemeta": "https://doi.org/10.5063/SCHEMA/CODEMETA-2.0", "codemeta:author": {"codemeta:name": "Author 2"}, "codemeta:license": {"codemeta:name": "GNU General Public License v3.0 or later"}}""" # noqa + result = MAPPINGS["JsonSwordCodemetaMapping"]().translate(content) + assert result == { + "@context": "https://doi.org/10.5063/schema/codemeta-2.0", + "author": [ + {"name": "Author 1", "email": "foo@example.org"}, + {"name": "Author 2"}, + ], + "license": {"name": "GNU General Public License v3.0 or later"}, + "name": "The assignment problem", + "schema:url": "http://example.org/", + "name": "The assignment problem", + } diff --git a/swh/indexer/tests/metadata_dictionary/test_composer.py b/swh/indexer/tests/metadata_dictionary/test_composer.py index 9513938..809ac01 100644 --- a/swh/indexer/tests/metadata_dictionary/test_composer.py +++ b/swh/indexer/tests/metadata_dictionary/test_composer.py @@ -1,84 +1,89 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information from swh.indexer.metadata_dictionary import MAPPINGS def test_compute_metadata_composer(): raw_content = """{ "name": "symfony/polyfill-mbstring", "type": "library", "description": "Symfony polyfill for the Mbstring extension", "keywords": [ "polyfill", "shim", "compatibility", "portable" ], "homepage": "https://symfony.com", "license": "MIT", "authors": [ { "name": "Nicolas Grekas", "email": "p@tchwork.com" }, { "name": "Symfony Community", "homepage": "https://symfony.com/contributors" } ], "require": { "php": ">=7.1" }, "provide": { "ext-mbstring": "*" }, "autoload": { "files": [ "bootstrap.php" ] }, "suggest": { "ext-mbstring": "For best performance" }, "minimum-stability": "dev", "extra": { "branch-alias": { "dev-main": "1.26-dev" }, "thanks": { "name": "symfony/polyfill", "url": "https://github.com/symfony/polyfill" } } } """.encode( "utf-8" ) result = MAPPINGS["ComposerMapping"]().translate(raw_content) + assert set(result.pop("keywords")) == { + "polyfill", + 
"shim", + "compatibility", + "portable", + }, result expected = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "symfony/polyfill-mbstring", - "keywords": ["polyfill", "shim", "compatibility", "portable"], "description": "Symfony polyfill for the Mbstring extension", "url": "https://symfony.com", "license": "https://spdx.org/licenses/MIT", "author": [ { "type": "Person", "name": "Nicolas Grekas", "email": "p@tchwork.com", }, { "type": "Person", "name": "Symfony Community", }, ], } assert result == expected diff --git a/swh/indexer/tests/metadata_dictionary/test_dart.py b/swh/indexer/tests/metadata_dictionary/test_dart.py index 146f7c7..956d088 100644 --- a/swh/indexer/tests/metadata_dictionary/test_dart.py +++ b/swh/indexer/tests/metadata_dictionary/test_dart.py @@ -1,157 +1,160 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information +import pytest + from swh.indexer.metadata_dictionary import MAPPINGS def test_compute_metadata_pubspec(): raw_content = """ --- name: newtify description: >- Have you been turned into a newt? Would you like to be? This package can help. It has all of the newt-transmogrification functionality you have been looking for. keywords: - polyfill - shim - compatibility - portable - mbstring version: 1.2.3 license: MIT homepage: https://example-pet-store.com/newtify documentation: https://example-pet-store.com/newtify/docs environment: sdk: '>=2.10.0 <3.0.0' dependencies: efts: ^2.0.4 transmogrify: ^0.4.0 dev_dependencies: test: '>=1.15.0 <2.0.0' """.encode( "utf-8" ) result = MAPPINGS["PubMapping"]().translate(raw_content) + assert set(result.pop("keywords")) == { + "polyfill", + "shim", + "compatibility", + "portable", + "mbstring", + }, result expected = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "newtify", - "keywords": [ - "polyfill", - "shim", - "compatibility", - "portable", - "mbstring", - ], "description": """Have you been turned into a newt? Would you like to be? \ This package can help. 
It has all of the \ newt-transmogrification functionality you have been looking \ for.""", "url": "https://example-pet-store.com/newtify", "license": "https://spdx.org/licenses/MIT", } assert result == expected def test_normalize_author_pubspec(): raw_content = """ author: Atlee Pine """.encode( "utf-8" ) result = MAPPINGS["PubMapping"]().translate(raw_content) expected = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "author": [ {"type": "Person", "name": "Atlee Pine", "email": "atlee@example.org"}, ], } assert result == expected def test_normalize_authors_pubspec(): raw_content = """ authors: - Vicky Merzown - Ron Bilius Weasley """.encode( "utf-8" ) result = MAPPINGS["PubMapping"]().translate(raw_content) expected = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "author": [ {"type": "Person", "name": "Vicky Merzown", "email": "vmz@example.org"}, { "type": "Person", "name": "Ron Bilius Weasley", }, ], } assert result == expected +@pytest.mark.xfail(reason="https://github.com/w3c/json-ld-api/issues/547") def test_normalize_author_authors_pubspec(): raw_content = """ authors: - Vicky Merzown - Ron Bilius Weasley author: Hermione Granger """.encode( "utf-8" ) result = MAPPINGS["PubMapping"]().translate(raw_content) expected = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "author": [ {"type": "Person", "name": "Vicky Merzown", "email": "vmz@example.org"}, { "type": "Person", "name": "Ron Bilius Weasley", }, { "type": "Person", "name": "Hermione Granger", }, ], } assert result == expected def test_normalize_empty_authors(): raw_content = """ authors: """.encode( "utf-8" ) result = MAPPINGS["PubMapping"]().translate(raw_content) expected = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", } assert result == expected diff --git a/swh/indexer/tests/metadata_dictionary/test_github.py b/swh/indexer/tests/metadata_dictionary/test_github.py index 290d91c..c0592dc 100644 --- a/swh/indexer/tests/metadata_dictionary/test_github.py +++ b/swh/indexer/tests/metadata_dictionary/test_github.py @@ -1,142 +1,142 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information from swh.indexer.metadata_dictionary import MAPPINGS CONTEXT = [ "https://doi.org/10.5063/schema/codemeta-2.0", { "as": "https://www.w3.org/ns/activitystreams#", "forge": "https://forgefed.org/ns#", }, ] def test_compute_metadata_none(): """ testing content empty content is empty should return None """ content = b"" # None if no metadata was found or an error occurred declared_metadata = None result = MAPPINGS["GitHubMapping"]().translate(content) assert declared_metadata == result def test_supported_terms(): terms = MAPPINGS["GitHubMapping"].supported_terms() assert { "http://schema.org/name", "http://schema.org/license", "https://forgefed.org/ns#forks", "https://www.w3.org/ns/activitystreams#totalItems", } <= terms def test_compute_metadata_github(): """ testing only computation of metadata with hard_mapping_npm """ content = b""" { "id": 80521091, "node_id": "MDEwOlJlcG9zaXRvcnk4MDUyMTA5MQ==", "name": "swh-indexer", "full_name": "SoftwareHeritage/swh-indexer", "private": false, "owner": { "login": "SoftwareHeritage", "id": 18555939, "node_id": "MDEyOk9yZ2FuaXphdGlvbjE4NTU1OTM5", 
"avatar_url": "https://avatars.githubusercontent.com/u/18555939?v=4", "gravatar_id": "", "url": "https://api.github.com/users/SoftwareHeritage", "type": "Organization", "site_admin": false }, "html_url": "https://github.com/SoftwareHeritage/swh-indexer", "description": "GitHub mirror of Metadata indexer", "fork": false, "url": "https://api.github.com/repos/SoftwareHeritage/swh-indexer", "created_at": "2017-01-31T13:05:39Z", "updated_at": "2022-06-22T08:02:20Z", "pushed_at": "2022-06-29T09:01:08Z", "git_url": "git://github.com/SoftwareHeritage/swh-indexer.git", "ssh_url": "git@github.com:SoftwareHeritage/swh-indexer.git", "clone_url": "https://github.com/SoftwareHeritage/swh-indexer.git", "svn_url": "https://github.com/SoftwareHeritage/swh-indexer", "homepage": "https://forge.softwareheritage.org/source/swh-indexer/", "size": 2713, "stargazers_count": 13, "watchers_count": 12, "language": "Python", "has_issues": false, "has_projects": false, "has_downloads": true, "has_wiki": false, "has_pages": false, "forks_count": 1, "mirror_url": null, "archived": false, "disabled": false, "open_issues_count": 0, "license": { "key": "gpl-3.0", "name": "GNU General Public License v3.0", "spdx_id": "GPL-3.0", "url": "https://api.github.com/licenses/gpl-3.0", "node_id": "MDc6TGljZW5zZTk=" }, "allow_forking": true, "is_template": false, "web_commit_signoff_required": false, "topics": [ ], "visibility": "public", "forks": 1, "open_issues": 0, "watchers": 13, "default_branch": "master", "temp_clone_token": null, "organization": { "login": "SoftwareHeritage", "id": 18555939, "node_id": "MDEyOk9yZ2FuaXphdGlvbjE4NTU1OTM5", "avatar_url": "https://avatars.githubusercontent.com/u/18555939?v=4", "gravatar_id": "", "type": "Organization", "site_admin": false }, "network_count": 1, "subscribers_count": 6 } """ result = MAPPINGS["GitHubMapping"]().translate(content) assert result == { "@context": CONTEXT, - "type": "https://forgefed.org/ns#Repository", + "type": "forge:Repository", "forge:forks": { "as:totalItems": 1, "type": "as:OrderedCollection", }, "as:likes": { "as:totalItems": 13, "type": "as:Collection", }, "as:followers": { "as:totalItems": 12, "type": "as:Collection", }, "license": "https://spdx.org/licenses/GPL-3.0", "name": "SoftwareHeritage/swh-indexer", "description": "GitHub mirror of Metadata indexer", "schema:codeRepository": "https://github.com/SoftwareHeritage/swh-indexer", "schema:dateCreated": "2017-01-31T13:05:39Z", "schema:dateModified": "2022-06-22T08:02:20Z", } diff --git a/swh/indexer/tests/metadata_dictionary/test_maven.py b/swh/indexer/tests/metadata_dictionary/test_maven.py index ea51860..0267e95 100644 --- a/swh/indexer/tests/metadata_dictionary/test_maven.py +++ b/swh/indexer/tests/metadata_dictionary/test_maven.py @@ -1,365 +1,365 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import logging from hypothesis import HealthCheck, given, settings from swh.indexer.metadata_dictionary import MAPPINGS from ..utils import xml_document_strategy def test_compute_metadata_maven(): raw_content = b""" Maven Default Project 4.0.0 com.mycompany.app my-app 1.2.3 central Maven Repository Switchboard default http://repo1.maven.org/maven2 false Apache License, Version 2.0 https://www.apache.org/licenses/LICENSE-2.0.txt repo A business-friendly OSS license """ result = 
MAPPINGS["MavenMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "Maven Default Project", - "identifier": "com.mycompany.app", + "schema:identifier": "com.mycompany.app", "version": "1.2.3", "license": "https://www.apache.org/licenses/LICENSE-2.0.txt", "codeRepository": ("http://repo1.maven.org/maven2/com/mycompany/app/my-app"), } def test_compute_metadata_maven_empty(): raw_content = b""" """ result = MAPPINGS["MavenMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", } def test_compute_metadata_maven_almost_empty(): raw_content = b""" """ result = MAPPINGS["MavenMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", } def test_compute_metadata_maven_invalid_xml(caplog): expected_warning = ( "swh.indexer.metadata_dictionary.maven.MavenMapping", logging.WARNING, "Error parsing XML from foo", ) caplog.at_level(logging.WARNING, logger="swh.indexer.metadata_dictionary") raw_content = b""" """ caplog.clear() result = MAPPINGS["MavenMapping"]("foo").translate(raw_content) assert caplog.record_tuples == [expected_warning], result assert result is None raw_content = b""" """ caplog.clear() result = MAPPINGS["MavenMapping"]("foo").translate(raw_content) assert caplog.record_tuples == [expected_warning], result assert result is None def test_compute_metadata_maven_unknown_encoding(caplog): expected_warning = ( "swh.indexer.metadata_dictionary.maven.MavenMapping", logging.WARNING, "Error detecting XML encoding from foo", ) caplog.at_level(logging.WARNING, logger="swh.indexer.metadata_dictionary") raw_content = b""" """ caplog.clear() result = MAPPINGS["MavenMapping"]("foo").translate(raw_content) assert caplog.record_tuples == [expected_warning], result assert result is None raw_content = b""" """ caplog.clear() result = MAPPINGS["MavenMapping"]("foo").translate(raw_content) assert caplog.record_tuples == [expected_warning], result assert result is None def test_compute_metadata_maven_invalid_encoding(caplog): expected_warning = [ # libexpat1 <= 2.2.10-2+deb11u1 [ ( "swh.indexer.metadata_dictionary.maven.MavenMapping", logging.WARNING, "Error unidecoding XML from foo", ) ], # libexpat1 >= 2.2.10-2+deb11u2 [ ( "swh.indexer.metadata_dictionary.maven.MavenMapping", logging.WARNING, "Error parsing XML from foo", ) ], ] caplog.at_level(logging.WARNING, logger="swh.indexer.metadata_dictionary") raw_content = b""" """ caplog.clear() result = MAPPINGS["MavenMapping"]("foo").translate(raw_content) assert caplog.record_tuples in expected_warning, result assert result is None def test_compute_metadata_maven_minimal(): raw_content = b""" Maven Default Project 4.0.0 com.mycompany.app my-app 1.2.3 """ result = MAPPINGS["MavenMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "Maven Default Project", - "identifier": "com.mycompany.app", + "schema:identifier": "com.mycompany.app", "version": "1.2.3", "codeRepository": ( "https://repo.maven.apache.org/maven2/com/mycompany/app/my-app" ), } def test_compute_metadata_maven_empty_nodes(): raw_content = b""" Maven Default Project 4.0.0 com.mycompany.app my-app 1.2.3 """ result = MAPPINGS["MavenMapping"]().translate(raw_content) assert result == { "@context": 
"https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "Maven Default Project", - "identifier": "com.mycompany.app", + "schema:identifier": "com.mycompany.app", "version": "1.2.3", "codeRepository": ( "https://repo.maven.apache.org/maven2/com/mycompany/app/my-app" ), } raw_content = b""" Maven Default Project 4.0.0 com.mycompany.app my-app """ result = MAPPINGS["MavenMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "Maven Default Project", - "identifier": "com.mycompany.app", + "schema:identifier": "com.mycompany.app", "codeRepository": ( "https://repo.maven.apache.org/maven2/com/mycompany/app/my-app" ), } raw_content = b""" 4.0.0 com.mycompany.app my-app 1.2.3 """ result = MAPPINGS["MavenMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", - "identifier": "com.mycompany.app", + "schema:identifier": "com.mycompany.app", "version": "1.2.3", "codeRepository": ( "https://repo.maven.apache.org/maven2/com/mycompany/app/my-app" ), } raw_content = b""" Maven Default Project 4.0.0 com.mycompany.app my-app 1.2.3 """ result = MAPPINGS["MavenMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "Maven Default Project", - "identifier": "com.mycompany.app", + "schema:identifier": "com.mycompany.app", "version": "1.2.3", "codeRepository": ( "https://repo.maven.apache.org/maven2/com/mycompany/app/my-app" ), } raw_content = b""" 1.2.3 """ result = MAPPINGS["MavenMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "version": "1.2.3", } def test_compute_metadata_maven_invalid_licenses(): raw_content = b""" Maven Default Project 4.0.0 com.mycompany.app my-app 1.2.3 foo """ result = MAPPINGS["MavenMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "Maven Default Project", - "identifier": "com.mycompany.app", + "schema:identifier": "com.mycompany.app", "version": "1.2.3", "codeRepository": ( "https://repo.maven.apache.org/maven2/com/mycompany/app/my-app" ), } def test_compute_metadata_maven_multiple(): """Tests when there are multiple code repos and licenses.""" raw_content = b""" Maven Default Project 4.0.0 com.mycompany.app my-app 1.2.3 central Maven Repository Switchboard default http://repo1.maven.org/maven2 false example Example Maven Repo default http://example.org/maven2 Apache License, Version 2.0 https://www.apache.org/licenses/LICENSE-2.0.txt repo A business-friendly OSS license MIT license https://opensource.org/licenses/MIT """ result = MAPPINGS["MavenMapping"]().translate(raw_content) + assert set(result.pop("license")) == { + "https://www.apache.org/licenses/LICENSE-2.0.txt", + "https://opensource.org/licenses/MIT", + }, result + assert set(result.pop("codeRepository")) == { + "http://repo1.maven.org/maven2/com/mycompany/app/my-app", + "http://example.org/maven2/com/mycompany/app/my-app", + }, result assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "Maven Default Project", - "identifier": "com.mycompany.app", + "schema:identifier": "com.mycompany.app", "version": "1.2.3", - "license": [ - 
"https://www.apache.org/licenses/LICENSE-2.0.txt", - "https://opensource.org/licenses/MIT", - ], - "codeRepository": [ - "http://repo1.maven.org/maven2/com/mycompany/app/my-app", - "http://example.org/maven2/com/mycompany/app/my-app", - ], } @settings(suppress_health_check=[HealthCheck.too_slow]) @given( xml_document_strategy( keys=list(MAPPINGS["MavenMapping"].mapping), # type: ignore root="project", xmlns="http://maven.apache.org/POM/4.0.0", ) ) def test_maven_adversarial(doc): MAPPINGS["MavenMapping"]().translate(doc) diff --git a/swh/indexer/tests/metadata_dictionary/test_npm.py b/swh/indexer/tests/metadata_dictionary/test_npm.py index 781e995..000cb7c 100644 --- a/swh/indexer/tests/metadata_dictionary/test_npm.py +++ b/swh/indexer/tests/metadata_dictionary/test_npm.py @@ -1,318 +1,313 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import json from hypothesis import HealthCheck, given, settings import pytest from swh.indexer.metadata_detector import detect_metadata from swh.indexer.metadata_dictionary import MAPPINGS from swh.indexer.storage.model import ContentMetadataRow from ..test_metadata import TRANSLATOR_TOOL, ContentMetadataTestIndexer from ..utils import ( BASE_TEST_CONFIG, MAPPING_DESCRIPTION_CONTENT_SHA1, json_document_strategy, ) def test_compute_metadata_none(): """ testing content empty content is empty should return None """ content = b"" # None if no metadata was found or an error occurred declared_metadata = None result = MAPPINGS["NpmMapping"]().translate(content) assert declared_metadata == result def test_compute_metadata_npm(): """ testing only computation of metadata with hard_mapping_npm """ content = b""" { "name": "test_metadata", "version": "0.0.2", "description": "Simple package.json test for indexer", "repository": { "type": "git", "url": "https://github.com/moranegg/metadata_test" }, "author": { "email": "moranegg@example.com", "name": "Morane G" } } """ declared_metadata = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "test_metadata", "version": "0.0.2", "description": "Simple package.json test for indexer", "codeRepository": "git+https://github.com/moranegg/metadata_test", "author": [ { "type": "Person", "name": "Morane G", "email": "moranegg@example.com", } ], } result = MAPPINGS["NpmMapping"]().translate(content) assert declared_metadata == result def test_compute_metadata_invalid_description_npm(): """ testing only computation of metadata with hard_mapping_npm """ content = b""" { "name": "test_metadata", "version": "0.0.2", "description": 1234 } """ declared_metadata = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "test_metadata", "version": "0.0.2", } result = MAPPINGS["NpmMapping"]().translate(content) assert declared_metadata == result def test_index_content_metadata_npm(storage, obj_storage): """ testing NPM with package.json - one sha1 uses a file that can't be translated to metadata and should return None in the translated metadata """ sha1s = [ MAPPING_DESCRIPTION_CONTENT_SHA1["json:test-metadata-package.json"], MAPPING_DESCRIPTION_CONTENT_SHA1["json:npm-package.json"], MAPPING_DESCRIPTION_CONTENT_SHA1["python:code"], ] # this metadata indexer computes only metadata for package.json # in npm context with a hard mapping config = 
BASE_TEST_CONFIG.copy() config["tools"] = [TRANSLATOR_TOOL] metadata_indexer = ContentMetadataTestIndexer(config=config) metadata_indexer.run(sha1s, log_suffix="unknown content") results = list(metadata_indexer.idx_storage.content_metadata_get(sha1s)) expected_results = [ ContentMetadataRow( id=sha1s[0], tool=TRANSLATOR_TOOL, metadata={ "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "codeRepository": "git+https://github.com/moranegg/metadata_test", "description": "Simple package.json test for indexer", "name": "test_metadata", "version": "0.0.1", }, ), ContentMetadataRow( id=sha1s[1], tool=TRANSLATOR_TOOL, metadata={ "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "issueTracker": "https://github.com/npm/npm/issues", "author": [ { "type": "Person", "name": "Isaac Z. Schlueter", "email": "i@izs.me", "url": "http://blog.izs.me", } ], "codeRepository": "git+https://github.com/npm/npm", "description": "a package manager for JavaScript", "license": "https://spdx.org/licenses/Artistic-2.0", "version": "5.0.3", "name": "npm", - "keywords": [ - "install", - "modules", - "package manager", - "package.json", - ], "url": "https://docs.npmjs.com/", }, ), ] for result in results: del result.tool["id"] + result.metadata.pop("keywords", None) # The assertion below returns False sometimes because of nested lists assert expected_results == results def test_npm_bugs_normalization(): # valid dictionary package_json = b"""{ "name": "foo", "bugs": { "url": "https://github.com/owner/project/issues", "email": "foo@example.com" } }""" result = MAPPINGS["NpmMapping"]().translate(package_json) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "name": "foo", "issueTracker": "https://github.com/owner/project/issues", "type": "SoftwareSourceCode", } # "invalid" dictionary package_json = b"""{ "name": "foo", "bugs": { "email": "foo@example.com" } }""" result = MAPPINGS["NpmMapping"]().translate(package_json) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "name": "foo", "type": "SoftwareSourceCode", } # string package_json = b"""{ "name": "foo", "bugs": "https://github.com/owner/project/issues" }""" result = MAPPINGS["NpmMapping"]().translate(package_json) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "name": "foo", "issueTracker": "https://github.com/owner/project/issues", "type": "SoftwareSourceCode", } def test_npm_repository_normalization(): # normal package_json = b"""{ "name": "foo", "repository": { "type" : "git", "url" : "https://github.com/npm/cli.git" } }""" result = MAPPINGS["NpmMapping"]().translate(package_json) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "name": "foo", "codeRepository": "git+https://github.com/npm/cli.git", "type": "SoftwareSourceCode", } # missing url package_json = b"""{ "name": "foo", "repository": { "type" : "git" } }""" result = MAPPINGS["NpmMapping"]().translate(package_json) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "name": "foo", "type": "SoftwareSourceCode", } # github shortcut package_json = b"""{ "name": "foo", "repository": "github:npm/cli" }""" result = MAPPINGS["NpmMapping"]().translate(package_json) expected_result = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "name": "foo", "codeRepository": "git+https://github.com/npm/cli.git", "type": "SoftwareSourceCode", } assert result == expected_result # github shortshortcut 
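(Illustrative aside, not part of the diff.) Several assertions in these mapping tests were rewritten to pop list-valued fields such as "keywords" and compare them as sets, because the JSON-LD processing behind the mappings does not guarantee element order. A minimal, standalone sketch of that pattern, using made-up dictionaries rather than real indexer output:

def assert_metadata_equal(result, expected, unordered_keys=("keywords", "license")):
    # Work on copies so the caller's dicts are left untouched.
    result, expected = dict(result), dict(expected)
    for key in unordered_keys:
        # List-valued fields are compared as sets; their order is irrelevant.
        if key in result or key in expected:
            assert set(result.pop(key, [])) == set(expected.pop(key, [])), key
    # Everything else must match exactly.
    assert result == expected

assert_metadata_equal(
    {"name": "example", "keywords": ["shim", "polyfill"]},
    {"name": "example", "keywords": ["polyfill", "shim"]},
)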
package_json = b"""{ "name": "foo", "repository": "npm/cli" }""" result = MAPPINGS["NpmMapping"]().translate(package_json) assert result == expected_result # gitlab shortcut package_json = b"""{ "name": "foo", "repository": "gitlab:user/repo" }""" result = MAPPINGS["NpmMapping"]().translate(package_json) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "name": "foo", "codeRepository": "git+https://gitlab.com/user/repo.git", "type": "SoftwareSourceCode", } @settings(suppress_health_check=[HealthCheck.too_slow]) @given(json_document_strategy(keys=list(MAPPINGS["NpmMapping"].mapping))) # type: ignore def test_npm_adversarial(doc): raw = json.dumps(doc).encode() MAPPINGS["NpmMapping"]().translate(raw) @pytest.mark.parametrize( "filename", [b"package.json", b"Package.json", b"PACKAGE.json", b"PACKAGE.JSON"] ) def test_detect_metadata_package_json(filename): df = [ { "sha1_git": b"abc", "name": b"index.js", "target": b"abc", "length": 897, "status": "visible", "type": "file", "perms": 33188, "dir_id": b"dir_a", "sha1": b"bcd", }, { "sha1_git": b"aab", "name": filename, "target": b"aab", "length": 712, "status": "visible", "type": "file", "perms": 33188, "dir_id": b"dir_a", "sha1": b"cde", }, ] results = detect_metadata(df) expected_results = {"NpmMapping": [b"cde"]} assert expected_results == results diff --git a/swh/indexer/tests/metadata_dictionary/test_nuget.py b/swh/indexer/tests/metadata_dictionary/test_nuget.py new file mode 100644 index 0000000..e83ad6f --- /dev/null +++ b/swh/indexer/tests/metadata_dictionary/test_nuget.py @@ -0,0 +1,172 @@ +# Copyright (C) 2022 The Software Heritage developers +# See the AUTHORS file at the top-level directory of this distribution +# License: GNU General Public License version 3, or any later version +# See top-level LICENSE file for more information + +import pytest + +from swh.indexer.metadata_detector import detect_metadata +from swh.indexer.metadata_dictionary import MAPPINGS + + +def test_compute_metadata_nuget(): + raw_content = b""" + + + sample + 1.2.3 + Kim Abercrombie, Franck Halmaert + Sample exists only to show a sample .nuspec file. + Summary is being deprecated. Use description instead. + http://example.org/ + + MIT + https://raw.github.com/timrwood/moment/master/LICENSE + + + + + + See the [changelog](https://github.com/httpie/httpie/releases/tag/3.2.0). + + python3 java cpp search-tag + + + + + """ + + result = MAPPINGS["NuGetMapping"]().translate(raw_content) + + assert set(result.pop("keywords")) == { + "python3", + "java", + "cpp", + "search-tag", + }, result + + assert set(result.pop("license")) == { + "https://spdx.org/licenses/MIT", + "https://raw.github.com/timrwood/moment/master/LICENSE", + }, result + + assert set(result.pop("description")) == { + "Sample exists only to show a sample .nuspec file.", + "Summary is being deprecated. Use description instead.", + }, result + + expected = { + "@context": "https://doi.org/10.5063/schema/codemeta-2.0", + "type": "SoftwareSourceCode", + "author": [ + {"type": "Person", "name": "Kim Abercrombie"}, + {"type": "Person", "name": "Franck Halmaert"}, + ], + "codeRepository": "https://github.com/NuGet/NuGet.Client.git", + "url": "http://example.org/", + "version": "1.2.3", + "schema:releaseNotes": ( + "See the [changelog](https://github.com/httpie/httpie/releases/tag/3.2.0)." 
+ ), + } + + assert result == expected + + +@pytest.mark.parametrize( + "filename", + [b"package_name.nuspec", b"number_5.nuspec", b"CAPS.nuspec", b"\x8anan.nuspec"], +) +def test_detect_metadata_package_nuspec(filename): + df = [ + { + "sha1_git": b"abc", + "name": b"example.json", + "target": b"abc", + "length": 897, + "status": "visible", + "type": "file", + "perms": 33188, + "dir_id": b"dir_a", + "sha1": b"bcd", + }, + { + "sha1_git": b"aab", + "name": filename, + "target": b"aab", + "length": 712, + "status": "visible", + "type": "file", + "perms": 33188, + "dir_id": b"dir_a", + "sha1": b"cde", + }, + ] + results = detect_metadata(df) + + expected_results = {"NuGetMapping": [b"cde"]} + assert expected_results == results + + +def test_normalize_license_multiple_licenses_or_delimiter(): + raw_content = raw_content = b""" + + + BitTorrent-1.0 or GPL-3.0-with-GCC-exception + + + + + """ + result = MAPPINGS["NuGetMapping"]().translate(raw_content) + assert set(result.pop("license")) == { + "https://spdx.org/licenses/BitTorrent-1.0", + "https://spdx.org/licenses/GPL-3.0-with-GCC-exception", + } + expected = { + "@context": "https://doi.org/10.5063/schema/codemeta-2.0", + "type": "SoftwareSourceCode", + } + + assert result == expected + + +def test_normalize_license_unsupported_delimiter(): + raw_content = raw_content = b""" + + + (MIT) + + + + + """ + result = MAPPINGS["NuGetMapping"]().translate(raw_content) + expected = { + "@context": "https://doi.org/10.5063/schema/codemeta-2.0", + "type": "SoftwareSourceCode", + } + + assert result == expected + + +def test_copyrightNotice_absolute_uri_property(): + raw_content = raw_content = b""" + + + Copyright 2017-2022 + en-us + + + + + """ + result = MAPPINGS["NuGetMapping"]().translate(raw_content) + expected = { + "@context": "https://doi.org/10.5063/schema/codemeta-2.0", + "type": "SoftwareSourceCode", + "schema:copyrightNotice": "Copyright 2017-2022", + "schema:inLanguage": "en-us", + } + + assert result == expected diff --git a/swh/indexer/tests/metadata_dictionary/test_python.py b/swh/indexer/tests/metadata_dictionary/test_python.py index 106a9ca..dbbabd1 100644 --- a/swh/indexer/tests/metadata_dictionary/test_python.py +++ b/swh/indexer/tests/metadata_dictionary/test_python.py @@ -1,114 +1,113 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information from swh.indexer.metadata_dictionary import MAPPINGS def test_compute_metadata_pkginfo(): raw_content = b"""\ Metadata-Version: 2.1 Name: swh.core Version: 0.0.49 Summary: Software Heritage core utilities Home-page: https://forge.softwareheritage.org/diffusion/DCORE/ Author: Software Heritage developers Author-email: swh-devel@inria.fr License: UNKNOWN Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest Project-URL: Funding, https://www.softwareheritage.org/donate Project-URL: Source, https://forge.softwareheritage.org/source/swh-core Description: swh-core ======== \x20 core library for swh's modules: - config parser - hash computations - serialization - logging mechanism \x20 Platform: UNKNOWN Classifier: Programming Language :: Python :: 3 Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3) Classifier: Operating System :: OS Independent Classifier: Development Status :: 5 - Production/Stable 
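(Illustrative aside, not part of the diff.) The NuGet license tests added in this patch expect SPDX expressions joined with an OR delimiter to be split into individual https://spdx.org/licenses/ URIs, while anything more complex, such as a parenthesized expression, is simply not translated. A rough sketch of that behaviour, offered only as a guess at the shape of the real NuGetMapping code:

import re

def normalize_license_expression(expression):
    # Split on the OR delimiter (case-insensitive); bail out on anything that
    # does not look like a bare SPDX identifier.
    parts = [p.strip() for p in re.split(r"\s+or\s+", expression, flags=re.IGNORECASE)]
    if any(not p or "(" in p or " " in p for p in parts):
        return None
    return ["https://spdx.org/licenses/" + p for p in parts]

assert normalize_license_expression("BitTorrent-1.0 OR GPL-3.0-with-GCC-exception") == [
    "https://spdx.org/licenses/BitTorrent-1.0",
    "https://spdx.org/licenses/GPL-3.0-with-GCC-exception",
]
assert normalize_license_expression("(MIT)") is None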
Description-Content-Type: text/markdown Provides-Extra: testing """ # noqa result = MAPPINGS["PythonPkginfoMapping"]().translate(raw_content) - assert result["description"] == [ + assert set(result.pop("description")) == { "Software Heritage core utilities", # note the comma here "swh-core\n" "========\n" "\n" "core library for swh's modules:\n" "- config parser\n" "- hash computations\n" "- serialization\n" "- logging mechanism\n" "", - ], result - del result["description"] + }, result assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "url": "https://forge.softwareheritage.org/diffusion/DCORE/", "name": "swh.core", "author": [ { "type": "Person", "name": "Software Heritage developers", "email": "swh-devel@inria.fr", } ], "version": "0.0.49", } def test_compute_metadata_pkginfo_utf8(): raw_content = b"""\ Metadata-Version: 1.1 Name: snowpyt Description-Content-Type: UNKNOWN Description: foo Hydrology N\xc2\xb083 """ # noqa result = MAPPINGS["PythonPkginfoMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "snowpyt", "description": "foo\nHydrology N°83", } def test_compute_metadata_pkginfo_keywords(): raw_content = b"""\ Metadata-Version: 2.1 Name: foo Keywords: foo bar baz """ # noqa result = MAPPINGS["PythonPkginfoMapping"]().translate(raw_content) + assert set(result.pop("keywords")) == {"foo", "bar", "baz"}, result assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "foo", - "keywords": ["foo", "bar", "baz"], } def test_compute_metadata_pkginfo_license(): raw_content = b"""\ Metadata-Version: 2.1 Name: foo License: MIT """ # noqa result = MAPPINGS["PythonPkginfoMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "foo", - "license": "MIT", + "license": "https://spdx.org/licenses/MIT", } diff --git a/swh/indexer/tests/metadata_dictionary/test_ruby.py b/swh/indexer/tests/metadata_dictionary/test_ruby.py index ba2cc30..53e0a0a 100644 --- a/swh/indexer/tests/metadata_dictionary/test_ruby.py +++ b/swh/indexer/tests/metadata_dictionary/test_ruby.py @@ -1,134 +1,136 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information from hypothesis import HealthCheck, given, settings, strategies +import pytest from swh.indexer.metadata_dictionary import MAPPINGS def test_gemspec_base(): raw_content = b""" Gem::Specification.new do |s| s.name = 'example' s.version = '0.1.0' s.licenses = ['MIT'] s.summary = "This is an example!" s.description = "Much longer explanation of the example!" 
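(Illustrative aside, not part of the diff.) The pkg-info license test above now expects the bare "MIT" string to be normalized into an SPDX URI. A minimal sketch of that idea; the KNOWN_SPDX_IDS set is a stand-in, since the real mapping decides elsewhere which identifiers are valid:

KNOWN_SPDX_IDS = {"MIT", "Apache-2.0", "GPL-3.0", "Artistic-2.0"}  # stand-in list

def license_to_uri(value):
    # Known SPDX identifiers become canonical spdx.org URIs; anything else is
    # left as-is (a guess, for illustration only).
    if value in KNOWN_SPDX_IDS:
        return "https://spdx.org/licenses/" + value
    return value

assert license_to_uri("MIT") == "https://spdx.org/licenses/MIT"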
s.authors = ["Ruby Coder"] s.email = 'rubycoder@example.com' s.files = ["lib/example.rb"] s.homepage = 'https://rubygems.org/gems/example' s.metadata = { "source_code_uri" => "https://github.com/example/example" } end""" result = MAPPINGS["GemspecMapping"]().translate(raw_content) assert set(result.pop("description")) == { "This is an example!", "Much longer explanation of the example!", } assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "author": [{"type": "Person", "name": "Ruby Coder"}], "name": "example", "license": "https://spdx.org/licenses/MIT", "codeRepository": "https://rubygems.org/gems/example", "email": "rubycoder@example.com", "version": "0.1.0", } +@pytest.mark.xfail(reason="https://github.com/w3c/json-ld-api/issues/547") def test_gemspec_two_author_fields(): raw_content = b""" Gem::Specification.new do |s| s.authors = ["Ruby Coder1"] s.author = "Ruby Coder2" end""" result = MAPPINGS["GemspecMapping"]().translate(raw_content) assert result.pop("author") in ( [ {"type": "Person", "name": "Ruby Coder1"}, {"type": "Person", "name": "Ruby Coder2"}, ], [ {"type": "Person", "name": "Ruby Coder2"}, {"type": "Person", "name": "Ruby Coder1"}, ], ) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", } def test_gemspec_invalid_author(): raw_content = b""" Gem::Specification.new do |s| s.author = ["Ruby Coder"] end""" result = MAPPINGS["GemspecMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", } raw_content = b""" Gem::Specification.new do |s| s.author = "Ruby Coder1", end""" result = MAPPINGS["GemspecMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", } raw_content = b""" Gem::Specification.new do |s| s.authors = ["Ruby Coder1", ["Ruby Coder2"]] end""" result = MAPPINGS["GemspecMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "author": [{"type": "Person", "name": "Ruby Coder1"}], } def test_gemspec_alternative_header(): raw_content = b""" require './lib/version' Gem::Specification.new { |s| s.name = 'rb-system-with-aliases' s.summary = 'execute system commands with aliases' } """ result = MAPPINGS["GemspecMapping"]().translate(raw_content) assert result == { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "SoftwareSourceCode", "name": "rb-system-with-aliases", "description": "execute system commands with aliases", } @settings(suppress_health_check=[HealthCheck.too_slow]) @given( strategies.dictionaries( # keys strategies.one_of( strategies.text(), *map(strategies.just, MAPPINGS["GemspecMapping"].mapping), # type: ignore ), # values strategies.recursive( strategies.characters(), lambda children: strategies.lists(children, min_size=1), ), ) ) def test_gemspec_adversarial(doc): parts = [b"Gem::Specification.new do |s|\n"] for (k, v) in doc.items(): parts.append(" s.{} = {}\n".format(k, repr(v)).encode()) parts.append(b"end\n") MAPPINGS["GemspecMapping"]().translate(b"".join(parts)) diff --git a/swh/indexer/tests/test_cli.py b/swh/indexer/tests/test_cli.py index bd67a05..6bbab40 100644 --- a/swh/indexer/tests/test_cli.py +++ b/swh/indexer/tests/test_cli.py @@ -1,908 +1,922 @@ # Copyright (C) 2019-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of 
this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import datetime from functools import reduce import re from typing import Any, Dict, List from unittest.mock import patch import attr from click.testing import CliRunner from confluent_kafka import Consumer import pytest from swh.indexer import fossology_license from swh.indexer.cli import indexer_cli_group from swh.indexer.storage.interface import IndexerStorageInterface from swh.indexer.storage.model import ( ContentLicenseRow, ContentMimetypeRow, DirectoryIntrinsicMetadataRow, OriginExtrinsicMetadataRow, OriginIntrinsicMetadataRow, ) from swh.journal.writer import get_journal_writer from swh.model.hashutil import hash_to_bytes from swh.model.model import Content, Origin, OriginVisitStatus from .test_metadata import REMD from .utils import ( DIRECTORY2, RAW_CONTENT_IDS, RAW_CONTENTS, REVISION, SHA1_TO_LICENSES, mock_compute_license, ) def fill_idx_storage(idx_storage: IndexerStorageInterface, nb_rows: int) -> List[int]: tools: List[Dict[str, Any]] = [ { "tool_name": "tool %d" % i, "tool_version": "0.0.1", "tool_configuration": {}, } for i in range(2) ] tools = idx_storage.indexer_configuration_add(tools) origin_metadata = [ OriginIntrinsicMetadataRow( id="file://dev/%04d" % origin_id, from_directory=hash_to_bytes("abcd{:0>36}".format(origin_id)), indexer_configuration_id=tools[origin_id % 2]["id"], metadata={"name": "origin %d" % origin_id}, mappings=["mapping%d" % (origin_id % 10)], ) for origin_id in range(nb_rows) ] directory_metadata = [ DirectoryIntrinsicMetadataRow( id=hash_to_bytes("abcd{:0>36}".format(origin_id)), indexer_configuration_id=tools[origin_id % 2]["id"], metadata={"name": "origin %d" % origin_id}, mappings=["mapping%d" % (origin_id % 10)], ) for origin_id in range(nb_rows) ] idx_storage.directory_intrinsic_metadata_add(directory_metadata) idx_storage.origin_intrinsic_metadata_add(origin_metadata) return [tool["id"] for tool in tools] def _origins_in_task_args(tasks): """Returns the set of origins contained in the arguments of the provided tasks (assumed to be of type index-origin-metadata).""" return reduce( set.union, (set(task["arguments"]["args"][0]) for task in tasks), set() ) def _assert_tasks_for_origins(tasks, origins): expected_kwargs = {} assert {task["type"] for task in tasks} == {"index-origin-metadata"} assert all(len(task["arguments"]["args"]) == 1 for task in tasks) for task in tasks: assert task["arguments"]["kwargs"] == expected_kwargs, task assert _origins_in_task_args(tasks) == set(["file://dev/%04d" % i for i in origins]) @pytest.fixture def cli_runner(): return CliRunner() def test_cli_mapping_list(cli_runner, swh_config): result = cli_runner.invoke( indexer_cli_group, ["-C", swh_config, "mapping", "list"], catch_exceptions=False, ) expected_output = "\n".join( [ "cff", "codemeta", "composer", "gemspec", "github", + "json-sword-codemeta", "maven", "npm", + "nuget", "pkg-info", "pubspec", + "sword-codemeta", "", ] # must be sorted for test to pass ) assert result.exit_code == 0, result.output assert result.output == expected_output def test_cli_mapping_list_terms(cli_runner, swh_config): result = cli_runner.invoke( indexer_cli_group, ["-C", swh_config, "mapping", "list-terms"], catch_exceptions=False, ) assert result.exit_code == 0, result.output assert re.search(r"http://schema.org/url:\n.*npm", result.output) assert re.search(r"http://schema.org/url:\n.*codemeta", result.output) assert re.search( 
r"https://codemeta.github.io/terms/developmentStatus:\n\tcodemeta", result.output, ) def test_cli_mapping_list_terms_exclude(cli_runner, swh_config): result = cli_runner.invoke( indexer_cli_group, - ["-C", swh_config, "mapping", "list-terms", "--exclude-mapping", "codemeta"], + [ + "-C", + swh_config, + "mapping", + "list-terms", + "--exclude-mapping", + "codemeta", + "--exclude-mapping", + "json-sword-codemeta", + "--exclude-mapping", + "sword-codemeta", + ], catch_exceptions=False, ) assert result.exit_code == 0, result.output assert re.search(r"http://schema.org/url:\n.*npm", result.output) assert not re.search(r"http://schema.org/url:\n.*codemeta", result.output) assert not re.search( r"https://codemeta.github.io/terms/developmentStatus:\n\tcodemeta", result.output, ) @patch("swh.scheduler.cli.utils.TASK_BATCH_SIZE", 3) @patch("swh.scheduler.cli_utils.TASK_BATCH_SIZE", 3) def test_cli_origin_metadata_reindex_empty_db( cli_runner, swh_config, indexer_scheduler, idx_storage, storage ): result = cli_runner.invoke( indexer_cli_group, [ "-C", swh_config, "schedule", "reindex_origin_metadata", ], catch_exceptions=False, ) expected_output = "Nothing to do (no origin metadata matched the criteria).\n" assert result.exit_code == 0, result.output assert result.output == expected_output tasks = indexer_scheduler.search_tasks() assert len(tasks) == 0 @patch("swh.scheduler.cli.utils.TASK_BATCH_SIZE", 3) @patch("swh.scheduler.cli_utils.TASK_BATCH_SIZE", 3) def test_cli_origin_metadata_reindex_divisor( cli_runner, swh_config, indexer_scheduler, idx_storage, storage ): """Tests the re-indexing when origin_batch_size*task_batch_size is a divisor of nb_origins.""" fill_idx_storage(idx_storage, 90) result = cli_runner.invoke( indexer_cli_group, [ "-C", swh_config, "schedule", "reindex_origin_metadata", ], catch_exceptions=False, ) # Check the output expected_output = ( "Scheduled 3 tasks (30 origins).\n" "Scheduled 6 tasks (60 origins).\n" "Scheduled 9 tasks (90 origins).\n" "Done.\n" ) assert result.exit_code == 0, result.output assert result.output == expected_output # Check scheduled tasks tasks = indexer_scheduler.search_tasks() assert len(tasks) == 9 _assert_tasks_for_origins(tasks, range(90)) @patch("swh.scheduler.cli.utils.TASK_BATCH_SIZE", 3) @patch("swh.scheduler.cli_utils.TASK_BATCH_SIZE", 3) def test_cli_origin_metadata_reindex_dry_run( cli_runner, swh_config, indexer_scheduler, idx_storage, storage ): """Tests the re-indexing when origin_batch_size*task_batch_size is a divisor of nb_origins.""" fill_idx_storage(idx_storage, 90) result = cli_runner.invoke( indexer_cli_group, [ "-C", swh_config, "schedule", "--dry-run", "reindex_origin_metadata", ], catch_exceptions=False, ) # Check the output expected_output = ( "Scheduled 3 tasks (30 origins).\n" "Scheduled 6 tasks (60 origins).\n" "Scheduled 9 tasks (90 origins).\n" "Done.\n" ) assert result.exit_code == 0, result.output assert result.output == expected_output # Check scheduled tasks tasks = indexer_scheduler.search_tasks() assert len(tasks) == 0 @patch("swh.scheduler.cli.utils.TASK_BATCH_SIZE", 3) @patch("swh.scheduler.cli_utils.TASK_BATCH_SIZE", 3) def test_cli_origin_metadata_reindex_nondivisor( cli_runner, swh_config, indexer_scheduler, idx_storage, storage ): """Tests the re-indexing when neither origin_batch_size or task_batch_size is a divisor of nb_origins.""" fill_idx_storage(idx_storage, 70) result = cli_runner.invoke( indexer_cli_group, [ "-C", swh_config, "schedule", "reindex_origin_metadata", "--batch-size", "20", ], 
catch_exceptions=False, ) # Check the output expected_output = ( "Scheduled 3 tasks (60 origins).\n" "Scheduled 4 tasks (70 origins).\n" "Done.\n" ) assert result.exit_code == 0, result.output assert result.output == expected_output # Check scheduled tasks tasks = indexer_scheduler.search_tasks() assert len(tasks) == 4 _assert_tasks_for_origins(tasks, range(70)) @patch("swh.scheduler.cli.utils.TASK_BATCH_SIZE", 3) @patch("swh.scheduler.cli_utils.TASK_BATCH_SIZE", 3) def test_cli_origin_metadata_reindex_filter_one_mapping( cli_runner, swh_config, indexer_scheduler, idx_storage, storage ): """Tests the re-indexing when origin_batch_size*task_batch_size is a divisor of nb_origins.""" fill_idx_storage(idx_storage, 110) result = cli_runner.invoke( indexer_cli_group, [ "-C", swh_config, "schedule", "reindex_origin_metadata", "--mapping", "mapping1", ], catch_exceptions=False, ) # Check the output expected_output = "Scheduled 2 tasks (11 origins).\nDone.\n" assert result.exit_code == 0, result.output assert result.output == expected_output # Check scheduled tasks tasks = indexer_scheduler.search_tasks() assert len(tasks) == 2 _assert_tasks_for_origins(tasks, [1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 101]) @patch("swh.scheduler.cli.utils.TASK_BATCH_SIZE", 3) @patch("swh.scheduler.cli_utils.TASK_BATCH_SIZE", 3) def test_cli_origin_metadata_reindex_filter_two_mappings( cli_runner, swh_config, indexer_scheduler, idx_storage, storage ): """Tests the re-indexing when origin_batch_size*task_batch_size is a divisor of nb_origins.""" fill_idx_storage(idx_storage, 110) result = cli_runner.invoke( indexer_cli_group, [ "--config-file", swh_config, "schedule", "reindex_origin_metadata", "--mapping", "mapping1", "--mapping", "mapping2", ], catch_exceptions=False, ) # Check the output expected_output = "Scheduled 3 tasks (22 origins).\nDone.\n" assert result.exit_code == 0, result.output assert result.output == expected_output # Check scheduled tasks tasks = indexer_scheduler.search_tasks() assert len(tasks) == 3 _assert_tasks_for_origins( tasks, [ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 101, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, ], ) @patch("swh.scheduler.cli.utils.TASK_BATCH_SIZE", 3) @patch("swh.scheduler.cli_utils.TASK_BATCH_SIZE", 3) def test_cli_origin_metadata_reindex_filter_one_tool( cli_runner, swh_config, indexer_scheduler, idx_storage, storage ): """Tests the re-indexing when origin_batch_size*task_batch_size is a divisor of nb_origins.""" tool_ids = fill_idx_storage(idx_storage, 110) result = cli_runner.invoke( indexer_cli_group, [ "-C", swh_config, "schedule", "reindex_origin_metadata", "--tool-id", str(tool_ids[0]), ], catch_exceptions=False, ) # Check the output expected_output = ( "Scheduled 3 tasks (30 origins).\n" "Scheduled 6 tasks (55 origins).\n" "Done.\n" ) assert result.exit_code == 0, result.output assert result.output == expected_output # Check scheduled tasks tasks = indexer_scheduler.search_tasks() assert len(tasks) == 6 _assert_tasks_for_origins(tasks, [x * 2 for x in range(55)]) def now(): return datetime.datetime.now(tz=datetime.timezone.utc) def test_cli_journal_client_schedule( cli_runner, swh_config, indexer_scheduler, kafka_prefix: str, kafka_server, consumer: Consumer, ): """Test the 'swh indexer journal-client' cli tool.""" journal_writer = get_journal_writer( "kafka", brokers=[kafka_server], prefix=kafka_prefix, client_id="test producer", value_sanitizer=lambda object_type, value: value, flush_timeout=3, # fail early if something is going wrong ) visit_statuses = [ 
OriginVisitStatus( origin="file:///dev/zero", visit=1, date=now(), status="full", snapshot=None, ), OriginVisitStatus( origin="file:///dev/foobar", visit=2, date=now(), status="full", snapshot=None, ), OriginVisitStatus( origin="file:///tmp/spamegg", visit=3, date=now(), status="full", snapshot=None, ), OriginVisitStatus( origin="file:///dev/0002", visit=6, date=now(), status="full", snapshot=None, ), OriginVisitStatus( # will be filtered out due to its 'partial' status origin="file:///dev/0000", visit=4, date=now(), status="partial", snapshot=None, ), OriginVisitStatus( # will be filtered out due to its 'ongoing' status origin="file:///dev/0001", visit=5, date=now(), status="ongoing", snapshot=None, ), ] journal_writer.write_additions("origin_visit_status", visit_statuses) visit_statuses_full = [vs for vs in visit_statuses if vs.status == "full"] result = cli_runner.invoke( indexer_cli_group, [ "-C", swh_config, "journal-client", "--broker", kafka_server, "--prefix", kafka_prefix, "--group-id", "test-consumer", "--stop-after-objects", len(visit_statuses), "--origin-metadata-task-type", "index-origin-metadata", ], catch_exceptions=False, ) # Check the output expected_output = "Done.\n" assert result.exit_code == 0, result.output assert result.output == expected_output # Check scheduled tasks tasks = indexer_scheduler.search_tasks(task_type="index-origin-metadata") # This can be split into multiple tasks but no more than the origin-visit-statuses # written in the journal assert len(tasks) <= len(visit_statuses_full) actual_origins = [] for task in tasks: actual_task = dict(task) assert actual_task["type"] == "index-origin-metadata" scheduled_origins = actual_task["arguments"]["args"][0] actual_origins.extend(scheduled_origins) assert set(actual_origins) == {vs.origin for vs in visit_statuses_full} def test_cli_journal_client_without_brokers( cli_runner, swh_config, kafka_prefix: str, kafka_server, consumer: Consumer ): """Without brokers configuration, the cli fails.""" with pytest.raises(ValueError, match="brokers"): cli_runner.invoke( indexer_cli_group, [ "-C", swh_config, "journal-client", ], catch_exceptions=False, ) @pytest.mark.parametrize("indexer_name", ["origin_intrinsic_metadata", "*"]) def test_cli_journal_client_index__origin_intrinsic_metadata( cli_runner, swh_config, kafka_prefix: str, kafka_server, consumer: Consumer, idx_storage, storage, mocker, swh_indexer_config, indexer_name: str, ): """Test the 'swh indexer journal-client' cli tool.""" journal_writer = get_journal_writer( "kafka", brokers=[kafka_server], prefix=kafka_prefix, client_id="test producer", value_sanitizer=lambda object_type, value: value, flush_timeout=3, # fail early if something is going wrong ) visit_statuses = [ OriginVisitStatus( origin="file:///dev/zero", visit=1, date=now(), status="full", snapshot=None, ), OriginVisitStatus( origin="file:///dev/foobar", visit=2, date=now(), status="full", snapshot=None, ), OriginVisitStatus( origin="file:///tmp/spamegg", visit=3, date=now(), status="full", snapshot=None, ), OriginVisitStatus( origin="file:///dev/0002", visit=6, date=now(), status="full", snapshot=None, ), OriginVisitStatus( # will be filtered out due to its 'partial' status origin="file:///dev/0000", visit=4, date=now(), status="partial", snapshot=None, ), OriginVisitStatus( # will be filtered out due to its 'ongoing' status origin="file:///dev/0001", visit=5, date=now(), status="ongoing", snapshot=None, ), ] journal_writer.write_additions("origin_visit_status", visit_statuses) visit_statuses_full = 
[vs for vs in visit_statuses if vs.status == "full"] storage.revision_add([REVISION]) mocker.patch( "swh.indexer.metadata.get_head_swhid", return_value=REVISION.swhid(), ) mocker.patch( "swh.indexer.metadata.DirectoryMetadataIndexer.index", return_value=[ DirectoryIntrinsicMetadataRow( id=DIRECTORY2.id, indexer_configuration_id=1, mappings=["cff"], metadata={"foo": "bar"}, ) ], ) result = cli_runner.invoke( indexer_cli_group, [ "-C", swh_config, "journal-client", indexer_name, "--broker", kafka_server, "--prefix", kafka_prefix, "--group-id", "test-consumer", "--stop-after-objects", len(visit_statuses), ], catch_exceptions=False, ) # Check the output expected_output = "Done.\n" assert result.exit_code == 0, result.output assert result.output == expected_output results = idx_storage.origin_intrinsic_metadata_get( [status.origin for status in visit_statuses] ) expected_results = [ OriginIntrinsicMetadataRow( id=status.origin, from_directory=DIRECTORY2.id, tool={"id": 1, **swh_indexer_config["tools"]}, mappings=["cff"], metadata={"foo": "bar"}, ) for status in sorted(visit_statuses_full, key=lambda r: r.origin) ] assert sorted(results, key=lambda r: r.id) == expected_results @pytest.mark.parametrize("indexer_name", ["extrinsic_metadata", "*"]) def test_cli_journal_client_index__origin_extrinsic_metadata( cli_runner, swh_config, kafka_prefix: str, kafka_server, consumer: Consumer, idx_storage, storage, mocker, swh_indexer_config, indexer_name: str, ): """Test the 'swh indexer journal-client' cli tool.""" journal_writer = get_journal_writer( "kafka", brokers=[kafka_server], prefix=kafka_prefix, client_id="test producer", value_sanitizer=lambda object_type, value: value, flush_timeout=3, # fail early if something is going wrong ) origin = Origin("http://example.org/repo.git") storage.origin_add([origin]) raw_extrinsic_metadata = attr.evolve(REMD, target=origin.swhid()) raw_extrinsic_metadata = attr.evolve( raw_extrinsic_metadata, id=raw_extrinsic_metadata.compute_hash() ) journal_writer.write_additions("raw_extrinsic_metadata", [raw_extrinsic_metadata]) result = cli_runner.invoke( indexer_cli_group, [ "-C", swh_config, "journal-client", indexer_name, "--broker", kafka_server, "--prefix", kafka_prefix, "--group-id", "test-consumer", "--stop-after-objects", 1, ], catch_exceptions=False, ) # Check the output expected_output = "Done.\n" assert result.exit_code == 0, result.output assert result.output == expected_output results = idx_storage.origin_extrinsic_metadata_get([origin.url]) expected_results = [ OriginExtrinsicMetadataRow( id=origin.url, from_remd_id=raw_extrinsic_metadata.id, tool={"id": 1, **swh_indexer_config["tools"]}, mappings=["github"], metadata={ "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "type": "https://forgefed.org/ns#Repository", "name": "test software", }, ) ] assert sorted(results, key=lambda r: r.id) == expected_results def test_cli_journal_client_index__content_mimetype( cli_runner, swh_config, kafka_prefix: str, kafka_server, consumer: Consumer, idx_storage, obj_storage, storage, mocker, swh_indexer_config, ): """Test the 'swh indexer journal-client' cli tool.""" journal_writer = get_journal_writer( "kafka", brokers=[kafka_server], prefix=kafka_prefix, client_id="test producer", value_sanitizer=lambda object_type, value: value, flush_timeout=3, # fail early if something is going wrong ) contents = [] expected_results = [] content_ids = [] for content_id, (raw_content, mimetypes, encoding) in RAW_CONTENTS.items(): content = Content.from_data(raw_content) 
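(Illustrative aside, not part of the diff.) This mimetype test cannot pin a single expected value because different libmagic releases classify the same bytes differently, so it enumerates every acceptable row and only checks membership. A tiny standalone illustration of that tolerance pattern, with made-up values:

acceptable_mimetypes = {"text/plain", "text/x-python"}  # made-up example values
expected_rows = [
    {"id": b"abc", "mimetype": mimetype, "encoding": "us-ascii"}
    for mimetype in sorted(acceptable_mimetypes)
]
actual_rows = [{"id": b"abc", "mimetype": "text/plain", "encoding": "us-ascii"}]
# Each actual result only has to match one of the acceptable expected rows.
assert all(row in expected_rows for row in actual_rows)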
assert content_id == content.sha1 contents.append(content) content_ids.append(content_id) # Older libmagic versions (e.g. buster: 1:5.35-4+deb10u2, bullseye: 1:5.39-3) # returns different results. This allows to deal with such a case when executing # tests on different environments machines (e.g. ci tox, ci debian, dev machine, # ...) all_mimetypes = mimetypes if isinstance(mimetypes, tuple) else [mimetypes] expected_results.extend( [ ContentMimetypeRow( id=content.sha1, tool={"id": 1, **swh_indexer_config["tools"]}, mimetype=mimetype, encoding=encoding, ) for mimetype in all_mimetypes ] ) assert len(contents) == len(RAW_CONTENTS) journal_writer.write_additions("content", contents) result = cli_runner.invoke( indexer_cli_group, [ "-C", swh_config, "journal-client", "content_mimetype", "--broker", kafka_server, "--prefix", kafka_prefix, "--group-id", "test-consumer", "--stop-after-objects", len(contents), ], catch_exceptions=False, ) # Check the output expected_output = "Done.\n" assert result.exit_code == 0, result.output assert result.output == expected_output results = idx_storage.content_mimetype_get(content_ids) assert len(results) == len(contents) for result in results: assert result in expected_results def test_cli_journal_client_index__fossology_license( cli_runner, swh_config, kafka_prefix: str, kafka_server, consumer: Consumer, idx_storage, obj_storage, storage, mocker, swh_indexer_config, ): """Test the 'swh indexer journal-client' cli tool.""" # Patch fossology_license.compute_license = mock_compute_license journal_writer = get_journal_writer( "kafka", brokers=[kafka_server], prefix=kafka_prefix, client_id="test producer", value_sanitizer=lambda object_type, value: value, flush_timeout=3, # fail early if something is going wrong ) tool = {"id": 1, **swh_indexer_config["tools"]} id0, id1, id2 = RAW_CONTENT_IDS contents = [] content_ids = [] expected_results = [] for content_id, (raw_content, _, _) in RAW_CONTENTS.items(): content = Content.from_data(raw_content) assert content_id == content.sha1 contents.append(content) content_ids.append(content_id) expected_results.extend( [ ContentLicenseRow(id=content_id, tool=tool, license=license) for license in SHA1_TO_LICENSES[content_id] ] ) assert len(contents) == len(RAW_CONTENTS) journal_writer.write_additions("content", contents) result = cli_runner.invoke( indexer_cli_group, [ "-C", swh_config, "journal-client", "content_fossology_license", "--broker", kafka_server, "--prefix", kafka_prefix, "--group-id", "test-consumer", "--stop-after-objects", len(contents), ], catch_exceptions=False, ) # Check the output expected_output = "Done.\n" assert result.exit_code == 0, result.output assert result.output == expected_output results = idx_storage.content_fossology_license_get(content_ids) assert len(results) == len(expected_results) for result in results: assert result in expected_results diff --git a/swh/indexer/tests/test_codemeta.py b/swh/indexer/tests/test_codemeta.py index 1829a70..6d394d4 100644 --- a/swh/indexer/tests/test_codemeta.py +++ b/swh/indexer/tests/test_codemeta.py @@ -1,298 +1,270 @@ # Copyright (C) 2018-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information -import pytest - -from swh.indexer.codemeta import CROSSWALK_TABLE, merge_documents, merge_values +from swh.indexer.codemeta import CROSSWALK_TABLE, merge_documents def test_crosstable(): - assert 
CROSSWALK_TABLE["NodeJS"] == { + assert {k: str(v) for (k, v) in CROSSWALK_TABLE["NodeJS"].items()} == { "repository": "http://schema.org/codeRepository", "os": "http://schema.org/operatingSystem", "cpu": "http://schema.org/processorRequirements", "engines": "http://schema.org/runtimePlatform", "author": "http://schema.org/author", "author.email": "http://schema.org/email", "author.name": "http://schema.org/name", "contributors": "http://schema.org/contributor", "keywords": "http://schema.org/keywords", "license": "http://schema.org/license", "version": "http://schema.org/version", "description": "http://schema.org/description", "name": "http://schema.org/name", "bugs": "https://codemeta.github.io/terms/issueTracker", "homepage": "http://schema.org/url", } -def test_merge_values(): - assert merge_values("a", "b") == ["a", "b"] - assert merge_values(["a", "b"], "c") == ["a", "b", "c"] - assert merge_values("a", ["b", "c"]) == ["a", "b", "c"] - - assert merge_values({"@list": ["a"]}, {"@list": ["b"]}) == {"@list": ["a", "b"]} - assert merge_values({"@list": ["a", "b"]}, {"@list": ["c"]}) == { - "@list": ["a", "b", "c"] - } - - with pytest.raises(ValueError): - merge_values({"@list": ["a"]}, "b") - with pytest.raises(ValueError): - merge_values("a", {"@list": ["b"]}) - with pytest.raises(ValueError): - merge_values({"@list": ["a"]}, ["b"]) - with pytest.raises(ValueError): - merge_values(["a"], {"@list": ["b"]}) - - assert merge_values("a", None) == "a" - assert merge_values(["a", "b"], None) == ["a", "b"] - assert merge_values(None, ["b", "c"]) == ["b", "c"] - assert merge_values({"@list": ["a"]}, None) == {"@list": ["a"]} - assert merge_values(None, {"@list": ["a"]}) == {"@list": ["a"]} - - def test_merge_documents(): """ Test the creation of a coherent minimal metadata set """ # given metadata_list = [ { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "name": "test_1", "version": "0.0.2", "description": "Simple package.json test for indexer", "codeRepository": "git+https://github.com/moranegg/metadata_test", }, { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "name": "test_0_1", "version": "0.0.2", "description": "Simple package.json test for indexer", "codeRepository": "git+https://github.com/moranegg/metadata_test", }, { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "name": "test_metadata", "version": "0.0.2", "author": { "type": "Person", "name": "moranegg", }, }, ] # when results = merge_documents(metadata_list) # then expected_results = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "version": "0.0.2", "description": "Simple package.json test for indexer", "name": ["test_1", "test_0_1", "test_metadata"], "author": [{"type": "Person", "name": "moranegg"}], "codeRepository": "git+https://github.com/moranegg/metadata_test", } assert results == expected_results def test_merge_documents_ids(): # given metadata_list = [ { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "id": "http://example.org/test1", "name": "test_1", }, { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "id": "http://example.org/test2", "name": "test_2", }, ] # when results = merge_documents(metadata_list) # then expected_results = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "id": "http://example.org/test1", "schema:sameAs": "http://example.org/test2", "name": ["test_1", "test_2"], } assert results == expected_results def test_merge_documents_duplicate_ids(): # given metadata_list = [ { "@context": 
"https://doi.org/10.5063/schema/codemeta-2.0", "id": "http://example.org/test1", "name": "test_1", }, { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "id": "http://example.org/test1", "name": "test_1b", }, { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "id": "http://example.org/test2", "name": "test_2", }, ] # when results = merge_documents(metadata_list) # then expected_results = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "id": "http://example.org/test1", "schema:sameAs": "http://example.org/test2", "name": ["test_1", "test_1b", "test_2"], } assert results == expected_results def test_merge_documents_lists(): """Tests merging two @list elements.""" # given metadata_list = [ { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "author": { "@list": [ {"name": "test_1"}, ] }, }, { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "author": { "@list": [ {"name": "test_2"}, ] }, }, ] # when results = merge_documents(metadata_list) # then expected_results = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "author": [ {"name": "test_1"}, {"name": "test_2"}, ], } assert results == expected_results def test_merge_documents_lists_duplicates(): """Tests merging two @list elements with a duplicate subelement.""" # given metadata_list = [ { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "author": { "@list": [ {"name": "test_1"}, ] }, }, { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "author": { "@list": [ {"name": "test_2"}, {"name": "test_1"}, ] }, }, ] # when results = merge_documents(metadata_list) # then expected_results = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "author": [ {"name": "test_1"}, {"name": "test_2"}, ], } assert results == expected_results def test_merge_documents_list_left(): """Tests merging a singleton with an @list.""" # given metadata_list = [ { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "author": {"name": "test_1"}, }, { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "author": { "@list": [ {"name": "test_2"}, ] }, }, ] # when results = merge_documents(metadata_list) # then expected_results = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "author": [ {"name": "test_1"}, {"name": "test_2"}, ], } assert results == expected_results def test_merge_documents_list_right(): """Tests merging an @list with a singleton.""" # given metadata_list = [ { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "author": { "@list": [ {"name": "test_1"}, ] }, }, { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "author": {"name": "test_2"}, }, ] # when results = merge_documents(metadata_list) # then expected_results = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "author": [ {"name": "test_1"}, {"name": "test_2"}, ], } assert results == expected_results diff --git a/swh/indexer/tests/test_origin_metadata.py b/swh/indexer/tests/test_origin_metadata.py index 4f6df9a..567f479 100644 --- a/swh/indexer/tests/test_origin_metadata.py +++ b/swh/indexer/tests/test_origin_metadata.py @@ -1,356 +1,356 @@ # Copyright (C) 2018-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import copy from unittest.mock import patch import pytest from swh.indexer.metadata import OriginMetadataIndexer from swh.indexer.storage.interface import 
IndexerStorageInterface from swh.indexer.storage.model import ( DirectoryIntrinsicMetadataRow, OriginIntrinsicMetadataRow, ) from swh.model.model import Origin from swh.storage.interface import StorageInterface from .test_metadata import TRANSLATOR_TOOL from .utils import DIRECTORY2, YARN_PARSER_METADATA @pytest.fixture def swh_indexer_config(swh_indexer_config): """Override the default configuration to override the tools entry""" cfg = copy.deepcopy(swh_indexer_config) cfg["tools"] = TRANSLATOR_TOOL return cfg def test_origin_metadata_indexer_release( swh_indexer_config, idx_storage: IndexerStorageInterface, storage: StorageInterface, obj_storage, ) -> None: indexer = OriginMetadataIndexer(config=swh_indexer_config) origin = "https://npm.example.org/yarn-parser" indexer.run([origin]) tool = swh_indexer_config["tools"] dir_id = DIRECTORY2.id dir_metadata = DirectoryIntrinsicMetadataRow( id=dir_id, tool=tool, metadata=YARN_PARSER_METADATA, mappings=["npm"], ) origin_metadata = OriginIntrinsicMetadataRow( id=origin, tool=tool, from_directory=dir_id, metadata=YARN_PARSER_METADATA, mappings=["npm"], ) dir_results = list(idx_storage.directory_intrinsic_metadata_get([dir_id])) for dir_result in dir_results: assert dir_result.tool del dir_result.tool["id"] assert dir_results == [dir_metadata] orig_results = list(idx_storage.origin_intrinsic_metadata_get([origin])) for orig_result in orig_results: assert orig_result.tool del orig_result.tool["id"] assert orig_results == [origin_metadata] def test_origin_metadata_indexer_revision( swh_indexer_config, idx_storage: IndexerStorageInterface, storage: StorageInterface, obj_storage, ) -> None: indexer = OriginMetadataIndexer(config=swh_indexer_config) origin = "https://github.com/librariesio/yarn-parser" indexer.run([origin]) tool = swh_indexer_config["tools"] dir_id = DIRECTORY2.id dir_metadata = DirectoryIntrinsicMetadataRow( id=dir_id, tool=tool, metadata=YARN_PARSER_METADATA, mappings=["npm"], ) origin_metadata = OriginIntrinsicMetadataRow( id=origin, tool=tool, from_directory=dir_id, metadata=YARN_PARSER_METADATA, mappings=["npm"], ) dir_results = list(idx_storage.directory_intrinsic_metadata_get([dir_id])) for dir_result in dir_results: assert dir_result.tool del dir_result.tool["id"] assert dir_results == [dir_metadata] orig_results = list(idx_storage.origin_intrinsic_metadata_get([origin])) for orig_result in orig_results: assert orig_result.tool del orig_result.tool["id"] assert orig_results == [origin_metadata] def test_origin_metadata_indexer_duplicate_origin( swh_indexer_config, idx_storage: IndexerStorageInterface, storage: StorageInterface, obj_storage, ) -> None: indexer = OriginMetadataIndexer(config=swh_indexer_config) indexer.storage = storage indexer.idx_storage = idx_storage indexer.run(["https://github.com/librariesio/yarn-parser"]) indexer.run(["https://github.com/librariesio/yarn-parser"] * 2) origin = "https://github.com/librariesio/yarn-parser" dir_id = DIRECTORY2.id dir_results = list(indexer.idx_storage.directory_intrinsic_metadata_get([dir_id])) assert len(dir_results) == 1 orig_results = list(indexer.idx_storage.origin_intrinsic_metadata_get([origin])) assert len(orig_results) == 1 def test_origin_metadata_indexer_missing_head( swh_indexer_config, idx_storage: IndexerStorageInterface, storage: StorageInterface, obj_storage, ) -> None: storage.origin_add([Origin(url="https://example.com")]) indexer = OriginMetadataIndexer(config=swh_indexer_config) indexer.run(["https://example.com"]) origin = "https://example.com" results = 
list(indexer.idx_storage.origin_intrinsic_metadata_get([origin])) assert results == [] def test_origin_metadata_indexer_partial_missing_head( swh_indexer_config, idx_storage: IndexerStorageInterface, storage: StorageInterface, obj_storage, ) -> None: origin1 = "https://example.com" origin2 = "https://github.com/librariesio/yarn-parser" storage.origin_add([Origin(url=origin1)]) indexer = OriginMetadataIndexer(config=swh_indexer_config) indexer.run([origin1, origin2]) dir_id = DIRECTORY2.id dir_results = list(indexer.idx_storage.directory_intrinsic_metadata_get([dir_id])) assert dir_results == [ DirectoryIntrinsicMetadataRow( id=dir_id, metadata=YARN_PARSER_METADATA, mappings=["npm"], tool=dir_results[0].tool, ) ] orig_results = list( indexer.idx_storage.origin_intrinsic_metadata_get([origin1, origin2]) ) for orig_result in orig_results: assert orig_results == [ OriginIntrinsicMetadataRow( id=origin2, from_directory=dir_id, metadata=YARN_PARSER_METADATA, mappings=["npm"], tool=orig_results[0].tool, ) ] def test_origin_metadata_indexer_duplicate_directory( swh_indexer_config, idx_storage: IndexerStorageInterface, storage: StorageInterface, obj_storage, ) -> None: indexer = OriginMetadataIndexer(config=swh_indexer_config) indexer.storage = storage indexer.idx_storage = idx_storage indexer.catch_exceptions = False origin1 = "https://github.com/librariesio/yarn-parser" origin2 = "https://github.com/librariesio/yarn-parser.git" indexer.run([origin1, origin2]) dir_id = DIRECTORY2.id dir_results = list(indexer.idx_storage.directory_intrinsic_metadata_get([dir_id])) assert len(dir_results) == 1 orig_results = list( indexer.idx_storage.origin_intrinsic_metadata_get([origin1, origin2]) ) assert len(orig_results) == 2 def test_origin_metadata_indexer_no_metadata_file( swh_indexer_config, idx_storage: IndexerStorageInterface, storage: StorageInterface, obj_storage, ) -> None: indexer = OriginMetadataIndexer(config=swh_indexer_config) origin = "https://github.com/librariesio/yarn-parser" with patch("swh.indexer.metadata_dictionary.npm.NpmMapping.filename", b"foo.json"): indexer.run([origin]) dir_id = DIRECTORY2.id dir_results = list(indexer.idx_storage.directory_intrinsic_metadata_get([dir_id])) assert dir_results == [] orig_results = list(indexer.idx_storage.origin_intrinsic_metadata_get([origin])) assert orig_results == [] def test_origin_metadata_indexer_no_metadata( swh_indexer_config, idx_storage: IndexerStorageInterface, storage: StorageInterface, obj_storage, ) -> None: indexer = OriginMetadataIndexer(config=swh_indexer_config) origin = "https://github.com/librariesio/yarn-parser" with patch( "swh.indexer.metadata.DirectoryMetadataIndexer" ".translate_directory_intrinsic_metadata", return_value=(["npm"], {"@context": "foo"}), ): indexer.run([origin]) dir_id = DIRECTORY2.id dir_results = list(indexer.idx_storage.directory_intrinsic_metadata_get([dir_id])) assert dir_results == [] orig_results = list(indexer.idx_storage.origin_intrinsic_metadata_get([origin])) assert orig_results == [] @pytest.mark.parametrize("catch_exceptions", [True, False]) def test_origin_metadata_indexer_directory_error( swh_indexer_config, idx_storage: IndexerStorageInterface, storage: StorageInterface, obj_storage, sentry_events, catch_exceptions, ) -> None: indexer = OriginMetadataIndexer(config=swh_indexer_config) origin = "https://github.com/librariesio/yarn-parser" indexer.catch_exceptions = catch_exceptions with patch( "swh.indexer.metadata.DirectoryMetadataIndexer" ".translate_directory_intrinsic_metadata", 
return_value=None, ): indexer.run([origin]) assert len(sentry_events) == 1 sentry_event = sentry_events.pop() assert sentry_event.get("tags") == { "swh-indexer-origin-head-swhid": ( - "swh:1:rev:179fd041d75edab00feba8e4439897422f3bdfa1" + "swh:1:rev:a78410ce2f78f5078fd4ee7edb8c82c02a4a712c" ), "swh-indexer-origin-url": origin, } assert "'TypeError'" in str(sentry_event) dir_id = DIRECTORY2.id dir_results = list(indexer.idx_storage.directory_intrinsic_metadata_get([dir_id])) assert dir_results == [] orig_results = list(indexer.idx_storage.origin_intrinsic_metadata_get([origin])) assert orig_results == [] @pytest.mark.parametrize("catch_exceptions", [True, False]) def test_origin_metadata_indexer_content_exception( swh_indexer_config, idx_storage: IndexerStorageInterface, storage: StorageInterface, obj_storage, sentry_events, catch_exceptions, ) -> None: indexer = OriginMetadataIndexer(config=swh_indexer_config) origin = "https://github.com/librariesio/yarn-parser" indexer.catch_exceptions = catch_exceptions class TestException(Exception): pass with patch( "swh.indexer.metadata.ContentMetadataRow", side_effect=TestException(), ): indexer.run([origin]) assert len(sentry_events) == 1 sentry_event = sentry_events.pop() assert sentry_event.get("tags") == { - "swh-indexer-content-sha1": "d8f40c3ca9cc30ddaca25c55b5dff18271ff030e", + "swh-indexer-content-sha1": "df9d3bcc0158faa446bd1af225f8e2e4afa576d7", "swh-indexer-origin-head-swhid": ( - "swh:1:rev:179fd041d75edab00feba8e4439897422f3bdfa1" + "swh:1:rev:a78410ce2f78f5078fd4ee7edb8c82c02a4a712c" ), "swh-indexer-origin-url": origin, } assert ".TestException'" in str(sentry_event), sentry_event dir_id = DIRECTORY2.id dir_results = list(indexer.idx_storage.directory_intrinsic_metadata_get([dir_id])) assert dir_results == [] orig_results = list(indexer.idx_storage.origin_intrinsic_metadata_get([origin])) assert orig_results == [] def test_origin_metadata_indexer_unknown_origin( swh_indexer_config, idx_storage: IndexerStorageInterface, storage: StorageInterface, obj_storage, ) -> None: indexer = OriginMetadataIndexer(config=swh_indexer_config) result = indexer.index_list([Origin("https://unknown.org/foo")]) assert not result diff --git a/swh/indexer/tests/utils.py b/swh/indexer/tests/utils.py index db0ee95..7938cdc 100644 --- a/swh/indexer/tests/utils.py +++ b/swh/indexer/tests/utils.py @@ -1,774 +1,761 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import abc import datetime import functools from typing import Any, Dict, List, Tuple import unittest from hypothesis import strategies from swh.core.api.classes import stream_results from swh.indexer.storage import INDEXER_CFG_KEY from swh.model.hashutil import hash_to_bytes from swh.model.model import ( Content, Directory, DirectoryEntry, ObjectType, Origin, OriginVisit, OriginVisitStatus, Person, Release, Revision, RevisionType, Snapshot, SnapshotBranch, TargetType, TimestampWithTimezone, ) from swh.storage.utils import now BASE_TEST_CONFIG: Dict[str, Dict[str, Any]] = { "storage": {"cls": "memory"}, "objstorage": {"cls": "memory"}, INDEXER_CFG_KEY: {"cls": "memory"}, } ORIGIN_VISITS = [ {"type": "git", "origin": "https://github.com/SoftwareHeritage/swh-storage"}, {"type": "ftp", "origin": "rsync://ftp.gnu.org/gnu/3dldf"}, { "type": "deposit", "origin": 
"https://forge.softwareheritage.org/source/jesuisgpl/", }, { "type": "pypi", "origin": "https://old-pypi.example.org/project/limnoria/", }, # with rev head {"type": "pypi", "origin": "https://pypi.org/project/limnoria/"}, # with rel head {"type": "svn", "origin": "http://0-512-md.googlecode.com/svn/"}, {"type": "git", "origin": "https://github.com/librariesio/yarn-parser"}, {"type": "git", "origin": "https://github.com/librariesio/yarn-parser.git"}, {"type": "git", "origin": "https://npm.example.org/yarn-parser"}, ] ORIGINS = [Origin(url=visit["origin"]) for visit in ORIGIN_VISITS] OBJ_STORAGE_RAW_CONTENT: Dict[str, bytes] = { "text:some": b"this is some text", "text:another": b"another text", "text:yet": b"yet another text", "python:code": b""" import unittest import logging from swh.indexer.mimetype import MimetypeIndexer from swh.indexer.tests.test_utils import MockObjStorage class MockStorage(): def content_mimetype_add(self, mimetypes): self.state = mimetypes def indexer_configuration_add(self, tools): return [{ 'id': 10, }] """, "c:struct": b""" #ifndef __AVL__ #define __AVL__ typedef struct _avl_tree avl_tree; typedef struct _data_t { int content; } data_t; """, "lisp:assertion": b""" (should 'pygments (recognize 'lisp 'easily)) """, "json:test-metadata-package.json": b""" { "name": "test_metadata", "version": "0.0.1", "description": "Simple package.json test for indexer", "repository": { "type": "git", "url": "https://github.com/moranegg/metadata_test" } } """, "json:npm-package.json": b""" { "version": "5.0.3", "name": "npm", "description": "a package manager for JavaScript", - "keywords": [ - "install", - "modules", - "package manager", - "package.json" - ], "preferGlobal": true, "config": { "publishtest": false }, "homepage": "https://docs.npmjs.com/", "author": "Isaac Z. 
Schlueter (http://blog.izs.me)", "repository": { "type": "git", "url": "https://github.com/npm/npm" }, "bugs": { "url": "https://github.com/npm/npm/issues" }, "dependencies": { "JSONStream": "~1.3.1", "abbrev": "~1.1.0", "ansi-regex": "~2.1.1", "ansicolors": "~0.3.2", "ansistyles": "~0.1.3" }, "devDependencies": { "tacks": "~1.2.6", "tap": "~10.3.2" }, "license": "Artistic-2.0" } """, "text:carriage-return": b""" """, "text:empty": b"", # was 626364 / b'bcd' "text:unimportant": b"unimportant content for bcd", # was 636465 / b'cde' now yarn-parser package.json "json:yarn-parser-package.json": b""" { "name": "yarn-parser", "version": "1.0.0", "description": "Tiny web service for parsing yarn.lock files", "main": "index.js", "scripts": { "start": "node index.js", "test": "mocha" }, "engines": { "node": "9.8.0" }, "repository": { "type": "git", "url": "git+https://github.com/librariesio/yarn-parser.git" }, - "keywords": [ - "yarn", - "parse", - "lock", - "dependencies" - ], "author": "Andrew Nesbitt", "license": "AGPL-3.0", "bugs": { "url": "https://github.com/librariesio/yarn-parser/issues" }, "homepage": "https://github.com/librariesio/yarn-parser#readme", "dependencies": { "@yarnpkg/lockfile": "^1.0.0", "body-parser": "^1.15.2", "express": "^4.14.0" }, "devDependencies": { "chai": "^4.1.2", "mocha": "^5.2.0", "request": "^2.87.0", "test": "^0.6.0" } } """, } MAPPING_DESCRIPTION_CONTENT_SHA1GIT: Dict[str, bytes] = {} MAPPING_DESCRIPTION_CONTENT_SHA1: Dict[str, bytes] = {} OBJ_STORAGE_DATA: Dict[bytes, bytes] = {} for key_description, data in OBJ_STORAGE_RAW_CONTENT.items(): content = Content.from_data(data) MAPPING_DESCRIPTION_CONTENT_SHA1GIT[key_description] = content.sha1_git MAPPING_DESCRIPTION_CONTENT_SHA1[key_description] = content.sha1 OBJ_STORAGE_DATA[content.sha1] = data RAW_CONTENT_METADATA = [ ( "du français".encode(), "text/plain", "utf-8", ), ( b"def __init__(self):", ("text/x-python", "text/x-script.python"), "us-ascii", ), ( b"\xff\xfe\x00\x00\x00\x00\xff\xfe\xff\xff", "application/octet-stream", "", ), ] RAW_CONTENTS: Dict[bytes, Tuple] = {} RAW_CONTENT_IDS: List[bytes] = [] for index, raw_content_d in enumerate(RAW_CONTENT_METADATA): raw_content = raw_content_d[0] content = Content.from_data(raw_content) RAW_CONTENTS[content.sha1] = raw_content_d RAW_CONTENT_IDS.append(content.sha1) # and write it to objstorage data so it's flushed in the objstorage OBJ_STORAGE_DATA[content.sha1] = raw_content SHA1_TO_LICENSES: Dict[bytes, List[str]] = { RAW_CONTENT_IDS[0]: ["GPL"], RAW_CONTENT_IDS[1]: ["AGPL"], RAW_CONTENT_IDS[2]: [], } DIRECTORY = Directory( entries=( DirectoryEntry( name=b"index.js", type="file", target=MAPPING_DESCRIPTION_CONTENT_SHA1GIT["text:some"], perms=0o100644, ), DirectoryEntry( name=b"package.json", type="file", target=MAPPING_DESCRIPTION_CONTENT_SHA1GIT[ "json:test-metadata-package.json" ], perms=0o100644, ), DirectoryEntry( name=b".github", type="dir", target=Directory(entries=()).id, perms=0o040000, ), ), ) DIRECTORY2 = Directory( entries=( DirectoryEntry( name=b"package.json", type="file", target=MAPPING_DESCRIPTION_CONTENT_SHA1GIT["json:yarn-parser-package.json"], perms=0o100644, ), ), ) _utc_plus_2 = datetime.timezone(datetime.timedelta(minutes=120)) REVISION = Revision( message=b"Improve search functionality", author=Person( name=b"Andrew Nesbitt", fullname=b"Andrew Nesbitt ", email=b"andrewnez@gmail.com", ), committer=Person( name=b"Andrew Nesbitt", fullname=b"Andrew Nesbitt ", email=b"andrewnez@gmail.com", ), 
committer_date=TimestampWithTimezone.from_datetime( datetime.datetime(2013, 10, 4, 12, 50, 49, tzinfo=_utc_plus_2) ), type=RevisionType.GIT, synthetic=False, date=TimestampWithTimezone.from_datetime( datetime.datetime(2017, 2, 20, 16, 14, 16, tzinfo=_utc_plus_2) ), directory=DIRECTORY2.id, parents=(), ) REVISIONS = [REVISION] RELEASE = Release( name=b"v0.0.0", message=None, author=Person( name=b"Andrew Nesbitt", fullname=b"Andrew Nesbitt ", email=b"andrewnez@gmail.com", ), synthetic=False, date=TimestampWithTimezone.from_datetime( datetime.datetime(2017, 2, 20, 16, 14, 16, tzinfo=_utc_plus_2) ), target_type=ObjectType.DIRECTORY, target=DIRECTORY2.id, ) RELEASES = [RELEASE] SNAPSHOTS = [ # https://github.com/SoftwareHeritage/swh-storage Snapshot( branches={ b"refs/heads/add-revision-origin-cache": SnapshotBranch( target=b'L[\xce\x1c\x88\x8eF\t\xf1"\x19\x1e\xfb\xc0s\xe7/\xe9l\x1e', target_type=TargetType.REVISION, ), b"refs/head/master": SnapshotBranch( target=b"8K\x12\x00d\x03\xcc\xe4]bS\xe3\x8f{\xd7}\xac\xefrm", target_type=TargetType.REVISION, ), b"HEAD": SnapshotBranch( target=b"refs/head/master", target_type=TargetType.ALIAS ), b"refs/tags/v0.0.103": SnapshotBranch( target=b'\xb6"Im{\xfdLb\xb0\x94N\xea\x96m\x13x\x88+\x0f\xdd', target_type=TargetType.RELEASE, ), }, ), # rsync://ftp.gnu.org/gnu/3dldf Snapshot( branches={ b"3DLDF-1.1.4.tar.gz": SnapshotBranch( target=b'dJ\xfb\x1c\x91\xf4\x82B%]6\xa2\x90|\xd3\xfc"G\x99\x11', target_type=TargetType.REVISION, ), b"3DLDF-2.0.2.tar.gz": SnapshotBranch( target=b"\xb6\x0e\xe7\x9e9\xac\xaa\x19\x9e=\xd1\xc5\x00\\\xc6\xfc\xe0\xa6\xb4V", # noqa target_type=TargetType.REVISION, ), b"3DLDF-2.0.3-examples.tar.gz": SnapshotBranch( target=b"!H\x19\xc0\xee\x82-\x12F1\xbd\x97\xfe\xadZ\x80\x80\xc1\x83\xff", # noqa target_type=TargetType.REVISION, ), b"3DLDF-2.0.3.tar.gz": SnapshotBranch( target=b"\x8e\xa9\x8e/\xea}\x9feF\xf4\x9f\xfd\xee\xcc\x1a\xb4`\x8c\x8by", # noqa target_type=TargetType.REVISION, ), b"3DLDF-2.0.tar.gz": SnapshotBranch( target=b"F6*\xff(?\x19a\xef\xb6\xc2\x1fv$S\xe3G\xd3\xd1m", target_type=TargetType.REVISION, ), }, ), # https://forge.softwareheritage.org/source/jesuisgpl/", Snapshot( branches={ b"master": SnapshotBranch( target=b"\xe7n\xa4\x9c\x9f\xfb\xb7\xf76\x11\x08{\xa6\xe9\x99\xb1\x9e]q\xeb", # noqa target_type=TargetType.REVISION, ) }, ), # https://old-pypi.example.org/project/limnoria/ Snapshot( branches={ b"HEAD": SnapshotBranch( target=b"releases/2018.09.09", target_type=TargetType.ALIAS ), b"releases/2018.09.01": SnapshotBranch( target=b"<\xee1(\xe8\x8d_\xc1\xc9\xa6rT\xf1\x1d\xbb\xdfF\xfdw\xcf", target_type=TargetType.REVISION, ), b"releases/2018.09.09": SnapshotBranch( target=b"\x83\xb9\xb6\xc7\x05\xb1%\xd0\xfem\xd8kA\x10\x9d\xc5\xfa2\xf8t", # noqa target_type=TargetType.REVISION, ), }, ), # https://pypi.org/project/limnoria/ Snapshot( branches={ b"HEAD": SnapshotBranch( target=b"releases/2018.09.09", target_type=TargetType.ALIAS ), b"releases/2018.09.01": SnapshotBranch( target=b"<\xee1(\xe8\x8d_\xc1\xc9\xa6rT\xf1\x1d\xbb\xdfF\xfdw\xcf", target_type=TargetType.RELEASE, ), b"releases/2018.09.09": SnapshotBranch( target=b"\x83\xb9\xb6\xc7\x05\xb1%\xd0\xfem\xd8kA\x10\x9d\xc5\xfa2\xf8t", # noqa target_type=TargetType.RELEASE, ), }, ), # http://0-512-md.googlecode.com/svn/ Snapshot( branches={ b"master": SnapshotBranch( target=b"\xe4?r\xe1,\x88\xab\xec\xe7\x9a\x87\xb8\xc9\xad#.\x1bw=\x18", target_type=TargetType.REVISION, ) }, ), # https://github.com/librariesio/yarn-parser Snapshot( branches={ b"HEAD": SnapshotBranch( 
target=REVISION.id, target_type=TargetType.REVISION, ) }, ), # https://github.com/librariesio/yarn-parser.git Snapshot( branches={ b"HEAD": SnapshotBranch( target=REVISION.id, target_type=TargetType.REVISION, ) }, ), # https://npm.example.org/yarn-parser Snapshot( branches={ b"HEAD": SnapshotBranch( target=RELEASE.id, target_type=TargetType.RELEASE, ) }, ), ] assert len(SNAPSHOTS) == len(ORIGIN_VISITS) YARN_PARSER_METADATA = { "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "url": "https://github.com/librariesio/yarn-parser#readme", "codeRepository": "git+git+https://github.com/librariesio/yarn-parser.git", "author": [{"type": "Person", "name": "Andrew Nesbitt"}], "license": "https://spdx.org/licenses/AGPL-3.0", "version": "1.0.0", "description": "Tiny web service for parsing yarn.lock files", "issueTracker": "https://github.com/librariesio/yarn-parser/issues", "name": "yarn-parser", - "keywords": ["yarn", "parse", "lock", "dependencies"], "type": "SoftwareSourceCode", } json_dict_keys = strategies.one_of( strategies.characters(), strategies.just("type"), strategies.just("url"), strategies.just("name"), strategies.just("email"), strategies.just("@id"), strategies.just("@context"), strategies.just("repository"), strategies.just("license"), strategies.just("repositories"), strategies.just("licenses"), ) """Hypothesis strategy that generates strings, with an emphasis on those that are often used as dictionary keys in metadata files.""" generic_json_document = strategies.recursive( strategies.none() | strategies.booleans() | strategies.floats() | strategies.characters(), lambda children: ( strategies.lists(children, min_size=1) | strategies.dictionaries(json_dict_keys, children, min_size=1) ), ) """Hypothesis strategy that generates possible values for values of JSON metadata files.""" def json_document_strategy(keys=None): """Generates an hypothesis strategy that generates metadata files for a JSON-based format that uses the given keys.""" if keys is None: keys = strategies.characters() else: keys = strategies.one_of(map(strategies.just, keys)) return strategies.dictionaries(keys, generic_json_document, min_size=1) def _tree_to_xml(root, xmlns, data): def encode(s): "Skips unpaired surrogates generated by json_document_strategy" return s.encode("utf8", "replace") def to_xml(data, indent=b" "): if data is None: return b"" elif isinstance(data, (bool, str, int, float)): return indent + encode(str(data)) elif isinstance(data, list): return b"\n".join(to_xml(v, indent=indent) for v in data) elif isinstance(data, dict): lines = [] for (key, value) in data.items(): lines.append(indent + encode("<{}>".format(key))) lines.append(to_xml(value, indent=indent + b" ")) lines.append(indent + encode("".format(key))) return b"\n".join(lines) else: raise TypeError(data) return b"\n".join( [ '<{} xmlns="{}">'.format(root, xmlns).encode(), to_xml(data), "".format(root).encode(), ] ) class TreeToXmlTest(unittest.TestCase): def test_leaves(self): self.assertEqual( _tree_to_xml("root", "http://example.com", None), b'\n\n', ) self.assertEqual( _tree_to_xml("root", "http://example.com", True), b'\n True\n', ) self.assertEqual( _tree_to_xml("root", "http://example.com", "abc"), b'\n abc\n', ) self.assertEqual( _tree_to_xml("root", "http://example.com", 42), b'\n 42\n', ) self.assertEqual( _tree_to_xml("root", "http://example.com", 3.14), b'\n 3.14\n', ) def test_dict(self): self.assertIn( _tree_to_xml("root", "http://example.com", {"foo": "bar", "baz": "qux"}), [ b'\n' b" \n bar\n \n" b" \n qux\n \n" b"", 
b'\n' b" \n qux\n \n" b" \n bar\n \n" b"", ], ) def test_list(self): self.assertEqual( _tree_to_xml( "root", "http://example.com", [ {"foo": "bar"}, {"foo": "baz"}, ], ), b'\n' b" \n bar\n \n" b" \n baz\n \n" b"", ) def xml_document_strategy(keys, root, xmlns): """Generates an hypothesis strategy that generates metadata files for an XML format that uses the given keys.""" return strategies.builds( functools.partial(_tree_to_xml, root, xmlns), json_document_strategy(keys) ) def filter_dict(d, keys): "return a copy of the dict with keys deleted" if not isinstance(keys, (list, tuple)): keys = (keys,) return dict((k, v) for (k, v) in d.items() if k not in keys) def fill_obj_storage(obj_storage): """Add some content in an object storage.""" for obj_id, content in OBJ_STORAGE_DATA.items(): obj_storage.add(content, obj_id) def fill_storage(storage): """Fill in storage with consistent test dataset.""" storage.content_add([Content.from_data(data) for data in OBJ_STORAGE_DATA.values()]) storage.directory_add([DIRECTORY, DIRECTORY2]) storage.revision_add(REVISIONS) storage.release_add(RELEASES) storage.snapshot_add(SNAPSHOTS) storage.origin_add(ORIGINS) for visit, snapshot in zip(ORIGIN_VISITS, SNAPSHOTS): assert snapshot.id is not None visit = storage.origin_visit_add( [OriginVisit(origin=visit["origin"], date=now(), type=visit["type"])] )[0] visit_status = OriginVisitStatus( origin=visit.origin, visit=visit.visit, date=now(), status="full", snapshot=snapshot.id, ) storage.origin_visit_status_add([visit_status]) class CommonContentIndexerTest(metaclass=abc.ABCMeta): def get_indexer_results(self, ids): """Override this for indexers that don't have a mock storage.""" return self.indexer.idx_storage.state def assert_results_ok(self, sha1s, expected_results=None): sha1s = [hash_to_bytes(sha1) for sha1 in sha1s] actual_results = list(self.get_indexer_results(sha1s)) if expected_results is None: expected_results = self.expected_results # expected results may contain slightly duplicated results assert 0 < len(actual_results) <= len(expected_results) for result in actual_results: assert result in expected_results def test_index(self): """Known sha1 have their data indexed""" sha1s = [self.id0, self.id1, self.id2] # when self.indexer.run(sha1s) self.assert_results_ok(sha1s) # 2nd pass self.indexer.run(sha1s) self.assert_results_ok(sha1s) def test_index_one_unknown_sha1(self): """Unknown sha1s are not indexed""" sha1s = [ self.id1, "799a5ef812c53907562fe379d4b3851e69c7cb15", # unknown "800a5ef812c53907562fe379d4b3851e69c7cb15", # unknown ] # unknown # when self.indexer.run(sha1s) # then expected_results = [res for res in self.expected_results if res.id in sha1s] self.assert_results_ok(sha1s, expected_results) class CommonContentIndexerPartitionTest: """Allows to factorize tests on range indexer.""" def setUp(self): self.contents = sorted(OBJ_STORAGE_DATA) def assert_results_ok(self, partition_id, nb_partitions, actual_results): expected_ids = [ c.sha1 for c in stream_results( self.indexer.storage.content_get_partition, partition_id=partition_id, nb_partitions=nb_partitions, ) ] actual_results = list(actual_results) for indexed_data in actual_results: _id = indexed_data.id assert _id in expected_ids _tool_id = indexed_data.indexer_configuration_id assert _tool_id == self.indexer.tool["id"] def test__index_contents(self): """Indexing contents without existing data results in indexed data""" partition_id = 0 nb_partitions = 4 actual_results = list( self.indexer._index_contents(partition_id, nb_partitions, 
indexed={}) ) self.assert_results_ok(partition_id, nb_partitions, actual_results) def test__index_contents_with_indexed_data(self): """Indexing contents with existing data results in less indexed data""" partition_id = 3 nb_partitions = 4 # first pass actual_results = list( self.indexer._index_contents(partition_id, nb_partitions, indexed={}), ) self.assert_results_ok(partition_id, nb_partitions, actual_results) indexed_ids = {res.id for res in actual_results} actual_results = list( self.indexer._index_contents( partition_id, nb_partitions, indexed=indexed_ids ) ) # already indexed, so nothing new assert actual_results == [] def test_generate_content_get(self): """Optimal indexing should result in indexed data""" partition_id = 0 nb_partitions = 1 actual_results = self.indexer.run( partition_id, nb_partitions, skip_existing=False ) assert actual_results["status"] == "eventful", actual_results def test_generate_content_get_no_result(self): """No result indexed returns False""" actual_results = self.indexer.run(1, 2**512, incremental=False) assert actual_results == {"status": "uneventful"} def mock_compute_license(path): """path is the content identifier""" if isinstance(path, bytes): path = path.decode("utf-8") # path is something like /tmp/tmpXXX/<sha1> so we keep only the sha1 part id_ = path.split("/")[-1] return {"licenses": SHA1_TO_LICENSES.get(hash_to_bytes(id_), [])}
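For reference, a minimal usage sketch of the mock_compute_license helper above (not part of the diff itself): it relies only on fixtures defined in this same file (RAW_CONTENT_IDS, SHA1_TO_LICENSES) plus swh.model.hashutil.hash_to_hex, and shows how a scanner temp path is mapped back to the expected license list.

# Illustrative sketch only -- not part of swh/indexer/tests/utils.py.
from swh.model.hashutil import hash_to_hex

from swh.indexer.tests.utils import (
    RAW_CONTENT_IDS,
    SHA1_TO_LICENSES,
    mock_compute_license,
)

sha1 = RAW_CONTENT_IDS[0]  # sha1 of the first raw test content
# The license scanner is normally handed a temporary file named after the sha1;
# mock_compute_license keeps only that last path component to look up the fixture.
path = "/tmp/tmpXXX/" + hash_to_hex(sha1)
assert mock_compute_license(path) == {"licenses": SHA1_TO_LICENSES[sha1]}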