docs/metadata-workflow.rst
------------------------

The following terms may be found in the output of the metadata translation
(other than the `codemeta` mapping, which is the identity function, and
therefore supports all terms):

.. program-output:: python3 -m swh.indexer.cli mapping list-terms --exclude-mapping codemeta
    :nostderr:

zack: Please use a different expression here than "mapping". As we have discussed in the past, it's…
[Done] vlorentz: Indeed, I like it.

Writing your own mapping
------------------------

[Done] zack: I don't think this comment belongs there, a developer setup is a prerequisite and we don't want to have to mention it in every single piece of documentation we write.
[Done] vlorentz: Indeed. (I initially wrote this in the blog post draft before deciding to move it to the documentation)

First, follow the :ref:`developer-setup` to download our source code.

[Done] zack: this anchor looks a bit fragile, maybe just use `CodeMeta crosswalks` instead ?

You should start by picking one of the `crosswalks made available by CodeMeta`_.

[Done] zack: s/^Create/Then create/

Create a new file in `swh-indexer/swh/indexer/metadata_dictionary/`, that
will contain your code, and create a new class that inherits from helper
classes, with some documentation about your indexer:

.. code-block:: python

    from .base import DictMapping, SingleFileMapping
    from swh.indexer.codemeta import CROSSWALK_TABLE

    class MyMapping(DictMapping, SingleFileMapping):
        """Dedicated class for ..."""
        name = 'my-mapping'
        filename = b'the-filename'
        mapping = CROSSWALK_TABLE['Name of the CodeMeta crosswalk']

.. _crosswalks made available by CodeMeta: https://github.com/codemeta/codemeta/tree/master/crosswalks

Then, add a `string_fields` attribute, that is the list of all keys whose
values are simple text values. For instance, for the Python PKG-INFO mapping,
it's:

[Done] zack: Link to the actual implementation of the PKG-INFO mapping here, so that people can compare the pseudo-code in this section with a real implementation.

.. code-block:: python

    string_fields = ['name', 'version', 'description', 'summary',
                     'author', 'author-email']

Last step to get your mapping working: add a `translate` method that will
take a single byte string as argument, turn it into a Python dictionary,
whose keys are the ones of the input document, and pass it to
`_translate_dict`.

For instance, if the input document is in JSON, it can be as simple as:

.. code-block:: python

    def translate(self, raw_content):
        raw_content = raw_content.decode()  # bytes to str
        content_dict = json.loads(raw_content)  # str to dict
        return self._translate_dict(content_dict)  # convert to CodeMeta

`_translate_dict` will do the heavy work of reading the crosswalk table for
each of `string_fields`, reading the corresponding value in the `content_dict`,
and building a CodeMeta dictionary with the corresponding names from the
crosswalk table.

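The behaviour described above can be sketched in a few lines. This is a simplified, self-contained illustration and not the actual `swh.indexer` code: the `CROSSWALK` table, its contents, `STRING_FIELDS`, and the `translate_dict` function are all hypothetical stand-ins.

```python
# Hypothetical crosswalk: source key -> CodeMeta term (illustrative values only).
CROSSWALK = {
    "name": "http://schema.org/name",
    "version": "http://schema.org/version",
    "summary": "http://schema.org/description",
}

# Keys whose values are plain strings, mirroring the `string_fields` attribute.
STRING_FIELDS = ["name", "version", "summary"]


def translate_dict(content_dict):
    """Build a CodeMeta-like dict from the string fields found in the input."""
    translated = {}
    for key in STRING_FIELDS:
        value = content_dict.get(key)
        # Skip fields that are absent, not plain text, or missing from the crosswalk.
        if isinstance(value, str) and key in CROSSWALK:
            translated[CROSSWALK[key]] = value
    return translated


result = translate_dict({"name": "example", "version": "1.0", "homepage": 42})
```

Keys that are not listed in `STRING_FIELDS` (here, `homepage`) are simply ignored in this sketch; the real helper may handle them differently.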
One last thing to run your mapping: add it to the list in
`swh-indexer/swh/indexer/metadata_dictionary/__init__.py`, so the rest of the
code is aware of it.

Now, you can run it:

.. code-block:: shell

    python3 -m swh.indexer.metadata_dictionary MyMapping path/to/input/file

and it will (hopefully) return a CodeMeta object.

If it works, well done!

You can now improve your mapping further, by adding methods that will do
more advanced conversion. For example, if there is a field named `license`
containing an SPDX identifier, you must convert it to a URI, like this:

.. code-block:: python

    def normalize_license(self, s):
        if isinstance(s, str):
            return {"@id": "https://spdx.org/licenses/" + s}

[Not Done] ardumont: Maybe mentioning somewhere the mapping should be tested (prior to opening a diff ;).
[Done] zack: rather: "when it finds a `license` field in `content_dict`."

This method will automatically get called by `_translate_dict` when it
sees the `license` field in the `content_dict`.

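One plausible way such name-based dispatch could work is sketched below. It is only an illustration under stated assumptions: `SketchMapping`, its `mapping` table, and its `translate_dict` method are hypothetical, and the real `_translate_dict` may be implemented differently.

```python
class SketchMapping:
    """Hypothetical stand-in for a mapping class; not the swh.indexer API."""

    # Illustrative crosswalk: source key -> output term.
    mapping = {
        "name": "http://schema.org/name",
        "license": "http://schema.org/license",
    }

    def normalize_license(self, s):
        # Turn an SPDX identifier into a URI node, as in the example above.
        if isinstance(s, str):
            return {"@id": "https://spdx.org/licenses/" + s}

    def translate_dict(self, content_dict):
        translated = {}
        for key, value in content_dict.items():
            if key not in self.mapping:
                continue
            # Look for an optional `normalize_<key>` method on the instance.
            normalizer = getattr(self, "normalize_" + key.replace("-", "_"), None)
            if normalizer is not None:
                value = normalizer(value)
            # A normalizer returning None means the value could not be converted.
            if value is not None:
                translated[self.mapping[key]] = value
        return translated


output = SketchMapping().translate_dict({"name": "example", "license": "MIT"})
```

In this sketch, fields without a matching `normalize_` method (such as `name`) are copied unchanged, and a non-string `license` value is dropped because `normalize_license` returns `None` for it.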
Please use a different expression here than "mapping". As we have discussed in the past, it's an implementation detail, not related to the goal that readers of this section will try to achieve.
A concrete suggestion is "Add support for additional ecosystem-specific metadata". ("Ecosystem" is an attempt at being more general than "language" or "package manager", YMMV.) Similarly, adapting the description to mention "mapping" only when it's really needed would be nice.