D994.diff
diff --git a/docs/metadata-workflow.rst b/docs/metadata-workflow.rst
--- a/docs/metadata-workflow.rst
+++ b/docs/metadata-workflow.rst
@@ -11,6 +11,9 @@
multiple indexers, which coordinate with each other and save their results
at each step in the indexer storage.
+Indexer architecture
+--------------------
+
.. thumbnail:: images/tasks-metadata-indexers.svg
@@ -42,6 +45,7 @@
as `codemeta.json`, `package.json`, or `pom.xml`. If there are any, it
runs the Content Metadata Indexer on them, which in turn fetches their
contents and runs them through extraction dictionaries/mappings.
+See below for details.
Their results are saved in a database (the indexer storage), associated with
the content and revision hashes.
@@ -62,3 +66,33 @@
efficiently find out which origins matched the pattern.
Running that search on the `revision_metadata` table would require either
a reverse lookup from revisions to origins, which is costly.
+
+
+Translation from language-specific metadata to CodeMeta
+-------------------------------------------------------
+
+Intrinsic metadata are extracted from files provided with a project's source
+code, and translated using `CodeMeta`_'s `crosswalk table`_.
+
+All input formats supported so far are straightforward dictionaries (e.g. JSON)
+or can be accessed as such (e.g. XML), so the first part of the translation is
+to map each of their keys to a term in the CodeMeta vocabulary.
+This is done by parsing the crosswalk table's `CSV file`_ and using it as a
+map between the two vocabularies, which does not require any
+format-specific code in the indexers.
+
+The second part is to normalize values. As each language-specific metadata
+format has its own way(s) of formatting these values, they need to be turned
+into the data types required by CodeMeta.
+This normalization makes up most of the code of
+:py:mod:`swh.indexer.metadata_dictionary`.
+
+
+Supported intrinsic metadata
+----------------------------
+
+
+
+.. _CodeMeta: https://codemeta.github.io/
+.. _crosswalk table: https://codemeta.github.io/crosswalk/
+.. _CSV file: https://github.com/codemeta/codemeta/blob/master/crosswalk.csv
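The detection step mentioned in the diff (the indexer looks for files with names such as `codemeta.json`, `package.json` or `pom.xml` before running the Content Metadata Indexer on them) can be sketched as a plain name filter. This is a minimal, hypothetical illustration: `METADATA_FILENAMES`, `find_metadata_files` and the shape of the directory listing (a dict from file name to content hash) are assumptions, not part of swh.indexer's API.

```python
from typing import Dict, List

# Hypothetical set of metadata filenames, taken from the examples in the
# documentation text; the real indexer's list may be longer or differ.
METADATA_FILENAMES = {"codemeta.json", "package.json", "pom.xml"}


def find_metadata_files(dir_entries: Dict[str, str]) -> List[str]:
    """Return the content hashes of directory entries whose name is a
    known metadata filename, ordered by file name.

    ``dir_entries`` is an assumed representation of a directory listing:
    a dict mapping file names to content hashes.
    """
    return [
        content_hash
        for name, content_hash in sorted(dir_entries.items())
        if name in METADATA_FILENAMES
    ]
```

The returned hashes are what a content-level indexer would then fetch and translate; an empty list corresponds to the "if there are any" case where nothing is run.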
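The key-mapping step of the new "Translation from language-specific metadata to CodeMeta" section can be sketched as follows. This is a minimal sketch under stated assumptions, not the code of `swh.indexer.metadata_dictionary`: the embedded CSV excerpt, its column names (`Property`, `NodeJS`) and the helpers `load_mapping` and `translate_keys` are hypothetical, and the layout of the real crosswalk file should be checked before relying on it.

```python
import csv
import io
from typing import Dict

# Hypothetical excerpt shaped like a crosswalk table: one column holds the
# CodeMeta term, one column per source vocabulary holds the matching key.
# An empty cell means the vocabulary has no equivalent for that term.
SAMPLE_CROSSWALK = """\
Property,NodeJS
name,name
description,description
codeRepository,repository
license,license
author,author
programmingLanguage,
"""


def load_mapping(csv_text: str, column: str) -> Dict[str, str]:
    """Build a {source key: CodeMeta term} dict from one crosswalk column."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        source_key = (row.get(column) or "").strip()
        if source_key:  # skip terms with no equivalent in this vocabulary
            mapping[source_key] = row["Property"]
    return mapping


def translate_keys(document: dict, mapping: Dict[str, str]) -> dict:
    """Rename the keys of a parsed metadata file to CodeMeta terms,
    dropping keys the crosswalk does not know about."""
    return {mapping[key]: value for key, value in document.items() if key in mapping}
```

For example, `translate_keys({"name": "foo", "repository": "git+https://example.org/foo.git", "scripts": {}}, load_mapping(SAMPLE_CROSSWALK, "NodeJS"))` keeps `name`, renames `repository` to `codeRepository` and drops `scripts`. Because the mapping is data-driven, handling another dictionary-like format in this sketch only means selecting another column, which matches the claim that this step needs no format-specific code.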
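The value-normalization step can be illustrated with two `package.json` fields that npm accepts in several shapes: `repository` (a URL string or an object with `type` and `url`) and `author` (a `"Name <email> (url)"` string or an object with `name`, `email` and `url`). The helpers below are hypothetical and cover only these cases; they are not taken from `swh.indexer.metadata_dictionary`, and the choice to strip a leading `git+` and to emit a Person-like dict with an `"@type"` key is an assumption made for the sketch.

```python
import re
from typing import Optional, Union

# Parses npm's "Name <email> (url)" author string; email and url are optional.
_AUTHOR_RE = re.compile(
    r"^\s*(?P<name>[^<(]*?)\s*(?:<(?P<email>[^>]*)>)?\s*(?:\((?P<url>[^)]*)\))?\s*$"
)


def normalize_repository(value: Union[str, dict]) -> Optional[str]:
    """Turn a string or {"type", "url"} repository value into one URL string."""
    if isinstance(value, dict):
        value = value.get("url")
    if not isinstance(value, str) or not value:
        return None
    # Assumption: drop the "git+" prefix often found in npm repository URLs.
    if value.startswith("git+"):
        value = value[len("git+"):]
    return value


def normalize_author(value: Union[str, dict]) -> Optional[dict]:
    """Turn an author string or object into a schema.org Person-like dict."""
    if isinstance(value, str):
        match = _AUTHOR_RE.match(value)
        if match is None:
            return None
        value = match.groupdict()
    if not isinstance(value, dict):
        return None
    person = {"@type": "Person"}
    for key in ("name", "email", "url"):
        if value.get(key):  # keep only fields that are present and non-empty
            person[key] = value[key]
    return person
```

Both input shapes end up as the same output type, which is the point of this step: downstream code only has to deal with one representation per CodeMeta term.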
Attached to D994: Document the metadata translation process and list supported metadata sources.