README.md
swh-indexer | swh-indexer | ||||
============ | ============ | ||||
Tools to compute multiple indexes on SWH's raw contents: | Tools to compute multiple indexes on SWH's raw contents: | ||||
- content: | - content: | ||||
- mimetype | - mimetype | ||||
- ctags | |||||
- language | |||||
- fossology-license | - fossology-license | ||||
- metadata | - metadata | ||||
- revision: | - origin: | ||||
- metadata | - metadata (intrinsic, using the content indexer; and extrinsic) | ||||
An indexer is in charge of: | An indexer is in charge of: | ||||
- looking up objects | - looking up objects | ||||
- extracting information from those objects | - extracting information from those objects | ||||
- storing that information in the swh-indexer db | - storing that information in the swh-indexer db | ||||
There are multiple indexers working on different object types: | There are multiple indexers working on different object types: | ||||
- content indexer: works with content sha1 hashes | - content indexer: works with content sha1 hashes | ||||
- revision indexer: works with revision sha1 hashes | - revision indexer: works with revision sha1 hashes | ||||
- origin indexer: works with origin identifiers | - origin indexer: works with origin identifiers | ||||
Indexation procedure: | Indexation procedure: | ||||
- receive batch of ids | - receive batch of ids | ||||
- retrieve the associated data depending on object type | - retrieve the associated data depending on object type | ||||
- compute for that object some index | - compute for that object some index | ||||
- store the result to swh's storage | - store the result to swh's storage | ||||
Current content indexers: | Current content indexers: | ||||
- mimetype (queue swh_indexer_content_mimetype): detect the encoding | - mimetype (queue swh_indexer_content_mimetype): detect the encoding | ||||
and mimetype | and mimetype | ||||
- language (queue swh_indexer_content_language): detect the | |||||
programming language | |||||
- ctags (queue swh_indexer_content_ctags): compute tags information | |||||
- fossology-license (queue swh_indexer_fossology_license): compute the | - fossology-license (queue swh_indexer_fossology_license): compute the | ||||
license | license | ||||
- metadata: translate file into translated_metadata dict | - metadata: translate file from ecosystem-specific formats to JSON-LD | ||||
(using schema.org/CodeMeta vocabulary) | |||||
Current revision indexers: | Current origin indexers: | ||||
- metadata: detects files containing metadata and retrieves translated_metadata | - metadata: translate file from ecosystem-specific formats to JSON-LD | ||||
in content_metadata table in storage or run content indexer to translate | (using schema.org/CodeMeta and ForgeFed vocabularies) | ||||
files. |
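
The indexation procedure listed in the README (receive a batch of ids, retrieve the associated data, compute an index, store the result) can be sketched roughly as below. This is only an illustration under stated assumptions: `index_batch`, `guess_mimetype`, and the in-memory dictionaries are hypothetical stand-ins, not the actual swh-indexer or swh-storage API, and the text/binary check is a toy replacement for real mimetype detection.

```python
import hashlib
from typing import Callable, Dict, Iterable, List

# Hypothetical in-memory stand-ins for the object storage and the indexer db.
ObjectStore = Dict[str, bytes]
IndexStore = Dict[str, dict]


def sha1_hex(data: bytes) -> str:
    """Return the hex sha1 of some content, used here as its id."""
    return hashlib.sha1(data).hexdigest()


def guess_mimetype(data: bytes) -> dict:
    """Toy index computation: classify content as UTF-8 text or binary."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return {"mimetype": "application/octet-stream", "encoding": "binary"}
    return {"mimetype": "text/plain", "encoding": "utf-8"}


def index_batch(
    ids: Iterable[str],
    objects: ObjectStore,
    results: IndexStore,
    compute: Callable[[bytes], dict],
) -> List[str]:
    """Run the four steps of the procedure over one batch of ids."""
    indexed = []
    for object_id in ids:              # 1. receive a batch of ids
        data = objects.get(object_id)  # 2. retrieve the associated data
        if data is None:
            continue                   # unknown id: nothing to index
        results[object_id] = compute(data)  # 3. compute, 4. store
        indexed.append(object_id)
    return indexed


text = b"print('hello')\n"
blob = b"\xff\xfe\x00\x01"
objects = {sha1_hex(text): text, sha1_hex(blob): blob}
results: IndexStore = {}
done = index_batch(
    [sha1_hex(text), sha1_hex(blob), "0" * 40],
    objects,
    results,
    guess_mimetype,
)
```

Passing the computation in as a function mirrors the README's split between object types and index types: the same loop could drive a license or metadata computation by swapping the `compute` argument.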