Details
- Reviewers: anlambert, moranegg
- Group Reviewers: Reviewers
- Commits: rDDOC5f92841cb0e9: Add an overview of the metadata workflow
Diff Detail
- Repository: rDDOC Development documentation
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.
Event Timeline
docs/architecture/metadata.rst | ||
---|---|---|
9 | Shouldn't it be plural here? "These metadata are partitioned ..." |
docs/architecture/metadata.rst | ||
---|---|---|
9 | "(meta)data" is uncountable, so it's usually singular. |
docs/architecture/metadata.rst | ||
---|---|---|
48 | Does an indexer work only on stored/archived metadata? Can't we have an indexer that stores the number of forks or stars of a GitHub repository? |
docs/architecture/metadata.rst | ||
---|---|---|
48 | No; indexers are not supposed to access the internet. Instead, we'll need to add some sort of loader that fetches and stores API responses, and indexers would read the stored responses. |
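
The split described in this reply can be sketched as follows. This is only an illustration under stated assumptions, not Software Heritage's actual API: `fetch_and_store`, `index_counts`, `fake_fetch`, the in-memory `storage` dictionary, and the example URL are all hypothetical, and the field names are assumed to follow GitHub's repository API response.

```python
import json
from typing import Callable, Dict

# Hypothetical in-memory stand-in for the archive's storage, keyed by URL.
Storage = Dict[str, bytes]


def fetch_and_store(url: str, fetch: Callable[[str], bytes], storage: Storage) -> None:
    """Loader side: the only step allowed to reach the network.

    `fetch` is injected so this sketch runs offline; a real loader would
    issue an HTTP request here.
    """
    storage[url] = fetch(url)


def index_counts(url: str, storage: Storage) -> Dict[str, int]:
    """Indexer side: reads the stored response and never touches the network."""
    response = json.loads(storage[url])
    # Field names are assumed to follow GitHub's repository API response.
    return {"stars": response["stargazers_count"], "forks": response["forks_count"]}


def fake_fetch(url: str) -> bytes:
    """Canned API response standing in for a real HTTP call."""
    return json.dumps({"stargazers_count": 42, "forks_count": 7}).encode()


storage: Storage = {}
url = "https://api.example.org/repos/owner/project"
fetch_and_store(url, fake_fetch, storage)
print(index_counts(url, storage))
```

Because the indexer only ever sees `storage`, it can be re-run later on the same stored response without any network access.
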
Good idea to add this documentation!
I have added a few comments.
docs/architecture/metadata.rst | ||
---|---|---|
7 | Intrinsic metadata is part of the code. |swh| calls "metadata" the information it collects and extracts that describes, and provides additional information on, the source code itself. | |
13 | Swap "collected" and "extracted". I propose keeping "collect" for the action of gathering from external resources. | |
17 | Here I would suggest not defining the term with a negative statement: ":term:`extrinsic metadata`, which is collected or deposited from external sources." | |
71 | I would put this section before the metadata mining section, as an introduction to it. | |
74 | Delete "as we saw above". | |
75 | The raw metadata is the authentic piece of metadata, while the indexed metadata is a processed version, where the raw metadata is translated to a uniform vocabulary. Both intrinsic and extrinsic metadata can be indexed and translated. | |
77 | Drop "and is not bug free". | |
79 | It's not only because of bugs; we keep both because the information is different, we don't translate all properties, etc. |
docs/architecture/metadata.rst | ||
---|---|---|
77 | why? |
docs/architecture/metadata.rst | ||
---|---|---|
77 | because of the explanation in the next comment. |
docs/architecture/metadata.rst | ||
---|---|---|
61 | Add: The raw metadata is the authentic piece of metadata while the indexed metadata is a processed version, where the raw metadata is translated to a uniform vocabulary. Both intrinsic and extrinsic metadata can be indexed and translated. | |
66 | Add: By keeping the raw metadata, we retain the ability to re-compute the metadata in the future with other vocabularies. Furthermore, if we did not store the raw metadata, this would mean bugs in indexers... (continue with the sentence in the text) |
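
The point about keeping raw metadata alongside indexed metadata can be illustrated with a small sketch. Everything here is hypothetical: `translate`, `VOCABULARY_A`, `VOCABULARY_B`, and the sample document are made-up names and data for illustration, not the real indexer code or a real vocabulary mapping.

```python
import json
from typing import Dict

# Hypothetical mappings from raw keys to two target vocabularies.
# Neither maps every raw property, so untranslated ones survive only in the raw form.
VOCABULARY_A = {"name": "name", "license": "license", "description": "description"}
VOCABULARY_B = {"name": "title", "description": "abstract"}


def translate(raw: bytes, mapping: Dict[str, str]) -> Dict[str, object]:
    """Derive indexed metadata from raw metadata without modifying the raw bytes."""
    document = json.loads(raw)
    return {
        target: document[source]
        for source, target in mapping.items()
        if source in document
    }


# The raw metadata is stored once, exactly as it was found.
raw = b'{"name": "demo", "license": "MIT", "scripts": {"test": "pytest"}}'

# Indexed metadata is a derived view and can be recomputed at any time,
# for example after fixing a bug in the mapping or adopting another vocabulary.
indexed = translate(raw, VOCABULARY_A)
recomputed = translate(raw, VOCABULARY_B)
print(indexed)
print(recomputed)
```

In this sketch the `scripts` property is never translated, which mirrors the comment on line 79: the raw and indexed forms carry different information, so both are worth keeping.
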