Details
- Reviewers: anlambert, moranegg
- Group Reviewers: Reviewers
- Commits: rDDOC5f92841cb0e9: Add an overview of the metadata workflow
Diff Detail
- Repository: rDDOC Development documentation
- Branch: master
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 23538; Build 36725: arc lint + arc unit
Event Timeline
| docs/architecture/metadata.rst | ||
|---|---|---|
| 9 | Shouldn't it be plural here? "These metadata are partitioned ..." | |
| docs/architecture/metadata.rst | ||
|---|---|---|
| 9 | "(meta)data" is uncountable, so it's usually singular. | |
| docs/architecture/metadata.rst | ||
|---|---|---|
| 48 | Does an indexer work only on stored/archived metadata? Can't we have an indexer that stores the number of forks or stars of a GitHub repo? | |
| docs/architecture/metadata.rst | ||
|---|---|---|
| 48 | No; indexers are not supposed to access the internet. Instead, we'll need to add some sort of loader that fetches and stores API responses, and indexers would read the stored responses. | |
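The split described in this reply can be illustrated with a minimal sketch. It is not code from the |swh| codebase: `ResponseStore`, `load_api_response`, and `index_popularity` are hypothetical names, the store is a plain in-memory dictionary, and the JSON keys `stargazers_count` and `forks_count` are assumed to match the shape of the stored API response.

```python
import json
from typing import Callable, Dict, Optional


class ResponseStore:
    """Hypothetical in-memory stand-in for the storage of raw API responses."""

    def __init__(self) -> None:
        self._responses: Dict[str, bytes] = {}

    def put(self, origin_url: str, raw: bytes) -> None:
        self._responses[origin_url] = raw

    def get(self, origin_url: str) -> Optional[bytes]:
        return self._responses.get(origin_url)


def load_api_response(
    store: ResponseStore, origin_url: str, fetch: Callable[[str], bytes]
) -> None:
    """Loader side: the only step that may reach the network, through `fetch`."""
    store.put(origin_url, fetch(origin_url))


def index_popularity(
    store: ResponseStore, origin_url: str
) -> Optional[Dict[str, Optional[int]]]:
    """Indexer side: reads only the stored response, never the network."""
    raw = store.get(origin_url)
    if raw is None:
        return None  # nothing was loaded for this origin yet
    data = json.loads(raw)
    # Key names are an assumption about the stored response format.
    return {"stars": data.get("stargazers_count"), "forks": data.get("forks_count")}
```

In this sketch the indexer can be re-run at any time on the same stored bytes, without depending on the remote API being available.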
Good idea to add this documentation!
I have added a few comments.
| docs/architecture/metadata.rst | ||
|---|---|---|
| 7 | Intrinsic metadata is part of the code. |swh| calls "metadata" information it collects and extracts that describes and provides additional information on the source code itself. | |
| 13 | Switch "collected" with "extracted". I propose to keep "collect" for the action of gathering from external resources. | |
| 17 | Here I would suggest not defining it with a negative statement: :term:`extrinsic metadata`, which is collected or deposited from external sources. | |
| 71 | I would put this section before the metadata mining as an introduction to it. | |
| 74 | Delete "as we saw above". | |
| 75 | The raw metadata is the authentic piece of metadata while the indexed metadata is a processed version, where the raw metadata is translated to a uniform vocabulary. Both intrinsic and extrinsic metadata can be indexed and translated. | |
| 77 | Drop "and is not bug free". | |
| 79 | It's not only because of bugs; we keep both because the information is different, we don't translate all properties, etc. | |
| docs/architecture/metadata.rst | ||
|---|---|---|
| 77 | Why? | |
| docs/architecture/metadata.rst | ||
|---|---|---|
| 77 | Because of the explanation in the next comment. | |
| docs/architecture/metadata.rst | ||
|---|---|---|
| 61 | Add: The raw metadata is the authentic piece of metadata while the indexed metadata is a processed version, where the raw metadata is translated to a uniform vocabulary. Both intrinsic and extrinsic metadata can be indexed and translated. | |
| 66 | Add: By keeping the raw metadata, we ensure that the metadata can be re-computed in the future with other vocabularies. Furthermore, if we did not store the raw metadata, this would mean bugs in indexers... (continue with the sentence in text) | |
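The distinction between raw and indexed metadata drawn in these comments can be sketched as follows. This is an illustration only: the `translate` helper, both mapping tables, and the sample `package.json`-style record are hypothetical and do not reflect the actual indexer code or any real vocabulary definition.

```python
from typing import Any, Dict

# Hypothetical mappings from raw property names to terms of a target vocabulary.
# Properties absent from a mapping are simply not translated.
UNIFORM_VOCABULARY = {"name": "name", "description": "description", "license": "license"}
OTHER_VOCABULARY = {"name": "title", "author": "creator"}


def translate(raw: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Return the indexed form: only mapped properties, renamed to the target terms."""
    return {mapping[key]: value for key, value in raw.items() if key in mapping}


# The raw metadata is kept untouched, as found in the source.
raw_metadata = {"name": "example", "license": "MIT", "author": "Jane", "private": True}

# Indexed metadata is a processed, partial view of the raw record.
indexed = translate(raw_metadata, UNIFORM_VOCABULARY)

# Because the raw record is still available, it can be re-computed later
# with another vocabulary, recovering properties the first pass dropped.
reindexed = translate(raw_metadata, OTHER_VOCABULARY)
```

Here `indexed` loses the `author` property while `reindexed` recovers it, which is only possible because the raw record was stored alongside the translated one.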