No, indexers are not supposed to access the internet. Instead, we'll need to add some sort of loader that fetches and stores API responses, and indexers would then read the stored responses.
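To illustrate the split I have in mind, here is a minimal sketch. All names (`ResponseStore`, `load`, `index`) are hypothetical and not existing swh code; the in-memory dict only stands in for whatever storage we end up using, and the `fetch` callable is an assumption so that the network access stays confined to the loader side.

```python
import json
from typing import Callable, Dict, Optional


class ResponseStore:
    """Hypothetical in-memory stand-in for where fetched API responses are kept."""

    def __init__(self) -> None:
        self._responses: Dict[str, bytes] = {}

    def put(self, url: str, body: bytes) -> None:
        self._responses[url] = body

    def get(self, url: str) -> Optional[bytes]:
        return self._responses.get(url)


def load(url: str, fetch: Callable[[str], bytes], store: ResponseStore) -> None:
    """Loader side: the only place where the network is touched."""
    store.put(url, fetch(url))


def index(url: str, store: ResponseStore) -> Optional[dict]:
    """Indexer side: reads the stored response, never the network."""
    body = store.get(url)
    if body is None:
        return None  # nothing loaded yet; the indexer does not fetch by itself
    return json.loads(body)
```

With this shape, the indexer can be re-run at any time on the stored responses without any outside access.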
Good idea to add this documentation!
I have added a few comments.
intrinsic metadata is part of the code.
|swh| calls "metadata" information it collects and extracts that describes and provides additional information on the source code itself.
switch collected with extracted.
I propose to keep collect for the actions of gathering from external resources.
Here I would suggest not defining it with a negative statement:
:term:`extrinsic metadata`, which is collected or deposited from external sources.
I would put this section before the metadata mining as an introduction to it
delete "as we saw above"
The raw metadata is the authentic piece of metadata while the indexed metadata is a processed version, where the raw metadata is translated to a uniform vocabulary.
both intrinsic and extrinsic metadata can be indexed and translated.
drop "and is not bug free"
It's not only because of bugs; we keep both because the information is different, we don't translate all properties, etc.
The raw metadata is the authentic piece of metadata while the indexed metadata is a processed version, where the raw metadata is translated to a uniform vocabulary. Both intrinsic and extrinsic metadata can be indexed and translated.
By keeping the raw metadata, we make it possible to recompute the indexed metadata in the future with other vocabularies. Furthermore, if we did not store the raw metadata, this would mean bugs in indexers....
(continue with the sentence in text)
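If an example helps the raw vs. indexed paragraph, something like the sketch below could work. It is only an illustration under assumptions: the mappings `VOCAB_A` and `VOCAB_B`, the `translate` helper and the sample document are all made up for this comment and are not the actual indexer code or vocabularies.

```python
import json
from typing import Dict

# Hypothetical mappings from source-specific keys to terms of a uniform vocabulary.
# Only mapped keys are translated, so the indexed form does not carry everything.
VOCAB_A = {"name": "schema:name", "license": "schema:license"}
VOCAB_B = {"name": "dc:title"}


def translate(raw: bytes, mapping: Dict[str, str]) -> Dict[str, object]:
    """Build an indexed document from raw metadata, keeping only mapped keys."""
    document = json.loads(raw)
    return {mapping[key]: value for key, value in document.items() if key in mapping}


# The raw metadata is stored untouched...
raw = b'{"name": "demo", "license": "MIT", "scripts": {"test": "pytest"}}'

# ...so it can be translated again later, with another vocabulary.
indexed_a = translate(raw, VOCAB_A)
indexed_b = translate(raw, VOCAB_B)
```

Here the `scripts` property has no mapping, so it only survives in the raw document, which is the "we don't translate all properties" point above.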