
Make the OriginMetadataIndexer fetch rev metadata from the storage instead of getting them via the scheduler.
Closed, Public

Authored by vlorentz on Nov 23 2018, 4:42 PM.

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Some non-blocking remarks/questions.

swh/indexer/metadata.py
308

I wonder whether we are missing a tool id here (not an immediate problem, I think).

swh/indexer/origin_head.py
87

How come you don't need that anymore?

This revision is now accepted and ready to land. Nov 23 2018, 5:11 PM
vlorentz added inline comments.
swh/indexer/metadata.py
308

Good point, I didn't think of that.

That makes me realize this line only works with the mock idx storage. revision_metadata_get returns a list, possibly with more than one item per id when there are multiple tools.
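
To illustrate the point above, here is a minimal sketch of handling a flat result list that may hold several rows per revision id, one per tool. The row layout (`id`, `tool`, `translated_metadata`) and the helper names `group_by_revision` and `pick_for_tool` are assumptions made for this example, not the actual indexer storage API.

```python
from collections import defaultdict


def group_by_revision(rows):
    """Group a flat list of result rows by revision id.

    Each row is assumed (hypothetically) to be a dict with an "id" key
    and a "tool" dict; several rows may share the same id when more
    than one tool indexed the same revision.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["id"]].append(row)
    return dict(grouped)


def pick_for_tool(rows, rev_id, tool_id):
    """Return the single row for (rev_id, tool_id), or None if absent."""
    for row in group_by_revision(rows).get(rev_id, []):
        if row["tool"]["id"] == tool_id:
            return row
    return None


# Hypothetical data: "rev1" was indexed by two different tools.
rows = [
    {"id": "rev1", "tool": {"id": 1}, "translated_metadata": {"name": "foo"}},
    {"id": "rev1", "tool": {"id": 2}, "translated_metadata": {"name": "foo-alt"}},
    {"id": "rev2", "tool": {"id": 1}, "translated_metadata": {"name": "bar"}},
]

grouped = group_by_revision(rows)
selected = pick_for_tool(rows, "rev1", 2)
```

Code that takes the first element of the list, or builds a dict keyed only by id, silently drops one of the two "rev1" rows. Filtering on the tool id as well avoids that, which also relates to the tool id remark above.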

swh/indexer/origin_head.py
87

D704 made it optional

swh/indexer/origin_head.py
87

Yes, but why did you need it before and not now?

Also, I think it was opened only for that case ;)

  • Fix revision_metadata_get mock and its usage.
vlorentz added inline comments.
swh/indexer/origin_head.py
87

Because the origin int meta indexer now calls revision_metadata_get instead of getting revisions_metadata via the scheduler.