swh.scheduler.cli: Use bulk api to index tasks

Description

Unfortunately, the bulk api response does not include the original
source of the indexed data [1], which we need in order to identify
the data to clean up in the db. So we use elasticsearch's multi-get
api to read the original source back after indexing.
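
A minimal sketch of the bulk-then-multi-get pattern described above, not
the actual swh.scheduler code. The function name index_and_read_back, the
index name and the document shape are hypothetical. The client is assumed
to expose bulk(body=...) and mget(body=..., index=...) the way the
elasticsearch-py client of that era does; newer client versions may name
these parameters differently.

```python
import json


def index_and_read_back(client, index, docs, doc_type=None):
    """Bulk-index `docs` (a mapping of id -> source) and read them back.

    Hypothetical helper: `client` is assumed to provide `bulk` and `mget`
    methods shaped like those of the elasticsearch-py client.
    """
    if not docs:
        return {}

    # Bulk payload: one action line followed by one source line per document.
    lines = []
    for doc_id, source in docs.items():
        meta = {"_index": index, "_id": doc_id}
        if doc_type is not None:
            meta["_type"] = doc_type  # only needed on older clusters
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(source))
    response = client.bulk(body="\n".join(lines) + "\n")

    # The bulk response reports a status per item but not the source,
    # so keep only the ids that were actually written.
    indexed_ids = [
        item["index"]["_id"]
        for item in response["items"]
        if item["index"]["status"] in (200, 201)
    ]
    if not indexed_ids:
        return {}

    # Multi-get returns the stored _source for each id in one request.
    result = client.mget(body={"ids": indexed_ids}, index=index)
    return {d["_id"]: d["_source"] for d in result["docs"] if d.get("found")}
```

The returned mapping holds only documents that were confirmed as indexed,
which is what a later clean-up step in the db would need to rely on.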

Related T986

Details

Provenance
ardumont Authored on Mar 29 2018, 10:47 AM
ardumont Pushed on Mar 29 2018, 12:32 PM
Parents
rDSCH4d13f5dc4940: swh.scheduler.cli.archive: Improve dry-run behavior