Decrease parallelism on indexer workers running journal clients
Closed, Public

Authored by ardumont on Aug 1 2022, 2:49 PM.

Details

Summary

And now:

  • indexer-worker0[1-3] runs content indexer (mimetype, fossology)
  • indexer-worker0[4-6] runs origin intrinsic metadata indexer

The origin extrinsic metadata indexer runs on all instances.
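As a rough illustration of the resulting layout, the host-to-indexer assignment described above can be sketched as a small lookup. This is only a sketch: the indexer labels, `HOST_GROUPS`, `ON_ALL_INDEXER_HOSTS` and `indexers_for` are hypothetical names chosen for the example and are not taken from puppet-swh-site.

```python
import re
from typing import List

# Hypothetical labels for the indexers named in the summary; the real
# puppet-swh-site identifiers may differ.
HOST_GROUPS = [
    (re.compile(r"indexer-worker0[1-3]"), ["content_mimetype", "content_fossology"]),
    (re.compile(r"indexer-worker0[4-6]"), ["origin_intrinsic_metadata"]),
]
# Declared on every indexer worker, per the summary.
ON_ALL_INDEXER_HOSTS = ["origin_extrinsic_metadata"]


def indexers_for(hostname: str) -> List[str]:
    """Return the journal-client indexers expected on the given host."""
    indexers: List[str] = []
    for pattern, names in HOST_GROUPS:
        if pattern.fullmatch(hostname):
            indexers.extend(names)
    if not indexers:
        # Not one of the indexer workers: nothing is expected to run here.
        return []
    return indexers + ON_ALL_INDEXER_HOSTS
```

For example, `indexers_for("indexer-worker05")` yields the origin intrinsic metadata indexer followed by the extrinsic one, while a non-indexer host such as `vault-worker01` yields an empty list.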

And vault-worker01: Clean up useless configuration

Related to T4415

Diff Detail

Repository
rSPSITE puppet-swh-site
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

olasd added a subscriber: olasd.

The origin extrinsic metadata indexer only runs on a single file at a time, so I don't think it needs to run alone on dedicated hosts.

Having journal clients in the swh::deploy::worker hierarchy is a bit confusing; we should probably split the manifests properly between celery workers and journal clients to unconfuse things.

Furthermore, we will probably want to deploy multiple instances of the same journal client on a given host (for parallelism), using parametric systemd units, so the way we deploy them will probably diverge even further.
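To make the idea of parametric units concrete: systemd template units are named `name@.service` and instantiated as `name@<instance>.service`. The sketch below expands such a template into a fixed number of instance names. The helper `instance_units` and the unit name used in the usage note are assumptions for illustration only, not names from the actual deployment.

```python
from typing import List


def instance_units(template: str, count: int) -> List[str]:
    """Expand a systemd template unit name into `count` instance unit names."""
    suffix = "@.service"
    if not template.endswith(suffix):
        raise ValueError("expected a template unit name ending in '@.service'")
    if count < 1:
        raise ValueError("count must be at least 1")
    # Keep everything up to and including the '@', then append the instance id.
    prefix = template[: -len(".service")]
    return [f"{prefix}{i}.service" for i in range(1, count + 1)]
```

With a hypothetical template `swh-indexer-journal-client@.service` and a count of 2, this returns `swh-indexer-journal-client@1.service` and `swh-indexer-journal-client@2.service`, which is the kind of per-host parallelism the comment describes.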

This revision is now accepted and ready to land.
Aug 1 2022, 3:35 PM
In D8167#212470, @olasd wrote:

The origin extrinsic metadata indexer only runs on a single file at a time, so I don't think it needs to run alone on dedicated hosts.

OK, so I'll amend this so it's declared on all hosts.

Having journal clients in the swh::deploy::worker hierarchy is a bit confusing; we should probably split the manifests properly between celery workers and journal clients to unconfuse things.

Yes, agreed.

I previously did only the minimal work needed to clean up the previous services and declare the new indexer journal client for deployment.

Furthermore, we will probably want to deploy multiple instances of the same journal client on a given host (for parallelism), using parametric systemd units, so the way we deploy them will probably diverge even further.

Makes sense indeed.

  • Decrease parallelism on indexer workers running journal clients