vault: retry RPC calls on transient network errors
Started, Work in Progress, Normal, Public

Description

Until now, vault cooking has failed on any network error.
A retry mechanism must be used on RPC calls to storage and scheduler so that cooking is resilient to transient errors during those calls.
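
As a rough illustration of the idea (not the actual vault code), a retry wrapper around an RPC call could look like the sketch below. Everything named here is hypothetical: `TransientRPCError`, `retry_on_transient`, `FlakyStorage` and its `content_get` method are placeholders for this example only, and the exponential backoff policy is an assumption rather than what the task settled on.

```python
import functools
import time


class TransientRPCError(Exception):
    """Hypothetical stand-in for a transient network failure on an RPC call."""


def retry_on_transient(max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry the wrapped call on TransientRPCError, with exponential backoff.

    The error is re-raised once max_attempts calls have failed.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except TransientRPCError:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error
                    # Wait base_delay, then 2x, 4x, ... before the next try.
                    sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator


class FlakyStorage:
    """Hypothetical storage client that fails a fixed number of times first."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    @retry_on_transient(max_attempts=3, base_delay=0.0)
    def content_get(self, obj_id):
        self.calls += 1
        if self.calls <= self.failures:
            raise TransientRPCError("connection reset")
        return f"content:{obj_id}"
```

With this sketch, a client that fails twice still returns a result on the third call, while a client that keeps failing raises `TransientRPCError` after three attempts. Only errors considered transient are retried; any other exception propagates immediately.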

Related to T1191

Event Timeline

tenma changed the task status from Open to Work in Progress. Dec 7 2020, 5:35 PM
tenma triaged this task as Normal priority.
tenma created this task.

Vault 0.5.0 packaged (includes a fix from @tenma about dropping the unused default configuration).
Vault configuration adapted on the Puppet side to add the retry behavior.

This is deployed on staging and, from my tryouts, it seems to behave fine.
What remains now is to deploy it to the vault server (vangogh.euwest.azure).

Deployed configuration change in staging without issues.
Deployed configuration change in production as well.