
make sure front-end services work when the Inria infra is down
Closed, Migrated, Edits Locked

Description

During the last planned downtime it became evident that front-end services (like the web app) do not work when the infrastructure hosted at Inria is down, in spite of being (at least in theory) fully cloud-hosted.

We should evaluate why that has been the case and make front-end services really independent from services currently running at Inria.

Event Timeline

zack renamed this task from make sure front-end services work when the Inria inra is down to make sure front-end services work when the Inria infra is down. Jun 21 2019, 2:25 PM
zack triaged this task as Normal priority.
zack created this task.

Just to correct the record: the web app is *not* fully cloud hosted today, in theory or practice, and never has been. The main archive.softwareheritage.org domain points to infrastructure hosted on Inria premises.

The performance of the database replica on Azure has never been good enough to run the main frontend there. That's T1116, which has been closed in favor of *one* specific culprit, but there were more, e.g. large revision logs timing out, as well as large directory fetches. There have been no further efforts to make this replica work better since we first tried putting it there, and we have therefore never changed the main archive domain to point to it.

The downtime of the web app during the migration of louvre was due to a routing issue: all the internal traffic of virtual machines with two interfaces (public and private) went through louvre (the server that was being migrated), even though that was unnecessary. This affected all key services, notably the web app and the internal DNS server, on which all internal communication ends up depending after a while.
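
For illustration only, here is a minimal sketch of how a dual-homed machine can end up sending internal traffic through a gateway. The actual routing tables are not recorded in this task, so the addresses, interface names, and the assumption that internal destinations fell through to a default route are all hypothetical; the sketch only shows ordinary longest-prefix route selection.

```python
import ipaddress


def pick_route(routes, destination):
    """Return the most specific route (longest prefix) covering destination."""
    dest = ipaddress.ip_address(destination)
    matching = [r for r in routes if dest in ipaddress.ip_network(r["prefix"])]
    return max(matching, key=lambda r: ipaddress.ip_network(r["prefix"]).prefixlen)


# Hypothetical routing table of a VM with a public and a private interface.
# The only entry is a default route via a gateway (standing in for louvre).
broken = [
    {"prefix": "0.0.0.0/0", "dev": "public0", "via": "203.0.113.1"},
]

# Same table with an on-link route for the (hypothetical) private subnet.
fixed = broken + [
    {"prefix": "192.168.100.0/24", "dev": "private0", "via": None},
]

internal_peer = "192.168.100.29"  # hypothetical internal service address

# Without the specific route, internal traffic takes the default route.
print(pick_route(broken, internal_peer)["dev"])  # public0
# With it, internal traffic stays on the private interface.
print(pick_route(fixed, internal_peer)["dev"])   # private0
```

In this toy model, a machine whose internal destinations only match the default route depends on the gateway being up, which is the kind of unnecessary dependency described above.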

Once the problem was noticed and fixed, the downtime of that public-facing service ended.

The web app hosted on Azure, to the best of my knowledge, didn't suffer any downtime.