
DB storage cleanup
Closed, Resolved, Public

Description

The current mount point for our main db is saturated (it is the one used by the worker and, currently, by the archiver in its main design).

But, as it turns out, the softwareheritage-log db is around 1.1 TB (one quarter of the available space on the main cluster).

So we can:

  • clean up a bit (by removing logs with log level > info older than 1 month).
  • dump the db softwareheritage-log
  • restore that dump in the other cluster (the other mount point, which is on standard HDDs with plenty of space).

-> The restored db will only take up the space it actually needs, so less than 1.1 TB this time.

  • drop softwareheritage-log in the main cluster
  • update /etc/pgbouncer/pgbouncer.ini with the new db port to keep serving softwareheritage-log
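
For reference, the dump / restore / drop steps above could be sketched roughly as follows. This is only an illustration, not the commands that were actually run: the ports (5432 for the main cluster, 5433 for the other one) and the dump path are hypothetical, and the helper only builds the command lines without executing anything.

```python
import shlex
from typing import List


def migration_commands(db: str, src_port: int, dst_port: int,
                       dump_path: str) -> List[List[str]]:
    """Return the argv list of each step, in order. Nothing is executed."""
    return [
        # 1. Dump from the main cluster in pg_dump's custom (compressed) format.
        ["pg_dump", "-p", str(src_port), "-Fc", "-f", dump_path, db],
        # 2. Create an empty database with the same name in the other cluster.
        ["createdb", "-p", str(dst_port), db],
        # 3. Restore the dump into it.
        ["pg_restore", "-p", str(dst_port), "-d", db, dump_path],
        # 4. Only once the restore has been checked: drop the original.
        ["dropdb", "-p", str(src_port), db],
    ]


if __name__ == "__main__":
    # Hypothetical ports and path, for illustration only.
    for argv in migration_commands("softwareheritage-log", 5432, 5433,
                                   "/tmp/softwareheritage-log.dump"):
        print(shlex.join(argv))
```

Printing the commands first (rather than running them) leaves room to review each step, in particular the final dropdb.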

Note:

  • As we use pgbouncer, this should not change anything downstream.
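
To make the pgbouncer point concrete, here is a minimal sketch of the config change, assuming the [databases] section holds an entry of the form `softwareheritage-log = host=... port=... dbname=...`. The sample entry, the port numbers, and the helper name `repoint_database` are assumptions for illustration; the real file may look different. Note that Python's configparser drops comments when writing the file back, so on a real config a manual edit may be preferable.

```python
import configparser
import io
import re


def repoint_database(ini_text: str, db: str, new_port: int) -> str:
    """Return ini_text with the port of [databases] entry `db` changed."""
    # interpolation=None: treat '%' in values as a plain character.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(ini_text)

    conn = parser["databases"][db]
    if re.search(r"\bport=\d+", conn):
        conn = re.sub(r"\bport=\d+", f"port={new_port}", conn)
    else:
        # No explicit port in the entry: append one.
        conn = f"{conn} port={new_port}"
    parser["databases"][db] = conn

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical entry; clients keep using the same pgbouncer address.
    sample = (
        "[databases]\n"
        "softwareheritage-log = host=localhost port=5432 "
        "dbname=softwareheritage-log\n"
    )
    print(repoint_database(sample, "softwareheritage-log", 5433))
```

Clients connect to pgbouncer rather than to the cluster directly, so only this entry changes; pgbouncer then has to reload its configuration for the new port to take effect.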

Event Timeline

Thanks for opening this task, I agree this is the way to go. Once done, we can easily deploy the archiver as it is supposed to work in the long term, reducing the need for further migrations down the road.

There are several sub-tasks here though, that I think should be untangled:

  1. moving the log DB from the SSD-based DB to the spinning-drive one
  2. removing old verbose log entries (level > info)
  3. reducing the current log configuration of our various services so that log size becomes more manageable in the medium term
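
As a rough illustration of sub-task 2, the selection rule could look like the sketch below. The `LogEntry` shape is hypothetical (the real table layout is not described here), "1 month" is approximated as 30 days, and which levels count as "level > info" depends on the level ordering in use, so the levels to purge are passed in explicitly rather than guessed.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Set


class LogEntry(NamedTuple):
    # Hypothetical shape of a log row, for illustration only.
    ts: datetime
    level: str
    message: str


def entries_to_purge(entries: Iterable[LogEntry], purge_levels: Set[str],
                     now: datetime,
                     max_age: timedelta = timedelta(days=30)) -> List[LogEntry]:
    """Select entries that are both older than max_age and in purge_levels."""
    cutoff = now - max_age
    return [e for e in entries if e.level in purge_levels and e.ts < cutoff]


if __name__ == "__main__":
    now = datetime(2016, 7, 13)
    sample = [
        LogEntry(datetime(2016, 5, 1), "debug", "old debug"),
        LogEntry(datetime(2016, 7, 10), "debug", "recent debug"),
        LogEntry(datetime(2016, 5, 1), "info", "old info"),
    ]
    # Assumption: only "debug" is treated as verbose here.
    for entry in entries_to_purge(sample, {"debug"}, now):
        print(entry.message)
```

The same two conditions (level and age) would carry over to a DELETE statement on the actual log table.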

What is blocking T240 is probably only (1). But if you start looking into this (which I recommend you do), it'd be nice to fix all of it, while the context is clear.

ardumont renamed this task from db storage cleanup to DB storage cleanup. Jul 13 2016, 10:55 AM
ardumont removed a parent task: T240: content archiver.

> But if you start looking into this (which I recommend you do), it'd be nice to fix all of it, while the context is clear.

Agreed and on it ^^

Note for readers:

  • The bullet points do not imply any order; they are merely references to get back to ^^