The general idea is to split the workload between:
- granet1 (current server): will be used to serve the compressed graph in production
- granet2 (to be provisioned): will be used to periodically export/compress the graph from the archive, and for sporadic R&D experiments
That way, we can play with the graph compression stack without impacting the planned in-production use of swh-graph (T2220).
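As a rough illustration of granet2's periodic export/compress role, here is a minimal sketch of a scheduled job wrapper. The command names (`swh-graph-export`, `swh-graph-compress`) and paths are hypothetical placeholders, not the actual swh-graph CLI; the real pipeline would use whatever entry points swh-graph provides.

```python
#!/usr/bin/env python3
"""Minimal sketch of a periodic export/compress job for granet2.

Assumptions: the two commands below are hypothetical placeholders for the
real swh-graph export and compression entry points; paths are examples only.
"""
import datetime
import subprocess
import sys

SCRATCH = "/srv/scratch/graph"  # hypothetical scratch space (no redundancy)


def run(cmd: list[str]) -> None:
    """Run a pipeline step, aborting the whole job on failure."""
    print(f"[{datetime.datetime.now().isoformat()}] running: {' '.join(cmd)}")
    subprocess.run(cmd, check=True)


def main() -> int:
    stamp = datetime.date.today().isoformat()
    export_dir = f"{SCRATCH}/export-{stamp}"
    compressed_dir = f"{SCRATCH}/compressed-{stamp}"

    # 1. Export the graph from the archive (placeholder command).
    run(["swh-graph-export", "--output", export_dir])
    # 2. Compress the exported graph (placeholder command); this is the
    #    RAM-hungry step that motivates the 2-3 TiB sizing below.
    run(["swh-graph-compress", "--input", export_dir, "--output", compressed_dir])
    # 3. The compressed result would then be shipped to granet1 for serving.
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Such a wrapper would run from cron or a systemd timer on granet2, independently of the serving workload on granet1.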
Hardware specs (WIP)
- RAM: 2-3 TiB
- 2 TiB is a comfortable estimate for being able to compress the graph, now and for the foreseeable future
- 3 TiB is what we want to be able to extend to in the future, ideally without having to swap out existing RAM; this impacts both the kind of DIMMs we want and possibly the CPUs (see the sizing sketch after this list)
- CPUs: 48 cores (same as granet1) should be plenty
- Disks:
- SSD: 20 TB
- HDD: 50 TB
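To make the DIMM consideration above concrete, here is a small back-of-the-envelope calculation; the slot count and DIMM sizes are illustrative assumptions, not the specs of any candidate machine.

```python
# Back-of-the-envelope DIMM sizing; all figures below are assumptions for
# illustration, not vendor specs.
TOTAL_SLOTS = 32          # e.g. 2 sockets x 16 DIMM slots (assumed)
TARGET_NOW_GIB = 2048     # 2 TiB initial capacity
TARGET_LATER_GIB = 3072   # 3 TiB future capacity

for dimm_gib in (64, 128):
    dimms_now = TARGET_NOW_GIB // dimm_gib
    dimms_later = TARGET_LATER_GIB // dimm_gib
    expandable = dimms_later <= TOTAL_SLOTS and dimms_now < TOTAL_SLOTS
    print(
        f"{dimm_gib:>3} GiB DIMMs: {dimms_now} now, {dimms_later} for 3 TiB, "
        f"{'can grow without swapping' if expandable else 'would require swapping DIMMs'}"
    )
```

Under these assumed numbers, 64 GiB DIMMs would fill every slot at 2 TiB and force a swap to reach 3 TiB, while 128 GiB DIMMs leave room to grow. The CPU choice matters for the same reason, since the memory controller caps the capacity per socket.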
General comments on storage:
- For the intended use of granet2 (!= granet1), all work space on disk could be considered scratch space with no redundancy, if needed to keep storage costs under control
- If the new operational requirements for granet1 allow it, we should consider moving disks (of any kind) from granet1 to granet2
- In hindsight, buying "read-intensive" SSDs for granet1 might have been a mistake, so we should avoid that this time