
[production] Provision enough space for the search ES cluster to ingest all intrinsic metadata
Open, NormalPublic

Description

Considering that there will likely be a lot of newly indexed objects, we must make sure the chosen solution can scale up in the future.

Possible identified options:

  • Deploy a new Elasticsearch instance on esnode* [1]
  • Build a new cluster with new bare-metal servers [2]
  • Extend the current cluster's storage [3]

[1] Some drawbacks:

  • The servers have 32GB of memory, 16GB of which are allocated to the current Elasticsearch instances. In the short term we could use 8GB per instance, but we must be sure the log cluster remains functional with only 8GB.
    • Possible solution: increase the memory (4 memory slots are still available on the servers)
  • The current cluster uses 3TB per node for a total of 7TB, which could be too small to keep the replication factor in case one node fails. There are no free slots left to add new disks; the possible solutions are:
    • Reduce the log retention delay (estimated gain: 2TB)
    • Replace the disks with bigger ones

[2] 3 new 1U servers, 4 × 2.4TB disks each (4.8TB effective), 32GB memory; cost: ~€8,500 (possible to extend up to 8 disks)
[3] The storage is on the Ceph cluster and can be increased, as there are several free disk slots on beaubourg/hypervisor3/branly
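The ×5 staging ratio discussed in the IRC log below translates into a rough capacity projection. This is only a back-of-the-envelope sketch: the replica count is my assumption (one replica per primary shard, i.e. every document stored twice), not something stated in the task.

```python
# Back-of-the-envelope sizing for the projected metadata index,
# using the figures from this task.
staging_growth_factor = 5      # observed index-size increase on staging (~x5)
current_index_gb = 250         # approximate current production index size
replicas = 1                   # assumption: 1 replica per primary shard

projected_primary_tb = current_index_gb * staging_growth_factor / 1000
projected_total_tb = projected_primary_tb * (1 + replicas)

print(f"projected primary size: {projected_primary_tb:.2f} TB")   # 1.25 TB
print(f"with {replicas} replica(s): {projected_total_tb:.2f} TB") # 2.50 TB
```

With one replica, the projected ~2.5TB already exceeds the ~2TB that reducing log retention would free on the existing cluster, which supports the dedicated-hardware option.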

Initial discussion on irc:

11:25 <+vsellier> we will have to check the elasticsearch behavior with production volume
11:26 <+vsellier> the size increases have been important on staging
11:26 <+vsellier> around x5
11:26 <+vsellier> (the index size)
11:27 <+vsellier> if the ratio is the same for production, the bump could be from 250g to 1.2T
11:28 <+vsellier> the search-esnode are not yet sized for this
11:31 <vlorentz> shouldn't we stop the metadata ingestion until they are?
11:31 <+vsellier> (they are vms on proxmox, we will have to think where we can find so much space)
11:32 <+vsellier> the search journal client is not deployed so no problem for the moment, nothing is sent to ES
11:33 <+vsellier> (I mean not deployed in production, it is only on staging)

Event Timeline

vlorentz triaged this task as Normal priority.Feb 11 2021, 1:17 PM
vlorentz created this task.
vsellier renamed this task from Provision enough space for the search ES cluster to ingest all intrinsic metadata to [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata.Feb 15 2021, 10:02 AM

Sharing resources with the existing esnodes is a non-starter IMO; we probably want SSD storage for this. Plus they're at (physical) capacity.

Adding more storage to the proxmox ceph means eating a 3x replication cost on top of the elasticsearch replication, which doesn't give me very warm feelings. I felt the current deployment was more of a PoC than something we would expand longer term.

So I'm leaning towards the dedicated hardware proposal. But I'm guessing your pricing is using rotational storage; is this 10k SAS? Do you think that will be fast enough? I believe the current esnodes are 7.2k SATA, so pretty slow spinners.

What would pricing look like with SSDs?

Thanks for the feedback

The initial quotation was for a PowerEdge R6515 with 2 system drives (240GB) + 4 additional SAS 10k 2.4TB drives.
With 4 × 1.92TB SSDs (3.8TB effective), the quotation is ~€11,500.

Final quotation sent for approval.
The details are:
3 PowerEdge R6515 (1u) with per server:

  • 10-disk enclosure
  • BOSS controller with 2 × 240GB cards (for the system)
  • 4 × SAS 2.5" 10k 2.4TB disks
  • SFP+ network card
  • 2 SFP cables
  • 2 power supplies with their cables
  • iDRAC Enterprise
  • Rack mount rails with cable management

(I don't attach the quotation to respect the math-info EULA)

After talking with @rdicosmo, we finally chose to replace, on each server, the 4 × 2.4TB HDDs with 6 × 1.92TB SSDs, to be sure we will have good performance and enough space for the future.
The quote will now be sent to the purchasing service according to the usual procedure [1]

[1] https://intranet.softwareheritage.org/wiki/Team_charter#Procurement

Apparently the order was lost somewhere after it was sent to Dell on April 6th 🤔
It was reissued yesterday...

The order was received and confirmed by Dell; ETA: May 28th.
The details were sent to the sysadm mailing list.

According to the tracking page, the order left the factory on Apr 22, 2021; the ETA is May 28, 2021.

  • The DSI has been notified of the package's arrival.

The order seems to have been delivered; I will check with the DSI how we can proceed with the installation.