Running the benchmarks: August 6th, 2021, 9 days
Closed, Migrated. Edits Locked.

Authored By

	dachary
	Jul 6 2021, 8:18 AM

Description

✅ Ask for special permission to use the cluster for 9 days
✅ Reserve 30 machines for the Read Storage + 3 machines for the Write Storage for 216 hours starting August 6th, 7pm
✅ August 6th, 7pm: install the latest version of the benchmark software (T3149) and prepare the run
✅ Populate the global index with 10 billion entries
✅ August 7th, 7pm: run the benchmark with 40TB
✅ August 16th: analyze the results

Related Objects

Status	Assigned	Task
Migrated	gitlab-migration	T3116 Roll out at least one operational mirror
Migrated	gitlab-migration	T3054 Scale out object storage design
Migrated	gitlab-migration	T3422 Running the benchmarks: August 6th, 2021, 9 days

Event Timeline

dachary changed the task status from Open to Work in Progress. Jul 6 2021, 8:18 AM

dachary triaged this task as Normal priority.

dachary created this task.

dachary created this object in space S1 Public.

dachary added a parent task: T3054: Scale out object storage design.

dachary mentioned this in T3054: Scale out object storage design. Jul 6 2021, 8:21 AM

dachary updated the task description. Jul 6 2021, 1:49 PM

Special permission request sent:

Hello,

Account: https://api.grid5000.fr/stable/users/ user ldachary
Laboratory: Software Heritage Special Task Force Unit Detached
Project: Software Heritage

Over the past months, a novel object storage architecture was designed[0] and tested on the grid5000 grenoble cluster[1]. It allows for the efficient storage of 100 billion immutable small objects (median size of 4KB). It will be used by the Software Heritage project to keep accumulating publicly available source code, which is constantly growing. Software Heritage has already published articles[2][3] and more are expected in the future. This work would not be possible without this novel object storage architecture because current solutions are either not efficient enough or too costly.

Request for resources:

Site grenoble

32 dahu nodes if possible, 28 minimum https://www.grid5000.fr/w/Grenoble:Hardware#dahu

4 yeti nodes if possible, 2 minimum https://www.grid5000.fr/w/Grenoble:Hardware#yeti

Date of the reservation: August 6th or 14th or 21st, 7pm

Duration: 9 days

The goal is to run a benchmark demonstrating that the object storage architecture delivers the expected results in an experimental environment at scale. Running the benchmarks over the weekend (60 hours) shows that they behave as expected, but they do not exhaust the resources of the cluster (using only 20% of the disk capacity). Running the benchmark for 9 days would make it possible to use approximately 100TB of storage instead of 20TB. This is still only a fraction of the target volume (10PB), but it may reveal issues that cannot be observed at a smaller scale.

Cheers

[0] https://wiki.softwareheritage.org/wiki/A_practical_approach_to_efficiently_store_100_billions_small_objects_in_Ceph
[1] https://forge.softwareheritage.org/T3149
[2] https://www.softwareheritage.org/wp-content/uploads/2021/03/ieee-sw-gender-swh.pdf
[3] https://hal.archives-ouvertes.fr/hal-02543794

anlambert added a project: Object storage. Jul 7 2021, 2:35 PM

Received yesterday:

Hello Loïc,

Your request is approved.

You can reserve 30 dahu and 3 yeti nodes from August 6th for 9 days (we
would like to keep at least one node available from each cluster).

Have a nice weekend,

$ oarsub -t exotic -l "{cluster='dahu'}/host=30+{cluster='yeti'}/host=3,walltime=216" --reservation '2021-08-06 19:00:00' -t deploy
[ADMISSION RULE] Include exotic resources in the set of reservable resources (this does NOT exclude non-exotic resources).
[ADMISSION RULE] Error: Walltime too big for this job, it is limited to 168 hours

The usual grid5000 contact is on vacation, so the request to resolve this goes to his replacement.

Mail sent today:

Hi Simon,

I was about to make the reservation and ran into the following problem:

$ oarsub -t exotic -l "{cluster='dahu'}/host=30+{cluster='yeti'}/host=3,walltime=216" --reservation '2021-08-06 19:00:00' -t deploy
[ADMISSION RULE] Include exotic resources in the set of reservable resources (this does NOT exclude non-exotic resources).
[ADMISSION RULE] Error: Walltime too big for this job, it is limited to 168 hours

Would you be so kind as to let me know how I can work around it? In the meantime I reserved for 163 hours (job 2019935) just to make sure the time slot is not inadvertently occupied by another request.

Thanks again for your help and have a wonderful day!

Reply:

We have a procedure for this kind of case: I added you to the
"oar-unrestricted-adv-reservations" group, which should lift all
restrictions on advance reservations of resources. You should therefore
be able to redo your reservation with the correct walltime.

I set an expiration date of September 12th on this group to be sure it
is enough, but remember to submit a new special usage request if you
have another need outside the charter after the August one.

$ oarsub -t exotic -l "{cluster='dahu'}/host=30+{cluster='yeti'}/host=3,walltime=216" --reservation '2021-08-06 19:00:00' -t deploy
[ADMISSION RULE] Include exotic resources in the set of reservable resources (this does NOT exclude non-exotic resources).
[ADMISSION RULE] ldachary is granted the privilege to do unlimited reservations
[ADMISSION RULE] Computed global resource filter: -p "(deploy = 'YES') AND maintenance = 'NO'"
[ADMISSION_RULE] Computed resource request: -l {"(cluster='dahu') AND type = 'default'"}/host=30+{"(cluster='yeti') AND type = 'default'"}/host=3
Generate a job key...
OAR_JOB_ID=2019986
Reservation mode: waiting validation...
Reservation valid --> OK
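
As a quick check of the numbers in the reservation above: the walltime of 216 is the 9-day duration expressed in hours, which is above the 168-hour limit reported by the first attempt. The sketch below only redoes that arithmetic and derives the end of the reservation from the start time passed to --reservation; the variable names are illustrative.

```python
from datetime import datetime, timedelta

DAYS = 9
DEFAULT_LIMIT_HOURS = 168  # limit quoted in the admission rule error

walltime_hours = DAYS * 24
start = datetime(2021, 8, 6, 19, 0, 0)  # value passed to --reservation
end = start + timedelta(hours=walltime_hours)

print(walltime_hours)                        # 216
print(walltime_hours > DEFAULT_LIMIT_HOURS)  # True
print(end.isoformat(sep=" "))                # 2021-08-15 19:00:00
```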

dachary updated the task description. Jul 12 2021, 3:43 PM

dachary updated the task description. Jul 19 2021, 7:22 AM

Rehearse the run and make minor updates to make sure it runs right away this Friday.

https://git.easter-eggs.org/biceps/biceps/-/tree/18c2bad480da19bd468c4be8b4bffa610ec6f88d

https://intranet.grid5000.fr/oar/Grenoble/monika.cgi?job=2025313

dachary updated the task description. Aug 2 2021, 10:34 AM

dachary updated the task description. Aug 12 2021, 7:28 AM

The run terminated on August 11th at 15:21 because of what appears to be a rare race condition, although it was mostly finished by then. The results show an unexpected degradation in read performance that deserves further investigation because it keeps getting worse over time. Write performance, however, is stable, which suggests the benchmark code itself may be responsible for the degradation: if the Ceph cluster were slowing down globally, both reads and writes would degrade, since previous benchmark results showed that the two are correlated.

Bytes write   106.4 MB/s
Objects write 5.2 Kobject/s
Bytes read    94.6 MB/s
Objects read  23.1 Kobject/s
1014323 random reads take longer than 100ms (2.1987787007491675%)

https://git.easter-eggs.org/biceps/biceps/-/tree/4e998f180f1cc4ca00acefb552220b3992bd7a25
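
One way to put a number on "reads keep degrading while writes stay stable" is to fit a least-squares line through throughput samples and compare the slopes. The sketch below is only an illustration: the sample values are made up, not taken from this run, and the `slope` helper is hypothetical rather than part of the benchmark code.

```python
def slope(samples):
    """Least-squares slope of (t, value) pairs, in value units per unit of t."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den

# Hypothetical (hour, MB/s) samples; NOT measured data from this task.
reads = [(0, 120.0), (1, 115.0), (2, 110.0), (3, 105.0)]
writes = [(0, 106.0), (1, 107.0), (2, 106.0), (3, 107.0)]

print(slope(reads))   # clearly negative: throughput drops every hour
print(slope(writes))  # close to zero: throughput is stable
```

A read slope that stays negative while the write slope hovers around zero would match the pattern described above.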

The benchmarks were modified to (i) use a fixed number of random / sequential readers instead of a random choice, for better predictability, and (ii) introduce throttling to cap the sequential read speed at approximately 200MB/s. A read-only run was then started:

ansible-playbook -i inventory tests-run.yml && ssh -t $runner direnv exec bench python bench/bench.py --reader-io 500 --rw-workers 0 --rand-ratio 5 --file-count-ro 0 --ro-workers 20 --file-size $((1 * 1024))

and at the same time rbd bench was run to write continuously to a single image at ~200MB/s, starting a few minutes after the reads. It will run for the next 24h to verify that:

write speed is stable
read speed is stable
slow reads improved and stay under 2%
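
For illustration, capping a reader at a given byte rate, as in the sequential read throttling mentioned above, can be done by pacing: after each chunk, sleep until the time at which that many bytes are allowed to have been transferred. This is a minimal sketch under that assumption, not the code used in the benchmark; the `Throttle` class and its parameters are hypothetical, and 200MB/s is taken here as 200 * 10**6 bytes per second.

```python
import time


class Throttle:
    """Pace a byte stream to at most `rate` bytes per second on average."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.sent = 0

    def account(self, nbytes):
        """Record nbytes as transferred and sleep if we are ahead of schedule."""
        self.sent += nbytes
        # Earliest time at which `sent` bytes are allowed at the target rate.
        earliest = self.start + self.sent / self.rate
        delay = earliest - self.clock()
        if delay > 0:
            self.sleep(delay)
        return max(delay, 0.0)


# Usage: call account() after each chunk read by a sequential reader.
throttle = Throttle(rate=200 * 10**6)
throttle.account(4096)  # a 4KB object barely delays at this rate
```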

The number of slow random reads reaches ~3.5%, presumably because there is too much write pressure (the throttling of writes was removed).

stats.csv       100%   89KB 509.8KB/s   00:00
too_long.csv    100%  380KB   2.0MB/s   00:00
Bytes write   0 B/s
Objects write 0 object/s
Bytes read    105.1 MB/s
Objects read  25.7 Kobject/s
16766 random reads take longer than 100ms (3.4325045859538785%)

Throttling writes to 120MB/s to reduce the pressure:

ceph config set client rbd_qos_write_bps_limit $((120 * 1024 * 1024))
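
The shell arithmetic in the command above expands to a byte count based on powers of 1024, so the limit is 120 MiB/s, slightly more than 120 decimal megabytes per second. A small check of the conversion (variable names are illustrative):

```python
limit_bytes = 120 * 1024 * 1024  # same expression as $((120 * 1024 * 1024))

print(limit_bytes)                      # 125829120
print(round(limit_bytes / 10**6, 1))    # 125.8 (decimal MB/s)
print(limit_bytes / 2**20)              # 120.0 (MiB/s)
```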

After 20 minutes or so:

Bytes write   0 B/s
Objects write 0 object/s
Bytes read    105.2 MB/s
Objects read  25.7 Kobject/s
26512 random reads take longer than 100ms (3.508214769647697%)
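
The slow-read lines report both a count and a percentage, so the total number of random reads behind each figure can be recovered by dividing one by the other. The sketch below does this for the three results quoted in this task, assuming the percentage is computed as slow reads over total random reads; the helper name is hypothetical.

```python
def total_from_ratio(slow_count, slow_percent):
    """Estimate the total number of reads from a slow-read count and percentage."""
    return round(slow_count / (slow_percent / 100.0))


# (slow reads, percentage) pairs copied from the outputs in this task.
results = {
    "9 day run": (1014323, 2.1987787007491675),
    "read only run": (16766, 3.4325045859538785),
    "after write throttling": (26512, 3.508214769647697),
}

totals = {name: total_from_ratio(*pair) for name, pair in results.items()}
for name, total in totals.items():
    print(f"{name}: about {total} random reads")
```

The 9 day run covers on the order of 46 million random reads, so its percentage rests on a far larger sample than the two short read-only measurements.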

dachary closed this task as Resolved. Aug 23 2021, 12:25 PM

dachary updated the task description.

This task has been migrated to GitLab.
