
Running the benchmarks: August 6th, 2021, 9 days
Started (Work in Progress), Normal priority, Public

Description

Event Timeline

dachary changed the task status from Open to Work in Progress. Tue, Jul 6, 8:18 AM
dachary triaged this task as Normal priority.
dachary created this task.
dachary created this object in space S1 Public.

Special permission request sent:

Hello,

Account: user ldachary (https://api.grid5000.fr/stable/users/)
Laboratory: Software Heritage Special Task Force Unit Detached
Project: Software Heritage

In the past months, a novel object storage architecture was designed[0] and experimented with on the Grid5000 Grenoble cluster[1]. It allows for the efficient storage of 100 billion immutable small objects (median size of 4KB). It will be used by the Software Heritage project to keep archiving the constantly growing body of publicly available source code. Software Heritage has already published articles[2][3] and more are expected in the future. This work would not be possible without the novel object storage architecture, because the current solutions are either not efficient enough or too costly.

Request for resources:

The goal is to run a benchmark demonstrating that the object storage architecture delivers the expected results in an experimental environment at scale. Running it over a weekend (60 hours) shows that it behaves as expected, but it does not exhaust the resources of the cluster (using only 20% of the disk capacity). Running the benchmark for 9 days would make it possible to use approximately 100TB of storage instead of 20TB. That is still only a fraction of the target volume (10PB), but it may reveal issues that could not be observed at a smaller scale.

Cheers

[0] https://wiki.softwareheritage.org/wiki/A_practical_approach_to_efficiently_store_100_billions_small_objects_in_Ceph
[1] https://forge.softwareheritage.org/T3149
[2] https://www.softwareheritage.org/wp-content/uploads/2021/03/ieee-sw-gender-swh.pdf
[3] https://hal.archives-ouvertes.fr/hal-02543794
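For scale, a quick back-of-the-envelope check of the numbers in the request above (a sketch assuming decimal units, i.e. 10PB = 10,000TB): even the 9-day run would only fill about 1% of the target volume.

$ echo "$(( 100 * 100 / 10000 ))% of the 10PB target"
1% of the 10PB target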

dachary renamed this task from "Running the benchmarks: August, 10 day" to "Running the benchmarks: August 6th, 2021, 9 days". Sat, Jul 10, 7:58 AM
dachary updated the task description.

Received yesterday:

Hello Loïc,

Your request is approved.

You can reserve 30 dahu and 3 yeti nodes from August 6th for 9 days (we
would like to keep at least one node available from each cluster).

Have a nice weekend,

$ oarsub -t exotic -l "{cluster='dahu'}/host=30+{cluster='yeti'}/host=3,walltime=216" --reservation '2021-08-06 19:00:00' -t deploy
[ADMISSION RULE] Include exotic resources in the set of reservable resources (this does NOT exclude non-exotic resources).
[ADMISSION RULE] Error: Walltime too big for this job, it is limited to 168 hours
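The walltime here is expressed in hours, so the 9-day reservation comes out above the 168-hour (7-day) limit enforced by the admission rule; a quick sanity check of the arithmetic:

$ echo "requested: $(( 9 * 24 )) hours, limit: 168 hours"
requested: 216 hours, limit: 168 hours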

The usual Grid5000 contact is on vacation, so I am falling back on his replacement to resolve this.

Mail sent today:

Hi Simon,

I was about to make the reservation and ran into the following problem:

$ oarsub -t exotic -l "{cluster='dahu'}/host=30+{cluster='yeti'}/host=3,walltime=216" --reservation '2021-08-06 19:00:00' -t deploy
[ADMISSION RULE] Include exotic resources in the set of reservable resources (this does NOT exclude non-exotic resources).
[ADMISSION RULE] Error: Walltime too big for this job, it is limited to 168 hours

Would you be so kind as to let me know how I can work around it? In the meantime, I made a reservation for 163 hours (job 2019935) just to make sure the time slot is not inadvertently taken by another request.

Thanks again for your help and have a wonderful day!

Reply:

We have a procedure for this kind of case: I have added you to the
"oar-unrestricted-adv-reservations" group, which should lift all the
restrictions on advance resource reservations. You should therefore be able
to redo your reservation with the correct walltime.

I set an expiration date of September 12th on this group to make sure it is
enough, but do remember to file a new special usage request if you have
another need outside the charter after the one in August.

$ oarsub -t exotic -l "{cluster='dahu'}/host=30+{cluster='yeti'}/host=3,walltime=216" --reservation '2021-08-06 19:00:00' -t deploy
[ADMISSION RULE] Include exotic resources in the set of reservable resources (this does NOT exclude non-exotic resources).
[ADMISSION RULE] ldachary is granted the privilege to do unlimited reservations
[ADMISSION RULE] Computed global resource filter: -p "(deploy = 'YES') AND maintenance = 'NO'"
[ADMISSION_RULE] Computed resource request: -l {"(cluster='dahu') AND type = 'default'"}/host=30+{"(cluster='yeti') AND type = 'default'"}/host=3
Generate a job key...
OAR_JOB_ID=2019986
Reservation mode: waiting validation...
Reservation valid --> OK
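
To double-check the reservation before the start date, the job record can be inspected from the frontend (a sketch assuming the standard OAR client; oarstat -fj should print the full record of the given job):

$ oarstat -fj 2019986    # full details of the advance reservation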