Software Heritage

Persistent readonly perfect hash table: benchmarks
Closed, Resolved, Public

Description

  • Write benchmarks measuring performance for contents on the order of 100GB and 10 million entries
  • Include the benchmarks in the python package
  • Functional tests of the benchmarks to run in the CI
  • Run the benchmarks on grid5000 hardware and archive the results

Outcome:

  • A repository tree containing (i) instructions to run the benchmarks, (ii) the software to run the benchmarks
  • The grid5000 results added to the repository at a tag marked with the date at which they were run

Event Timeline

dachary triaged this task as Normal priority. Aug 29 2021, 1:26 PM
dachary created this task.
dachary created this object in space S1 Public.
dachary updated the task description.
dachary changed the task status from Open to Work in Progress. Oct 18 2021, 9:02 PM

Created a project in https://portal.fed4fire.eu/ with the intention of using grid5000. It is pending approval from an administrator (see T3670).

Running benchmarks directly on grid5000

  • oarsub -I -l "{cluster='dahu'}/host=1,walltime=1" -t deploy
  • kadeploy3 -f $OAR_NODE_FILE -e debian11-x64-base -k
  • ssh root@$(tail -1 $OAR_NODE_FILE)
  • mkfs.ext4 /dev/sdb1
  • mount /dev/sdb1 /mnt
  • apt-get install -y python3-venv libcmph-dev gcc git
  • git clone https://git.easter-eggs.org/biceps/swh-perfecthash/
  • cd swh-perfecthash
  • python3 -m venv bench
  • source bench/bin/activate
  • pip install -r requirements.txt -r requirements-test.txt
  • tox -e py3
  • time tox -e py3 -- --basetemp=/mnt/pytest -s --shard-size $((100 * 1024)) --object-max-size $((100 * 1024)) -k test_build_speed
  • rm -fr /mnt/pytest
  • time tox -e py3 -- --basetemp=/mnt/pytest -s --shard-size $((100 * 1024)) --object-max-size $((4 * 1024)) -k test_build_speed
    number of objects = 45973118
    baseline 163.73826217651367, write_duration 300.58917450904846, build_duration 26.01908826828003, total_duration 326.6082627773285

Conclusions:

  • Writing the content of the objects takes longer because it involves 45 million Python function calls, but the overhead is acceptable
  • Creating the perfect hash table and writing it to file takes seconds even in the worst case scenario, i.e. when there are only small objects and therefore millions of them

An error returned by mmap was not detected, so there was no information about why it failed. This has been fixed.
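A minimal sketch of the kind of check that was missing (the wrapper function and its signature are hypothetical, not the actual hash.c API): mmap signals failure by returning MAP_FAILED rather than NULL, so the return value must be tested before the mapping is used.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper illustrating the fix: map a shard file and
 * report the failure via errno instead of failing silently. */
void *map_shard(int fd, size_t size) {
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        /* mmap returns MAP_FAILED (not NULL) on error; errno explains why */
        perror("mmap");
        return NULL;
    }
    return base;
}
```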

# PYTHONMALLOC=malloc valgrind --tool=memcheck .tox/py3/bin/pytest --basetemp=/mnt/pytest -k test_build_speed --shard-size $((100 * 1024)) --object-max-size $((16 * 1024 * 1024)) swh/perfecthash/tests/test_hash.py
==17519== Memcheck, a memory error detector
==17519== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17519== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==17519== Command: .tox/py3/bin/pytest --basetemp=/mnt/pytest -k test_build_speed --shard-size 102400 --object-max-size 16777216 swh/perfecthash/tests/test_hash.py
==17519==
============================================= test session starts ===============================================
platform linux -- Python 3.9.2, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /root/swh-perfecthash, configfile: pytest.ini
plugins: cov-3.0.0
collected 2 items / 1 deselected / 1 selected

swh/perfecthash/tests/test_hash.py ==17519== Invalid write of size 8
==17519==    at 0x8DF92A1: memcpy (string_fortified.h:34)
==17519==    by 0x8DF92A1: shard_object_write (hash.c:104)
==17519==    by 0x8DF86E5: _cffi_f_shard_object_write (_hash_cffi.c:898)
==17519==    by 0x53F389: ??? (in /usr/bin/python3.9)
==17519==    by 0x51D89A: _PyObject_MakeTpCall (in /usr/bin/python3.9)
==17519==    by 0x5175B9: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==17519==    by 0x528B62: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==17519==    by 0x53BCFA: ??? (in /usr/bin/python3.9)
==17519==    by 0x511FB4: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==17519==    by 0x5106EC: ??? (in /usr/bin/python3.9)
==17519==    by 0x528D20: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==17519==    by 0x53C360: PyObject_Call (in /usr/bin/python3.9)
==17519==    by 0x513E8A: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==17519==  Address 0x1ff is not stack'd, malloc'd or (recently) free'd
==17519==
Fatal Python error: Segmentation fault

and with debug output enabled:

============================================= test session starts ===============================================
platform linux -- Python 3.9.2, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /root/swh-perfecthash, configfile: pytest.ini
plugins: cov-3.0.0
collected 2 items / 1 deselected / 1 selected

swh/perfecthash/tests/test_hash.py hnumber of objects = 12814, total size = 107373772352
shard_object_write: object_size = 7806490 n_object_size = 1882072536171151360
shard_object_write: object_offset = 512
==21356== Invalid write of size 8
==21356==    at 0x8DF92E1: memcpy (string_fortified.h:34)
==21356==    by 0x8DF92E1: shard_object_write (hash.c:104)
==21356==    by 0x8DF86F5: _cffi_f_shard_object_write (_hash_cffi.c:898)
==21356==    by 0x53F389: ??? (in /usr/bin/python3.9)
==21356==    by 0x51D89A: _PyObject_MakeTpCall (in /usr/bin/python3.9)
==21356==    by 0x5175B9: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==21356==    by 0x528B62: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==21356==    by 0x53BCFA: ??? (in /usr/bin/python3.9)
==21356==    by 0x511FB4: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==21356==    by 0x5106EC: ??? (in /usr/bin/python3.9)
==21356==    by 0x528D20: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==21356==    by 0x53C360: PyObject_Call (in /usr/bin/python3.9)
==21356==    by 0x513E8A: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==21356==  Address 0x1ff is not stack'd, malloc'd or (recently) free'd
==21356==
Fatal Python error: Segmentation fault
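The trace points at the memcpy in shard_object_write (hash.c:104) writing past the mapped region, and the debug output shows a garbage n_object_size. A defensive sketch of the idea, with hypothetical names and struct layout rather than the actual hash.c structures, bounds-checks every copy against the shard size before touching the mapping:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shard descriptor, not the real hash.c layout. */
typedef struct {
    uint8_t *base;   /* start of the mapped shard */
    size_t size;     /* total size of the mapping */
    size_t offset;   /* current write position */
} shard_t;

/* Refuse to copy past the end of the mapping instead of letting
 * memcpy run off the end and segfault. */
int shard_object_write(shard_t *shard, const void *object, size_t object_size) {
    if (object_size > shard->size - shard->offset)
        return -1;   /* would overflow the mapped region */
    memcpy(shard->base + shard->offset, object, object_size);
    shard->offset += object_size;
    return 0;
}
```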
$ time tox -e py3 -- --basetemp=/mnt/pytest -s --shard-size $((100 * 1024)) --object-max-size $((4 * 1024)) -k test_build_speed
number of objects = 45973694, total size = 105903024192                                                          
baseline 165.74853587150574, write_duration 495.07564210891724, build_duration 24.210500478744507, total_duration  519.2861425876617


$ time tox -e py3 -- --basetemp=/mnt/pytest -s --shard-size $((100 * 1024)) --object-max-size $((100 * 1024 * 1024)) -k test_build_speed
number of objects = 2057, total size = 107374116576                                                               
baseline 165.85373330116272, write_duration 327.1912658214569, build_duration 0.0062100887298583984, total_duration 327.19747591018677