Software Heritage

Persistent readonly perfect hash table: benchmarks
Closed, Resolved, Public

Description

  • Write benchmarks measuring performance for contents on the order of 100GB and 10 million entries
  • Include the benchmarks in the python package
  • Functional tests of the benchmarks to run in the CI
  • Run the benchmarks on grid5000 hardware and archive the results

Outcome:

  • A repository tree containing (i) instructions to run the benchmarks, (ii) the software to run the benchmarks
  • The grid5000 results added to the repository at a tag marked with the date at which they were run

Event Timeline

dachary triaged this task as Normal priority. Aug 29 2021, 1:26 PM
dachary created this task.
dachary created this object in space S1 Public.
dachary updated the task description.
dachary changed the task status from Open to Work in Progress. Oct 18 2021, 9:02 PM

Created a project in https://portal.fed4fire.eu/ with the intention of using grid5000. It is pending approval from an administrator (see T3670).

Running benchmarks directly on grid5000

  • oarsub -I -l "{cluster='dahu'}/host=1,walltime=1" -t deploy
  • kadeploy3 -f $OAR_NODE_FILE -e debian11-x64-base -k
  • ssh root@$(tail -1 $OAR_NODE_FILE)
  • mkfs.ext4 /dev/sdb1
  • mount /dev/sdb1 /mnt
  • apt-get install -y python3-venv libcmph-dev gcc git
  • git clone https://git.easter-eggs.org/biceps/swh-perfecthash/
  • cd swh-perfecthash
  • python3 -m venv bench
  • source bench/bin/activate
  • pip install -r requirements.txt -r requirements-test.txt
  • tox -e py3
  • time tox -e py3 -- --basetemp=/mnt/pytest -s --shard-size $((100 * 1024)) --object-max-size $((100 * 1024)) -k test_build_speed
  • rm -fr /mnt/pytest
  • time tox -e py3 -- --basetemp=/mnt/pytest -s --shard-size $((100 * 1024)) --object-max-size $((4 * 1024)) -k test_build_speed
    number of objects = 45973118
    baseline 163.73826217651367, write_duration 300.58917450904846, build_duration 26.01908826828003, total_duration 326.6082627773285

Conclusions:

  • Writing the content of the objects takes longer because it involves 45 million Python function calls, but the overhead is acceptable
  • Creating the perfect hash table and writing it to file takes seconds even in the worst case scenario, i.e. when there are only small objects and therefore millions of them

An error returned by mmap was not detected, so there was no information about why it failed. This has been fixed.
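A minimal sketch of the kind of check that was missing (the wrapper function and its signature are hypothetical, not the actual hash.c API): mmap signals failure by returning MAP_FAILED rather than NULL, so the return value must be tested before the mapping is used.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper illustrating the fix: map a shard file and
 * report the failure via errno instead of failing silently. */
void *map_shard(int fd, size_t size) {
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        /* mmap returns MAP_FAILED (not NULL) on error; errno explains why */
        perror("mmap");
        return NULL;
    }
    return base;
}
```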

# PYTHONMALLOC=malloc valgrind --tool=memcheck .tox/py3/bin/pytest --basetemp=/mnt/pytest -k test_build_speed --shard-size $((100 * 1024)) --object-max-size $((16 * 1024 * 1024)) swh/perfecthash/tests/test_hash.py
==17519== Memcheck, a memory error detector
==17519== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17519== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==17519== Command: .tox/py3/bin/pytest --basetemp=/mnt/pytest -k test_build_speed --shard-size 102400 --object-max-size 16777216 swh/perfecthash/tests/test_hash.py
==17519==
============================================= test session starts ===============================================
platform linux -- Python 3.9.2, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /root/swh-perfecthash, configfile: pytest.ini
plugins: cov-3.0.0
collected 2 items / 1 deselected / 1 selected

swh/perfecthash/tests/test_hash.py ==17519== Invalid write of size 8
==17519==    at 0x8DF92A1: memcpy (string_fortified.h:34)
==17519==    by 0x8DF92A1: shard_object_write (hash.c:104)
==17519==    by 0x8DF86E5: _cffi_f_shard_object_write (_hash_cffi.c:898)
==17519==    by 0x53F389: ??? (in /usr/bin/python3.9)
==17519==    by 0x51D89A: _PyObject_MakeTpCall (in /usr/bin/python3.9)
==17519==    by 0x5175B9: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==17519==    by 0x528B62: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==17519==    by 0x53BCFA: ??? (in /usr/bin/python3.9)
==17519==    by 0x511FB4: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==17519==    by 0x5106EC: ??? (in /usr/bin/python3.9)
==17519==    by 0x528D20: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==17519==    by 0x53C360: PyObject_Call (in /usr/bin/python3.9)
==17519==    by 0x513E8A: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==17519==  Address 0x1ff is not stack'd, malloc'd or (recently) free'd
==17519==
Fatal Python error: Segmentation fault

and with debug output enabled:

============================================= test session starts ===============================================
platform linux -- Python 3.9.2, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /root/swh-perfecthash, configfile: pytest.ini
plugins: cov-3.0.0
collected 2 items / 1 deselected / 1 selected

swh/perfecthash/tests/test_hash.py hnumber of objects = 12814, total size = 107373772352
shard_object_write: object_size = 7806490 n_object_size = 1882072536171151360
shard_object_write: object_offset = 512
==21356== Invalid write of size 8
==21356==    at 0x8DF92E1: memcpy (string_fortified.h:34)
==21356==    by 0x8DF92E1: shard_object_write (hash.c:104)
==21356==    by 0x8DF86F5: _cffi_f_shard_object_write (_hash_cffi.c:898)
==21356==    by 0x53F389: ??? (in /usr/bin/python3.9)
==21356==    by 0x51D89A: _PyObject_MakeTpCall (in /usr/bin/python3.9)
==21356==    by 0x5175B9: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==21356==    by 0x528B62: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==21356==    by 0x53BCFA: ??? (in /usr/bin/python3.9)
==21356==    by 0x511FB4: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==21356==    by 0x5106EC: ??? (in /usr/bin/python3.9)
==21356==    by 0x528D20: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==21356==    by 0x53C360: PyObject_Call (in /usr/bin/python3.9)
==21356==    by 0x513E8A: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==21356==  Address 0x1ff is not stack'd, malloc'd or (recently) free'd
==21356==
Fatal Python error: Segmentation fault
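The trace points at the memcpy in shard_object_write (hash.c:104) writing past the mapped region, and the debug output shows a garbage n_object_size. A defensive sketch of the idea, with hypothetical names and struct layout rather than the actual hash.c structures, bounds-checks every copy against the shard size before touching the mapping:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shard descriptor, not the real hash.c layout. */
typedef struct {
    uint8_t *base;   /* start of the mapped shard */
    size_t size;     /* total size of the mapping */
    size_t offset;   /* current write position */
} shard_t;

/* Refuse to copy past the end of the mapping instead of letting
 * memcpy run off the end and segfault. */
int shard_object_write(shard_t *shard, const void *object, size_t object_size) {
    if (object_size > shard->size - shard->offset)
        return -1;   /* would overflow the mapped region */
    memcpy(shard->base + shard->offset, object, object_size);
    shard->offset += object_size;
    return 0;
}
```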
$ time tox -e py3 -- --basetemp=/mnt/pytest -s --shard-size $((100 * 1024)) --object-max-size $((4 * 1024)) -k test_build_speed
number of objects = 45973694, total size = 105903024192                                                          
baseline 165.74853587150574, write_duration 495.07564210891724, build_duration 24.210500478744507, total_duration  519.2861425876617


$ time tox -e py3 -- --basetemp=/mnt/pytest -s --shard-size $((100 * 1024)) --object-max-size $((100 * 1024 * 1024)) -k test_build_speed
number of objects = 2057, total size = 107374116576                                                               
baseline 165.85373330116272, write_duration 327.1912658214569, build_duration 0.0062100887298583984, total_duration 327.19747591018677