Description
Status | Assigned | Task
---|---|---
Migrated | gitlab-migration | T2213 Storage
Migrated | gitlab-migration | T2214 Scale-out graph and database storage in production
Migrated | gitlab-migration | T1892 Cassandra as a storage backend
Migrated | gitlab-migration | T3357 Perform some tests of the cassandra storage on Grid5000
Migrated | gitlab-migration | T3573 [cassandra] directory and content read benchmarks
Event Timeline
These are the results of the different runs:
- run a directory_ls on a big directory (a sketch of such a timing script follows these results)
- postgresql (saam)
vsellier@saam ~ % python3 directory_ls.py c864e846cb339a94da9fd91ae12cabcf083a8685
c864e846cb339a94da9fd91ae12cabcf083a8685: 8.9192s
- cassandra
10 directory_ls runs were launched to test the impact of having the data in cache:
run id | one-by-one [1] | concurrent [2] |
---|---|---|
1 | 233.6952s | 184.1209s |
2 | 102.1767s | 89.1716s |
3 | 41.1162s | 33.0051s |
4 | 25.4258s | 21.7695s |
5 | 18.2770s | 19.0004s |
6 | 16.8347s | 14.6394s |
7 | 15.4968s | 13.4750s |
8 | 14.1320s | 12.1201s |
9 | 13.3825s | 10.6304s |
10 | 13.1336s | 11.1261s |
[1] D6228 and D6229 reverted
[2] master of swh-storage https://archive.softwareheritage.org/swh:1:snp:dd585ae36b25c37fc9a4b5ab16fb4d0482a075a7;origin=https://forge.softwareheritage.org/source/swh-storage.git
The concurrent version is globally faster, but the performance is not impressive compared to the postgresql version.
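The directory_ls.py script itself is not attached to this task. As a rough idea of what it does, here is a minimal sketch of such a timing script, assuming the standard swh.storage RPC client and its directory_ls endpoint; the storage URL is a placeholder and the exact options are assumptions:

```python
# Hypothetical reconstruction of directory_ls.py: time a full directory_ls
# for one directory id given as a hex sha1_git on the command line.
import sys
import time

from swh.storage import get_storage

# Placeholder URL: point this at the storage RPC server under test.
storage = get_storage("remote", url="http://localhost:5002")

dir_id = bytes.fromhex(sys.argv[1])

start = time.monotonic()
# directory_ls returns an iterable of entries; exhaust it to force all reads.
entries = list(storage.directory_ls(dir_id))
elapsed = time.monotonic() - start

# Same output format as in the transcripts above: "<id>: <duration>s"
print(f"{sys.argv[1]}: {elapsed:.4f}s")
```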
- run 2000 directory_ls with 50 threads in parallel
IN PROGRESS
time cat directory_2000.lst | parallel --bar -j50 python3 directory_ls.py {} | tee directory_ls_[concurrent|one-by-one]-2000.output
- one-by-one:
real 11m34.692s user 19m6.793s sys 2m12.372s
Most of the time is the listing time of the slowest directory:
bc0a1450e393f47cb34a6f1f1a6ee9206c739579: 670.7532s
- concurrent:
real 9m4.044s user 20m12.418s sys 2m13.689s
bc0a1450e393f47cb34a6f1f1a6ee9206c739579: 520.5311s
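Since each directory_ls.py invocation prints a `<id>: <duration>s` line, the slowest directories of such a parallel run can be extracted from the tee'd output. A small sketch, with a hypothetical output file name matching the `tee` template above:

```python
# Rank the per-directory timings of a parallel run, slowest first.
# The file name is an example instantiation of the tee template above.
import re

timings = []
with open("directory_ls_concurrent-2000.output") as f:
    for line in f:
        match = re.match(r"^([0-9a-f]{40}): ([0-9.]+)s", line.strip())
        if match:
            timings.append((float(match.group(2)), match.group(1)))

# Print the 10 slowest directories.
for duration, dir_id in sorted(timings, reverse=True)[:10]:
    print(f"{dir_id}: {duration:.4f}s")
```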
Two flame graphs of the previous directory_ls runs:
- one-by-one
first run (cache cold):
last run (cache hot):
- concurrent:
first run (cache cold):
last run (cache hot):
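For readers without the diffs at hand, the difference between the two modes compared above boils down to how the per-directory entry reads are issued: sequentially, or with several reads in flight at once. A purely illustrative sketch (the `execute_read` helper and the pool size are made up, not the actual swh-storage CqlRunner code):

```python
# Illustration of "one-by-one" vs "concurrent" reads against Cassandra.
# `execute_read(statement, params)` is a hypothetical stand-in for a single
# Cassandra query; it is not the actual swh-storage API.
from concurrent.futures import ThreadPoolExecutor


def read_one_by_one(execute_read, queries):
    # One round-trip at a time: latencies add up linearly.
    return [execute_read(stmt, params) for stmt, params in queries]


def read_concurrent(execute_read, queries, max_workers=20):
    # Several round-trips in flight at once: the total time is closer to
    # the slowest individual read than to the sum of all of them.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(execute_read, stmt, params) for stmt, params in queries]
        return [f.result() for f in futures]
```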
Test of the new D6269 patch:
- directory_ls on c864e846cb339a94da9fd91ae12cabcf083a8685
run | duration |
---|---|
1 | 23.0080s |
2 | 10.0942s |
3 | 11.0577s |
4 | 7.4534s |
5 | 7.1187s |
6 | 7.2937s |
7 | 6.4423s |
8 | 6.3384s |
9 | 6.2031s |
10 | 6.0875s |
- directory_ls on bc0a1450e393f47cb34a6f1f1a6ee9206c739579 (the biggest directory in the 2000 ids)
run | duration |
---|---|
1 | 67.8805s |
2 | 38.4308s |
3 | 33.2637s |
4 | 30.7730s |
5 | 30.1840s |
6 | 27.8511s |
7 | 28.2692s |
8 | 27.8125s |
9 | 27.4106s |
10 | 27.6957s |
and the associated flame graphs:
first run:
last run:
- 2000 directory_ls with 50 threads in parallel:
real 1m40.011s user 22m5.676s sys 2m22.493s
bc0a1450e393f47cb34a6f1f1a6ee9206c739579: 70.3515s
Much better than previously.
Some flame graphs of the storage were captured during the ingestion with 50 workers in parallel.
It seems most of them are doing a lot of directory_add
svgpp 1-2021-10-14T06:12:58Z.svg | grep storage.py
directory_add (swh/storage/cassandra/storage.py:517) (112
directory_add (swh/storage/cassandra/storage.py:517) (77
directory_add (swh/storage/cassandra/storage.py:517) (88
directory_add (swh/storage/cassandra/storage.py:517) (123
content_add (swh/storage/cassandra/storage.py:285) (43 samples,
directory_add (swh/storage/cassandra/storage.py:517) (78
directory_add (swh/storage/cassandra/storage.py:517) (104
directory_add (swh/storage/cassandra/storage.py:517) (49
directory_add (swh/storage/cassandra/storage.py:517) (73
directory_add (swh/storage/cassandra/storage.py:517) (76
directory_add (swh/storage/cassandra/storage.py:517) (45
directory_add (swh/storage/cassandra/storage.py:517) (69
directory_add (swh/storage/cassandra/storage.py:517) (62
directory_add (swh/storage/cassandra/storage.py:517) (93
directory_add (swh/storage/cassandra/storage.py:517) (42
directory_add (swh/storage/cassandra/storage.py:517) (104
directory_add (swh/storage/cassandra/storage.py:517) (40
directory_add (swh/storage/cassandra/storage.py:517) (42
directory_add (swh/storage/cassandra/storage.py:517) (54
directory_add (swh/storage/cassandra/storage.py:517) (73
directory_add (swh/storage/cassandra/storage.py:517) (145
directory_add (swh/storage/cassandra/storage.py:517) (68
directory_add (swh/storage/cassandra/storage.py:517) (51
directory_add (swh/storage/cassandra/storage.py:517) (52
directory_add (swh/storage/cassandra/storage.py:517) (45
directory_add (swh/storage/cassandra/storage.py:517) (67
content_add (swh/storage/cassandra/storage.py:285) (48 samples,
directory_add (swh/storage/cassandra/storage.py:517) (235
directory_add (swh/storage/cassandra/storage.py:517) (67
directory_add (swh/storage/cassandra/storage.py:517) (53
directory_add (swh/storage/cassandra/storage.py:517) (210
directory_add (swh/storage/cassandra/storage.py:517) (45
directory_add (swh/storage/cassandra/storage.py:517) (129
directory_add (swh/storage/cassandra/storage.py:517) (46
directory_add (swh/storage/cassandra/storage.py:517) (41
directory_add (swh/storage/cassandra/storage.py:517) (69
directory_add (swh/storage/cassandra/storage.py:517) (70
It sounds like it's the longest operation, but it looks strange that there are no occurrences of the functions called by the loader filters. I don't know if it's due to the sampling frequency of the py-spy captures (reduced due to the number of threads).
I will try another approach by logging the method call durations on the loader side (need to rebase and adapt P1155)
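One way to get those durations without changing the loaders themselves (not necessarily what P1155 does) is to wrap the storage object handed to the loader in a proxy that logs how long each method call takes; a minimal sketch:

```python
# Minimal sketch of a timing proxy: every callable attribute of the wrapped
# storage is forwarded and its wall-clock duration logged. This illustrates
# the approach only; it is not the content of P1155.
import functools
import logging
import time

logger = logging.getLogger("storage_timing")


class TimedStorageProxy:
    def __init__(self, storage):
        self._storage = storage

    def __getattr__(self, name):
        attr = getattr(self._storage, name)
        if not callable(attr):
            return attr

        @functools.wraps(attr)
        def timed(*args, **kwargs):
            start = time.monotonic()
            try:
                return attr(*args, **kwargs)
            finally:
                logger.info("%s took %.4fs", name, time.monotonic() - start)

        return timed
```

The loader would then be given `TimedStorageProxy(storage)` instead of the raw storage client.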
What "directory_entries_insert_algo" did you use for this?
> It seems most of them are doing a lot of directory_add
Do you mean they spend a lot of time in directory_add, or that they call it a lot?
It's not surprising this endpoint is the longest to run; it has a lot of data to insert.
It was not easy to know whether it's a lot of calls or a few long-running calls, because py-spy takes regular samples and we don't have this granularity.
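For completeness: a deterministic profiler can make that distinction, since it records both the call count and the cumulative time per function, at the cost of more overhead than py-spy's sampling. A hedged sketch with the standard-library cProfile (the callable to profile is whatever piece of the loader is of interest):

```python
# cProfile records both ncalls (how often a function was called) and cumtime
# (total time spent in it), which is exactly the "many short calls vs a few
# long calls" distinction a sampling profiler cannot make.
import cProfile
import pstats


def profile_call(fn, *args, **kwargs):
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    # Restrict the report to swh.storage code, sorted by cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats("swh/storage")
    return result
```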
There is no more work scheduled on this task, so I'm changing its status to resolved.