
[cassandra] directory and content read benchmarks
Closed, Migrated

Description

Top-level issue tracking the directory_ls and content retrieval performance problems

Will be used to track the tests of D6228, D6229 and potentially others

Event Timeline

vsellier changed the task status from Open to Work in Progress. Sep 15 2021, 11:11 AM
vsellier triaged this task as Normal priority.
vsellier created this task.
vsellier renamed this task from [cassandra] directory and content read benchmarkss to [cassandra] directory and content read benchmarks. Sep 15 2021, 11:19 AM
vsellier updated the task description.

The directory_ls and, indirectly, the get_content performance was tested with this small script: P1163
A cold restart (all buffers cleared, cassandra restarted) is done between each test (P1164)
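For reference, here is a minimal sketch of what such a timing script could look like (P1163 itself is not reproduced here; the storage endpoint URL and the recursive flag are assumptions):

import sys
import time

from swh.storage import get_storage

# Hypothetical RPC endpoint; P1163 may configure the storage differently.
storage = get_storage(cls="remote", url="http://localhost:5002")

dir_id = bytes.fromhex(sys.argv[1])

start = time.monotonic()
# Walk the whole tree; every sub-directory triggers reads on the backend.
entries = list(storage.directory_ls(dir_id, recursive=True))
elapsed = time.monotonic() - start

print(f"{sys.argv[1]}: {elapsed:.4f}s")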

These are the results of the different runs:

  • run a directory_ls on a big directory
  • postgresql (saam)
vsellier@saam ~ % python3 directory_ls.py c864e846cb339a94da9fd91ae12cabcf083a8685 
c864e846cb339a94da9fd91ae12cabcf083a8685: 8.9192s
  • cassandra

10 directory_ls runs were launched to test the impact of having the data in cache:

run id | one-by-one [1] | concurrent [2]
1      | 233.6952s      | 184.1209s
2      | 102.1767s      | 89.1716s
3      | 41.1162s       | 33.0051s
4      | 25.4258s       | 21.7695s
5      | 18.2770s       | 19.0004s
6      | 16.8347s       | 14.6394s
7      | 15.4968s       | 13.4750s
8      | 14.1320s       | 12.1201s
9      | 13.3825s       | 10.6304s
10     | 13.1336s       | 11.1261s

[1] D6228 and D6229 reverted
[2] master of swh-storage https://archive.softwareheritage.org/swh:1:snp:dd585ae36b25c37fc9a4b5ab16fb4d0482a075a7;origin=https://forge.softwareheritage.org/source/swh-storage.git

The concurrent version is globally faster, but the performance is not impressive compared to the postgresql version.
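For illustration only (this is not the actual D6228/D6229 code, and the entry dictionary keys are assumed from the usual directory_ls output), the difference between the two modes boils down to whether sub-directory listings are fetched one after the other or with several queries in flight at once:

from concurrent.futures import ThreadPoolExecutor

def walk_one_by_one(storage, root_id):
    # Sequential traversal: each sub-directory is listed only after the
    # previous one has been fetched, so every directory adds a full backend
    # round-trip to the total time.
    stack = [root_id]
    while stack:
        for entry in storage.directory_ls(stack.pop()):
            yield entry
            if entry["type"] == "dir":
                stack.append(entry["target"])

def walk_concurrent(storage, root_id, max_workers=10):
    # Concurrent traversal: all sub-directories of the current level are
    # listed in parallel, so their round-trips overlap instead of adding up.
    level = [root_id]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        while level:
            next_level = []
            for entries in executor.map(
                lambda d: list(storage.directory_ls(d)), level
            ):
                for entry in entries:
                    yield entry
                    if entry["type"] == "dir":
                        next_level.append(entry["target"])
            level = next_level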

  • run 2000 directory_ls with 50 threads in parallel

IN PROGRESS

time cat directory_2000.lst | parallel --bar -j50 python3 directory_ls.py {} | tee directory_ls_[concurrent|one-by-one]-2000.output
  • one-by-one:
real    11m34.692s
user    19m6.793s
sys     2m12.372s

The total time is mostly the loading time of the slowest repository:

bc0a1450e393f47cb34a6f1f1a6ee9206c739579: 670.7532s
  • concurrent:
real    9m4.044s
user    20m12.418s
sys     2m13.689s
bc0a1450e393f47cb34a6f1f1a6ee9206c739579: 520.5311s

Two flame graphs of the previous directory_ls runs:

  • one-by-one

first run (cache cold):

last run (cache hot):

  • concurrent:

first run (cache cold):

last run (cache hot):

Test of the new D6269 patch:

  • directory_ls on c864e846cb339a94da9fd91ae12cabcf083a8685
run | duration
1   | 23.0080s
2   | 10.0942s
3   | 11.0577s
4   | 7.4534s
5   | 7.1187s
6   | 7.2937s
7   | 6.4423s
8   | 6.3384s
9   | 6.2031s
10  | 6.0875s
  • directory_ls on bc0a1450e393f47cb34a6f1f1a6ee9206c739579 (the biggest directory in the 2000 ids)
run | duration
1   | 67.8805s
2   | 38.4308s
3   | 33.2637s
4   | 30.7730s
5   | 30.1840s
6   | 27.8511s
7   | 28.2692s
8   | 27.8125s
9   | 27.4106s
10  | 27.6957s

and the associated flame graphs:

first run:

last run:

  • 2000 directory_ls with 50 threads in parallel:
real    1m40.011s
user    22m5.676s
sys     2m22.493s
bc0a1450e393f47cb34a6f1f1a6ee9206c739579: 70.3515s

Much better than previously.

Some flame graphs of the storage were captured during the ingestion with 50 workers in parallel.

It seems most of them are doing a lot of directory_add

svgpp 1-2021-10-14T06:12:58Z.svg | grep storage.py
                directory_add (swh/storage/cassandra/storage.py:517) (112
                directory_add (swh/storage/cassandra/storage.py:517) (77
                directory_add (swh/storage/cassandra/storage.py:517) (88
                directory_add (swh/storage/cassandra/storage.py:517) (123
                content_add (swh/storage/cassandra/storage.py:285) (43 samples,
                directory_add (swh/storage/cassandra/storage.py:517) (78
                directory_add (swh/storage/cassandra/storage.py:517) (104
                directory_add (swh/storage/cassandra/storage.py:517) (49
                directory_add (swh/storage/cassandra/storage.py:517) (73
                directory_add (swh/storage/cassandra/storage.py:517) (76
                directory_add (swh/storage/cassandra/storage.py:517) (45
                directory_add (swh/storage/cassandra/storage.py:517) (69
                directory_add (swh/storage/cassandra/storage.py:517) (62
                directory_add (swh/storage/cassandra/storage.py:517) (93
                directory_add (swh/storage/cassandra/storage.py:517) (42
                directory_add (swh/storage/cassandra/storage.py:517) (104
                directory_add (swh/storage/cassandra/storage.py:517) (40
                directory_add (swh/storage/cassandra/storage.py:517) (42
                directory_add (swh/storage/cassandra/storage.py:517) (54
                directory_add (swh/storage/cassandra/storage.py:517) (73
                directory_add (swh/storage/cassandra/storage.py:517) (145
                directory_add (swh/storage/cassandra/storage.py:517) (68
                directory_add (swh/storage/cassandra/storage.py:517) (51
                directory_add (swh/storage/cassandra/storage.py:517) (52
                directory_add (swh/storage/cassandra/storage.py:517) (45
                directory_add (swh/storage/cassandra/storage.py:517) (67
                content_add (swh/storage/cassandra/storage.py:285) (48 samples,
                directory_add (swh/storage/cassandra/storage.py:517) (235
                directory_add (swh/storage/cassandra/storage.py:517) (67
                directory_add (swh/storage/cassandra/storage.py:517) (53
                directory_add (swh/storage/cassandra/storage.py:517) (210
                directory_add (swh/storage/cassandra/storage.py:517) (45
                directory_add (swh/storage/cassandra/storage.py:517) (129
                directory_add (swh/storage/cassandra/storage.py:517) (46
                directory_add (swh/storage/cassandra/storage.py:517) (41
                directory_add (swh/storage/cassandra/storage.py:517) (69
                directory_add (swh/storage/cassandra/storage.py:517) (70

It sounds like it's the longest operation, but it looks strange that there are no occurrences of the functions called by the loader filters. I don't know if it's due to the sampling frequency of the py-spy captures (reduced because of the number of threads).

I will try another approach by logging the method call durations on the loader side (need to rebase and adapt P1155)
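A rough sketch of what such loader-side timing could look like (P1155 is not reproduced here; the proxy and logger names are illustrative, not the actual patch):

import functools
import logging
import time

logger = logging.getLogger("storage_timing")

class TimingStorageProxy:
    """Wrap a storage object and log the duration of every method call."""

    def __init__(self, storage):
        self._storage = storage

    def __getattr__(self, name):
        attr = getattr(self._storage, name)
        if not callable(attr):
            return attr

        @functools.wraps(attr)
        def timed(*args, **kwargs):
            start = time.monotonic()
            try:
                return attr(*args, **kwargs)
            finally:
                logger.info("%s took %.4fs", name, time.monotonic() - start)

        return timed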

What "directory_entries_insert_algo" did you use for this?

It seems most of them are doing a lot of directory_add

Do you mean they spend a lot of time in directory_add, or that they call it a lot?

It's not surprising this endpoint is the longest to run, it has a lot of data to insert.
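Schematically (a simplified sketch with hypothetical helper names, not the code actually living at swh/storage/cassandra/storage.py:517), the write volume per directory looks roughly like this:

def directory_add_sketch(backend, directories):
    # Hypothetical backend helpers; the point is the write volume: one insert
    # per directory entry, so a directory with thousands of entries means
    # thousands of backend writes before the directory row itself is written.
    for directory in directories:
        for entry in directory.entries:
            backend.insert_directory_entry(directory.id, entry)  # hypothetical
        backend.insert_directory(directory.id)  # hypothetical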

vsellier moved this task from Backlog to done on the System administration board.

It was not easy to know whether it's a lot of calls or long-running calls, because the profiler takes regular samples and we don't have this granularity.

There is no more work scheduled on this task, so I am changing its status to resolved.