Software Heritage

Run experiments against the MongoDB backend
Closed, Migrated

Description

Collect performance data from the initial version, in two index configurations:

  • with only the default _id index
  • with an additional sha1 index
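The two configurations can be sketched as follows. This assumes pymongo and that each collection keeps the hash in a field named `sha1`; that field name and the helper `ensure_sha1_indexes` are hypothetical and not taken from the provenance code. MongoDB always creates the `_id` index, so the first configuration needs no call at all.

```python
from typing import Any, Iterable, List

# Collection names used in the experiments; the "sha1" field name is an assumption.
COLLECTIONS = ("content", "revision", "directory")


def ensure_sha1_indexes(db: Any, collections: Iterable[str] = COLLECTIONS,
                        unique: bool = False) -> List[str]:
    """Create an ascending single-field index on the (assumed) sha1 field.

    `db` should behave like a pymongo Database: db[name] returns a collection
    exposing create_index(). Returns the index names that create_index reports.
    """
    names = []
    for name in collections:
        # [("sha1", 1)]: ascending index (1 == pymongo.ASCENDING)
        names.append(db[name].create_index([("sha1", 1)], unique=unique))
    return names
```

A call would look like `ensure_sha1_indexes(pymongo.MongoClient()["testdb"])`. Passing `unique=True` corresponds to the unique sha1 index used in the 2M run further down.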

Event Timeline

jayeshv triaged this task as Normal priority. Sep 6 2021, 10:20 AM
jayeshv created this task.

First test (07/09/2021)
Run against the simple/experimental data model version (https://forge.softwareheritage.org/rDPROV3e009a2f77de1d4d00eb52f838537c7af327f010).
Computer: Dell 7400

Mongo: 1k without sha1 index (default _id index only)

python client.py -n 2 -C mongo/config.yml

(provenance) ➜  revisions git:(master) ✗ python server.py sample_1k.csv 
INFO:__main__:Reading revisions from sample_1k.csv
INFO:__main__:0             0.0 rev/s
INFO:__main__:DONE sending 1000 revisions
INFO:__main__:134          26.8 rev/s
INFO:__main__:220          16.7 rev/s
INFO:__main__:290          13.9 rev/s
INFO:__main__:361          13.6 rev/s
INFO:__main__:434          14.6 rev/s
INFO:__main__:443           0.4 rev/s
INFO:__main__:445           0.2 rev/s
INFO:__main__:479           6.7 rev/s
INFO:__main__:543          12.6 rev/s
INFO:__main__:589           9.1 rev/s
INFO:__main__:629           7.9 rev/s
INFO:__main__:667           7.6 rev/s
INFO:__main__:706           7.0 rev/s
INFO:__main__:733           5.3 rev/s
INFO:__main__:785          10.2 rev/s
INFO:__main__:824           7.4 rev/s
INFO:__main__:843           3.1 rev/s
INFO:__main__:859           2.8 rev/s
INFO:__main__:867           1.0 rev/s
INFO:__main__:873           1.1 rev/s
INFO:__main__:895           4.4 rev/s
INFO:__main__:951          11.0 rev/s
INFO:__main__:991           7.9 rev/s
INFO:__main__:Received confirmation for all 1000 revisions in 167.32s (5.98)
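The number in brackets at the end of each summary line appears to be the overall average throughput, that is, revisions divided by elapsed seconds (1000 / 167.32 ≈ 5.98 rev/s). Below is a stdlib-only parser for these lines. It assumes the format stays exactly as logged here. `parse_summary` is a hypothetical helper, not part of `server.py`.

```python
import re
from typing import Optional, Tuple

# Matches the server's final summary line. "Received" is left out of the pattern so
# that shortened copies such as "confirmation for all 10000 revisions in 770.72s (12.97)"
# also match.
SUMMARY_RE = re.compile(
    r"confirmation for all (?P<n>\d+) revisions in (?P<secs>[\d.]+)s \((?P<rate>[\d.]+)\)"
)


def parse_summary(line: str) -> Optional[Tuple[int, float, float]]:
    """Return (revisions, elapsed seconds, logged rate), or None if the line is not a summary."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    return int(m["n"]), float(m["secs"]), float(m["rate"])


n, secs, rate = parse_summary(
    "INFO:__main__:Received confirmation for all 1000 revisions in 167.32s (5.98)"
)
print(round(n / secs, 2), rate)  # 5.98 5.98
```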

stats:

"db" : "testdb",
"collections" : 3,
"views" : 0,
"objects" : 15416,
"avgObjSize" : 484.503762324857,
"dataSize" : 7469110.0,
"storageSize" : 3407872.0,
"freeStorageSize" : 1474560.0,
"indexes" : 3,
"indexSize" : 434176.0,
"indexFreeStorageSize" : 184320.0,
"totalSize" : 3842048.0,
"totalFreeStorageSize" : 1658880.0,
"scaleFactor" : 1.0,
"fsUsedSize" : 93724790784.0,
"fsTotalSize" : 982221160448.0,
"ok" : 1.0
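Here is a short sketch for reading these db.stats() numbers. Per the MongoDB dbStats documentation, dataSize is the uncompressed size of the documents and storageSize is the space allocated for them, free space included. Their ratio is therefore only a rough compression factor. The dictionary below copies the values from the run above, and `summarize` is a hypothetical helper.

```python
# db.stats() of the 1k run with only the _id index (scaleFactor 1, so sizes are bytes).
STATS_1K_ID_ONLY = {
    "objects": 15416,
    "dataSize": 7469110,
    "storageSize": 3407872,
    "freeStorageSize": 1474560,
    "indexSize": 434176,
    "indexFreeStorageSize": 184320,
    "totalSize": 3842048,
    "totalFreeStorageSize": 1658880,
}


def summarize(stats: dict) -> dict:
    """Derive a few ratios from a db.stats() document."""
    return {
        # uncompressed bytes per document; reproduces avgObjSize
        "avg_obj_size": stats["dataSize"] / stats["objects"],
        # uncompressed data vs. allocated storage (free space included)
        "compression": stats["dataSize"] / stats["storageSize"],
        # fraction of the on-disk total taken by indexes
        "index_share": stats["indexSize"] / stats["totalSize"],
    }


s = STATS_1K_ID_ONLY
# In every stats output on this page, totalSize = storageSize + indexSize and
# totalFreeStorageSize = freeStorageSize + indexFreeStorageSize.
assert s["totalSize"] == s["storageSize"] + s["indexSize"]
assert s["totalFreeStorageSize"] == s["freeStorageSize"] + s["indexFreeStorageSize"]
print({k: round(v, 2) for k, v in summarize(s).items()})
# {'avg_obj_size': 484.5, 'compression': 2.19, 'index_share': 0.11}
```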

Mongo: 1k with sha1 index on content, revision and directory

python client.py -n 2 -C mongo/config.yml

(provenance) ➜  revisions git:(master) ✗ python server.py sample_1k.csv 
INFO:__main__:Reading revisions from sample_1k.csv
INFO:__main__:DONE sending 1000 revisions
INFO:__main__:31            6.2 rev/s
INFO:__main__:167          24.1 rev/s
INFO:__main__:221          10.7 rev/s
INFO:__main__:292          14.1 rev/s
INFO:__main__:385          18.6 rev/s
INFO:__main__:443           3.2 rev/s
INFO:__main__:500          11.4 rev/s
INFO:__main__:661          32.2 rev/s
INFO:__main__:805          28.6 rev/s
INFO:__main__:867          11.9 rev/s
INFO:__main__:Received confirmation for all 1000 revisions in 68.68s (14.56)
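With the sha1 index, the same 1k sample finished in 68.68 s instead of 167.32 s. A plausible reason is that lookups by sha1 no longer need a collection scan, but these logs do not show that directly. The helper below (hypothetical, stdlib only) turns (revisions, seconds) pairs into rates and speedups relative to a chosen baseline.

```python
from typing import Dict, Tuple


def compare(runs: Dict[str, Tuple[int, float]], baseline: str) -> Dict[str, Tuple[float, float]]:
    """Map each run name to (rev/s, speedup relative to the baseline run), both rounded."""
    base_revs, base_secs = runs[baseline]
    base_rate = base_revs / base_secs
    out = {}
    for name, (revs, secs) in runs.items():
        rate = revs / secs
        out[name] = (round(rate, 2), round(rate / base_rate, 2))
    return out


# Numbers copied from the two 1k Mongo runs above.
print(compare({"_id only": (1000, 167.32), "sha1 index": (1000, 68.68)}, "_id only"))
# {'_id only': (5.98, 1.0), 'sha1 index': (14.56, 2.44)}
```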

stats:

"db" : "testdb",
"collections" : 3,
"views" : 0,
"objects" : 13822,
"avgObjSize" : 557.92598755607,
"dataSize" : 7711653.0,
"storageSize" : 2314240.0,
"freeStorageSize" : 475136.0,
"indexes" : 6,
"indexSize" : 1183744.0,
"indexFreeStorageSize" : 548864.0,
"totalSize" : 3497984.0,
"totalFreeStorageSize" : 1024000.0,
"scaleFactor" : 1.0,
"fsUsedSize" : 93728763904.0,
"fsTotalSize" : 982221160448.0,
"ok" : 1.0

Mongo: 10k with sha1 index on content, revision and directory

python client.py -n 4 -C mongo/config.yml

....
INFO:__main__:9436          2.7 rev/s
INFO:__main__:9447          1.9 rev/s
INFO:__main__:9479          6.4 rev/s
INFO:__main__:9521          8.3 rev/s
INFO:__main__:9568          9.2 rev/s
INFO:__main__:9613          8.9 rev/s
INFO:__main__:9644          5.9 rev/s
INFO:__main__:9653          1.7 rev/s
INFO:__main__:9665          2.0 rev/s
INFO:__main__:9714          9.7 rev/s
INFO:__main__:9777         12.6 rev/s
INFO:__main__:9830         10.5 rev/s
INFO:__main__:9888         11.5 rev/s
INFO:__main__:9957         13.8 rev/s
INFO:__main__:9989          6.1 rev/s
INFO:__main__:Received confirmation for all 10000 revisions in 851.00s (11.75)

stats:

"db" : "testdb",
"collections" : 3,
"views" : 0,
"objects" : 855182,
"avgObjSize" : 250.810574825008,
"dataSize" : 214488689.0,
"storageSize" : 75210752.0,
"freeStorageSize" : 18665472.0,
"indexes" : 6,
"indexSize" : 59744256.0,
"indexFreeStorageSize" : 24248320.0,
"totalSize" : 134955008.0,
"totalFreeStorageSize" : 42913792.0,
"scaleFactor" : 1.0,
"fsUsedSize" : 93863395328.0,
"fsTotalSize" : 982221160448.0,
"ok" : 1.0
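The 10k run averaged 11.75 rev/s, against 14.56 rev/s for the indexed 1k run (with `-n 4` instead of `-n 2`). Below is a naive sketch of what that rate would mean for a 2M-revision run. It assumes throughput stays constant. Throughput already dropped between 1k and 10k, so if that trend continues this estimate is optimistic. `eta_hours` is a hypothetical helper.

```python
def eta_hours(total_revisions: int, rate: float) -> float:
    """Wall-clock hours to process total_revisions at a constant rate (rev/s)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return total_revisions / rate / 3600


# 11.75 rev/s is the average reported by the 10k Mongo run above.
print(round(eta_hours(2_000_000, 11.75), 1))  # 47.3
```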

Postgres: 10k with sha1 index on content, revision and directory

python client.py -n 4 -C config.yml

...
INFO:__main__:8414         10.6 rev/s
INFO:__main__:8528         22.6 rev/s
INFO:__main__:8604         15.1 rev/s
INFO:__main__:8739         26.8 rev/s
INFO:__main__:8914         34.9 rev/s
INFO:__main__:9046         26.3 rev/s
INFO:__main__:9199         30.3 rev/s
INFO:__main__:9256         10.8 rev/s
INFO:__main__:9289          6.5 rev/s
INFO:__main__:9340          9.7 rev/s
INFO:__main__:9383          8.4 rev/s
INFO:__main__:9415          5.9 rev/s
INFO:__main__:9429          2.5 rev/s
INFO:__main__:9475          7.7 rev/s
INFO:__main__:9510          7.0 rev/s
INFO:__main__:9588         15.6 rev/s
INFO:__main__:9680         18.2 rev/s
INFO:__main__:9770         18.0 rev/s
INFO:__main__:9899         25.1 rev/s
INFO:__main__:Received confirmation for all 10000 revisions in 550.84s (18.15)

stats:

size: 513 MB
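Here is a side-by-side view of the two 10k runs with the sha1 index. The Postgres size is reported only as "513 MB", and the page does not say how it was measured. The sketch assumes 1024-based megabytes (as PostgreSQL's `pg_size_pretty` prints them). If decimal megabytes were meant, the size ratio drops to about 3.8. The Mongo figure is totalSize from db.stats().

```python
MIB = 1024 * 1024

mongo = {"seconds": 851.00, "size_bytes": 134_955_008}    # totalSize from db.stats()
postgres = {"seconds": 550.84, "size_bytes": 513 * MIB}   # assumes "513 MB" means MiB

time_ratio = mongo["seconds"] / postgres["seconds"]        # how much longer Mongo took
mongo_mib = mongo["size_bytes"] / MIB
size_ratio = postgres["size_bytes"] / mongo["size_bytes"]  # how much larger Postgres is

print(round(time_ratio, 2), round(mongo_mib, 1), round(size_ratio, 1))  # 1.54 128.7 4.0
```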

Mongo: 10k with sha1 index (embedded arrays, without bulk write)

python client.py -n 4 -C config.yml

Run on desktop6:
confirmation for all 10000 revisions in 770.72s (12.97)

db.content.countDocuments({})
824488

db.directory.countDocuments({})
11190

db.stats()

"db" : "testdb",
"collections" : 4,
"views" : 0,
"objects" : 845638,
"avgObjSize" : 252.6618635870195,
"dataSize" : 213660473,
"storageSize" : 71585792,
"freeStorageSize" : 15814656,
"indexes" : 8,
"indexSize" : 59584512,
"indexFreeStorageSize" : 24367104,
"totalSize" : 131170304,
"totalFreeStorageSize" : 40181760,
"scaleFactor" : 1,
"fsUsedSize" : 1198588350464,
"fsTotalSize" : 3936911937536,
"ok" : 1
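To collect the same numbers from Python instead of the mongo shell, pymongo's `count_documents({})` mirrors `countDocuments({})`, and `command("dbstats")` returns the same document as `db.stats()`. Below is a minimal sketch. `collect_stats` is a hypothetical helper, and `db` is assumed to be a pymongo `Database`.

```python
from typing import Any, Dict, Iterable


def collect_stats(db: Any, collections: Iterable[str] = ("content", "directory")) -> Dict[str, Any]:
    """Gather per-collection document counts plus database-level stats.

    Equivalent to db.<coll>.countDocuments({}) and db.stats() in the mongo shell.
    """
    counts = {name: db[name].count_documents({}) for name in collections}
    return {"counts": counts, "dbstats": db.command("dbstats")}
```

For example, `collect_stats(pymongo.MongoClient()["testdb"])` would return the two counts and the stats dictionary shown above in one call.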

Mongo: 10k with sha1 index (embedded arrays, with bulk write on content and directory)

python client.py -n 4 -C config.yml

Run on desktop6:
confirmation for all 10000 revisions in 712.77s (14.03)

db.content.countDocuments({})
818750

db.directory.countDocuments({})
11090

db.stats()

"db" : "testdb",
"collections" : 4,
"views" : 0,
"objects" : 839840,
"avgObjSize" : 254.44709706610783,
"dataSize" : 213694850,
"storageSize" : 73601024,
"freeStorageSize" : 19116032,
"indexes" : 8,
"indexSize" : 60559360,
"indexFreeStorageSize" : 24989696,
"totalSize" : 134160384,
"totalFreeStorageSize" : 44105728,
"scaleFactor" : 1,
"fsUsedSize" : 1198591688704,
"fsTotalSize" : 3936911937536,
"ok" : 1
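The bulk-write variant finished the 10k sample in 712.77 s instead of 770.72 s, roughly 7.5% less wall-clock time. The document counts differ slightly between the two runs, so the comparison is approximate. Below is a minimal sketch of the batching idea: operations are buffered and handed over in groups. `BulkBuffer` and the batch size are hypothetical and do not come from the provenance code. With pymongo, `flush_fn` could be something like `lambda ops: coll.bulk_write(ops, ordered=False)` with `UpdateOne(..., upsert=True)` operations.

```python
from typing import Any, Callable, List


class BulkBuffer:
    """Accumulate write operations and pass them to flush_fn in batches."""

    def __init__(self, flush_fn: Callable[[List[Any]], Any], batch_size: int = 1000) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.pending: List[Any] = []

    def add(self, op: Any) -> None:
        """Queue one operation; send the batch once it reaches batch_size."""
        self.pending.append(op)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending (call once more at the end of a run)."""
        if self.pending:
            self.flush_fn(self.pending)
            self.pending = []  # new list, so flushed batches are not mutated


batches: List[List[int]] = []
buf = BulkBuffer(batches.append, batch_size=3)
for i in range(7):
    buf.add(i)
buf.flush()
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```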

Mongo: 2M with unique sha1 index (embedded arrays, with bulk write on content and directory)
(Exited after about 542,000 revisions because of memory errors)

python client.py -n 8 -C config.yml

Run on desktop6:

db.revision.countDocuments({})
542774

db.content.countDocuments({})
1097460

db.stats()

{
        "db" : "testdb1",
        "collections" : 4,
        "views" : 0,
        "objects" : 1806579,
        "avgObjSize" : 5303.092894360003,
        "dataSize" : 9580456258,
        "storageSize" : 2703589376,
        "freeStorageSize" : 588664832,
        "indexes" : 8,
        "indexSize" : 159612928,
        "indexFreeStorageSize" : 49549312,
        "totalSize" : 2863202304,
        "totalFreeStorageSize" : 638214144,
        "scaleFactor" : 1,
        "fsUsedSize" : 1203805884416,
        "fsTotalSize" : 3936911937536,
        "ok" : 1
}
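In the partial 2M run, the average document is far larger than in the 10k runs (avgObjSize ≈ 5303 bytes vs ≈ 253 bytes). A plausible reading is that the embedded arrays keep growing as revisions are processed. These numbers alone do not establish whether that caused the memory errors. Note that MongoDB caps a single BSON document at 16 MiB, which also bounds how far an embedded array can grow. The arithmetic below derives per-revision footprints from the stats above and gives a naive linear extrapolation to 2M revisions. It is an assumption-laden estimate, not a measurement.

```python
# Values copied from the db.stats() output above (bytes).
stats_2m = {
    "objects": 1_806_579,
    "dataSize": 9_580_456_258,
    "storageSize": 2_703_589_376,
    "totalSize": 2_863_202_304,
}
revisions_done = 542_774  # db.revision.countDocuments({})

avg_obj = stats_2m["dataSize"] / stats_2m["objects"]           # reproduces avgObjSize
compression = stats_2m["dataSize"] / stats_2m["storageSize"]   # rough, includes free space
per_rev = stats_2m["totalSize"] / revisions_done                # on-disk bytes per revision
naive_2m_gb = per_rev * 2_000_000 / 1e9                         # assumes linear growth

print(round(avg_obj, 1), round(compression, 2), round(per_rev), round(naive_2m_gb, 1))
# 5303.1 3.54 5275 10.6
```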