I just ran it on Azure. It has a different schema (the "revision" table was split into "revision" and "revision_parent"), so the benchmarks are not exactly comparable.
I still use 16 workers, all running on the same machine, and with no compression.

Jan 8 2020, 4:54 PM · Compressed graph service, Storage manager

Dec 20 2019

olasd changed the status of T2033: Run Cassandra storage backend with production data, a subtask of T1892: Cassandra as a storage backend, from Open to Work in Progress.

Dec 20 2019, 6:56 PM · meta-task, Storage manager

olasd changed the status of T2033: Run Cassandra storage backend with production data from Open to Work in Progress.

A cassandra cluster has been deployed on cassandra[01-06].euwest.azure.internal.softwareheritage.org.

Dec 20 2019, 6:56 PM · Storage manager

vlorentz changed the status of T1910: Redesign origin search using a dedicated component (swh-search) from Open to Work in Progress.

Dec 20 2019, 3:26 PM · Archive search, Storage manager

vlorentz changed the status of T1910: Redesign origin search using a dedicated component (swh-search), a subtask of T1892: Cassandra as a storage backend, from Open to Work in Progress.

Dec 20 2019, 3:26 PM · meta-task, Storage manager

Nov 27 2019

zack updated subscribers of T1926: FUSE filesystem to navigate the archive.

proposed CLI interface:

swh [ -C config.yml ] graph mount PID DIR

will mount the content of the given PID to the given local DIR.

Nov 27 2019, 3:59 PM · Software Heritage filesystem

Nov 25 2019

olasd closed T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs as Resolved by committing rDSTO2cac3392eb49: Implement origin lookup by sha1.

Nov 25 2019, 3:30 PM · Compressed graph service, Storage manager

vlorentz added a revision to T1912: Support origin pagination without origin ids: D2343: Add Storage.content_get_partition endpoint, to replace content_get_range.

Nov 25 2019, 11:49 AM · Web app, Storage manager

Nov 22 2019

olasd added a revision to T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs: D2346: Implement origin lookup by sha1.

Nov 22 2019, 6:29 PM · Compressed graph service, Storage manager

olasd added a comment to T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs.

Launched on somerset:

Nov 22 2019, 5:40 PM · Compressed graph service, Storage manager

olasd added a comment to T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs.

We should consider just adding a btree index on sha1(url) and see where that takes us.
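
As a rough illustration of the lookup being discussed, here is a minimal in-memory sketch. It assumes that the hexadecimal part of a swh:1:ori:... PID is the SHA-1 of the UTF-8 encoded origin URL; the function names and the dict standing in for a database index on sha1(url) are hypothetical and are not taken from swh-storage.

```python
import hashlib
from typing import Dict, Iterable, Optional

ORIGIN_PID_PREFIX = "swh:1:ori:"


def origin_sha1(url: str) -> str:
    """Hex SHA-1 of the origin URL (assumed to be the PID's hash part)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def build_index(urls: Iterable[str]) -> Dict[str, str]:
    """Map sha1(url) -> url; a stand-in for a database index on sha1(url)."""
    return {origin_sha1(url): url for url in urls}


def lookup_origin(index: Dict[str, str], pid: str) -> Optional[str]:
    """Return the origin URL for an origin PID, or None if it is unknown."""
    if not pid.startswith(ORIGIN_PID_PREFIX):
        raise ValueError(f"not an origin PID: {pid!r}")
    return index.get(pid[len(ORIGIN_PID_PREFIX):])
```

With an index in the database, the same lookup would be a single equality query on the hashed column rather than a scan over every origin URL.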

Nov 22 2019, 5:37 PM · Compressed graph service, Storage manager

vlorentz added a comment to T2053: support graph export for the cassandra backend.

Forgot to mention it here, but it's done now.

Nov 22 2019, 10:55 AM · Compressed graph service, Storage manager

Nov 21 2019

vlorentz added a revision to T1912: Support origin pagination without origin ids: D2324: Add endpoint 'origin_list', that will replace 'origin_get_range'.

Nov 21 2019, 3:15 PM · Web app, Storage manager

Nov 18 2019

zack raised the priority of T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs from Normal to High.

Nov 18 2019, 5:50 PM · Compressed graph service, Storage manager

zack added a project to T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs: Compressed graph service.

Nov 18 2019, 5:50 PM · Compressed graph service, Storage manager

Nov 14 2019

vlorentz reopened T2034: Unbreak journal clients, a subtask of T2033: Run Cassandra storage backend with production data, as Open.

Nov 14 2019, 3:41 PM · Storage manager

ardumont closed D2272: swh.storage.schemata: Drop schemata from storage.

Nov 14 2019, 1:31 PM · Storage manager, Lister

olasd accepted D2272: swh.storage.schemata: Drop schemata from storage.

Nov 14 2019, 12:12 PM · Storage manager, Lister

ardumont added a comment to T2076: Add tests for SQL migrations.

+1 on this

Nov 14 2019, 11:09 AM · Storage manager

swh-public-ci added a comment to D2272: swh.storage.schemata: Drop schemata from storage.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/778/ for more details.

Nov 14 2019, 10:40 AM · Storage manager, Lister

ardumont updated the summary of D2272: swh.storage.schemata: Drop schemata from storage.

Nov 14 2019, 10:39 AM · Storage manager, Lister

Nov 12 2019

olasd added a comment to T2076: Add tests for SQL migrations.

https://github.com/omniti-labs/pg_extractor is a decent way to extract the schema from a running database into something that's easily diffable

Nov 12 2019, 7:02 PM · Storage manager

Nov 11 2019

vlorentz added a revision to T2019: race condition during concurrent loading of the same objects from multiple origins: D2248: [WIP] In case of race condition in content_add, raise SerializationFailure instead of HashCollision.

Nov 11 2019, 12:08 AM · Storage manager

Nov 8 2019

vlorentz triaged T2076: Add tests for SQL migrations as Normal priority.

Nov 8 2019, 5:32 PM · Storage manager

vlorentz added a subtask for T2075: Implement metadata authority specification: T1737: Define and specify metadata providers.

Nov 8 2019, 5:20 PM · Storage manager, Metadata workflow

vlorentz triaged T2075: Implement metadata authority specification as Normal priority.

Nov 8 2019, 5:20 PM · Storage manager, Metadata workflow

vlorentz triaged T2074: Publish extrinsic metadata to swh-journal/Kafka as Normal priority.

Nov 8 2019, 5:18 PM · Storage manager, Journal, Metadata workflow

vlorentz claimed T2053: support graph export for the cassandra backend.

Nov 8 2019, 11:52 AM · Compressed graph service, Storage manager

Nov 7 2019

vlorentz added a comment to T2053: support graph export for the cassandra backend.

Probably not. I'm working on adding support for other objects.

Nov 7 2019, 5:24 PM · Compressed graph service, Storage manager

zack added a comment to T2053: support graph export for the cassandra backend.

In T2053#38352, @vlorentz wrote:

Added parallelism. 450k/s with 16 workers and no compression. I won't try with 32 workers because Python processes would use too much CPU on my machine.

Nov 7 2019, 5:18 PM · Compressed graph service, Storage manager

vlorentz added a comment to T2053: support graph export for the cassandra backend.

Added parallelism. 450k/s with 16 workers and no compression. I won't try with 32 workers because Python processes would use too much CPU on my machine.
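
For readers wondering how an export like this can be spread over 16 workers: one common approach with Cassandra is to give each worker its own slice of the partitioner's token range. The sketch below only computes the slices. It assumes the default Murmur3 partitioner (tokens from -2^63 to 2^63 - 1), and the function name is hypothetical. Whether the prototype splits the work this way is not stated in this thread.

```python
from typing import List, Tuple

# Token bounds of Cassandra's default Murmur3 partitioner (assumed here).
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_token_range(n_workers: int) -> List[Tuple[int, int]]:
    """Split the full token range into n_workers contiguous inclusive slices."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # number of tokens, i.e. 2**64
    # Integer arithmetic keeps the boundaries exact for any worker count.
    bounds = [MIN_TOKEN + (total * i) // n_workers for i in range(n_workers + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(n_workers)]


if __name__ == "__main__":
    for start, end in split_token_range(16):
        # Each worker would then read only the rows whose partition key
        # token falls between start and end (inclusive).
        print(start, end)
```

Each slice is independent of the others, so the workers need no coordination beyond merging their output.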

Nov 7 2019, 4:47 PM · Compressed graph service, Storage manager

vlorentz added a comment to T2053: support graph export for the cassandra backend.

Throughput improved to 34k/s just by not querying unneeded fields.

Nov 7 2019, 3:29 PM · Compressed graph service, Storage manager

Nov 5 2019

zack added a comment to T2053: support graph export for the cassandra backend.

Looks good, thanks!

Nov 5 2019, 2:05 PM · Compressed graph service, Storage manager

zack updated the task description for T2053: support graph export for the cassandra backend.

Nov 5 2019, 2:00 PM · Compressed graph service, Storage manager

olasd closed T1891: Make 'type' an attribute of origin visits, not origins as Resolved.

This has now been deployed in all components as well as in production databases. Bye bye origin['type'].

Nov 5 2019, 1:18 PM · Web app, Storage manager

Nov 4 2019

vlorentz added a comment to T2053: support graph export for the cassandra backend.

I wrote a prototype for exporting revisions: https://forge.softwareheritage.org/source/snippets/browse/master/vlorentz/cassandra_stream_graph.py

Nov 4 2019, 1:53 PM · Compressed graph service, Storage manager

olasd closed T2052: Publish swh-search on PyPI, a subtask of T1910: Redesign origin search using a dedicated component (swh-search), as Resolved.

Nov 4 2019, 12:35 PM · Archive search, Storage manager

vlorentz closed T2034: Unbreak journal clients, a subtask of T2033: Run Cassandra storage backend with production data, as Resolved.

Nov 4 2019, 12:22 PM · Storage manager

Oct 31 2019

zack triaged T2053: support graph export for the cassandra backend as Normal priority.

Oct 31 2019, 2:09 PM · Compressed graph service, Storage manager

Oct 28 2019

vlorentz updated the task description for T1912: Support origin pagination without origin ids.

Oct 28 2019, 12:01 PM · Web app, Storage manager

vlorentz added a project to T1910: Redesign origin search using a dedicated component (swh-search): Archive search.

Oct 28 2019, 11:46 AM · Archive search, Storage manager

Oct 24 2019

anlambert updated the task description for T1891: Make 'type' an attribute of origin visits, not origins.

Oct 24 2019, 2:16 AM · Web app, Storage manager

Oct 22 2019

anlambert added projects to T1891: Make 'type' an attribute of origin visits, not origins: Storage manager, Web app.

Oct 22 2019, 3:58 PM · Web app, Storage manager

Oct 19 2019

zack triaged T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs as Normal priority.

Oct 19 2019, 2:45 PM · Compressed graph service, Storage manager

Oct 18 2019

olasd closed T49: DB schema: add missing unicity constraint on origin (type, url) as Resolved.

The missing unique constraint has now been added.

Oct 18 2019, 4:36 PM · Restricted Project, Storage manager

Oct 11 2019

vlorentz updated the task description for T1912: Support origin pagination without origin ids.

Oct 11 2019, 2:53 PM · Web app, Storage manager

Oct 10 2019

vlorentz triaged T2034: Unbreak journal clients as High priority.

Oct 10 2019, 12:06 PM · Journal

vlorentz triaged T2033: Run Cassandra storage backend with production data as Low priority.

Oct 10 2019, 12:01 PM · Storage manager

vlorentz added a project to T1912: Support origin pagination without origin ids: Web app.

Oct 10 2019, 11:59 AM · Web app, Storage manager

vlorentz updated the task description for T1892: Cassandra as a storage backend.

Oct 10 2019, 11:59 AM · meta-task, Storage manager

vlorentz updated the task description for T1892: Cassandra as a storage backend.

Oct 10 2019, 11:59 AM · meta-task, Storage manager

Oct 8 2019

swh-public-ci added a comment to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/681/ for more details.

Oct 8 2019, 4:46 PM · Storage manager

ardumont added a comment to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

This revision was not accepted when it landed; it landed in state Needs Review.

Oct 8 2019, 4:44 PM · Storage manager

ardumont closed D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Oct 8 2019, 4:42 PM · Storage manager

ardumont closed D2084: swh.storage.filter: Add filtering storage implementation.

Oct 8 2019, 4:42 PM · Storage manager

ardumont closed D2083: swh.storage: Test get_storage implementation.

Oct 8 2019, 4:42 PM · Storage manager

ardumont updated the diff for D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Adapt according to last missing review points

Oct 8 2019, 4:41 PM · Storage manager

ardumont added a comment to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

You missed this comment:

Oct 8 2019, 4:36 PM · Storage manager

vlorentz accepted D2083: swh.storage: Test get_storage implementation.

Oct 8 2019, 4:32 PM · Storage manager

vlorentz requested changes to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

You missed this comment:

Oct 8 2019, 4:31 PM · Storage manager

swh-public-ci added a comment to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/680/ for more details.

Oct 8 2019, 4:29 PM · Storage manager

ardumont updated the summary of D2083: swh.storage: Test get_storage implementation.

Oct 8 2019, 4:25 PM · Storage manager

swh-public-ci added a comment to D2084: swh.storage.filter: Add filtering storage implementation.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/679/ for more details.

Oct 8 2019, 4:24 PM · Storage manager

ardumont updated the summary of D2083: swh.storage: Test get_storage implementation.

Oct 8 2019, 4:24 PM · Storage manager

ardumont updated the diff for D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Rebase on latest development

Oct 8 2019, 4:17 PM · Storage manager

ardumont updated the diff for D2084: swh.storage.filter: Add filtering storage implementation.

Rebase to latest diff

Oct 8 2019, 4:16 PM · Storage manager

swh-public-ci added a comment to D2084: swh.storage.filter: Add filtering storage implementation.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/676/ for more details.

Oct 8 2019, 4:10 PM · Storage manager

swh-public-ci added a comment to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/675/ for more details.

Oct 8 2019, 4:05 PM · Storage manager

ardumont updated the diff for D2084: swh.storage.filter: Add filtering storage implementation.

Fix commit message

Oct 8 2019, 4:01 PM · Storage manager

ardumont updated the diff for D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Fix ci mypy which is not happy, local mypy was ok

Oct 8 2019, 4:00 PM · Storage manager

ardumont added a comment to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

BUILD has failed

Oct 8 2019, 3:57 PM · Storage manager

Harbormaster failed remote builds in B8179: Diff 7009 for D2085: swh.storage.buffer: Add buffering proxy storage implementation!

Oct 8 2019, 3:55 PM · Storage manager

swh-public-ci added a comment to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Build has FAILED

Oct 8 2019, 3:55 PM · Storage manager

ardumont updated the diff for D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Fix code-block syntax
Add some more assertions on storage writing or not

Oct 8 2019, 3:54 PM · Storage manager

swh-public-ci added a comment to D2084: swh.storage.filter: Add filtering storage implementation.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/673/ for more details.

Oct 8 2019, 3:40 PM · Storage manager

ardumont updated the diff for D2084: swh.storage.filter: Add filtering storage implementation.

Adapt according to review and mypy check

Oct 8 2019, 3:35 PM · Storage manager

ardumont added a comment to D2084: swh.storage.filter: Add filtering storage implementation.

In D2084#48380, @vlorentz wrote:

Accepting, because @ardumont promised to fix it on IRC ;)

Oct 8 2019, 3:13 PM · Storage manager

vlorentz accepted D2084: swh.storage.filter: Add filtering storage implementation.

Accepting, because @ardumont promised to fix it on IRC ;)

Oct 8 2019, 3:13 PM · Storage manager

vlorentz added inline comments to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Oct 8 2019, 3:10 PM · Storage manager

vlorentz requested changes to D2084: swh.storage.filter: Add filtering storage implementation.

Actually, I still have some nitpicking to do: instead of having sample_data as a large fixture returning a dict of various value types; could you split it into smaller fixtures, each returning a single type of values?

Oct 8 2019, 3:07 PM · Storage manager
