
Jan 22 2020

vlorentz renamed T1892: Cassandra as a storage backend from Cassandra as a storage backend (meta-task) to Cassandra as a storage backend.
Jan 22 2020, 4:23 PM · meta-task, Storage manager
vlorentz updated the task description for T2243: Add Debian package python3-cassandra.
Jan 22 2020, 3:22 PM · Storage manager
vlorentz triaged T2243: Add Debian package python3-cassandra as Normal priority.
Jan 22 2020, 3:16 PM · Storage manager
vlorentz closed T1731: Intrinsic identifiers for origins as Resolved.

In addition, we will need to modify storage to store and allow retrieval of hashed origin URLs.

Jan 22 2020, 2:18 PM · Storage manager, Data Model
vlorentz closed T1731: Intrinsic identifiers for origins, a subtask of T1892: Cassandra as a storage backend, as Resolved.
Jan 22 2020, 2:18 PM · meta-task, Storage manager

Jan 21 2020

ardumont updated the task description for T2239: storage: kafka issue: Can't pickle <class 'cimpl.KafkaException'>: import of module 'cimpl' failed.
Jan 21 2020, 11:37 AM · Storage manager
ardumont triaged T2239: storage: kafka issue: Can't pickle <class 'cimpl.KafkaException'>: import of module 'cimpl' failed as Normal priority.
Jan 21 2020, 11:36 AM · Storage manager
ardumont added a project to T2239: storage: kafka issue: Can't pickle <class 'cimpl.KafkaException'>: import of module 'cimpl' failed: Storage manager.
Jan 21 2020, 11:36 AM · Storage manager

Jan 15 2020

vlorentz added a parent task for T2183: Switch webapp0 to use swh-search instead of postgresql search.: T2185: Make webapp0 use Cassandra as storage backend..
Jan 15 2020, 3:08 PM · Archive search, Storage manager
vlorentz added a subtask for T2185: Make webapp0 use Cassandra as storage backend.: T2183: Switch webapp0 to use swh-search instead of postgresql search..
Jan 15 2020, 3:08 PM · Storage manager
vlorentz triaged T2186: Merge swh-storage-cassandra in swh-storage master as Normal priority.
Jan 15 2020, 3:06 PM · Storage manager
vlorentz updated the task description for T2185: Make webapp0 use Cassandra as storage backend..
Jan 15 2020, 3:05 PM · Storage manager
vlorentz triaged T2185: Make webapp0 use Cassandra as storage backend. as Normal priority.
Jan 15 2020, 3:05 PM · Storage manager
vlorentz changed the status of T2184: Replay origins to ElasticSearch's "origin" index from Open to Work in Progress.
Jan 15 2020, 3:02 PM · Archive search, Storage manager
vlorentz changed the status of T2184: Replay origins to ElasticSearch's "origin" index, a subtask of T2183: Switch webapp0 to use swh-search instead of postgresql search., from Open to Work in Progress.
Jan 15 2020, 3:02 PM · Archive search, Storage manager
vlorentz triaged T2184: Replay origins to ElasticSearch's "origin" index as Normal priority.
Jan 15 2020, 3:02 PM · Archive search, Storage manager
vlorentz added a subtask for T2183: Switch webapp0 to use swh-search instead of postgresql search.: T2167: Deploy swh-search.
Jan 15 2020, 3:01 PM · Archive search, Storage manager
vlorentz triaged T2183: Switch webapp0 to use swh-search instead of postgresql search. as Normal priority.
Jan 15 2020, 3:01 PM · Archive search, Storage manager
vlorentz triaged T2182: Switch production swh-web to use swh-search instead of postgresql search. as Normal priority.
Jan 15 2020, 3:01 PM · System administration, Archive search, Storage manager

Jan 14 2020

vlorentz removed a subtask for T2033: Run Cassandra storage backend with production data: T2034: Unbreak journal clients.
Jan 14 2020, 3:10 PM · Storage manager

Jan 8 2020

vlorentz added a comment to T2053: support graph export for the cassandra backend.

I just ran it on Azure. It has a different schema (the "revision" table was split into "revision" and "revision_parent"), so the benchmarks are not exactly comparable.
I still use 16 workers, all running on the same machine, and with no compression.

Jan 8 2020, 4:54 PM · Compressed graph service, Storage manager

Dec 20 2019

olasd changed the status of T2033: Run Cassandra storage backend with production data, a subtask of T1892: Cassandra as a storage backend, from Open to Work in Progress.
Dec 20 2019, 6:56 PM · meta-task, Storage manager
olasd changed the status of T2033: Run Cassandra storage backend with production data from Open to Work in Progress.

A cassandra cluster has been deployed on cassandra[01-06].euwest.azure.internal.softwareheritage.org.

Dec 20 2019, 6:56 PM · Storage manager
vlorentz changed the status of T1910: Redesign origin search using a dedicated component (swh-search) from Open to Work in Progress.
Dec 20 2019, 3:26 PM · Archive search, Storage manager
vlorentz changed the status of T1910: Redesign origin search using a dedicated component (swh-search), a subtask of T1892: Cassandra as a storage backend, from Open to Work in Progress.
Dec 20 2019, 3:26 PM · meta-task, Storage manager

Nov 27 2019

zack updated subscribers of T1926: FUSE filesystem to navigate the archive.

proposed CLI interface:

swh [ -C config.yml ] graph mount PID DIR

will mount the content of the given PID to the given local DIR.
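
A mount command like the one proposed above would presumably need to check its PID argument before doing anything else. The following is a minimal sketch of such a check, not the actual swh implementation: the function name `parse_pid` is hypothetical, and the accepted type tags are an assumption (`ori` is included because `swh:1:ori:...` PIDs appear elsewhere in this feed).

```python
import re

# Assumed PID layout: "swh:1:<type>:<40 lowercase hex digits>".
# The list of type tags below is an assumption made for this sketch.
PID_RE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp|ori):([0-9a-f]{40})$")


def parse_pid(pid: str) -> tuple:
    """Split a PID into (object type, hex identifier), or raise ValueError."""
    match = PID_RE.match(pid)
    if match is None:
        raise ValueError(f"not a valid PID: {pid!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_pid("swh:1:rev:" + "a" * 40))
```

Rejecting malformed PIDs up front would let the command fail with a clear message before touching the mount point.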

Nov 27 2019, 3:59 PM · Software Heritage filesystem

Nov 25 2019

olasd closed T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs as Resolved by committing rDSTO2cac3392eb49: Implement origin lookup by sha1.
Nov 25 2019, 3:30 PM · Compressed graph service, Storage manager
vlorentz added a revision to T1912: Support origin pagination without origin ids: D2343: Add Storage.content_get_partition endpoint, to replace content_get_range..
Nov 25 2019, 11:49 AM · Web app, Storage manager

Nov 22 2019

olasd added a revision to T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs: D2346: Implement origin lookup by sha1.
Nov 22 2019, 6:29 PM · Compressed graph service, Storage manager
olasd added a comment to T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs.

Launched on somerset:

Nov 22 2019, 5:40 PM · Compressed graph service, Storage manager
olasd added a comment to T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs.

We should consider just adding a btree index on sha1(url) and see where that takes us.
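
As a rough illustration of the idea, the sketch below uses an in-memory dict in place of a database index on sha1(url). It assumes that the hexadecimal part of a `swh:1:ori:...` PID is the SHA-1 of the UTF-8 encoded origin URL; the helper names and example URLs are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Optional

ORIGIN_PREFIX = "swh:1:ori:"


def origin_sha1(url: str) -> str:
    """Hex SHA-1 of the origin URL (assumed to be UTF-8 encoded)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def build_index(urls: Iterable[str]) -> Dict[str, str]:
    """Map sha1(url) -> url; a dict stands in for the database index."""
    return {origin_sha1(url): url for url in urls}


def lookup(index: Dict[str, str], pid: str) -> Optional[str]:
    """Return the origin URL for an origin PID, or None if unknown."""
    if not pid.startswith(ORIGIN_PREFIX):
        raise ValueError(f"not an origin PID: {pid!r}")
    return index.get(pid[len(ORIGIN_PREFIX):])


urls = ["https://example.org/repo.git", "https://example.org/other.git"]
index = build_index(urls)
pid = ORIGIN_PREFIX + origin_sha1(urls[0])
print(lookup(index, pid))
```

In a real database the index would be computed by the server, so the reverse lookup becomes a single indexed equality query instead of a full scan over origin URLs.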

Nov 22 2019, 5:37 PM · Compressed graph service, Storage manager
vlorentz added a comment to T2053: support graph export for the cassandra backend.

Forgot to mention it here, but it's done now.

Nov 22 2019, 10:55 AM · Compressed graph service, Storage manager

Nov 21 2019

vlorentz added a revision to T1912: Support origin pagination without origin ids: D2324: Add endpoint 'origin_list', that will replace 'origin_get_range'..
Nov 21 2019, 3:15 PM · Web app, Storage manager

Nov 18 2019

zack raised the priority of T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs from Normal to High.
Nov 18 2019, 5:50 PM · Compressed graph service, Storage manager
zack added a project to T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs: Compressed graph service.
Nov 18 2019, 5:50 PM · Compressed graph service, Storage manager

Nov 14 2019

vlorentz reopened T2034: Unbreak journal clients, a subtask of T2033: Run Cassandra storage backend with production data, as Open.
Nov 14 2019, 3:41 PM · Storage manager
ardumont closed D2272: swh.storage.schemata: Drop schemata from storage.
Nov 14 2019, 1:31 PM · Storage manager, Lister
olasd accepted D2272: swh.storage.schemata: Drop schemata from storage.
Nov 14 2019, 12:12 PM · Storage manager, Lister
ardumont added a comment to T2076: Add tests for SQL migrations.

+1 on this

Nov 14 2019, 11:09 AM · Storage manager
swh-public-ci added a comment to D2272: swh.storage.schemata: Drop schemata from storage.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/778/ for more details.

Nov 14 2019, 10:40 AM · Storage manager, Lister
ardumont updated the summary of D2272: swh.storage.schemata: Drop schemata from storage.
Nov 14 2019, 10:39 AM · Storage manager, Lister

Nov 12 2019

olasd added a comment to T2076: Add tests for SQL migrations.

https://github.com/omniti-labs/pg_extractor is a decent way to extract the schema from a running database into something that's easily diffable

Nov 12 2019, 7:02 PM · Storage manager

Nov 11 2019

vlorentz added a revision to T2019: race condition during concurrent loading of the same objects from multiple origins: D2248: [WIP] In case of race condition in content_add, raise SerializationFailure instead of HashCollision..
Nov 11 2019, 12:08 AM · Storage manager

Nov 8 2019

vlorentz triaged T2076: Add tests for SQL migrations as Normal priority.
Nov 8 2019, 5:32 PM · Storage manager
vlorentz added a subtask for T2075: Implement metadata authority specification: T1737: Define and specify metadata providers.
Nov 8 2019, 5:20 PM · Storage manager, Metadata workflow
vlorentz triaged T2075: Implement metadata authority specification as Normal priority.
Nov 8 2019, 5:20 PM · Storage manager, Metadata workflow
vlorentz triaged T2074: Publish extrinsic metadata to swh-journal/Kafka as Normal priority.
Nov 8 2019, 5:18 PM · Storage manager, Journal, Metadata workflow
vlorentz claimed T2053: support graph export for the cassandra backend.
Nov 8 2019, 11:52 AM · Compressed graph service, Storage manager

Nov 7 2019

vlorentz added a comment to T2053: support graph export for the cassandra backend.

Probably not. I'm working on adding support for other objects.

Nov 7 2019, 5:24 PM · Compressed graph service, Storage manager
zack added a comment to T2053: support graph export for the cassandra backend.

Added parallelism. 450k/s with 16 workers and no compression. I won't try with 32 workers because Python processes would use too much CPU on my machine.

Nov 7 2019, 5:18 PM · Compressed graph service, Storage manager
vlorentz added a comment to T2053: support graph export for the cassandra backend.

Added parallelism. 450k/s with 16 workers and no compression. I won't try with 32 workers because Python processes would use too much CPU on my machine.
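
The comment does not say how the work was divided among workers. One common way to parallelize a full-table read in Cassandra is to give each worker a contiguous slice of the partitioner's token range; the sketch below assumes the default Murmur3 partitioner (tokens from -2^63 to 2^63 - 1), and the function name is hypothetical.

```python
from typing import List, Tuple

# Token bounds of Cassandra's Murmur3Partitioner.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_range(nb_workers: int) -> List[Tuple[int, int]]:
    """Split the full token range into nb_workers contiguous inclusive ranges."""
    if nb_workers < 1:
        raise ValueError("nb_workers must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 tokens overall
    bounds = [MIN_TOKEN + (total * i) // nb_workers for i in range(nb_workers + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(nb_workers)]


ranges = split_token_range(16)
for start, end in ranges[:2]:
    print(start, end)
```

Each worker could then run a query restricted to its slice, for example `WHERE token(id) >= ? AND token(id) <= ?`, where `id` stands for the table's partition key (a hypothetical column name here).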

Nov 7 2019, 4:47 PM · Compressed graph service, Storage manager
vlorentz added a comment to T2053: support graph export for the cassandra backend.

Throughput improved to 34k/s just by not querying unneeded fields.

Nov 7 2019, 3:29 PM · Compressed graph service, Storage manager

Nov 5 2019

zack added a comment to T2053: support graph export for the cassandra backend.

Looks good, thanks!

Nov 5 2019, 2:05 PM · Compressed graph service, Storage manager
zack updated the task description for T2053: support graph export for the cassandra backend.
Nov 5 2019, 2:00 PM · Compressed graph service, Storage manager
olasd closed T1891: Make 'type' an attribute of origin visits, not origins as Resolved.

This has now been deployed in all components as well as in production databases. Bye bye origin['type'].

Nov 5 2019, 1:18 PM · Web app, Storage manager

Nov 4 2019

vlorentz added a comment to T2053: support graph export for the cassandra backend.

I wrote a prototype for exporting revisions: https://forge.softwareheritage.org/source/snippets/browse/master/vlorentz/cassandra_stream_graph.py

Nov 4 2019, 1:53 PM · Compressed graph service, Storage manager
olasd closed T2052: Publish swh-search on PyPI, a subtask of T1910: Redesign origin search using a dedicated component (swh-search), as Resolved.
Nov 4 2019, 12:35 PM · Archive search, Storage manager
vlorentz closed T2034: Unbreak journal clients, a subtask of T2033: Run Cassandra storage backend with production data, as Resolved.
Nov 4 2019, 12:22 PM · Storage manager

Oct 31 2019

zack triaged T2053: support graph export for the cassandra backend as Normal priority.
Oct 31 2019, 2:09 PM · Compressed graph service, Storage manager

Oct 28 2019

vlorentz updated the task description for T1912: Support origin pagination without origin ids.
Oct 28 2019, 12:01 PM · Web app, Storage manager
vlorentz added a project to T1910: Redesign origin search using a dedicated component (swh-search): Archive search.
Oct 28 2019, 11:46 AM · Archive search, Storage manager

Oct 24 2019

anlambert updated the task description for T1891: Make 'type' an attribute of origin visits, not origins.
Oct 24 2019, 2:16 AM · Web app, Storage manager

Oct 22 2019

anlambert added projects to T1891: Make 'type' an attribute of origin visits, not origins: Storage manager, Web app.
Oct 22 2019, 3:58 PM · Web app, Storage manager

Oct 19 2019

zack triaged T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs as Normal priority.
Oct 19 2019, 2:45 PM · Compressed graph service, Storage manager

Oct 18 2019

olasd closed T49: DB schema: add missing unicity constraint on origin (type, url) as Resolved.

The missing unique constraint has now been added.

Oct 18 2019, 4:36 PM · Restricted Project, Storage manager

Oct 11 2019

vlorentz updated the task description for T1912: Support origin pagination without origin ids.
Oct 11 2019, 2:53 PM · Web app, Storage manager

Oct 10 2019

vlorentz triaged T2034: Unbreak journal clients as High priority.
Oct 10 2019, 12:06 PM · Journal
vlorentz triaged T2033: Run Cassandra storage backend with production data as Low priority.
Oct 10 2019, 12:01 PM · Storage manager
vlorentz added a project to T1912: Support origin pagination without origin ids: Web app.
Oct 10 2019, 11:59 AM · Web app, Storage manager
vlorentz updated the task description for T1892: Cassandra as a storage backend.
Oct 10 2019, 11:59 AM · meta-task, Storage manager
vlorentz updated the task description for T1892: Cassandra as a storage backend.
Oct 10 2019, 11:59 AM · meta-task, Storage manager

Oct 8 2019

swh-public-ci added a comment to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/681/ for more details.

Oct 8 2019, 4:46 PM · Storage manager
ardumont added a comment to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

This revision was not accepted when it landed; it landed in state Needs Review.

Oct 8 2019, 4:44 PM · Storage manager
ardumont closed D2085: swh.storage.buffer: Add buffering proxy storage implementation.
Oct 8 2019, 4:42 PM · Storage manager
ardumont closed D2084: swh.storage.filter: Add filtering storage implementation.
Oct 8 2019, 4:42 PM · Storage manager
ardumont closed D2083: swh.storage: Test get_storage implementation.
Oct 8 2019, 4:42 PM · Storage manager
ardumont updated the diff for D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Adapt according to last missing review points

Oct 8 2019, 4:41 PM · Storage manager
ardumont added a comment to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

You missed this comment:

Oct 8 2019, 4:36 PM · Storage manager
vlorentz accepted D2083: swh.storage: Test get_storage implementation.
Oct 8 2019, 4:32 PM · Storage manager
vlorentz requested changes to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

You missed this comment:

Oct 8 2019, 4:31 PM · Storage manager
swh-public-ci added a comment to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/680/ for more details.

Oct 8 2019, 4:29 PM · Storage manager
ardumont updated the summary of D2083: swh.storage: Test get_storage implementation.
Oct 8 2019, 4:25 PM · Storage manager
swh-public-ci added a comment to D2084: swh.storage.filter: Add filtering storage implementation.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/679/ for more details.

Oct 8 2019, 4:24 PM · Storage manager
ardumont updated the summary of D2083: swh.storage: Test get_storage implementation.
Oct 8 2019, 4:24 PM · Storage manager
ardumont updated the diff for D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Rebase on latest development

Oct 8 2019, 4:17 PM · Storage manager
ardumont updated the diff for D2084: swh.storage.filter: Add filtering storage implementation.

Rebase to latest diff

Oct 8 2019, 4:16 PM · Storage manager
swh-public-ci added a comment to D2084: swh.storage.filter: Add filtering storage implementation.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/676/ for more details.

Oct 8 2019, 4:10 PM · Storage manager
swh-public-ci added a comment to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/675/ for more details.

Oct 8 2019, 4:05 PM · Storage manager
ardumont updated the diff for D2084: swh.storage.filter: Add filtering storage implementation.

Fix commit message

Oct 8 2019, 4:01 PM · Storage manager
ardumont updated the diff for D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Fix ci mypy which is not happy, local mypy was ok

Oct 8 2019, 4:00 PM · Storage manager
ardumont added a comment to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

BUILD has failed

Oct 8 2019, 3:57 PM · Storage manager
Harbormaster failed remote builds in B8179: Diff 7009 for D2085: swh.storage.buffer: Add buffering proxy storage implementation!
Oct 8 2019, 3:55 PM · Storage manager
swh-public-ci added a comment to D2085: swh.storage.buffer: Add buffering proxy storage implementation.

Build has FAILED

Oct 8 2019, 3:55 PM · Storage manager
ardumont updated the diff for D2085: swh.storage.buffer: Add buffering proxy storage implementation.
  • Fix code-block syntax
  • Add some more assertions on storage writing or not
Oct 8 2019, 3:54 PM · Storage manager
swh-public-ci added a comment to D2084: swh.storage.filter: Add filtering storage implementation.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/673/ for more details.

Oct 8 2019, 3:40 PM · Storage manager
ardumont updated the diff for D2084: swh.storage.filter: Add filtering storage implementation.

Adapt according to review and mypy check

Oct 8 2019, 3:35 PM · Storage manager
ardumont added a comment to D2084: swh.storage.filter: Add filtering storage implementation.

Accepting, because @ardumont promised to fix it on IRC ;)

Oct 8 2019, 3:13 PM · Storage manager
vlorentz accepted D2084: swh.storage.filter: Add filtering storage implementation.

Accepting, because @ardumont promised to fix it on IRC ;)

Oct 8 2019, 3:13 PM · Storage manager
vlorentz added inline comments to D2085: swh.storage.buffer: Add buffering proxy storage implementation.
Oct 8 2019, 3:10 PM · Storage manager
vlorentz requested changes to D2084: swh.storage.filter: Add filtering storage implementation.

Actually, I still have some nitpicking to do: instead of having sample_data as a large fixture returning a dict of various value types, could you split it into smaller fixtures, each returning a single type of value?
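
The split suggested here might look roughly like the following pytest sketch. Only `sample_data` is named in the review; the fixture names, helper functions and sample values below are hypothetical and are not taken from the actual diff.

```python
import pytest


def make_sample_contents():
    """Build sample content objects only (hypothetical values)."""
    return [{"data": b"foo", "length": 3, "status": "visible"}]


def make_sample_origins():
    """Build sample origin objects only (hypothetical values)."""
    return [{"url": "https://example.org/repo.git"}]


@pytest.fixture
def sample_contents():
    # One fixture per object type, instead of one large dict of everything.
    return make_sample_contents()


@pytest.fixture
def sample_origins():
    return make_sample_origins()


def test_origins_have_urls(sample_origins):
    # The test requests only the fixture it actually needs.
    assert all("url" in origin for origin in sample_origins)
```

With one fixture per type, each test's signature shows which data it depends on, and a change to one kind of sample data cannot silently affect unrelated tests.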

Oct 8 2019, 3:07 PM · Storage manager