
Jun 25 2021

vlorentz added a parent task for T3357: Perform some tests of the cassandra storage on Grid5000: T1892: Cassandra as a storage backend.
Jun 25 2021, 4:32 PM · System administration, Storage manager
vlorentz added a subtask for T1892: Cassandra as a storage backend: T3357: Perform some tests of the cassandra storage on Grid5000.
Jun 25 2021, 4:32 PM · meta-task, Storage manager
vlorentz added a comment to T3396: cassandra - allow to configure the consistency level used by the queries.

I'm not quite sure what this is about.

Jun 25 2021, 1:53 PM · System administration, Storage manager

Jun 22 2021

vsellier added a comment to T3357: Perform some tests of the cassandra storage on Grid5000.

A table of the possible node counts for each replication factor was added to the HedgeDoc document: https://hedgedoc.softwareheritage.org/m2MBUViUQl2r9dwcq3-_Nw?both

Jun 22 2021, 9:47 AM · System administration, Storage manager

Jun 18 2021

vsellier renamed T3396: cassandra - allow to configure the consistency level used by the queries from cassandra - allow to configure the consitency level used by the queries to cassandra - allow to configure the consistency level used by the queries.
Jun 18 2021, 7:24 PM · System administration, Storage manager
vsellier renamed T3396: cassandra - allow to configure the consistency level used by the queries from cassandra - allow to configure the consitency level used for the queries to cassandra - allow to configure the consitency level used by the queries.
Jun 18 2021, 5:22 PM · System administration, Storage manager
vsellier updated subscribers of T3396: cassandra - allow to configure the consistency level used by the queries.

@vlorentz If you have an idea of how to implement that, I'll take it ;) I'm not sure whether I've missed something.

Jun 18 2021, 5:22 PM · System administration, Storage manager
vsellier triaged T3396: cassandra - allow to configure the consistency level used by the queries as Normal priority.
Jun 18 2021, 5:19 PM · System administration, Storage manager
vsellier updated the task description for T3395: cassandra - Timeouts during revision import.
Jun 18 2021, 4:57 PM · System administration, Storage manager
vsellier triaged T3395: cassandra - Timeouts during revision import as Normal priority.
Jun 18 2021, 4:57 PM · System administration, Storage manager
vsellier updated the task description for T3394: cassandra - origin url hashing encoding issue.
Jun 18 2021, 4:49 PM · System administration, Storage manager
vsellier triaged T3394: cassandra - origin url hashing encoding issue as Normal priority.
Jun 18 2021, 4:49 PM · System administration, Storage manager
vsellier added a comment to T3357: Perform some tests of the cassandra storage on Grid5000.

Several tests were executed with Cassandra nodes on the parasilo cluster [1].
The configuration was kept the same across runs to calibrate them:

  • ZFS is used to manage the datasets
  • the commitlogs are on the 200 GB SSD drive
  • the data is on the four 600 GB HDDs configured in RAID0
  • default memory configuration (8 GB / default GC (not G1))
  • Cassandra configuration: [2]
Jun 18 2021, 4:44 PM · System administration, Storage manager

Jun 16 2021

vsellier added a comment to T3357: Perform some tests of the cassandra storage on Grid5000.

Some notes on how to perform common actions with cassandra: https://hedgedoc.softwareheritage.org/m2MBUViUQl2r9dwcq3-_Nw

Jun 16 2021, 11:09 AM · System administration, Storage manager

Jun 15 2021

vsellier added a comment to T3357: Perform some tests of the cassandra storage on Grid5000.

The environment can be stopped and rebuilt as long as the disks remain reserved on the servers.

Jun 15 2021, 10:50 AM · System administration, Storage manager
vsellier updated the task description for T3357: Perform some tests of the cassandra storage on Grid5000.
Jun 15 2021, 10:29 AM · System administration, Storage manager
vsellier updated the task description for T3357: Perform some tests of the cassandra storage on Grid5000.
Jun 15 2021, 10:29 AM · System administration, Storage manager

Jun 10 2021

vsellier added a comment to T3357: Perform some tests of the cassandra storage on Grid5000.

Some status about the automation:

  • Cassandra nodes are ok (OS installation, ZFS configuration according to the defined environment, startup, cluster configuration), except for a problem during the first initialization with new disks
  • swh-storage node is ok (os installation, gunicorn/swh-storage installation and startup)
  • cassandra database initialization :
root@parasilo-3:~#  nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  172.16.97.3  78.85 KiB   256     31.6%             49d46dd8-4640-45eb-9d4c-b6b16fc954ab  rack1
UN  172.16.97.5  105.45 KiB  256     26.0%             47e99bb4-4846-4e03-a06c-53ea2862172d  rack1
UN  172.16.97.4  98.35 KiB   256     18.1%             e2aeff29-c89a-4c7a-9352-77aaf78e91b3  rack1
UN  172.16.97.2  78.85 KiB   256     24.3%             edd1b72b-4c35-44bd-b7e5-316f41a156c4  rack1
root@parasilo-3:~# cqlsh 172.16.97.3
Connected to swh-storage at 172.16.97.3:9042
[cqlsh 6.0.0 | Cassandra 4.0 | CQL spec 3.4.5 | Native protocol v5]
cqlsh> desc KEYSPACES
Jun 10 2021, 7:02 PM · System administration, Storage manager

Jun 3 2021

vsellier updated subscribers of T3357: Perform some tests of the cassandra storage on Grid5000.

I played with Grid5000 to see how the jobs work and how to initialize the reserved nodes.

Jun 3 2021, 7:30 PM · System administration, Storage manager
ardumont moved T3357: Perform some tests of the cassandra storage on Grid5000 from Backlog to in-progress on the System administration board.
Jun 3 2021, 6:19 PM · System administration, Storage manager

Jun 2 2021

vsellier changed the status of T3357: Perform some tests of the cassandra storage on Grid5000 from Open to Work in Progress.
Jun 2 2021, 6:25 PM · System administration, Storage manager

May 26 2021

vlorentz changed the status of T2564: migrate existing revisions metadata extra_headers to actual extra_headers field, a subtask of T3089: Remove the 'metadata' column of the 'revision' table, from Work in Progress to Open.
May 26 2021, 11:26 AM · Storage manager, Archive content
vlorentz changed the status of T2564: migrate existing revisions metadata extra_headers to actual extra_headers field from Work in Progress to Open.
May 26 2021, 11:26 AM · System administration, Storage manager
vlorentz added a comment to T2564: migrate existing revisions metadata extra_headers to actual extra_headers field.

Script: https://forge.softwareheritage.org/source/snippets/browse/master/vlorentz/migrate_extra_headers.py

May 26 2021, 11:25 AM · System administration, Storage manager
vlorentz closed T2602: Investigate how to upgrade the schema of the Cassandra storage, a subtask of T2214: Scale-out graph and database storage in production, as Resolved.
May 26 2021, 11:25 AM · meta-task, Roadmap 2022, Roadmap 2021, Storage manager
vlorentz closed T2602: Investigate how to upgrade the schema of the Cassandra storage, a subtask of T1892: Cassandra as a storage backend, as Resolved.
May 26 2021, 11:25 AM · meta-task, Storage manager
vlorentz closed T2602: Investigate how to upgrade the schema of the Cassandra storage as Resolved.
May 26 2021, 11:25 AM · Storage manager
vlorentz closed T3314: Test swh.storage.cassandra with ScyllaDB, a subtask of T1892: Cassandra as a storage backend, as Resolved.
May 26 2021, 11:24 AM · meta-task, Storage manager
vlorentz closed T3314: Test swh.storage.cassandra with ScyllaDB as Resolved.
May 26 2021, 11:24 AM · Storage manager
vlorentz added a revision to T3314: Test swh.storage.cassandra with ScyllaDB: D5750: cassandra: Add support for ScyllaDB.
May 26 2021, 11:24 AM · Storage manager

May 19 2021

vlorentz added a project to T3333: Document the different storage backends: Documentation.
May 19 2021, 11:12 AM · Documentation, Storage manager
douardda triaged T3333: Document the different storage backends as Normal priority.
May 19 2021, 10:57 AM · Documentation, Storage manager

May 7 2021

vlorentz claimed T3314: Test swh.storage.cassandra with ScyllaDB.
May 7 2021, 12:07 PM · Storage manager
vlorentz added a subtask for T1892: Cassandra as a storage backend: T3314: Test swh.storage.cassandra with ScyllaDB.
May 7 2021, 12:06 PM · meta-task, Storage manager
vlorentz added a parent task for T3314: Test swh.storage.cassandra with ScyllaDB: T1892: Cassandra as a storage backend.
May 7 2021, 12:06 PM · Storage manager
vlorentz triaged T3314: Test swh.storage.cassandra with ScyllaDB as Normal priority.
May 7 2021, 12:05 PM · Storage manager

May 3 2021

anlambert closed T1117: Origin search is *slow* when you look for very common words as Resolved.

Closing this as resolved now that the search feature is using Elasticsearch in production.

May 3 2021, 1:05 PM · Web app, Storage manager

Apr 28 2021

vlorentz changed the status of T2564: migrate existing revisions metadata extra_headers to actual extra_headers field, a subtask of T3089: Remove the 'metadata' column of the 'revision' table, from Open to Work in Progress.
Apr 28 2021, 12:43 PM · Storage manager, Archive content
vlorentz changed the status of T2564: migrate existing revisions metadata extra_headers to actual extra_headers field from Open to Work in Progress.
Apr 28 2021, 12:43 PM · System administration, Storage manager

Apr 26 2021

vlorentz claimed T2564: migrate existing revisions metadata extra_headers to actual extra_headers field.
Apr 26 2021, 10:30 AM · System administration, Storage manager
vlorentz added revisions to T2602: Investigate how to upgrade the schema of the Cassandra storage: D5584: cassandra: Add a test of a 'complex' migration, with a PK update, D5582: cassandra: Add 'allow_overwrite' option, to allow updating objects.
Apr 26 2021, 10:25 AM · Storage manager

Apr 23 2021

vlorentz assigned T3135: Improve integrity of ingested content to olasd.
Apr 23 2021, 4:50 PM · Storage manager, Roadmap 2021, meta-task
vlorentz moved T2214: Scale-out graph and database storage in production from Backlog to Work in progress on the Roadmap 2021 board.
Apr 23 2021, 4:47 PM · meta-task, Roadmap 2022, Roadmap 2021, Storage manager
vlorentz added a parent task for T2564: migrate existing revisions metadata extra_headers to actual extra_headers field: T3089: Remove the 'metadata' column of the 'revision' table.
Apr 23 2021, 9:58 AM · System administration, Storage manager
vlorentz added a subtask for T3089: Remove the 'metadata' column of the 'revision' table: T2564: migrate existing revisions metadata extra_headers to actual extra_headers field.
Apr 23 2021, 9:58 AM · Storage manager, Archive content

Apr 21 2021

ardumont moved T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161) from deployed/landed/monitoring to done on the System administration board.
Apr 21 2021, 6:58 PM · System administration, Storage manager

Apr 19 2021

vlorentz added a comment to T2602: Investigate how to upgrade the schema of the Cassandra storage.

I just discussed the multiplexer-based migration process I described above with ardumont/olasd/vsellier.

Apr 19 2021, 3:22 PM · Storage manager
vlorentz added a comment to T2602: Investigate how to upgrade the schema of the Cassandra storage.

Doesn't this deserve a state-of-the-art kind of thing?

Apr 19 2021, 3:22 PM · Storage manager
douardda added a comment to T2602: Investigate how to upgrade the schema of the Cassandra storage.

Doesn't this deserve a state-of-the-art kind of thing? Is there documentation on the subject? How do other (big) Cassandra users handle this?

Apr 19 2021, 2:14 PM · Storage manager
olasd added a comment to T2602: Investigate how to upgrade the schema of the Cassandra storage.

For the harder cases, that involve changes to the PK, we could do something like this:

  • create a new table with a new name (eg. revision_v[n+1]; like we do in swh-search except Cassandra does not support aliases)
  • start an extra storage backend, that reads from that table instead of the old one (eg. revision_v[n]), and also reads from all the other tables as usual
  • have a multiplexing storage proxy (like we have for the objstorage), that queries this new backend (which reads from v[n+1]), and falls back to the old backend (which reads from v[n])
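
The fallback behaviour of such a proxy could be sketched roughly as follows. This is only an illustration of the idea, not swh-storage code: `DictBackend`, `FallbackProxy` and the `revision_get` signature are hypothetical stand-ins, and the two dicts play the role of the revision_v[n+1] and revision_v[n] tables.

```python
from typing import Dict, Optional, Sequence


class DictBackend:
    """Hypothetical stand-in for a storage backend reading one table version."""

    def __init__(self, rows: Dict[str, dict]) -> None:
        self._rows = rows

    def revision_get(self, revision_id: str) -> Optional[dict]:
        # Return the stored row, or None when this backend does not have it.
        return self._rows.get(revision_id)


class FallbackProxy:
    """Hypothetical proxy: query backends in order and return the first hit."""

    def __init__(self, backends: Sequence[DictBackend]) -> None:
        self._backends = list(backends)

    def revision_get(self, revision_id: str) -> Optional[dict]:
        for backend in self._backends:
            row = backend.revision_get(revision_id)
            if row is not None:
                return row
        # No backend knows this object.
        return None


# "old" plays the role of the backend reading v[n]; "new" reads v[n+1].
old = DictBackend({"a": {"id": "a", "schema": "v1"}, "b": {"id": "b", "schema": "v1"}})
new = DictBackend({"b": {"id": "b", "schema": "v2"}})

# The new backend is listed first, so migrated rows win over the old ones.
proxy = FallbackProxy([new, old])
```

With this ordering, rows that have already been migrated are served from the new table, while everything else is still answered by the old one until the migration completes.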
Apr 19 2021, 1:59 PM · Storage manager
vlorentz removed a parent task for T3089: Remove the 'metadata' column of the 'revision' table: T2471: NPM package angular-ts-manage fails to be properly loaded.
Apr 19 2021, 12:43 PM · Storage manager, Archive content

Apr 16 2021

vlorentz added a comment to T2602: Investigate how to upgrade the schema of the Cassandra storage.

What we can do, however:

Apr 16 2021, 1:45 PM · Storage manager
vlorentz added a subtask for T1892: Cassandra as a storage backend: T2602: Investigate how to upgrade the schema of the Cassandra storage.
Apr 16 2021, 1:36 PM · meta-task, Storage manager
vlorentz added a parent task for T2602: Investigate how to upgrade the schema of the Cassandra storage: T1892: Cassandra as a storage backend.
Apr 16 2021, 1:36 PM · Storage manager

Apr 15 2021

vlorentz placed T3018: Allow querying raw_extrinsic_metadata by hash in swh-storage up for grabs.
Apr 15 2021, 3:17 PM · Storage manager, Extrinsic metadata
vlorentz added a parent task for T2564: migrate existing revisions metadata extra_headers to actual extra_headers field: T3090: Make loaders not rely on the 'metadata' column of the 'revision' table.
Apr 15 2021, 3:15 PM · System administration, Storage manager
vlorentz closed T3090: Make loaders not rely on the 'metadata' column of the 'revision' table, a subtask of T3089: Remove the 'metadata' column of the 'revision' table, as Resolved.
Apr 15 2021, 3:15 PM · Storage manager, Archive content
vlorentz closed T3142: Make loaders write to the ExtId storage, a subtask of T3143: Migrate revision metadata to extid in the storage, as Resolved.
Apr 15 2021, 3:15 PM · System administration, Storage manager, Core Loader

Apr 14 2021

KShivendu closed T2316: Align row deduplication of all _add endpoints on release_add as Resolved.
Apr 14 2021, 5:59 PM · Easy hack, Storage manager

Apr 12 2021

olasd updated the task description for T3245: List all the objects that should be impacted by a given takedown request.
Apr 12 2021, 4:24 PM · Storage manager
olasd changed the status of T3245: List all the objects that should be impacted by a given takedown request from Open to Work in Progress.
Apr 12 2021, 4:24 PM · Storage manager

Apr 9 2021

anlambert added a comment to T3145: Docs : Postgres DB schema missing .

Schema image is now properly displayed: https://docs.softwareheritage.org/devel/swh-storage/sql-storage.html#sql-storage

Apr 9 2021, 3:17 PM · Storage manager, Documentation
ardumont closed T3145: Docs : Postgres DB schema missing as Resolved.
Apr 9 2021, 2:23 PM · Storage manager, Documentation
ardumont added a comment to T3145: Docs : Postgres DB schema missing .

Thanks @faux @KShivendu @anlambert, team work ;)

Apr 9 2021, 2:23 PM · Storage manager, Documentation
ardumont merged T3227: DB Schema link broken in docs under swh-storage. into T3145: Docs : Postgres DB schema missing .
Apr 9 2021, 2:22 PM · Storage manager, Documentation

Apr 6 2021

vlorentz merged task T3185: Migrate extrinsic metadata from 'revision' to 'raw_extrinsic_metadata' tables into T2513: Copy metadata on revisions to the extrinsic metadata storage.
Apr 6 2021, 5:14 PM · System administration, Storage manager
vlorentz added a comment to T3143: Migrate revision metadata to extid in the storage.

if you remember the crash times (.zsh_history?), we could find a range of candidate SWHIDs...

Apr 6 2021, 5:12 PM · System administration, Storage manager, Core Loader
olasd closed T3143: Migrate revision metadata to extid in the storage as Resolved.

The migration script has now run to completion (took around a week).

Apr 6 2021, 4:53 PM · System administration, Storage manager, Core Loader
olasd added a revision to T3143: Migrate revision metadata to extid in the storage: D5430: Add sha512 as a valid field in dsc metadata.
Apr 6 2021, 4:48 PM · System administration, Storage manager, Core Loader
vlorentz added a parent task for T3089: Remove the 'metadata' column of the 'revision' table: T3201: Mirror: unsupported Unicode escape sequence.
Apr 6 2021, 2:20 PM · Storage manager, Archive content
vlorentz added a comment to T1487: Add a public API endpoint to retrieve a set of files with a given name.

@KShivendu The linked script is a start. As it is, it requires direct access to the DB; so you need to create abstractions for it in swh-storage and swh-web

Apr 6 2021, 12:50 PM · Easy hack, Storage manager, Object storage
vlorentz closed T1377: in-memory storage: compute all counters as Resolved.

ok, thanks. It's actually tested in test_stat_counters in swh-storage/swh/storage/tests/storage_tests.py, which is used to test all four classes.

Apr 6 2021, 12:47 PM · Easy hack, Storage manager

Apr 5 2021

KShivendu added a comment to T1487: Add a public API endpoint to retrieve a set of files with a given name.

Hi guys. Any pointers on where to start?

Apr 5 2021, 1:57 PM · Easy hack, Storage manager, Object storage
KShivendu added a comment to T1377: in-memory storage: compute all counters.

I might be wrong, but I think it has been completed. Check these out:

Apr 5 2021, 12:24 PM · Easy hack, Storage manager

Apr 3 2021

vlorentz closed T2290: Implement origin_metadata endpoints in swh/storage/cassandra/ as Resolved.

No longer relevant

Apr 3 2021, 9:06 AM · Easy hack, Storage manager

Apr 1 2021

vlorentz updated the task description for T1892: Cassandra as a storage backend.
Apr 1 2021, 11:48 AM · meta-task, Storage manager
vlorentz updated the task description for T1892: Cassandra as a storage backend.
Apr 1 2021, 11:48 AM · meta-task, Storage manager
vlorentz updated the task description for T1892: Cassandra as a storage backend.
Apr 1 2021, 11:48 AM · meta-task, Storage manager
vlorentz added a subtask for T1117: Origin search is *slow* when you look for very common words: T2590: Finish the indexer -> swh-search pipeline.
Apr 1 2021, 10:51 AM · Web app, Storage manager

Mar 30 2021

olasd changed the status of T3143: Migrate revision metadata to extid in the storage from Open to Work in Progress.
Mar 30 2021, 7:43 PM · System administration, Storage manager, Core Loader
olasd added a comment to T3143: Migrate revision metadata to extid in the storage.

I've deployed the extid schema changes on all storages, and I've started the migration script on getty.

Mar 30 2021, 7:42 PM · System administration, Storage manager, Core Loader
vsellier added a project to T3143: Migrate revision metadata to extid in the storage: System administration.
Mar 30 2021, 5:26 PM · System administration, Storage manager, Core Loader
vlorentz added a project to T3143: Migrate revision metadata to extid in the storage: Storage manager.
Mar 30 2021, 4:57 PM · System administration, Storage manager, Core Loader

Mar 29 2021

vlorentz renamed T3185: Migrate extrinsic metadata from 'revision' to 'raw_extrinsic_metadata' tables from Migrate extrinsic metadata to Migrate extrinsic metadata from 'revision' to 'raw_extrinsic_metadata' tables.
Mar 29 2021, 4:06 PM · System administration, Storage manager
vlorentz triaged T3185: Migrate extrinsic metadata from 'revision' to 'raw_extrinsic_metadata' tables as Normal priority.
Mar 29 2021, 4:05 PM · System administration, Storage manager

Mar 25 2021

vlorentz closed T1910: Redesign origin search using a dedicated component (swh-search), a subtask of T1117: Origin search is *slow* when you look for very common words, as Resolved.
Mar 25 2021, 11:16 AM · Web app, Storage manager
vlorentz closed T1910: Redesign origin search using a dedicated component (swh-search) as Resolved.
Mar 25 2021, 11:16 AM · Archive search, Storage manager
vlorentz closed T1910: Redesign origin search using a dedicated component (swh-search), a subtask of T1892: Cassandra as a storage backend, as Resolved.
Mar 25 2021, 11:16 AM · meta-task, Storage manager

Mar 23 2021

vlorentz added a comment to T2686: Use hashes for all kafka keys.

(and we should keep the origin topic; we already have an ExtSWHID for origins anyway)

Mar 23 2021, 2:55 PM · Data Model, Storage manager
olasd added a comment to T2686: Use hashes for all kafka keys.

The following objects remain:

Mar 23 2021, 2:47 PM · Data Model, Storage manager
vlorentz closed T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects, a subtask of T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases, as Resolved.
Mar 23 2021, 2:33 PM · Package Loader, Storage manager, Extrinsic metadata
vlorentz closed T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects, a subtask of T2686: Use hashes for all kafka keys, as Resolved.
Mar 23 2021, 2:33 PM · Data Model, Storage manager
vlorentz closed T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects as Resolved.
Mar 23 2021, 2:33 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz closed T3017: Use hashes as keys in swh.journal.objects.raw_extrinsic_metadata, a subtask of T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects, as Resolved.
Mar 23 2021, 2:33 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz closed T3017: Use hashes as keys in swh.journal.objects.raw_extrinsic_metadata as Resolved.
Mar 23 2021, 2:33 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz closed T3020: Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra, a subtask of T3022: Deduplicate RawExtrinsicMetadata by hash instead of a subset of their fields, as Resolved.
Mar 23 2021, 2:32 PM · Storage manager, Extrinsic metadata
vlorentz closed T3020: Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra, a subtask of T3018: Allow querying raw_extrinsic_metadata by hash in swh-storage, as Resolved.
Mar 23 2021, 2:32 PM · Storage manager, Extrinsic metadata
vlorentz closed T3020: Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra as Resolved.
Mar 23 2021, 2:32 PM · Storage manager, Extrinsic metadata
olasd closed T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql, a subtask of T3018: Allow querying raw_extrinsic_metadata by hash in swh-storage, as Resolved.
Mar 23 2021, 2:31 PM · Storage manager, Extrinsic metadata
olasd closed T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql as Resolved.

After a lot of back and forth, and the release of swh.model v2.3.0 and swh.storage v0.26.0, this is now all done and deployed in staging and production.

Mar 23 2021, 2:31 PM · Storage manager, Extrinsic metadata