Jun 2 2021

vsellier changed the status of T3357: Perform some tests of the cassandra storage on Grid5000 from Open to Work in Progress.
Jun 2 2021, 6:25 PM · System administration, Storage manager

May 26 2021

vlorentz changed the status of T2564: migrate existing revisions metadata extra_headers to actual extra_headers field, a subtask of T3089: Remove the 'metadata' column of the 'revision' table, from Work in Progress to Open.
May 26 2021, 11:26 AM · Storage manager, Archive content
vlorentz changed the status of T2564: migrate existing revisions metadata extra_headers to actual extra_headers field from Work in Progress to Open.
May 26 2021, 11:26 AM · System administration, Storage manager
vlorentz added a comment to T2564: migrate existing revisions metadata extra_headers to actual extra_headers field.

Script: https://forge.softwareheritage.org/source/snippets/browse/master/vlorentz/migrate_extra_headers.py

May 26 2021, 11:25 AM · System administration, Storage manager
vlorentz closed T2602: Investigate how to upgrade the schema of the Cassandra storage, a subtask of T2214: Scale-out graph and database storage in production, as Resolved.
May 26 2021, 11:25 AM · meta-task, Roadmap 2022, Roadmap 2021, Storage manager
vlorentz closed T2602: Investigate how to upgrade the schema of the Cassandra storage, a subtask of T1892: Cassandra as a storage backend, as Resolved.
May 26 2021, 11:25 AM · meta-task, Storage manager
vlorentz closed T2602: Investigate how to upgrade the schema of the Cassandra storage as Resolved.
May 26 2021, 11:25 AM · Storage manager
vlorentz closed T3314: Test swh.storage.cassandra with ScyllaDB, a subtask of T1892: Cassandra as a storage backend, as Resolved.
May 26 2021, 11:24 AM · meta-task, Storage manager
vlorentz closed T3314: Test swh.storage.cassandra with ScyllaDB as Resolved.
May 26 2021, 11:24 AM · Storage manager
vlorentz added a revision to T3314: Test swh.storage.cassandra with ScyllaDB: D5750: cassandra: Add support for ScyllaDB.
May 26 2021, 11:24 AM · Storage manager

May 19 2021

vlorentz added a project to T3333: Document the different storage backends: Documentation.
May 19 2021, 11:12 AM · Documentation, Storage manager
douardda triaged T3333: Document the different storage backends as Normal priority.
May 19 2021, 10:57 AM · Documentation, Storage manager

May 7 2021

vlorentz claimed T3314: Test swh.storage.cassandra with ScyllaDB.
May 7 2021, 12:07 PM · Storage manager
vlorentz added a subtask for T1892: Cassandra as a storage backend: T3314: Test swh.storage.cassandra with ScyllaDB.
May 7 2021, 12:06 PM · meta-task, Storage manager
vlorentz added a parent task for T3314: Test swh.storage.cassandra with ScyllaDB: T1892: Cassandra as a storage backend.
May 7 2021, 12:06 PM · Storage manager
vlorentz triaged T3314: Test swh.storage.cassandra with ScyllaDB as Normal priority.
May 7 2021, 12:05 PM · Storage manager

May 3 2021

anlambert closed T1117: Origin search is *slow* when you look for very common words as Resolved.

Closing this as resolved now that the search feature uses Elasticsearch in production.

May 3 2021, 1:05 PM · Web app, Storage manager

Apr 28 2021

vlorentz changed the status of T2564: migrate existing revisions metadata extra_headers to actual extra_headers field, a subtask of T3089: Remove the 'metadata' column of the 'revision' table, from Open to Work in Progress.
Apr 28 2021, 12:43 PM · Storage manager, Archive content
vlorentz changed the status of T2564: migrate existing revisions metadata extra_headers to actual extra_headers field from Open to Work in Progress.
Apr 28 2021, 12:43 PM · System administration, Storage manager

Apr 26 2021

vlorentz claimed T2564: migrate existing revisions metadata extra_headers to actual extra_headers field.
Apr 26 2021, 10:30 AM · System administration, Storage manager
vlorentz added revisions to T2602: Investigate how to upgrade the schema of the Cassandra storage: D5584: cassandra: Add a test of a 'complex' migration, with a PK update, D5582: cassandra: Add 'allow_overwrite' option, to allow updating objects.
Apr 26 2021, 10:25 AM · Storage manager

Apr 23 2021

vlorentz assigned T3135: Improve integrity of ingested content to olasd.
Apr 23 2021, 4:50 PM · Storage manager, Roadmap 2021, meta-task
vlorentz moved T2214: Scale-out graph and database storage in production from Backlog to Work in progress on the Roadmap 2021 board.
Apr 23 2021, 4:47 PM · meta-task, Roadmap 2022, Roadmap 2021, Storage manager
vlorentz added a parent task for T2564: migrate existing revisions metadata extra_headers to actual extra_headers field: T3089: Remove the 'metadata' column of the 'revision' table.
Apr 23 2021, 9:58 AM · System administration, Storage manager
vlorentz added a subtask for T3089: Remove the 'metadata' column of the 'revision' table: T2564: migrate existing revisions metadata extra_headers to actual extra_headers field.
Apr 23 2021, 9:58 AM · Storage manager, Archive content

Apr 21 2021

ardumont moved T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161) from deployed/landed/monitoring to done on the System administration board.
Apr 21 2021, 6:58 PM · System administration, Storage manager

Apr 19 2021

vlorentz added a comment to T2602: Investigate how to upgrade the schema of the Cassandra storage.

I just discussed the multiplexer-based migration process I described above with ardumont/olasd/vsellier.

Apr 19 2021, 3:22 PM · Storage manager
vlorentz added a comment to T2602: Investigate how to upgrade the schema of the Cassandra storage.

Doesn't this deserve a state-of-the-art kind of thing?

Apr 19 2021, 3:22 PM · Storage manager
douardda added a comment to T2602: Investigate how to upgrade the schema of the Cassandra storage.

Doesn't this deserve a state-of-the-art kind of thing? Is there documentation material on the subject? How do other (big) Cassandra users handle this?

Apr 19 2021, 2:14 PM · Storage manager
olasd added a comment to T2602: Investigate how to upgrade the schema of the Cassandra storage.

For the harder cases, that involve changes to the PK, we could do something like this:

  • create a new table with a new name (e.g. revision_v[n+1]; like we do in swh-search, except Cassandra does not support aliases)
  • start an extra storage backend that reads from that table instead of the old one (e.g. revision_v[n]), and also reads from all the other tables as usual
  • have a multiplexing storage proxy (like we have for the objstorage), that queries this new backend (which reads from v[n+1]), and falls back to the old backend (which reads from v[n])
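
The fallback lookup in the steps above can be sketched as follows. This is a minimal illustration only: `TableBackend` and `FallbackProxy` are hypothetical names, in-memory dicts stand in for the Cassandra tables, and none of this is the actual swh-storage API.

```python
from typing import Dict, Optional, Sequence


class TableBackend:
    """Hypothetical backend that reads revisions from one table version."""

    def __init__(self, table_name: str, rows: Dict[str, dict]):
        self.table_name = table_name
        self._rows = rows  # stand-in for a Cassandra table

    def revision_get(self, revision_id: str) -> Optional[dict]:
        return self._rows.get(revision_id)


class FallbackProxy:
    """Hypothetical multiplexing proxy: query backends in order, first hit wins."""

    def __init__(self, backends: Sequence[TableBackend]):
        self.backends = list(backends)

    def revision_get(self, revision_id: str) -> Optional[dict]:
        for backend in self.backends:
            row = backend.revision_get(revision_id)
            if row is not None:
                return row
        return None


# The old table (v[n]) still holds everything; the new one (v[n+1]) is
# only partially populated while the migration is running.
old = TableBackend("revision_v1", {
    "a": {"id": "a", "message": "old a"},
    "b": {"id": "b", "message": "old b"},
})
new = TableBackend("revision_v2", {
    "a": {"id": "a", "message": "migrated a"},
})

# The new backend is queried first; the old backend is the fallback.
proxy = FallbackProxy([new, old])
```

With this ordering, rows already migrated are served from the new table, and everything else still resolves through the old one until the migration completes.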
Apr 19 2021, 1:59 PM · Storage manager
vlorentz removed a parent task for T3089: Remove the 'metadata' column of the 'revision' table: T2471: NPM package angular-ts-manage fails to be properly loaded.
Apr 19 2021, 12:43 PM · Storage manager, Archive content

Apr 16 2021

vlorentz added a comment to T2602: Investigate how to upgrade the schema of the Cassandra storage.

What we can do, however:

Apr 16 2021, 1:45 PM · Storage manager
vlorentz added a subtask for T1892: Cassandra as a storage backend: T2602: Investigate how to upgrade the schema of the Cassandra storage.
Apr 16 2021, 1:36 PM · meta-task, Storage manager
vlorentz added a parent task for T2602: Investigate how to upgrade the schema of the Cassandra storage: T1892: Cassandra as a storage backend.
Apr 16 2021, 1:36 PM · Storage manager

Apr 15 2021

vlorentz placed T3018: Allow querying raw_extrinsic_metadata by hash in swh-storage up for grabs.
Apr 15 2021, 3:17 PM · Storage manager, Extrinsic metadata
vlorentz added a parent task for T2564: migrate existing revisions metadata extra_headers to actual extra_headers field: T3090: Make loaders not rely on the 'metadata' column of the 'revision' table.
Apr 15 2021, 3:15 PM · System administration, Storage manager
vlorentz closed T3090: Make loaders not rely on the 'metadata' column of the 'revision' table, a subtask of T3089: Remove the 'metadata' column of the 'revision' table, as Resolved.
Apr 15 2021, 3:15 PM · Storage manager, Archive content
vlorentz closed T3142: Make loaders write to the ExtId storage, a subtask of T3143: Migrate revision metadata to extid in the storage, as Resolved.
Apr 15 2021, 3:15 PM · System administration, Storage manager, Core Loader

Apr 14 2021

KShivendu closed T2316: Align row deduplication of all _add endpoints on release_add as Resolved.
Apr 14 2021, 5:59 PM · Easy hack, Storage manager

Apr 12 2021

olasd updated the task description for T3245: List all the objects that should be impacted by a given takedown request.
Apr 12 2021, 4:24 PM · Storage manager
olasd changed the status of T3245: List all the objects that should be impacted by a given takedown request from Open to Work in Progress.
Apr 12 2021, 4:24 PM · Storage manager

Apr 9 2021

anlambert added a comment to T3145: Docs : Postgres DB schema missing .

Schema image is now properly displayed: https://docs.softwareheritage.org/devel/swh-storage/sql-storage.html#sql-storage

Apr 9 2021, 3:17 PM · Storage manager, Documentation
ardumont closed T3145: Docs : Postgres DB schema missing as Resolved.
Apr 9 2021, 2:23 PM · Storage manager, Documentation
ardumont added a comment to T3145: Docs : Postgres DB schema missing .

Thanks @faux @KShivendu @anlambert, team work ;)

Apr 9 2021, 2:23 PM · Storage manager, Documentation
ardumont merged T3227: DB Schema link broken in docs under swh-storage. into T3145: Docs : Postgres DB schema missing .
Apr 9 2021, 2:22 PM · Storage manager, Documentation

Apr 6 2021

vlorentz merged task T3185: Migrate extrinsic metadata from 'revision' to 'raw_extrinsic_metadata' tables into T2513: Copy metadata on revisions to the extrinsic metadata storage.
Apr 6 2021, 5:14 PM · System administration, Storage manager
vlorentz added a comment to T3143: Migrate revision metadata to extid in the storage.

if you remember the crash times (.zsh_history?), we could find a range of candidate SWHIDs...

Apr 6 2021, 5:12 PM · System administration, Storage manager, Core Loader
olasd closed T3143: Migrate revision metadata to extid in the storage as Resolved.

The migration script has now run to completion (took around a week).

Apr 6 2021, 4:53 PM · System administration, Storage manager, Core Loader
olasd added a revision to T3143: Migrate revision metadata to extid in the storage: D5430: Add sha512 as a valid field in dsc metadata.
Apr 6 2021, 4:48 PM · System administration, Storage manager, Core Loader
vlorentz added a parent task for T3089: Remove the 'metadata' column of the 'revision' table: T3201: Mirror: unsupported Unicode escape sequence.
Apr 6 2021, 2:20 PM · Storage manager, Archive content
vlorentz added a comment to T1487: Add a public API endpoint to retrieve a set of files with a given name.

@KShivendu The linked script is a start. As it is, it requires direct access to the DB, so you need to create abstractions for it in swh-storage and swh-web.

Apr 6 2021, 12:50 PM · Easy hack, Storage manager, Object storage
vlorentz closed T1377: in-memory storage: compute all counters as Resolved.

ok, thanks. It's actually tested in test_stat_counters in swh-storage/swh/storage/tests/storage_tests.py, which is used to test all four classes.

Apr 6 2021, 12:47 PM · Easy hack, Storage manager

Apr 5 2021

KShivendu added a comment to T1487: Add a public API endpoint to retrieve a set of files with a given name.

Hi guys. Any pointers on where to start?

Apr 5 2021, 1:57 PM · Easy hack, Storage manager, Object storage
KShivendu added a comment to T1377: in-memory storage: compute all counters.

I might be wrong, but I think it has been completed. Check these out:

Apr 5 2021, 12:24 PM · Easy hack, Storage manager

Apr 3 2021

vlorentz closed T2290: Implement origin_metadata endpoints in swh/storage/cassandra/ as Resolved.

No longer relevant

Apr 3 2021, 9:06 AM · Easy hack, Storage manager

Apr 1 2021

vlorentz updated the task description for T1892: Cassandra as a storage backend.
Apr 1 2021, 11:48 AM · meta-task, Storage manager
vlorentz added a subtask for T1117: Origin search is *slow* when you look for very common words: T2590: Finish the indexer -> swh-search pipeline.
Apr 1 2021, 10:51 AM · Web app, Storage manager

Mar 30 2021

olasd changed the status of T3143: Migrate revision metadata to extid in the storage from Open to Work in Progress.
Mar 30 2021, 7:43 PM · System administration, Storage manager, Core Loader
olasd added a comment to T3143: Migrate revision metadata to extid in the storage.

I've deployed the extid schema changes on all storages, and I've started the migration script on getty.

Mar 30 2021, 7:42 PM · System administration, Storage manager, Core Loader
vsellier added a project to T3143: Migrate revision metadata to extid in the storage: System administration.
Mar 30 2021, 5:26 PM · System administration, Storage manager, Core Loader
vlorentz added a project to T3143: Migrate revision metadata to extid in the storage: Storage manager.
Mar 30 2021, 4:57 PM · System administration, Storage manager, Core Loader

Mar 29 2021

vlorentz renamed T3185: Migrate extrinsic metadata from 'revision' to 'raw_extrinsic_metadata' tables from Migrate extrinsic metadata to Migrate extrinsic metadata from 'revision' to 'raw_extrinsic_metadata' tables.
Mar 29 2021, 4:06 PM · System administration, Storage manager
vlorentz triaged T3185: Migrate extrinsic metadata from 'revision' to 'raw_extrinsic_metadata' tables as Normal priority.
Mar 29 2021, 4:05 PM · System administration, Storage manager

Mar 25 2021

vlorentz closed T1910: Redesign origin search using a dedicated component (swh-search), a subtask of T1117: Origin search is *slow* when you look for very common words, as Resolved.
Mar 25 2021, 11:16 AM · Web app, Storage manager
vlorentz closed T1910: Redesign origin search using a dedicated component (swh-search) as Resolved.
Mar 25 2021, 11:16 AM · Archive search, Storage manager
vlorentz closed T1910: Redesign origin search using a dedicated component (swh-search), a subtask of T1892: Cassandra as a storage backend, as Resolved.
Mar 25 2021, 11:16 AM · meta-task, Storage manager

Mar 23 2021

vlorentz added a comment to T2686: Use hashes for all kafka keys.

(and we should keep the origin topic; we already have an ExtSWHID for origins anyway)

Mar 23 2021, 2:55 PM · Data Model, Storage manager
olasd added a comment to T2686: Use hashes for all kafka keys.

The following objects remain:

Mar 23 2021, 2:47 PM · Data Model, Storage manager
vlorentz closed T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects, a subtask of T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases, as Resolved.
Mar 23 2021, 2:33 PM · Package Loader, Storage manager, Extrinsic metadata
vlorentz closed T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects, a subtask of T2686: Use hashes for all kafka keys, as Resolved.
Mar 23 2021, 2:33 PM · Data Model, Storage manager
vlorentz closed T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects as Resolved.
Mar 23 2021, 2:33 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz closed T3017: Use hashes as keys in swh.journal.objects.raw_extrinsic_metadata, a subtask of T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects, as Resolved.
Mar 23 2021, 2:33 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz closed T3017: Use hashes as keys in swh.journal.objects.raw_extrinsic_metadata as Resolved.
Mar 23 2021, 2:33 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz closed T3020: Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra, a subtask of T3022: Deduplicate RawExtrinsicMetadata by hash instead of a subset of their fields, as Resolved.
Mar 23 2021, 2:32 PM · Storage manager, Extrinsic metadata
vlorentz closed T3020: Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra, a subtask of T3018: Allow querying raw_extrinsic_metadata by hash in swh-storage, as Resolved.
Mar 23 2021, 2:32 PM · Storage manager, Extrinsic metadata
vlorentz closed T3020: Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra as Resolved.
Mar 23 2021, 2:32 PM · Storage manager, Extrinsic metadata
olasd closed T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql, a subtask of T3018: Allow querying raw_extrinsic_metadata by hash in swh-storage, as Resolved.
Mar 23 2021, 2:31 PM · Storage manager, Extrinsic metadata
olasd closed T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql as Resolved.

After a lot of back and forth, and the release of swh.model v2.3.0 and swh.storage v0.26.0, this is now all done and deployed in staging and production.

Mar 23 2021, 2:31 PM · Storage manager, Extrinsic metadata
olasd closed T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql, a subtask of T3022: Deduplicate RawExtrinsicMetadata by hash instead of a subset of their fields, as Resolved.
Mar 23 2021, 2:31 PM · Storage manager, Extrinsic metadata
olasd closed T3022: Deduplicate RawExtrinsicMetadata by hash instead of a subset of their fields, a subtask of T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects, as Resolved.
Mar 23 2021, 2:25 PM · Data Model, Storage manager, Extrinsic metadata
olasd closed T3022: Deduplicate RawExtrinsicMetadata by hash instead of a subset of their fields as Resolved.

After the release of swh.model v2, this is now done.

Mar 23 2021, 2:25 PM · Storage manager, Extrinsic metadata

Mar 19 2021

vlorentz triaged T3135: Improve integrity of ingested content as Normal priority.
Mar 19 2021, 4:23 PM · Storage manager, Roadmap 2021, meta-task

Mar 17 2021

KShivendu updated the task description for T3145: Docs : Postgres DB schema missing .
Mar 17 2021, 8:56 AM · Storage manager, Documentation
KShivendu triaged T3145: Docs : Postgres DB schema missing as Normal priority.
Mar 17 2021, 8:46 AM · Storage manager, Documentation

Mar 15 2021

rdicosmo added a subtask for T3135: Improve integrity of ingested content: T399: (Re-)Compute data checksums before insertion.
Mar 15 2021, 8:48 PM · Storage manager, Roadmap 2021, meta-task
rdicosmo added a parent task for T399: (Re-)Compute data checksums before insertion: T3135: Improve integrity of ingested content.
Mar 15 2021, 8:48 PM · Storage manager
rdicosmo created T3135: Improve integrity of ingested content.
Mar 15 2021, 8:47 PM · Storage manager, Roadmap 2021, meta-task
rdicosmo added a comment to T3092: Define the requirements for an on-premise Cassandra cluster.

Let's organise a call next week to explore the options, including the new opportunities of testing that emerged recently.

Mar 15 2021, 1:57 PM · System administration, Storage manager
vlorentz added a comment to T3092: Define the requirements for an on-premise Cassandra cluster.

@rdicosmo I have not, good idea. While they are probably too expensive to use as the main storage instead of SSDs (either via a regular FS or by using a PMem-aware Cassandra fork), we could use them in addition to the above requirements.

Mar 15 2021, 1:48 PM · System administration, Storage manager
rdicosmo added a comment to T3092: Define the requirements for an on-premise Cassandra cluster.

Did you consider PMem (and other configurations for Intel Optane memory) in your discussion? It offers a very interesting price/performance ratio.
There are machines on Grid5000 available to test this technology if needed.

Mar 15 2021, 1:21 PM · System administration, Storage manager
vlorentz added a parent task for T3089: Remove the 'metadata' column of the 'revision' table: T2471: NPM package angular-ts-manage fails to be properly loaded.
Mar 15 2021, 12:32 PM · Storage manager, Archive content
vlorentz closed T3092: Define the requirements for an on-premise Cassandra cluster as Resolved.
Mar 15 2021, 11:34 AM · System administration, Storage manager
vlorentz closed T3092: Define the requirements for an on-premise Cassandra cluster, a subtask of T3091: Order hardware for an on-premise Cassandra cluster, as Resolved.
Mar 15 2021, 11:34 AM · System administration, Storage manager

Mar 11 2021

douardda closed T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders as Resolved.
Mar 11 2021, 2:55 PM · Storage manager

Mar 10 2021

rdicosmo added a parent task for T1892: Cassandra as a storage backend: T2214: Scale-out graph and database storage in production.
Mar 10 2021, 4:40 PM · meta-task, Storage manager
rdicosmo added a subtask for T2214: Scale-out graph and database storage in production: T1892: Cassandra as a storage backend.
Mar 10 2021, 4:40 PM · meta-task, Roadmap 2022, Roadmap 2021, Storage manager
rdicosmo edited projects for T2214: Scale-out graph and database storage in production, added: Roadmap 2021; removed Roadmap 2020.
Mar 10 2021, 4:39 PM · meta-task, Roadmap 2022, Roadmap 2021, Storage manager