migration done.
Jan 28 2022
Jan 24 2022
Jan 21 2022
Jan 20 2022
belvedere migration status: the first index on directory has been created, index creation on revision is ongoing, and release will follow (see the index-creation sketch below).
staging, prod: storage deployed and service restarted (this does not depend on the sql migration being complete).
- staging db: already migrated during the deployment of T3861.
- production db: migration ongoing.
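For illustration, this is the kind of statement such a live index build typically runs; the table, index name, and connection string below are placeholders, not the actual swh schema or deployment values.

    import psycopg2

    # Hypothetical sketch: build an index on a live database without
    # blocking writes (which is why the service can keep running while
    # the migration is ongoing).
    conn = psycopg2.connect("service=swh-placeholder")
    # CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
    # so the connection must be in autocommit mode.
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute(
            "CREATE INDEX CONCURRENTLY IF NOT EXISTS directory_example_idx"
            " ON directory (id)"
        )
    conn.close()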
Jan 19 2022
Jan 18 2022
Jan 13 2022
Jan 12 2022
Jan 11 2022
Jan 7 2022
Jan 6 2022
Dec 22 2021
Dec 7 2021
Dec 2 2021
We don't care about this anymore; it will be handled by T3753.
The slides of the retrospective of the experiment are available at: https://hedgedoc.softwareheritage.org/VOP9qh1MTqm4DjPQfFgNbQ
It was not easy to tell whether this was a large number of calls or a few long-running calls, because the metrics are regular samples and we don't have that granularity.
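To illustrate the ambiguity, here is a toy example with purely invented numbers: two very different workloads can produce the same total busy time per sampling interval, so regularly sampled metrics alone cannot tell them apart.

    # Toy illustration: with metrics sampled at a fixed interval, only
    # the per-interval aggregate is visible, so these two workloads
    # look identical in the graphs.
    SAMPLE_INTERVAL_S = 60

    # Workload A: many short calls.
    a_calls, a_duration_s = 6000, 0.01
    # Workload B: a couple of long-running calls.
    b_calls, b_duration_s = 2, 30.0

    busy_a = a_calls * a_duration_s  # 60.0 s busy per interval
    busy_b = b_calls * b_duration_s  # 60.0 s busy per interval
    assert busy_a == busy_b == SAMPLE_INTERVAL_S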
Dec 1 2021
Nov 26 2021
We decided to store manifests instead. T3594#74385
Copy of an email I sent today:
Nov 15 2021
Nov 8 2021
Oct 27 2021
Oct 26 2021
Oct 22 2021
I came across a rather small repository [1] which I believe raises the same issue.
It may help to keep a reference to it, to ease testing of the improvement discussed here.
Feel free to dismiss this if it's not useful.
Oct 21 2021
Stopping the investigations here, as the limit is now the hardware running cassandra (HDD). Great improvements were made compared to the first tries.
Changing the cache size doesn't seem very effective for the particular workload of the loaders:
Except for some contextual differences, probably due to the different origins loaded, the performance is quite similar.
The hit ratios between the different configurations are also very close.
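As a toy illustration of the comparison being made (this models a generic LRU cache in Python, not Cassandra's caches, and the access pattern is invented): once the hot set of a skewed workload fits in the cache, growing the cache further barely moves the hit ratio.

    from functools import lru_cache
    import random

    def hit_ratio(cache_size, accesses):
        @lru_cache(maxsize=cache_size)
        def fetch(key):
            return key  # stand-in for a real lookup
        for key in accesses:
            fetch(key)
        info = fetch.cache_info()
        return info.hits / (info.hits + info.misses)

    # Heavy-tailed (skewed) key distribution, like repeated lookups of
    # popular objects during loading.
    random.seed(0)
    accesses = [int(random.paretovariate(1.5)) for _ in range(100_000)]
    for size in (1_000, 10_000, 100_000):
        print(size, round(hit_ratio(size, accesses), 3))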
Oct 20 2021
Closing the tests of the workers here.
This is the impact of configuring the batch insert size for the add operations. The tested values were 500, 1000 and 2000.
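As a minimal sketch of the knob being varied (the insert_batch callable and batch_size parameter are illustrative, not the actual swh-storage API):

    from itertools import islice

    def batched(iterable, batch_size):
        """Yield successive batches of at most batch_size items."""
        it = iter(iterable)
        while batch := list(islice(it, batch_size)):
            yield batch

    def insert_all(rows, insert_batch, batch_size=1000):
        # Larger batches mean fewer round trips but heavier individual
        # statements; 500, 1000 and 2000 were the values compared here.
        for batch in batched(rows, batch_size):
            insert_batch(batch)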
A full repair of a cassandra node was in progress due to data loss (the disk was erased by someone else's experiment after the disk reservation expired).
Oct 15 2021
Oct 14 2021
What "directory_entries_insert_algo" did you use for this?
Some flame graphs of the storage service were captured during the ingestion with 50 workers in parallel.
Oct 12 2021
Some runs with the fix:
Overall, it improves the stability of the benchmark by reducing the timeouts.
Oct 11 2021
Oct 6 2021
The loaders were finally stabilized. The problem was due to a wrong celery configuration.
Changing the pool configuration from solo to prefork solved the problem, even though the concurrency is kept at one.
Solo looked appropriate for an environment like the POC, but for obvious reasons it was not working as expected:
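For reference, a minimal sketch of the configuration change (solo to prefork with concurrency one), assuming a standard Celery app; the app name and broker URL are placeholders, not the actual deployment values.

    from celery import Celery

    app = Celery("swh_loader_app", broker="amqp://rabbitmq:5672//")

    # The "solo" pool runs tasks inline in the worker's main process;
    # the "prefork" pool forks child processes to execute tasks, and
    # fixed the instability even with a single child process.
    app.conf.worker_pool = "prefork"
    app.conf.worker_concurrency = 1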
Oct 1 2021
Intermediate status:
- the bench lab is easily deployable on g5k across several workers to distribute the load [1]
- it works well when the load is not too high. When the number of workers is increased, the workers seem to have issues talking to rabbitmq:
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-p9ds5
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-n6pvm
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-mrcjj
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-7bn4s
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-lg2bd
and also an unexplained time drift:
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,447: WARNING/MainProcess] Substantial drift from celery@loaders-77cdd444df-lxjpl may mean clocks are out of sync. Current drift is 356 seconds. [orig: 2021-09-30 23:46:55.447181 recv: 2021-09-30 23:40:59.633444]
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,447: WARNING/MainProcess] Substantial drift from celery@loaders-77cdd444df-jd6v9 may mean clocks are out of sync. Current drift is 355 seconds. [orig: 2021-09-30 23:46:55.447552 recv: 2021-09-30 23:41:00.723983]