
Jan 28 2022

douardda closed T3693: Provide a mecanism to report (with persistence) objects that fails to get replayed (mirror) as Resolved.
Jan 28 2022, 9:38 AM · Storage manager

Jan 24 2022

douardda triaged T3882: Handle updated kafka messages for the storage replayer as High priority.
Jan 24 2022, 5:22 PM · Storage manager, Mirror
vlorentz added a revision to T3878: Fix existing corrupt objects: D6957: Add recover_corrupt_objects.py.
Jan 24 2022, 11:02 AM · Storage manager
vlorentz triaged T3878: Fix existing corrupt objects as Normal priority.
Jan 24 2022, 11:01 AM · Storage manager

Jan 21 2022

vlorentz added revisions to T3752: Store/represent time offsets as strings: D7008: Stop using the deprecated 'TimestampWithTimezone.offset' attribute, D7007: Stop using the deprecated 'TimestampWithTimezone.offset' attribute, D7006: Stop using the deprecated 'TimestampWithTimezone.offset' attribute, D7005: Add method 'TimestampWithTimezone.offset_minutes', D7003: journal: Document the new format for gitdate.
Jan 21 2022, 1:29 PM · Data Model, Storage manager

Jan 20 2022

ardumont closed T3869: Deploy storage v0.41.2 as Resolved.
Jan 20 2022, 3:51 PM · System administration, Storage manager
ardumont moved T3869: Deploy storage v0.41.2 from in-progress to deployed/landed/monitoring on the System administration board.
Jan 20 2022, 3:42 PM · System administration, Storage manager
ardumont added a comment to T3869: Deploy storage v0.41.2.

migration done.

Jan 20 2022, 3:42 PM · System administration, Storage manager
ardumont updated the task description for T3869: Deploy storage v0.41.2.
Jan 20 2022, 3:00 PM · System administration, Storage manager
ardumont added a comment to T3869: Deploy storage v0.41.2.

belvedere migration status: the first index on directory has been created; index creation for revision is ongoing, and release comes next.

Jan 20 2022, 3:00 PM · System administration, Storage manager
ardumont added a comment to T3869: Deploy storage v0.41.2.

staging, prod: storage deployed and service restarted (this does not depend on the SQL migration being complete).

Jan 20 2022, 11:52 AM · System administration, Storage manager
ardumont updated the task description for T3869: Deploy storage v0.41.2.
Jan 20 2022, 10:48 AM · System administration, Storage manager
ardumont added a comment to T3869: Deploy storage v0.41.2.
  • staging db already migrated during the deployment of T3861.
  • production db migration ongoing
Jan 20 2022, 10:37 AM · System administration, Storage manager
ardumont changed the status of T3869: Deploy storage v0.41.2 from Open to Work in Progress.
Jan 20 2022, 10:33 AM · System administration, Storage manager
ardumont updated the task description for T3869: Deploy storage v0.41.2.
Jan 20 2022, 10:33 AM · System administration, Storage manager
ardumont triaged T3869: Deploy storage v0.41.2 as Normal priority.
Jan 20 2022, 10:33 AM · System administration, Storage manager

Jan 19 2022

olasd closed T3819: Deploy swh.model 4.1.0 / swh.storage 0.41.0 to production, a subtask of T3752: Store/represent time offsets as strings, as Resolved.
Jan 19 2022, 7:12 PM · Data Model, Storage manager

Jan 18 2022

olasd merged task T2449: Consider switching timestamp offset storage to strings/byte arrays into T3752: Store/represent time offsets as strings.
Jan 18 2022, 12:27 PM · Storage manager, Data Model
olasd merged T2449: Consider switching timestamp offset storage to strings/byte arrays into T3752: Store/represent time offsets as strings.
Jan 18 2022, 12:27 PM · Data Model, Storage manager
olasd added a subtask for T3752: Store/represent time offsets as strings: T3819: Deploy swh.model 4.1.0 / swh.storage 0.41.0 to production.
Jan 18 2022, 12:26 PM · Data Model, Storage manager

Jan 13 2022

vlorentz added revisions to T3752: Store/represent time offsets as strings: D6940: tests: Use 'offset_bytes' instead of 'offset'/'negative_utc', D6939: Stop passing 'offset' and 'negative_utc' to TimestampWithTimezone(), D6938: tests: Replace 'offset' and 'negative_utc' with 'offset_bytes', D6937: Remove 'offset' and 'negative_utc', D6935: deposit: Remove 'negative_utc' from test data.
Jan 13 2022, 12:26 PM · Data Model, Storage manager
vlorentz added a revision to T3752: Store/represent time offsets as strings: D6936: TimestampWithTimezone: Make 'offset' and 'negative_utc' optional.
Jan 13 2022, 12:16 PM · Data Model, Storage manager
vlorentz added a revision to T3752: Store/represent time offsets as strings: D6929: Remove 'negative_utc'.
Jan 13 2022, 11:28 AM · Data Model, Storage manager

Jan 12 2022

vlorentz added revisions to T3752: Store/represent time offsets as strings: D6927: Remove special handling of negative_utc, D6923: converters: Write raw_manifest of Directory objects, D6894: converters: Write object_bytes and raw_manifest on revisions and releases.
Jan 12 2022, 3:24 PM · Data Model, Storage manager

Jan 11 2022

vlorentz added a revision to T3752: Store/represent time offsets as strings: D6915: tests: Use TimestampWithTimezone.from_datetime() instead of the constructor.
Jan 11 2022, 3:25 PM · Data Model, Storage manager
vlorentz added revisions to T3752: Store/represent time offsets as strings: D6911: Remove strdate_to_timestamp, D6913: tests: Use TimestampWithTimezone.from_datetime() instead of the constructor, D6910: tests: Use TimestampWithTimezone.from_datetime() instead of the constructor, D6909: tests: Use TimestampWithTimezone.from_datetime() instead of the constructor, D6908: tests: Use TimestampWithTimezone.from_datetime() instead of the constructor.
Jan 11 2022, 2:15 PM · Data Model, Storage manager
douardda renamed T3841: regularly scrub all the data stores of swh from regularly scrub all the data sources of swh to regularly scrub all the data stores of swh.
Jan 11 2022, 12:32 PM · Datastore Scrubber, meta-task, Roadmap 2022, Storage manager
douardda removed a project from T3841: regularly scrub all the data stores of swh: Roadmap 2021.
Jan 11 2022, 12:31 PM · Datastore Scrubber, meta-task, Roadmap 2022, Storage manager
douardda triaged T3841: regularly scrub all the data stores of swh as Normal priority.
Jan 11 2022, 12:31 PM · Datastore Scrubber, meta-task, Roadmap 2022, Storage manager

Jan 7 2022

vlorentz added revisions to T3752: Store/represent time offsets as strings: D6848: Add columns {,committer_}date_offset to rev/rel and raw_manifest to dir/rev/rel, D6890: git_objects: Use raw offset_bytes to format dates, and remove format_offset().
Jan 7 2022, 1:54 PM · Data Model, Storage manager
vlorentz added revisions to T3753: Store original git manifests: D6801: model: Add a raw_manifest attribute, D6811: model: Exclude 'raw_manifest' from dictionaries when it is null, D6847: hypothesis_strategies: Generate raw_manifest, D6848: Add columns {,committer_}date_offset to rev/rel and raw_manifest to dir/rev/rel.
Jan 7 2022, 1:54 PM · Data Model, Storage manager
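D6811 excludes `raw_manifest` from dictionaries when it is null. A minimal sketch of that behaviour, using a hypothetical `DirectorySketch` dataclass rather than the real swh.model classes (whose fields and methods may differ):

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class DirectorySketch:
    """Hypothetical stand-in for a model object, not swh.model's Directory."""

    id: bytes
    entries: Tuple[Any, ...] = ()
    # In this sketch: the original Git manifest, meant for objects whose
    # re-serialized fields would not reproduce the original bytes.
    raw_manifest: Optional[bytes] = None

    def to_dict(self) -> Dict[str, Any]:
        d = asdict(self)
        # Drop the key entirely when there is no raw manifest, so such
        # objects serialize as they did before the field existed.
        if d["raw_manifest"] is None:
            del d["raw_manifest"]
        return d


print(DirectorySketch(id=b"\x01").to_dict())
```

Omitting the key, rather than emitting `None`, keeps dictionaries of ordinary objects unchanged by the new attribute.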

Jan 6 2022

vlorentz added revisions to T3577: Parallel loaders performances : D6888: cassandra: Rewrite content_missing to run queries concurrently., D6885: cassandra: Use concurrent queries in *_missing() instead of naive grouping.
Jan 6 2022, 5:32 PM · System administration, Storage manager
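The idea behind D6885 and D6888 (concurrent queries in `*_missing()` instead of naive grouping) can be illustrated with the standard library. This is not the swh-storage code: `id_exists` is a hypothetical callback standing in for a single-id Cassandra lookup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def missing_concurrently(
    ids: Iterable[bytes],
    id_exists: Callable[[bytes], bool],
    max_workers: int = 16,
) -> List[bytes]:
    """Return the ids for which id_exists() is False.

    Each lookup is submitted to a thread pool, so slow single-id queries
    overlap instead of running one after the other.
    """
    id_list = list(ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with id_list.
        present = list(pool.map(id_exists, id_list))
    return [id_ for id_, found in zip(id_list, present) if not found]


stored = {b"a", b"b"}
print(missing_concurrently([b"a", b"c", b"b", b"d"], stored.__contains__))
# [b'c', b'd']
```

With a real database, `max_workers` bounds how many queries are in flight at once, which is the knob to tune against the cluster's capacity.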

Dec 22 2021

vlorentz closed T3585: Fix inconsistencies of the Cassandra backend with postgres as Wontfix.
Dec 22 2021, 2:34 PM · meta-task, Storage manager
vlorentz closed T3585: Fix inconsistencies of the Cassandra backend with postgres, a subtask of T1892: Cassandra as a storage backend, as Wontfix.
Dec 22 2021, 2:34 PM · meta-task, Storage manager

Dec 7 2021

vlorentz added a revision to T3752: Store/represent time offsets as strings: D6776: Add attribute TimestampWithTimezone.offset_bytes, to store raw Git offsets.
Dec 7 2021, 4:51 PM · Data Model, Storage manager
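D6776 adds `TimestampWithTimezone.offset_bytes` to keep raw Git offsets. The sketch below shows why an integer number of minutes is lossy. `offset_minutes` here is a hypothetical helper, not the `TimestampWithTimezone.offset_minutes` method from D7005, and it assumes a well-formed `[+-]HHMM` offset.

```python
def offset_minutes(offset_bytes: bytes) -> int:
    """Convert a Git-style offset such as b"+0200" or b"-0130" to minutes.

    Hypothetical helper, assuming a well-formed [+-]HHMM offset.
    """
    text = offset_bytes.decode("ascii")
    sign = -1 if text.startswith("-") else 1
    digits = text.lstrip("+-")
    hours, minutes = int(digits[:-2]), int(digits[-2:])
    return sign * (hours * 60 + minutes)


# b"+0000" and b"-0000" both map to 0 minutes, so an integer offset alone
# cannot tell them apart; keeping the raw bytes preserves the distinction.
for raw in (b"+0000", b"-0000", b"+0200", b"-0130"):
    print(raw, offset_minutes(raw))
```

The `-0000` case is the one the former `negative_utc` flag had to special-case; storing the bytes covers it, along with any other offset spelling that does not round-trip through an integer.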
anlambert closed T3776: cassandra tests are failing in the swh-environment build as Resolved by committing rDSTO615fb99eb708: test_cassandra: Fix failing tests since swh-model update.
Dec 7 2021, 1:56 PM · Storage manager
anlambert added a revision to T3776: cassandra tests are failing in the swh-environment build: D6768: test_cassandra: Fix failing tests since swh-model update.
Dec 7 2021, 1:38 PM · Storage manager
vsellier triaged T3776: cassandra tests are failing in the swh-environment build as High priority.
Dec 7 2021, 1:16 PM · Storage manager

Dec 2 2021

vlorentz updated the task description for T3752: Store/represent time offsets as strings.
Dec 2 2021, 4:04 PM · Data Model, Storage manager
vlorentz closed T3586: Figure out what to do with 'misordered' directories in Cassandra, a subtask of T3585: Fix inconsistencies of the Cassandra backend with postgres, as Resolved.
Dec 2 2021, 3:14 PM · meta-task, Storage manager
vlorentz closed T3586: Figure out what to do with 'misordered' directories in Cassandra as Resolved.

We don't care anymore; this will be handled by T3753.

Dec 2 2021, 3:14 PM · Data Model, Storage manager
vlorentz removed a parent task for T3752: Store/represent time offsets as strings: T3753: Store original git manifests.
Dec 2 2021, 3:01 PM · Data Model, Storage manager
vlorentz removed a subtask for T3753: Store original git manifests: T3752: Store/represent time offsets as strings.
Dec 2 2021, 3:01 PM · Data Model, Storage manager
vlorentz added a parent task for T3752: Store/represent time offsets as strings: T3753: Store original git manifests.
Dec 2 2021, 3:00 PM · Data Model, Storage manager
vlorentz added a subtask for T3753: Store original git manifests: T3752: Store/represent time offsets as strings.
Dec 2 2021, 3:00 PM · Data Model, Storage manager
vlorentz updated the task description for T3752: Store/represent time offsets as strings.
Dec 2 2021, 2:59 PM · Data Model, Storage manager
vlorentz updated the task description for T3752: Store/represent time offsets as strings.
Dec 2 2021, 2:55 PM · Data Model, Storage manager
vlorentz updated the task description for T3752: Store/represent time offsets as strings.
Dec 2 2021, 2:52 PM · Data Model, Storage manager
vlorentz updated the task description for T3753: Store original git manifests.
Dec 2 2021, 2:48 PM · Data Model, Storage manager
vlorentz updated the task description for T3752: Store/represent time offsets as strings.
Dec 2 2021, 2:22 PM · Data Model, Storage manager
vsellier closed T3357: Perform some tests of the cassandra storage on Grid5000, a subtask of T1892: Cassandra as a storage backend, as Resolved.
Dec 2 2021, 10:10 AM · meta-task, Storage manager
vsellier closed T3357: Perform some tests of the cassandra storage on Grid5000 as Resolved.

The slides of the retrospective of the experiment are available at: https://hedgedoc.softwareheritage.org/VOP9qh1MTqm4DjPQfFgNbQ

Dec 2 2021, 10:10 AM · System administration, Storage manager
vsellier closed T3573: [cassandra] directory and content read benchmarks, a subtask of T3357: Perform some tests of the cassandra storage on Grid5000, as Resolved.
Dec 2 2021, 10:08 AM · System administration, Storage manager
vsellier closed T3573: [cassandra] directory and content read benchmarks as Resolved.

It was not easy to tell whether there were many calls or long-running calls, because the data comes from regular sampling and we don't have that granularity.

Dec 2 2021, 10:08 AM · System administration, Storage manager

Dec 1 2021

zack moved T2053: support graph export for the cassandra backend from Backlog to Deployed on the Compressed graph service board.
Dec 1 2021, 4:37 PM · Compressed graph service, Storage manager
zack moved T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs from Backlog to Deployed on the Compressed graph service board.
Dec 1 2021, 4:35 PM · Compressed graph service, Storage manager

Nov 26 2021

vlorentz removed a project from T3752: Store/represent time offsets as strings: meta-task.
Nov 26 2021, 5:19 PM · Data Model, Storage manager
vlorentz removed a project from T3753: Store original git manifests: meta-task.
Nov 26 2021, 5:19 PM · Data Model, Storage manager
vlorentz claimed T3594: Faithfully store weird git objects.
Nov 26 2021, 4:43 PM · meta-task, Data Model, Storage manager
vlorentz claimed T3753: Store original git manifests.
Nov 26 2021, 4:43 PM · Data Model, Storage manager
vlorentz triaged T3753: Store original git manifests as Normal priority.
Nov 26 2021, 4:43 PM · Data Model, Storage manager
vlorentz triaged T3752: Store/represent time offsets as strings as Normal priority.
Nov 26 2021, 4:42 PM · Data Model, Storage manager
vlorentz closed T3598: Support revisions with "extra headers" not at the end, a subtask of T3594: Faithfully store weird git objects, as Wontfix.
Nov 26 2021, 4:41 PM · meta-task, Data Model, Storage manager
vlorentz closed T3598: Support revisions with "extra headers" not at the end as Wontfix.

We decided to store manifests instead. T3594#74385

Nov 26 2021, 4:41 PM · Data Model, Storage manager
vlorentz closed T3596: Support "weird" permissions in directories as Wontfix.

We decided to store manifests instead. T3594#74385

Nov 26 2021, 4:41 PM · meta-task, Data Model, Storage manager
vlorentz closed T3596: Support "weird" permissions in directories, a subtask of T3594: Faithfully store weird git objects, as Wontfix.
Nov 26 2021, 4:41 PM · meta-task, Data Model, Storage manager
vlorentz closed T3595: Support disordered directory entries in git, a subtask of T3594: Faithfully store weird git objects, as Wontfix.
Nov 26 2021, 4:41 PM · meta-task, Data Model, Storage manager
vlorentz closed T3595: Support disordered directory entries in git as Wontfix.

We decided to store manifests instead. T3594#74385

Nov 26 2021, 4:41 PM · meta-task, Data Model, Storage manager
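T3595 concerns directories whose entries are not in Git's canonical order. Git sorts tree entries by name bytes, comparing a subdirectory's name as if it ended with "/". A minimal checker under that rule (it ignores submodule entries, and the helper names are hypothetical) shows why re-sorting such entries would change the manifest, and therefore the object id:

```python
from typing import List, Tuple


def git_sort_key(name: bytes, is_dir: bool) -> bytes:
    # Git compares a subdirectory's name as if it had a trailing "/".
    return name + b"/" if is_dir else name


def is_canonically_ordered(entries: List[Tuple[bytes, bool]]) -> bool:
    """Check (name, is_dir) pairs against Git's tree-entry order."""
    keys = [git_sort_key(name, is_dir) for name, is_dir in entries]
    return keys == sorted(keys)


# b"." (0x2e) sorts before b"/" (0x2f), so the file "foo.txt" must come
# before the directory "foo", unlike a plain sort on names.
print(is_canonically_ordered([(b"foo.txt", False), (b"foo", True)]))  # True
print(is_canonically_ordered([(b"foo", True), (b"foo.txt", False)]))  # False
```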
vlorentz added a comment to T3594: Faithfully store weird git objects.

Copy of an email I sent today:

Nov 26 2021, 4:40 PM · meta-task, Data Model, Storage manager
vlorentz added a revision to T399: (Re-)Compute data checksums before insertion: D6281: converters: Recompute hashes and check they match the originals.
Nov 26 2021, 3:52 PM · Storage manager
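D6281 recomputes hashes and checks that they match the originals. For Git objects, the identifier is the SHA-1 of a `<type> <length>\0` header followed by the manifest. A minimal standard-library sketch of such a check (the function names are hypothetical):

```python
import hashlib


def git_object_id(object_type: str, manifest: bytes) -> str:
    """SHA-1 of the Git object header plus manifest, as a hex string."""
    header = f"{object_type} {len(manifest)}\0".encode("ascii")
    return hashlib.sha1(header + manifest).hexdigest()


def hash_matches(object_type: str, manifest: bytes, expected_hex: str) -> bool:
    # In this sketch, a mismatch means the manifest does not reproduce
    # the object the identifier was computed from.
    return git_object_id(object_type, manifest) == expected_hex


print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```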
douardda added a revision to T3693: Provide a mecanism to report (with persistence) objects that fails to get replayed (mirror): D6693: Add support for a redis-based reporter for failed replayed objects.
Nov 26 2021, 1:33 PM · Storage manager
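D6693 adds a Redis-based reporter for objects that fail to be replayed. The sketch below only illustrates the idea of persisting failures keyed by object type and id. The key layout and the `report_failure` function are assumptions, not the actual D6693 code. `store` can be any mutable mapping: a plain dict for testing, or an adapter around a Redis client in a real deployment.

```python
from typing import Dict, MutableMapping


def report_failure(
    store: MutableMapping[str, bytes],
    object_type: str,
    object_id: bytes,
    error: str,
) -> str:
    """Persist a replay failure and return the key it was stored under."""
    # Hypothetical key layout: "<object_type>:<hex id>", e.g. "revision:01ab".
    key = f"{object_type}:{object_id.hex()}"
    store[key] = error.encode("utf-8")
    return key


failures: Dict[str, bytes] = {}
report_failure(failures, "revision", bytes.fromhex("01ab"), "hash mismatch")
print(failures)  # {'revision:01ab': b'hash mismatch'}
```

Including the object type in the key (D6565 passes `object_type` to the value serializer) lets failed objects be listed and retried per type later.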

Nov 15 2021

vsellier updated the task description for T3357: Perform some tests of the cassandra storage on Grid5000.
Nov 15 2021, 9:46 AM · System administration, Storage manager

Nov 8 2021

vsellier closed T3683: cassandra - benchmark the vault, a subtask of T3357: Perform some tests of the cassandra storage on Grid5000, as Resolved.
Nov 8 2021, 9:55 AM · System administration, Storage manager

Oct 27 2021

douardda added a revision to T3693: Provide a mecanism to report (with persistence) objects that fails to get replayed (mirror): D6571: Add support for a redis-based reporting for invalid mirrorred objects.
Oct 27 2021, 6:24 PM · Storage manager
douardda added a revision to T3693: Provide a mecanism to report (with persistence) objects that fails to get replayed (mirror): D6565: Pass the object_type to JournalClient.value_serializer().
Oct 27 2021, 4:19 PM · Storage manager
vlorentz updated the task description for T3594: Faithfully store weird git objects.
Oct 27 2021, 2:08 PM · meta-task, Data Model, Storage manager
vlorentz updated the task description for T3594: Faithfully store weird git objects.
Oct 27 2021, 2:03 PM · meta-task, Data Model, Storage manager
vlorentz updated the task description for T3594: Faithfully store weird git objects.
Oct 27 2021, 2:03 PM · meta-task, Data Model, Storage manager

Oct 26 2021

douardda added a revision to T3693: Provide a mecanism to report (with persistence) objects that fails to get replayed (mirror): D6554: [WIP] Add a (redis-based) validation error reporting facility.
Oct 26 2021, 5:48 PM · Storage manager
douardda triaged T3693: Provide a mecanism to report (with persistence) objects that fails to get replayed (mirror) as High priority.
Oct 26 2021, 5:41 PM · Storage manager

Oct 22 2021

ardumont added a comment to T3595: Support disordered directory entries in git.

I came across a rather small repository [1] which I believe raises the same issue.
It may help to keep a reference to it to ease testing of the improvement discussed here.
Feel free to dismiss this if it is not that useful.

Oct 22 2021, 1:54 PM · meta-task, Data Model, Storage manager
vsellier reopened T3683: cassandra - benchmark the vault, a subtask of T3357: Perform some tests of the cassandra storage on Grid5000, as Work in Progress.
Oct 22 2021, 11:49 AM · System administration, Storage manager
vsellier closed T3683: cassandra - benchmark the vault, a subtask of T3357: Perform some tests of the cassandra storage on Grid5000, as Resolved.
Oct 22 2021, 11:48 AM · System administration, Storage manager

Oct 21 2021

vsellier added revisions to T3577: Parallel loaders performances : D6423: cassandra: Add alternative algorithms to list missing objects, D6494: cassandra: Fix incomplete check of content existence in object_find_by_sha1_git, D6495: cassandra: Rewrite content_missing to run queries concurrently.
Oct 21 2021, 2:58 PM · System administration, Storage manager
vsellier closed T3577: Parallel loaders performances , a subtask of T3357: Perform some tests of the cassandra storage on Grid5000, as Resolved.
Oct 21 2021, 2:56 PM · System administration, Storage manager
vsellier closed T3577: Parallel loaders performances as Resolved.

Stopping the investigations here, as the limit is now the hardware running Cassandra (HDD). Great improvements were made compared to the first tries.

Oct 21 2021, 2:56 PM · System administration, Storage manager
vsellier added a comment to T3577: Parallel loaders performances .

Changing the cache size doesn't seem very effective for the particular workload of the loaders:
Except for some contextual differences, probably due to the different origins loaded, the performance is quite similar.
The hit ratios of the different configurations are also very close.

Oct 21 2021, 2:36 PM · System administration, Storage manager
vlorentz added a revision to T3135: Improve integrity of ingested content: D6504: converters: Fix detection of tree entries with non-standard commit/tree mode.
Oct 21 2021, 10:57 AM · Storage manager, Roadmap 2021, meta-task

Oct 20 2021

vsellier added a comment to T3577: Parallel loaders performances .

Closing the worker tests here.

Oct 20 2021, 5:43 PM · System administration, Storage manager
vsellier added a comment to T3577: Parallel loaders performances .

This shows the impact of the different algorithms used to retrieve the missing objects.
The configuration is the same as before; the storage is patched with D6423, D6494 and D6495.
The first run, without D6423, uses a version patched with D6494 and D6495 only.

Oct 20 2021, 4:56 PM · System administration, Storage manager
vsellier added a comment to T3577: Parallel loaders performances .

This is the impact of configuring the batch size for insertions (add). Tested values were 500, 1000 and 2000.
A full repair of a Cassandra node was in progress due to data loss (the disk was erased by someone else's experiment after the disk reservation expired).

Oct 20 2021, 2:50 PM · System administration, Storage manager
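The benchmark above compares insert batch sizes of 500, 1000 and 2000. As a hedged illustration of what that parameter controls, the helper below (hypothetical, not the swh-storage buffering code) splits a stream of objects into batches of at most `batch_size`; each batch would then go out in a single insert call, so a larger size means fewer but bigger requests.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(objects: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size objects."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(objects)
    while batch := list(islice(it, batch_size)):
        yield batch


# Batch sizes obtained for 2500 objects with the three tested settings.
for size in (500, 1000, 2000):
    print(size, [len(b) for b in batched(range(2500), size)])
```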

Oct 15 2021

vlorentz updated the task description for T3594: Faithfully store weird git objects.
Oct 15 2021, 2:38 PM · meta-task, Data Model, Storage manager

Oct 14 2021

olasd updated subscribers of T1617: Experiment with generation numbers to improve revisions walk performance.

@vlorentz mentioned this idea in the context of T3655 (git loader global deduplication).

Oct 14 2021, 12:00 PM · Storage manager
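T1617 proposes generation numbers to speed up revision walks. Under the usual definition (a root revision has generation 1; any other revision has 1 plus the maximum generation of its parents), a proper ancestor always has a strictly smaller generation, so a walk can skip revisions that cannot lead to its target. A minimal sketch, assuming the whole parent map fits in a dict (all names are hypothetical):

```python
from typing import Dict, List


def generation_numbers(parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Compute generation numbers for an acyclic parent map.

    Every parent must also be a key of `parents`. Uses an explicit stack
    instead of recursion so long histories do not hit the recursion limit.
    """
    gen: Dict[str, int] = {}
    for start in parents:
        stack = [start]
        while stack:
            node = stack[-1]
            if node in gen:
                stack.pop()
                continue
            pending = [p for p in parents[node] if p not in gen]
            if pending:
                stack.extend(pending)
            else:
                gen[node] = 1 + max((gen[p] for p in parents[node]), default=0)
                stack.pop()
    return gen


def may_be_ancestor(gen: Dict[str, int], y: str, x: str) -> bool:
    # A proper ancestor always has a strictly smaller generation number;
    # False is definitive, True still requires a walk to confirm.
    return y == x or gen[y] < gen[x]


history = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": ["d"]}
print(generation_numbers(history))
```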
vlorentz added a comment to T3573: [cassandra] directory and content read benchmarks.

What "directory_entries_insert_algo" did you use for this?

Oct 14 2021, 11:32 AM · System administration, Storage manager
vsellier added a comment to T3573: [cassandra] directory and content read benchmarks.

Some flame graphs of the storage were produced during the ingestion with 50 workers in parallel.

Oct 14 2021, 10:08 AM · System administration, Storage manager

Oct 12 2021

vsellier added a comment to T3577: Parallel loaders performances .

Some runs with the fix:
Overall, it improves the stability of the benchmark by reducing the timeouts.

Oct 12 2021, 6:27 PM · System administration, Storage manager

Oct 11 2021

vlorentz updated the task description for T3595: Support disordered directory entries in git.
Oct 11 2021, 2:49 PM · meta-task, Data Model, Storage manager

Oct 6 2021

vsellier added a comment to T3577: Parallel loaders performances .

The loaders were finally stabilized. The problem was due to a wrong Celery configuration.
Changing the pool configuration from solo to prefork solved the problem, even with the concurrency kept at one.
Solo seemed suitable for an environment like the POC but, for obvious reasons, it was not working as expected:

Oct 6 2021, 2:11 AM · System administration, Storage manager
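For reference, the change described above maps to Celery's standard `worker_pool` and `worker_concurrency` settings (the equivalent worker command-line options are `--pool` and `--concurrency`). A sketch of a configuration module, assuming the loaders read their Celery settings from such a module; the file name is hypothetical:

```python
# celeryconfig.py (hypothetical module name); the settings are standard Celery ones.

# "prefork" runs tasks in child processes; "solo" runs them inline in the
# worker's main process.
worker_pool = "prefork"

# Still a single child process, as in the benchmark.
worker_concurrency = 1
```

With prefork and a concurrency of one, a single child process executes tasks while the main worker process stays free for its other duties.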

Oct 1 2021

vsellier added a comment to T3577: Parallel loaders performances .

Intermediate status:

  • the bench lab is easily deployable on g5k on several workers to distribute the load [1]
  • it works well when the load is not too high. When the number of workers is increased, the workers seem to have trouble talking to RabbitMQ:
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-p9ds5
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-n6pvm
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-mrcjj
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-7bn4s
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-lg2bd

and also an unexplained time drift:

[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,447: WARNING/MainProcess] Substantial drift from celery@loaders-77cdd444df-lxjpl may mean clocks are out of sync.  Current drift is
[loaders-77cdd444df-flcv9 loaders] 356 seconds.  [orig: 2021-09-30 23:46:55.447181 recv: 2021-09-30 23:40:59.633444]
[loaders-77cdd444df-flcv9 loaders]
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,447: WARNING/MainProcess] Substantial drift from celery@loaders-77cdd444df-jd6v9 may mean clocks are out of sync.  Current drift is
[loaders-77cdd444df-flcv9 loaders] 355 seconds.  [orig: 2021-09-30 23:46:55.447552 recv: 2021-09-30 23:41:00.723983]
[loaders-77cdd444df-flcv9 loaders]

Oct 1 2021, 5:07 PM · System administration, Storage manager
vlorentz added a comment to T3552: Fix corrupted releases, revisions, and directories in the storage.

https://forge.softwareheritage.org/source/snippets/browse/master/vlorentz/analyze_consistency_failures.py

Oct 1 2021, 5:06 PM · Storage manager