Software Heritage

Jan 13 2022

vlorentz added revisions to T3752: Store/represent time offsets as strings: D6940: tests: Use 'offset_bytes' instead of 'offset'/'negative_utc', D6939: Stop passing 'offset' and 'negative_utc' to TimestampWithTimezone(), D6938: tests: Replace 'offset' and 'negative_utc' with 'offset_bytes', D6937: Remove 'offset' and 'negative_utc', D6935: deposit: Remove 'negative_utc' from test data.
Jan 13 2022, 12:26 PM · Data Model, Storage manager
vlorentz added a revision to T3752: Store/represent time offsets as strings: D6936: TimestampWithTimezone: Make 'offset' and 'negative_utc' optional.
Jan 13 2022, 12:16 PM · Data Model, Storage manager
vlorentz added a revision to T3752: Store/represent time offsets as strings: D6929: Remove 'negative_utc'..
Jan 13 2022, 11:28 AM · Data Model, Storage manager

Jan 12 2022

vlorentz added revisions to T3752: Store/represent time offsets as strings: D6927: Remove special handling of negative_utc, D6923: converters: Write raw_manifest of Directory objects, D6894: converters: Write object_bytes and raw_manifest on revisions and releases.
Jan 12 2022, 3:24 PM · Data Model, Storage manager

Jan 11 2022

vlorentz added a revision to T3752: Store/represent time offsets as strings: D6915: tests: Use TimestampWithTimezone.from_datetime() instead of the constructor.
Jan 11 2022, 3:25 PM · Data Model, Storage manager
vlorentz added revisions to T3752: Store/represent time offsets as strings: D6911: Remove strdate_to_timestamp, D6913: tests: Use TimestampWithTimezone.from_datetime() instead of the constructor, D6910: tests: Use TimestampWithTimezone.from_datetime() instead of the constructor, D6909: tests: Use TimestampWithTimezone.from_datetime() instead of the constructor, D6908: tests: Use TimestampWithTimezone.from_datetime() instead of the constructor.
Jan 11 2022, 2:15 PM · Data Model, Storage manager
douardda renamed T3841: regularly scrub all the data stores of swh from regularly scrub all the data sources of swh to regularly scrub all the data stores of swh.
Jan 11 2022, 12:32 PM · Datastore Scrubber, meta-task, Roadmap 2022, Storage manager
douardda removed a project from T3841: regularly scrub all the data stores of swh: Roadmap 2021.
Jan 11 2022, 12:31 PM · Datastore Scrubber, meta-task, Roadmap 2022, Storage manager
douardda triaged T3841: regularly scrub all the data stores of swh as Normal priority.
Jan 11 2022, 12:31 PM · Datastore Scrubber, meta-task, Roadmap 2022, Storage manager

Jan 7 2022

vlorentz added revisions to T3752: Store/represent time offsets as strings: D6848: Add columns {,committer_}date_offset to rev/rel and raw_manifest to dir/rev/rel, D6890: git_objects: Use raw offset_bytes to format dates, and remove format_offset().
Jan 7 2022, 1:54 PM · Data Model, Storage manager
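D6890 switches date formatting in git_objects to the raw offset_bytes and drops format_offset(). The sketch below shows why, assuming a simplified author/committer date suffix. Rebuilding the offset from an integer number of minutes cannot reproduce "-0000", but copying the raw bytes can. Both function names are hypothetical, and this is not the swh-model code.

```python
def format_offset_from_minutes(offset: int) -> bytes:
    """Hypothetical lossy formatter: rebuild a +HHMM/-HHMM offset from minutes."""
    sign = b"-" if offset < 0 else b"+"
    hours, minutes = divmod(abs(offset), 60)
    return b"%s%02d%02d" % (sign, hours, minutes)


def format_git_date(timestamp: int, offset_bytes: bytes) -> bytes:
    """Hypothetical formatter for the '<timestamp> <offset>' part of an
    author/committer line. The offset is copied verbatim from the raw bytes."""
    return b"%d %s" % (timestamp, offset_bytes)


# An offset of 0 minutes always comes back as "+0000". A commit that
# originally said "-0000" would therefore hash differently after a round trip.
print(format_offset_from_minutes(0))          # b'+0000'
print(format_git_date(1234567890, b"-0000"))  # b'1234567890 -0000'
```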
vlorentz added revisions to T3753: Store original git manifests: D6801: model: Add a raw_manifest attribute, D6811: model: Exclude 'raw_manifest' from dictionaries when it is null, D6847: hypothesis_strategies: Generate raw_manifest, D6848: Add columns {,committer_}date_offset to rev/rel and raw_manifest to dir/rev/rel.
Jan 7 2022, 1:54 PM · Data Model, Storage manager

Jan 6 2022

vlorentz added revisions to T3577: Parallel loaders performances : D6888: cassandra: Rewrite content_missing to run queries concurrently., D6885: cassandra: Use concurrent queries in *_missing() instead of naive grouping.
Jan 6 2022, 5:32 PM · System administration, Storage manager
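D6885 and D6888 replace naive grouping in the Cassandra *_missing() endpoints with concurrent queries. The sketch below illustrates only the general idea. It uses a thread pool and a hypothetical exists callback that stands in for one single-key lookup per id. It is not the actual swh-storage or Cassandra driver code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def missing_concurrently(
    ids: Iterable[bytes],
    exists: Callable[[bytes], bool],  # hypothetical: one point query per id
    max_workers: int = 16,
) -> List[bytes]:
    """Return the ids for which exists() is False, issuing lookups in parallel."""
    ids = list(ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with ids.
        found = list(pool.map(exists, ids))
    return [id_ for id_, present in zip(ids, found) if not present]


known = {b"a", b"c"}
print(missing_concurrently([b"a", b"b", b"c", b"d"], lambda x: x in known))
# [b'b', b'd']
```

Independent point lookups suit this pattern because each can go to whichever replica owns the key, rather than being funnelled through one large grouped query.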

Dec 22 2021

vlorentz closed T3585: Fix inconsistencies of the Cassandra backend with postgres as Wontfix.
Dec 22 2021, 2:34 PM · meta-task, Storage manager
vlorentz closed T3585: Fix inconsistencies of the Cassandra backend with postgres, a subtask of T1892: Cassandra as a storage backend, as Wontfix.
Dec 22 2021, 2:34 PM · meta-task, Storage manager

Dec 7 2021

vlorentz added a revision to T3752: Store/represent time offsets as strings: D6776: Add attribute TimestampWithTimezone.offset_bytes, to store raw Git offsets.
Dec 7 2021, 4:51 PM · Data Model, Storage manager
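The previous model stored an integer number of minutes plus a negative_utc flag. That representation only covers canonical +HHMM/-HHMM values, which is why raw offset_bytes are needed. The sketch below uses a hypothetical helper, not the swh-model implementation. It decodes offset_bytes when possible and returns None for offsets that an integer cannot represent.

```python
import re
from typing import Optional

# Canonical git offset: sign, two hour digits, two minute digits.
_OFFSET_RE = re.compile(rb"([+-])(\d\d)(\d\d)")


def offset_minutes(offset_bytes: bytes) -> Optional[int]:
    """Return the offset in minutes, or None if offset_bytes is not canonical."""
    m = _OFFSET_RE.fullmatch(offset_bytes)
    if m is None:
        return None
    sign, hours, minutes = m.groups()
    value = int(hours) * 60 + int(minutes)
    return -value if sign == b"-" else value


print(offset_minutes(b"+0200"))   # 120
print(offset_minutes(b"-0000"))   # 0, indistinguishable from b"+0000"
print(offset_minutes(b"+05030"))  # None: not representable as minutes
```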
anlambert closed T3776: cassandra tests are failing in the swh-environment build as Resolved by committing rDSTO615fb99eb708: test_cassandra: Fix failing tests since swh-model update.
Dec 7 2021, 1:56 PM · Storage manager
anlambert added a revision to T3776: cassandra tests are failing in the swh-environment build: D6768: test_cassandra: Fix failing tests since swh-model update.
Dec 7 2021, 1:38 PM · Storage manager
vsellier triaged T3776: cassandra tests are failing in the swh-environment build as High priority.
Dec 7 2021, 1:16 PM · Storage manager

Dec 2 2021

vlorentz updated the task description for T3752: Store/represent time offsets as strings.
Dec 2 2021, 4:04 PM · Data Model, Storage manager
vlorentz closed T3586: Figure out what to do with 'misordered' directories in Cassandra, a subtask of T3585: Fix inconsistencies of the Cassandra backend with postgres, as Resolved.
Dec 2 2021, 3:14 PM · meta-task, Storage manager
vlorentz closed T3586: Figure out what to do with 'misordered' directories in Cassandra as Resolved.

We don't care anymore; this will be handled by T3753.

Dec 2 2021, 3:14 PM · Data Model, Storage manager
vlorentz removed a parent task for T3752: Store/represent time offsets as strings: T3753: Store original git manifests.
Dec 2 2021, 3:01 PM · Data Model, Storage manager
vlorentz removed a subtask for T3753: Store original git manifests: T3752: Store/represent time offsets as strings.
Dec 2 2021, 3:01 PM · Data Model, Storage manager
vlorentz added a parent task for T3752: Store/represent time offsets as strings: T3753: Store original git manifests.
Dec 2 2021, 3:00 PM · Data Model, Storage manager
vlorentz added a subtask for T3753: Store original git manifests: T3752: Store/represent time offsets as strings.
Dec 2 2021, 3:00 PM · Data Model, Storage manager
vlorentz updated the task description for T3752: Store/represent time offsets as strings.
Dec 2 2021, 2:59 PM · Data Model, Storage manager
vlorentz updated the task description for T3752: Store/represent time offsets as strings.
Dec 2 2021, 2:55 PM · Data Model, Storage manager
vlorentz updated the task description for T3752: Store/represent time offsets as strings.
Dec 2 2021, 2:52 PM · Data Model, Storage manager
vlorentz updated the task description for T3753: Store original git manifests.
Dec 2 2021, 2:48 PM · Data Model, Storage manager
vlorentz updated the task description for T3752: Store/represent time offsets as strings.
Dec 2 2021, 2:22 PM · Data Model, Storage manager
vsellier closed T3357: Perform some tests of the cassandra storage on Grid5000, a subtask of T1892: Cassandra as a storage backend, as Resolved.
Dec 2 2021, 10:10 AM · meta-task, Storage manager
vsellier closed T3357: Perform some tests of the cassandra storage on Grid5000 as Resolved.

The slides of the retrospective of the experiment are available at: https://hedgedoc.softwareheritage.org/VOP9qh1MTqm4DjPQfFgNbQ

Dec 2 2021, 10:10 AM · System administration, Storage manager
vsellier closed T3573: [cassandra] directory and content read benchmarks, a subtask of T3357: Perform some tests of the cassandra storage on Grid5000, as Resolved.
Dec 2 2021, 10:08 AM · System administration, Storage manager
vsellier closed T3573: [cassandra] directory and content read benchmarks as Resolved.

It was not easy to tell whether this came from many calls or from long-running calls, because the data is sampled at regular intervals and we don't have that granularity.

Dec 2 2021, 10:08 AM · System administration, Storage manager

Dec 1 2021

zack moved T2053: support graph export for the cassandra backend from Backlog to Deployed on the Compressed graph service board.
Dec 1 2021, 4:37 PM · Compressed graph service, Storage manager
zack moved T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs from Backlog to Deployed on the Compressed graph service board.
Dec 1 2021, 4:35 PM · Compressed graph service, Storage manager

Nov 26 2021

vlorentz removed a project from T3752: Store/represent time offsets as strings: meta-task.
Nov 26 2021, 5:19 PM · Data Model, Storage manager
vlorentz removed a project from T3753: Store original git manifests: meta-task.
Nov 26 2021, 5:19 PM · Data Model, Storage manager
vlorentz claimed T3594: Faithfully store weird git objects.
Nov 26 2021, 4:43 PM · meta-task, Data Model, Storage manager
vlorentz claimed T3753: Store original git manifests.
Nov 26 2021, 4:43 PM · Data Model, Storage manager
vlorentz triaged T3753: Store original git manifests as Normal priority.
Nov 26 2021, 4:43 PM · Data Model, Storage manager
vlorentz triaged T3752: Store/represent time offsets as strings as Normal priority.
Nov 26 2021, 4:42 PM · Data Model, Storage manager
vlorentz closed T3598: Support revisions with "extra headers" not at the end, a subtask of T3594: Faithfully store weird git objects, as Wontfix.
Nov 26 2021, 4:41 PM · meta-task, Data Model, Storage manager
vlorentz closed T3598: Support revisions with "extra headers" not at the end as Wontfix.

We decided to store manifests instead. T3594#74385

Nov 26 2021, 4:41 PM · Data Model, Storage manager
vlorentz closed T3596: Support "weird" permissions in directories as Wontfix.

We decided to store manifests instead. T3594#74385

Nov 26 2021, 4:41 PM · meta-task, Data Model, Storage manager
vlorentz closed T3596: Support "weird" permissions in directories, a subtask of T3594: Faithfully store weird git objects, as Wontfix.
Nov 26 2021, 4:41 PM · meta-task, Data Model, Storage manager
vlorentz closed T3595: Support disordered directory entries in git, a subtask of T3594: Faithfully store weird git objects, as Wontfix.
Nov 26 2021, 4:41 PM · meta-task, Data Model, Storage manager
vlorentz closed T3595: Support disordered directory entries in git as Wontfix.

We decided to store manifests instead. T3594#74385

Nov 26 2021, 4:41 PM · meta-task, Data Model, Storage manager
vlorentz added a comment to T3594: Faithfully store weird git objects.

Copy of an email I sent today:

Nov 26 2021, 4:40 PM · meta-task, Data Model, Storage manager
vlorentz added a revision to T399: (Re-)Compute data checksums before insertion: D6281: converters: Recompute hashes and check they match the originals.
Nov 26 2021, 3:52 PM · Storage manager
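A git blob is the simplest case of recomputing a hash. Its sha1_git is the SHA-1 of a "blob <length>" header, a NUL byte, and then the content. The sketch below shows the check for blobs only. D6281 targets the loaders' converters, and other object types are not covered here. The helper names are hypothetical.

```python
import hashlib


def git_blob_sha1(data: bytes) -> bytes:
    """Compute the git object id of a blob: sha1(b"blob <len>\\0" + data)."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).digest()


def check_content(data: bytes, expected_sha1_git: bytes) -> None:
    """Hypothetical pre-insertion check: refuse content whose hash does not match."""
    actual = git_blob_sha1(data)
    if actual != expected_sha1_git:
        raise ValueError(
            f"hash mismatch: expected {expected_sha1_git.hex()}, got {actual.hex()}"
        )


print(git_blob_sha1(b"").hex())  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```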
douardda added a revision to T3693: Provide a mecanism to report (with persistence) objects that fails to get replayed (mirror): D6693: Add support for a redis-based reporter for failed replayed objects.
Nov 26 2021, 1:33 PM · Storage manager

Nov 15 2021

vsellier updated the task description for T3357: Perform some tests of the cassandra storage on Grid5000.
Nov 15 2021, 9:46 AM · System administration, Storage manager

Nov 8 2021

vsellier closed T3683: cassandra - benchmark the vault, a subtask of T3357: Perform some tests of the cassandra storage on Grid5000, as Resolved.
Nov 8 2021, 9:55 AM · System administration, Storage manager

Oct 27 2021

douardda added a revision to T3693: Provide a mecanism to report (with persistence) objects that fails to get replayed (mirror): D6571: Add support for a redis-based reporting for invalid mirrorred objects.
Oct 27 2021, 6:24 PM · Storage manager
douardda added a revision to T3693: Provide a mecanism to report (with persistence) objects that fails to get replayed (mirror): D6565: Pass the object_type to JournalClient.value_serializer().
Oct 27 2021, 4:19 PM · Storage manager
vlorentz updated the task description for T3594: Faithfully store weird git objects.
Oct 27 2021, 2:08 PM · meta-task, Data Model, Storage manager
vlorentz updated the task description for T3594: Faithfully store weird git objects.
Oct 27 2021, 2:03 PM · meta-task, Data Model, Storage manager
vlorentz updated the task description for T3594: Faithfully store weird git objects.
Oct 27 2021, 2:03 PM · meta-task, Data Model, Storage manager

Oct 26 2021

douardda added a revision to T3693: Provide a mecanism to report (with persistence) objects that fails to get replayed (mirror): D6554: [WIP] Add a (redis-based) validation error reporting facility.
Oct 26 2021, 5:48 PM · Storage manager
douardda triaged T3693: Provide a mecanism to report (with persistence) objects that fails to get replayed (mirror) as High priority.
Oct 26 2021, 5:41 PM · Storage manager

Oct 22 2021

ardumont added a comment to T3595: Support disordered directory entries in git.

I came across a rather small repository [1] which I believe raises the same issue, so it may help to keep a reference to it to ease testing of the improvement discussed here.
Feel free to dismiss this if it's not that useful.

Oct 22 2021, 1:54 PM · meta-task, Data Model, Storage manager
vsellier reopened T3683: cassandra - benchmark the vault, a subtask of T3357: Perform some tests of the cassandra storage on Grid5000, as Work in Progress.
Oct 22 2021, 11:49 AM · System administration, Storage manager
vsellier closed T3683: cassandra - benchmark the vault, a subtask of T3357: Perform some tests of the cassandra storage on Grid5000, as Resolved.
Oct 22 2021, 11:48 AM · System administration, Storage manager

Oct 21 2021

vsellier added revisions to T3577: Parallel loaders performances : D6423: cassandra: Add alternative algorithms to list missing objects, D6494: cassandra: Fix incomplete check of content existence in object_find_by_sha1_git, D6495: cassandra: Rewrite content_missing to run queries concurrently..
Oct 21 2021, 2:58 PM · System administration, Storage manager
vsellier closed T3577: Parallel loaders performances , a subtask of T3357: Perform some tests of the cassandra storage on Grid5000, as Resolved.
Oct 21 2021, 2:56 PM · System administration, Storage manager
vsellier closed T3577: Parallel loaders performances as Resolved.

Stopping the investigation here, as the limiting factor is now the hardware running Cassandra (HDDs). Great improvements were made compared to the first attempts.

Oct 21 2021, 2:56 PM · System administration, Storage manager
vsellier added a comment to T3577: Parallel loaders performances .

Changing the cache size doesn't seem very effective for the particular workload of the loaders.
Apart from some contextual differences, probably due to the different origins being loaded, the performance is quite similar.
The hit ratios of the different configurations are also very close.

Oct 21 2021, 2:36 PM · System administration, Storage manager
vlorentz added a revision to T3135: Improve integrity of ingested content: D6504: converters: Fix detection of tree entries with non-standard commit/tree mode..
Oct 21 2021, 10:57 AM · Storage manager, Roadmap 2021, meta-task

Oct 20 2021

vsellier added a comment to T3577: Parallel loaders performances .

Closing the worker tests here.

Oct 20 2021, 5:43 PM · System administration, Storage manager
vsellier added a comment to T3577: Parallel loaders performances .

These are the impacts of the different algorithms used to retrieve the missing objects.
The configuration is the same as before; the storage is patched with D6423, D6494 and D6495.
The first run, without D6423, uses a version patched with D6494 and D6495 only.

Oct 20 2021, 4:56 PM · System administration, Storage manager
vsellier added a comment to T3577: Parallel loaders performances .

This is the impact of configuring the batch insert size for add operations. Tested values were 500, 1000 and 2000.
A full repair of a Cassandra node was in progress due to data loss (the disk was erased by someone else's experiment after the disk reservation expired).

Oct 20 2021, 2:50 PM · System administration, Storage manager
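The batch insert size controls how many objects are sent per insertion call. The hypothetical helper below shows the splitting. It is not swh-storage code, and the actual configuration key is not given here.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# With 2500 objects and a batch size of 1000, three insert calls are made.
print([len(b) for b in batches(range(2500), 1000)])  # [1000, 1000, 500]
```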

Oct 15 2021

vlorentz updated the task description for T3594: Faithfully store weird git objects.
Oct 15 2021, 2:38 PM · meta-task, Data Model, Storage manager

Oct 14 2021

olasd updated subscribers of T1617: Experiment with generation numbers to improve revisions walk performance.

@vlorentz mentioned this idea in the context of T3655 (git loader global deduplication).

Oct 14 2021, 12:00 PM · Storage manager
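Git's commit-graph defines generation numbers as follows. Each root commit gets 1, and every other commit gets 1 plus the maximum of its parents' numbers. An ancestor therefore always has a strictly smaller number, so a walk searching for a commit can prune anything numbered below it. The sketch below uses a hypothetical in-memory parents map and assumes an acyclic graph whose parents all appear as keys.

```python
from typing import Dict, List


def generation_numbers(parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Roots get 1; other commits get 1 + max(generation of parents).

    Uses an explicit stack instead of recursion to cope with long histories.
    """
    gen: Dict[str, int] = {}
    for start in parents:
        stack = [start]
        while stack:
            node = stack[-1]
            if node in gen:
                stack.pop()
                continue
            pending = [p for p in parents[node] if p not in gen]
            if pending:
                stack.extend(pending)  # compute parents first
            else:
                gen[node] = 1 + max((gen[p] for p in parents[node]), default=0)
                stack.pop()
    return gen


def may_be_ancestor(gen: Dict[str, int], candidate: str, descendant: str) -> bool:
    """Necessary (not sufficient) condition for candidate to be an ancestor."""
    return candidate == descendant or gen[candidate] < gen[descendant]


history = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": ["d"]}
print(generation_numbers(history))  # {'a': 1, 'b': 2, 'c': 2, 'd': 3, 'e': 4}
```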
vlorentz added a comment to T3573: [cassandra] directory and content read benchmarks.

What "directory_entries_insert_algo" did you use for this?

Oct 14 2021, 11:32 AM · System administration, Storage manager
vsellier added a comment to T3573: [cassandra] directory and content read benchmarks.

Some flame graphs of the storage were captured during ingestion with 50 workers running in parallel.

Oct 14 2021, 10:08 AM · System administration, Storage manager

Oct 12 2021

vsellier added a comment to T3577: Parallel loaders performances .

Some runs with the fix:
It improves the overall stability of the benchmark by reducing the timeouts.

Oct 12 2021, 6:27 PM · System administration, Storage manager

Oct 11 2021

vlorentz updated the task description for T3595: Support disordered directory entries in git.
Oct 11 2021, 2:49 PM · meta-task, Data Model, Storage manager

Oct 6 2021

vsellier added a comment to T3577: Parallel loaders performances .

The loaders were finally stabilized. The problem was due to a wrong Celery configuration.
Changing the pool from solo to prefork solved the problem, even with the concurrency kept at one.
The solo pool seemed appropriate for an environment like the POC, but for obvious reasons it did not work as expected:

Oct 6 2021, 2:11 AM · System administration, Storage manager
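The fix maps to two standard Celery worker settings. The sketch below is a configuration module and assumes the loaders read lowercase Celery settings from such a module. The module name is hypothetical, and the bench lab's actual deployment mechanism is not described here. The command-line equivalent is: celery worker --pool=prefork --concurrency=1

```python
# celeryconfig.py (hypothetical module name)

# Use the default multiprocessing pool instead of "solo", which runs tasks
# in the worker's main process.
worker_pool = "prefork"

# Keep a single child process, so at most one task runs at a time per worker.
worker_concurrency = 1
```

With prefork, tasks run in a child process, which leaves the worker's main process free to keep talking to the broker while a long task runs.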

Oct 1 2021

vsellier added a comment to T3577: Parallel loaders performances .

Intermediate status:

  • the bench lab is easily deployable on g5k across several workers to distribute the load [1]
  • it works well when the load is not too high. When the number of workers is increased, the workers seem to have trouble talking to RabbitMQ:
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-p9ds5
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-n6pvm
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-mrcjj
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-7bn4s
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-lg2bd

and also an unexplained time drift:

[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,447: WARNING/MainProcess] Substantial drift from celery@loaders-77cdd444df-lxjpl may mean clocks are out of sync.  Current drift is
[loaders-77cdd444df-flcv9 loaders] 356 seconds.  [orig: 2021-09-30 23:46:55.447181 recv: 2021-09-30 23:40:59.633444]
[loaders-77cdd444df-flcv9 loaders]
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,447: WARNING/MainProcess] Substantial drift from celery@loaders-77cdd444df-jd6v9 may mean clocks are out of sync.  Current drift is
[loaders-77cdd444df-flcv9 loaders] 355 seconds.  [orig: 2021-09-30 23:46:55.447552 recv: 2021-09-30 23:41:00.723983]
[loaders-77cdd444df-flcv9 loaders]

Oct 1 2021, 5:07 PM · System administration, Storage manager
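The orig and recv timestamps in the drift warnings above match the reported values of about 356 and 355 seconds. The standard-library check below makes no claim about how Celery itself computes or rounds the value.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def drift_seconds(orig: str, recv: str) -> float:
    """Difference between the event's origin timestamp and its receive timestamp."""
    return (datetime.strptime(orig, FMT) - datetime.strptime(recv, FMT)).total_seconds()


d1 = drift_seconds("2021-09-30 23:46:55.447181", "2021-09-30 23:40:59.633444")
d2 = drift_seconds("2021-09-30 23:46:55.447552", "2021-09-30 23:41:00.723983")
print(round(d1), round(d2))  # 356 355
```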
vlorentz added a comment to T3552: Fix corrupted releases, revisions, and directories in the storage.

https://forge.softwareheritage.org/source/snippets/browse/master/vlorentz/analyze_consistency_failures.py

Oct 1 2021, 5:06 PM · Storage manager
vlorentz changed the status of T3552: Fix corrupted releases, revisions, and directories in the storage from Open to Work in Progress.
Oct 1 2021, 5:04 PM · Storage manager
vlorentz claimed T3552: Fix corrupted releases, revisions, and directories in the storage.
Oct 1 2021, 5:04 PM · Storage manager

Sep 24 2021

vlorentz added a parent task for T3594: Faithfully store weird git objects: T3552: Fix corrupted releases, revisions, and directories in the storage.
Sep 24 2021, 3:13 PM · meta-task, Data Model, Storage manager
vlorentz added a subtask for T3552: Fix corrupted releases, revisions, and directories in the storage: T3594: Faithfully store weird git objects.
Sep 24 2021, 3:13 PM · Storage manager

Sep 23 2021

vlorentz renamed T3552: Fix corrupted releases, revisions, and directories in the storage from Fix corrupted releases and revisions in the storage to Fix corrupted releases, revisions, and directories in the storage.
Sep 23 2021, 3:55 PM · Storage manager
vlorentz added a parent task for T3607: Document consistency guarantees of the loaders with respect to the storage: T3604: Document the architecture of all major packages/components.
Sep 23 2021, 3:00 PM · Data Model, Storage manager, Package Loader, Core Loader, Documentation
vlorentz triaged T3607: Document consistency guarantees of the loaders with respect to the storage as Normal priority.
Sep 23 2021, 3:00 PM · Data Model, Storage manager, Package Loader, Core Loader, Documentation
vlorentz added a parent task for T3333: Document the different storage backends: T3604: Document the architecture of all major packages/components.
Sep 23 2021, 2:51 PM · Documentation, Storage manager

Sep 22 2021

vlorentz triaged T3598: Support revisions with "extra headers" not at the end as Low priority.
Sep 22 2021, 4:00 PM · Data Model, Storage manager
vlorentz added a comment to T3596: Support "weird" permissions in directories.

Complete proposal for the above solution:

Sep 22 2021, 2:56 PM · meta-task, Data Model, Storage manager
vlorentz added a comment to T3595: Support disordered directory entries in git.

Complete proposal to implement the above solution:

Sep 22 2021, 2:51 PM · meta-task, Data Model, Storage manager
vlorentz updated the task description for T3586: Figure out what to do with 'misordered' directories in Cassandra.
Sep 22 2021, 1:44 PM · Data Model, Storage manager
vlorentz closed T3582: cassandra: Use 'git ordering' for directory entries, a subtask of T3585: Fix inconsistencies of the Cassandra backend with postgres, as Wontfix.
Sep 22 2021, 1:44 PM · meta-task, Storage manager
vlorentz closed T3582: cassandra: Use 'git ordering' for directory entries as Wontfix.

Heh, actually that's not an issue: the directory_get_entries documentation does not guarantee an order.

Sep 22 2021, 1:44 PM · Storage manager
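Git sorts tree entries by name as if directory names ended with "/". A plain byte-wise sort by name can therefore produce a different order. The sketch below follows our understanding of git's tree ordering rule, in which only trees count as directories, not submodules. The entry tuple shape is hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[bytes, bool]  # hypothetical shape: (name, is_directory)


def git_sort_key(entry: Entry) -> bytes:
    """Directories compare as if their name had a trailing '/'."""
    name, is_dir = entry
    return name + b"/" if is_dir else name


def git_order(entries: List[Entry]) -> List[Entry]:
    return sorted(entries, key=git_sort_key)


entries = [(b"foo.c", False), (b"foo", True), (b"foo-bar", False)]
print(git_order(entries))
# [(b'foo-bar', False), (b'foo.c', False), (b'foo', True)]
# A plain sort by name would instead put the directory b"foo" first.
```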
vlorentz updated the task description for T3594: Faithfully store weird git objects.
Sep 22 2021, 1:42 PM · meta-task, Data Model, Storage manager
vlorentz added a comment to T3596: Support "weird" permissions in directories.

Possible solution: store them as an ASCII string instead of an integer.

Sep 22 2021, 1:38 PM · meta-task, Data Model, Storage manager
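The problem with an integer is that git normally writes the directory mode as 40000, but some objects use a zero-padded 040000. Both parse to the same integer, so the original bytes cannot be restored. The parse/format helpers below are hypothetical.

```python
def parse_mode(raw: bytes) -> int:
    """Parse an octal git mode as stored in a tree entry."""
    return int(raw, 8)


def format_mode(mode: int) -> bytes:
    """Format a mode the canonical way (octal, no zero padding)."""
    return b"%o" % mode


# Round-tripping through an integer drops the leading zero, which changes
# the tree's bytes and therefore its hash.
print(format_mode(parse_mode(b"040000")))  # b'40000'
```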
vlorentz added a comment to T3595: Support disordered directory entries in git.

Possible solution: store a rank along with each directory entry, but ignore it unless we are reconstructing a git object or computing a SWHID (v1?)

Sep 22 2021, 1:37 PM · meta-task, Data Model, Storage manager
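In this proposal, the rank only drives ordering when rebuilding the git manifest, and other consumers keep sorting by name. The sketch below uses a hypothetical DirEntry type. This proposal was later closed as Wontfix in favor of storing raw manifests.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DirEntry:
    name: bytes
    target: bytes
    rank: int  # hypothetical field: position in the original git tree


def entries_for_manifest(entries: List[DirEntry]) -> List[DirEntry]:
    """Order used only to rebuild the git object (or compute its SWHID)."""
    return sorted(entries, key=lambda e: e.rank)


def entries_for_lookup(entries: List[DirEntry]) -> List[DirEntry]:
    """Order used everywhere else; the rank is ignored."""
    return sorted(entries, key=lambda e: e.name)


# A "disordered" tree where b"b" was written before b"a".
tree = [DirEntry(b"b", b"\x01", 0), DirEntry(b"a", b"\x02", 1)]
print([e.name for e in entries_for_manifest(tree)])  # [b'b', b'a']
print([e.name for e in entries_for_lookup(tree)])    # [b'a', b'b']
```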
vlorentz triaged T3596: Support "weird" permissions in directories as Normal priority.
Sep 22 2021, 1:36 PM · meta-task, Data Model, Storage manager
vlorentz updated the task description for T3595: Support disordered directory entries in git.
Sep 22 2021, 1:34 PM · meta-task, Data Model, Storage manager
vlorentz triaged T3595: Support disordered directory entries in git as Normal priority.
Sep 22 2021, 1:34 PM · meta-task, Data Model, Storage manager