Feed All Stories

Oct 2 2021

borisbaldassari requested review of D6393: maven-lister: initialise lister.maven-lister: update following review on D6133. [PLEASE DELETE ME].
Oct 2 2021, 8:52 PM
borisbaldassari added a revision to T1724: Maven Central repository support: D6393: maven-lister: initialise lister.maven-lister: update following review on D6133. [PLEASE DELETE ME].
Oct 2 2021, 8:49 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage
ardumont updated the title for P1192 [draft-ml-devel] About reducing the loader-git memory consumption and overall work from [draft-ml-devel] About reducing the loader-git memory consumption to [draft-ml-devel] About reducing the loader-git memory consumption and overall work.
Oct 2 2021, 6:17 PM
ardumont edited P1192 [draft-ml-devel] About reducing the loader-git memory consumption and overall work.
Oct 2 2021, 6:16 PM
ardumont merged task T3459: save code now: some requests are not getting updated into T3458: save code now: Requests are not getting updated from time to time.
Oct 2 2021, 1:04 PM · Save Code Now
ardumont merged T3459: save code now: some requests are not getting updated into T3458: save code now: Requests are not getting updated from time to time.
Oct 2 2021, 1:04 PM · Save Code Now
ardumont added a comment to T3625: Reduce git loader memory footprint.

A note for the #swh-devel mailing list is being drafted [1].
It is open as a draft for review first.

Oct 2 2021, 11:05 AM · Git loader
ardumont edited P1192 [draft-ml-devel] About reducing the loader-git memory consumption and overall work.
Oct 2 2021, 11:03 AM
ardumont created P1192 [draft-ml-devel] About reducing the loader-git memory consumption and overall work.
Oct 2 2021, 10:56 AM
zack raised the priority of T3623: Run swh-graph with gunicorn to support multiple/parallel requests from Normal to High.
Oct 2 2021, 8:06 AM · Compressed graph service, System administration
zack raised the priority of T3624: Update swh-graph from 0.3.0 to 0.5.0 on granet from Normal to High.
Oct 2 2021, 8:06 AM · Compressed graph service, System administration
zack renamed T3623: Run swh-graph with gunicorn to support multiple/parallel requests from Run swh-graph with gunicorn to Run swh-graph with gunicorn to support multiple/parallel requests.
Oct 2 2021, 8:01 AM · Compressed graph service, System administration

Oct 1 2021

vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.
  1. "nonce" header is *after* gpgsig
  2. double "author" field in the original, and another commit with three "committer"....
  3. "mergetag" headers with an extra newline at the end (current versions of the loader strip it, looks like older ones didn't)
  4. "author xxx <yyy@gmail.com> <type 'int'> -0200" in original commit (dulwich obviously can't parse this)
Oct 1 2021, 8:50 PM · Archive content, Restricted Project
ardumont added a comment to T3625: Reduce git loader memory footprint.

[3] Another idea, so far only discussed, would be to ingest the tag references first,
in order (on the assumption that the repository will then mostly be ingested in its
natural order), and only then the remaining references. The rationale is that if we
start with HEAD and/or master, there is a high probability that we end up fetching
almost the whole repository at once.
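
A minimal sketch of that ordering, not the loader's actual code: the function name `order_references` is hypothetical, and "in order" is assumed here to mean plain lexicographic order on the reference names (so `v1.10` would sort before `v1.2`).

```python
from typing import Iterable, List

TAG_PREFIX = b"refs/tags/"


def order_references(refs: Iterable[bytes]) -> List[bytes]:
    """Return sorted tag references first, then the other references, sorted.

    Hypothetical helper: lexicographic sorting is an assumption made for
    illustration, not necessarily what the git loader does.
    """
    refs = list(refs)  # allow any iterable to be scanned twice
    tags = sorted(r for r in refs if r.startswith(TAG_PREFIX))
    others = sorted(r for r in refs if not r.startswith(TAG_PREFIX))
    return tags + others


if __name__ == "__main__":
    refs = [b"HEAD", b"refs/heads/master", b"refs/tags/v2", b"refs/tags/v1"]
    for ref in order_references(refs):
        print(ref.decode())
```

With such an ordering, HEAD and the branches are only considered once the tags have been handled.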

Oct 1 2021, 6:56 PM · Git loader
ardumont added a revision to T3625: Reduce git loader memory footprint: D6392: git: Ingest ordered tags then ordered branches references.
Oct 1 2021, 6:55 PM · Git loader
ardumont updated subscribers of T3625: Reduce git loader memory footprint.

D6377 actually increased the memory footprint to the point of getting ingestion killed
fast. So closed!

Oct 1 2021, 6:24 PM · Git loader
ardumont added a revision to T3625: Reduce git loader memory footprint: D6377: wip: git: Group objects per type early to drop the packfile reference asap.
Oct 1 2021, 6:07 PM · Git loader
ardumont updated the summary of D6377: wip: git: Group objects per type early to drop the packfile reference asap.
Oct 1 2021, 6:07 PM
ardumont renamed T3625: Reduce git loader memory footprint from Improve git loader memory footprint to Reduce git loader memory footprint.
Oct 1 2021, 6:06 PM · Git loader
ardumont updated the task description for T3625: Reduce git loader memory footprint.
Oct 1 2021, 6:06 PM · Git loader
ardumont closed T3583: check icinga alert for svn save-code-now as Resolved.

It resolved itself, it's green again.

Oct 1 2021, 6:04 PM · Scheduling utilities, Save Code Now, Monitoring
ardumont closed T3583: check icinga alert for svn save-code-now, a subtask of T3458: save code now: Requests are not getting updated from time to time, as Resolved.
Oct 1 2021, 6:04 PM · Save Code Now
ardumont closed D6391: Add specific test to the filtering branch function.
Oct 1 2021, 5:49 PM
ardumont committed rDLDG8d371e815cc6: Add specific test to the filtering branch function (authored by ardumont).
Add specific test to the filtering branch function
Oct 1 2021, 5:48 PM
dachary added a comment to T3104: Persistent readonly perfect hash table.

SWH I guess: I don't see the difference whether it's embedded in swh-objstorage, winery or a dedicated package.

Oct 1 2021, 5:47 PM · Object storage (RedHat collaboration)
anlambert accepted D6391: Add specific test to the filtering branch function.
Oct 1 2021, 5:32 PM
swh-public-ci added a comment to D6391: Add specific test to the filtering branch function.

Build is green

Oct 1 2021, 5:31 PM
ardumont updated the diff for D6391: Add specific test to the filtering branch function.

Thanks!

Oct 1 2021, 5:29 PM
ardumont added inline comments to D6391: Add specific test to the filtering branch function.
Oct 1 2021, 5:26 PM
vlorentz added inline comments to D6391: Add specific test to the filtering branch function.
Oct 1 2021, 5:25 PM
vlorentz accepted D6391: Add specific test to the filtering branch function.
Oct 1 2021, 5:24 PM
ardumont requested review of D6391: Add specific test to the filtering branch function.
Oct 1 2021, 5:20 PM
vsellier added a comment to T3577: Parallel loaders performances.

intermediary status:

  • the bench lab is easily deployable on g5k on several workers to distribute the load [1]
  • it works well when the load is not too high. When the number of workers is increased, the workers seem to have trouble talking to rabbitmq:
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-p9ds5
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-n6pvm
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-mrcjj
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-7bn4s
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,449: INFO/MainProcess] missed heartbeat from celery@loaders-77cdd444df-lg2bd

and also an unexplained time drift:

[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,447: WARNING/MainProcess] Substantial drift from celery@loaders-77cdd444df-lxjpl may mean clocks are out of sync.  Current drift is
[loaders-77cdd444df-flcv9 loaders] 356 seconds.  [orig: 2021-09-30 23:46:55.447181 recv: 2021-09-30 23:40:59.633444]
[loaders-77cdd444df-flcv9 loaders]
[loaders-77cdd444df-flcv9 loaders] [2021-09-30 23:46:55,447: WARNING/MainProcess] Substantial drift from celery@loaders-77cdd444df-jd6v9 may mean clocks are out of sync.  Current drift is
[loaders-77cdd444df-flcv9 loaders] 355 seconds.  [orig: 2021-09-30 23:46:55.447552 recv: 2021-09-30 23:41:00.723983]
[loaders-77cdd444df-flcv9 loaders]
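
The drift values in these warnings can be checked from the `orig` and `recv` timestamps they print. The sketch below is only an illustration: the helper name `drift_seconds` is hypothetical, and truncating both timestamps to whole seconds before subtracting is an assumption, chosen because it reproduces the two logged values.

```python
from datetime import datetime

# Timestamp layout used in the "[orig: ... recv: ...]" part of the warnings.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def drift_seconds(orig: str, recv: str) -> int:
    """Absolute difference between two timestamps, in whole seconds.

    Both timestamps are truncated to the second first (an assumption
    made here so that the result matches the logged drift).
    """
    o = datetime.strptime(orig, FMT).replace(microsecond=0)
    r = datetime.strptime(recv, FMT).replace(microsecond=0)
    return abs(int((o - r).total_seconds()))


if __name__ == "__main__":
    print(drift_seconds("2021-09-30 23:46:55.447181", "2021-09-30 23:40:59.633444"))
    print(drift_seconds("2021-09-30 23:46:55.447552", "2021-09-30 23:41:00.723983"))
```

Both workers are therefore reported as roughly six minutes behind, which is consistent with delayed heartbeat messages as much as with unsynchronised clocks.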
Oct 1 2021, 5:07 PM · System administration, Storage manager
vlorentz added a comment to T3552: Fix corrupted releases, revisions, and directories in the storage.

https://forge.softwareheritage.org/source/snippets/browse/master/vlorentz/analyze_consistency_failures.py

Oct 1 2021, 5:06 PM · Storage manager
vlorentz committed rDSNIP54486974c31b: analyze_consistency_failures.py: Add more heuristics + write out… (authored by vlorentz).
analyze_consistency_failures.py: Add more heuristics + write out…
Oct 1 2021, 5:06 PM
vlorentz changed the status of T3552: Fix corrupted releases, revisions, and directories in the storage, a subtask of T887: Vault: "snapshot" cooker, from Open to Work in Progress.
Oct 1 2021, 5:04 PM · Vault
vlorentz changed the status of T3552: Fix corrupted releases, revisions, and directories in the storage from Open to Work in Progress.
Oct 1 2021, 5:04 PM · Storage manager
vlorentz changed the status of T3552: Fix corrupted releases, revisions, and directories in the storage, a subtask of T3551: Fix git-fsck errors in the git-bare cooker, from Open to Work in Progress.
Oct 1 2021, 5:04 PM · Vault
vlorentz claimed T3552: Fix corrupted releases, revisions, and directories in the storage.
Oct 1 2021, 5:04 PM · Storage manager
zack triaged T3626: graph API: add ?limit parameter to /leaves endpoint as Low priority.
Oct 1 2021, 4:53 PM · Easy hack, Compressed graph service
ardumont created P1191 (An Untitled Masterwork).
Oct 1 2021, 4:51 PM
vsellier committed rDSNIP7cc495e333e2: grid5000/cassandra: kubernetes configuration for massive parallel loader test (authored by vsellier).
grid5000/cassandra: kubernetes configuration for massive parallel loader test
Oct 1 2021, 4:37 PM
ardumont accepted D6389: make the black code formatter skip the pserver scramble shift table.
Oct 1 2021, 4:22 PM
vsellier added a comment to T3592: POC elastic worker infrastructure.

Intermediary status:

  • We have successfully run loaders in staging using the helm chart we wrote [1] and a hardcoded number of workers. It adds the possibility to perform rolling upgrades, for example.
  • We have tried the integrated horizontal pod autoscaler [2]. It works pretty well but is not suited to our worker scenario: it decides whether the number of running pods must be scaled up or down based on the pods' CPU consumption (in our test [3]; other metrics are possible). It can be very useful to manage a classical load, such as a gunicorn container, but not for long-running tasks.
  • Kubernetes also has some functionality to reduce the pressure on a node when some limits are reached, but it looks more like an emergency action than proper scaling management. It is configured at the kubelet level and is not dynamic at all [4]. We tested it quickly, but we lost the node to an OOM before the node eviction started.
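
For reference, the scaling rule documented for the Kubernetes horizontal pod autoscaler is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The sketch below only illustrates that formula: the function name and the numbers are made up for the example, and the tolerance and stabilisation settings of the real autoscaler are ignored.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Replica count suggested by the documented HPA formula.

    Illustrative only: the real autoscaler also applies a tolerance,
    min/max bounds and stabilisation windows, all omitted here.
    """
    return math.ceil(current_replicas * (current_metric / target_metric))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 50% target: scale up.
    print(desired_replicas(4, 90, 50))
    # 4 pods averaging 10% CPU against a 50% target: scale down.
    print(desired_replicas(4, 10, 50))
```

The second case shows the mismatch with long-running tasks: a loader waiting on the network uses little CPU, so its pod may be selected for removal while its task is still in progress.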
Oct 1 2021, 4:18 PM · System administration
anlambert closed D6390: search: Add query language support for staff users.
Oct 1 2021, 3:57 PM
anlambert committed rDWAPPSd7ed7cae590d: search: Add query language support for staff users (authored by anlambert).
search: Add query language support for staff users
Oct 1 2021, 3:57 PM
swh-public-ci added a comment to D6390: search: Add query language support for staff users.

Build is green

Oct 1 2021, 3:51 PM
swh-public-ci added a comment to D6389: make the black code formatter skip the pserver scramble shift table.

Build is green

Oct 1 2021, 3:43 PM
stsp updated the diff for D6389: make the black code formatter skip the pserver scramble shift table.

rebased patch

Oct 1 2021, 3:41 PM
stsp closed D6388: fix cvs pserver authentication error.
Oct 1 2021, 3:40 PM
stsp committed rDLDCVS2b41f5205f62: fix pserver authentication error (authored by stsp).
fix pserver authentication error
Oct 1 2021, 3:39 PM
anlambert updated the diff for D6390: search: Add query language support for staff users.

Rebase

Oct 1 2021, 3:36 PM
anlambert added a comment to D6390: search: Add query language support for staff users.

Oh, I forgot we already had the code for this. Thanks :)

Oct 1 2021, 3:34 PM
vlorentz accepted D6390: search: Add query language support for staff users.

Oh, I forgot we already had the code for this. Thanks :)

Oct 1 2021, 3:27 PM
douardda added a comment to T3104: Persistent readonly perfect hash table.

Wouldn't it make sense to put the cffi-based cmph wrapper in a dedicated python module/project (not necessarily under the swh namespace)?

It would, but who would maintain it in the long run?

Oct 1 2021, 3:19 PM · Object storage (RedHat collaboration)
anlambert requested review of D6390: search: Add query language support for staff users.
Oct 1 2021, 3:18 PM
vlorentz accepted D6388: fix cvs pserver authentication error.
Oct 1 2021, 3:12 PM
anlambert added a revision to T2254: textual search language for the Web UI: D6390: search: Add query language support for staff users.
Oct 1 2021, 3:03 PM · Archive search, Web app
stsp requested review of D6389: make the black code formatter skip the pserver scramble shift table.
Oct 1 2021, 2:58 PM
jayeshv closed T3601: Use PostgreSQL backend for django database in tests as Resolved.
Oct 1 2021, 2:56 PM · Web app
stsp requested review of D6388: fix cvs pserver authentication error.
Oct 1 2021, 2:47 PM
douardda added a comment to D6339: Add support for remote backend on existing storage tests.

IMHO This diff should be squashed in D6165 (it's really part of the work adding the rabbitmq-based backend).

Oct 1 2021, 2:39 PM
vlorentz requested review of D6387: type_validator: Re-allow subclasses.
Oct 1 2021, 2:34 PM
jayeshv closed D6372: Replace Sqlite with Postgres in unit tests.
Oct 1 2021, 2:32 PM
jayeshv committed rDWAPPSa41f09064176: Use PostgreSQL backend in django database in tests (authored by jayeshv).
Use PostgreSQL backend in django database in tests
Oct 1 2021, 2:32 PM
douardda accepted D6272: Remove remote storage based on `swh.core.api.RPCClient`.

as @olasd should be squashed, but meh

Oct 1 2021, 2:32 PM
douardda accepted D6273: Remove remote storage based on `swh.core.api.RPCClient`.
Oct 1 2021, 2:30 PM
swh-public-ci added a comment to D6372: Replace Sqlite with Postgres in unit tests.

Build is green

Oct 1 2021, 2:17 PM
anlambert added a comment to D6372: Replace Sqlite with Postgres in unit tests.

Fixed the issue in timestamp comparison

Oct 1 2021, 2:14 PM
swh-public-ci added a comment to D6372: Replace Sqlite with Postgres in unit tests.

Build is green

Oct 1 2021, 2:10 PM
ardumont added a revision to T3625: Reduce git loader memory footprint: D6386: git: Load git repository through multiple packfiles fetch operations.
Oct 1 2021, 2:09 PM · Git loader
douardda requested changes to D6334: Add `close` method to both `ProvenanceInterface` and `ProvenanceStorageInterface`.

It looks to me like this open/close interface really should come with a context manager.
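
A minimal sketch of what such a context manager could look like. The class `StorageSketch` and its `open` method are hypothetical and stand in for the real interfaces; only the `close` method is named in the diff title.

```python
class StorageSketch:
    """Hypothetical stand-in for an interface exposing open/close."""

    def __init__(self) -> None:
        self.is_open = False

    def open(self) -> None:
        # Acquire the underlying resource (connection, file, ...).
        self.is_open = True

    def close(self) -> None:
        # Release the underlying resource.
        self.is_open = False

    def __enter__(self) -> "StorageSketch":
        self.open()
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Always close, even when the block raised; returning None
        # lets any exception propagate to the caller.
        self.close()


if __name__ == "__main__":
    with StorageSketch() as storage:
        print(storage.is_open)
    print(storage.is_open)
```

Callers then cannot forget the `close` call, including on error paths.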

Oct 1 2021, 2:07 PM
jayeshv updated the diff for D6372: Replace Sqlite with Postgres in unit tests.

Updated requirements

Oct 1 2021, 2:01 PM
douardda accepted D6358: Make old StatsD metrics style compliant with the rest of the module.

I still think it's best to use the wrapped function name as "method" but meh

Oct 1 2021, 1:59 PM
jayeshv updated the diff for D6372: Replace Sqlite with Postgres in unit tests.

Fixed the issue in timestamp comparison

Oct 1 2021, 1:56 PM
ardumont committed rDLDGa34aefbf0a7d: Unify logging instructions to use module logger instance (authored by ardumont).
Unify logging instructions to use module logger instance
Oct 1 2021, 1:38 PM
ardumont closed D6384: Unify log instruction to use the module logger instance.
Oct 1 2021, 1:38 PM
ardumont closed D6385: git: Add debugging log around the packfile retrieval step.
Oct 1 2021, 1:38 PM
ardumont committed rDLDG1d69d2be3b3b: git: Add debugging log around the packfile retrieval step (authored by ardumont).
git: Add debugging log around the packfile retrieval step
Oct 1 2021, 1:38 PM
vlorentz accepted D6384: Unify log instruction to use the module logger instance.
Oct 1 2021, 1:29 PM
ardumont updated the summary of D6384: Unify log instruction to use the module logger instance.
Oct 1 2021, 1:18 PM
anlambert accepted D6385: git: Add debugging log around the packfile retrieval step.

Looks good to me!

Oct 1 2021, 1:13 PM
swh-public-ci added a comment to D6385: git: Add debugging log around the packfile retrieval step.

Build is green

Oct 1 2021, 1:09 PM
ardumont added inline comments to D6385: git: Add debugging log around the packfile retrieval step.
Oct 1 2021, 1:08 PM
ardumont updated the diff for D6385: git: Add debugging log around the packfile retrieval step.

Adapt according to sound suggestion

Oct 1 2021, 1:07 PM
anlambert added inline comments to D6385: git: Add debugging log around the packfile retrieval step.
Oct 1 2021, 12:58 PM
anlambert accepted D6372: Replace Sqlite with Postgres in unit tests.

Looks good to me, I added two nitpick comments for small improvements.

Oct 1 2021, 12:35 PM
moranegg added a comment to T2344: Build a connector for software deposit via Zenodo/InvenioRDM.
  1. September 2021 update
Oct 1 2021, 12:32 PM · meta-task, Roadmap 2022, Roadmap 2020, SWORD deposit, Scientific Community Building
ardumont accepted D6382: make "yarn install" non-fatal in swh-web entrypoint.

Fine by me. @anlambert @vlorentz, thoughts?

Oct 1 2021, 12:26 PM
ardumont added 1 blocking reviewer(s) for D6382: make "yarn install" non-fatal in swh-web entrypoint: Reviewers.
Oct 1 2021, 12:26 PM
swh-public-ci added a comment to D6385: git: Add debugging log around the packfile retrieval step.

Build is green

Oct 1 2021, 12:26 PM
swh-public-ci added a comment to D6384: Unify log instruction to use the module logger instance.

Build is green

Oct 1 2021, 12:25 PM
ardumont updated the diff for D6385: git: Add debugging log around the packfile retrieval step.

Rebase

Oct 1 2021, 12:24 PM
ardumont updated the diff for D6384: Unify log instruction to use the module logger instance.

Update from_disk module as well

Oct 1 2021, 12:23 PM
ardumont requested review of D6385: git: Add debugging log around the packfile retrieval step.
Oct 1 2021, 12:18 PM
ardumont requested review of D6384: Unify log instruction to use the module logger instance.
Oct 1 2021, 12:15 PM
vlorentz created P1190 (An Untitled Masterwork).
Oct 1 2021, 12:02 PM
jayeshv requested review of D6372: Replace Sqlite with Postgres in unit tests.
Oct 1 2021, 11:58 AM
stsp closed D6383: fix TypeError due to wrong LoadCvsRepository task description.
Oct 1 2021, 11:51 AM
stsp committed rDLDCVS62b0f560ef7e: fix TypeError due to wrong LoadCvsRepository task description (authored by stsp).
fix TypeError due to wrong LoadCvsRepository task description
Oct 1 2021, 11:51 AM