Feed All Stories

Oct 4 2021

ardumont updated the task description for T3627: Consider dropping pull request references from the git loader ingestion.
Oct 4 2021, 4:09 PM · Git loader
ardumont added inline comments to D6380: Allow partial snapshot creation during ingestion.
Oct 4 2021, 4:07 PM
douardda accepted D6387: type_validator: Re-allow subclasses.

Oh well...

Oct 4 2021, 4:04 PM
vlorentz added a comment to D6380: Allow partial snapshot creation during ingestion.

The buffer proxy does not buffer visit statuses, so we need to flush the buffer before creating visit statuses.
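
To illustrate the ordering issue, here is a minimal sketch, not the actual swh-storage code: `RecordingStorage`, `BufferingProxy`, `object_add` and `visit_status_add` are hypothetical names invented for this example. It only assumes what the comment states, namely that objects are buffered while visit statuses go straight through.

```python
class RecordingStorage:
    """Hypothetical backend that records the order in which writes arrive."""

    def __init__(self):
        self.log = []

    def object_add(self, objs):
        self.log.append(("objects", list(objs)))

    def visit_status_add(self, status):
        self.log.append(("visit_status", status))


class BufferingProxy:
    """Hypothetical proxy: buffers objects, passes visit statuses through."""

    def __init__(self, storage):
        self.storage = storage
        self._pending = []

    def object_add(self, objs):
        # Objects are held back until flush() is called.
        self._pending.extend(objs)

    def visit_status_add(self, status):
        # Visit statuses are NOT buffered: they reach the backend immediately.
        self.storage.visit_status_add(status)

    def flush(self):
        if self._pending:
            self.storage.object_add(self._pending)
            self._pending = []


backend = RecordingStorage()
proxy = BufferingProxy(backend)
proxy.object_add(["rev1", "rev2"])
proxy.flush()  # without this call, the visit status would be written first
proxy.visit_status_add("partial")
order = [kind for kind, _ in backend.log]
```

Dropping the `flush()` call in this sketch would leave the backend with a visit status and none of the objects it is supposed to refer to.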

Oct 4 2021, 3:59 PM
swh-public-ci added a comment to D6165: Add new RabbitMQ-based client/server API.

Build is green

Oct 4 2021, 3:59 PM
aeviso updated the diff for D6339: Add support for remote backend on existing storage tests.

rebase

Oct 4 2021, 3:57 PM
swh-public-ci added a comment to D6358: Make old StatsD metrics style compliant with the rest of the module.

Build is green

Oct 4 2021, 3:56 PM
aeviso updated the diff for D6165: Add new RabbitMQ-based client/server API.

rebase

Oct 4 2021, 3:56 PM
vlorentz closed D6381: Add support for Django 3.
Oct 4 2021, 3:55 PM
vlorentz committed rDDEPd1b840de268f: Add support for Django 3 (authored by vlorentz).
Add support for Django 3
Oct 4 2021, 3:55 PM
swh-public-ci added a comment to D6353: Add StatsD support to graph submodule.

Build is green

Oct 4 2021, 3:55 PM
swh-public-ci added a comment to D6352: Add StatsD support to provenance storage implementations.

Build is green

Oct 4 2021, 3:55 PM
swh-public-ci added a comment to D6351: Add StatsD support to provenance backend.

Build is green

Oct 4 2021, 3:54 PM
aeviso updated the diff for D6358: Make old StatsD metrics style compliant with the rest of the module.

rebase

Oct 4 2021, 3:53 PM
aeviso updated the diff for D6353: Add StatsD support to graph submodule.

rebase

Oct 4 2021, 3:52 PM
swh-public-ci added a comment to D6357: Split `Provenance::flush` method in two (one per layer).

Build is green

Oct 4 2021, 3:52 PM
aeviso updated the diff for D6352: Add StatsD support to provenance storage implementations.

rebase

Oct 4 2021, 3:52 PM
aeviso updated the diff for D6351: Add StatsD support to provenance backend.

rebase

Oct 4 2021, 3:51 PM
swh-public-ci added a comment to D6334: Add `close` method to both `ProvenanceInterface` and `ProvenanceStorageInterface`.

Build is green

Oct 4 2021, 3:50 PM
aeviso updated the summary of D6357: Split `Provenance::flush` method in two (one per layer).
Oct 4 2021, 3:49 PM
aeviso updated the diff for D6357: Split `Provenance::flush` method in two (one per layer).

rebase

Oct 4 2021, 3:49 PM
swh-public-ci added a comment to D6273: Remove remote storage based on `swh.core.api.RPCClient`.

Build is green

Oct 4 2021, 3:49 PM
swh-public-ci added a comment to D6272: Remove remote storage based on `swh.core.api.RPCClient`.

Build is green

Oct 4 2021, 3:49 PM
aeviso updated the diff for D6334: Add `close` method to both `ProvenanceInterface` and `ProvenanceStorageInterface`.

turn backend classes into context managers

Oct 4 2021, 3:47 PM
aeviso updated the diff for D6273: Remove remote storage based on `swh.core.api.RPCClient`.

rebase

Oct 4 2021, 3:46 PM
aeviso updated the diff for D6272: Remove remote storage based on `swh.core.api.RPCClient`.

rebase

Oct 4 2021, 3:46 PM
aeviso retitled D6273: Remove remote storage based on `swh.core.api.RPCClient` from Remove old client/server storage based on `swh.core.api.RPCClient` to Remove remote storage based on `swh.core.api.RPCClient`.
Oct 4 2021, 3:43 PM
swh-public-ci added a comment to D6273: Remove remote storage based on `swh.core.api.RPCClient`.

Build is green

Oct 4 2021, 3:42 PM
swh-public-ci added a comment to D6272: Remove remote storage based on `swh.core.api.RPCClient`.

Build is green

Oct 4 2021, 3:41 PM
aeviso updated the summary of D6273: Remove remote storage based on `swh.core.api.RPCClient`.
Oct 4 2021, 3:41 PM
aeviso retitled D6272: Remove remote storage based on `swh.core.api.RPCClient` from Rename remote storage backend classes to Remove remote storage based on `swh.core.api.RPCClient`.
Oct 4 2021, 3:40 PM
aeviso updated the diff for D6273: Remove remote storage based on `swh.core.api.RPCClient`.

squash with D6272

Oct 4 2021, 3:39 PM
aeviso updated the diff for D6272: Remove remote storage based on `swh.core.api.RPCClient`.

squash with D6273

Oct 4 2021, 3:38 PM
douardda created P1195 (An Untitled Masterwork).
Oct 4 2021, 3:31 PM
dachary added a comment to T3104: Persistent readonly perfect hash table.

It makes sense to create a dedicated swh-perfecthash package.

Oct 4 2021, 3:06 PM · Object storage (RedHat collaboration)
dachary added a comment to T3104: Persistent readonly perfect hash table.

That I did not know, so indeed, if we need a specific wrapper for our needs, ...

Oct 4 2021, 3:04 PM · Object storage (RedHat collaboration)
dachary added a comment to T3104: Persistent readonly perfect hash table.
In addition to being unmaintained,

this could be addressed by asking authors to be in charge of the package

Oct 4 2021, 2:55 PM · Object storage (RedHat collaboration)
borisbaldassari abandoned D6393: maven-lister: initialise lister.maven-lister: update following review on D6133. [PLEASE DELETE ME].

Thanks vlorentz. Done.

Oct 4 2021, 2:36 PM
vlorentz requested review of D6400: Fix label of 'Extrinsic metadata' buttons.
Oct 4 2021, 2:05 PM
stsp closed D6399: apply re-formatting suggested by the black code formatter.
Oct 4 2021, 2:03 PM
stsp committed rDLDCVS7f761b855071: apply re-formatting suggested by the black code formatter (authored by stsp).
apply re-formatting suggested by the black code formatter
Oct 4 2021, 2:03 PM
stsp added a comment to D6399: apply re-formatting suggested by the black code formatter.

Do you have the pre-commit installed in that repository yet? [1]

[1] https://docs.softwareheritage.org/devel/developer-setup.html#checkout-the-source-code

Oct 4 2021, 2:02 PM
zack added a comment to T3627: Consider dropping pull request references from the git loader ingestion.

According to the snippet referenced by @ardumont, all branch names starting with refs/pull/ should be filtered out.
But in the recent snapshot of torvalds/linux there are a lot of branch names like that.
How come?
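
For reference, the filtering described here could look like the sketch below. This is an illustration only, not the snippet referenced above nor the loader's actual code: `drop_pull_request_refs` is a hypothetical helper, and it assumes references are held in a dict keyed by branch name as bytes.

```python
# Hypothetical helper for illustration; not taken from the git loader.
PULL_REQUEST_PREFIX = b"refs/pull/"


def drop_pull_request_refs(refs):
    """Return a copy of refs without branches named refs/pull/..."""
    return {
        name: target
        for name, target in refs.items()
        if not name.startswith(PULL_REQUEST_PREFIX)
    }


# Made-up reference names and targets, just to exercise the helper.
refs = {
    b"refs/heads/master": b"aaaa",
    b"refs/pull/1/head": b"bbbb",
    b"refs/pull/1/merge": b"cccc",
    b"refs/tags/v1.0": b"dddd",
}
filtered = drop_pull_request_refs(refs)
```

With such a filter in place, no `refs/pull/` name should survive into the snapshot, which is why their presence in the torvalds/linux snapshot is surprising.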

Oct 4 2021, 2:01 PM · Git loader
ardumont updated the task description for T3625: Reduce git loader memory footprint.
Oct 4 2021, 1:19 PM · Git loader
ardumont added a revision to T3025: git loaders are getting oom-killed repeatedly in prod: D5657: Spool large packfiles to disk instead of consuming tons of memory.
Oct 4 2021, 1:15 PM · Git loader, System administration
ardumont updated the summary of D5657: Spool large packfiles to disk instead of consuming tons of memory.
Oct 4 2021, 1:15 PM
ardumont updated the task description for T3457: Some git repositories are failing to be ingested because of MemoryError.
Oct 4 2021, 1:15 PM · Git loader
ardumont added a project to T3627: Consider dropping pull request references from the git loader ingestion: Git loader.
Oct 4 2021, 1:10 PM · Git loader
ardumont accepted D6381: Add support for Django 3.

/me *nods*

Oct 4 2021, 1:10 PM
vlorentz added a comment to D6381: Add support for Django 3.

And aside from the tox.ini/requirements.txt change, this diff actually replaces hacks with better code, so it's a win-win

Oct 4 2021, 1:09 PM
vlorentz added a comment to D6381: Add support for Django 3.

Yes, that's correct

Oct 4 2021, 1:08 PM
ardumont triaged T3627: Consider dropping pull request references from the git loader ingestion as High priority.
Oct 4 2021, 1:07 PM · Git loader
ardumont created T3627: Consider dropping pull request references from the git loader ingestion.
Oct 4 2021, 1:07 PM · Git loader
ardumont accepted D6399: apply re-formatting suggested by the black code formatter.

Good idea.

Oct 4 2021, 1:05 PM
stsp requested review of D6399: apply re-formatting suggested by the black code formatter.
Oct 4 2021, 12:54 PM
stsp closed D6389: make the black code formatter skip the pserver scramble shift table.
Oct 4 2021, 12:49 PM
stsp committed rDLDCVSea469457eda2: make the black code formatter skip the pserver scramble shift table (authored by stsp).
make the black code formatter skip the pserver scramble shift table
Oct 4 2021, 12:49 PM
jayeshv added a comment to T3608: Deprecate most of the /browse/origin/.* URLs.

@anlambert This ticket might have some performance implications.
For example, in the first case, to redirect /browse/origin/directory/?origin_url=<> to the root directory, we have to query the archive first. The obvious way would be to call the get_snapshot_context function.
https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/browse/snapshot_context.py$395

Oct 4 2021, 11:55 AM · Web app
douardda added a comment to T3611: Define the mapping for Bazaar repositories/branches to the SWH data model.

Ideally this doc would (briefly) describe how bazaar works and how it differs from the already supported DVCS, then document the chosen "mapping" of the bzr model into swh (especially mentioning what is lost in the process).

Oct 4 2021, 11:43 AM · Data Model, BZR loader
douardda added a comment to T3104: Persistent readonly perfect hash table.

@douardda

SWH I guess: I don't see the difference whether it's embedded in swh-objstorage, winery or a dedicated package.

If I understand correctly, you're suggesting that I create a package at the same level as https://forge.softwareheritage.org/source/puppet-swh-site/, right ? For instance https://forge.softwareheritage.org/source/swh-perfecthash/ by following the instructions from the documentation.

So does it make sense to use this package instead of reimplementing one? What's the catch?

In addition to being unmaintained,

Oct 4 2021, 11:39 AM · Object storage (RedHat collaboration)
douardda added a comment to T3611: Define the mapping for Bazaar repositories/branches to the SWH data model.

Would it be possible to add a "conception documentation" included in the docs/ of the BZR loader repo? (possibly with D6344 or as a standalone diff)?

Oct 4 2021, 10:48 AM · Data Model, BZR loader
ardumont created P1194 unknown reference in git loader with partial snapshot run warning becomes tedious....
Oct 4 2021, 10:41 AM
ardumont updated the summary of D6380: Allow partial snapshot creation during ingestion.
Oct 4 2021, 10:36 AM
dachary added a comment to T3104: Persistent readonly perfect hash table.

SWH I guess: I don't see the difference whether it's embedded in swh-objstorage, winery or a dedicated package.

Oct 4 2021, 10:36 AM · Object storage (RedHat collaboration)
ardumont updated the summary of D6380: Allow partial snapshot creation during ingestion.
Oct 4 2021, 10:36 AM
vlorentz added a comment to D6393: maven-lister: initialise lister.maven-lister: update following review on D6133. [PLEASE DELETE ME].

@borisbaldassari Click "Add Action..." over the comment box, select "Abandon Revision", then submit

Oct 4 2021, 10:28 AM
ardumont updated the summary of D6380: Allow partial snapshot creation during ingestion.
Oct 4 2021, 9:55 AM
vsellier added a comment to T3592: POC elastic worker infrastructure.

keda looks promising. P1193 is an example of configuration working for the docker environment. It's able to scale to 0 when no messages are present on the queue.
When messages are present, the loaders are launched progressively until the cpu/memory limit of the host or the maximum number of allowed workers is reached.

Oct 4 2021, 9:21 AM · System administration
vsellier created P1193 keda configuration for docker environment.
Oct 4 2021, 9:19 AM
ardumont added a comment to P1192 [draft-ml-devel] About reducing the loader-git memory consumption and overall work.
  • commands
/usr/bin/time -v swh loader run git https://github.com/CocoaPods/Specs
/usr/bin/time -v swh loader run git https://github.com/cozy/cozy-stack
/usr/bin/time -v swh loader run git https://github.com/hylang/hy
/usr/bin/time -v swh loader run git https://github.com/vsellier/easy-cozy
/usr/bin/time -v swh loader run git https://github.com/rancher/dashboard
/usr/bin/time -v swh loader run git https://github.com/kubernetes/kubectl
/usr/bin/time -v swh loader run git https://github.com/git/git
/usr/bin/time -v swh loader run git https://github.com/torvalds/linux
/usr/bin/time -v swh loader run git https://github.com/rust-lang/rust
Oct 4 2021, 8:09 AM
dachary abandoned D6398: test docker availability for integration tests.

That was just a test, trash it.

Oct 4 2021, 7:43 AM
dachary added a comment to D6398: test docker availability for integration tests.
07:38:12  py3 run-test: commands[0] | docker run debian:bullseye date
07:38:12  Unable to find image 'debian:bullseye' locally
07:38:12  bullseye: Pulling from library/debian
07:38:12  df5590a8898b: Already exists
07:38:12  Digest: sha256:86dddd82dddf445aea3d2ea26af46cebd727bf2f47ed810fa1450a0d79722d55
07:38:12  Status: Downloaded newer image for debian:bullseye
Oct 4 2021, 7:39 AM
swh-public-ci added a comment to D6398: test docker availability for integration tests.

Build is green

Oct 4 2021, 7:38 AM
dachary updated the diff for D6398: test docker availability for integration tests.

tox is called with an explicit -e, so adding a new environment is a noop unless the matching jenkins job is updated

Oct 4 2021, 7:36 AM
dachary requested review of D6398: test docker availability for integration tests.
Oct 4 2021, 7:31 AM
dachary added a revision to T3432: Add winery backend: D6398: test docker availability for integration tests.
Oct 4 2021, 7:30 AM · Object storage
dachary added a revision to T3104: Persistent readonly perfect hash table: D6397: add cmph dependency.
Oct 4 2021, 6:39 AM · Object storage (RedHat collaboration)

Oct 3 2021

ardumont edited P1192 [draft-ml-devel] About reducing the loader-git memory consumption and overall work.
Oct 3 2021, 6:59 PM
ardumont edited P1192 [draft-ml-devel] About reducing the loader-git memory consumption and overall work.
Oct 3 2021, 6:20 PM
ardumont edited P1192 [draft-ml-devel] About reducing the loader-git memory consumption and overall work.
Oct 3 2021, 5:51 PM
ardumont added a comment to T3625: Reduce git loader memory footprint.

All runs are done, from medium to large repositories.
No diverging hashes, and the loader-git run with the patched version consistently uses less memory.

Oct 3 2021, 5:44 PM · Git loader
ardumont added a comment to P1192 [draft-ml-devel] About reducing the loader-git memory consumption and overall work.

Run on large repositories:

|---------+-----------------+-------+-------------------------+-------------------------+------------------------|
| Machine | torvalds/linux  | refs  | Snapshot                | Memory (max RSS kbytes) | Elapsed Time (h:mm:ss) |
|---------+-----------------+-------+-------------------------+-------------------------+------------------------|
| staging | torvalds/linux  | 1496  | \xc2847...3fb4          |                 1361324 |                6:59:16 |
| prod    | //              | //    | \xc2847...3fb4          |                 3080408 |               24:13:11 |
|---------+-----------------+-------+-------------------------+-------------------------+------------------------|
| staging | CocoaPods/Specs | 14036 | X (hash mismatched) [1] |                 5789344 |               23:10:48 |
| prod    | //              | //    | X (killed) [2]          |                14280284 |               10:09:09 |
|---------+-----------------+-------+-------------------------+-------------------------+------------------------|
Oct 3 2021, 4:29 PM
ardumont edited P1192 [draft-ml-devel] About reducing the loader-git memory consumption and overall work.
Oct 3 2021, 4:11 PM
ardumont edited P1192 [draft-ml-devel] About reducing the loader-git memory consumption and overall work.
Oct 3 2021, 3:43 PM
swh-public-ci added a comment to D6380: Allow partial snapshot creation during ingestion.

Build is green

Oct 3 2021, 3:15 PM
ardumont added a comment to D6380: Allow partial snapshot creation during ingestion.

what about naming the parameter create_snapshot instead?

Fine with me.

Prior to this
commit, it was implied that the store_data could only be called once. It's a limitation
that needs to change for some ongoing optimizations in the loader git.

is it, though? it allows creating "partial" snapshots

Well, yes. But even with this, that's still the case (if create_snapshot is True after the first round).

Without this, though, we cannot go through more than one iteration of the loop (in the git loader,
which is the only subclass currently running this). The ingestion fails because it tries
to create a snapshot right after the first store_data call (so only one iteration is allowed
with the current implementation).

It fails because references needed to build the snapshot are still missing.
All references are only read over multiple iterations (ongoing optimization to
read the packfiles in multiple steps).

Oct 3 2021, 3:14 PM
ardumont updated the summary of D6380: Allow partial snapshot creation during ingestion.
Oct 3 2021, 3:13 PM
ardumont retitled D6380: Allow partial snapshot creation during ingestion from Allow multiple calls to DVCSLoader.store_data implementation to Improve store_data implem to allow multiple calls with partial visit.
Oct 3 2021, 3:13 PM
ardumont updated the diff for D6380: Allow partial snapshot creation during ingestion.

Adapt according to latest analysis/development

Oct 3 2021, 3:12 PM
ardumont edited P1192 [draft-ml-devel] About reducing the loader-git memory consumption and overall work.
Oct 3 2021, 10:36 AM

Oct 2 2021

borisbaldassari requested review of D6396: Implement maven jar source files loader.
Oct 2 2021, 9:58 PM
borisbaldassari updated the summary of D6158: maven jar-loader: Initalise files..
Oct 2 2021, 9:57 PM
borisbaldassari added a revision to T1724: Maven Central repository support: D6396: Implement maven jar source files loader.
Oct 2 2021, 9:56 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage
borisbaldassari updated the summary of D6393: maven-lister: initialise lister.maven-lister: update following review on D6133. [PLEASE DELETE ME].
Oct 2 2021, 9:26 PM
borisbaldassari updated the summary of D6133: maven-lister: initialise lister..
Oct 2 2021, 9:25 PM
borisbaldassari added a revision to T1724: Maven Central repository support: D6395: lister: Add new maven lister.
Oct 2 2021, 9:21 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage
borisbaldassari added a revision to T1724: Maven Central repository support: D6394: gitlab: Allow listing of instances providing multiple vcs_type [PLEASE DELETE ME].
Oct 2 2021, 9:06 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage
borisbaldassari added a revision to T3590: opam loader: Ensure required opam state is shared amongst ingestion/listing runs: D6394: gitlab: Allow listing of instances providing multiple vcs_type [PLEASE DELETE ME].
Oct 2 2021, 9:06 PM · Archive coverage, Opam
borisbaldassari added a revision to T3581: List heptapod instance foss.heptapod.net: D6394: gitlab: Allow listing of instances providing multiple vcs_type [PLEASE DELETE ME].
Oct 2 2021, 9:06 PM · Archive coverage, System administration, Origin-GitLab
borisbaldassari retitled D6393: maven-lister: initialise lister.maven-lister: update following review on D6133. [PLEASE DELETE ME] from maven-lister: initialise lister. maven-lister: update following review on D6133. to maven-lister: initialise lister.maven-lister: update following review on D6133. [PLEASE DELETE ME].
Oct 2 2021, 9:06 PM