
Oct 11 2021

douardda added a comment to T3592: POC elastic worker infrastructure.

Just a quick remark about the scheduling of the (sub)tasks of this task: IMHO the autoscaling should come last; all the supervision/monitoring/logging-related tasks are much more important than the autoscaling.

Oct 11 2021, 10:29 AM · System administration

Oct 8 2021

douardda updated the diff for D6442: Extract the path slicing logic in a dedicated PathSlicer class.

Better docstrings and kill a few map()

Oct 8 2021, 4:19 PM
douardda added inline comments to D6442: Extract the path slicing logic in a dedicated PathSlicer class.
Oct 8 2021, 3:53 PM
douardda closed D6444: docker: configure and document the APP environment variable for celery.
Oct 8 2021, 3:51 PM
douardda committed rDENVeefd5e532124: docker: configure and document the APP environment variable for celery (authored by douardda).
docker: configure and document the APP environment variable for celery
Oct 8 2021, 3:51 PM
douardda accepted D6443: buffer: add a threshold for the number of directory entries in one batch.

Thx

Oct 8 2021, 3:46 PM
douardda updated the diff for D6444: docker: configure and document the APP environment variable for celery.

be a bit more consistent...

Oct 8 2021, 3:29 PM
douardda added a comment to D6410: Allow application/x-msgpack deserialization again.

as @vlorentz pointed out [1], this change should be irrelevant though...

[1] https://github.com/celery/kombu/blob/master/kombu/serialization.py#L369-L372

does not seem to be the proper fix.

FTR, using the celery CLI tool directly from a development venv to interact with the celery server running in the docker compose test setup (as described there) used to work ok, but does not any more.

One has to specify the app, like:

celery --app=swh.scheduler.celery_backend.config.app status

[edit] I use celery 5.1.2 in my venv.

Oct 8 2021, 3:27 PM
douardda requested review of D6444: docker: configure and document the APP environment variable for celery.
Oct 8 2021, 3:25 PM
douardda added a comment to T3632: Investigate the ContentDisallowed exception.

Unless I'm mistaken, this error does not appear in sentry any more, right?

Oct 8 2021, 3:06 PM · Scheduling utilities
douardda added a comment to D6410: Allow application/x-msgpack deserialization again.

as @vlorentz pointed out [1], this change should be irrelevant though...

[1] https://github.com/celery/kombu/blob/master/kombu/serialization.py#L369-L372

Oct 8 2021, 3:04 PM
douardda accepted D6427: swh.storage filter/buffer improvements.

looks fine to me

Oct 8 2021, 2:53 PM
douardda accepted D6428: docs: Add a save forge documentation.

Ok but see my 2 (nitpicky) comments

Oct 8 2021, 2:49 PM
douardda accepted D6431: Rename imports of swh.model.identifiers to fix deprecation warnings..

LGTM thx

Oct 8 2021, 2:46 PM
douardda added a comment to T3104: Persistent readonly perfect hash table.

[FTR] See https://forge.softwareheritage.org/source/swh-perfecthash/

Oct 8 2021, 2:43 PM · Object storage (RedHat collaboration)
douardda updated the diff for D6442: Extract the path slicing logic in a dedicated PathSlicer class.

allow the pathslicer to be a noop (with an empty slicing)

Oct 8 2021, 2:12 PM
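
As a rough illustration of the idea behind D6442, a path slicer that also accepts an empty slicing could be sketched as below. The slicing syntax (`0:2/2:4`), the constructor arguments and the method names are assumptions made for this example; they are not taken from the actual diff.

```python
import os
from typing import List


class PathSlicer:
    """Hypothetical sketch: map an object id to a nested directory path."""

    def __init__(self, root: str, slicing: str):
        self.root = root
        # "0:2/2:4" -> [slice(0, 2), slice(2, 4)]; an empty string yields no slice
        self.bounds = [
            slice(*(int(x) if x else None for x in part.split(":")))
            for part in slicing.strip("/").split("/")
            if part
        ]

    def get_slices(self, hex_obj_id: str) -> List[str]:
        # One path component per configured slice
        return [hex_obj_id[bound] for bound in self.bounds]

    def get_path(self, hex_obj_id: str) -> str:
        # With no slices, the object lands directly under root (noop slicing)
        return os.path.join(self.root, *self.get_slices(hex_obj_id), hex_obj_id)
```

With this sketch, `PathSlicer("/srv", "0:2/2:4").get_slices("abcdef")` gives `["ab", "cd"]`, while an empty slicing places the object directly under the root directory.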
douardda requested review of D6442: Extract the path slicing logic in a dedicated PathSlicer class.
Oct 8 2021, 2:10 PM

Oct 7 2021

douardda accepted D6401: Filter out pull request related branches.

LGTM

Oct 7 2021, 9:32 AM

Oct 6 2021

douardda added a comment to T3627: Consider dropping pull request references from the git loader ingestion.

FTR without D6401, the packfile received from GH for the CocoaPods/Specs repo contains 21162 references, 21146 of which start with /refs/pull/ and 7126 of which end with /merge (even though those have explicitly not been asked for, thanks to the filtering in RepoRepresentation.determine_wanted()).
When D6401 is applied, we only get the 20-ish references that are not pull request related.

Oct 6 2021, 2:56 PM · Git loader
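
The kind of filtering discussed in the comment above can be sketched as follows. This is only an illustration, not the code of D6401: the function name is hypothetical, and it assumes references are given as a mapping from ref name to target, with GitHub pull request refs living under `refs/pull/`.

```python
from typing import Dict


def drop_pull_request_refs(refs: Dict[bytes, bytes]) -> Dict[bytes, bytes]:
    """Keep only the references that are not under refs/pull/ (hypothetical helper)."""
    return {
        name: target
        for name, target in refs.items()
        if not name.startswith(b"refs/pull/")
    }
```

Applied to a mapping containing `refs/heads/master`, `refs/pull/1/head` and `refs/pull/1/merge`, only `refs/heads/master` would be kept.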

Oct 5 2021

douardda committed rMSLDfef6e8ca5b60: EOSC-Pillar F2F meeting: presentation of the UC6.4 (authored by douardda).
EOSC-Pillar F2F meeting: presentation of the UC6.4
Oct 5 2021, 5:03 PM
douardda added a comment to T3633: staging/production - Kafka access for ENEA mirror.

token for the prod will be needed after that as well, thanks

Oct 5 2021, 3:40 PM · System administration
douardda committed rCDFPe00b10ea28c8: Fix memcache config in web.yml (authored by douardda).
Fix memcache config in web.yml
Oct 5 2021, 2:32 PM
douardda committed rCDFP44d8b4cad1ed: Fix replayers' entrypoint script (authored by douardda).
Fix replayers' entrypoint script
Oct 5 2021, 2:32 PM
douardda committed rCDFP1831360b6c84: Improve postgresql config, especially for swh-web (authored by douardda).
Improve postgresql config, especially for swh-web
Oct 5 2021, 2:32 PM
douardda committed rCDFP5de6a2ec92ea: Dockerfile: install postgresql-client in swh-web image (authored by douardda).
Dockerfile: install postgresql-client in swh-web image
Oct 5 2021, 2:32 PM
douardda committed rCDFPb466ad7a743f: Improve nginx config (authored by douardda).
Improve nginx config
Oct 5 2021, 2:32 PM
douardda committed rCDFPc36f34d1e137: Add support for postgresql as swh-web database (authored by douardda).
Add support for postgresql as swh-web database
Oct 5 2021, 2:32 PM
douardda committed rCDFPf1cf061a3177: Add explicit rw and Z to volume definitions (authored by Jonas Eriksson <jonas.eriksson@fossid.com>).
Add explicit rw and Z to volume definitions
Oct 5 2021, 2:32 PM
douardda committed rCDFP0fb09c414448: Storage conf: Point to correct objstorage port (authored by Jonas Eriksson <jonas.eriksson@fossid.com>).
Storage conf: Point to correct objstorage port
Oct 5 2021, 2:32 PM
douardda closed D6403: docker: use a dedicated container for the cron-like job of swh-web.
Oct 5 2021, 10:56 AM
douardda committed rDENVaf0a2af3e7c8: docker: use a dedicated container for the cron-like job of swh-web (authored by douardda).
docker: use a dedicated container for the cron-like job of swh-web
Oct 5 2021, 10:56 AM
douardda closed D6402: docker: Do not limit the list of task types handled by swh-scheduler-runner-priority.
Oct 5 2021, 10:56 AM
douardda committed rDENVebb07bdae059: docker: Do not limit the list of task types handled by swh-scheduler-runner… (authored by douardda).
docker: Do not limit the list of task types handled by swh-scheduler-runner…
Oct 5 2021, 10:56 AM
douardda added a comment to D6165: Add new RabbitMQ-based client/server API.

Also there is no real value in keeping 3 revisions: the last 2 revisions actually improve/modify the code from the first revision.

Oct 5 2021, 10:47 AM
douardda added a comment to D6339: Add support for remote backend on existing storage tests.

this should be squashed with the previous diff, and my previous question about .gitignore still stands

Oct 5 2021, 10:45 AM
douardda accepted D6165: Add new RabbitMQ-based client/server API.

As others (and I) said, this must come with actual documentation.
As is, I have a hard time understanding how this actually works (even after reading the document in HedgeDoc).

Oct 5 2021, 10:39 AM
douardda updated the diff for D6403: docker: use a dedicated container for the cron-like job of swh-web.

indent...

Oct 5 2021, 10:01 AM
douardda updated the diff for D6403: docker: use a dedicated container for the cron-like job of swh-web.

improve entrypoint script to properly handle a SIGTERM

Oct 5 2021, 9:59 AM
douardda accepted D6334: Add `close` method to both `ProvenanceInterface` and `ProvenanceStorageInterface`.
Oct 5 2021, 9:34 AM
douardda added a comment to D6334: Add `close` method to both `ProvenanceInterface` and `ProvenanceStorageInterface`.

looks ok to me. Just one question: why do you need __future__.annotations?

Oct 5 2021, 9:34 AM

Oct 4 2021

douardda requested review of D6403: docker: use a dedicated container for the cron-like job of swh-web.
Oct 4 2021, 5:33 PM
douardda requested review of D6402: docker: Do not limit the list of task types handled by swh-scheduler-runner-priority.
Oct 4 2021, 5:33 PM
douardda accepted D6387: type_validator: Re-allow subclasses.

Oh well...

Oct 4 2021, 4:04 PM
douardda created P1195 (An Untitled Masterwork).
Oct 4 2021, 3:31 PM
douardda added a comment to T3611: Define the mapping for Bazaar repositories/branches to the SWH data model.

Ideally this doc would (briefly) describe how bazaar works and how it differs from the already supported DVCS, then document the chosen "mapping" of the bzr model into swh (especially mentioning what is lost in the process).

Oct 4 2021, 11:43 AM · Data Model, BZR loader
douardda added a comment to T3104: Persistent readonly perfect hash table.

@douardda

SWH I guess: I don't see the difference whether it's embedded in swh-objstorage, winery or a dedicated package.

If I understand correctly, you're suggesting that I create a package at the same level as https://forge.softwareheritage.org/source/puppet-swh-site/, right? For instance https://forge.softwareheritage.org/source/swh-perfecthash/ by following the instructions from the documentation.

So does it make sense to use this package instead of reimplementing one? What's the catch?

In addition to being unmaintained,

Oct 4 2021, 11:39 AM · Object storage (RedHat collaboration)
douardda added a comment to T3611: Define the mapping for Bazaar repositories/branches to the SWH data model.

Would it be possible to add a "conception documentation" included in the docs/ of the BZR loader repo? (possibly with D6344 or as a standalone diff)?

Oct 4 2021, 10:48 AM · Data Model, BZR loader

Oct 1 2021

douardda added a comment to T3104: Persistent readonly perfect hash table.

Wouldn't it make sense to put the cffi-based cmph wrapper in a dedicated python module/project (not necessarily under the swh namespace)?

It would, but who would maintain it in the long run?

Oct 1 2021, 3:19 PM · Object storage (RedHat collaboration)
douardda added a comment to D6339: Add support for remote backend on existing storage tests.

IMHO This diff should be squashed in D6165 (it's really part of the work adding the rabbitmq-based backend).

Oct 1 2021, 2:39 PM
douardda accepted D6272: Remove remote storage based on `swh.core.api.RPCClient`.

as @olasd said, this should be squashed, but meh

Oct 1 2021, 2:32 PM
douardda accepted D6273: Remove remote storage based on `swh.core.api.RPCClient`.
Oct 1 2021, 2:30 PM
douardda requested changes to D6334: Add `close` method to both `ProvenanceInterface` and `ProvenanceStorageInterface`.

It looks to me like this open/close interface really should come with a context manager.

Oct 1 2021, 2:07 PM
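
To make the remark above concrete, here is a minimal sketch of an open/close interface that also works as a context manager. The class is a hypothetical stand-in and is not the `ProvenanceInterface` or `ProvenanceStorageInterface` from the diff.

```python
class ClosingStorage:
    """Hypothetical stand-in for an object exposing open() and close()."""

    def __init__(self) -> None:
        self.opened = False

    def open(self) -> None:
        self.opened = True

    def close(self) -> None:
        self.opened = False

    def __enter__(self) -> "ClosingStorage":
        self.open()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Always release the resource, even when the body of the with block raised
        self.close()
```

Callers can then write `with ClosingStorage() as storage: ...` and rely on `close()` being called when the block exits, whether or not an exception occurred.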
douardda accepted D6358: Make old StatsD metrics style compliant with the rest of the module.

I still think it's best to use the wrapped function name as "method" but meh

Oct 1 2021, 1:59 PM

Sep 30 2021

douardda accepted D6376: SvnLoaderFromRemoteDump: Fix failed visit which should be an uneventful visit.

Looks ok (not sure I really understand the fix however, more precisely, what was the purpose of the revision_start != 1 condition), but I really don't understand the commit message:

Sep 30 2021, 9:40 AM

Sep 29 2021

douardda created P1184 (An Untitled Masterwork).
Sep 29 2021, 4:19 PM
douardda added a comment to T3104: Persistent readonly perfect hash table.

Ideally, since the perfecthash feature will be needed only for a specific objstorage backend, it should be an optional dependency.

Wouldn't it make sense to put the cffi-based cmph wrapper in a dedicated python module/project (not necessarily under the swh namespace)?

Or use this one maybe https://github.com/GregBowyer/cmph-cffi ?

Source for the cmph-cffi package in pypi seems to be https://github.com/venkateshks/cmph-cffi (well at least there are tags in there)

Sep 29 2021, 12:00 PM · Object storage (RedHat collaboration)
douardda added a comment to T3104: Persistent readonly perfect hash table.

Ideally, since the perfecthash feature will be needed only for a specific objstorage backend, it should be an optional dependency.

Sep 29 2021, 11:55 AM · Object storage (RedHat collaboration)
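
One common way to treat such a feature as an optional dependency is to import the module lazily and degrade gracefully when it is missing, usually combined with a packaging "extra". The helper below is a generic sketch with a hypothetical name; it is not code from swh-objstorage.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the named module if it is installed, else None (hypothetical helper)."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None
```

A backend that needs the optional module can call this helper at initialization time and raise a clear error only if that specific backend is actually configured.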
douardda committed rDSTO113088ab065b: replay: add type annotation for process_replay_objects() (authored by douardda).
replay: add type annotation for process_replay_objects()
Sep 29 2021, 11:26 AM
douardda committed rDSTO9a3589f24cae: replay: fix raw_extrinsic_metadata insertion and type annotation (authored by douardda).
replay: fix raw_extrinsic_metadata insertion and type annotation
Sep 29 2021, 11:26 AM
douardda closed D6370: replay: fix raw_extrinsic_metadata insertion and type annotation.
Sep 29 2021, 11:26 AM
douardda committed rDSTO21aff2d1cbab: replay: fix annotation of collision_aware_content_add() (authored by douardda).
replay: fix annotation of collision_aware_content_add()
Sep 29 2021, 11:26 AM
douardda updated the diff for D6370: replay: fix raw_extrinsic_metadata insertion and type annotation.

add type annotation for process_replay_objects()

Sep 29 2021, 11:02 AM
douardda updated the diff for D6370: replay: fix raw_extrinsic_metadata insertion and type annotation.

rebase

Sep 29 2021, 10:49 AM
douardda accepted D6341: model: Replace attrs-strict with stricter validation.

Test coverage looks fairly complete, thx

Sep 29 2021, 10:45 AM

Sep 28 2021

douardda created P1181 (An Untitled Masterwork).
Sep 28 2021, 4:54 PM
douardda created P1179 (An Untitled Masterwork).
Sep 28 2021, 4:46 PM

Sep 27 2021

douardda added inline comments to D6353: Add StatsD support to graph submodule.
Sep 27 2021, 5:00 PM
douardda added inline comments to D6358: Make old StatsD metrics style compliant with the rest of the module.
Sep 27 2021, 4:55 PM
douardda accepted D6351: Add StatsD support to provenance backend.
Sep 27 2021, 4:50 PM
douardda added a comment to T3487: Installation of the new provenance server.
  • postgresql:13
    • 1000 parallel connections allowed
Sep 27 2021, 4:39 PM · System administration
douardda added inline comments to D6340: opam: Define a initialize_opam_root parameter for opam loader.
Sep 27 2021, 3:29 PM
douardda closed D6347: docker: use a dedicated container for memcached.
Sep 27 2021, 3:15 PM
douardda committed rDENV9dd8181b0107: docker: use a dedicated container for memcached (authored by douardda).
docker: use a dedicated container for memcached
Sep 27 2021, 3:15 PM
douardda added inline comments to D6353: Add StatsD support to graph submodule.
Sep 27 2021, 2:57 PM
douardda accepted D6353: Add StatsD support to graph submodule.
Sep 27 2021, 2:56 PM
douardda added a comment to D6352: Add StatsD support to provenance storage implementations.

Might be possible to simplify this a bit using a similar approach to https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/metrics.py$16-26

Sep 27 2021, 2:55 PM
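
Without reproducing the linked swh-storage code, a decorator-based approach to timing metrics might look like the sketch below. The `TIMINGS` dictionary is a hypothetical in-memory sink standing in for a StatsD client, and the decorator name is chosen for this example only.

```python
import functools
import time
from typing import Callable, Dict, List

# Hypothetical in-memory sink standing in for a StatsD client
TIMINGS: Dict[str, List[float]] = {}


def timed(func: Callable) -> Callable:
    """Record the duration of each call under the wrapped function's name."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even when the call raises
            TIMINGS.setdefault(func.__name__, []).append(time.monotonic() - start)

    return wrapper
```

Decorating each backend method with `@timed` keeps the instrumentation out of the method bodies and uses the wrapped function name as the metric label.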
douardda accepted D6352: Add StatsD support to provenance storage implementations.
Sep 27 2021, 2:53 PM
douardda requested changes to D6351: Add StatsD support to provenance backend.

Overall looks ok to me but:

Sep 27 2021, 2:32 PM
douardda added inline comments to D6348: Clarify local/remote heads type as those are hexadecimal bytes str.
Sep 27 2021, 12:11 PM
douardda closed D6346: docker: use dsn connection string in web.yml.
Sep 27 2021, 12:02 PM
douardda committed rDENV27ddb8a16f3b: docker: use dsn connection string in web.yml (authored by douardda).
docker: use dsn connection string in web.yml
Sep 27 2021, 12:02 PM
douardda closed D6345: docker: do not override the DJANGO_SETTINGS_MODULE in swh-web/entrypoint.sh.
Sep 27 2021, 12:02 PM
douardda committed rDENV6c4f3c5f41fc: docker: do not override the DJANGO_SETTINGS_MODULE in swh-web/entrypoint.sh (authored by douardda).
docker: do not override the DJANGO_SETTINGS_MODULE in swh-web/entrypoint.sh
Sep 27 2021, 12:02 PM
douardda added a comment to D6347: docker: use a dedicated container for memcached.

Why?

Sep 27 2021, 12:00 PM
douardda updated the diff for D6347: docker: use a dedicated container for memcached.

Rebase

Sep 27 2021, 11:51 AM
douardda updated the diff for D6346: docker: use dsn connection string in web.yml.

Better commit message

Sep 27 2021, 11:51 AM
douardda updated the diff for D6345: docker: do not override the DJANGO_SETTINGS_MODULE in swh-web/entrypoint.sh.

typos

Sep 27 2021, 11:51 AM
douardda added a comment to D6346: docker: use dsn connection string in web.yml.

I don't see why (I'm guessing for simplification), but ok

Sep 27 2021, 11:45 AM
douardda accepted D6349: Update archive changelog about the opam.ocaml.org instance.
Sep 27 2021, 11:30 AM
douardda requested review of D6347: docker: use a dedicated container for memcached.
Sep 27 2021, 10:13 AM
douardda requested review of D6346: docker: use dsn connection string in web.yml.
Sep 27 2021, 10:12 AM
douardda requested review of D6345: docker: do not override the DJANGO_SETTINGS_MODULE in swh-web/entrypoint.sh.
Sep 27 2021, 10:11 AM
douardda added a comment to D6341: model: Replace attrs-strict with stricter validation.

Looks fine to me, but it needs some extensive tests indeed.

Sep 27 2021, 9:42 AM

Sep 24 2021

douardda closed D6336: Naive attempt to add support for dsn url config style for production db.
Sep 24 2021, 3:33 PM
douardda committed rDWAPPSdd6dde3e44cd: Naive attempt to add support for dsn url config style for production db (authored by douardda).
Naive attempt to add support for dsn url config style for production db
Sep 24 2021, 3:33 PM
douardda closed D6335: Wrap long lines in the README file.
Sep 24 2021, 3:33 PM
douardda committed rDWAPPS15b0e84456ae: Wrap long lines in the README file (authored by douardda).
Wrap long lines in the README file
Sep 24 2021, 3:33 PM
douardda updated the diff for D6336: Naive attempt to add support for dsn url config style for production db.

use types-psycopg2 instead of ignoring it in mypy.ini

Sep 24 2021, 2:14 PM
douardda added inline comments to D6336: Naive attempt to add support for dsn url config style for production db.
Sep 24 2021, 2:10 PM
douardda added inline comments to D6336: Naive attempt to add support for dsn url config style for production db.
Sep 24 2021, 2:08 PM