Software Heritage

Jun 2 2022

ardumont requested review of D7949: db.BaseDb: Propose default get_current_version method implementation.
Jun 2 2022, 5:23 PM
ardumont updated the summary of D7943: Revert "cli.db: Use attribute current_version instead of undeclared getter".
Jun 2 2022, 5:22 PM
ardumont added a revision to T4228: scrubber: Investigate the apparent lock (staging): D7949: db.BaseDb: Propose default get_current_version method implementation.
Jun 2 2022, 5:21 PM · Archive integrity, System administration
ardumont added a comment to P1371 still github origins listed from maven with exotic urls.

listing done!

Jun 2 2022, 4:48 PM
ardumont edited P1371 still github origins listed from maven with exotic urls.
Jun 2 2022, 4:48 PM
ardumont updated the diff for D7943: Revert "cli.db: Use attribute current_version instead of undeclared getter".

Reopening it, it's required in the end. Another fix will follow to propose a default
get_current_version implementation in BaseDb.
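A minimal sketch of what a default `get_current_version` on a base class could look like, assuming (hypothetically) that subclasses declare a `current_version` class attribute. `BaseDbSketch` and `ScrubberDbSketch` are illustrative names, not the actual swh.core classes.

```python
from typing import Optional


class BaseDbSketch:
    """Hypothetical stand-in for a BaseDb class (not the swh.core code)."""

    # Subclasses are expected to override this with their schema version.
    current_version: Optional[int] = None

    def get_current_version(self) -> int:
        """Default implementation: return the declared class attribute."""
        if self.current_version is None:
            raise NotImplementedError(
                f"{type(self).__name__} must declare current_version"
            )
        return self.current_version


class ScrubberDbSketch(BaseDbSketch):
    # Hypothetical version number, for illustration only.
    current_version = 3


print(ScrubberDbSketch().get_current_version())  # prints 3
```

With a default like this, db tooling can always call the getter, while a subclass only needs to declare the attribute.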

Jun 2 2022, 4:41 PM
ardumont added a comment to T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1.

This needs to be reverted while waiting for [1] to be resolved.
I'll attend to it tomorrow.

Jun 2 2022, 4:38 PM · System administration, Indexer, Metadata workflow
ardumont closed D7946: github/utils: Deal with exotic urls to canonicalize.
Jun 2 2022, 4:01 PM
ardumont committed rDCOREe1a1d84eb4ea: github/utils: Deal with exotic urls to canonicalize (authored by ardumont).
github/utils: Deal with exotic urls to canonicalize
Jun 2 2022, 4:01 PM
ardumont added inline comments to D7946: github/utils: Deal with exotic urls to canonicalize.
Jun 2 2022, 3:58 PM
ardumont updated the diff for D7946: github/utils: Deal with exotic urls to canonicalize.

Drop function for f-string, simpler indeed.

Jun 2 2022, 3:58 PM
ardumont added inline comments to D7946: github/utils: Deal with exotic urls to canonicalize.
Jun 2 2022, 3:53 PM
ardumont added inline comments to D7946: github/utils: Deal with exotic urls to canonicalize.
Jun 2 2022, 3:53 PM
ardumont accepted D7944: Add support for running the server with 'postgresql' storage cls.
Jun 2 2022, 3:44 PM
ardumont added inline comments to D7946: github/utils: Deal with exotic urls to canonicalize.
Jun 2 2022, 3:43 PM
ardumont added a comment to T4278: Elastic worker cluster failures to unstuck.

I've restarted the git loader on that cluster:

Jun 2 2022, 3:39 PM · System administration, Roadmap 2022
ardumont added inline comments to D7946: github/utils: Deal with exotic urls to canonicalize.
Jun 2 2022, 3:25 PM
ardumont added inline comments to D7946: github/utils: Deal with exotic urls to canonicalize.
Jun 2 2022, 3:25 PM
ardumont updated the summary of D7946: github/utils: Deal with exotic urls to canonicalize.
Jun 2 2022, 3:24 PM
ardumont updated the diff for D7946: github/utils: Deal with exotic urls to canonicalize.

Update tests

Jun 2 2022, 3:23 PM
ardumont accepted D7945: indexer_storage.yml: Replace deprecated alias 'local' with 'postgresql'.
Jun 2 2022, 3:21 PM
ardumont added a comment to D7945: indexer_storage.yml: Replace deprecated alias 'local' with 'postgresql'.

(Same for the scrubber, but I don't know if it's in docker.)

Jun 2 2022, 3:21 PM
ardumont requested review of D7946: github/utils: Deal with exotic urls to canonicalize.
Jun 2 2022, 3:17 PM
ardumont added a revision to T3874: staging: Analyze result of the maven listing and ingestion: D7946: github/utils: Deal with exotic urls to canonicalize.
Jun 2 2022, 3:14 PM · Maven loader, Maven lister, Archive coverage
ardumont added a comment to T3874: staging: Analyze result of the maven listing and ingestion.

The full listing is not finished yet, but there still remain origins with exotic url prefixes which are not canonicalized.
I'd say the issue lies in the swh.core canonicalize implementation, which only deals with https:// and git:// urls.
So some improvements are needed there.
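A rough sketch of a more permissive canonicalization, assuming the exotic forms resemble the hypothetical examples in the usage note (Maven `scm:git:` prefixes, `git@host:` and `git+ssh://` forms, `www.` hosts, trailing `.git` or `/`). `canonicalize_github_url` is an illustrative name, not the swh.core function, and a regex is only one possible approach.

```python
import re
from typing import Optional

# Hypothetical pattern: "github.com" must start the string or follow "/", "@"
# or ":" (so gist.github.com is rejected), then ":" or "/" and "owner/repo".
_GITHUB_RE = re.compile(
    r"(?:^|[/@:])(?:www\.)?github\.com[:/]+(?P<owner>[^/:]+)/(?P<repo>[^/#?]+)",
    re.IGNORECASE,
)


def canonicalize_github_url(url: str) -> Optional[str]:
    """Return https://github.com/<owner>/<repo>, or None if not a GitHub url."""
    match = _GITHUB_RE.search(url.strip())
    if match is None:
        return None
    owner, repo = match.group("owner"), match.group("repo")
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"https://github.com/{owner}/{repo}"
```

For example, `git://github.com/user/repo.git`, `scm:git:git@github.com:user/repo.git` and `http://www.github.com/user/repo/` would all map to `https://github.com/user/repo` under these assumptions.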

Jun 2 2022, 2:08 PM · Maven loader, Maven lister, Archive coverage
ardumont created P1371 still github origins listed from maven with exotic urls.
Jun 2 2022, 2:06 PM
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

Heads up: ingestion is still ongoing, with quite stable memory consumption.

Jun 2 2022, 1:40 PM · System administration, Git loader
ardumont requested changes to D7937: Replace RevisionMetadataIndexer with DirectoryMetadataIndexer.

But please make sure the migration scripts are actually runnable (I'd say with docker and the swh db upgrade cli).
I'm not sure our schema and index sql files are idempotent enough for migration script 134 to work with them.
^ Hence the requested changes here.
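To illustrate what "idempotent enough" means, a minimal sketch that replays the same DDL twice, using the stdlib `sqlite3` module as a stand-in for PostgreSQL (an assumption; the real check would go through docker and `swh db upgrade`). The table and index names are hypothetical. PostgreSQL also accepts `IF NOT EXISTS` on `CREATE TABLE` and `CREATE INDEX`.

```python
import sqlite3

# Idempotent DDL: each statement can be replayed without failing.
IDEMPOTENT_SQL = """
CREATE TABLE IF NOT EXISTS hypothetical_metadata (
    id BLOB PRIMARY KEY,
    metadata TEXT
);
CREATE INDEX IF NOT EXISTS hypothetical_metadata_idx
    ON hypothetical_metadata (metadata);
"""

# Same statements without the guards: replaying them fails.
NON_IDEMPOTENT_SQL = IDEMPOTENT_SQL.replace(" IF NOT EXISTS", "")


def survives_replay(sql: str) -> bool:
    """Apply the script twice on a fresh db, as a re-run migration would."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(sql)
        conn.executescript(sql)
        return True
    except sqlite3.OperationalError:  # e.g. "table ... already exists"
        return False
    finally:
        conn.close()


print(survives_replay(IDEMPOTENT_SQL))      # True
print(survives_replay(NON_IDEMPOTENT_SQL))  # False
```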

Jun 2 2022, 1:35 PM
ardumont added inline comments to D7937: Replace RevisionMetadataIndexer with DirectoryMetadataIndexer.
Jun 2 2022, 1:13 PM
ardumont accepted D7942: Improve doc strings and inline documentation.
Jun 2 2022, 1:02 PM
ardumont accepted D7933: add a kafka_stream_to_value helper function in serializers.py.

Not sure I get all of this, but it rather lgtm.

Jun 2 2022, 12:56 PM
ardumont accepted D7941: Add support for indexing from head releases.

But I don't get why you use assert within the runtime code instead of raising a proper exception.
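For context on that remark: `assert` statements are stripped when Python runs with `-O`, so they are not a reliable runtime check. A minimal sketch of the alternative, with a hypothetical `pick_head_release` helper (not the code from D7941).

```python
from typing import List, Optional


def pick_head_release(release_ids: List[bytes]) -> Optional[bytes]:
    """Hypothetical helper: return the single head release, if any."""
    if len(release_ids) > 1:
        # Unlike `assert`, an explicit exception is still raised under `python -O`.
        raise ValueError(
            f"expected at most one head release, got {len(release_ids)}"
        )
    return release_ids[0] if release_ids else None
```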

Jun 2 2022, 12:45 PM
ardumont abandoned D7943: Revert "cli.db: Use attribute current_version instead of undeclared getter".

Nope, it's fine without this.

Jun 2 2022, 11:50 AM
ardumont requested review of D7943: Revert "cli.db: Use attribute current_version instead of undeclared getter".
Jun 2 2022, 11:44 AM
ardumont added a reverting change for D7907: cli.db: Use attribute current_version instead of undeclared getter: D7943: Revert "cli.db: Use attribute current_version instead of undeclared getter".
Jun 2 2022, 11:42 AM
ardumont added a reverting change for rDCORE5cda0ca62601: cli.db: Use attribute current_version instead of undeclared getter: D7943: Revert "cli.db: Use attribute current_version instead of undeclared getter".
Jun 2 2022, 11:42 AM
ardumont added a revision to T4284: scrubber does not comply to what's expected by the swh db tooling: D7943: Revert "cli.db: Use attribute current_version instead of undeclared getter".
Jun 2 2022, 11:42 AM · Archive integrity

Jun 1 2022

ardumont moved T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1 from in-progress to deployed/landed/monitoring on the System administration board.
Jun 1 2022, 5:37 PM · System administration, Indexer, Metadata workflow
ardumont moved T4283: Load https://github.com/chromium/chromium with a higher packfile size limit from code-review/await-feedback/pause to deployed/landed/monitoring on the System administration board.
Jun 1 2022, 5:37 PM · System administration, Git loader
ardumont moved T4283: Load https://github.com/chromium/chromium with a higher packfile size limit from in-progress to code-review/await-feedback/pause on the System administration board.
Jun 1 2022, 5:37 PM · System administration, Git loader
ardumont updated the task description for T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1.
Jun 1 2022, 5:37 PM · System administration, Indexer, Metadata workflow
ardumont committed rSPSITEa26b42da1b9a: indexer_journal_client: Fix cli template (authored by ardumont).
indexer_journal_client: Fix cli template
Jun 1 2022, 5:33 PM
ardumont committed rSPSITEb537a55236c0: indexer_journal_client: Fix configuration (authored by ardumont).
indexer_journal_client: Fix configuration
Jun 1 2022, 5:29 PM
ardumont updated the task description for T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1.
Jun 1 2022, 5:22 PM · System administration, Indexer, Metadata workflow
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

Status update: both worker1.staging and worker17 are past the pack
file limit step where they usually crash \o/ [1].

Jun 1 2022, 5:16 PM · System administration, Git loader
ardumont updated the task description for T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1.
Jun 1 2022, 5:15 PM · System administration, Indexer, Metadata workflow
ardumont edited P1370 indexer cli says no.
Jun 1 2022, 5:10 PM
ardumont renamed T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1 from staging: Deploy new origin intrinsic metadata journal client indexer v1.1 to staging: Deploy new origin intrinsic metadata journal client indexer > v1.1.
Jun 1 2022, 5:09 PM · System administration, Indexer, Metadata workflow
ardumont accepted D7940: Switch origin-intrinsic-metadata from celery- to journal-based workers.

much better ;)

Jun 1 2022, 4:56 PM
ardumont accepted D7940: Switch origin-intrinsic-metadata from celery- to journal-based workers.
Jun 1 2022, 4:50 PM
ardumont created P1370 indexer cli says no.
Jun 1 2022, 4:43 PM
ardumont updated the task description for T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1.
Jun 1 2022, 4:33 PM · System administration, Indexer, Metadata workflow
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

I've started a 32g experiment in worker1.staging and 64g in worker17.

Jun 1 2022, 4:24 PM · System administration, Git loader
ardumont committed rSPSITE587aa33416e4: Fix service clean up from absent to stopped (authored by ardumont).
Fix service clean up from absent to stopped
Jun 1 2022, 4:16 PM
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

8g (pack size limit) was not enough either, it broke on both workers ¯\_(ツ)_/¯.
We have no clue what the size limit should be, so I'm clearly taking shots in the dark.
I've started a 32g experiment in worker1.staging and 64g in worker17.
We will see.

Jun 1 2022, 4:04 PM · System administration, Git loader
ardumont added a comment to D7936: [WIP] collabgraph: add tool to generate author collaboration graphs.

This was meant to be a draft, but I couldn't find the button to make it so

Jun 1 2022, 4:01 PM
ardumont added a comment to D7936: [WIP] collabgraph: add tool to generate author collaboration graphs.

missing documentation and rationale

Jun 1 2022, 3:54 PM
ardumont accepted D7935: ORC: handle nullable columns/empty tables properly.

Side note, why aren't there tests alongside this module?

Jun 1 2022, 3:52 PM
ardumont accepted D7939: tests: Simplify definition of ORIGINS list.
Jun 1 2022, 3:44 PM
ardumont accepted D7934: deprecate the db/pytest_plugin.py module.
Jun 1 2022, 3:43 PM
ardumont closed D7928: Deploy new origin intrinsic metadata journal client indexer.
Jun 1 2022, 3:20 PM
ardumont committed rSPSITEa41b6ac28486: Deploy new origin intrinsic metadata journal client indexer (authored by ardumont).
Deploy new origin intrinsic metadata journal client indexer
Jun 1 2022, 3:20 PM
ardumont closed T4298: Configure lister services to use github credentials as Invalid.

mmm, it's already the case so something is off.

Jun 1 2022, 3:19 PM · System administration, Lister
ardumont added projects to T4298: Configure lister services to use github credentials: Lister, System administration.
Jun 1 2022, 3:09 PM · System administration, Lister
ardumont triaged T4298: Configure lister services to use github credentials as Normal priority.
Jun 1 2022, 3:09 PM · System administration, Lister
ardumont updated the task description for T3874: staging: Analyze result of the maven listing and ingestion.
Jun 1 2022, 3:06 PM · Maven loader, Maven lister, Archive coverage
ardumont added a comment to T3874: staging: Analyze result of the maven listing and ingestion.

Plan:

  • P1369: Listing status after first round listing
  • Clean up maven github origins listing [1]
  • Trigger maven full run [2]
  • Wait for listing to finish
  • Listing status after new maven lister round of listing
  • Ping in mailing list discussion with data!
Jun 1 2022, 3:05 PM · Maven loader, Maven lister, Archive coverage
ardumont updated the task description for T3874: staging: Analyze result of the maven listing and ingestion.
Jun 1 2022, 3:01 PM · Maven loader, Maven lister, Archive coverage
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

worker17 is complaining as well, but somehow differently.
Same version for both though [2].

Jun 1 2022, 2:58 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

Ok, as expected, it does not work as is [1] ;)
Second run then, with twice the current pack file limit [2].

Jun 1 2022, 2:54 PM · System administration, Git loader
ardumont changed the status of T4135: staging: Deploy graphql service, a subtask of T4134: Package the graphql service, from Open to Work in Progress.
Jun 1 2022, 2:36 PM · System administration, GraphQL API
ardumont changed the status of T4135: staging: Deploy graphql service from Open to Work in Progress.
Jun 1 2022, 2:36 PM · System administration, GraphQL API
ardumont changed the status of T4283: Load https://github.com/chromium/chromium with a higher packfile size limit from Open to Work in Progress.
Jun 1 2022, 2:35 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

I've triggered a run on worker1.staging [1] and worker17 as is for now.
We'll revisit the pack file size limit after that run fails (if it does).

Jun 1 2022, 2:35 PM · System administration, Git loader
ardumont changed the status of T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1 from Open to Work in Progress.
Jun 1 2022, 11:56 AM · System administration, Indexer, Metadata workflow
ardumont changed the status of T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1, a subtask of T4273: Rewrite indexers as journal clients when relevant, from Open to Work in Progress.
Jun 1 2022, 11:56 AM · Indexer, Metadata workflow
ardumont added a project to T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1: System administration.
Jun 1 2022, 11:56 AM · System administration, Indexer, Metadata workflow
ardumont added a comment to T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1.

Should be ready to be deployed now.

Jun 1 2022, 11:56 AM · System administration, Indexer, Metadata workflow
ardumont added inline comments to D7878: svn: Wraps commit info retrieval in a retryable SvnRepo method.
Jun 1 2022, 11:50 AM
ardumont added inline comments to D7878: svn: Wraps commit info retrieval in a retryable SvnRepo method.
Jun 1 2022, 11:49 AM
ardumont added a comment to T4278: Elastic worker cluster failures to unstuck.

Awesome! Thanks.

Jun 1 2022, 11:48 AM · System administration, Roadmap 2022
ardumont added a comment to D7913: db: Grant read access to guest user on all tables of the schema.

@douardda Any news on how to modify a db template for the tests?

Jun 1 2022, 11:36 AM
ardumont updated the task description for T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1.
Jun 1 2022, 11:26 AM · System administration, Indexer, Metadata workflow
ardumont updated the summary of D7928: Deploy new origin intrinsic metadata journal client indexer.
Jun 1 2022, 11:23 AM
ardumont added a comment to D7928: Deploy new origin intrinsic metadata journal client indexer.

Is the scheduler section in 'swh::deploy::indexer_journal_client::config' still needed?

Jun 1 2022, 11:22 AM
ardumont updated the diff for D7928: Deploy new origin intrinsic metadata journal client indexer.

Adapt according to discussion (description and test plan updated already)

Jun 1 2022, 11:21 AM
ardumont updated the test plan for D7928: Deploy new origin intrinsic metadata journal client indexer.
Jun 1 2022, 11:19 AM
ardumont updated the test plan for D7928: Deploy new origin intrinsic metadata journal client indexer.
Jun 1 2022, 11:18 AM
ardumont updated the summary of D7928: Deploy new origin intrinsic metadata journal client indexer.
Jun 1 2022, 11:07 AM
ardumont updated the task description for T3874: staging: Analyze result of the maven listing and ingestion.
Jun 1 2022, 10:50 AM · Maven loader, Maven lister, Archive coverage
ardumont added a comment to T3874: staging: Analyze result of the maven listing and ingestion.

The old maven lister behavior resulted in origins like git://github.com, ... [1]
The new maven lister behavior should now result in canonical github urls http://github.com/user/repo.
Analysis is ongoing; the report will follow this comment.

Jun 1 2022, 10:50 AM · Maven loader, Maven lister, Archive coverage
ardumont updated the task description for T3874: staging: Analyze result of the maven listing and ingestion.
Jun 1 2022, 10:47 AM · Maven loader, Maven lister, Archive coverage
ardumont created P1369 old maven listing with github origins non-canonicalized.
Jun 1 2022, 10:46 AM · Maven lister
ardumont added inline comments to D7928: Deploy new origin intrinsic metadata journal client indexer.
Jun 1 2022, 10:34 AM
ardumont accepted D7932: Add chart to deploy graphql.

couple of questions inline.

Jun 1 2022, 10:19 AM
ardumont added inline comments to D7932: Add chart to deploy graphql.
Jun 1 2022, 10:19 AM

May 31 2022

ardumont updated the test plan for D7928: Deploy new origin intrinsic metadata journal client indexer.
May 31 2022, 5:42 PM
ardumont added a comment to D7894: Add arch lister module (origins from archives)..

I've skimmed through a bit and this does lgtm from afar so far.

May 31 2022, 5:00 PM
ardumont updated the task description for T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1.
May 31 2022, 4:52 PM · System administration, Indexer, Metadata workflow
ardumont added inline comments to D7928: Deploy new origin intrinsic metadata journal client indexer.
May 31 2022, 4:12 PM