Indexer
Active · Public

Members

  • This project does not have any members.

Details

Description

Miner/index on objects contained in the Software Heritage archive.

Recent Activity

Wed, Oct 5

vlorentz added a parent task for T4606: Deploy swh-indexer v2.7.0: T4605: Deploy swh-loader-metadata v0.0.3.
Wed, Oct 5, 1:07 PM · System administration, Indexer
vlorentz added a subtask for T4605: Deploy swh-loader-metadata v0.0.3: T4606: Deploy swh-indexer v2.7.0.
Wed, Oct 5, 1:07 PM · Metadata Loaders, System administration
vlorentz added a comment to T4605: Deploy swh-loader-metadata v0.0.3.

also, this should preferably be deployed after swh-indexer v2.7.0 so we don't need to reset indexer journal consumers to index metadata from Gitea, but not a requirement.

Wed, Oct 5, 1:07 PM · Metadata Loaders, System administration
vlorentz added a subtask for T4457: Index metadata from Gitea/Gogs: T4606: Deploy swh-indexer v2.7.0.
Wed, Oct 5, 1:06 PM · Origin-Gitea/Gogs, Extrinsic metadata, Indexer
vlorentz added a parent task for T4606: Deploy swh-indexer v2.7.0: T4457: Index metadata from Gitea/Gogs.
Wed, Oct 5, 1:06 PM · System administration, Indexer
vlorentz triaged T4606: Deploy swh-indexer v2.7.0 as Normal priority.
Wed, Oct 5, 1:05 PM · System administration, Indexer
vlorentz updated the task description for T4605: Deploy swh-loader-metadata v0.0.3.
Wed, Oct 5, 1:05 PM · Metadata Loaders, System administration
vlorentz updated the task description for T4605: Deploy swh-loader-metadata v0.0.3.
Wed, Oct 5, 1:05 PM · Metadata Loaders, System administration
vlorentz triaged T4605: Deploy swh-loader-metadata v0.0.3 as Normal priority.
Wed, Oct 5, 1:04 PM · Metadata Loaders, System administration

Wed, Sep 28

ardumont closed T4459: Deploy swh-indexer > v2.6 on staging then production, a subtask of T4392: Metadata Indexer for NuGet (.nuspec), as Resolved.
Wed, Sep 28, 7:22 PM · Indexer
ardumont closed T4459: Deploy swh-indexer > v2.6 on staging then production, a subtask of T4401: Index metadata from the deposit, as Resolved.
Wed, Sep 28, 7:22 PM · SWORD deposit, Indexer, Metadata workflow
ardumont closed T4459: Deploy swh-indexer > v2.6 on staging then production as Resolved.
Wed, Sep 28, 7:22 PM · Indexer, System administration
vlorentz added revisions to T4401: Index metadata from the deposit: D8570: Index extrinsic metadata from the deposit, D8568: codemeta: Fix crash when translating PropertyValue objects from codemeta-in-SWORD.
Wed, Sep 28, 7:02 PM · SWORD deposit, Indexer, Metadata workflow
vlorentz closed T4536: Document how swh-indexer uses Codemeta crosswalks as Resolved.
Wed, Sep 28, 12:57 PM · Documentation, Indexer

Tue, Sep 27

vlorentz added a revision to T4536: Document how swh-indexer uses Codemeta crosswalks: D8549: Make read_crosstable public and document it.
Tue, Sep 27, 2:14 PM · Documentation, Indexer

Mon, Sep 26

vlorentz closed T4392: Metadata Indexer for NuGet (.nuspec) as Resolved.
Mon, Sep 26, 6:30 PM · Indexer

Fri, Sep 16

ardumont moved T4459: Deploy swh-indexer > v2.6 on staging then production from code-review/await-feedback/pause to deployed/landed/monitoring on the System administration board.
Fri, Sep 16, 6:09 PM · Indexer, System administration

Thu, Sep 15

ardumont added a revision to T4459: Deploy swh-indexer > v2.6 on staging then production: D8493: indexer: Use public brokers in production, internal ones for staging.
Thu, Sep 15, 6:29 PM · Indexer, System administration
ardumont added a comment to T4459: Deploy swh-indexer > v2.6 on staging then production.

There's a few issues with the configuration of these indexer clients:
the traffic should not be going through the IPSec VPN. They need to use the public,
authenticated Kafka endpoints. The IPSec load is making all Azure communication
struggle.

ack, that should be "simple" enough to adapt [1]
[1] https://docs.softwareheritage.org/sysadm/mirror-operations/onboard.html?highlight=credential#how-to-create-the-journal-credentials

Thu, Sep 15, 6:15 PM · Indexer, System administration
ardumont added a revision to T4459: Deploy swh-indexer > v2.6 on staging then production: D8492: indexer: Allow journal client authentication configuration.
Thu, Sep 15, 5:49 PM · Indexer, System administration
vlorentz triaged T4536: Document how swh-indexer uses Codemeta crosswalks as Normal priority.
Thu, Sep 15, 12:04 PM · Documentation, Indexer

Tue, Sep 13

ardumont moved T4459: Deploy swh-indexer > v2.6 on staging then production from deployed/landed/monitoring to code-review/await-feedback/pause on the System administration board.
Tue, Sep 13, 3:42 PM · Indexer, System administration
vlorentz added a revision to T4457: Index metadata from Gitea/Gogs: D8460: Add Gitea metadata mapping.
Tue, Sep 13, 1:31 PM · Origin-Gitea/Gogs, Extrinsic metadata, Indexer
ardumont added a comment to T4459: Deploy swh-indexer > v2.6 on staging then production.

There's a few issues with the configuration of these indexer clients:

Tue, Sep 13, 10:06 AM · Indexer, System administration

Mon, Sep 12

vlorentz added a comment to T4459: Deploy swh-indexer > v2.6 on staging then production.

I'm guessing that's the extrinsic metadata indexer; the others need to do plenty of random access to the storage, but that one consumes very quickly from Kafka. On the bright side, it consumes the entire topic within hours, so parallelism could be reduced as a quick fix.

Mon, Sep 12, 9:04 PM · Indexer, System administration
vsellier added a comment to T4459: Deploy swh-indexer > v2.6 on staging then production.

All the indexers were stopped at 20:00 FR because something was consuming all the bandwidth of the VPN between Azure and our infra.

root@pergamon:/etc/clustershell# clush -b -w @indexer-workers "puppet agent --disable 'stop indexer to avoid bandwith consumption'"
root@pergamon:/etc/clustershell# clush -b -w @indexer-workers "systemctl stop swh-indexer-journal-client@*"
Mon, Sep 12, 8:10 PM · Indexer, System administration
olasd updated subscribers of T4459: Deploy swh-indexer > v2.6 on staging then production.

There's a few issues with the configuration of these indexer clients:

Mon, Sep 12, 8:10 PM · Indexer, System administration
ardumont moved T4459: Deploy swh-indexer > v2.6 on staging then production from in-progress to deployed/landed/monitoring on the System administration board.
Mon, Sep 12, 6:01 PM · Indexer, System administration
ardumont moved T4459: Deploy swh-indexer > v2.6 on staging then production from Weekly backlog to in-progress on the System administration board.
Mon, Sep 12, 6:01 PM · Indexer, System administration
ardumont updated the task description for T4459: Deploy swh-indexer > v2.6 on staging then production.
Mon, Sep 12, 6:01 PM · Indexer, System administration
ardumont updated the task description for T4459: Deploy swh-indexer > v2.6 on staging then production.
Mon, Sep 12, 5:48 PM · Indexer, System administration
ardumont updated the task description for T4459: Deploy swh-indexer > v2.6 on staging then production.
Mon, Sep 12, 5:47 PM · Indexer, System administration
ardumont renamed T4459: Deploy swh-indexer > v2.6 on staging then production from Deploy swh-indexer > v2.5 on production and staging to Deploy swh-indexer > v2.6 on staging then production.
Mon, Sep 12, 5:33 PM · Indexer, System administration

Thu, Sep 8

ardumont moved T4459: Deploy swh-indexer > v2.6 on staging then production from in-progress to Weekly backlog on the System administration board.
Thu, Sep 8, 11:16 AM · Indexer, System administration
ardumont renamed T4459: Deploy swh-indexer > v2.6 on staging then production from Deploy swh-indexer v2.4 on production and staging to Deploy swh-indexer > v2.5 on production and staging.
Thu, Sep 8, 11:16 AM · Indexer, System administration

Sep 2 2022

vlorentz triaged T4490: gemspec mapping: Add support for optional parameter as Normal priority.
Sep 2 2022, 2:13 PM · Easy hack, Indexer

Sep 1 2022

vlorentz triaged T4480: NpmMapping: Add basic support of SPDX expressions as Normal priority.
Sep 1 2022, 4:59 PM · Indexer, Easy hack
ardumont updated the task description for T4459: Deploy swh-indexer > v2.6 on staging then production.
Sep 1 2022, 11:50 AM · Indexer, System administration

Aug 31 2022

vlorentz added revisions to T4459: Deploy swh-indexer > v2.6 on staging then production: D8372: base: Filter out empty URIs so PyLD does not crash, D8373: Filter out more invalid URIs that make PyLD crash.
Aug 31 2022, 9:14 PM · Indexer, System administration
ardumont closed T4477: staging origin intrinsic metadata indexer are stuck as Resolved.
Aug 31 2022, 6:49 PM · Indexer, System administration
ardumont closed T4477: staging origin intrinsic metadata indexer are stuck, a subtask of T4459: Deploy swh-indexer > v2.6 on staging then production, as Resolved.
Aug 31 2022, 6:49 PM · Indexer, System administration
ardumont moved T4477: staging origin intrinsic metadata indexer are stuck from in-progress to deployed/landed/monitoring on the System administration board.
Aug 31 2022, 6:49 PM · Indexer, System administration
ardumont changed the status of T4477: staging origin intrinsic metadata indexer are stuck, a subtask of T4459: Deploy swh-indexer > v2.6 on staging then production, from Open to Work in Progress.
Aug 31 2022, 6:49 PM · Indexer, System administration
ardumont changed the status of T4477: staging origin intrinsic metadata indexer are stuck from Open to Work in Progress.
Aug 31 2022, 6:49 PM · Indexer, System administration
vsellier added a revision to T4477: staging origin intrinsic metadata indexer are stuck: D8371: staging: Increase the number of workers for storage and indexer storage.
Aug 31 2022, 6:19 PM · Indexer, System administration
ardumont added a revision to T4477: staging origin intrinsic metadata indexer are stuck: D8370: staging intrinsic metadata indexer: Declare batch size to 100.
Aug 31 2022, 6:17 PM · Indexer, System administration
ardumont added a comment to T4477: staging origin intrinsic metadata indexer are stuck.

The lag is subsiding now, slowly because only 1 journal client:

Aug 31 2022, 6:09 PM · Indexer, System administration
ardumont updated subscribers of T4477: staging origin intrinsic metadata indexer are stuck.

After further investigation w/ @vsellier, it's also related to our storage and indexer storage having too few gunicorn workers serving the journal clients (among other things).
So decreasing the batch size to something like 100 and fixing that should fairly help ^.

Aug 31 2022, 6:06 PM · Indexer, System administration
ardumont added a revision to T4477: staging origin intrinsic metadata indexer are stuck: D8369: indexer.cli: Allow batch_size configuration on journal client.
Aug 31 2022, 5:20 PM · Indexer, System administration
ardumont added a comment to T4477: staging origin intrinsic metadata indexer are stuck.

Activating debug log [1]

Aug 31 2022, 5:18 PM · Indexer, System administration