Maniphest T4273

Rewrite indexers as journal clients when relevant
Closed, Migrated, Edits Locked

Assigned To

gitlab-migration

Authored By

	vlorentz
	May 24 2022, 5:29 PM

Tags

Subscribers

gitlab-migration

Description

Currently, only the metadata indexer is implemented as a journal client, and that client is dedicated to creating one-shot tasks, with an indirection through the scheduler.

Rewriting the other indexers as journal clients would:

simplify the stack by removing moving parts (the scheduler, storage db access for the content indexer...).
allow better monitoring (as we already have a Grafana dashboard for journal clients)
allow indexing to be retried [1] on error
stop a single index computation failure from failing the whole indexing batch
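
The last two points can be illustrated with a minimal sketch. It is not the swh-indexer implementation: only the name process_journal_objects comes from this task (D7893); the signature, the index_one callback, the max_attempts parameter and the in-process retry loop are assumptions made for illustration, and the retry mechanism referred to by [1] may work differently.

```python
from typing import Callable, Dict, List, Tuple

# Assumed message shape: object type -> list of objects read from the journal.
JournalObjects = Dict[str, List[dict]]


def process_journal_objects(
    objects: JournalObjects,
    index_one: Callable[[str, dict], dict],  # hypothetical per-object indexer
    max_attempts: int = 3,                   # hypothetical retry budget
) -> Tuple[List[dict], List[Tuple[dict, str]]]:
    """Index each object on its own, retrying on error.

    Returns the successful results and the (object, error message) pairs
    for objects that still failed after max_attempts tries.
    """
    results: List[dict] = []
    failures: List[Tuple[dict, str]] = []
    for object_type, batch in objects.items():
        for obj in batch:
            for attempt in range(1, max_attempts + 1):
                try:
                    results.append(index_one(object_type, obj))
                    break  # success: move on to the next object
                except Exception as exc:
                    # Only record the failure once the retry budget is spent;
                    # the other objects of the batch are unaffected.
                    if attempt == max_attempts:
                        failures.append((obj, str(exc)))
    return results, failures
```

With this shape, an object that cannot be indexed ends up in the failures list while the rest of the batch is still indexed, which is the behaviour the last point asks for.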

Indexers:

D7899: origin intrinsic metadata
D8149: origin extrinsic metadata
D8147: mimetype (content indexer)
D8156: fossology-indexer (content indexer)

Revisions and Commits

rDWAPPS Web applications
		D8179	rDWAPPS3e5d552a3283 Drop ctags indexer references and hidden api using it
rDENV Development environment
		D7940	rDENV5d2296c6d9e3 Switch origin-intrinsic-metadata from celery- to journal-based workers
rDCIDX Metadata indexer
	Closed		D7899 Add support for indexing directly from the journal client
		D8158	rDCIDXb8be8197ef58 Drop decomissioned language indexer storage endpoints
		D8157	rDCIDX230e89d8eeb8 cleanup: Drop decomissioned ctags indexer
		D8156	rDCIDXa3253a0be473 cli: Add fossology license indexer journal client
		D8147	rDCIDXd0b5f23300e1 Add tests around new content-mimetype journal client indexer
		D8147	rDCIDX05cc6a62620e Adapt content indexer to allow journal objects processing
		D7893	rDCIDX35ff46ef5bb2 Refactor base indexers to provide a process_journal_objects method

Related Objects

		Status	Assigned	Task
		Migrated	gitlab-migration	T4392 Metadata Indexer for NuGet (.nuspec)
		Migrated	gitlab-migration	T3097 Expose metadata in the WebApp and make it searchable
		Migrated	gitlab-migration	T2064 Add metadata from deposits to metadata search
		Migrated	gitlab-migration	T2073 Index extrinsic metadata from the journal in swh-search/Elasticsearch
		Migrated	gitlab-migration	T4273 Rewrite indexers as journal clients when relevant
		Migrated	gitlab-migration	T4282 Deploy new origin intrinsic metadata journal client indexer > v1.1
		Migrated	gitlab-migration	T4274 Resolve all known crashes in the metadata indexer
		Migrated	gitlab-migration	T4275 CffMapping: Add checks for value types
		Migrated	gitlab-migration	T4276 CffMapping: ignore invalid yaml files
		Migrated	gitlab-migration	T4277 Deal with null characters in the output of the metadata indexer
		Migrated	gitlab-migration	T4333 test_npm_adversarial fails
		Migrated	gitlab-migration	T4395 Migrate azure worker vms to cheaper and more efficient vms
				Restricted Maniphest Task
		Migrated	gitlab-migration	T4401 Index metadata from the deposit
		Migrated	gitlab-migration	T4459 Deploy swh-indexer > v2.6 on staging then production
		Migrated	gitlab-migration	T4429 Deploy swh-indexer v2.3.0 on production and staging
		Migrated	gitlab-migration	T4477 staging origin intrinsic metadata indexer are stuck
		Migrated	gitlab-migration	T4606 Deploy swh-indexer v2.7.0
		Migrated	gitlab-migration	T4694 Use directory metadata in origin search

Event Timeline

vlorentz triaged this task as Normal priority. May 24 2022, 5:29 PM

vlorentz created this task.

vlorentz updated the task description. May 25 2022, 10:32 AM

vlorentz mentioned this in D7899: Add support for indexing directly from the journal client. May 25 2022, 3:40 PM

vlorentz added a revision: D7893: Refactor base indexers to provide a process_journal_objects method. May 25 2022, 3:40 PM

vlorentz added a commit: rDCIDX35ff46ef5bb2: Refactor base indexers to provide a process_journal_objects method. May 30 2022, 3:55 PM

ardumont changed the status of subtask T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1 from Open to Work in Progress. Jun 1 2022, 11:56 AM

vlorentz mentioned this in D7940: Switch origin-intrinsic-metadata from celery- to journal-based workers. Jun 1 2022, 4:45 PM

vlorentz added a revision: D7940: Switch origin-intrinsic-metadata from celery- to journal-based workers. Jun 1 2022, 4:46 PM

vlorentz added a revision: D7899: Add support for indexing directly from the journal client.

vlorentz added a commit: rDENV5d2296c6d9e3: Switch origin-intrinsic-metadata from celery- to journal-based workers. Jun 1 2022, 4:57 PM

ardumont changed the status of subtask T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1 from Work in Progress to Open. Jun 2 2022, 6:00 PM

vlorentz closed subtask T4274: Resolve all known crashes in the metadata indexer as Resolved. Jul 5 2022, 4:50 PM

ardumont changed the status of subtask T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1 from Open to Work in Progress. Jul 18 2022, 6:23 PM

ardumont updated the task description. Jul 19 2022, 10:57 AM

vlorentz updated the task description. Jul 19 2022, 11:04 AM

ardumont added a revision: D8147: Adapt content indexer to allow journal objects processing. Jul 20 2022, 7:18 PM

ardumont closed subtask T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1 as Resolved. Jul 21 2022, 10:40 AM

ardumont updated the task description. Jul 21 2022, 2:41 PM

ardumont updated the task description.

ardumont updated the task description.

ardumont added a commit: rDCIDX05cc6a62620e: Adapt content indexer to allow journal objects processing. Jul 22 2022, 3:30 PM

ardumont added a commit: rDCIDXd0b5f23300e1: Add tests around new content-mimetype journal client indexer.

ardumont added a revision: D8156: cli: Add fossology license indexer journal client. Jul 22 2022, 3:31 PM

ardumont updated the task description. Jul 22 2022, 3:32 PM

ardumont added a commit: rDCIDXa3253a0be473: cli: Add fossology license indexer journal client. Jul 22 2022, 4:04 PM

ardumont updated the task description. Jul 22 2022, 4:05 PM

ardumont added a revision: D8157: cleanup: Drop decomissioned ctags indexer. Jul 22 2022, 4:30 PM

ardumont added a revision: D8158: Drop decomissioned language indexer storage endpoints. Jul 22 2022, 4:42 PM

ardumont added a commit: rDCIDX230e89d8eeb8: cleanup: Drop decomissioned ctags indexer. Jul 22 2022, 6:16 PM

ardumont added a commit: rDCIDXb8be8197ef58: Drop decomissioned language indexer storage endpoints. Jul 25 2022, 4:15 PM

ardumont added a revision: D8179: Drop ctags indexer references and hidden api using it. Aug 4 2022, 10:24 AM

ardumont added a commit: rDWAPPS3e5d552a3283: Drop ctags indexer references and hidden api using it. Aug 4 2022, 12:33 PM

vlorentz closed this task as Resolved. Aug 8 2022, 9:59 AM

gitlab-migration changed the status of subtask T4282: Deploy new origin intrinsic metadata journal client indexer > v1.1 from Resolved to Migrated. Oct 19 2022, 6:07 PM

This task has been migrated to GitLab.

gitlab-migration changed the status of subtask T4274: Resolve all known crashes in the metadata indexer from Resolved to Migrated. Jan 8 2023, 4:36 PM