⚓ T359 Indexers: batch content analyzer infrastructure

Status	Assigned	Task
Migrated	gitlab-migration	T439 Indexers: compute (and maintain up-to-date) the filetype of all blobs
Migrated	gitlab-migration	T1385 Monitor output of metadata indexers
Migrated	gitlab-migration	T359 Indexers: batch content analyzer infrastructure
Migrated	gitlab-migration	T1227 General improvements of the indexer: Schedule indexer tasks
Migrated	gitlab-migration	T1229 Indexers: Make orchestrators use swh-scheduler for scheduling
Migrated	gitlab-migration	T1290 Indexers: Use swh.scheduler instead of directly relying on Celery
Migrated	gitlab-migration	T1230 Indexers: Improve readme to be more explicit on how to run locally
Migrated	gitlab-migration	T1310 Simplify indexer design: move away from the pipeline approach
Migrated	gitlab-migration	T1311 indexer: Remove orchestrators
Migrated	gitlab-migration	T1312 indexer: Adapt textual content indexer to actually filter textual content themselves
Migrated	gitlab-migration	T1324 Deploy metadata indexers in production
Migrated	gitlab-migration	T1326 metadata indexer: Deploy origin head
Migrated	gitlab-migration	T991 Indexers: Send range of ids instead of list of ids
Migrated	gitlab-migration	T1375 Deploy revision metadata indexer
Migrated	gitlab-migration	T1376 Deploy origin indexer
Migrated	gitlab-migration	T1374 content indexer: Determine the identifier ranges to use to schedule those
Migrated	gitlab-migration	T818 indexer DB should not use bytea for mimetype and encoding columns

zack created this task. Apr 1 2016, 10:51 AM

zack updated the task description.

olasd changed the visibility from "All Users" to "Public (No Login Required)". May 13 2016, 5:09 PM

zack added a subtask: T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage. May 29 2016, 5:57 PM

zack added a parent task: T439: Indexers: compute (and maintain up-to-date) the filetype of all blobs. Jun 13 2016, 4:06 PM

olasd edited subtasks, added: T528: swh-journal: Create a journal client listing objects of a given type; removed: T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage. Aug 16 2016, 6:35 PM

A POC is ongoing as T548.

ardumont renamed this task from batch blob analyzer infrastructure to Indexers: batch blob analyzer infrastructure. Oct 5 2018, 2:47 PM

ardumont removed a subtask: T528: swh-journal: Create a journal client listing objects of a given type. Oct 18 2018, 3:55 PM

Unplugging T528 as per discussion.

We need to rework the current indexer implementation to use ranges of ids instead of lists of ids (T991).
After that, we can schedule 256 ranges of contents to index using the scheduler stack.
Then we will see where that goes.
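
As an illustration of the 256-range idea, here is a minimal sketch of how the content id space could be cut into equal ranges. It assumes content ids are 20-byte sha1 digests and that each range has inclusive bounds; the `id_ranges` helper is hypothetical and is not part of swh-indexer or swh-scheduler.

```python
from typing import Iterator, Tuple

ID_LENGTH = 20  # assumption: content ids are 20-byte sha1 digests


def id_ranges(
    n_ranges: int = 256, id_length: int = ID_LENGTH
) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (start, end) pairs, both inclusive, that partition the id space.

    Hypothetical helper for illustration only.
    """
    space = 256 ** id_length  # total number of possible ids
    for i in range(n_ranges):
        start = i * space // n_ranges
        end = (i + 1) * space // n_ranges - 1
        yield start.to_bytes(id_length, "big"), end.to_bytes(id_length, "big")


# With the defaults, each range covers all ids sharing the same first byte.
ranges = list(id_ranges())
```

Each `(start, end)` pair could then become the arguments of one scheduled indexing task, so a full pass over the archive is 256 independent tasks.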

ardumont renamed this task from Indexers: batch blob analyzer infrastructure to Indexers: batch content analyzer infrastructure. Oct 19 2018, 8:44 AM

ardumont raised the priority of this task from Low to Normal.

ardumont added a subtask: T991: Indexers: Send range of ids instead of list of ids.

ardumont updated the task description.

ardumont added a project: Indexer.

ardumont edited subtasks, added: T1227: General improvements of the indexer: Schedule indexer tasks; removed: T991: Indexers: Send range of ids instead of list of ids. Oct 19 2018, 8:47 AM

vlorentz added a parent task: T1385: Monitor output of metadata indexers. Nov 27 2018, 11:58 AM

ardumont closed subtask T1227: General improvements of the indexer: Schedule indexer tasks as Resolved. Dec 4 2018, 11:47 AM

> We need to rework the current indexer implementation to use ranges of ids instead of lists of ids (T991).
> After that, we can schedule 256 ranges of contents to index using the scheduler stack.
> Then we will see where that goes.

Done.

So in effect:

> To this end, we need some scheduling tooling that allows us to add/remove analyzers, (re)run analyses in batch, and incrementally stay up to date with new incoming content blobs.

Done.

This task has been migrated to GitLab.

gitlab-migration changed the status of subtask T1227: General improvements of the indexer: Schedule indexer tasks from Resolved to Migrated. Jan 8 2023, 9:58 PM

Indexers: batch content analyzer infrastructure
Closed, Migrated. Edits Locked.