Index existing contents (mimetype, language, license)
Started, Work in Progress, High, Public

Description

Using the Azure infrastructure, trigger the indexing of contents with the following indexers:

  • mimetype
  • language
  • license

This means:

  • Provisioning VMs (reusing the VMs created for T712)
  • Deploying the indexer chain, adapting the current setup (swh-site)
  • Sending contents for indexing (in progress)

Note:
At the moment, each indexer reads content from a multiplexer objstorage: it tries azure first, falls back to banco if the content is not found there, and finally falls back to uffizi.
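
The read order described in the note can be illustrated with a minimal sketch. This is not the swh-objstorage API: `InMemoryBackend`, `FallbackReader` and `build_demo_reader` are hypothetical names, and the backends are plain in-memory dictionaries standing in for azure, banco and uffizi.

```python
from typing import Dict, List, Optional, Tuple


class InMemoryBackend:
    """Hypothetical stand-in for one object storage (not the swh API)."""

    def __init__(self, name: str, objects: Dict[str, bytes]):
        self.name = name
        self._objects = objects

    def get(self, obj_id: str) -> Optional[bytes]:
        # Return the content, or None when this backend does not hold it.
        return self._objects.get(obj_id)


class FallbackReader:
    """Try each backend in order and return the first hit."""

    def __init__(self, backends: List[InMemoryBackend]):
        self.backends = backends

    def get(self, obj_id: str) -> Tuple[str, bytes]:
        for backend in self.backends:
            data = backend.get(obj_id)
            if data is not None:
                # Report which backend served the read, plus the content.
                return backend.name, data
        # No backend holds the object.
        raise KeyError(obj_id)


def build_demo_reader() -> FallbackReader:
    # Assumed demo data; the order mirrors the note: azure, banco, uffizi.
    azure = InMemoryBackend("azure", {"a": b"from azure"})
    banco = InMemoryBackend("banco", {"a": b"other copy", "b": b"from banco"})
    uffizi = InMemoryBackend("uffizi", {"c": b"from uffizi"})
    return FallbackReader([azure, banco, uffizi])
```

With this ordering, an object present in several backends is always served by the earliest one in the list, so the later backends are only hit for contents missing from azure.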

Event Timeline

ardumont updated the task description. Apr 26 2017, 2:09 PM
ardumont updated the task description. May 2 2017, 4:02 PM
ardumont updated the task description. May 2 2017, 5:44 PM
ardumont changed the task status from Open to Work in Progress. May 5 2017, 2:36 PM
ardumont updated the task description. May 5 2017, 2:43 PM
ardumont updated the task description. Sep 27 2017, 5:03 PM
zack renamed this task from Indexing existing contents (mimetype, language, license) to Index existing contents (mimetype, language, license). Oct 6 2017, 3:04 PM
This comment was removed by zack.
zack added a subscriber: zack. Dec 7 2017, 8:34 AM

The first batch is done: 3.7 billion contents indexed [1].

The next batch, covering the remaining 1 billion, is currently being computed (for scheduling purposes).

[1] https://grafana.softwareheritage.org/d/bPlebbSiz/softwareheritage-indexer?orgId=1

ardumont updated the task description. Sep 7 2018, 11:55 PM