
Index existing contents (mimetype, language, license)
Status: Closed, Migrated (edits locked)

Description

Leveraging the Azure infrastructure, trigger the indexing of existing contents with the following indexers:

  • mimetype
  • language
  • license

This means:

  • Provision VMs (reuse the VMs created for T712)
  • Deploy the indexer chain, adapting the current setup (swh-site)
  • Send contents for indexing (in progress)
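To illustrate what "chaining" indexers can mean, here is a minimal sketch in which each indexer sees the results of the ones before it, so the language and license indexers can skip content that the mimetype indexer did not classify as text. The function names, the result-dict shape and the text-only rule are all assumptions for illustration; the toy heuristics stand in for the real detectors and are not the Software Heritage implementation.

```python
from typing import Callable, Dict, List

# Hypothetical shapes: an indexer takes raw bytes plus earlier results
Result = Dict[str, object]
Indexer = Callable[[bytes, Result], Result]


def mimetype_indexer(data: bytes, previous: Result) -> Result:
    # Toy heuristic (assumption): content that decodes as UTF-8 is "text"
    try:
        data.decode("utf-8")
        return {"mimetype": "text/plain"}
    except UnicodeDecodeError:
        return {"mimetype": "application/octet-stream"}


def is_text(previous: Result) -> bool:
    return str(previous.get("mimetype", "")).startswith("text/")


def language_indexer(data: bytes, previous: Result) -> Result:
    if not is_text(previous):
        return {}  # skip binary content
    # Toy guess standing in for a real language detector
    is_python = data.startswith(b"#!/usr/bin/env python")
    return {"language": "python" if is_python else "unknown"}


def license_indexer(data: bytes, previous: Result) -> Result:
    if not is_text(previous):
        return {}  # skip binary content
    # Toy check standing in for a real license scanner
    found = b"GNU General Public License" in data
    return {"licenses": ["GPL"] if found else []}


# Order matters: later indexers read the results of earlier ones
CHAIN: List[Indexer] = [mimetype_indexer, language_indexer, license_indexer]


def run_chain(data: bytes, chain: List[Indexer]) -> Result:
    results: Result = {}
    for indexer in chain:
        results.update(indexer(data, results))
    return results


print(run_chain(b"\xff\xfe", CHAIN))
```

Passing accumulated results down the chain keeps each indexer independent while avoiding expensive work on content an earlier stage has already ruled out.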

Note:
At the moment, each indexer reads contents from a multiplexer objstorage: it tries Azure first, falls back to banco if the content is not found there, and finally falls back to uffizi.
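The fallback read order described in the note can be sketched as a storage wrapper that queries each backend in turn and returns the first hit. This is a minimal illustration assuming in-memory backends; the class names `DictObjStorage`, `FallbackObjStorage` and `ObjNotFoundError` are hypothetical and do not reproduce the actual swh objstorage API.

```python
from typing import Dict, Iterable


class ObjNotFoundError(KeyError):
    """Raised when an object is absent from a storage (hypothetical name)."""


class DictObjStorage:
    """In-memory stand-in for one backend (e.g. azure, banco, uffizi)."""

    def __init__(self, name: str, objects: Dict[bytes, bytes]):
        self.name = name
        self._objects = objects

    def get(self, obj_id: bytes) -> bytes:
        try:
            return self._objects[obj_id]
        except KeyError:
            raise ObjNotFoundError(obj_id) from None


class FallbackObjStorage:
    """Read from each storage in order and return the first hit."""

    def __init__(self, storages: Iterable[DictObjStorage]):
        self.storages = list(storages)

    def get(self, obj_id: bytes) -> bytes:
        for storage in self.storages:
            try:
                return storage.get(obj_id)
            except ObjNotFoundError:
                continue  # not found here: fall back to the next storage
        raise ObjNotFoundError(obj_id)


# Same order as the note: azure first, then banco, then uffizi
storage = FallbackObjStorage([
    DictObjStorage("azure", {b"a": b"from azure"}),
    DictObjStorage("banco", {b"a": b"stale copy", b"b": b"from banco"}),
    DictObjStorage("uffizi", {b"c": b"from uffizi"}),
])
print(storage.get(b"b"))  # b'from banco'
```

Putting the cloud copy first keeps most reads off the in-house servers, which are only hit for contents not yet replicated.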


Event Timeline

ardumont changed the task status from Open to Work in Progress. (May 5 2017, 2:36 PM)
zack renamed this task from Indexing existing contents (mimetype, language, license) to Index existing contents (mimetype, language, license). (Oct 6 2017, 3:04 PM)

The first batch is done: 3.7 billion contents [1].

The next batch, covering the remaining 1 billion contents, is currently being computed for scheduling purposes.

[1] https://grafana.softwareheritage.org/d/bPlebbSiz/softwareheritage-indexer?orgId=1