
scheduler: Update schema to reference origin information
Open, Normal, Public

Description

This will improve:

  • swh-loader*: a posteriori analysis of loading failures (no longer any need to query on the task's arguments, whose shape depends entirely on the loader's nature)
  • webapp: the "save code now" feature, where we could provide more information on the visits' status (at the moment, nothing is provided)
  • swh-indexer: the metadata indexers could use it to check a visit's status (this could also be implemented in other ways but hey ;)

This calls for:

  • sql:
    • add a new column origin_id on table task (not null)
    • add new columns origin_id, visit_id on table task_run (not null for both)
    • create independent indexes on those new columns
  • loader-core: update the load_status method to provide the origin_id and visit_id as results (alongside the current eventful/uneventful status)
  • listers: update them so that, when scheduling a task, they provide the origin_id (T1471 could ease that step :)
  • migration: after the schema migration, we need a routine that actually fills in the gaps in the existing task/task_run rows.
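
The sql and migration steps above could be sketched as follows. This is only an illustration under several assumptions: it uses an in-memory SQLite database rather than the scheduler's real PostgreSQL schema, the task and task_run tables are reduced to a few stand-in columns, and resolve_origin_id is a hypothetical callable (nothing in this task says how an origin id is looked up from old task arguments). The columns are added as nullable first, since existing rows have no value yet; the "not null" constraint would only be enforceable once the backfill leaves no gaps. Backfilling visit_id is left out because the task does not say where it would come from.

```python
import json
import sqlite3


def migrate(conn):
    """Add the new columns and one independent index per column."""
    # Nullable for now: pre-existing rows cannot satisfy NOT NULL yet.
    conn.execute("ALTER TABLE task ADD COLUMN origin_id INTEGER")
    conn.execute("ALTER TABLE task_run ADD COLUMN origin_id INTEGER")
    conn.execute("ALTER TABLE task_run ADD COLUMN visit_id INTEGER")
    conn.execute("CREATE INDEX task_origin_id_idx ON task (origin_id)")
    conn.execute("CREATE INDEX task_run_origin_id_idx ON task_run (origin_id)")
    conn.execute("CREATE INDEX task_run_visit_id_idx ON task_run (visit_id)")


def backfill(conn, resolve_origin_id):
    """Fill origin_id on existing rows; return how many tasks are still unfilled.

    resolve_origin_id is a hypothetical callable: it takes the decoded
    task arguments and returns an origin id, or None when unknown.
    """
    rows = conn.execute(
        "SELECT id, arguments FROM task WHERE origin_id IS NULL"
    ).fetchall()
    for task_id, arguments in rows:
        origin_id = resolve_origin_id(json.loads(arguments))
        if origin_id is not None:
            conn.execute(
                "UPDATE task SET origin_id = ? WHERE id = ?",
                (origin_id, task_id),
            )
    # Each task_run inherits the origin of the task it belongs to.
    conn.execute(
        """
        UPDATE task_run
        SET origin_id = (SELECT origin_id FROM task WHERE task.id = task_run.task)
        WHERE origin_id IS NULL
        """
    )
    return conn.execute(
        "SELECT count(*) FROM task WHERE origin_id IS NULL"
    ).fetchone()[0]


def demo():
    """Build a simplified stand-in schema, then migrate and backfill it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task (id INTEGER PRIMARY KEY, type TEXT, arguments TEXT)"
    )
    conn.execute(
        "CREATE TABLE task_run ("
        "id INTEGER PRIMARY KEY, task INTEGER REFERENCES task(id), status TEXT)"
    )
    arguments = json.dumps({"kwargs": {"url": "https://example.org/repo.git"}})
    conn.execute(
        "INSERT INTO task (id, type, arguments) VALUES (1, 'load-git', ?)",
        (arguments,),
    )
    conn.execute("INSERT INTO task_run (id, task, status) VALUES (1, 1, 'eventful')")

    migrate(conn)
    # Made-up mapping standing in for a real origin lookup.
    known_origins = {"https://example.org/repo.git": 42}
    remaining = backfill(
        conn, lambda args: known_origins.get(args.get("kwargs", {}).get("url"))
    )
    return conn, remaining
```

Calling demo() returns the connection and the number of tasks still lacking an origin_id; a non-zero count would mean the not-null constraint cannot be applied yet.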

Event Timeline

ardumont triaged this task as Normal priority. Jan 15 2019, 2:36 PM
ardumont created this task.