
Write specs about metadata workflow
Closed, Migrated (Edits Locked)


(opening tasks for myself on my last day might not be the best idea, but this should be written somewhere)

The metadata workflow and strategy are about recovering descriptive metadata about the artifacts in the archive.
This metadata can be found:

  • in the content itself -> intrinsic metadata (implemented with T1232)
  • not in the content -> extrinsic metadata
    • extrinsic metadata can be found alongside the content when listing or loading it
    • or in a software registry (e.g. Wikidata, swMath, ASCL)
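
The classification above could be modelled roughly as follows. This is only an illustrative sketch, not the actual Software Heritage storage schema: `MetadataKind`, `MetadataSource`, `MetadataRecord`, `kind_of` and all field names are hypothetical and were made up for this example.

```python
from dataclasses import dataclass
from enum import Enum


class MetadataKind(Enum):
    INTRINSIC = "intrinsic"  # found in the content itself
    EXTRINSIC = "extrinsic"  # found outside the content


class MetadataSource(Enum):
    # Hypothetical names for the places listed in this task.
    CONTENT = "content"    # e.g. a metadata file shipped inside the artifact
    LISTER = "lister"      # collected while listing an origin
    LOADER = "loader"      # collected while loading an origin
    REGISTRY = "registry"  # e.g. Wikidata, swMath, ASCL


def kind_of(source: MetadataSource) -> MetadataKind:
    """Only metadata read from the content itself counts as intrinsic."""
    if source is MetadataSource.CONTENT:
        return MetadataKind.INTRINSIC
    return MetadataKind.EXTRINSIC


@dataclass(frozen=True)
class MetadataRecord:
    # All fields are assumptions made for illustration only.
    artifact_id: str        # identifier of the archived artifact
    source: MetadataSource  # where the metadata was found
    provider: str           # who or what supplied it
    payload: dict           # the descriptive metadata itself

    @property
    def kind(self) -> MetadataKind:
        return kind_of(self.source)


record = MetadataRecord(
    artifact_id="example-artifact",
    source=MetadataSource.REGISTRY,
    provider="example-registry",
    payload={"name": "example"},
)
print(record.kind.value)
```

Deriving the kind from the source, rather than storing it, keeps the two from contradicting each other; whether the real storage should work this way is exactly what the spec would need to decide.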

The different components and the storage infrastructure that was put in place to keep this information
should be specified and documented.

A discussion started over the metadata_provider in D637.

Event Timeline

moranegg triaged this task as Normal priority. Nov 14 2018, 4:03 PM
moranegg created this task.

Once this task is handled, an improved docstring for Storage.metadata_provider_add would be necessary.

Where would you put this type of specs?

Either in the docs (like the persistent identifiers spec) or on the wiki (like the snapshot spec).

What about writing it on the wiki, and moving it to the docs when it's finished?

EDIT: never mind, it's discussed in T1683

moranegg added a parent task: Unknown Object (Maniphest Task). Jun 14 2019, 12:33 PM

It's still missing "Define an architecture to fetch extrinsic metadata outside listers and loaders",

but this is an overly broad task, so indeed, let's close it so it doesn't clutter dashboards.