
Refactor swh-indexer to simplify non-trivial mapping operations
Closed, Migrated, Edits Locked

Description

I am considering switching swh-indexer's mappings to use rdflib.Graph as the internal representation instead of Python objects based on JSON-LD.

I expect this to make it easier to work with namespaces and various types: for example, the {"@id": s} we have everywhere becomes URIRef(s), whose value is checked for validity immediately instead of crashing later while compacting/expanding. It would also rely less on PyLD, so https://forge.softwareheritage.org/T4436 would no longer be an issue.

There are some issues with this:

  • The PubSpec mapping relies on having two authorship statements whose value is a list, which JSON-LD compaction does not preserve when working from fully expanded JSON-LD. It worked so far because we only ran the compaction algorithm on a somewhat-compacted form. See https://github.com/w3c/json-ld-api/issues/547
  • Lists are built into JSON-LD, but in RDF they have to be represented as linked lists (chains of nodes in the graph), which is clunky to work with, so I need to design a better API than the naive solution here.

Event Timeline

vlorentz created this task.