Always output valid JSON-LD.
Closed, Public

Authored by vlorentz on Oct 29 2018, 3:24 PM.

Details

Summary

Our metadata extraction used to export a nested dict structure
loosely based on JSON-LD with schema.org/CodeMeta.

This diff makes it output proper JSON-LD data.
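As a rough illustration of the difference (not the code from this diff), a plain dict of extracted fields becomes a JSON-LD document once it carries an `@context` and a type. The helper name `wrap_as_jsonld`, the context URL, and the sample field below are assumptions made for this sketch:

```python
import json

# Assumed CodeMeta 2.0 context URL; the value used by the indexer may differ.
CODEMETA_CONTEXT_URL = "https://doi.org/10.5063/schema/codemeta-2.0"


def wrap_as_jsonld(fields):
    """Return a JSON-LD document built from a flat dict of extracted fields.

    Hypothetical helper: it only adds the "@context" and "@type" keys that
    let a JSON-LD processor interpret the remaining keys.
    """
    doc = {
        "@context": CODEMETA_CONTEXT_URL,
        "@type": "SoftwareSourceCode",
    }
    doc.update(fields)
    return doc


# Example usage with a made-up field.
document = wrap_as_jsonld({"name": "example-package"})
serialized = json.dumps(document, sort_keys=True)
```

Without the `@context` key, the same dict is just JSON whose keys have no defined meaning to a JSON-LD processor.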

Note that because of some issues (e.g.,
https://github.com/codemeta/codemeta/issues/198 and the impossibility of
representing a software entity without a specific version), I removed
softwareRequirements and softwareSuggestions from the output.

As a side effect, it resolves T1289.

PS: Morane, don't review this diff today, it can wait until tomorrow :)

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
output-jsonld
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 2040
Build 2478: tox-on-jenkins (Jenkins)
Build 2477: arc lint + arc unit

Event Timeline

  • Simplify the normalization logic.
zack added inline comments.
swh/indexer/codemeta.py
24

This URI is here as a constant (good!) but its value is repeated in many places in the rest of this diff (bad :-)), violating DRY. It would be nice to remove other occurrences of this verbatim URI, e.g., using dictionary comprehension or string concatenation referencing this constant.
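A minimal sketch of the kind of deduplication suggested here, assuming a base-URI constant and a handful of term names; `CODEMETA_URI`, its value, and the term list are illustrative and may not match what swh/indexer/codemeta.py actually defines:

```python
# Assumed base URI, defined once (hypothetical name and value).
CODEMETA_URI = "https://codemeta.github.io/terms/"

# Illustrative term names; the real module may cover a different set.
CODEMETA_TERM_NAMES = ["maintainer", "softwareRequirements", "softwareSuggestions"]

# Dictionary comprehension: each full URI is derived from the constant
# by string concatenation, so the base URI is never repeated verbatim.
CODEMETA_TERMS = {name: CODEMETA_URI + name for name in CODEMETA_TERM_NAMES}
```

With this layout, changing the base URI means editing a single line, and every derived term URI follows.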

  • Deduplicate Codemeta's base URI.
  • Add CITATION for codemeta.jsonld.

Accepting because I accepted D620

This revision is now accepted and ready to land.
Nov 5 2018, 5:31 PM
This revision was automatically updated to reflect the committed changes.