utils: Use https protocol for xmlns of schema.org
AbandonedPublic

Authored by anlambert on Apr 11 2022, 4:21 PM.

Details

Reviewers
None
Group Reviewers
Reviewers
Maniphest Tasks
T3939: Display metadata provenance in the deposit web ui
Summary

schema.org recommends using the https protocol when declaring its xmlns,
but we expect it to be http when parsing raw XML metadata.

Metadata sent by HAL uses the https protocol for schema.org, which is why
we cannot currently parse its provenance.

Related to T3939
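
To illustrate the mismatch described above, here is a minimal sketch using Python's standard xml.etree.ElementTree. It is not the deposit's actual parsing code: the XML template, the schema:author element, and the find_author helper are hypothetical, chosen only to show that an XML namespace is compared as an exact string, so an element declared under https://schema.org/ is not found by a lookup that expects http://schema.org/.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document; only the schema.org namespace URI varies.
TEMPLATE = """<entry xmlns="http://www.w3.org/2005/Atom" xmlns:schema="{ns}">
  <schema:author>HAL</schema:author>
</entry>"""

# Namespace the parser expects (assumption: the http form).
EXPECTED_NS = {"schema": "http://schema.org/"}


def find_author(raw_xml):
    """Return the text of schema:author, or None when it is not found."""
    root = ET.fromstring(raw_xml)
    node = root.find("schema:author", EXPECTED_NS)
    return node.text if node is not None else None


http_doc = TEMPLATE.format(ns="http://schema.org/")
https_doc = TEMPLATE.format(ns="https://schema.org/")

print(find_author(http_doc))   # element found under the expected namespace
print(find_author(https_doc))  # element missing: the namespace URI differs
```

The two documents differ only in the URI scheme, yet the second lookup finds nothing, because the namespace URI is part of the element's qualified name.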

Diff Detail

Repository
rDDEP Push deposit
Branch
schema-org-https
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 28377
Build 44374: Phabricator diff pipeline on Jenkins
Build 44373: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D7547 (id=27363)

Rebasing onto 88ea58cb36...

Current branch diff-target is up to date.
Changes applied before test
commit 197b818a1f73b2fca145def74eb982f43a302a59
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Apr 11 16:20:01 2022 +0200

    utils: Use https protocol for xmlns of schema.org
    
    schema.org recommends to use the https protocol when declaring xmlns
    but we expect it to be http when parsing raw XML metadata.
    
    Metadata sent by HAL use https protocol for schema.org so this is why
    we cannot currently parse their provenance.
    
    Related to T3939

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/838/ for more details.

This is not compatible with CodeMeta, which uses http:// for schema.org; see https://raw.githubusercontent.com/codemeta/codemeta/2.0/codemeta.jsonld

That's because schema.org used to use only http:// instead of https://

Ok so the issue comes from HAL then. They should use http instead of https for schema.org in the XML metadata they send to us.

Abandoning this: http should be used, as https breaks compatibility with CodeMeta.