
Normalize Codemeta documents by expanding them.
Closed, Public

Authored by vlorentz on Dec 11 2020, 1:44 PM.

Details

Summary

This causes all fields to be an array of objects, instead of being
a mix of concrete values, arrays, and objects.

ElasticSearch requires this consistency to be able to dynamically
create mappings.
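
To illustrate the shape change described above, here is a minimal sketch. It is not the expansion used by this diff: `toy_expand` and `toy_expand_document` are hypothetical helpers, and the sample documents are made up. Real JSON-LD expansion also rewrites keys into full IRIs and treats keywords such as `@id` and `@type` specially; the toy version only shows how every field ends up as an array of objects.

```python
from typing import Any


def toy_expand(value: Any) -> list:
    """Toy stand-in for expansion: always return a list of objects."""
    if isinstance(value, list):
        # Flatten each item into the same list-of-objects shape.
        out = []
        for item in value:
            out.extend(toy_expand(item))
        return out
    if isinstance(value, dict):
        # Objects are kept, but each of their fields is normalized too.
        return [{key: toy_expand(val) for key, val in value.items()}]
    # Concrete values (str, int, ...) are wrapped in a value object.
    return [{"@value": value}]


def toy_expand_document(doc: dict) -> dict:
    """Normalize every top-level field of a document."""
    return {key: toy_expand(val) for key, val in doc.items()}


# Two made-up documents that use different shapes for the same fields.
doc_a = {"name": "swh-search", "author": {"name": "Jane"}}
doc_b = {
    "name": ["swh-search", "search"],
    "author": [{"name": "Jane"}, {"name": "John"}],
}

expanded_a = toy_expand_document(doc_a)
expanded_b = toy_expand_document(doc_b)
```

Before normalization, `name` is a string in one document and an array in the other, and `author` is an object in one and an array of objects in the other. After normalization both documents use an array of objects for every field, which is the consistency a dynamically created mapping relies on.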

Resolves T2876.

The FIXME will be resolved by D4723.

Diff Detail

Repository
rDSEA Archive search
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 17889
Build 27639: Phabricator diff pipeline on jenkins
Build 27638: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D4722 (id=16724)

Rebasing onto c7cd341796...

Current branch diff-target is up to date.
Changes applied before test
commit 4390bea8dbd89016ba87010d785ed55e16d6038c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Dec 11 13:43:20 2020 +0100

    Normalize Codemeta documents by expanding them.
    
    This causes all fields to be an array of objects, instead of being
    a mix of concrete values, arrays, and objects.
    
    ElasticSearch requires this consistency to be able to dynamically
    create mappings.

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/56/ for more details.

ardumont added a subscriber: ardumont.
ardumont added inline comments.
swh/search/elasticsearch.py, line 20

What about:

from swh.indexer.codemeta import expand as normalize_intrinsic_metadata

?

This revision is now accepted and ready to land. Dec 11 2020, 2:07 PM
swh/search/elasticsearch.py, line 32

I find the following easier on the eyes:

intrinsic_metadata = res.get("intrinsic_metadata")
if intrinsic_metadata:
    res["intrinsic_metadata"] = normalize_intrinsic_metadata(intrinsic_metadata)

What do you think?
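
For reference, a self-contained sketch of how the suggested guard behaves. The `normalize_intrinsic_metadata` below is a hypothetical stand-in (the real one would be the alias imported from `swh.indexer.codemeta`), and `prepare_document` is an assumed wrapper name, not a function from this diff.

```python
def normalize_intrinsic_metadata(metadata: dict) -> dict:
    # Hypothetical stand-in for the real expansion function; it only
    # tags the input so the effect of the guard is visible.
    return {"normalized": metadata}


def prepare_document(res: dict) -> dict:
    # Assumed wrapper: copy the document, then apply the suggested guard.
    res = dict(res)
    intrinsic_metadata = res.get("intrinsic_metadata")
    if intrinsic_metadata:
        res["intrinsic_metadata"] = normalize_intrinsic_metadata(intrinsic_metadata)
    return res


without_metadata = prepare_document({"url": "https://example.org/repo"})
empty_metadata = prepare_document({"intrinsic_metadata": {}})
with_metadata = prepare_document({"intrinsic_metadata": {"name": "swh-search"}})
```

With this form, a document with no `intrinsic_metadata` key, or with an empty one, is passed through unchanged, and the normalizer is only called when there is something to normalize.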

Build is green

Patch application report for D4722 (id=16726)

Rebasing onto c7cd341796...

Current branch diff-target is up to date.
Changes applied before test
commit 407a00e8aa5a69350689a4a02609bf79a0e75ff3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Dec 11 13:43:20 2020 +0100

    Normalize Codemeta documents by expanding them.
    
    This causes all fields to be an array of objects, instead of being
    a mix of concrete values, arrays, and objects.
    
    ElasticSearch requires this consistency to be able to dynamically
    create mappings.

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/58/ for more details.