
Normalize Codemeta documents by expanding them.
Closed, Public

Authored by vlorentz on Dec 11 2020, 1:44 PM.

Details

Summary

This causes all fields to be an array of objects, instead of being
a mix of concrete values, arrays, and objects.

Elasticsearch requires this consistency to be able to dynamically
create mappings.
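
To illustrate the shape change described above, here is a toy sketch. It is not `swh.indexer.codemeta.expand` and not a real JSON-LD expansion: it ignores contexts and keywords such as `@id` and `@type`, and `toy_expand` and both sample documents are made up for this example. It only shows how two documents that spell the same fields differently end up with one consistent shape, so that a dynamically created mapping sees the same structure every time.

```python
from typing import Any


def to_list_of_objects(value: Any) -> list:
    """Wrap a scalar, an object, or a list into a list of objects."""
    items = value if isinstance(value, list) else [value]
    result = []
    for item in items:
        if isinstance(item, dict):
            # Recurse so that nested fields get the same treatment.
            result.append(
                {key: to_list_of_objects(val) for key, val in item.items()}
            )
        else:
            # Scalars are wrapped in a value object, in the style of JSON-LD.
            result.append({"@value": item})
    return result


def toy_expand(doc: dict) -> list:
    """Hypothetical stand-in for expansion; illustration only."""
    return to_list_of_objects(doc)


# Same fields, different spellings: scalar vs. list, object vs. list of objects.
doc_a = {"name": "swh-search", "author": {"name": "Jane"}}
doc_b = {"name": ["swh-search"], "author": [{"name": "Jane"}, {"name": "John"}]}

expanded_a = toy_expand(doc_a)
expanded_b = toy_expand(doc_b)
```

After this step, `name` and `author` are lists of objects in both documents, whereas before `name` was a string in one and a list in the other.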

Resolves T2876.

The FIXME will be resolved by D4723.

Diff Detail

Repository
rDSEA Archive search
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D4722 (id=16724)

Rebasing onto c7cd341796...

Current branch diff-target is up to date.
Changes applied before test
commit 4390bea8dbd89016ba87010d785ed55e16d6038c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Dec 11 13:43:20 2020 +0100

    Normalize Codemeta documents by expanding them.
    
    This causes all fields to be an array of objects, instead of being
    a mix of concrete values, arrays, and objects.
    
    ElasticSearch requires this consistency to be able to dynamically
    create mappings.

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/56/ for more details.

ardumont added a subscriber: ardumont.
ardumont added inline comments.
swh/search/elasticsearch.py, line 19

What about:

from swh.indexer.codemeta import expand as normalize_intrinsic_metadata

?

This revision is now accepted and ready to land. Dec 11 2020, 2:07 PM
swh/search/elasticsearch.py, line 27

I find the following easier on the eyes.

intrinsic_metadata = res.get("intrinsic_metadata")
if intrinsic_metadata:
    res["intrinsic_metadata"] = normalize_intrinsic_metadata(intrinsic_metadata)

What do you think?
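
For readers who want to try the pattern suggested above outside the code base, here is a minimal self-contained sketch. It makes two assumptions: `prepare` is a hypothetical wrapper invented for this example, and `normalize_intrinsic_metadata` is a stub that only tags its input, standing in for the real expand function. The point is the control flow: documents with missing or empty `intrinsic_metadata` pass through unchanged.

```python
def normalize_intrinsic_metadata(metadata):
    # Stub for illustration only; the real function expands the document.
    return [{"expanded": metadata}]


def prepare(document: dict) -> dict:
    """Hypothetical wrapper around the suggested snippet."""
    res = dict(document)  # shallow copy, so the caller's dict is not mutated
    intrinsic_metadata = res.get("intrinsic_metadata")
    if intrinsic_metadata:
        res["intrinsic_metadata"] = normalize_intrinsic_metadata(intrinsic_metadata)
    return res


without_metadata = prepare({"url": "https://example.org/repo"})
with_empty = prepare({"url": "https://example.org/repo", "intrinsic_metadata": {}})
with_metadata = prepare(
    {"url": "https://example.org/repo", "intrinsic_metadata": {"name": "demo"}}
)
```

Because the check relies on truthiness, both `None` and an empty dict skip normalization.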

Build is green

Patch application report for D4722 (id=16726)

Rebasing onto c7cd341796...

Current branch diff-target is up to date.
Changes applied before test
commit 407a00e8aa5a69350689a4a02609bf79a0e75ff3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Dec 11 13:43:20 2020 +0100

    Normalize Codemeta documents by expanding them.
    
    This causes all fields to be an array of objects, instead of being
    a mix of concrete values, arrays, and objects.
    
    ElasticSearch requires this consistency to be able to dynamically
    create mappings.

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/58/ for more details.