Add method 'origin_get' endpoint
Closed, Public

Authored by vlorentz on Oct 3 2022, 1:27 PM.

Details

Summary

This will allow getting the content of documents in ElasticSearch
in order to debug T4599.

Diff Detail

Repository
rDSEA Archive search
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8597 (id=31049)

Rebasing onto 070f015f4b...

Current branch diff-target is up to date.
Changes applied before test
commit e0bafd2ba6522551adcf3ae3164d2fe238f4c8e2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Oct 3 13:27:32 2022 +0200

    Add method 'origin_get' endpoint
    
    This will allow getting the content of documents in ElasticSearch
    in order to debug

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/342/ for more details.

I'm not too fond of the List[str] -> List[Dict[str, Any]] signature, with the "not found" requested documents just disappearing from the result list.

As there are no performance concerns for this debugging interface (and that's what the actual function does anyway), I think I'd prefer passing a single URL to origin_get and getting an Optional[document] back. What do you think?
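To make the difference between the two signatures concrete, here is a minimal sketch. It uses a hypothetical in-memory dict (`_DOCUMENTS`) in place of the ElasticSearch index, and `origin_get_batched` is an illustrative name only; neither is taken from the diff itself.

```python
from typing import Any, Dict, List, Optional

# Hypothetical in-memory stand-in for the search index, keyed by origin URL.
_DOCUMENTS: Dict[str, Dict[str, Any]] = {
    "https://example.org/repo-a": {
        "url": "https://example.org/repo-a",
        "visit_types": ["git"],
    },
}


def origin_get_batched(urls: List[str]) -> List[Dict[str, Any]]:
    """Batched variant: URLs with no document are silently dropped."""
    return [_DOCUMENTS[url] for url in urls if url in _DOCUMENTS]


def origin_get(url: str) -> Optional[Dict[str, Any]]:
    """Single-URL variant: a missing document is reported explicitly as None."""
    return _DOCUMENTS.get(url)


# Two URLs requested, one document returned: the result list alone
# does not say which URL was missing.
found = origin_get_batched(
    ["https://example.org/missing", "https://example.org/repo-a"]
)

# The single-URL variant makes the "not found" case visible to the caller.
missing = origin_get("https://example.org/missing")
```

With the batched form, the caller has to re-match results against the requested URLs to detect a miss; with the single-URL form, a `None` check is enough.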

swh/search/tests/test_search.py, line 1329

spurious print

yeah, fair. Initially I made it batched to use the async API to be efficient, but asyncio annoyed me so I gave up

Build has FAILED

Patch application report for D8597 (id=31196)

Rebasing onto 070f015f4b...

Current branch diff-target is up to date.
Changes applied before test
commit 1eb2dec4e694cae2270ff23179eac9d0a0275d9e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Oct 3 13:27:32 2022 +0200

    Add method 'origin_get' endpoint
    
    This will allow getting the content of documents in ElasticSearch
    in order to debug

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/343/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/343/console

Build is green

Patch application report for D8597 (id=31199)

Rebasing onto 070f015f4b...

Current branch diff-target is up to date.
Changes applied before test
commit 033a805f056a3d132061f4f984059ee4f71600b0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Oct 3 13:27:32 2022 +0200

    Add method 'origin_get' endpoint
    
    This will allow getting the content of documents in ElasticSearch
    in order to debug

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/344/ for more details.

Looks fine except for the return type which I believe is too narrow?

swh/search/elasticsearch.py, line 389

I believe there's more than strs in the values of these returned dicts (e.g. I think visit_types is a list).
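A small sketch of why a `Dict[str, str]` return annotation would be too narrow. The document below is hypothetical; only the point that `visit_types` may be a list comes from the comment above, and `non_str_fields` is a helper invented for illustration.

```python
from typing import Any, Dict

# Hypothetical document; the field values are made up for illustration.
document: Dict[str, Any] = {
    "url": "https://example.org/repo-a",
    "visit_types": ["git", "hg"],
}


def non_str_fields(doc: Dict[str, Any]) -> Dict[str, str]:
    """Map each field whose value is not a str to the name of its actual type."""
    return {
        key: type(value).__name__
        for key, value in doc.items()
        if not isinstance(value, str)
    }


# Any field listed here would violate a Dict[str, str] annotation.
offending = non_str_fields(document)
```

Annotating the return value as `Optional[Dict[str, Any]]` would cover such fields, at the cost of giving callers less type information.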

This revision is now accepted and ready to land. Oct 7 2022, 12:52 PM

Build is green

Patch application report for D8597 (id=31205)

Rebasing onto 070f015f4b...

Current branch diff-target is up to date.
Changes applied before test
commit 370b3f1b5539be63914b0a12e431d707eb35f5ed
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Oct 3 13:27:32 2022 +0200

    Add method 'origin_get' endpoint
    
    This will allow getting the content of documents in ElasticSearch
    in order to debug

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/345/ for more details.

This revision was automatically updated to reflect the committed changes.