
Add method 'origin_get' endpoint
Closed, Public

Authored by vlorentz on Oct 3 2022, 1:27 PM.

Details

Summary

This will allow getting the content of documents in ElasticSearch
in order to debug T4599.

Diff Detail

Repository
rDSEA Archive search
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 32019
Build 50123: Phabricator diff pipeline on jenkins
Build 50122: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8597 (id=31049)

Rebasing onto 070f015f4b...

Current branch diff-target is up to date.
Changes applied before test
commit e0bafd2ba6522551adcf3ae3164d2fe238f4c8e2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Oct 3 13:27:32 2022 +0200

    Add method 'origin_get' endpoint
    
    This will allow getting the content of documents in ElasticSearch
    in order to debug

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/342/ for more details.

I'm not too fond of the List[str] -> List[Dict[str, Any]] signature, with the "not found" requested documents just disappearing from the result list.

As there are no performance concerns for this debugging interface (and that's what the actual function does anyway), I think I'd prefer passing a single URL to origin_get and getting an Optional[document] back. What do you think?
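The two signatures being compared can be contrasted with a minimal sketch. Everything here is illustrative rather than the code in this diff: the in-memory `_DOCUMENTS` dict is a hypothetical stand-in for the ElasticSearch index, and `origin_get_batch` is an assumed name for the batched variant.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the ElasticSearch index, keyed by origin URL.
_DOCUMENTS: Dict[str, Dict[str, Any]] = {
    "https://example.org/repo1": {
        "url": "https://example.org/repo1",
        "visit_types": ["git"],
    },
}


def origin_get_batch(urls: List[str]) -> List[Dict[str, Any]]:
    """Batched variant: URLs that are not found silently drop out of the result."""
    return [_DOCUMENTS[url] for url in urls if url in _DOCUMENTS]


def origin_get(url: str) -> Optional[Dict[str, Any]]:
    """Single-URL variant: a missing document is reported as an explicit None."""
    return _DOCUMENTS.get(url)
```

With the batched form, a caller asking for two URLs may get one document back and has to match on the `url` field to work out which request was not found. The single-URL form makes that case visible at the call site.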

swh/search/tests/test_search.py, line 1330

spurious print

yeah, fair. Initially I made it batched to use the async API to be efficient, but asyncio annoyed me so I gave up

Build has FAILED

Patch application report for D8597 (id=31196)

Rebasing onto 070f015f4b...

Current branch diff-target is up to date.
Changes applied before test
commit 1eb2dec4e694cae2270ff23179eac9d0a0275d9e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Oct 3 13:27:32 2022 +0200

    Add method 'origin_get' endpoint
    
    This will allow getting the content of documents in ElasticSearch
    in order to debug

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/343/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/343/console

Build is green

Patch application report for D8597 (id=31199)

Rebasing onto 070f015f4b...

Current branch diff-target is up to date.
Changes applied before test
commit 033a805f056a3d132061f4f984059ee4f71600b0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Oct 3 13:27:32 2022 +0200

    Add method 'origin_get' endpoint
    
    This will allow getting the content of documents in ElasticSearch
    in order to debug

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/344/ for more details.

Looks fine except for the return type which I believe is too narrow?

swh/search/elasticsearch.py, line 389

I believe there's more than strs in the values of these returned dicts (e.g. I think visit_types is a list).
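A small sketch of why a string-only value type would not fit such a document. The document below is made up: only the list-valued `visit_types` field comes from the comment above, and `non_str_fields` is a hypothetical helper, not part of swh-search.

```python
from typing import Any, Dict, List

# Made-up document; only the list-valued visit_types is taken from the review comment.
document: Dict[str, Any] = {
    "url": "https://example.org/repo1",
    "visit_types": ["git", "hg"],
}


def non_str_fields(doc: Dict[str, Any]) -> List[str]:
    """Return the keys whose values are not plain strings, in sorted order."""
    return sorted(key for key, value in doc.items() if not isinstance(value, str))
```

Any key reported by `non_str_fields` would be mistyped under a `Dict[str, str]` annotation, which is why `Dict[str, Any]` is the safer return type for raw documents.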

This revision is now accepted and ready to land. Oct 7 2022, 12:52 PM

Build is green

Patch application report for D8597 (id=31205)

Rebasing onto 070f015f4b...

Current branch diff-target is up to date.
Changes applied before test
commit 370b3f1b5539be63914b0a12e431d707eb35f5ed
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Oct 3 13:27:32 2022 +0200

    Add method 'origin_get' endpoint
    
    This will allow getting the content of documents in ElasticSearch
    in order to debug

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/345/ for more details.

This revision was automatically updated to reflect the committed changes.