This will allow getting the content of documents in ElasticSearch
in order to debug T4599.
Details
- Reviewers: olasd
- Group Reviewers: Reviewers
- Maniphest Tasks: T4599: Github descriptions are not used to search origins
- Commits: rDSEA370b3f1b5539: Add method 'origin_get' endpoint
Diff Detail
- Repository: rDSEA Archive search
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build is green
Patch application report for D8597 (id=31049)
Rebasing onto 070f015f4b...
Current branch diff-target is up to date.
Changes applied before test
commit e0bafd2ba6522551adcf3ae3164d2fe238f4c8e2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Oct 3 13:27:32 2022 +0200
    Add method 'origin_get' endpoint
    This will allow getting the content of documents in ElasticSearch in order to debug
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/342/ for more details.
I'm not too fond of the List[str] -> List[Dict[str, Any]] signature, where requested documents that are not found just disappear from the result list.
As there are no performance concerns for this debugging interface (and that's what the actual function does anyway), I think I'd prefer passing a single url to origin_get and getting an Optional[document] back. What do you think?
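To make the two signatures under discussion concrete, here is a minimal sketch. It is not the code from the diff: the in-memory `_DOCUMENTS` dict, the name `origin_get_batched`, and the example URLs are all hypothetical stand-ins for the ElasticSearch index and the real method.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the ElasticSearch index, keyed by origin URL.
_DOCUMENTS: Dict[str, Dict[str, Any]] = {
    "https://example.org/repo-a": {
        "url": "https://example.org/repo-a",
        "visit_types": ["git"],
    },
}


def origin_get_batched(urls: List[str]) -> List[Dict[str, Any]]:
    """Batched variant: URLs with no document silently drop out of the result."""
    return [_DOCUMENTS[url] for url in urls if url in _DOCUMENTS]


def origin_get(url: str) -> Optional[Dict[str, Any]]:
    """Single-URL variant: a missing document is reported explicitly as None."""
    return _DOCUMENTS.get(url)
```

With the batched form, a caller asking for two URLs and getting one document back has to match results to requests by hand; with the single-URL form, `None` makes the "not found" case explicit.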
| swh/search/tests/test_search.py | |
|---|---|
| 1329 | spurious print |
Yeah, fair. Initially I made it batched so it could use the async API for efficiency, but asyncio annoyed me, so I gave up.
Build has FAILED
Patch application report for D8597 (id=31196)
Rebasing onto 070f015f4b...
Current branch diff-target is up to date.
Changes applied before test
commit 1eb2dec4e694cae2270ff23179eac9d0a0275d9e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Oct 3 13:27:32 2022 +0200
    Add method 'origin_get' endpoint
    This will allow getting the content of documents in ElasticSearch in order to debug
Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/343/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/343/console
Build is green
Patch application report for D8597 (id=31199)
Rebasing onto 070f015f4b...
Current branch diff-target is up to date.
Changes applied before test
commit 033a805f056a3d132061f4f984059ee4f71600b0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Oct 3 13:27:32 2022 +0200
    Add method 'origin_get' endpoint
    This will allow getting the content of documents in ElasticSearch in order to debug
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/344/ for more details.
Looks fine except for the return type which I believe is too narrow?
| swh/search/elasticsearch.py | |
|---|---|
| 389 | I believe there's more than strs in the values of these returned dicts (e.g. I think visit_types is a list). |
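The concern about the return type can be illustrated with a small sketch. The document shape below is an assumption based on the reviewer's remark that visit_types is probably a list; the helper `non_str_fields` is hypothetical and not part of swh-search.

```python
from typing import Any, Dict, List


def non_str_fields(document: Dict[str, Any]) -> List[str]:
    """Return the keys whose values would violate a Dict[str, str] annotation."""
    return sorted(
        key for key, value in document.items() if not isinstance(value, str)
    )


# Assumed example document; real documents may carry other fields.
document: Dict[str, Any] = {
    "url": "https://example.org/repo-a",
    "visit_types": ["git"],
}
```

Any key reported by `non_str_fields` means the value type needs to be wider than `str`, for instance `Dict[str, Any]`.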
Build is green
Patch application report for D8597 (id=31205)
Rebasing onto 070f015f4b...
Current branch diff-target is up to date.
Changes applied before test
commit 370b3f1b5539be63914b0a12e431d707eb35f5ed
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Oct 3 13:27:32 2022 +0200
    Add method 'origin_get' endpoint
    This will allow getting the content of documents in ElasticSearch in order to debug
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/345/ for more details.