Improve the Elasticsearch HTTP API GET query to retrieve only new or updated origins since the last lister execution.
Related to T2833
Details
Diff Detail
- Repository: rDLS Listers
- Branch: cpan-incremental
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 32756
  - Build 51321: Phabricator diff pipeline on jenkins
  - Build 51320: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D8824 (id=31816)
Rebasing onto e1f3f87c73...
Current branch diff-target is up to date.
Changes applied before test
commit f71934515462eedddc510a8b39bba7ae6a3fc97e
Author: Franck Bret <franck.bret@octobus.net>
Date: Wed Nov 9 15:37:16 2022 +0100

    Cpan: Implement incremental mode

    Improve the Elastic Search, http api get query to retrieve only new or
    updated origins since the last lister execution.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/855/ for more details.
@franckbret, as explained in my inline comment, we cannot use date filtering on the release index of CPAN elasticsearch.
The only incremental mode we can implement here is to filter the ListedOrigin instances sent to the scheduler according to their
last_update value: if it is greater than the date from the lister state, we can yield the origin.
Nevertheless, I am not sure if it is worth it as a full listing takes around 10 minutes, which is pretty fast.
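A minimal sketch of the filtering described above, not the lister's actual code. The `Origin` dataclass is a hypothetical stand-in for `ListedOrigin`, and `filter_updated` and `last_listing_date` are illustrative names. It also assumes that an origin without a `last_update` value should still be sent, which is a conservative choice and not something stated in this review.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Origin:
    """Hypothetical stand-in for a ListedOrigin (only the fields used here)."""

    url: str
    last_update: Optional[datetime]


def filter_updated(
    origins: Iterable[Origin], last_listing_date: Optional[datetime]
) -> Iterator[Origin]:
    """Yield only origins updated after the date stored in the lister state.

    With no stored date (first run) every origin is yielded. Origins with
    an unknown last_update are yielded too (assumption: better to send
    them again than to miss an update).
    """
    for origin in origins:
        if (
            last_listing_date is None
            or origin.last_update is None
            or origin.last_update > last_listing_date
        ):
            yield origin


if __name__ == "__main__":
    since = datetime(2022, 11, 1, tzinfo=timezone.utc)
    listed = [
        Origin("https://metacpan.org/dist/Old", datetime(2022, 10, 1, tzinfo=timezone.utc)),
        Origin("https://metacpan.org/dist/New", datetime(2022, 11, 9, tzinfo=timezone.utc)),
    ]
    for origin in filter_updated(listed, since):
        print(origin.url)
```

Note that the full release index is still queried on every run with this approach. Only the number of origins sent to the scheduler shrinks, which is why the gain over a 10-minute full listing is limited.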
swh/lister/cpan/lister.py | |
---|---|
190–201 | We cannot use that filter here as we are querying the release index of CPAN elasticsearch. |
Abandoning this revision because, in this case, we cannot really benefit from an incremental mode.