Use http api lastUpload argument in search query to retrieve new or
updated origins since last run
Related to T4597
Differential D8663
Hackage: Implement incremental mode
Authored by franckbret on Oct 12 2022, 10:13 AM.
Event Timeline

Build is green

Patch application report for D8663 (id=31286)

Rebasing onto 05cd1de1cd...

    First, rewinding head to replay your work on top of it...
    Applying: Hackage: Implement incremental mode

Changes applied before test

    commit 2390eefdd68e0adb58e758bc691b8ce95464a986
    Author: Franck Bret <franck.bret@octobus.net>
    Date:   Wed Oct 12 10:08:38 2022 +0200

        Hackage: Implement incremental mode

        Use http api lastUpload argument in search query to retrieve new or
        updated origins since last run

        Related T4597

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/787/ for more details.

    for entry in page:
        last_update = iso8601.parse_date(entry["lastUpload"])
        if not self.earliest_update or last_update > self.earliest_update:
            self.earliest_update = last_update

This makes self.earliest_update the latest lastUpload, not the earliest. Either way, I don't think this is the right way to do it, because:
Assuming Hackage has no lag between the lastUpload timestamp and the moment packages are available, you should use the time the lister started.

OK, "earliest" is too ambiguous a term. I wanted something to store the max update_date of a complete listing.
For now the operator is greater than, but do you mean it should be dynamic at lister instantiation?
OK, got it, I will use the time the lister started.

Build is green

Patch application report for D8663 (id=31304)

Rebasing onto 05cd1de1cd...

    First, rewinding head to replay your work on top of it...
    Applying: Hackage: Implement incremental mode

Changes applied before test

    commit 88f8389a8506e03e311f801609febb231eb03f5a
    Author: Franck Bret <franck.bret@octobus.net>
    Date:   Wed Oct 12 10:08:38 2022 +0200

        Hackage: Implement incremental mode

        Use http api lastUpload argument in search query to retrieve new or
        updated origins since last run

        Related to T4597

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/788/ for more details.

lastUpload is documented as working on dates rather than datetimes, so we are going to need to do the same stuff as with NuGet. Also, this code is sending datetimes instead of dates, so the API ignores the filter entirely. For example:

    curl "https://hackage.haskell.org/packages/search" \
      -H "Accept: application/json" \
      -H "Content-Type: application/json" \
      --data '{"page": 0, "sortColumn": "default", "sortDirection": "ascending", "searchQuery": "(lastUpload > 2022-11-10T01:00:00Z)"}' \
      -X POST | jq . | head
    {
      "numberOfResults": 15644,
      "pageContents": [
        {
          "description": "Examples of 3D graphics programming with OpenGL",
          "downloads": 18,
          "lastUpload": "2016-07-22T14:26:23.038905Z",
          "maintainers": [
            {
              "display": "WolfgangJeltsch",
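The dates-versus-datetimes point can be illustrated with a small sketch. It builds a request body shaped like the one in the curl example above, but truncates the timestamp to a plain date before placing it in searchQuery. The helper name build_search_payload and its parameters are hypothetical and not taken from the patch; the comparison operator is left as a parameter because the thread had not settled on one at this point.

```python
import json
from datetime import date, datetime, timezone


def build_search_payload(since: date, page: int = 0, operator: str = ">") -> dict:
    """Return a JSON-serializable body for the packages/search endpoint.

    Hypothetical helper: the field names mirror the curl example, nothing more.
    """
    # datetime is a subclass of date, so reject it explicitly; otherwise
    # isoformat() would emit a full timestamp instead of YYYY-MM-DD.
    if isinstance(since, datetime):
        raise TypeError("expected a date, not a datetime")
    return {
        "page": page,
        "sortColumn": "default",
        "sortDirection": "ascending",
        "searchQuery": f"(lastUpload {operator} {since.isoformat()})",
    }


# Timestamp taken from the curl example, reduced to its date component.
last_run = datetime(2022, 11, 10, 1, 0, tzinfo=timezone.utc)
payload = build_search_payload(last_run.date())
print(json.dumps(payload))
```

Rejecting datetime values up front is one way to keep a full timestamp from slipping into the query unnoticed.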
buuuut you are using a strict inequality, so you need to subtract one day, in order not to miss uploads submitted after the previous run of the lister but on the same day. Also, you should apply .astimezone(tz=timezone.utc) before converting to date, because the database is not guaranteed to return timestamps in UTC even when they were written in UTC. (Sorry for the back-and-forth; hopefully I'm done now.)

Use greater than or equal instead of strict comparison when building HTTP API query params for incremental listing.

The HTTP API is able to do greater-than-or-equal filtering. Now the filter operator is ">=" instead of ">".

Build is green

Patch application report for D8663 (id=31843)

Rebasing onto ea146ce297...

    Current branch diff-target is up to date.

Changes applied before test

    commit d475c1529985162f7c4e0b098e73f958b349decb
    Author: Franck Bret <franck.bret@octobus.net>
    Date:   Wed Oct 12 10:08:38 2022 +0200

        Hackage: Implement incremental mode

        Use http api lastUpload argument in search query to retrieve new or
        updated origins since last run

        Related to T4597

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/859/ for more details.

Sweet. One last thing: could you make tests check the request body is as expected? See https://requests-mock.readthedocs.io/en/latest/history.html

Improve test for incremental listing, ensure the HTTP searchQuery/lastUpload value is a date.

Build is green

Patch application report for D8663 (id=31865)

Rebasing onto ea146ce297...

    Current branch diff-target is up to date.

Changes applied before test

    commit 518cb4b9761dde8b38c2fe99e268268ced15cb5f
    Author: Franck Bret <franck.bret@octobus.net>
    Date:   Wed Oct 12 10:08:38 2022 +0200

        Hackage: Implement incremental mode

        Use http api lastUpload argument in search query to retrieve new or
        updated origins since last run

        Related to T4597

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/861/ for more details.

Build is green

Patch application report for D8663 (id=31924)

Rebasing onto 6ad61aec23...

    First, rewinding head to replay your work on top of it...
    Applying: Hackage: Implement incremental mode

Changes applied before test

    commit e48a1817ab3f340f64e8d4ce4df2dfe52ae474fa
    Author: Franck Bret <franck.bret@octobus.net>
    Date:   Wed Oct 12 10:08:38 2022 +0200

        Hackage: Implement incremental mode

        Use http api lastUpload argument in search query to retrieve new or
        updated origins since last run

        Related to T4597

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/864/ for more details.

Build is green

Patch application report for D8663 (id=31925)

Rebasing onto 6ad61aec23...

    Current branch diff-target is up to date.

Changes applied before test

    commit 065b3f81a1e8769095251c6dbd88f8661a577366
    Author: Franck Bret <franck.bret@octobus.net>
    Date:   Wed Oct 12 10:08:38 2022 +0200

        Hackage: Implement incremental mode

        Use http api lastUpload argument in search query to retrieve new or
        updated origins since last run

        Related to T4597

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/865/ for more details.
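The review points raised in this thread (store the time the lister started, convert to UTC before truncating to a date, filter with ">=", and check the request body in tests) could fit together roughly as in the sketch below. It is not the code from the patch: SketchLister, query_date, search_query and the injected post callable are all hypothetical names, an empty searchQuery for a full listing is an assumption, and unittest.mock stands in for requests-mock so the example runs with the standard library alone.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Any, Callable, Dict, Optional
from unittest.mock import Mock

SEARCH_URL = "https://hackage.haskell.org/packages/search"


def query_date(last_listing: datetime) -> date:
    """Normalize an aware timestamp to UTC, then truncate it to a date."""
    return last_listing.astimezone(tz=timezone.utc).date()


def search_query(last_listing: Optional[datetime]) -> str:
    """Build the searchQuery string (assumption: empty means full listing)."""
    if last_listing is None:
        return ""
    return f"(lastUpload >= {query_date(last_listing).isoformat()})"


class SketchLister:
    """Hypothetical stand-in for the lister; `post` replaces the HTTP call."""

    def __init__(self, post: Callable[..., Any],
                 last_listing: Optional[datetime] = None) -> None:
        self.post = post
        self.last_listing = last_listing
        # Record the start time up front, so uploads that arrive while the
        # listing is running are picked up again by the next run.
        self.started_at = datetime.now(tz=timezone.utc)

    def run(self) -> None:
        # Only the first page is requested here; pagination is omitted.
        body: Dict[str, Any] = {
            "page": 0,
            "sortColumn": "default",
            "sortDirection": "ascending",
            "searchQuery": search_query(self.last_listing),
        }
        self.post(SEARCH_URL, json=body)
        # Persist the start time, not the newest lastUpload seen.
        self.last_listing = self.started_at


# A stored timestamp with a +02:00 offset: 01:30 local is 23:30 UTC the day before.
stored = datetime(2022, 11, 11, 1, 30, tzinfo=timezone(timedelta(hours=2)))
post = Mock()
lister = SketchLister(post, last_listing=stored)
lister.run()

# Inspect the recorded call, similar in spirit to requests-mock's history.
sent_body = post.call_args[1]["json"]
assert sent_body["searchQuery"] == "(lastUpload >= 2022-11-10)"
```

The offset in the example shows why the UTC conversion matters: truncating the local timestamp directly would give 2022-11-11 and skip uploads from the previous UTC day.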