Jan 8 2023
Jul 9 2021
Deployed and running so closing.
Deployed in production as well and triggered a run:
From a quick test, it looks like the "Project and release activity details" feed can go back multiple years without any issue. This allows us to backfill the data for all known origins before adding the incremental behavior to the lister.
Deployed in staging and triggered a run:
Build is green
- Adapt according to the last suggestion about the stub
- And, per the IRC discussion, add a sleep call to avoid the first throttling hit caused by the initialization of the xmlrpc client [1]
Build is green
Address the major part of the review (thanks)
In D5977#154053, @ardumont wrote:
Looks good, thanks a lot!
Build is green
- Handle throttling appropriately (adapt the throttling decorators [1])
- Rework the logic to still process pages of results, so that writes to the db occur incrementally
Running through docker, I actually need to change a few things:
- the throttling needs to change (D5983)
- the current logic of sending all origins in one go won't work for the first run (nothing is flushed before the run finishes...), so I'll need to rework this (in progress) [1]
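For context, a throttling-retry decorator along the lines mentioned above can be sketched as follows. This is a minimal illustration: `retry_on_throttle` and its parameters are hypothetical names, not the actual swh.lister decorator referenced in [1].

```python
import functools
import time


def retry_on_throttle(max_attempts=3, wait=1.0, exc=Exception):
    """Hypothetical retry decorator: back off and retry when the
    wrapped call raises a throttling-related error."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except exc:
                    if attempt == max_attempts:
                        raise
                    # linear backoff before the next attempt
                    time.sleep(wait * attempt)
        return wrapper
    return decorator
```

Applied to each page-fetching call, this lets the lister keep writing results to the db incrementally while absorbing transient throttling errors.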
Build is green
Rebase
Build is green
- Rebase
- Refactor
Build is green
Adapt according to review:
- This completely rewrites the lister and stops using the simple API [1]
- Uses the client.changelog_last_serial and client.changelog_since_serial endpoints
Jul 8 2021
From my quick testing, I have noticed that the changelog methods only return a limited number of results, so we would need to iterate calls.
I think the properly paginated way of doing with the current PyPI XMLRPC api is to call:
- changelog_last_serial() to get the highest serial, to be used as a termination condition. Currently returns 2168587
- changelog_since_serial(<current_serial>) in a loop, until the last serial returned reaches the one set as the termination condition. Looks like this returns 50k results per call.
(this will make us miss the last few updates that happened since the lister started, but this is probably marginal).
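The pagination scheme described above can be sketched roughly like this. `changelog_last_serial` and `changelog_since_serial` are the PyPI XML-RPC methods named above; the `iterate_changelog` helper and the injectable `client` parameter are illustrative, not the actual lister code.

```python
import xmlrpc.client


def iterate_changelog(client, start_serial=0):
    """Yield changelog entries in batches, stopping once the serial
    recorded at startup is reached (so updates happening after the
    lister started are deliberately skipped, as noted above).

    Each entry is assumed to be a tuple whose last element is its serial.
    """
    last_serial = client.changelog_last_serial()  # termination condition
    current = start_serial
    while current < last_serial:
        batch = client.changelog_since_serial(current)
        if not batch:
            break
        yield batch
        current = batch[-1][-1]  # serial of the last entry in the batch


# Real usage would target the live endpoint:
# client = xmlrpc.client.ServerProxy("https://pypi.org/pypi")
# for batch in iterate_changelog(client):
#     ...  # write this page of results to the db
```

Taking the client as a parameter also makes the loop easy to exercise against a stub in tests, without hitting the network.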
Jul 7 2021
In D5977#153851, @olasd wrote: Looking at the size of the changelog (2-ish million entries at 50k-ish entries per page means 50-ish requests), I /think/ the lister could always be running in incremental mode, rather than having to maintain two modes in the long run.
From my quick testing, I have noticed that the changelog methods only return a limited number of results, so we would need to iterate calls.
Nice, that lister should perform better once that feature is deployed to production.
Build is green
Adapt according to the judicious remarks (thanks ;)
Thanks!
Nice, that lister should perform better once that feature is deployed to production. I added a first batch of inline comments.
Build is green
Jun 21 2021
Nov 26 2019
Nov 20 2019
Nov 19 2019
Oct 13 2019
Oct 5 2019
Oct 3 2019
Oct 1 2019
Current work is in the swh-loader-repository within the package-loader branch.