Aug 26 2021

ardumont added a comment to T3502: Date overflow error in scheduler journal client.

status:

  • scheduler v0.17.1 deployed on production [1] (db migrated) and staging.
  • then swh-scheduler-journal-client service restarted.
Aug 26 2021, 11:26 AM · System administration, Scheduling utilities
ardumont added a comment to T3502: Date overflow error in scheduler journal client.

I bumped the priority since the next-gen scheduler runners depend on the results of the journal client (as do the scheduler metrics).

Aug 26 2021, 10:10 AM · System administration, Scheduling utilities
ardumont changed the status of T3502: Date overflow error in scheduler journal client from Open to Work in Progress.
Aug 26 2021, 9:57 AM · System administration, Scheduling utilities

Aug 25 2021

ardumont added a revision to T3502: Date overflow error in scheduler journal client: D6136: journal_client: Ensure queue position does not overflow.
Aug 25 2021, 6:19 PM · System administration, Scheduling utilities
vsellier triaged T3502: Date overflow error in scheduler journal client as High priority.
Aug 25 2021, 6:10 PM · System administration, Scheduling utilities
olasd added a comment to T3403: Use forge URL network location as default lister instance name.

I've duplicated the credentials for the relevant forges, and updated the following instance names:

Aug 25 2021, 2:38 PM · Scheduling utilities, Lister

Aug 13 2021

ardumont added a comment to T2345: Improve handling of recurrent loading tasks in scheduler.
  • Refactor the journal client a bit to update a docstring and inline one function (done; these are the 2 previous commits mentioned here, just below this comment ^).
  • Deactivate failing visits (delegating to the listers the job of reactivating origins that come back to life). I have diffs dealing with this that need a rebase and some rework against the latest changes (I need to get back to them) [1].
  • Deploy the current scheduler implementation (master) once the previous point is done. (That's the goal I want to reach before a vacation break.)
Aug 13 2021, 4:44 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont moved T3471: production: Deploy swh.scheduler v0.17 from deployed/landed/monitoring to done on the System administration board.
Aug 13 2021, 3:49 PM · System administration, Archive coverage, Scheduling utilities
ardumont closed T3471: production: Deploy swh.scheduler v0.17, a subtask of T2345: Improve handling of recurrent loading tasks in scheduler, as Resolved.
Aug 13 2021, 3:48 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont closed T3471: production: Deploy swh.scheduler v0.17 as Resolved.
Aug 13 2021, 3:48 PM · System administration, Archive coverage, Scheduling utilities
ardumont updated subscribers of T3471: production: Deploy swh.scheduler v0.17.

including the next-gen scheduler runner not yet puppetized [4]

Everything got done except this part ^.
It first needs the following:

  • D5809 to be rebased on latest master branch (v0.17)
  • the saatchi venv (in swhscheduler home) to be updated with it
Aug 13 2021, 3:48 PM · System administration, Archive coverage, Scheduling utilities
ardumont moved T3471: production: Deploy swh.scheduler v0.17 from code-review/await-feedback/pause to in-progress on the System administration board.
Aug 13 2021, 10:33 AM · System administration, Archive coverage, Scheduling utilities
ardumont moved T3471: production: Deploy swh.scheduler v0.17 from in-progress to code-review/await-feedback/pause on the System administration board.
Aug 13 2021, 10:33 AM · System administration, Archive coverage, Scheduling utilities
ardumont changed the status of T3471: production: Deploy swh.scheduler v0.17, a subtask of T2345: Improve handling of recurrent loading tasks in scheduler, from Open to Work in Progress.
Aug 13 2021, 10:33 AM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont changed the status of T3471: production: Deploy swh.scheduler v0.17 from Open to Work in Progress.
Aug 13 2021, 10:33 AM · System administration, Archive coverage, Scheduling utilities
ardumont edited projects for T3471: production: Deploy swh.scheduler v0.17, added: System administration; removed Sprint 2021 01.
Aug 13 2021, 10:33 AM · System administration, Archive coverage, Scheduling utilities
ardumont added a comment to T3471: production: Deploy swh.scheduler v0.17.

including the next-gen scheduler runner not yet puppetized [4]

Aug 13 2021, 10:32 AM · System administration, Archive coverage, Scheduling utilities

Aug 12 2021

ardumont moved T3471: production: Deploy swh.scheduler v0.17 from Backlog to in-progress on the Sprint 2021 01 board.
Aug 12 2021, 8:41 AM · System administration, Archive coverage, Scheduling utilities
ardumont added a comment to T3471: production: Deploy swh.scheduler v0.17.

Following actions in order:

Aug 12 2021, 8:41 AM · System administration, Archive coverage, Scheduling utilities

Aug 9 2021

ardumont closed T3456: staging: Deploy scheduler v0.17 as Resolved.
Aug 9 2021, 11:07 AM · System administration, Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont closed T3456: staging: Deploy scheduler v0.17, a subtask of T2345: Improve handling of recurrent loading tasks in scheduler, as Resolved.
Aug 9 2021, 11:07 AM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont moved T3456: staging: Deploy scheduler v0.17 from code-review/await-feedback/pause to deployed/landed/monitoring on the System administration board.
Aug 9 2021, 11:07 AM · System administration, Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont triaged T3471: production: Deploy swh.scheduler v0.17 as High priority.
Aug 9 2021, 11:06 AM · System administration, Archive coverage, Scheduling utilities

Aug 6 2021

ardumont moved T3456: staging: Deploy scheduler v0.17 from in-progress to code-review/await-feedback/pause on the System administration board.
Aug 6 2021, 3:11 PM · System administration, Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont updated the task description for T3456: staging: Deploy scheduler v0.17.
Aug 6 2021, 3:11 PM · System administration, Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont added a comment to T3456: staging: Deploy scheduler v0.17.

Ensure the journal client is doing its new job

Aug 6 2021, 3:10 PM · System administration, Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont updated the task description for T3456: staging: Deploy scheduler v0.17.
Aug 6 2021, 3:09 PM · System administration, Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont updated the task description for T3456: staging: Deploy scheduler v0.17.
Aug 6 2021, 12:12 PM · System administration, Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont changed the status of T3456: staging: Deploy scheduler v0.17, a subtask of T2345: Improve handling of recurrent loading tasks in scheduler, from Open to Work in Progress.
Aug 6 2021, 12:11 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont changed the status of T3456: staging: Deploy scheduler v0.17 from Open to Work in Progress.
Aug 6 2021, 12:11 PM · System administration, Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont added a project to T3456: staging: Deploy scheduler v0.17: System administration.
Aug 6 2021, 12:11 PM · System administration, Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont updated the task description for T3456: staging: Deploy scheduler v0.17.
Aug 6 2021, 11:10 AM · System administration, Sprint 2021 01, Archive coverage, Scheduling utilities

Aug 5 2021

ardumont claimed T3456: staging: Deploy scheduler v0.17.
Aug 5 2021, 3:25 PM · System administration, Sprint 2021 01, Archive coverage, Scheduling utilities

Aug 4 2021

ardumont triaged T3456: staging: Deploy scheduler v0.17 as High priority.
Aug 4 2021, 10:10 AM · System administration, Sprint 2021 01, Archive coverage, Scheduling utilities

Aug 3 2021

ardumont removed a revision from T3403: Use forge URL network location as default lister instance name: D6052: Install update-metrics as a service called daily.
Aug 3 2021, 2:32 PM · Scheduling utilities, Lister
ardumont added a revision to T3403: Use forge URL network location as default lister instance name: D6052: Install update-metrics as a service called daily.
Aug 3 2021, 2:25 PM · Scheduling utilities, Lister
ardumont added a comment to T2345: Improve handling of recurrent loading tasks in scheduler.

Deactivate failing visits (delegating to the listers the job of reactivating
origins that come back to life). I have diffs dealing with this that need a
rebase and some rework against the latest changes (I need to get back to them) [1].

Aug 3 2021, 8:58 AM · Sprint 2021 01, Archive coverage, Scheduling utilities

Jul 30 2021

ardumont changed the status of T2345: Improve handling of recurrent loading tasks in scheduler from Open to Work in Progress.

(^ for a while ;)

Jul 30 2021, 3:55 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont added a comment to T2345: Improve handling of recurrent loading tasks in scheduler.

Status on this, after the recent refactoring we did with @olasd to simplify the actual
implementation (backend and journal client). What remains to be done:

Jul 30 2021, 3:54 PM · Sprint 2021 01, Archive coverage, Scheduling utilities

Jul 22 2021

ardumont closed D5919: Start handling of recurrent loading tasks in scheduler.
Jul 22 2021, 2:22 PM · Scheduling utilities
olasd accepted D5919: Start handling of recurrent loading tasks in scheduler.
Jul 22 2021, 11:29 AM · Scheduling utilities

Jul 13 2021

anlambert added a revision to T3403: Use forge URL network location as default lister instance name: D5989: pattern: Use URL network location as instance name when not provided.
Jul 13 2021, 12:40 PM · Scheduling utilities, Lister
anlambert added a comment to T3403: Use forge URL network location as default lister instance name.

It seems the remaining lister instances to process are the phabricator ones that also need credentials.
This is what we currently have in the listers table in scheduler database.

Jul 13 2021, 11:47 AM · Scheduling utilities, Lister

Jul 9 2021

olasd changed the status of T3403: Use forge URL network location as default lister instance name from Open to Work in Progress.

I've updated the listers with no credentials:

Jul 9 2021, 3:37 PM · Scheduling utilities, Lister
ardumont added a comment to T2345: Improve handling of recurrent loading tasks in scheduler.

Updated stats in descending order on the no_last_update column:

Jul 9 2021, 3:11 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont added a comment to T2345: Improve handling of recurrent loading tasks in scheduler.

Related to this task, some work has been started in diff D5977 / T3399 to make the
PyPI lister list its origins with the last_update information (the review got done
and the implementation needs to be improved, but still ;).

Jul 9 2021, 3:05 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont closed T3399: Improve PyPI lister to pull last update information when running incrementally, a subtask of T2345: Improve handling of recurrent loading tasks in scheduler, as Resolved.
Jul 9 2021, 2:52 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont closed D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 12:59 PM · Scheduling utilities, Origin-Pypi
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 9 2021, 12:56 PM · Scheduling utilities, Origin-Pypi
ardumont updated the diff for D5977: Make PyPI lister incremental and complete in regards to last_update.
  • Adapt according to last suggestion about stub
  • and, following the IRC discussion, add a sleep call to avoid the initial throttling caused by the initialization of the xmlrpc client [1]
Jul 9 2021, 12:53 PM · Scheduling utilities, Origin-Pypi
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 9 2021, 12:21 PM · Scheduling utilities, Origin-Pypi
ardumont added inline comments to D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 12:19 PM · Scheduling utilities, Origin-Pypi
ardumont added inline comments to D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 12:18 PM · Scheduling utilities, Origin-Pypi
ardumont updated the diff for D5977: Make PyPI lister incremental and complete in regards to last_update.

Address most of the review (thx)

Jul 9 2021, 12:18 PM · Scheduling utilities, Origin-Pypi
olasd added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 12:11 PM · Scheduling utilities, Origin-Pypi
ardumont added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Looks good, thanks a lot!

Jul 9 2021, 12:06 PM · Scheduling utilities, Origin-Pypi
olasd accepted D5977: Make PyPI lister incremental and complete in regards to last_update.

Looks good, thanks a lot!

Jul 9 2021, 11:57 AM · Scheduling utilities, Origin-Pypi
ardumont updated the test plan for D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 11:47 AM · Scheduling utilities, Origin-Pypi
ardumont updated the test plan for D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 11:45 AM · Scheduling utilities, Origin-Pypi
ardumont updated the test plan for D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 11:41 AM · Scheduling utilities, Origin-Pypi
ardumont updated the test plan for D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 11:40 AM · Scheduling utilities, Origin-Pypi
ardumont updated the test plan for D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 11:15 AM · Scheduling utilities, Origin-Pypi
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 9 2021, 11:15 AM · Scheduling utilities, Origin-Pypi
ardumont updated the diff for D5977: Make PyPI lister incremental and complete in regards to last_update.
  • Handle throttling appropriately (adapt throttling decorators [1])
  • Rework the logic to still process results page by page, so that database writes occur incrementally
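
The page-by-page rework mentioned above could look roughly like the sketch below. This is not the code from D5977: `paginate`, `write_incrementally`, the `flush` callback and the default `page_size` are hypothetical names and values chosen for illustration. The point is only that each page is handed to the writer as soon as it is complete, rather than after the whole run.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def paginate(items: Iterable[T], page_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most page_size items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


def write_incrementally(
    origins: Iterable[T],
    flush: Callable[[List[T]], None],
    page_size: int = 1000,  # arbitrary value, an assumption for the sketch
) -> int:
    """Pass each page to flush as soon as it is full; return the item count."""
    written = 0
    for page in paginate(origins, page_size):
        flush(page)  # e.g. a database write; the callback is an assumption
        written += len(page)
    return written
```

With this shape, a first run interrupted halfway has already persisted every page produced before the interruption.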
Jul 9 2021, 11:12 AM · Scheduling utilities, Origin-Pypi
ardumont updated the test plan for D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 11:10 AM · Scheduling utilities, Origin-Pypi
ardumont planned changes to D5977: Make PyPI lister incremental and complete in regards to last_update.

Running through docker, I actually need to change a few things:

  • throttling needs to change (D5983)
  • the current logic of sending all origins in one go won't work for the first run (nothing is flushed until the run actually finishes...), so I'll need to rework this (in progress) [1]
Jul 9 2021, 10:59 AM · Scheduling utilities, Origin-Pypi
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 9 2021, 10:18 AM · Scheduling utilities, Origin-Pypi
ardumont updated the diff for D5977: Make PyPI lister incremental and complete in regards to last_update.

Rebase

Jul 9 2021, 10:15 AM · Scheduling utilities, Origin-Pypi
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 9 2021, 10:11 AM · Scheduling utilities, Origin-Pypi
ardumont updated the diff for D5977: Make PyPI lister incremental and complete in regards to last_update.
  • Rebase
  • Refactor
Jul 9 2021, 10:08 AM · Scheduling utilities, Origin-Pypi
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 9 2021, 10:03 AM · Scheduling utilities, Origin-Pypi
ardumont retitled D5977: Make PyPI lister incremental and complete in regards to last_update from Add incremental PyPI lister which fetches last_update per origin to Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 10:01 AM · Scheduling utilities, Origin-Pypi
ardumont updated the test plan for D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 10:01 AM · Scheduling utilities, Origin-Pypi
ardumont updated the diff for D5977: Make PyPI lister incremental and complete in regards to last_update.

Adapt according to review:

  • This completely rewrites the lister, which stops using the simple API [1]
  • Uses the client.changelog_last_serial and client.changelog_since_serial endpoints
Jul 9 2021, 10:00 AM · Scheduling utilities, Origin-Pypi
ardumont added inline comments to D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 8:52 AM · Scheduling utilities, Origin-Pypi

Jul 8 2021

ardumont added a comment to T2345: Improve handling of recurrent loading tasks in scheduler.

Status on the latest development for this task: the "Baseline for the recurrence of origin
visits" chapter has been implemented in the following stacked diffs (in review):

Jul 8 2021, 12:30 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont added a revision to T2345: Improve handling of recurrent loading tasks in scheduler: D5980: journal_client: Disable origins when too many visited attempts failed.
Jul 8 2021, 11:26 AM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

From my quick testing, I have noticed that the changelog methods only return a limited number of results, so we would need to iterate calls.

I think the properly paginated way of doing this with the current PyPI XMLRPC API is to call:

  • changelog_last_serial() to get the highest serial, to be used as a termination condition. Currently returns 2168587
  • changelog_since_serial(<current_serial>) in a loop until the last serial returned is higher than the one set as termination condition. Looks like this returns 50k results per call.

(this will make us miss the last few updates that happened since the lister started, but this is probably marginal).
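
A minimal sketch of that loop, assuming a client object that exposes the two methods named above and assuming that each changelog entry carries its serial as the last element. The helper name `iter_changelog_pages` is hypothetical, and this is not the implementation that landed in D5977.

```python
from typing import Any, Iterator, List, Sequence


def iter_changelog_pages(
    client: Any, start_serial: int = 0
) -> Iterator[List[Sequence[Any]]]:
    """Yield pages of changelog entries until the initial last serial is reached.

    Assumption: each entry is a sequence whose last element is its serial.
    """
    # Termination condition, fixed once when the listing starts.
    last_serial = client.changelog_last_serial()
    current = start_serial
    while current < last_serial:
        page = client.changelog_since_serial(current)
        if not page:
            break
        yield page
        newest = max(entry[-1] for entry in page)
        if newest <= current:
            # Guard against a server that makes no progress.
            break
        current = newest
```

With the real service, `client` would presumably be an `xmlrpc.client.ServerProxy` pointed at the PyPI XML-RPC endpoint. Updates published after `last_serial` was read are left for the next incremental run.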

Jul 8 2021, 9:26 AM · Scheduling utilities, Origin-Pypi

Jul 7 2021

olasd added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.
In D5977#153851, @olasd wrote:

Looking at the size of the changelog (2-ish million entries for 50k-ish pages means 50-ish requests), I /think/ the lister could always be running in incremental mode, rather than having to maintain two modes in the long run.

Jul 7 2021, 6:04 PM · Scheduling utilities, Origin-Pypi
olasd added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

From my quick testing, I have noticed that the changelog methods only return a limited number of results, so we would need to iterate calls.

Jul 7 2021, 6:02 PM · Scheduling utilities, Origin-Pypi
ardumont added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Nice, that lister should perform better once that feature is deployed to production.

Jul 7 2021, 5:45 PM · Scheduling utilities, Origin-Pypi
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 7 2021, 5:43 PM · Scheduling utilities, Origin-Pypi
ardumont updated the diff for D5977: Make PyPI lister incremental and complete in regards to last_update.

Adapt according to judicious remarks (thanks ;)

Jul 7 2021, 5:39 PM · Scheduling utilities, Origin-Pypi
ardumont added inline comments to D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 7 2021, 5:38 PM · Scheduling utilities, Origin-Pypi
ardumont added a revision to T2345: Improve handling of recurrent loading tasks in scheduler: D5978: Add a successive_visits counter to origin visit stats.
Jul 7 2021, 5:26 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Thanks!

Jul 7 2021, 5:24 PM · Scheduling utilities, Origin-Pypi
anlambert added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Nice, that lister should perform better once that feature is deployed to production. I added a first batch of inline comments.

Jul 7 2021, 5:18 PM · Scheduling utilities, Origin-Pypi
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 7 2021, 4:35 PM · Scheduling utilities, Origin-Pypi
ardumont added a reviewer for D5977: Make PyPI lister incremental and complete in regards to last_update: olasd.
Jul 7 2021, 4:34 PM · Scheduling utilities, Origin-Pypi

Jul 6 2021

ardumont published D5919: Start handling of recurrent loading tasks in scheduler for review.
Jul 6 2021, 3:58 PM · Scheduling utilities

Jul 1 2021

ardumont added a revision to T2345: Improve handling of recurrent loading tasks in scheduler: D5956: Introduce new scheduling policy to grab origins without last update.
Jul 1 2021, 12:34 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont added a revision to T2345: Improve handling of recurrent loading tasks in scheduler: D5950: journal_client: Compute next position for origin visit.
Jul 1 2021, 10:14 AM · Sprint 2021 01, Archive coverage, Scheduling utilities

Jun 29 2021

ardumont updated the task description for T2967: Write journal client subscribed to origin_visit_status topics.
Jun 29 2021, 2:31 PM · Sprint 2021 01, Scheduling utilities

Jun 23 2021

ardumont added a revision to T2345: Improve handling of recurrent loading tasks in scheduler: D5919: Start handling of recurrent loading tasks in scheduler.
Jun 23 2021, 6:11 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont added a revision to T2345: Improve handling of recurrent loading tasks in scheduler: D5914: backend: Auto-generate origin visit stats upsert query.
Jun 23 2021, 3:32 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
anlambert triaged T3403: Use forge URL network location as default lister instance name as Normal priority.
Jun 23 2021, 11:47 AM · Scheduling utilities, Lister

Jun 21 2021

ardumont updated the task description for T2345: Improve handling of recurrent loading tasks in scheduler.
Jun 21 2021, 5:50 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
olasd added a revision to T2444: Implement the scheduling policy for the recurrent visit scheduler: D5809: Direct scheduling of origin visits in celery.
Jun 21 2021, 4:04 PM · Sprint 2021 01, Scheduling utilities
olasd added a comment to T2345: Improve handling of recurrent loading tasks in scheduler.

Summary of the data available in the listed_origins table, broken down by lister and "known state" of origins:

Jun 21 2021, 2:27 PM · Sprint 2021 01, Archive coverage, Scheduling utilities

Jun 16 2021

olasd closed T3000: Investigate scheduler journal client discrepancies as Resolved.

That's all been working consistently for months now, closing!

Jun 16 2021, 3:35 PM · Scheduling utilities