Software Heritage

Origins without last visits are scheduled too often
Closed, Migrated (edits locked)

Description

Origins added through Save Code Now have no last_update, so they get picked for rescheduling far too often.
As a result, some origins end up with far too many visits [1].
This in turn causes other side effects, such as timeouts when browsing all of an origin's visits (T<xyz>, to be added once that task is created).

Possible leads suggested by @olasd [2]:

[1] F5409393 (screenshot)

[2] IRC log:

15:18 <+ardumont> olasd: regarding (origin visits, save code now - so origins without last_update -, recurrent scheduling, and for queues that are near empty all the time
15:18 <+ardumont> we got too much scheduling happening
15:19 <+ardumont> as seen in that screenshot https://forge.softwareheritage.org/F5409393
15:19 <+ardumont> any ideas out of your hat as to what we can change to limit this?
15:20 <+ardumont> (those must be specifically true for our "end-to-end check" origins especially svn and mercurial whose queues are near empty)
15:21 <+ardumont> ( anlambert: might be interested too )
15:24 <+olasd> two things here: the recurrent_scheduler should probably use the `NO_ORIGINS_SCHEDULED_BACKOFF` if the number of origins scheduled is lower than a given threshold, instead of checking if it is 0
15:25 <+olasd> (these origins seem to be completing in mere seconds, so they get picked up at every iteration of the loop)
15:25 <+olasd> (defeating this backoff)
15:27 <+olasd> and we could probably add a `last_visit_cooldown` or somesuch to grab_next_visits, to never pick up a visit for something that's been visited less than "some time ago" (compare to the `scheduled_cooldown` which only filters 
15:30 <+olasd> (either one of these should work)
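Both suggestions can be sketched in Python, the language swh-scheduler is written in. This is a minimal in-memory illustration under stated assumptions, not the actual swh-scheduler code: `NO_ORIGINS_SCHEDULED_BACKOFF`, `grab_next_visits`, `last_visit_cooldown` and `scheduled_cooldown` come from the discussion above, while `MIN_ORIGINS_SCHEDULED`, `DEFAULT_POLL_INTERVAL`, `VisitCandidate`, `sleep_duration` and `filter_candidates` are hypothetical names, and every numeric value is an assumption. In the real scheduler the cooldown would presumably be applied inside the `grab_next_visits` query rather than in Python.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Assumed values; the real constant and settings in swh-scheduler may differ.
NO_ORIGINS_SCHEDULED_BACKOFF = 20 * 60  # seconds to sleep when (almost) nothing was scheduled
MIN_ORIGINS_SCHEDULED = 10  # hypothetical threshold, not an existing setting
DEFAULT_POLL_INTERVAL = 10  # hypothetical regular loop interval, in seconds


def sleep_duration(num_scheduled: int) -> int:
    """Idea 1: back off whenever fewer than a threshold of origins were
    scheduled in this iteration, instead of only when exactly zero were."""
    if num_scheduled < MIN_ORIGINS_SCHEDULED:
        return NO_ORIGINS_SCHEDULED_BACKOFF
    return DEFAULT_POLL_INTERVAL


@dataclass
class VisitCandidate:
    """Hypothetical, simplified stand-in for an origin's visit statistics."""

    url: str
    last_update: Optional[datetime]  # None for e.g. Save Code Now origins
    last_visit: Optional[datetime]  # time of the most recent visit, if any


def filter_candidates(
    candidates: List[VisitCandidate],
    now: datetime,
    last_visit_cooldown: timedelta,
) -> List[VisitCandidate]:
    """Idea 2: never pick an origin visited less than `last_visit_cooldown`
    ago, whether or not it has a last_update. Never-visited origins are kept."""
    cutoff = now - last_visit_cooldown
    return [c for c in candidates if c.last_visit is None or c.last_visit <= cutoff]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    candidates = [
        VisitCandidate("https://example.org/recent", None, now - timedelta(minutes=1)),
        VisitCandidate("https://example.org/old", None, now - timedelta(days=2)),
        VisitCandidate("https://example.org/never", None, None),
    ]
    kept = filter_candidates(candidates, now, timedelta(hours=1))
    print([c.url for c in kept])
    print(sleep_duration(len(kept)))
```

With a near-empty queue, the first function makes the loop sleep for the full backoff even though a few origins were scheduled, and the second drops the origin visited a minute ago while keeping the one visited two days ago and the one never visited. Either mechanism on its own should stop fast-completing origins from being picked at every loop iteration.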

Event Timeline

ardumont triaged this task as Normal priority. (Feb 24 2022, 3:32 PM)
ardumont created this task.
ardumont updated the task description.
ardumont added a subscriber: anlambert.