Paste P292

estimation of gitlab ingestion remaining days
Authored by ardumont on Aug 24 2018, 11:49 AM.
#+BEGIN_SRC emacs-lisp
(defun swh-percent (current total)
  (* 100 (/ current (* total 1.0))))
(swh-percent (+ 232301 27052 15531) 576787) ; 47.65780088663579 %
(defun swh-speed-repo-per-hour (yesterday today)
  (let ((diff (- today yesterday)))
    (/ diff 24))) ; assumes 24 hours between counts; integer division truncates
(swh-speed-repo-per-hour (+ 198884 23560 11848) (+ 232301 27052 15531)) ; 1691 r/h
(defun swh-remains-in-days (remaining-repos speed-repo-per-hour)
  (/ (/ remaining-repos speed-repo-per-hour) 24.0)) ; inner division is integer when both arguments are integers
(let ((speed (swh-speed-repo-per-hour
              (+ 198884 23560 11848) (+ 232301 27052 15531)))
      (remain-repos (+ 272934 28940 29)))
  (swh-remains-in-days remain-repos speed)) ; 7.416666666666667 days
#+END_SRC
#+RESULTS:
: 7.416666666666667

Event Timeline

Numbers extracted from scheduler db.

total repositories: 576787

  • Thu Aug 23 11:24:12 CEST 2018

11:24:12 softwareheritage-scheduler@db:5432=> select t.type, t.policy, t.status, tr.status, count(*) from task t full outer join task_run tr on t.id=tr.task where type='origin-update-git' and priority='high' group by t.type, t.policy, t.status, tr.status;
┌───────────────────┬───────────┬────────────────────────┬────────────┬────────┐
│       type        │  policy   │         status         │   status   │ count  │
├───────────────────┼───────────┼────────────────────────┼────────────┼────────┤
│ origin-update-git │ recurring │ next_run_not_scheduled │ eventful   │ 198884 │
│ origin-update-git │ recurring │ next_run_not_scheduled │ uneventful │  23560 │
│ origin-update-git │ recurring │ next_run_not_scheduled │ failed     │  11848 │
│ origin-update-git │ recurring │ next_run_not_scheduled │ ¤          │ 312485 │
│ origin-update-git │ recurring │ next_run_scheduled     │ scheduled  │  29995 │
│ origin-update-git │ recurring │ next_run_scheduled     │ started    │     15 │
└───────────────────┴───────────┴────────────────────────┴────────────┴────────┘
(6 rows)

Time: 7077.891 ms (00:07.078)

(+ 198884 23560 11848) ; 234292 done (eventful + uneventful + failed)
(+ 312485 29995 15) ; 342495 remaining (null + scheduled + started)

  • Fri Aug 24 11:17:29 CEST 2018

11:17:29 softwareheritage-scheduler@db:5432=> select t.type, t.policy, t.status, tr.status, count(*) from task t full outer join task_run tr on t.id=tr.task where type='origin-update-git' and priority='high' group by t.type, t.policy, t.status, tr.status;
┌───────────────────┬───────────┬────────────────────────┬────────────┬────────┐
│       type        │  policy   │         status         │   status   │ count  │
├───────────────────┼───────────┼────────────────────────┼────────────┼────────┤
│ origin-update-git │ recurring │ next_run_not_scheduled │ eventful   │ 232301 │
│ origin-update-git │ recurring │ next_run_not_scheduled │ uneventful │  27052 │
│ origin-update-git │ recurring │ next_run_not_scheduled │ failed     │  15531 │
│ origin-update-git │ recurring │ next_run_not_scheduled │ ¤          │ 272934 │
│ origin-update-git │ recurring │ next_run_scheduled     │ scheduled  │  28940 │
│ origin-update-git │ recurring │ next_run_scheduled     │ started    │     29 │
└───────────────────┴───────────┴────────────────────────┴────────────┴────────┘
(6 rows)

Time: 5718.446 ms (00:05.718)

(+ 232301 27052 15531) ; 274884 done (eventful + uneventful + failed)
(+ 272934 28940 29) ; 301903 remaining (null + scheduled + started)
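
The done/remaining split behind these sums can be sketched as follows. This is only an illustration: `swh-done-statuses` and `swh-split-counts` are hypothetical names that do not appear in the paste, and the sketch assumes that every task_run status other than eventful, uneventful, or failed (including null and started) counts as remaining.

```emacs-lisp
;; Sketch only: `swh-done-statuses' and `swh-split-counts' are hypothetical
;; names, not part of the paste.
(defconst swh-done-statuses '(eventful uneventful failed)
  "task_run statuses counted as done; any other status counts as remaining.")

(defun swh-split-counts (rows)
  "Return (DONE . REMAINING) for ROWS, a list of (STATUS . COUNT) pairs."
  (let ((done 0) (remaining 0))
    (dolist (row rows)
      (if (memq (car row) swh-done-statuses)
          (setq done (+ done (cdr row)))
        (setq remaining (+ remaining (cdr row)))))
    (cons done remaining)))

;; Counts copied from the two query results; nil stands for the null status.
(let ((thu '((eventful . 198884) (uneventful . 23560) (failed . 11848)
             (nil . 312485) (scheduled . 29995) (started . 15)))
      (fri '((eventful . 232301) (uneventful . 27052) (failed . 15531)
             (nil . 272934) (scheduled . 28940) (started . 29))))
  (list (swh-split-counts thu)    ; (234292 . 342495)
        (swh-split-counts fri)))  ; (274884 . 301903)
```

With the started tasks included, done plus remaining adds up to the total of 576787 repositories in both snapshots, which is a quick way to check that no row was missed.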

Comments:

  • P292$10 (swh-speed-repo-per-hour): I take a shortcut here. I do not compute the actual time difference between the two snapshots and round it to 24 hours ;)
  • Repositories currently done are those whose task_run status is eventful, uneventful, or failed.
  • Repositories remaining to ingest are those whose task_run status is null (¤ in the tables), scheduled, or started.
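
The 24-hour shortcut can be checked against the snapshot timestamps (Thu Aug 23 11:24:12 and Fri Aug 24 11:17:29, i.e. 23 h 53 min 17 s apart). The following is a minimal sketch, not the method used in the paste: `swh-elapsed-hours` and `swh-remaining-days` are hypothetical helpers, and both timestamps are encoded as UTC, which is an assumption that does not affect their difference since they share the same time zone.

```emacs-lisp
;; Sketch only: `swh-elapsed-hours' and `swh-remaining-days' are hypothetical
;; helpers, not part of the paste.
(defun swh-elapsed-hours (start end)
  "Return the hours between time values START and END as a float."
  (/ (float-time (time-subtract end start)) 3600.0))

(defun swh-remaining-days (done-before done-after remaining hours)
  "Estimate the days left to process REMAINING repositories.
DONE-BEFORE and DONE-AFTER are the done counts of two snapshots
taken HOURS apart.  All divisions are floating point."
  (let ((speed (/ (- done-after done-before) (float hours))))
    (/ remaining speed 24.0)))

;; encode-time arguments: SECOND MINUTE HOUR DAY MONTH YEAR ZONE (t = UTC).
(let* ((start (encode-time 12 24 11 23 8 2018 t)) ; Thu Aug 23 11:24:12
       (end (encode-time 29 17 11 24 8 2018 t))   ; Fri Aug 24 11:17:29
       (hours (swh-elapsed-hours start end)))     ; just under 24 hours
  (swh-remaining-days 234292 274884 301903 hours)) ; roughly 7.4 days
```

With the real gap of about 23.9 hours the speed comes out near 1699 r/h instead of 1691, so the estimate barely moves and the shortcut is harmless here.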