
Save Code Now: End to End monitoring
Closed, Resolved, Public

Description

  • Select a random, previously successful origin (for all supported origin types)
  • Submit via Web and via API
  • [... wait for completion ...]
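The selection and submission steps above could be sketched as follows. This is only an illustration, not the deployed check: the `ORIGINS` table reuses the two origins that appear in the icinga messages further down, `pick_origins` and `save_request_url` are hypothetical helper names, and the URL layout is an assumption about the public save code now API endpoint.

```python
import random

# Previously successful origins per visit type (taken from the icinga
# messages in this task; a real check would read them from configuration).
ORIGINS = {
    "git": ["https://github.com/rdicosmo/parmap"],
    "svn": ["https://github.com/cran/SCRABBLE"],
}


def pick_origins(origins, rng=random):
    """Pick one random origin URL for each visit type that has candidates."""
    return {vtype: rng.choice(urls) for vtype, urls in origins.items() if urls}


def save_request_url(base, visit_type, origin_url):
    """Build the URL a save request would be POSTed to (assumed layout)."""
    return f"{base.rstrip('/')}/api/1/origin/save/{visit_type}/url/{origin_url}/"


chosen = pick_origins(ORIGINS, random.Random(0))
for visit_type, origin_url in chosen.items():
    # Only print the target URL here; no request is actually sent.
    print(save_request_url("https://archive.softwareheritage.org", visit_type, origin_url))
```

Passing an explicit `random.Random` instance keeps the selection reproducible when debugging a failing check.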

Check completion:

  • search origin URL
  • check if last visit successful and visit_date >= submit_date
  • browse origin (TBD)
  • check UI status for save code now request
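The "last visit successful and visit_date >= submit_date" condition can be written down as a small predicate. A minimal sketch, assuming each visit is a dict with a `date` (timezone-aware datetime) and a `status` field, and that a successful visit is reported as `"full"`; these names are assumptions for illustration, not confirmed by this task.

```python
from datetime import datetime, timezone


def last_visit_ok(visits, submit_date, ok_status="full"):
    """Return True if the most recent visit succeeded on or after submit_date."""
    if not visits:
        # No visit at all: the save request cannot have completed.
        return False
    last = max(visits, key=lambda visit: visit["date"])
    return last["status"] == ok_status and last["date"] >= submit_date


submit = datetime(2019, 12, 13, 19, 0, tzinfo=timezone.utc)
visits = [
    # Older visit, from before the save request was submitted.
    {"date": datetime(2019, 12, 1, 8, 0, tzinfo=timezone.utc), "status": "full"},
    # Visit triggered by the save request.
    {"date": datetime(2019, 12, 13, 19, 5, tzinfo=timezone.utc), "status": "full"},
]
print(last_visit_ok(visits, submit))
```

Comparing against the submission date matters because the origin is chosen among previously successful ones: an older successful visit must not count as a pass.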

Event Timeline

vlorentz triaged this task as High priority. Dec 3 2019, 3:17 PM
vlorentz lowered the priority of this task from High to Normal. Dec 3 2019, 5:50 PM
vlorentz changed the task status from Open to Work in Progress. Dec 11 2019, 4:12 PM

On a related note, it may be useful to regularly report requests that did not complete (either as success or failure) in a reasonable amount of time after being scheduled.
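Such a report could boil down to a filter over scheduled requests. A rough sketch, assuming each request is a dict with a `status` and a `scheduled_at` timestamp, that `"succeeded"` and `"failed"` are the terminal statuses, and that one hour counts as "reasonable"; all of these are placeholders for illustration.

```python
from datetime import datetime, timedelta, timezone

TERMINAL_STATUSES = {"succeeded", "failed"}  # assumed terminal status names


def stale_requests(requests, now, max_age=timedelta(hours=1)):
    """Return requests still unfinished more than max_age after scheduling."""
    return [
        request
        for request in requests
        if request["status"] not in TERMINAL_STATUSES
        and now - request["scheduled_at"] > max_age
    ]


now = datetime(2019, 12, 13, 20, 0, tzinfo=timezone.utc)
requests = [
    {"id": 1, "status": "succeeded", "scheduled_at": now - timedelta(hours=5)},
    {"id": 2, "status": "scheduled", "scheduled_at": now - timedelta(hours=3)},
    {"id": 3, "status": "scheduled", "scheduled_at": now - timedelta(minutes=5)},
]
print([request["id"] for request in stale_requests(requests, now)])
```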

Damn, I tried to deploy the end to end save code now check but computer says no [1]

Indeed, we have 2 webapps in production now, so the service names clash... I'll dig in
more on Monday though...

In the meantime, this error should not be too blocking on pergamon.

[1]

Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: A duplicate resource was found while collecting exported resources, with the type and title Profile::Icinga2::Objects::E2e_checks_savecodenow[End-to-end SaveCodeNow Check - cran-scrabble with type svn in production] on node pergamon.softwareheritage.org
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

In the meantime, this error should not be too blocking on pergamon.

Fixed.

Cleaned up the duplicated resource in the puppetdb db. Then deactivated the
duplicated resource export on the webapp1 node. Then fixed typos which finally surfaced
(because of the laziness of puppet, which is great ;) And upgraded pergamon with the
latest package, which includes the actual check... And finally deployed and working...
[1]

[1]

19:58 <swhbot> icinga RECOVERY: service staging Check save-code-now parmap with type git end-to-end on pergamon.softwareheritage.org is OK: SAVECODENOW OK - Save code now request for origin ('git', 'https://github.com/rdicosmo/parmap') took 10.41s and succeeded.
19:58 <+ardumont> finally
19:58 <swhbot> icinga RECOVERY: service production Check save-code-now cran-scrabble with type svn end-to-end on pergamon.softwareheritage.org is OK: SAVECODENOW OK - Save code now request for origin ('svn', 'https://github.com/cran/SCRABBLE') took 10.31s and succeeded.
19:59 <swhbot> icinga RECOVERY: service production Check save-code-now parmap with type git end-to-end on pergamon.softwareheritage.org is OK: SAVECODENOW OK - Save code now request for origin ('git', 'https://github.com/rdicosmo/parmap') took 11.01s and succeeded.

It's all a vast joke just to keep me from getting to the weekend (or fixing my plumbing,
really, whichever comes first ;)

There is no hg origin for now, but this is configurable, so as soon as one is found, it will be checked as well.

On a related note, it may be useful to regularly report requests that did not complete (either as success or failure) in a reasonable amount of time after being scheduled.

That's what this kind of check does (we have one for standard deposit and one for metadata-only deposit, one for vault directory cooking, and now one for each loading type that save code now supports: git, svn, hg for now).
It will warn if either ingestion threshold is exceeded or if the visit fails altogether.
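To make the threshold behaviour concrete, here is a sketch of how a check result could be mapped to an icinga state. The exit codes follow the standard Nagios/Icinga plugin convention; the threshold values, the `evaluate` name, and the choice to report a failed visit as CRITICAL are assumptions, and the message only imitates the format seen in the IRC notifications above.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios/Icinga plugin exit codes
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def evaluate(duration_s, succeeded, warn_s=60.0, crit_s=120.0):
    """Map a save code now result to (exit_code, message).

    warn_s and crit_s are placeholder thresholds in seconds.
    """
    if not succeeded:
        code, outcome = CRITICAL, "failed"
    elif duration_s >= crit_s:
        code, outcome = CRITICAL, "succeeded"
    elif duration_s >= warn_s:
        code, outcome = WARNING, "succeeded"
    else:
        code, outcome = OK, "succeeded"
    message = (
        f"SAVECODENOW {LABELS[code]} - "
        f"Save code now request took {duration_s:.2f}s and {outcome}."
    )
    return code, message


code, message = evaluate(10.41, True)
print(code, message)
```

A plugin built this way would print the message and exit with the returned code, which is what icinga uses to decide between OK, WARNING and CRITICAL notifications.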

All checks green, both for production/staging and for hg/svn/git.