Software Heritage

swh-web/coverage: Add origin count for each referenced code provider
Closed, Public

Authored by anlambert on Feb 4 2019, 4:53 PM.

Details

Summary

For each referenced code provider in the archive coverage list, count
the associated number of origins and display it in the coverage widget.

As this operation takes some time (between 1 and 2 minutes to get
all counts), execute it once per day and cache its results in the database.
The cached counts will then be served instead of executing the
underlying long storage queries each time.
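The caching policy described above (run the slow count query at most once a day, otherwise serve the stored value) can be sketched as follows. This is a minimal illustration under stated assumptions, not the swh-web code: a plain dict stands in for the database cache, and `OriginCountCache`, `fake_count` and the injectable clock are hypothetical names introduced for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

ONE_DAY = 24 * 60 * 60  # cache lifetime, in seconds


@dataclass
class CachedCount:
    value: int
    computed_at: float


class OriginCountCache:
    """Serve origin counts from a cache, refreshing each one at most once a day.

    Hypothetical sketch: a plain dict stands in for the database-backed cache.
    """

    def __init__(self, count_fn: Callable[[str], int], ttl: float = ONE_DAY,
                 clock: Callable[[], float] = time.time) -> None:
        self._count_fn = count_fn  # the slow storage count query
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, CachedCount] = {}

    def get(self, url_pattern: str) -> int:
        now = self._clock()
        entry = self._entries.get(url_pattern)
        if entry is None or now - entry.computed_at >= self._ttl:
            # Missing or older than the TTL: run the expensive query again.
            entry = CachedCount(self._count_fn(url_pattern), now)
            self._entries[url_pattern] = entry
        return entry.value


# Usage with a fake storage query and a controllable clock.
calls = []


def fake_count(url_pattern: str) -> int:
    calls.append(url_pattern)
    return 42


fake_now = [0.0]
cache = OriginCountCache(fake_count, clock=lambda: fake_now[0])
cache.get("^https://framagit.org/")  # runs the query
cache.get("^https://framagit.org/")  # served from the cache
fake_now[0] += ONE_DAY
cache.get("^https://framagit.org/")  # expired: runs the query again
print(len(calls))  # 2
```

Note that this synchronous version still blocks the caller whenever an entry has expired; the later discussion in this revision is about avoiding exactly that wait.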

Depends on D1075

Related T1463

Diff Detail

Repository
rDWAPPS Web applications
Branch
coverage-origin-count
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 4009
Build 5256: tox-on-jenkinsJenkins
Build 5255: arc lint + arc unit

Event Timeline

anlambert created this revision. Feb 4 2019, 4:53 PM
anlambert edited the summary of this revision. Feb 4 2019, 4:53 PM

Nitpick on the code style of swh/web/misc/coverage.py: each dict in the list should end with a trailing comma (so that its last line doesn't need to change the next time we add a key-value pair to that dict).
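To illustrate the nitpick, here is a hypothetical excerpt in the style of swh/web/misc/coverage.py. Only the `origin_url_pattern` key comes from this review; `provider_id` and the list name `code_providers` are made up for the example.

```python
# Hypothetical excerpt: only 'origin_url_pattern' is taken from the review.
code_providers = [
    {
        'provider_id': 'framagit',
        # Trailing comma: a key added below this one is a one-line diff.
        'origin_url_pattern': 'https://framagit.org/',
    },
    {
        'provider_id': 'pypi',
        'origin_url_pattern': 'https://pypi.org/',
    },
]
```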

swh/web/assets/src/bundles/webapp/webapp.css
434

As the extra height is for the text, it should be calc(65px + 1em). Or we could remove this property of .swh-coverage and set it to .swh-coverage-logo instead.

swh/web/misc/coverage.py
80

Add an extra slash at the end (you don't want to match https://hal.archives-ouvertes.fr.foobar.com)
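Why the trailing slash matters, assuming the non-regex pattern is matched anywhere in the origin URL (similar to a SQL `ILIKE '%pattern%'` filter; the exact storage semantics are an assumption here). `naive_match` is a hypothetical helper.

```python
def naive_match(url: str, pattern: str) -> bool:
    # Assumed semantics: case-insensitive substring match anywhere in the URL.
    return pattern.lower() in url.lower()


url = "https://hal.archives-ouvertes.fr.foobar.com/some-repo"
print(naive_match(url, "https://hal.archives-ouvertes.fr"))   # True (false positive)
print(naive_match(url, "https://hal.archives-ouvertes.fr/"))  # False
print(naive_match("https://hal.archives-ouvertes.fr/some-deposit",
                  "https://hal.archives-ouvertes.fr/"))       # True
```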

vlorentz added inline comments. Feb 4 2019, 5:20 PM
swh/web/misc/coverage.py
80

Same for gitlab.com, gitlab.inria.fr, and pypi.org.

Since the storage allows regexps, we can use them to make sure each origin_url_pattern only matches URL prefixes, e.g. ^https://framagit.org/ instead of https://framagit.org/. (And something like `[a-z]+://[^/]+.googlecode.com/`.)
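A sketch of the difference an anchor makes, assuming the storage applies the regexp with search semantics (it may match anywhere in the URL, as PostgreSQL's `~` operator does); Python's `re.search` stands in for the database here. Escaping the dots in the googlecode pattern is this sketch's own choice, and the example URLs are illustrative.

```python
import re

FRAMAGIT_SUBSTRING = r"https://framagit.org/"
FRAMAGIT_PREFIX = r"^https://framagit.org/"
# Escaped dots are an addition of this sketch, not part of the comment.
GOOGLECODE_PREFIX = r"^[a-z]+://[^/]+\.googlecode\.com/"


def matches(pattern: str, url: str) -> bool:
    # re.search looks for the pattern anywhere in the URL, so only a
    # leading ^ restricts it to a prefix match.
    return re.search(pattern, url) is not None


# A URL that merely contains a framagit URL somewhere in its path.
embedded = "https://web.archive.org/web/https://framagit.org/foo/bar"
print(matches(FRAMAGIT_SUBSTRING, embedded))                      # True
print(matches(FRAMAGIT_PREFIX, embedded))                         # False
print(matches(FRAMAGIT_PREFIX, "https://framagit.org/foo/bar"))   # True
print(matches(GOOGLECODE_PREFIX, "https://foo.googlecode.com/svn/"))  # True
print(matches(GOOGLECODE_PREFIX, "http://svn.example.com/foo.googlecode.com/"))  # False
```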

vlorentz added inline comments. Feb 4 2019, 5:24 PM
swh/web/templates/coverage.html
45–59

Why use Javascript to retrieve these counters instead of doing it while rendering the page?

anlambert edited the summary of this revision. Feb 5 2019, 2:06 PM
anlambert marked 3 inline comments as done. Feb 5 2019, 4:38 PM

> Since the storage allows regexps, we can use them to make sure each origin_url_pattern only matches URL prefixes, e.g. ^https://framagit.org/ instead of https://framagit.org/. (And something like `[a-z]+://[^/]+.googlecode.com/`.)

Indeed, results will be more accurate when using regexps, but the count queries will take a little longer to execute.
But as count results are cached and the queries are only executed once a day, I have no objection to using regexps.

swh/web/assets/src/bundles/webapp/webapp.css
434

Thanks for the tip! It works great across all browsers.

swh/web/misc/coverage.py
80

ack

swh/web/templates/coverage.html
45–59

Because when count results are not in the cache, or when the cache expires, the count queries must be executed again, which can take a couple of minutes.

So, to avoid waiting for the queries to complete, the coverage page is displayed right away and the count labels are updated once the results are available.

vlorentz marked an inline comment as done. Feb 5 2019, 4:40 PM
vlorentz added inline comments.
swh/web/templates/coverage.html
45–59

ok

anlambert updated this revision to Diff 3397. Feb 5 2019, 6:23 PM

Update:

  • address vlorentz's comments
  • rework cache management for origin counts to avoid sending the same count query twice to the storage database
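One way to avoid sending the same count query twice is to remember the in-flight query per pattern and hand every caller the same future. The sketch below assumes a thread pool runs the queries; `DedupedCounter` and its methods are hypothetical names, and this is not how swh-web necessarily implements it.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class DedupedCounter:
    """Run at most one storage count query per URL pattern at a time."""

    def __init__(self, count_fn: Callable[[str], int],
                 executor: ThreadPoolExecutor) -> None:
        self._count_fn = count_fn
        self._executor = executor
        self._lock = threading.Lock()
        self._in_flight: Dict[str, Future] = {}

    def request(self, url_pattern: str) -> Future:
        with self._lock:
            future = self._in_flight.get(url_pattern)
            if future is None:
                # No query running for this pattern yet: start one.
                future = self._executor.submit(self._run, url_pattern)
                self._in_flight[url_pattern] = future
            return future

    def _run(self, url_pattern: str) -> int:
        try:
            return self._count_fn(url_pattern)
        finally:
            # Forget the query once it is done so a later refresh can rerun it.
            with self._lock:
                self._in_flight.pop(url_pattern, None)


# Usage: two requests arriving while the query runs share one execution.
started = []
release = threading.Event()


def slow_count(url_pattern: str) -> int:
    started.append(url_pattern)
    release.wait()  # simulate a long-running storage query
    return 7


with ThreadPoolExecutor(max_workers=4) as pool:
    counter = DedupedCounter(slow_count, pool)
    f1 = counter.request("^https://gitlab.com/")
    f2 = counter.request("^https://gitlab.com/")  # reuses the running query
    release.set()
    print(f1 is f2, f1.result())  # True 7
print(len(started))  # 1
```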

anlambert updated this revision to Diff 3414. Feb 6 2019, 4:43 PM

Update:

  • slightly rework the cache mechanism: return the previous count value instead of -1 while a new count query is being processed
  • only display origin counts in the UI if all have been computed
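The two behaviours in this update (keep serving the previous value during a refresh instead of a -1 placeholder, and only expose counts once every provider has one) can be sketched as below. `OriginCounts` and its methods are hypothetical; `None` here plays the role of "never computed".

```python
from typing import Dict, Iterable, Optional, Set


class OriginCounts:
    """Keep the last known count per provider while a refresh is running."""

    def __init__(self) -> None:
        self._values: Dict[str, int] = {}
        self._refreshing: Set[str] = set()

    def start_refresh(self, provider: str) -> None:
        self._refreshing.add(provider)

    def is_refreshing(self, provider: str) -> bool:
        return provider in self._refreshing

    def finish_refresh(self, provider: str, value: int) -> None:
        self._values[provider] = value
        self._refreshing.discard(provider)

    def get(self, provider: str) -> Optional[int]:
        # The previous value stays visible during a refresh instead of a
        # -1 placeholder; None means the count was never computed.
        return self._values.get(provider)

    def for_display(self, providers: Iterable[str]) -> Optional[Dict[str, int]]:
        # Show counts only if every provider has one; otherwise show none.
        counts: Dict[str, int] = {}
        for provider in providers:
            value = self.get(provider)
            if value is None:
                return None
            counts[provider] = value
        return counts


counts = OriginCounts()
counts.finish_refresh("gitlab", 100)
counts.start_refresh("gitlab")
print(counts.get("gitlab"))                     # 100
print(counts.for_display(["gitlab", "pypi"]))   # None
counts.finish_refresh("pypi", 5)
print(counts.for_display(["gitlab", "pypi"]))   # {'gitlab': 100, 'pypi': 5}
```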

vlorentz accepted this revision. Feb 7 2019, 3:30 PM
This revision is now accepted and ready to land. Feb 7 2019, 3:30 PM
anlambert updated this revision to Diff 3545. Feb 12 2019, 2:51 PM

Update: rebase, bump the storage version, add a configuration key to enable/disable the origin counts in the coverage page
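A configuration switch like the one added here could gate the count queries roughly as follows. The key name `coverage_count_origins` and the helper `coverage_page_context` are assumptions for illustration; the review does not show the actual key.

```python
from typing import Any, Callable, Dict, Mapping, Optional

# Hypothetical configuration key name (not given in the review).
COUNT_ORIGINS_KEY = "coverage_count_origins"


def coverage_page_context(config: Mapping[str, Any],
                          get_counts: Callable[[], Dict[str, int]]) -> Dict[str, Any]:
    # When the flag is off (the default here), the slow count queries are
    # never triggered and the template receives no counts.
    enabled = bool(config.get(COUNT_ORIGINS_KEY, False))
    counts: Optional[Dict[str, int]] = get_counts() if enabled else None
    return {"count_origins": enabled, "origin_counts": counts}


def fake_counts() -> Dict[str, int]:
    return {"pypi": 5}


print(coverage_page_context({}, fake_counts))
# {'count_origins': False, 'origin_counts': None}
print(coverage_page_context({COUNT_ORIGINS_KEY: True}, fake_counts))
# {'count_origins': True, 'origin_counts': {'pypi': 5}}
```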

This revision was automatically updated to reflect the committed changes.