This would have caught T3502 earlier too.
Dec 3 2021
Aug 26 2021
Aug 3 2021
Those metrics will be computed in production on a regular basis, probably daily, to keep them up to date.
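For illustration, a minimal sketch of what such a daily refresh could wrap, assuming the swh.scheduler backend exposes an update_metrics method (the backend class name and connection string below are assumptions):

    from swh.scheduler import get_scheduler

    # Backend class and connection string are assumptions; in production
    # these would come from the deployed configuration.
    scheduler = get_scheduler(cls="postgresql", db="dbname=softwareheritage-scheduler")

    # Recompute the per-lister origin metrics; a daily cron job or systemd
    # timer would wrap this single call.
    scheduler.update_metrics()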
Jul 29 2021
Jul 23 2021
In T3127#67581, @anlambert wrote:
I am a bit puzzled by the numbers shown: really, we have only 200k origins for GitLab.com?
Indeed there is something weird here as we have more than one million gitlab.com origins in the database.
softwareheritage=> select count(*) from origin where url like 'https://gitlab.com/%';
  count
---------
 1023499
(1 row)
Looks like something was missed when computing lister metrics from the scheduler database; this needs further investigation.

Indeed, please do look into this, thanks.
Jul 22 2021
In T3127#67581, @anlambert wrote:
I am a bit puzzled by the numbers shown: really, we have only 200k origins for GitLab.com?

Indeed there is something weird here as we have more than one million gitlab.com origins in the database.
softwareheritage=> select count(*) from origin where url like 'https://gitlab.com/%';
  count
---------
 1023499
(1 row)
Looks like something was missed when computing lister metrics from the scheduler database; this needs further investigation.
Jul 21 2021
I am a bit puzzled by the numbers shown: really, we have only 200k origins for GitLab.com?
And we know we had some 1.5m origins for Google Code, why only 700k shown here?
Instead, we could split the coverage widget into two tabs:
- one giving a high level overview of the archived origins, similar to what we have now with logos and counters
- one giving the details of all forges we archived so far, displayed in a table as you suggested with relevant metrics and links to search origins for a given forge
Jul 19 2021
I think we could also get an accurate count of deposit origins (HAL, IPOL) using the swh-deposit API.
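A rough sketch of how such a count could be obtained, assuming a deposit listing endpoint with DRF-style pagination (the URL, the response fields, and the status value below are assumptions, not the actual swh-deposit API):

    import requests

    # Hypothetical listing endpoint; the real swh-deposit route and its
    # response schema may differ.
    DEPOSITS_URL = "https://deposit.softwareheritage.org/1/private/deposits/"

    def count_deposit_origins() -> int:
        count = 0
        url = DEPOSITS_URL
        while url:
            page = requests.get(url).json()
            # Only deposits that produced an origin in the archive.
            count += sum(1 for d in page.get("results", []) if d.get("status") == "done")
            url = page.get("next")  # assumed DRF-style pagination
        return count

    print(count_deposit_origins())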
Jul 16 2021
Only one nit about the display: using modal windows/popovers means there will be no easy way, as a user, to get the full list: one will have to click on each logo one by one, which could be quite annoying.

Would it be possible to have a page with a rendering of the table above? (Not sure if we want all columns, but at least the last update time and the number of origins per forge instance look relevant and interesting to me.) It could be either in addition to what you propose (e.g., as a "coverage details" link leading to the full page), or as a replacement of it (e.g., by making each forge icon just a link to the relevant anchor within the table on the "coverage details" page).
Thanks for this update, great work!
Jul 13 2021
A report on what has been done so far and some future directions regarding the display of those data in swh-web.
Jul 9 2021
Precise metrics about listed origins and their counts will be retrieved from the scheduler database, so there is no need to backfill origins with swh-counters; closing this.
Jun 23 2021
In T3127#66684, @anlambert wrote:
As @olasd said in a previous comment, even if we compute the metrics, we will miss counters for origins not tied to a lister (googlecode and gitorious for instance). So I am thinking again about a hybrid approach using the swh-counters metrics implemented yesterday, which give a rough estimate of the number of origins by network location (as visit statuses are not processed, only origins), and the scheduler metrics.
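Purely as an illustration of that hybrid approach (the function name and all figures below are made up): use the rough swh-counters estimates as a baseline, and let the precise scheduler metrics take precedence for forges that do have a lister.

    # Hypothetical inputs:
    #   counter_estimates: rough origin counts per netloc, from swh-counters
    #   scheduler_counts: precise origin counts per netloc, from scheduler metrics
    def merge_origin_counts(counter_estimates: dict, scheduler_counts: dict) -> dict:
        merged = dict(counter_estimates)  # rough estimates as a baseline
        merged.update(scheduler_counts)   # precise scheduler counts take precedence
        return merged

    merged = merge_origin_counts(
        {"code.google.com": 1_500_000,  # no lister: estimate only
         "gitorious.org": 100_000,      # made-up figure
         "gitlab.com": 980_000},        # made-up figure
        {"gitlab.com": 1_023_499},      # listed by a lister: precise
    )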
In T3127#66673, @ardumont wrote:
I guess the CLI to update metrics is executed periodically in production?

I don't think it is yet, but that just got a priority increase now ;)

I guess the CLI to update metrics is executed periodically in production?
The existing scheduler metrics are probably not complete enough for all we want to display (we should review them so they are), but the swh.scheduler journal client already gathers all the information needed, so we should be able to compute all that we need from the scheduler tables.
After more thoughts about all those metrics, we could revamp the coverage widget into two tabs:
- one tab displaying metrics about loaded origins with detailed counts by forge and links to search interface to browse them
- one tab displaying metrics about listed origins from the data extracted from the scheduler database
In T3127#66665, @ardumont wrote:
For information, discussing with @olasd, he reminded me that we already have a CLI entry point [1] to compute stats about what we want scheduler-side.

What's missing implementation-wise would be to expose an endpoint to actually display said information.
So the question is: even though the swh.counters implementation has started, do we really want that there, or would this ^ scheduler-side approach be enough?
Sorry @anlambert, I was late at Monday's meeting and completely missed this in your weekly plan; I would have pointed this out earlier.
For information, discussing with @olasd, he reminded me that we already have a CLI entry point [1] to compute stats about what we want scheduler-side.
Jun 22 2021
In T3127#66631, @rdicosmo wrote:
Nice to see this moving forward!
These entries in the counter log look suspicious, though; they are not origins:
b'atlassian@bitbucket.org'     2
b'taylorhakes@github.com'      2
b'bunnyhero@bitbucket.org'     1
b'dtrebbien@bitbucket.org'     1
b'eldargab@github.com'         1
b'git@github.com'              1
b'schierlm@git.code.sf.net'    1
b'tomakehurst@github.com'      1
b'wenshao@github.com'          1
b'zimbra-mirror@bitbucket.org' 1
Regarding this, to ease the mapping between a lister and an instance name, we may want to rework the instance names in the scheduler
model (listers table) so that the value is actually the netloc of the origin.
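As an illustration of both points, the suspicious entries above look like network locations extracted with urlsplit().netloc, which keeps the userinfo part of ssh-style URLs; using .hostname instead yields the bare forge netloc (the example URL is made up, and the actual cause would need checking in swh-counters):

    from urllib.parse import urlsplit

    url = "ssh://atlassian@bitbucket.org/some/repo.git"
    print(urlsplit(url).netloc)    # 'atlassian@bitbucket.org' (userinfo kept)
    print(urlsplit(url).hostname)  # 'bitbucket.org' (userinfo stripped)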
Great work! Awesome.
After some analysis, the data we need to properly implement this are (see the sketch after this list):
- the set of lister names and their instance names in order to organize origins by forge types (gitlab, cgit, sourceforge, ...)
- a precise or estimated count for the origins listed by a given lister instance
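For instance, both pieces of data could come from a single query against the scheduler database; a sketch assuming the listers and listed_origins tables of swh-scheduler (table and column names to be checked against the actual schema):

    import psycopg2

    # Connection string is an assumption.
    conn = psycopg2.connect("dbname=softwareheritage-scheduler")
    with conn.cursor() as cur:
        # Origin counts per lister instance, e.g. ('gitlab', 'gitlab.com', 1023499)
        cur.execute(
            """
            select l.name, l.instance_name, count(*)
            from listed_origins lo
            join listers l on l.id = lo.lister_id
            group by l.name, l.instance_name
            order by count(*) desc
            """
        )
        for name, instance, count in cur.fetchall():
            print(f"{name}/{instance}: {count} origins")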
May 28 2021
Now what's missing here (not sure how hard it is) is the mean and max ingestion time of save code now requests (the time between a request being accepted and the loader task being over).
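A sketch of how those two aggregates could be computed from the save code now model mentioned later in this task (request_date and visit_date); the model import path and the Django setup are assumptions:

    from django.db.models import Avg, DurationField, ExpressionWrapper, F, Max

    # Assumed import path for the swh-web model exposing request_date/visit_date.
    from swh.web.common.models import SaveOriginRequest

    ingestion_time = ExpressionWrapper(
        F("visit_date") - F("request_date"), output_field=DurationField()
    )
    stats = (
        SaveOriginRequest.objects.filter(visit_date__isnull=False)
        .annotate(ingestion_time=ingestion_time)
        .aggregate(mean=Avg("ingestion_time"), max=Max("ingestion_time"))
    )
    print(stats)  # {'mean': timedelta(...), 'max': timedelta(...)}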
Apr 23 2021
Apr 20 2021
In T1481#63785, @douardda wrote:
I think the "submitted requests per visit type / status" graph should be split in two. Both accepted and rejected are cumulative values that will grow indefinitely, while pending is a transient value aiming to stay near zero, so it makes no sense to have them on the same graph.

Note that there is the same transient vs cumulative discrepancy on the "Accepted requests" graph.

I think the "submitted requests per visit type / status" graph should be split in two. Both accepted and rejected are cumulative values that will grow indefinitely, while pending is a transient value aiming to stay near zero, so it makes no sense to have them on the same graph.
Since there is already a graph dedicated to pending requests, pending requests should simply be removed from the submitted requests graph.
Apr 12 2021
Apr 9 2021
I've tentatively updated the save code now dashboard [1]
with that ^ new metric deployed in staging and production instances.
Apr 8 2021
Apr 7 2021
As a heads up, we can already determine some basic metrics out of the postgres db, e.g. the time to process a "save code now" request (including "take snapshot now").
The archive computes its own prometheus metrics regarding save code now [1].
Also, the save code now model exposes a request_date and a visit_date [2].
So a first approximation would be to use those two fields and expose a new, adapted metric.
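A minimal sketch of such an adapted metric with the prometheus_client library; the metric name, the port, and the query helper are assumptions:

    import time
    from prometheus_client import Gauge, start_http_server

    # Hypothetical gauge fed from the request_date / visit_date fields.
    SCN_MAX_DELAY = Gauge(
        "swh_web_save_code_now_max_delay_seconds",
        "Max delay between a save code now request and the end of its visit",
    )

    def fetch_max_delay_seconds() -> float:
        # Placeholder: in swh-web this would aggregate visit_date - request_date
        # over the save code now requests in the postgres db.
        return 0.0

    if __name__ == "__main__":
        start_http_server(9100)  # port is an assumption
        while True:
            SCN_MAX_DELAY.set(fetch_max_delay_seconds())
            time.sleep(60)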
Mar 15 2021
Mar 14 2021
Mar 4 2021
Feb 10 2021
Feb 5 2021
It seems there were some huge queries in the last few days [1]; the script needed to be adapted to use Long instead of Integer:
apache_logs-2021.01.14:
{
  "error" : {
    "type" : "script_exception",
    "reason" : "runtime error",
    "script_stack" : [
      "java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68)",
      "java.base/java.lang.Integer.parseInt(Integer.java:652)",
      "java.base/java.lang.Integer.parseInt(Integer.java:770)",
      "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Integer.parseInt(ctx._source.bytes) : ctx._source.bytes; ",
      "                                                                    ^---- HERE"
    ],
    "script" : "ctx._source.bytes = ctx._source.bytes instanceof java.lang.String ? Integer.parseInt(ctx._source.bytes) : ctx._source.bytes; ctx._source.response = ctx._source.response instanceof java.lang.String ? Integer.parseInt(ctx._source.response) : ctx._source.response;",
    "lang" : "painless",
    "position" : { "offset" : 96, "start" : 0, "end" : 125 },
    "caused_by" : {
      "type" : "number_format_exception",
      "reason" : "For input string: \"4633815064\""
    }
  },
  "status" : 400
}
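For reference, a sketch of the adapted update-by-query using Long.parseLong, via the elasticsearch Python client (the endpoint is an assumption; the actual fix lives in the paste's script):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # endpoint is an assumption

    # Same conversion as before, but Long.parseLong handles values such as
    # 4633815064 that overflow a java.lang.Integer.
    script = (
        "ctx._source.bytes = ctx._source.bytes instanceof String"
        " ? Long.parseLong(ctx._source.bytes) : ctx._source.bytes;"
        " ctx._source.response = ctx._source.response instanceof String"
        " ? Long.parseLong(ctx._source.response) : ctx._source.response;"
    )

    es.update_by_query(
        index="apache_logs-2021.01.14",
        body={"script": {"source": script, "lang": "painless"}},
    )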
Feb 4 2021
The open apache indexes are currently being migrated with P940's script.
The log parsing is OK.
An elasticsearch datasource was created in grafana, so we can now create some graphs based on the logs in elasticsearch.
A simple dashboard displaying some statistics based on the apache logs was initiated [1]; it appears the design is not as simple as in kibana and has some limitations, but it still allows having basic information centralized in grafana.
Feb 2 2021
Configuration deployed for the webapp on all servers; the logs now include the duration, which is parsed into the elasticsearch entries.
Jan 29 2021
Nov 17 2020
The varnish logs should also be ingested into elasticsearch to get fine-grained statistics.
Nov 3 2020
Oct 26 2020
Oct 16 2020
This can be closed now.
Sep 22 2020
I think the second point mostly happened: the storage is returning statistics to the loader, but the loaders don't generally collect them.
We've definitely improved on this (notably using proper hostnames for the instance label on prom metrics). I think we should make this task more actionable if we want to keep it open.
Apr 21 2020
I'm pretty sure this is done now ;p
Feb 15 2020
Jan 27 2020
Jan 23 2020
Deployed.
Jan 22 2020
Adapting the puppet manifest so we can discriminate issues per environment in sentry.
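For context, the per-environment discrimination boils down to passing an environment value when initializing the Sentry SDK; a sketch with sentry-sdk, where the DSN and environment are placeholders that puppet would template in:

    import sentry_sdk

    sentry_sdk.init(
        dsn="https://publickey@sentry.softwareheritage.org/1",  # placeholder DSN
        environment="staging",  # e.g. "staging" or "production", set via puppet
    )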
Vault check deployed!
Deposit check deployed!
debian package this