Changeset View
Standalone View
swh/web/common/origin_save.py
Show First 20 Lines • Show All 616 Lines • ▼ Show 20 Lines | |||||
_accepted_save_requests_gauge = Gauge( | _accepted_save_requests_gauge = Gauge( | ||||
name=ACCEPTED_SAVE_REQUESTS_METRIC, | name=ACCEPTED_SAVE_REQUESTS_METRIC, | ||||
documentation="Number of accepted origin save requests", | documentation="Number of accepted origin save requests", | ||||
labelnames=["load_task_status", "visit_type"], | labelnames=["load_task_status", "visit_type"], | ||||
registry=SWH_WEB_METRICS_REGISTRY, | registry=SWH_WEB_METRICS_REGISTRY, | ||||
) | ) | ||||
# Metric on the delay of save code now request per status and visit_type. This is the | |||||
anlambert: Looking at the value computed below, this does not correspond to a save request duration but… | |||||
Done Inline Actionsindeed. ardumont: indeed. | |||||
# time difference between the save code now is requested and the time it got ingested. | |||||
ACCEPTED_SAVE_REQUESTS_DELAY_METRIC = "swh_web_save_requests_delay_seconds" | |||||
Not Done Inline Actionsyou need to change that identifier as it is the same as the gauge above. anlambert: you need to change that identifier as it is the same as the gauge above. | |||||
Done Inline Actionsugh right, how did i miss that?! ardumont: ugh right, how did i miss that?! | |||||
Done Inline Actionsah i know, i renamed it from the original name with counter to gauge. ardumont: ah i know, i renamed it from the original name with counter to gauge.
And i missed the clash.
| |||||
_accepted_save_requests_delay_gauge = Gauge( | |||||
name=ACCEPTED_SAVE_REQUESTS_DELAY_METRIC, | |||||
documentation="Save Requests Duration", | |||||
labelnames=["load_task_status", "visit_type"], | |||||
registry=SWH_WEB_METRICS_REGISTRY, | |||||
) | |||||
def compute_save_requests_metrics() -> None: | def compute_save_requests_metrics() -> None: | ||||
"""Compute a couple of Prometheus metrics related to | """Compute Prometheus metrics related to origin save requests: | ||||
origin save requests""" | |||||
- Number of submitted origin save requests | |||||
- Number of accepted origin save requests | |||||
- Save Code Now requests delay between request time and actual time of | |||||
successful/failed ingestion | |||||
anlambertUnsubmitted Not Done Inline Actionsactual time of visit by Software Heritage visits are created at the beginning of the loading so final status should not be mentioned here. anlambert: actual time of visit by Software Heritage
visits are created at the beginning of the loading… | |||||
anlambertUnsubmitted Not Done Inline Actionsor simply "actual time of ingestion" anlambert: or simply "actual time of ingestion" | |||||
""" | |||||
request_statuses = ( | request_statuses = ( | ||||
SAVE_REQUEST_ACCEPTED, | SAVE_REQUEST_ACCEPTED, | ||||
SAVE_REQUEST_REJECTED, | SAVE_REQUEST_REJECTED, | ||||
SAVE_REQUEST_PENDING, | SAVE_REQUEST_PENDING, | ||||
) | ) | ||||
load_task_statuses = ( | load_task_statuses = ( | ||||
Show All 12 Lines | def compute_save_requests_metrics() -> None: | ||||
for labels in labels_set: | for labels in labels_set: | ||||
_submitted_save_requests_gauge.labels(*labels).set(0) | _submitted_save_requests_gauge.labels(*labels).set(0) | ||||
labels_set = product(load_task_statuses, visit_types) | labels_set = product(load_task_statuses, visit_types) | ||||
for labels in labels_set: | for labels in labels_set: | ||||
_accepted_save_requests_gauge.labels(*labels).set(0) | _accepted_save_requests_gauge.labels(*labels).set(0) | ||||
duration_load_task_statuses = ( | |||||
SAVE_TASK_FAILED, | |||||
SAVE_TASK_SUCCEEDED, | |||||
) | |||||
for labels in product(duration_load_task_statuses, visit_types): | |||||
_accepted_save_requests_delay_gauge.labels(*labels).set(0) | |||||
for sor in SaveOriginRequest.objects.all(): | for sor in SaveOriginRequest.objects.all(): | ||||
if sor.status == SAVE_REQUEST_ACCEPTED: | if sor.status == SAVE_REQUEST_ACCEPTED: | ||||
_accepted_save_requests_gauge.labels( | _accepted_save_requests_gauge.labels( | ||||
load_task_status=sor.loading_task_status, visit_type=sor.visit_type | load_task_status=sor.loading_task_status, visit_type=sor.visit_type | ||||
).inc() | ).inc() | ||||
_submitted_save_requests_gauge.labels( | _submitted_save_requests_gauge.labels( | ||||
status=sor.status, visit_type=sor.visit_type | status=sor.status, visit_type=sor.visit_type | ||||
).inc() | ).inc() | ||||
if ( | |||||
sor.loading_task_status in (SAVE_TASK_SUCCEEDED, SAVE_TASK_FAILED) | |||||
Done Inline ActionsDo you think it's sensible enough? Or should i just add that check within the if conditional just above? ardumont: Do you think it's sensible enough? Or should i just add that check within the if conditional… | |||||
Not Done Inline ActionsOh right, you should not assert on visit_date as it can be None before a visit is found. anlambert: Oh right, you should not assert on visit_date as it can be `None` before a visit is found. | |||||
and sor.visit_date is not None | |||||
and sor.request_date is not None | |||||
): | |||||
delay = sor.visit_date.timestamp() - sor.request_date.timestamp() | |||||
_accepted_save_requests_delay_gauge.labels( | |||||
load_task_status=sor.loading_task_status, visit_type=sor.visit_type | |||||
).inc(delay) | |||||
anlambertUnsubmitted Not Done Inline ActionsI am wondering if computing the average delay time would not be a better choice here. If I understand correctly, you send to prometheus the cumulative delay times for all requests with a swh visit. Any thoughts on this ? anlambert: I am wondering if computing the average delay time would not be a better choice here.
If I… | |||||
ardumontAuthorUnsubmitted Done Inline ActionsYes, but i'll let prometheus/grafana do the computations, the rate (as screenshot above) and the avg for example. ardumont: Yes, but i'll let prometheus/grafana do the computations, the rate (as screenshot above) and… | |||||
anlambertUnsubmitted Not Done Inline ActionsAck, I guess you can get the number of requests by querying the other metrics (number of requests per visit type and statuses), right ? anlambert: Ack, I guess you can get the number of requests by querying the other metrics (number of… | |||||
ardumontAuthorUnsubmitted Done Inline ActionsI don't think we need the number of requests, we want a tendency which we can have using I've updated the diff description with screenshots to demo it as well. But yeah, i think we can compose other metrics if need be. [1] https://prometheus.io/docs/prometheus/latest/querying/functions/#aggregation_over_timedescription ardumont: I don't think we need the number of requests, we want a tendency which we can have using
the… | |||||
anlambertUnsubmitted Not Done Inline Actions
If we want an accurate average delay time, yes we do. But this can be computed from Prometheus so current metric looks good to me. anlambert: > I don't think we need the number of requests, we want a tendency which we can have using
>… |
Looking at the value computed below, this does not correspond to a save request duration but rather to the delay between the request submission and the swh visit so ACCEPTED_SAVE_REQUESTS_DELAY_METRIC feels like a better name.