Web API: per-user accounting
Closed, Migrated

Description

We are using authenticated users for both individuals and for institutions, and we can assign to them non-default rate limits.
We want to be able to do some accounting of who uses which endpoints, and how much, over a long enough time period (e.g., one year).
That will be helpful in general (to answer questions like: which endpoints are over/underused for specific use cases) and also for seeing who over/underuses rate limits (e.g., to identify the need for more generous rate limits for specific use cases).

To this end we need to store per-user logs about Web API usage, with a decent retention policy, and also have a way to query them.

(As discussed yesterday with @anlambert and @vsellier, implementing this might require, as a sub-task, forwarding first django web app logs to kibana, which is not currently the case. Please file that as a separate sub-task if that's actually needed.)

Event Timeline

zack triaged this task as Low priority. May 7 2021, 9:48 AM
zack created this task.

@anlambert @vsellier: question about this, in order to document the status quo.
Currently, where are the django web app logs stored and for how long are they kept?

Django webapp logs are currently not well managed.

They are dumped into a non-rotated logfile on moma and only contain information about requests that ended in errors (meaning general access is not logged at all).

@vsellier and I were considering redirecting those logs to systemd in order to ease their ingestion by Logstash.

The log format and levels should also be changed in order to track authenticated users' access.
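
To make the log format change a bit more concrete, here is a minimal sketch using only Python's standard `logging` module. It is not existing swh-web code: the logger name `swh.web.access`, the field names, the example path and the `log_api_access` helper are all assumptions made for illustration. In a Django app, something like this would presumably be called from a middleware, with the user name taken from the authenticated request.

```python
import json
import logging


class JsonAccessFormatter(logging.Formatter):
    """Render each access record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            # Fields below are passed through `extra=`; names are hypothetical.
            "user": getattr(record, "user", "anonymous"),
            "method": getattr(record, "method", None),
            "endpoint": getattr(record, "endpoint", None),
            "status": getattr(record, "status", None),
        }
        return json.dumps(payload)


def make_access_logger(stream=None):
    """Build a logger writing JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger("swh.web.access")  # hypothetical logger name
    logger.setLevel(logging.INFO)  # INFO so that successful requests are logged too
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonAccessFormatter())
    logger.addHandler(handler)
    return logger


def log_api_access(logger, user, method, endpoint, status):
    """Log one API request together with the user who made it."""
    logger.info(
        "api access",
        extra={"user": user, "method": method, "endpoint": endpoint, "status": status},
    )


if __name__ == "__main__":
    access_logger = make_access_logger()
    log_api_access(access_logger, "alice", "GET", "/api/1/origin/", 200)
```

Writing one JSON object per line to stderr would let systemd capture the output and should keep parsing on the Logstash side simple, but the exact fields to record are open for discussion.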

I recall some work has been done on that: Apache logs are currently pushed through
Logstash to Elasticsearch (but I don't think the format has been changed yet).

There are boards on grafana, gunicorn-swh-webapp [1] and gunicorn-swh-deposit [2], which exploit those logs.

[1] http://kibana0.internal.softwareheritage.org:5601/goto/5242ef5e080731a742603d76e4c8f7d7

[2] http://kibana0.internal.softwareheritage.org:5601/goto/cdce946ede05a52a927415feb74f8284

Those are apache logs and not django application logs.

We want to use Django logs here, as they let us get information about authenticated users.

Right, ok. We actually do not have any of those in production (I don't recall having seen them at all, and I don't even know what they would look like).

Thanks for starting this. Here are some more metrics we need to have in an easy to use dashboard:

  • how many different users are using a particular service or class of service over a given period of time (resolution can be day/week/month/year)
  • what is the behaviour of a given user or class of users over a given period of time (what services are used, which frequencies, etc.)
  • evolution of the number of distinct users over time (we want to show that adoption is growing, possibly with some nice geographical information)
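
The three metrics above could be computed from per-request access records along these lines. This is only a sketch under stated assumptions: it supposes each record is a dict with `time` (ISO 8601), `user` and `endpoint` keys, which is a hypothetical shape and not an existing log format; in practice the same aggregations would more likely be expressed as Elasticsearch queries behind a dashboard.

```python
from collections import defaultdict
from datetime import datetime


def period_key(timestamp, resolution):
    """Map an ISO 8601 timestamp to a day/week/month/year bucket label."""
    dt = datetime.fromisoformat(timestamp)
    if resolution == "day":
        return dt.strftime("%Y-%m-%d")
    if resolution == "week":
        year, week, _ = dt.isocalendar()
        return f"{year}-W{week:02d}"
    if resolution == "month":
        return dt.strftime("%Y-%m")
    if resolution == "year":
        return dt.strftime("%Y")
    raise ValueError(f"unknown resolution: {resolution}")


def distinct_users_per_endpoint(records, resolution="month"):
    """Number of different users per (period, endpoint)."""
    users = defaultdict(set)
    for rec in records:
        key = (period_key(rec["time"], resolution), rec["endpoint"])
        users[key].add(rec["user"])
    return {key: len(names) for key, names in users.items()}


def endpoint_counts_for_user(records, user, resolution="month"):
    """Number of requests made by one user per (period, endpoint)."""
    counts = defaultdict(int)
    for rec in records:
        if rec["user"] == user:
            counts[(period_key(rec["time"], resolution), rec["endpoint"])] += 1
    return dict(counts)


def distinct_users_over_time(records, resolution="month"):
    """Number of distinct users per period, ordered by period label."""
    users = defaultdict(set)
    for rec in records:
        users[period_key(rec["time"], resolution)].add(rec["user"])
    return {period: len(names) for period, names in sorted(users.items())}
```

Geographical information is left out of this sketch, since it would require data (such as client addresses) that the assumed records do not carry.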

For the retention policy, is there something that would prevent us from keeping the data indefinitely?

anlambert raised the priority of this task from Low to Normal. May 27 2021, 3:07 PM