This makes it possible to return origin counts per visit type.
It also lets other components, such as swh-web, retrieve the list of
available visit types dynamically.
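The underlying Elasticsearch query was tested on the production
cluster and is reasonably efficient: in the run below it reported
"took": 940 (ms) across 90 shards, for about 1.2 s of wall-clock time.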
    (swh) ✔ ~/swh/swh-environment/swh-search [count-visit-types L|⚑ 3]
    18:27 $ ssh -L 9200:192.168.100.86:9200 search-esnode4.internal.softwareheritage.org
    Linux search-esnode4 5.10.0-0.bpo.5-amd64 #1 SMP Debian 5.10.24-1~bpo10+1 (2021-03-29) x86_64
    The programs included with the Debian GNU/Linux system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.
    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
    permitted by applicable law.
    Last login: Wed Aug 25 16:26:42 2021 from 192.168.101.15
    anlambert@search-esnode4:~$
    anlambert@carnavalet:~/tmp$ time curl -X POST http://localhost:9200/origin-production/_search?pretty -H 'Content-Type: application/json' -d '
    {
      "aggs" : {
        "not_blocklisted" : {
          "filter": {
            "bool": {
              "must_not": [
                {"term": {"blocklisted": true}}
              ]
            }
          },
          "aggs": {
            "visit_types": {
              "terms" : { "field" : "visit_types", "size": 1000 }
            }
          }
        }
      },
      "size" : 0
    }'
    {
      "took" : 940,
      "timed_out" : false,
      "_shards" : {
        "total" : 90,
        "successful" : 90,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : { "value" : 10000, "relation" : "gte" },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "not_blocklisted" : {
          "doc_count" : 162289904,
          "visit_types" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "git", "doc_count" : 154006431 },
              { "key" : "npm", "doc_count" : 1660597 },
              { "key" : "svn", "doc_count" : 679040 },
              { "key" : "hg", "doc_count" : 415270 },
              { "key" : "pypi", "doc_count" : 398714 },
              { "key" : "deb", "doc_count" : 72303 },
              { "key" : "cran", "doc_count" : 18019 },
              { "key" : "ftp", "doc_count" : 1205 },
              { "key" : "deposit", "doc_count" : 1114 },
              { "key" : "tar", "doc_count" : 390 },
              { "key" : "nixguix", "doc_count" : 2 }
            ]
          }
        }
      }
    }

    real    0m1,168s
    user    0m0,012s
    sys     0m0,005s
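For reference, the query above and the handling of its response can be sketched in Python. This is only an illustration, not the code added to swh-search: the helper names `build_visit_type_counts_query` and `parse_visit_type_counts` are hypothetical, and the sample response is a trimmed copy of the output above. The sketch performs no network call; it only builds the request body and reads the buckets.

```python
from typing import Any, Dict


def build_visit_type_counts_query(max_visit_types: int = 1000) -> Dict[str, Any]:
    """Build the aggregation body used in the curl call (hypothetical helper).

    Blocklisted origins are filtered out, then a terms aggregation counts
    the remaining origins for each value of the "visit_types" field.
    """
    return {
        "aggs": {
            "not_blocklisted": {
                "filter": {
                    "bool": {"must_not": [{"term": {"blocklisted": True}}]}
                },
                "aggs": {
                    "visit_types": {
                        "terms": {"field": "visit_types", "size": max_visit_types}
                    }
                },
            }
        },
        # No search hits are needed, only the aggregation result.
        "size": 0,
    }


def parse_visit_type_counts(response: Dict[str, Any]) -> Dict[str, int]:
    """Map each visit type to its origin count (hypothetical helper)."""
    buckets = response["aggregations"]["not_blocklisted"]["visit_types"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Trimmed copy of the production response shown above (first three buckets).
sample_response = {
    "aggregations": {
        "not_blocklisted": {
            "doc_count": 162289904,
            "visit_types": {
                "buckets": [
                    {"key": "git", "doc_count": 154006431},
                    {"key": "npm", "doc_count": 1660597},
                    {"key": "svn", "doc_count": 679040},
                ]
            },
        }
    }
}

counts = parse_visit_type_counts(sample_response)
print(counts["git"])   # 154006431
print(sorted(counts))  # ['git', 'npm', 'svn']
```

The keys of the returned mapping double as the list of available visit types, which is what a consumer such as swh-web would need.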
Related to T3441.