Feb 2 2021

vsellier requested review of D5000: deposit: add request duration on access logs.
Feb 2 2021, 7:05 PM
vsellier added a revision to T2787: Improve access_logs parsing: D5000: deposit: add request duration on access logs.
Feb 2 2021, 7:05 PM · System administration, Metrics/monitoring
vsellier added a comment to T2975: Disk replacement on esnode1.
  • partition recreated :
# sfdisk -d /dev/sda | sfdisk -f /dev/sdb
  • zfs pool recreated with the wwn ids :
root@esnode1:/etc/zfs# zpool create -f elasticsearch-data -m /srv/elasticsearch/nodes -O atime=off -O relatime=on $(ls /dev/disk/by-id/wwn-*part4)
root@esnode1:/etc/zfs# zpool list
NAME                 SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
elasticsearch-data     7T   152K  7.00T        -         -     0%     0%  1.00x    ONLINE  -
  • server restarted to check everything is ok
  • allocation reactivated :
❯ export ES_NODE=192.168.100.61:9200 
❯ curl -H "Content-Type: application/json" -XPUT http://${ES_NODE}/_cluster/settings\?pretty -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : null
    }
}'
{
  "acknowledged" : true,
  "persistent" : { },
  "transient" : { }
}
  • and in progress :
❯ curl -s http://$ES_NODE/_cat/health\?v; echo; curl -s http://$ES_NODE/_cat/allocation\?v\&s=node
epoch      timestamp cluster          status node.total node.data shards  pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1612285969 17:12:49  swh-logging-prod green           3         3   8974 4487    2    0        0             0                  -                100.0%
Feb 2 2021, 6:15 PM · System administration
vsellier added a comment to T2958: Use all the disks on esnode2 and esnode3.
Feb 2 2021, 6:14 PM · System administration
vsellier added a comment to T2975: Disk replacement on esnode1.

The disk is replaced :

# smartctl -a /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.2-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
Feb 2 2021, 5:52 PM · System administration
vsellier added a comment to T2787: Improve access_logs parsing.

The configuration is deployed for the webapp on all servers. The logs now include the request duration, which is parsed into the Elasticsearch entries:
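As an illustration of parsing an access log line with an optional duration, here is a minimal Python sketch. It is not the logstash grok pattern from the commits above. The line layout is an assumption: Apache "combined" format, optionally followed by the duration in microseconds (what Apache's %D directive emits). The names `ACCESS_LOG` and `parse_access_log` are hypothetical.

```python
import re

# Assumed layout: Apache "combined" format, optionally followed by the
# request duration in microseconds (Apache's %D directive).
ACCESS_LOG = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
    r'(?: (?P<duration_us>\d+))?$'
)


def parse_access_log(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = ACCESS_LOG.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Convert numeric fields to integers so they can be aggregated later.
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    if entry["duration_us"] is not None:
        entry["duration_us"] = int(entry["duration_us"])
    return entry
```

Keeping the duration group optional lets lines written before the format change parse as well, with `duration_us` left as `None`.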

Feb 2 2021, 3:39 PM · System administration, Metrics/monitoring
vsellier committed rSPSITE249f747e9c35: apache: Add the request duration on access logs (authored by vsellier).
apache: Add the request duration on access logs
Feb 2 2021, 3:12 PM
vsellier committed rSPSITEcc35baf50c73: logstash: Add support an optional duration on apache logs (authored by vsellier).
logstash: Add support an optional duration on apache logs
Feb 2 2021, 3:12 PM
vsellier committed rSPSITE8e5ca3287738: webapp: improve access log parsing (authored by vsellier).
webapp: improve access log parsing
Feb 2 2021, 3:12 PM
vsellier closed D4989: Add request durations in access logs and improve logstash's integer parsing.
Feb 2 2021, 3:12 PM
vsellier committed rSPSITE908a635fff3d: webapp: code format (authored by vsellier).
webapp: code format
Feb 2 2021, 3:12 PM
vsellier closed D4974: logstash: fix first puppet run and configuration updates.
Feb 2 2021, 3:12 PM
vsellier committed rSPSITE2cf48d29a464: logstash: fix first puppet run and configuration updates (authored by vsellier).
logstash: fix first puppet run and configuration updates
Feb 2 2021, 3:12 PM
vsellier added a comment to D4989: Add request durations in access logs and improve logstash's integer parsing.

lgtm

Please also update the deposit.pp which can benefit from this as well ;)

Feb 2 2021, 10:19 AM
vsellier updated the diff for D4989: Add request durations in access logs and improve logstash's integer parsing.

Remove wrong float conversion on grok pattern

Feb 2 2021, 10:15 AM
vsellier requested review of D4989: Add request durations in access logs and improve logstash's integer parsing.
Feb 2 2021, 9:55 AM
vsellier added a revision to T2787: Improve access_logs parsing: D4989: Add request durations in access logs and improve logstash's integer parsing.
Feb 2 2021, 9:55 AM · System administration, Metrics/monitoring

Feb 1 2021

vsellier added a comment to T2975: Disk replacement on esnode1.

esnode1 is ready to be stopped :

❯ curl -s http://$ES_NODE/_cat/allocation\?v\&s=node
shards disk.indices disk.used disk.avail disk.total disk.percent host           ip             node
  1482                                                                                         UNASSIGNED
     0           0b     1.7tb        5tb      6.7tb           25 192.168.100.61 192.168.100.61 esnode1
  3767        3.7tb     3.7tb        3tb      6.7tb           55 192.168.100.62 192.168.100.62 esnode2
  3713        3.6tb     3.6tb      3.1tb      6.7tb           54 192.168.100.63 192.168.100.63 esnode3

It will stay in the cluster until the work starts, so that 3 voting nodes remain available if a problem occurs on the other nodes in the meantime.
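The "ready to be stopped" check can be read off the `_cat/allocation` output above: the node holds 0 shards. A minimal Python sketch of that check follows, assuming the text layout shown in the transcript (shard count first, node name last). The function names are hypothetical.

```python
def shards_per_node(cat_allocation_output):
    """Map node name -> shard count from `_cat/allocation?v` text output."""
    lines = [line for line in cat_allocation_output.splitlines() if line.strip()]
    counts = {}
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        # The first column is the shard count and the last is the node name;
        # this also holds for the UNASSIGNED row, which has empty columns.
        counts[fields[-1]] = int(fields[0])
    return counts


def is_drained(cat_allocation_output, node):
    """True when the node is listed and holds no shard."""
    return shards_per_node(cat_allocation_output).get(node) == 0
```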

Feb 1 2021, 6:10 PM · System administration
vsellier closed D4987: cgit: remove the repository urls's trailing /.
Feb 1 2021, 5:50 PM
vsellier committed rDLS8e4dd178f1df: cgit: remove the repository urls's trailing / (authored by vsellier).
cgit: remove the repository urls's trailing /
Feb 1 2021, 5:50 PM
vsellier requested review of D4987: cgit: remove the repository urls's trailing /.
Feb 1 2021, 5:37 PM
vsellier added a revision to T3013: Deploy remaining next-gen listers on staging: D4987: cgit: remove the repository urls's trailing /.
Feb 1 2021, 5:34 PM · System administration, Lister
vsellier committed rSENVc66bb65143a9: vagrant: increase staging.webapp memory (authored by vsellier).
vagrant: increase staging.webapp memory
Feb 1 2021, 3:45 PM
vsellier committed rSENV05d1a18442cc: vagrant: allow network communications between all vms (authored by vsellier).
vagrant: allow network communications between all vms
Feb 1 2021, 3:45 PM
vsellier updated the task description for T3009: Manage backfiller configuration in puppet.
Feb 1 2021, 12:10 PM · System administration
vsellier triaged T3009: Manage backfiller configuration in puppet as Normal priority.
Feb 1 2021, 12:09 PM · System administration
vsellier added a comment to T2975: Disk replacement on esnode1.

esnode1 unallocation started :

❯ export ES_NODE=192.168.100.61:9200
❯ curl -H "Content-Type: application/json" -XPUT http://${ES_NODE}/_cluster/settings\?pretty -d '{ 
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "192.168.100.61"
    }
}'
{
  "acknowledged" : true,
  "persistent" : { },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "exclude" : {
            "_ip" : "192.168.100.61"
          }
        }
      }
    }
  }
}
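
The same transient setting is used twice in T2975: set to the node IP to drain the node, then set to null to allow allocation again. A small Python sketch that builds both request bodies follows. The function name is hypothetical, and sending the body (PUT on `_cluster/settings`, as in the curl call above) is left out.

```python
import json


def allocation_exclude_settings(ip=None):
    """Body for PUT _cluster/settings; ip=None clears the exclusion."""
    return {"transient": {"cluster.routing.allocation.exclude._ip": ip}}


# Drain esnode1, then later re-enable allocation on it.
drain_body = json.dumps(allocation_exclude_settings("192.168.100.61"))
restore_body = json.dumps(allocation_exclude_settings())
```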
Feb 1 2021, 11:40 AM · System administration
vsellier closed T2944: Deploy swh-search v0.4.1, a subtask of T2936: Update the swh-search journal client to only set "has_visit" on "full" status of the visit, as Resolved.
Feb 1 2021, 10:06 AM · Journal, Archive search
vsellier closed T2944: Deploy swh-search v0.4.1 as Resolved.

The backfill is done.

Feb 1 2021, 10:06 AM · System administration, Journal, Archive search
vsellier added a comment to T2912: Next generation archive counters.

These are the results for the directory and revision counts (the content count is still running, so there are some fresh statistics):

Feb 1 2021, 10:02 AM · Roadmap 2021, System administration, Monitoring, Web app

Jan 29 2021

vsellier requested review of D4974: logstash: fix first puppet run and configuration updates.
Jan 29 2021, 5:06 PM
vsellier added a revision to T2787: Improve access_logs parsing: D4974: logstash: fix first puppet run and configuration updates.
Jan 29 2021, 5:05 PM · System administration, Metrics/monitoring
vsellier committed rSENV8f095c62cb58: vagrant: declare logstash node (authored by vsellier).
vagrant: declare logstash node
Jan 29 2021, 4:38 PM
vsellier added a comment to T2944: Deploy swh-search v0.4.1.

The journal_client has almost finished ingesting the topics[1] it listens to. It took more time because a backfill of origin_visit_status was launched for T2993.
It should be done by the end of the day.

Jan 29 2021, 2:44 PM · System administration, Journal, Archive search
vsellier moved T2939: Replace out of order disks on db1.staging and storage1.staging from Weekly backlog to Backlog on the System administration board.
Jan 29 2021, 2:34 PM · System administration
vsellier changed the status of T2787: Improve access_logs parsing from Open to Work in Progress.
Jan 29 2021, 2:34 PM · System administration, Metrics/monitoring
vsellier added a project to T2787: Improve access_logs parsing: System administration.
Jan 29 2021, 2:33 PM · System administration, Metrics/monitoring
vsellier moved T2958: Use all the disks on esnode2 and esnode3 from deployed/landed/monitoring to done on the System administration board.
Jan 29 2021, 12:21 PM · System administration
vsellier moved T2903: Test different disk configuration on esnode1 from deployed/landed/monitoring to done on the System administration board.
Jan 29 2021, 12:21 PM · System administration
vsellier moved T2905: Deploy swh-search for production from deployed/landed/monitoring to done on the System administration board.
Jan 29 2021, 12:21 PM · System administration, Journal, Archive search
vsellier moved T2920: Document staging infrastructure from in-progress to done on the System administration board.
Jan 29 2021, 12:21 PM · Documentation, System administration, Staging environment
vsellier closed T2920: Document staging infrastructure as Resolved.
  • Inventory updated to ensure all the components are associated to the staging environment
  • Staging page on the intranet updated [1]
  • Staging section on the network page [2] on the intranet updated
Jan 29 2021, 12:20 PM · Documentation, System administration, Staging environment
vsellier added a comment to T2912: Next generation archive counters.

I'm not sure I understand: the hyperloglog function is used precisely to deduplicate the messages based on their keys (at least in the poc).
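
To illustrate why a HyperLogLog counter is insensitive to duplicated keys, here is a minimal standalone Python sketch. It is not the poc code: Redis provides this structure through PFADD and PFCOUNT, while the class below is a simplified textbook version with hypothetical names and an assumed precision of 12 bits.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog: approximate count of distinct keys."""

    def __init__(self, p=12):
        self.p = p                      # number of bits used for the register index
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, key: bytes):
        # 64-bit hash of the key.
        h = int.from_bytes(hashlib.sha1(key).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1 bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        # Registers only keep a maximum, so re-adding a key changes nothing.
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction (linear counting).
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)
```

Because each register stores only the maximum rank seen, replaying the same message keys leaves the estimate unchanged, which is the deduplication property discussed here.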

Jan 29 2021, 12:10 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier added a comment to T2912: Next generation archive counters.

For information, the poc was launched on the production content topic. The results seem acceptable, with a slightly higher count on the redis counter, probably due to some messages sent to kafka but not persisted in the database.

Jan 29 2021, 11:12 AM · Roadmap 2021, System administration, Monitoring, Web app

Jan 28 2021

vsellier added a comment to T2975: Disk replacement on esnode1.

Ticket opened via Dell support.
The disk should be delivered on Monday, February 1, 2021; the DSI has been informed.

Jan 28 2021, 5:55 PM · System administration
vsellier changed the status of T2975: Disk replacement on esnode1 from Open to Work in Progress.
Jan 28 2021, 3:44 PM · System administration
vsellier closed T3001: Webapp is not displaying the origin type on the search results as Resolved.

The fix is deployed on webapp1 and solved the problem.

Jan 28 2021, 3:33 PM · Storage manager, Web app
vsellier closed D4963: webapp1: use the same deployment pattern than moma.
Jan 28 2021, 3:19 PM
vsellier committed rSPSITEb82b0d93c2ec: webapp1: use the same deployment pattern than moma (authored by vsellier).
webapp1: use the same deployment pattern than moma
Jan 28 2021, 3:18 PM
vsellier requested review of D4963: webapp1: use the same deployment pattern than moma.
Jan 28 2021, 3:10 PM
vsellier added a comment to T3001: Webapp is not displaying the origin type on the search results.

Storage version v0.21.1 is deployed in staging, and the problem looks fixed:

❯ curl -s  https://webapp.staging.swh.network/api/1/origin/https://gitlab.com/miwc/miwc.github.io.git/visit/latest/\?require_snapshot\=true | jq ''
{
  "origin": "https://gitlab.com/miwc/miwc.github.io.git",
  "date": "2020-12-07T18:21:58.967952+00:00",
  "type": "git",
  "visit": 1,
  "status": "full",
  "snapshot": "759b36e0e3e81e8cbf601181829571daa645b5d2",
  "metadata": {},
  "origin_url": "https://webapp.staging.swh.network/api/1/origin/https://gitlab.com/miwc/miwc.github.io.git/get/",
  "snapshot_url": "https://webapp.staging.swh.network/api/1/snapshot/759b36e0e3e81e8cbf601181829571daa645b5d2/"
}
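
The manual check above can be expressed as a small Python sketch: build the "latest visit" URL and verify that the returned visit carries a non-empty type. The function names are hypothetical, and the HTTP call itself is left out so the sketch stays self-contained.

```python
def latest_visit_url(base_url, origin_url):
    """URL of the latest visit with a snapshot, as queried with curl above."""
    return f"{base_url}/api/1/origin/{origin_url}/visit/latest/?require_snapshot=true"


def visit_has_type(visit):
    """True when the visit payload exposes a non-empty "type" field."""
    return bool(visit.get("type"))
```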
Jan 28 2021, 2:36 PM · Storage manager, Web app
vsellier closed T2988: Improve cgit lister to add last modification date of the repos as Resolved.
Jan 28 2021, 2:10 PM · CGit lister, Lister
vsellier closed D4960: Correctly return origin_visit_status.type value everywhere.
Jan 28 2021, 2:01 PM
vsellier committed rDSTO76de53cb261f: Correctly return origin_visit_status.type value everywhere (authored by vsellier).
Correctly return origin_visit_status.type value everywhere
Jan 28 2021, 2:01 PM
vsellier requested review of D4960: Correctly return origin_visit_status.type value everywhere.
Jan 28 2021, 12:23 PM
vsellier added a revision to T3001: Webapp is not displaying the origin type on the search results: D4960: Correctly return origin_visit_status.type value everywhere.
Jan 28 2021, 12:12 PM · Storage manager, Web app
vsellier added projects to T3001: Webapp is not displaying the origin type on the search results: Web app, Storage manager.
Jan 28 2021, 12:11 PM · Storage manager, Web app
vsellier changed the status of T3001: Webapp is not displaying the origin type on the search results from Open to Work in Progress.
Jan 28 2021, 12:11 PM · Storage manager, Web app
vsellier created P930 (An Untitled Masterwork).
Jan 28 2021, 10:30 AM

Jan 27 2021

vsellier added a comment to T2920: Document staging infrastructure.

This is a tryout to generate a global schema of the staging environment (P929):

Jan 27 2021, 6:09 PM · Documentation, System administration, Staging environment
vsellier created P929 Staging infrastructure.
Jan 27 2021, 6:07 PM
vsellier accepted D4956: launchpad: Actually mock the anonymous login to launchpad.

It seems to be ok :)

Jan 27 2021, 4:32 PM
vsellier committed rDSNIP0fe3238bdabf: counters: batch redis calls (authored by vsellier).
counters: batch redis calls
Jan 27 2021, 3:38 PM
vsellier committed rDSNIPe1076146c645: counters: add local counter to follow the message count (authored by vsellier).
counters: add local counter to follow the message count
Jan 27 2021, 3:38 PM
vsellier closed D4954: cgit: Don't stop the listing when a repository page is not available.
Jan 27 2021, 3:06 PM
vsellier committed rDLSf6f9f1ca28a9: cgit: Don't stop the listing when a repository page is not available (authored by vsellier).
cgit: Don't stop the listing when a repository page is not available
Jan 27 2021, 3:06 PM
vsellier added a comment to D4954: cgit: Don't stop the listing when a repository page is not available.

Thanks :)

Jan 27 2021, 3:05 PM
vsellier updated the diff for D4954: cgit: Don't stop the listing when a repository page is not available.

Use an exception to validate a repo page can be accessed

Jan 27 2021, 2:54 PM
vsellier closed D4953: cgit: Add support for last_update information during listing.
Jan 27 2021, 2:24 PM
vsellier committed rDLS91fcde83410d: cgit: Add support for last_update information during listing (authored by vsellier).
cgit: Add support for last_update information during listing
Jan 27 2021, 2:24 PM
vsellier updated the diff for D4954: cgit: Don't stop the listing when a repository page is not available.

rebase

Jan 27 2021, 2:19 PM
vsellier updated the diff for D4953: cgit: Add support for last_update information during listing.

Restore missing log when the date can't be parsed

Jan 27 2021, 2:18 PM
vsellier updated the diff for D4954: cgit: Don't stop the listing when a repository page is not available.

rebase

Jan 27 2021, 2:04 PM
vsellier updated the diff for D4953: cgit: Add support for last_update information during listing.

Remove useless variable

Jan 27 2021, 2:03 PM
vsellier requested review of D4954: cgit: Don't stop the listing when a repository page is not available.
Jan 27 2021, 12:47 PM
vsellier moved T2944: Deploy swh-search v0.4.1 from in-progress to deployed/landed/monitoring on the System administration board.
Jan 27 2021, 12:44 PM · System administration, Journal, Archive search
vsellier added a revision to T2988: Improve cgit lister to add last modification date of the repos: D4954: cgit: Don't stop the listing when a repository page is not available.
Jan 27 2021, 12:42 PM · CGit lister, Lister
vsellier updated the diff for D4953: cgit: Add support for last_update information during listing.
  • Reorder methods
  • Adapt date parsing according to the review
Jan 27 2021, 12:41 PM
vsellier added a comment to T2944: Deploy swh-search v0.4.1.

To reduce the time needed to catch up on the lag, several journal clients were launched in parallel with:

/usr/bin/swh search --config-file /etc/softwareheritage/search/journal_client_objects.yml journal-client objects
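
A minimal Python sketch of launching several clients with that command follows. It assumes all clients use the same consumer group from the configuration file, so that Kafka splits the topic partitions among them. The helper name and the number of clients are hypothetical.

```python
import subprocess

# Command taken from the comment above.
JOURNAL_CLIENT_CMD = [
    "/usr/bin/swh", "search",
    "--config-file", "/etc/softwareheritage/search/journal_client_objects.yml",
    "journal-client", "objects",
]


def launch_clients(count, runner=subprocess.Popen):
    """Start `count` journal clients; `runner` is injectable for dry runs."""
    return [runner(JOURNAL_CLIENT_CMD) for _ in range(count)]
```

Passing a different `runner` (for example one that only records the command) allows a dry run without starting any process.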
Jan 27 2021, 10:00 AM · System administration, Journal, Archive search
vsellier committed rSPREa7c9c625d98c: Allocate more memory to search1 (authored by vsellier).
Allocate more memory to search1
Jan 27 2021, 9:40 AM

Jan 26 2021

vsellier updated the diff for D4953: cgit: Add support for last_update information during listing.

Inline unnecessary indirection

Jan 26 2021, 6:52 PM
vsellier updated the diff for D4953: cgit: Add support for last_update information during listing.

Add missing test coverage

Jan 26 2021, 6:49 PM
vsellier requested review of D4953: cgit: Add support for last_update information during listing.
Jan 26 2021, 6:36 PM
vsellier added a revision to T2988: Improve cgit lister to add last modification date of the repos: D4953: cgit: Add support for last_update information during listing.
Jan 26 2021, 6:33 PM · CGit lister, Lister
vsellier changed the status of T2988: Improve cgit lister to add last modification date of the repos from Open to Work in Progress.
Jan 26 2021, 6:04 PM · CGit lister, Lister
vsellier accepted D4946: Install scheduler journal client to saatchi.

LGTM

Jan 26 2021, 12:45 PM
vsellier added a comment to T2944: Deploy swh-search v0.4.1.

Updating the index settings to speed up indexing:

% cat >/tmp/config.json <<EOF
{
  "index" : {
    "translog.sync_interval" : "60s",
    "translog.durability": "async",
    "refresh_interval": "60s"
  }
}
EOF
% export ES_SERVER=192.168.100.81:9200
% export INDEX=origin            
% curl -s -H "Content-Type: application/json" -XPUT http://${ES_SERVER}/${INDEX}/_settings -d @/tmp/config.json 
{"acknowledged":true}%
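
Once the backfill is over, these settings can be reverted. A Python sketch follows, relying on the Elasticsearch behaviour that setting an index setting to null restores its default. The function name is hypothetical, and the PUT on `/${INDEX}/_settings` is left out.

```python
import json

# Settings applied above to speed up the initial indexing.
BULK_INDEXING_SETTINGS = {
    "index": {
        "translog.sync_interval": "60s",
        "translog.durability": "async",
        "refresh_interval": "60s",
    }
}


def reset_to_defaults(settings):
    """Same keys with null values, which asks Elasticsearch for the defaults."""
    return {"index": {name: None for name in settings["index"]}}


reset_body = json.dumps(reset_to_defaults(BULK_INDEXING_SETTINGS))
```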
Jan 26 2021, 10:31 AM · System administration, Journal, Archive search
vsellier added a comment to T2944: Deploy swh-search v0.4.1.

Production

  • puppet disabled
  • Services stopped :
root@search1:~# systemctl stop swh-search-journal-client@objects.service 
root@search1:~# systemctl stop gunicorn-swh-search
  • Index deleted and recreated
% export ES_SERVER=search-esnode1.internal.softwareheritage.org:9200
% curl -s http://$ES_SERVER/_cat/indices\?v 
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin Mq8dnlpuRXO4yYoC6CTuQw  90   1  151716299     38861934    260.8gb          131gb
% curl -XDELETE http://$ES_SERVER/origin
{"acknowledged":true}%    
% swh search --config-file /etc/softwareheritage/search/server.yml  initialize
INFO:elasticsearch:PUT http://search-esnode1.internal.softwareheritage.org:9200/origin [status:200 request:2.216s]
INFO:elasticsearch:PUT http://search-esnode3.internal.softwareheritage.org:9200/origin/_mapping [status:200 request:0.151s]
Done.
% curl -s http://$ES_SERVER/_cat/indices\?v                                        
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin yFaqPPCnRFCnc5AA6Ah8lw  90   1          0            0     36.5kb         18.2kb
  • journal client's consumer group deleted:
% export SERVER=kafka1.internal.softwareheritage.org:9092  
% ./kafka-consumer-groups.sh --bootstrap-server ${SERVER} --delete --group swh.search.journal_client
Deletion of requested consumer groups ('swh.search.journal_client') was successful.
  • journal client restarted
  • puppet enabled
Jan 26 2021, 9:39 AM · System administration, Journal, Archive search
vsellier added a comment to T2944: Deploy swh-search v0.4.1.

The filter on visited origins is working correctly on staging. The has_visit flag looks good.
For example, for the https://www.npmjs.com/package/@ehmicky/dev-tasks origin:

{
  "_index" : "origin",
  "_type" : "_doc",
  "_id" : "019bd314416108304165e82dd92e00bc9ea85a53",
  "_score" : 60.56421,
  "_source" : {
    "url" : "https://www.npmjs.com/package/@ehmicky/dev-tasks",
    "sha1" : "019bd314416108304165e82dd92e00bc9ea85a53"
  },
  "sort" : [
    60.56421,
    "019bd314416108304165e82dd92e00bc9ea85a53"
  ]
}
swh=> select * from origin join origin_visit_status on id=origin where id=469380;
   id   |                       url                        | origin | visit |             date              | status  | metadata |                  snapshot                  | type 
--------+--------------------------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------
 469380 | https://www.npmjs.com/package/@ehmicky/dev-tasks | 469380 |     1 | 2021-01-25 13:30:47.221937+00 | created |          |                                            | npm
 469380 | https://www.npmjs.com/package/@ehmicky/dev-tasks | 469380 |     1 | 2021-01-25 13:41:59.435579+00 | partial |          | \xe3f24413d81fd3e9c309686fcfb6c8f5eb549acf | npm
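
The rule being verified comes from the title of T2936: "has_visit" is only set on a "full" visit status. A minimal Python sketch of that rule, applied to the statuses listed above, follows. The function name is hypothetical and this is not the journal client code.

```python
def has_full_visit(statuses):
    """True when at least one visit status is "full"."""
    return any(status == "full" for status in statuses)


# Statuses of origin 469380 in the query result above.
npm_origin_statuses = ["created", "partial"]
flag = has_full_visit(npm_origin_statuses)
```

With only "created" and "partial" statuses the flag stays unset, which would be consistent with the search document above carrying no has_visit field.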
Jan 26 2021, 9:16 AM · System administration, Journal, Archive search

Jan 25 2021

vsellier closed D4943: cgit lister: Add missing types on the init method.
Jan 25 2021, 6:59 PM
vsellier committed rDLSd62e77c1b495: cgit lister: Add missing types on the init method (authored by vsellier).
cgit lister: Add missing types on the init method
Jan 25 2021, 6:59 PM
vsellier requested review of D4943: cgit lister: Add missing types on the init method.
Jan 25 2021, 6:58 PM
vsellier added a revision to T2984: Port cgit lister to the new Lister API: D4943: cgit lister: Add missing types on the init method.
Jan 25 2021, 6:33 PM · Lister, CGit lister, Sprint 2021 01
vsellier added a comment to T2944: Deploy swh-search v0.4.1.

Staging

We are proceeding with a complete index rebuild.

Jan 25 2021, 5:44 PM · System administration, Journal, Archive search
vsellier added a comment to T2944: Deploy swh-search v0.4.1.

Regarding the index rebuilding process, a naive approach using an alias that points to both the old and the new index[1] returns duplicated results when a search is done.
Using an alias that points only to the old index, building a new index, and then switching the alias to the new index[2] can be a first approach, with the drawback that the old index will not be updated until the alias is switched to the new index.
It also requires the swh-search code to be able to use different names for the read and write operations.
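
The alias switch in the second approach can be done in a single request to the Elasticsearch `_aliases` endpoint, which applies its actions atomically. Here is a Python sketch of the request body. The alias and index names used in the example are hypothetical, and the POST itself is left out.

```python
def alias_swap_actions(alias, old_index, new_index):
    """Body for POST _aliases: move `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: searches read from the alias while writes go to the
# index being rebuilt.
body = alias_swap_actions("origin-read", "origin-v1", "origin-v2")
```

Since both actions are applied together, searches never see the alias pointing at both indexes, which avoids the duplicated results of the naive approach.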

Jan 25 2021, 4:07 PM · System administration, Journal, Archive search
vsellier changed the status of T2920: Document staging infrastructure from Open to Work in Progress.
Jan 25 2021, 3:32 PM · Documentation, System administration, Staging environment
vsellier changed the status of T2944: Deploy swh-search v0.4.1, a subtask of T2936: Update the swh-search journal client to only set "has_visit" on "full" status of the visit, from Open to Work in Progress.
Jan 25 2021, 3:32 PM · Journal, Archive search
vsellier changed the status of T2944: Deploy swh-search v0.4.1 from Open to Work in Progress.
Jan 25 2021, 3:32 PM · System administration, Journal, Archive search
vsellier renamed T2944: Deploy swh-search v0.4.1 from Deploy swh-search v0.4.1 in staging to Deploy swh-search v0.4.1.
Jan 25 2021, 3:32 PM · System administration, Journal, Archive search
vsellier accepted D4939: gitlab: Adapt celery task implementations to the new lister api.

LGTM

Jan 25 2021, 3:13 PM