
Deploy maven stack in production
Closed, Resolved, Public

Description

Plan:

  • Deploy maven indexer exporter services (one per maven index: maven-central, ...)
  • Deploy frontend to expose maven indexer exporter results (*.fld files)
  • Deploy maven worker services: lister + loader
  • scheduler (saatchi): Restart swh-scheduler-schedule-recurrent service (so maven tasks are scheduled)
  • Add maven central so the maven exporter scrapes its index and exports it
  • Discover it failed (out of disk space) due to a forgotten step ¯\_(ツ)_/¯
  • T3746#82956: Prepare maven-exporter node to have enough disk to receive indices
  • Restart maven export for maven-central
  • T4330#90727: Wait for the end of it ^...
  • T4330#90730: Add maven central to the standard listing once ^ is done
  • D8421: Make sure lister workers consume the maven queues
  • T4330#90736: Checks
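The forgotten step above was disk space on the maven-exporter node. The final maven-central fld file alone weighs about 20.7 GB (20699901033 bytes, per the export log below), and the exporter first writes it to its export directory before copying it to /publish. A pre-flight check could look like the following minimal sketch. The function name `has_room_for_export` and the default headroom factor of 2.0 are hypothetical assumptions (meant to leave room for intermediate copies), not part of the deployed tooling.

```python
import shutil


def has_room_for_export(path: str, expected_bytes: int, headroom: float = 2.0) -> bool:
    """Return True if the filesystem holding `path` has enough free space.

    Hypothetical pre-flight check: `expected_bytes` is the expected size of
    the exported fld file, and `headroom` is an arbitrary safety factor for
    intermediate files (e.g. the copy to /publish).
    """
    free = shutil.disk_usage(path).free  # free bytes on that filesystem
    return free >= expected_bytes * headroom


if __name__ == "__main__":
    # Size of maven central's _s.fld as reported in the export log.
    print(has_room_for_export("/", 20699901033))
```

Running this before restarting the maven-central export would have flagged the missing disk up front rather than failing mid-export.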

Event Timeline

ardumont triaged this task as Normal priority. Jun 16 2022, 10:21 AM
ardumont created this task.
ardumont changed the task status from Open to Work in Progress. Thu, Sep 8, 11:18 AM
ardumont moved this task from Backlog to in-progress on the System administration board.

Finally, the export of maven central is done [1] and the fld file is computed [2].
It is also exposed over HTTPS [3], hence reachable from the lister worker nodes.

[1]

Sep 08 11:24:15 maven-exporter run_maven_index_exporter.sh[146073]: * Make files modifiable by the end-user.
Sep 08 11:24:15 maven-exporter run_maven_index_exporter.sh[146073]: Docker Script execution finished on 2022-09-08 11:24:15.
Sep 08 11:24:15 maven-exporter run_maven_index_exporter.sh[146073]: INFO:__main__:Export directory has the following files:
Sep 08 11:24:15 maven-exporter run_maven_index_exporter.sh[146073]: INFO:__main__:  - _s.fld size 20699901033
Sep 08 11:24:15 maven-exporter run_maven_index_exporter.sh[146073]: INFO:__main__:Found fld file: _s.fld
Sep 08 11:24:15 maven-exporter run_maven_index_exporter.sh[146073]: INFO:__main__:Copying files to /publish/export.fld.
Sep 08 11:25:39 maven-exporter run_maven_index_exporter.sh[146073]: INFO:__main__:Script finished on 2022-09-08 11:25:39
Sep 08 11:25:42 maven-exporter run_maven_index_exporter.sh[146071]: + mv /var/www/maven_index_exporter/export.fld /var/www/maven_index_exporter/export-maven-central.fld
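The log above ends with the publish step: find the single .fld file in the export directory (`_s.fld`), copy it to `/publish/export.fld`, then rename it to an index-specific name (`export-maven-central.fld`). Here is a minimal Python sketch of that sequence, assuming the export directory holds exactly one .fld file. The function name `publish_fld` is hypothetical and this is not the actual `run_maven_index_exporter.sh` logic.

```python
import shutil
from pathlib import Path


def publish_fld(export_dir: Path, publish_dir: Path, index_name: str) -> Path:
    """Copy the single .fld file of export_dir to publish_dir/export-<index_name>.fld."""
    fld_files = sorted(export_dir.glob("*.fld"))
    if len(fld_files) != 1:
        raise RuntimeError(f"expected exactly one .fld file, found {len(fld_files)}")

    # Copy under a generic name first, as the exporter does with export.fld ...
    tmp = publish_dir / "export.fld"
    shutil.copyfile(fld_files[0], tmp)

    # ... then rename it; both paths live in publish_dir, so this is a
    # same-filesystem rename and readers never see a half-written file.
    final = publish_dir / f"export-{index_name}.fld"
    tmp.replace(final)
    return final
```

Renaming per index is what allows several indexes (clojars, jboss, maven-central, ...) to be served side by side from the same directory, as the listing below shows.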

[2]

root@maven-exporter:~# ls -lah /var/www/maven_index_exporter/
total 1.8G
drwxr-xr-x 2 root root    6 Sep  8 11:25 .
drwxr-xr-x 4 root root 4.0K Sep  8 08:16 ..
-rwxrwxrwx 1 root root  442 Sep  8 08:36 export-atlassian-public.fld
-rwxrwxrwx 1 root root  63M Sep  8 08:18 export-clojars.fld
-rwxrwxrwx 1 root root  91M Sep  8 08:32 export-jboss.fld
-rwxrwxrwx 1 root root  20G Sep  8 10:49 export-maven-central.fld

[3]

root@maven-exporter:~# curl -s https://maven-exporter.internal.softwareheritage.org/export-maven-central.fld | head
doc 0
  field 0
    name u
    type string
    value org.pustefixframework|pustefix-archetype-basic|0.18.0|NA|jar
  field 1
    name m
    type string
    value 1318436946815
  field 2
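The exported fld file is a plain-text dump: each `doc N` block holds `field N` entries, each with a `name`, `type` and `value`. The following minimal parser sketch reads that layout as it appears above. It assumes, following Maven Indexer naming, that the `u` field is `groupId|artifactId|version|classifier|extension` (with `NA` for "no classifier") and that `m` is a last-modified timestamp in milliseconds since the epoch. The function names are hypothetical, and this is not the swh-lister implementation.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator, Optional

# Assumed layout of the "u" field, based on Maven Indexer conventions.
U_FIELD_PARTS = ("groupId", "artifactId", "version", "classifier", "extension")


def parse_fld(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one {field_name: value} dict per 'doc' block of a textual fld dump."""
    doc: Optional[Dict[str, str]] = None
    name: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        key, _, rest = line.partition(" ")
        if key == "doc":  # a new document starts: emit the previous one
            if doc is not None:
                yield doc
            doc, name = {}, None
        elif key == "field":  # a new field starts: forget the previous name
            name = None
        elif key == "name":
            name = rest
        elif key == "value" and doc is not None and name is not None:
            doc[name] = rest
        # "type" lines are ignored: every value is kept as a string
    if doc is not None:
        yield doc


def parse_u_field(u: str) -> Dict[str, str]:
    """Split a 'u' value into its (assumed) Maven coordinates."""
    return dict(zip(U_FIELD_PARTS, u.split("|")))


def last_modified(m: str) -> datetime:
    """Interpret an 'm' value as milliseconds since the epoch (assumption)."""
    return datetime.fromtimestamp(int(m) / 1000, tz=timezone.utc)
```

On the first document above, `parse_u_field` gives groupId `org.pustefixframework`, artifactId `pustefix-archetype-basic` and version `0.18.0`, and `last_modified("1318436946815")` falls on 2011-10-12 (UTC).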

Schedule maven-central listing:

swhscheduler@saatchi:~$ curl -s https://repo1.maven.org/maven2/ | head -2
<!DOCTYPE html>
<html>
swhscheduler@saatchi:~$ curl -s https://maven-exporter.internal.softwareheritage.org/export-maven-central.fld | head -2
doc 0
  field 0
swhscheduler@saatchi:~$ curl -s http://saatchi.internal.softwareheritage.org:5008/
<html>
<head><title>Software Heritage scheduler RPC server</title></head>
<body>
<p>You have reached the
<a href="https://www.softwareheritage.org/">Software Heritage</a>
scheduler RPC server.<br />
See its
<a href="https://docs.softwareheritage.org/devel/swh-scheduler/">documentation
and API</a> for more information</p>
</body>
</html>
swhscheduler@saatchi:~$ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
>   task add list-maven-full \
>     url=https://repo1.maven.org/maven2/ \
>     index_url=https://maven-exporter.internal.softwareheritage.org/export-maven-central.fld
Created 1 tasks

Task 415251304
  Next run: today (2022-09-08T12:03:54.630698+00:00)
  Interval: 90 days, 0:00:00
  Type: list-maven-full
  Policy: recurring
  Args:
  Keyword args:
    index_url: 'https://maven-exporter.internal.softwareheritage.org/export-maven-central.fld'
    url: 'https://repo1.maven.org/maven2/'
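The task above is recurring with a 90-day interval. If that interval stays fixed, the next runs can be worked out as follows. This is an approximation: the scheduler may adapt the interval of recurring tasks over time. The helper `next_runs` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def next_runs(first_run: datetime, interval: timedelta, count: int) -> List[datetime]:
    """Return `count` run dates, assuming a fixed interval between runs."""
    return [first_run + i * interval for i in range(count)]


first = datetime.fromisoformat("2022-09-08T12:03:54.630698+00:00")
for run in next_runs(first, timedelta(days=90), 3):
    print(run.isoformat())
```

Under that assumption, the full maven-central listing would run again around 2022-12-07 and then 2023-03-07.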

Checks:

  • task has been scheduled by the scheduler runner process [1]
  • listing is being consumed by one worker [2]
  • the number of 'maven' listed origins is steadily growing [3]
  • New 'maven' listed origins are getting scheduled for ingestion [4]
  • maven loaders are ingesting those [5]

[1]

root@saatchi:~# journalctl -xe -u swh-scheduler-runner.service  | grep -A1 maven
Sep 08 12:04:00 saatchi swh[1210080]: INFO:swh.scheduler.celery_backend.runner:Grabbed 1 tasks list-maven-full
Sep 08 12:04:01 saatchi swh[1210080]: INFO:swh.scheduler.cli.admin.runner:Scheduled 1 tasks
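The grep above can be turned into a small automated check. This sketch only pattern-matches the "Grabbed N tasks TYPE" runner log lines shown above. It is not an swh-scheduler API, and the function name `grabbed_tasks` is hypothetical.

```python
import re
from typing import Dict, Iterable

# Matches e.g. "...runner:Grabbed 1 tasks list-maven-full"
GRABBED_RE = re.compile(r"Grabbed (\d+) tasks (\S+)")


def grabbed_tasks(lines: Iterable[str]) -> Dict[str, int]:
    """Sum, per task type, how many tasks the runner reports having grabbed."""
    counts: Dict[str, int] = {}
    for line in lines:
        match = GRABBED_RE.search(line)
        if match:
            task_type = match.group(2)
            counts[task_type] = counts.get(task_type, 0) + int(match.group(1))
    return counts
```

Fed with `journalctl -u swh-scheduler-runner.service` output, a non-zero count for `list-maven-full` confirms check [1].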

[2]

root@pergamon:~# clush -b -w @prod-listers 'systemctl status swh-worker@lister' | grep maven | grep Received
Sep 08 12:14:51 worker10 python3[2300925]: [2022-09-08 12:14:51,477: INFO/MainProcess] Received task: swh.lister.maven.tasks.FullMavenLister[56d16483-d676-4b15-8a71-e4a8227e3157]

[3]

14:20:35 softwareheritage-scheduler@belvedere:5432=> select now(), visit_type, count(*) from listed_origins where lister_id='2b519d27-b0b0-442e-b340-b0d5017ea014' group by visit_type;
+-------------------------------+------------+-------+
|              now              | visit_type | count |
+-------------------------------+------------+-------+
| 2022-09-08 12:21:19.380732+00 | maven      |  1415 |
+-------------------------------+------------+-------+
(1 row)

Time: 238.210 ms
14:21:22 softwareheritage-scheduler@belvedere:5432=> select now(), visit_type, count(*) from listed_origins where lister_id='2b519d27-b0b0-442e-b340-b0d5017ea014' group by visit_type;
+-------------------------------+------------+-------+
|              now              | visit_type | count |
+-------------------------------+------------+-------+
| 2022-09-08 12:21:42.729403+00 | maven      |  1427 |
+-------------------------------+------------+-------+
(1 row)

Time: 18.782 ms
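The two samples above are enough for a rough listing rate: 1427 - 1415 = 12 new origins in 23.348671 s, about 0.51 origins per second. The sketch below computes it from the two `now()`/`count` pairs. It assumes roughly linear growth over that short window, so it says nothing about the total listing duration. The helper name `growth_rate` is hypothetical.

```python
from datetime import datetime
from typing import Sequence, Tuple


def growth_rate(samples: Sequence[Tuple[datetime, int]]) -> float:
    """Origins listed per second between the first and last (timestamp, count) samples."""
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    return (c1 - c0) / (t1 - t0).total_seconds()


samples = [
    (datetime.fromisoformat("2022-09-08 12:21:19.380732+00:00"), 1415),
    (datetime.fromisoformat("2022-09-08 12:21:42.729403+00:00"), 1427),
]
print(f"{growth_rate(samples):.3f} origins/s")
```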

[4]

Sep 08 12:34:37 saatchi swh[1210191]: INFO:swh.scheduler.celery_backend.recurrent_visits:maven: 53 visits scheduled in queue swh.loader.package.maven.tasks.LoadMaven

[5]

root@pergamon:~# clush -b -w @prod-listers 'systemctl status swh-worker@loader_maven' | grep "Received\|succeeded" | head

Sep 08 12:38:38 worker03 python3[2164719]: [2022-09-08 12:38:38,257: INFO/ForkPoolWorker-14] Task swh.loader.package.maven.tasks.LoadMaven[b7b05fc1-f673-4da1-9b3e-ba8cdf7bc7f0] succeeded in 20.646365012042224s: {'status': 'eventful', 'snapshot_id': '6cb2ba6d63a096dc66fe5c22677be53ca9b0e09d'}
Sep 08 12:38:38 worker03 python3[2074432]: [2022-09-08 12:38:38,262: INFO/MainProcess] Received task: swh.loader.package.maven.tasks.LoadMaven[bdb9c538-0577-4df6-85d1-2b78942fa06b]
Sep 08 12:39:34 worker03 python3[2164719]: [2022-09-08 12:39:34,655: INFO/ForkPoolWorker-14] Task swh.loader.package.maven.tasks.LoadMaven[60eaa929-d9b8-4ce1-8ffd-e7b70f74d61b] succeeded in 56.391186997061595s: {'status': 'eventful', 'snapshot_id': 'f23dfb7dd725d4f714ffdfcf38f2ea555df01d41'}
Sep 08 12:39:34 worker03 python3[2074432]: [2022-09-08 12:39:34,671: INFO/MainProcess] Received task: swh.loader.package.maven.tasks.LoadMaven[0029c978-4285-49b4-ad3d-c91ed89998ee]
Sep 08 12:39:41 worker03 python3[2164719]: [2022-09-08 12:39:41,880: INFO/ForkPoolWorker-14] Task swh.loader.package.maven.tasks.LoadMaven[bdb9c538-0577-4df6-85d1-2b78942fa06b] succeeded in 7.204680480062962s: {'status': 'eventful', 'snapshot_id': '445101e4715d8415d228ed6ff1c96f8d48a95229'}
Sep 08 12:39:41 worker03 python3[2074432]: [2022-09-08 12:39:41,884: INFO/MainProcess] Received task: swh.loader.package.maven.tasks.LoadMaven[6cb88082-f19e-4a08-aae6-61247c9484fd]
Sep 08 12:39:44 worker03 python3[2164719]: [2022-09-08 12:39:44,433: INFO/ForkPoolWorker-14] Task swh.loader.package.maven.tasks.LoadMaven[0029c978-4285-49b4-ad3d-c91ed89998ee] succeeded in 2.5496441069990396s: {'status': 'eventful', 'snapshot_id': '8bdc02042046d481b3448b7abf12e83339095dd1'}
Sep 08 12:39:47 worker03 python3[2074432]: [2022-09-08 12:39:47,930: INFO/MainProcess] Received task: swh.loader.package.maven.tasks.LoadMaven[066ea81a-9c0a-4dbe-9d22-4a8929281876]
Sep 08 12:39:54 worker03 python3[2164833]: [2022-09-08 12:39:54,471: INFO/ForkPoolWorker-15] Task swh.loader.package.maven.tasks.LoadMaven[6cb88082-f19e-4a08-aae6-61247c9484fd] succeeded in 6.516929866047576s: {'status': 'eventful', 'snapshot_id': '80e2b2758fa3133d2706ca26ceb8583204f01b0d'}
Sep 08 12:39:54 worker03 python3[2074432]: [2022-09-08 12:39:54,477: INFO/MainProcess] Received task: swh.loader.package.maven.tasks.LoadMaven[2a8298d9-1938-43b8-8d2a-acd33aeffdc4]
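Rather than eyeballing `head` output, the worker logs can be summarized: how many tasks were received, how many succeeded, with which status and how long they took. This minimal sketch only pattern-matches the Celery log lines shown above ("Received task: ..." and "Task ...[id] succeeded in Xs: {'status': ...}"). It therefore also works on the lister log from check [2]. The function name `summarize` and its return keys are hypothetical.

```python
import re
from collections import Counter
from statistics import mean
from typing import Dict, Iterable, List

RECEIVED_RE = re.compile(r"Received task: (?P<task>[\w.]+)\[(?P<id>[0-9a-f-]+)\]")
SUCCEEDED_RE = re.compile(
    r"Task (?P<task>[\w.]+)\[(?P<id>[0-9a-f-]+)\] succeeded in (?P<secs>[\d.]+)s: "
    r"\{'status': '(?P<status>\w+)'"
)


def summarize(lines: Iterable[str]) -> Dict[str, object]:
    """Count received/succeeded Celery tasks, their statuses and mean duration."""
    received = 0
    durations: List[float] = []
    statuses: Counter = Counter()
    for line in lines:
        match = SUCCEEDED_RE.search(line)
        if match:
            durations.append(float(match.group("secs")))
            statuses[match.group("status")] += 1
            continue
        if RECEIVED_RE.search(line):
            received += 1
    return {
        "received": received,
        "succeeded": len(durations),
        "statuses": dict(statuses),
        "mean_seconds": mean(durations) if durations else None,
    }
```

A growing `succeeded` count with mostly `eventful` statuses is what check [5] looks for.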
ardumont claimed this task.
ardumont moved this task from deployed/landed/monitoring to done on the System administration board.